ICLR 2026 Affiliation Dataset: PDF-derived institutional data for 5,356 accepted papers with treemap visualizations

By stared · 17d ago · 5 min read · en · Code

Summary

A GitHub repository provides an end-to-end pipeline that extracts institutional affiliations from the PDF title blocks of 5,356 ICLR 2026 accepted papers, avoiding the OpenReview profile drift problem. The project delivers a clean dataset (CSV + XLSX) and treemap visualizations showing which institutions are shaping AI research, with affiliations sourced directly from paper PDFs rather than author profiles.
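As a rough illustration of the last step, the sketch below turns per-author affiliation rows into per-institution paper counts, which is the kind of number a treemap tile would be sized by. It is a minimal sketch, not the repository's code: the column names `paper_id` and `institution`, the sample rows, and the helper `papers_per_institution` are all hypothetical, since the summary does not describe the dataset's schema.

```python
import csv
import io
from collections import Counter

# Hypothetical rows: one (paper_id, institution) pair per author affiliation.
# The real dataset's column names are not given in the summary.
SAMPLE_CSV = """paper_id,institution
p1,University A
p1,University A
p1,Lab B
p2,Lab B
p3,University A
"""


def papers_per_institution(csv_text):
    """Count distinct papers per institution.

    A paper with several authors from the same institution is counted once
    for that institution, so large author lists do not inflate the totals.
    """
    seen = set()
    counts = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = (row["paper_id"], row["institution"])
        if key not in seen:
            seen.add(key)
            counts[row["institution"]] += 1
    return counts


counts = papers_per_institution(SAMPLE_CSV)
for name, n in counts.most_common():
    print(name, n)
```

Counting each paper once per institution is one possible convention; fractional credit per author would be another, and the summary does not say which the project uses.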

Key quotes · 3 pulled
This avoids the OpenReview-profile drift problem (where authors' current job appears on every paper they ever wrote — e.g. listing Wyoming as the affiliation for a paper actually written at UBC).
Affiliations come from the paper's title block PDF, not from author profiles.
End-to-end pipeline that turns 5,356 ICLR 2026 accepted papers into a clean, PDF-derived institutional-affiliation dataset and a publication-ready treemap of who is shaping AI research right now.
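
The profile drift problem in the first quote can be sketched as a comparison between two sources for the same paper. This is only an illustration under assumed inputs: the two dictionaries, the paper ids, and the helper `find_drift` are hypothetical, and the institution names simply mirror the Wyoming and UBC example from the quote.

```python
def find_drift(pdf_affils, profile_affils):
    """Return ids of papers whose profile-derived affiliation differs
    from the affiliation printed in the PDF title block."""
    drifted = []
    for paper_id, pdf_inst in pdf_affils.items():
        profile_inst = profile_affils.get(paper_id)
        if profile_inst is not None and profile_inst != pdf_inst:
            drifted.append(paper_id)
    return drifted


# Hypothetical data: the author wrote both papers at UBC, but the profile
# entry for paper-1 now reflects a later job.
pdf_affils = {"paper-1": "UBC", "paper-2": "UBC"}
profile_affils = {"paper-1": "Wyoming", "paper-2": "UBC"}

drifted = find_drift(pdf_affils, profile_affils)
print(drifted)
```

Treating the PDF title block as the reference is the design choice the quotes describe: it records where the work was done at submission time, while a profile records where the author is now.
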
Snippet from the RSS feed
PDF-derived institutional affiliations for 5,356 ICLR 2026 accepted papers — full pipeline (scrape → parse → render), clean dataset (CSV + XLSX), and treemap charts. - DmytroLopushanskyy/iclr2026-a...
