ICLR 2026 Affiliation Dataset: PDF-derived institutional data for 5,356 accepted papers with treemap visualizations
By stared
Summary
A GitHub repository provides an end-to-end pipeline that extracts institutional affiliations from the PDF title blocks of 5,356 ICLR 2026 accepted papers. Reading the title block rather than OpenReview author profiles avoids profile drift, where an author's current employer appears on every paper they ever wrote. The project delivers a clean dataset (CSV and XLSX) and treemap visualizations showing which institutions are shaping AI research.
Key quotes
This avoids the OpenReview-profile drift problem (where authors' current job appears on every paper they ever wrote — e.g. listing Wyoming as the affiliation for a paper actually written at UBC).
Affiliations come from the paper's title block PDF, not from author profiles.
End-to-end pipeline that turns 5,356 ICLR 2026 accepted papers into a clean, PDF-derived institutional-affiliation dataset and a publication-ready treemap of who is shaping AI research right now.
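To make the aggregation step concrete, here is a minimal sketch of how per-paper affiliations could be rolled up into the per-institution counts that size a treemap's tiles. It is not the repository's code: the function name `papers_per_institution`, the `(paper_id, institution)` row shape, and the toy rows are all assumptions made for illustration.

```python
from collections import defaultdict


def papers_per_institution(rows):
    """Count distinct papers per institution.

    `rows` is an iterable of (paper_id, institution) pairs, one per author
    line in a title block. This row shape is a hypothetical stand-in for
    whatever columns the real CSV uses.
    """
    papers = defaultdict(set)
    for paper_id, institution in rows:
        name = institution.strip()
        if name:  # skip authors whose affiliation could not be extracted
            papers[name].add(paper_id)
    counts = {name: len(ids) for name, ids in papers.items()}
    # Largest institutions first; ties broken alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


# Toy rows, invented for this example only.
rows = [
    ("p1", "Institution A"),
    ("p1", "Institution A"),   # second author, same paper, same institution
    ("p1", "Institution B"),
    ("p2", "Institution A "),  # trailing whitespace is normalized away
]
ranking = papers_per_institution(rows)
print(ranking)
```

Counting distinct paper IDs rather than author rows keeps a paper with many co-authors from one institution from inflating that institution's tile. Whether the actual pipeline counts papers, authors, or fractional credit is not stated in the summary.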
