Data-driven unification of 72 biomedical publication types and study designs into a hierarchical rubric
By
Smalheiser, Neil R; Menke, Joe D; Holt, Arthur W
Summary
This article presents a data-driven approach to unifying 72 biomedical publication types and study designs (PTs) into a single rubric and hierarchy. The researchers computed pairwise similarities of each PT against all others to form a similarity matrix, then performed hierarchical clustering to categorize them. Spearman correlations among PT pairs ranged from strongly negative to strongly positive (−0.732 to +0.997), with a mean of 0.176. The analysis yielded 13 clusters of PTs and 5 broader categories, providing a unified framework for classifying biomedical research types.
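The pipeline described above (pairwise Spearman correlations forming a similarity matrix, followed by hierarchical clustering) can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the authors' implementation: the four PT names and their score profiles are invented toy data, the distance is taken as 1 minus the correlation, and average linkage is an arbitrary choice, since the summary does not say which linkage the study used.

```python
from itertools import combinations

def ranks(values):
    """Return 1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def similarity_matrix(profiles):
    """Pairwise Spearman correlation of every PT against all others."""
    names = list(profiles)
    return {a: {b: spearman(profiles[a], profiles[b]) for b in names}
            for a in names}

def cluster(sim, n_clusters):
    """Average-linkage agglomerative clustering on distance 1 - rho (assumed)."""
    clusters = [[name] for name in sim]

    def dist(c1, c2):
        total = sum(1 - sim[a][b] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the two closest clusters, then repeat.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda p: dist(clusters[p[0]], clusters[p[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return [sorted(c) for c in clusters]

# Hypothetical toy profiles: one score per PT across five made-up features.
profiles = {
    "PT_A": [0.9, 0.8, 0.1, 0.2, 0.7],
    "PT_B": [0.8, 0.9, 0.2, 0.1, 0.6],
    "PT_C": [0.1, 0.2, 0.9, 0.8, 0.3],
    "PT_D": [0.2, 0.1, 0.8, 0.9, 0.4],
}

sim = similarity_matrix(profiles)
clusters = cluster(sim, n_clusters=2)
print(clusters)  # [['PT_A', 'PT_B'], ['PT_C', 'PT_D']]
```

In this toy example PT_A and PT_B correlate at 0.8 and PT_A and PT_C at -1.0, so the two groups separate cleanly. On real data with 72 PTs, a library routine such as SciPy's hierarchical clustering would be the more practical choice.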
Key quotes
Our goal is to unify the 72 biomedical publication types and study designs (collectively, PTs) into a single rubric and hierarchy.
This is carried out in a data-driven manner by computing pairwise similarities of each PT against all others to form a similarity matrix.
Spearman correlations among PT pairs ranged from strongly negative to strongly positive (−0.732 to +0.997), with a mean of 0.176.
Overall, we obtained 13 clusters of PTs and 5 broader categories.