Understanding Bias in Linear Least Squares Regression with Simple Test Data

By azeemba

4mo ago · 5 min read · en · Insight

Summary

This article explains why a linear least squares fit can look biased on simple test data, and traces the effect to the assumptions of the ordinary least squares (OLS) model. OLS assumes that X and the error term ε are independent and that X is known with certainty, so all of the error is attributed to Y. In the limiting case where every data point lies exactly on the regression line (all error terms are zero), the first principal component from principal component analysis (PCA) points in the same direction as the regression line, which matches the intuitive expectation about how the two variables relate.

Key quotes

The answer lies in the data generating process.
the simple OLS model assumes Y = β₀ + β₁X + ε and that X and ε are independent.
It also assumes that we know X with certainty.
Because all the data points lie along the regression line, there is no variance in any other direction.
So if we were to do PCA on this, the principal component would be in exactly the same direction as our regression line, matching your intuition.
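
The zero-error case described in the last two quotes can be checked numerically. The following is a minimal sketch, not the author's code: the coefficients `beta0` and `beta1` and the grid of `x` values are hypothetical choices, and NumPy is assumed to be available. It builds points that lie exactly on a line, fits OLS, and compares the fitted slope with the direction of the first principal component.

```python
import numpy as np

# Hypothetical coefficients for Y = beta0 + beta1 * X with epsilon = 0.
beta0, beta1 = 1.0, 2.0
x = np.linspace(-3.0, 3.0, 50)
y = beta0 + beta1 * x  # every point lies exactly on the line

# OLS fit of y on x (degree-1 polynomial: slope first, then intercept).
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# PCA via the eigendecomposition of the 2x2 sample covariance matrix.
data = np.column_stack([x, y])
cov = np.cov(data, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(cov)  # eigenvalues in ascending order

pc1 = eigvecs[:, -1]            # direction of largest variance
pca_slope = pc1[1] / pc1[0]     # slope of that direction in the (x, y) plane

print("OLS slope:", ols_slope)
print("PCA slope:", pca_slope)
print("variance off the line:", eigvals[0])
```

The smallest eigenvalue is numerically zero, which is the "no variance in any other direction" statement, and the two slopes coincide.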

Snippet from the RSS feed

I used Python to generate a correlated data set for testing, and then plotted a basic linear least-squares fit. The result looked a bit strange to me, because the line doesn't really seem to pass ...
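
A setup like the one in the snippet can be reproduced roughly as follows. This is a sketch under stated assumptions: the snippet does not say how the data were generated, so the bivariate normal distribution, the correlation `rho`, the sample size, and the seed are all hypothetical. With unit variances, the OLS slope of y on x is close to `rho`, while the first principal component runs along the diagonal (slope near 1), which is one reason the fitted line can look too shallow against the point cloud.

```python
import numpy as np

# Hypothetical test data: bivariate normal, unit variances, correlation rho.
rng = np.random.default_rng(0)
rho = 0.6
true_cov = np.array([[1.0, rho],
                     [rho, 1.0]])
data = rng.multivariate_normal(mean=[0.0, 0.0], cov=true_cov, size=5000)
x, y = data[:, 0], data[:, 1]

# OLS of y on x: treats x as exact and puts all error in y.
ols_slope, ols_intercept = np.polyfit(x, y, 1)

# PCA: treats x and y symmetrically.
eigvals, eigvecs = np.linalg.eigh(np.cov(data, rowvar=False))
pc1 = eigvecs[:, -1]          # eigh sorts ascending, so the last column is PC1
pca_slope = pc1[1] / pc1[0]

print("OLS slope (expected near rho):", ols_slope)
print("PCA slope (expected near 1):  ", pca_slope)
```

The gap between the two slopes is a consequence of the different models rather than a bug in either fit: OLS minimizes vertical distances only, whereas PCA minimizes perpendicular distances.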
