
Schmidhuber claims 1991 paper introduced first linear-scaling Transformer variant, predating Google's 2017 model

By @SchmidhuberAI · 10d ago · 26 min read · en · Opinion

Summary

Jürgen Schmidhuber claims priority for the Transformer architecture. He states that in March 1991, more than 25 years before Google's 2017 Transformer paper, he published a first variant now called the "unnormalised linear Transformer" (ULTRA). The key distinction he draws is that ULTRA's computational cost scales linearly with input size, whereas the cost of Google's 2017 Transformer scales quadratically. He presents this as a historical correction to the prevailing account of who invented key components of modern large language models such as ChatGPT.
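The linear-versus-quadratic distinction can be illustrated with a minimal sketch. It assumes that "unnormalised linear Transformer" means causal attention with no softmax: each output is a sum of earlier value vectors weighted by raw key-query dot products. The function names and toy inputs are hypothetical, and the code is not taken from the 1991 paper. It computes the same outputs two ways: once by accumulating a fixed-size running matrix (cost per step independent of sequence length, so linear overall) and once by comparing every query with every earlier key (quadratic overall).

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(a: Vector, b: Vector) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def linear_attention_recurrent(keys: Matrix, values: Matrix, queries: Matrix) -> Matrix:
    """Causal unnormalised attention via a running matrix W (hypothetical name).

    Step t adds the outer product v_t k_t^T to W, then reads out W q_t.
    W has a fixed size d_v x d_k, so each step costs the same regardless
    of how many tokens came before: total cost is linear in sequence length.
    """
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for k, v, q in zip(keys, values, queries):
        for i in range(d_v):
            for j in range(d_k):
                W[i][j] += v[i] * k[j]
        outputs.append([dot(row, q) for row in W])
    return outputs


def linear_attention_quadratic(keys: Matrix, values: Matrix, queries: Matrix) -> Matrix:
    """Same outputs computed pairwise: y_t = sum over s <= t of (k_s . q_t) v_s.

    Every query is scored against every earlier key, so total cost grows
    quadratically with sequence length.
    """
    d_v = len(values[0])
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * d_v
        for s in range(t + 1):
            score = dot(keys[s], q)
            for i in range(d_v):
                y[i] += score * values[s][i]
        outputs.append(y)
    return outputs


# Toy example (hypothetical data): 3 tokens, 2-dimensional keys and values.
keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
queries = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]

print(linear_attention_recurrent(keys, values, queries))  # [[1.0, 2.0], [4.0, 6.0], [8.0, 10.0]]
print(linear_attention_quadratic(keys, values, queries))  # [[1.0, 2.0], [4.0, 6.0], [8.0, 10.0]]
```

Both functions agree because W q_t expands to the same weighted sum of earlier values. Standard softmax attention, as in the 2017 Transformer, normalises the scores over all earlier positions, so it cannot be folded into a fixed-size running matrix in this simple way; that is where its quadratic cost comes from.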

Source

Twitter / X · people.idsia.ch

Key quotes

In March 1991, when compute was millions of times more expensive than today, even before the LSTM, Schmidhuber published a first Transformer variant, which is now called the unnormalised linear Transformer.
ULTRA's computational costs scale linearly in input size, rather than quadratically.
The T in ChatGPT[GPT3] stands for an artificial neural network (NN) called Transformer.

Snippet from the RSS feed:
the 1991 Transformer variant scales linearly; the 2017 Transformer scales quadratically
