Schmidhuber claims 1991 paper introduced first linear-scaling Transformer variant, predating Google's 2017 model
By @SchmidhuberAI
Summary
Jürgen Schmidhuber claims priority for the Transformer architecture, stating that in March 1991 he published a first variant, now called the "unnormalised linear Transformer" (ULTRA), 26 years before Google's 2017 Transformer paper. The key distinction he draws is that ULTRA's computational costs scale linearly with input size, whereas those of Google's 2017 Transformer scale quadratically. He presents this as a correction to the prevailing account of who invented key components of modern large language models such as ChatGPT.
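The linear-versus-quadratic distinction can be illustrated with a small sketch. It assumes the commonly cited formulation of unnormalised linear attention, in which a "fast weight" matrix accumulates outer products of values and keys. The function names and toy data are illustrative only, and the code is not taken from the 1991 paper or the 2017 paper. Without softmax normalisation, the pairwise (quadratic) computation and the running-sum (linear) computation give the same outputs:

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def quadratic_attention(queries, keys, values):
    """Unnormalised causal attention, pairwise: y_t = sum_{j<=t} (q_t . k_j) * v_j."""
    outputs = []
    for t, q in enumerate(queries):
        y = [0.0] * len(values[0])
        for j in range(t + 1):  # revisits every earlier step, so O(T^2) overall
            score = dot(q, keys[j])
            for i, v in enumerate(values[j]):
                y[i] += score * v
        outputs.append(y)
    return outputs

def linear_attention(queries, keys, values):
    """Same outputs from a running matrix W = sum_{j<=t} outer(v_j, k_j)."""
    d_v, d_k = len(values[0]), len(keys[0])
    W = [[0.0] * d_k for _ in range(d_v)]
    outputs = []
    for q, k, v in zip(queries, keys, values):
        # Fixed-size update per step, so total cost is O(T) in sequence length.
        for i in range(d_v):
            for m in range(d_k):
                W[i][m] += v[i] * k[m]
        outputs.append([dot(row, q) for row in W])
    return outputs

# Toy data (hypothetical): three time steps, 2-d keys/queries, 1-d values.
queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
keys = [[1.0, 2.0], [0.0, 1.0], [2.0, 0.0]]
values = [[1.0], [2.0], [3.0]]

print(quadratic_attention(queries, keys, values))  # [[1.0], [4.0], [11.0]]
print(linear_attention(queries, keys, values))     # [[1.0], [4.0], [11.0]]
```

The second function never looks back at earlier steps: each new input costs a fixed amount of work regardless of how long the sequence already is. The softmax used in the 2017 Transformer prevents this rearrangement, which is where the quadratic cost comes from.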
Key quotes
In March 1991, when compute was millions of times more expensive than today, even before the LSTM, Schmidhuber published a first Transformer variant, which is now called the unnormalised linear Transformer.
ULTRA's computational costs scale linearly in input size, rather than quadratically.
The T in ChatGPT[GPT3] stands for an artificial neural network (NN) called Transformer.