Schmidhuber claims 1991 paper introduced first linear-scaling Transformer variant, predating Google's 2017 model

@SchmidhuberAI

10d ago · 26 min read · English · Opinion

technology science deep learning ai history

Summary

Jürgen Schmidhuber claims priority for the Transformer architecture, stating that in March 1991 he published a first variant, now called the "unnormalised linear Transformer" (ULTRA), 26 years before Google's 2017 Transformer paper. The key distinction he draws is that ULTRA's computational cost scales linearly with input size, whereas that of Google's 2017 Transformer scales quadratically. He presents this as a historical correction to the narrative about who invented key components of modern large language models such as ChatGPT.
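To make the linear-versus-quadratic distinction concrete, here is a minimal Python sketch, assuming the commonly described formulation of causal, unnormalised linear attention (dot-product scores with no softmax). It is illustrative only: it is not code from the 1991 paper or from Google's 2017 model, and the function names `attention_quadratic` and `attention_linear` are hypothetical. Summing over all earlier positions for every output takes work proportional to n² for n inputs; keeping a running "fast weight" matrix that is updated with one outer product per step yields the same outputs in work proportional to n.

```python
# Illustrative sketch (hypothetical names, not the 1991 or 2017 code):
# causal unnormalised linear attention computed two equivalent ways.
from typing import List

Vec = List[float]
Mat = List[List[float]]


def dot(a: Vec, b: Vec) -> float:
    return sum(x * y for x, y in zip(a, b))


def attention_quadratic(keys: List[Vec], values: List[Vec], queries: List[Vec]) -> List[Vec]:
    """For each step i, sum over every j <= i: O(n^2) key-query products."""
    outputs: List[Vec] = []
    for i, q in enumerate(queries):
        out = [0.0] * len(values[0])
        for j in range(i + 1):
            score = dot(keys[j], q)  # unnormalised score: no softmax
            for d in range(len(out)):
                out[d] += score * values[j][d]
        outputs.append(out)
    return outputs


def attention_linear(keys: List[Vec], values: List[Vec], queries: List[Vec]) -> List[Vec]:
    """Same outputs via a running fast-weight matrix W = sum_j v_j k_j^T: O(n) steps."""
    d_v, d_k = len(values[0]), len(keys[0])
    W: Mat = [[0.0] * d_k for _ in range(d_v)]
    outputs: List[Vec] = []
    for k, v, q in zip(keys, values, queries):
        # Rank-one (outer-product) update: W += v k^T
        for r in range(d_v):
            for c in range(d_k):
                W[r][c] += v[r] * k[c]
        # Output is the fast-weight matrix applied to the query: y = W q
        outputs.append([dot(row, q) for row in W])
    return outputs


if __name__ == "__main__":
    keys = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    values = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
    queries = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    print(attention_quadratic(keys, values, queries))  # [[1.0, 2.0], [3.0, 4.0], [14.0, 18.0]]
    print(attention_linear(keys, values, queries) == attention_quadratic(keys, values, queries))  # True
```

Because there is no softmax normalisation, each score (k_j · q) factors out of the sum, which is what lets the single accumulated matrix replace the pairwise loop. Standard softmax attention, as used in the 2017 Transformer, does not factor this way. The sketch also omits how keys, values and queries are produced (in practice by learned projections of the input).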

Source

Twitter / X · people.idsia.ch

Key quotes

In March 1991, when compute was millions of times more expensive than today, even before the LSTM, Schmidhuber published a first Transformer variant, which is now called the unnormalised linear Transformer.

ULTRA's computational costs scale linearly in input size, rather than quadratically.

The T in ChatGPT [GPT3] stands for an artificial neural network (NN) called Transformer.
