All Topics
All Topics
Technology
Technology
Design
Design
Programming
Programming
Science
Science
News
News
Gaming
Gaming
Entertainment
Entertainment
Business
Business
Finance
Finance
Sports
Sports
Health
Health
Food
Food
Travel
Travel
Art
Art
Music
Music
Books
Books
Education
Education
Politics
Politics
Personal
Personal
No algorithm. No AI slop. No ads. Just RSS. Pro-human. Indie writers. Real journalism. Open web. Chronological. Hand toasted.

Theoretical Analysis Reveals Why Linear RNNs Are More Parallelizable Than Nonlinear RNNs

By

[Submitted on 4 Mar 2026 (v1), last revised 1 Jun 2026 (this version, v3)]

6d ago· 2 min readenInsight

Summary

This paper establishes a theoretical connection between types of RNNs and standard complexity classes to explain why linear RNNs (LRNNs) are more parallelizable than traditional nonlinear RNNs. The authors show that LRNNs can be viewed as log-depth arithmetic circuits (slight depth overhead vs. transformers' boolean circuits), while nonlinear RNNs can solve L-complete and even P-complete problems, creating a fundamental barrier to parallelization. The theory also identifies fine-grained expressivity differences between LRNN variants (permutation-diagonal vs. diagonal-plus-low-rank) and associates each RNN type with corresponding automata-theoretic models. The work aims to provide a foundation for designing LLM architectures that balance expressivity and parallelism.

Key quotes

· 4 pulled
We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit.
We show that nonlinear RNNs can solve L-complete problems (and even P-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers.
Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are NC1-complete whereas diagonal-plus-low-rank LRNNs are more expressive (PNC1-complete).
Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.
Snippet from the RSS feed
The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs --

You might also wanna read

Study Reveals Convergent Evolution in How Language Models Learn Number Representations

This research paper investigates how different language models (Transformers, Linear RNNs, LSTMs, and classical word embeddings) learn to re

arxiv.org·1mo ago

GPU Programming Project: Implementing Parallelizable RNNs with CUDA

A student's final project for CS179: GPU Programming implementing the paper "Were RNNs All We Needed?" by Feng et al. The project focuses on

dhruvmsheth.github.io·8mo ago

The surprising structural similarities between neural networks and cryptographic ciphers

This article explores the structural and algorithmic similarities between neural network architectures (like recurrent neural networks) and

Reiner’s webpage and articles.·1mo ago

Transformers Can Learn to Predict Permuted Congruential Generator Sequences Through Curriculum Learning and Scaling Laws

This research paper investigates whether Transformer models can learn to predict sequences generated by Permuted Congruential Generators (PC

arXiv.org·1mo ago

Understanding Continuous Batching in Large Language Models: From Attention Mechanisms to Throughput Optimization

This technical blog post explains continuous batching in large language models (LLMs) by starting from first principles of attention mechani

huggingface.co·3mo ago

Research Directions for Overcoming Memory and Interconnect Challenges in Large Language Model Inference Hardware

This article discusses the technical challenges of Large Language Model (LLM) inference, highlighting how the autoregressive Decode phase ma

arxiv.org·4mo ago