Theoretical Analysis Reveals Why Linear RNNs Are More Parallelizable Than Nonlinear RNNs

[Submitted on 4 Mar 2026 (v1), last revised 1 Jun 2026 (this version, v3)]

6d ago· 2 min readenInsight

75/100

Toasty

Bagelometer↗

Lightly toasted, lightly seasoned, mostly correct.

Score75TypeanalysisSentimentneutral

Summary

This paper establishes a theoretical connection between types of RNNs and standard complexity classes to explain why linear RNNs (LRNNs) are more parallelizable than traditional nonlinear RNNs. The authors show that LRNNs can be viewed as log-depth arithmetic circuits (slight depth overhead vs. transformers' boolean circuits), while nonlinear RNNs can solve L-complete and even P-complete problems, creating a fundamental barrier to parallelization. The theory also identifies fine-grained expressivity differences between LRNN variants (permutation-diagonal vs. diagonal-plus-low-rank) and associates each RNN type with corresponding automata-theoretic models. The work aims to provide a foundation for designing LLM architectures that balance expressivity and parallelism.

Key quotes

· 4 pulled

We show that LRNNs can be viewed as log-depth (bounded fan-in) arithmetic circuits, which represents only a slight depth overhead relative to log-depth boolean circuits that transformers admit.

We show that nonlinear RNNs can solve L-complete problems (and even P-complete ones, under polynomial precision), revealing a fundamental barrier to parallelizing them as efficiently as transformers.

Our theory also identifies fine-grained expressivity differences between recent popular LRNN variants: permutation-diagonal LRNNs are NC1-complete whereas diagonal-plus-low-rank LRNNs are more expressive (PNC1-complete).

Together, our results reveal fundamental tradeoffs between nonlinear RNNs and different variants of LRNNs, providing a foundation for designing LLM architectures that achieve an optimal balance between expressivity and parallelism.

Snippet from the RSS feed

The community is increasingly exploring linear RNNs (LRNNs) as language models, motivated by their expressive power and parallelizability. While prior work establishes the expressivity benefits of LRNNs over transformers, it is unclear what makes LRNNs --

You might also wanna read

Study Reveals Convergent Evolution in How Language Models Learn Number Representations

This research paper investigates how different language models (Transformers, Linear RNNs, LSTMs, and classical word embeddings) learn to re

arxiv.org·1mo ago

GPU Programming Project: Implementing Parallelizable RNNs with CUDA

A student's final project for CS179: GPU Programming implementing the paper "Were RNNs All We Needed?" by Feng et al. The project focuses on

dhruvmsheth.github.io·8mo ago

The surprising structural similarities between neural networks and cryptographic ciphers

This article explores the structural and algorithmic similarities between neural network architectures (like recurrent neural networks) and

Reiner’s webpage and articles.·1mo ago

Transformers Can Learn to Predict Permuted Congruential Generator Sequences Through Curriculum Learning and Scaling Laws

This research paper investigates whether Transformer models can learn to predict sequences generated by Permuted Congruential Generators (PC

arXiv.org·1mo ago

Understanding Continuous Batching in Large Language Models: From Attention Mechanisms to Throughput Optimization

This technical blog post explains continuous batching in large language models (LLMs) by starting from first principles of attention mechani

huggingface.co·3mo ago

Research Directions for Overcoming Memory and Interconnect Challenges in Large Language Model Inference Hardware

This article discusses the technical challenges of Large Language Model (LLM) inference, highlighting how the autoregressive Decode phase ma

arxiv.org·4mo ago