
Transformers Can Learn to Predict Permuted Congruential Generator Sequences Through Curriculum Learning and Scaling Laws

[Submitted on 30 Oct 2025 (v1), last revised 16 Feb 2026 (this version, v2)]

Summary

This research paper investigates whether Transformer models can learn to predict sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudorandom number generators (PRNGs) that adds substantial difficulty over linear congruential generators (LCGs). The authors show that Transformers can perform in-context prediction on unseen sequences from diverse PCG variants, including in tasks beyond the reach of published classical attacks. Key findings include: (1) models can reliably predict outputs even when each output is truncated to a single bit; (2) a single model can learn multiple distinct PRNGs jointly; (3) a scaling law holds in which the number of in-context elements needed for near-perfect prediction grows as the square root of the modulus m; (4) in the authors' experiments, learning large moduli (m ≥ 2^20) required curriculum learning, i.e. incorporating training data from smaller moduli; and (5) a novel clustering phenomenon emerges in the embedding layers, where the top principal components group integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.
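
As a worked example of the √m scaling law: the summary gives only the proportionality, not its constant, so only ratios between moduli can be computed. Writing n(m) for the number of in-context elements needed (notation assumed here) and assuming power-of-two moduli m = 2^k, consistent with the 2^20 threshold mentioned above:

```latex
n(m) \propto \sqrt{m}, \qquad m = 2^{k} \;\Longrightarrow\; n\!\left(2^{k}\right) \propto 2^{k/2},
\qquad
\frac{n\!\left(2^{20}\right)}{n\!\left(2^{16}\right)} = \sqrt{\frac{2^{20}}{2^{16}}} = 2^{2} = 4 .
```

Under this law, every two extra bits of modulus roughly double the context a model needs.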

Key quotes

We show that Transformers can nevertheless successfully perform in-context prediction on unseen sequences from diverse PCG variants, in tasks that are beyond published classical attacks.
Surprisingly, we find even when the output is truncated to a single bit, it can be reliably predicted by the model.
We demonstrate a scaling law with modulus m: the number of in-context sequence elements required for near-perfect prediction grows as √m.
For larger moduli, optimization enters extended stagnation phases; in our experiments, learning moduli m ≥ 2^20 requires incorporating training data from smaller moduli, demonstrating a critical necessity for curriculum learning.
We analyze embedding layers and uncover a novel clustering phenomenon: the top principal components spontaneously group the integer inputs into bitwise rotationally-invariant clusters, revealing how representations can transfer from smaller to larger moduli.
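
The summary does not define "bitwise rotationally-invariant" precisely. One plausible reading, assumed here, is that integers whose k-bit binary representations are cyclic rotations of one another (for m = 2^k) fall into the same cluster. The sketch below only enumerates those rotation classes; it does not reproduce the paper's principal-component analysis of embedding layers, and the helper names `rotl`, `rotation_class` and `group_by_rotation` are hypothetical.

```python
from collections import defaultdict


def rotl(x: int, r: int, k: int) -> int:
    """Rotate the k-bit integer x left by r positions."""
    r %= k
    mask = (1 << k) - 1
    return ((x << r) | (x >> (k - r))) & mask


def rotation_class(x: int, k: int) -> int:
    """Canonical label: the smallest value among all k bit-rotations of x."""
    return min(rotl(x, r, k) for r in range(k))


def group_by_rotation(k: int) -> dict[int, list[int]]:
    """Partition 0 .. 2^k - 1 into classes that are closed under bit rotation."""
    groups: dict[int, list[int]] = defaultdict(list)
    for x in range(1 << k):
        groups[rotation_class(x, k)].append(x)
    return dict(groups)


if __name__ == "__main__":
    # Print each class for k = 4 as 4-bit strings.
    for label, members in group_by_rotation(4).items():
        print(f"{label:04b}: {[f'{m:04b}' for m in members]}")
```

For k = 4 this yields six classes, for example {0001, 0010, 0100, 1000}. Because rotation never changes the number of set bits, every member of a class has the same popcount.
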
Snippet from the RSS feed
We study the ability of Transformer models to learn sequences generated by Permuted Congruential Generators (PCGs), a widely used family of pseudo-random number generators (PRNGs). PCGs introduce substantial additional difficulty over linear congruential …
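
For readers unfamiliar with PCGs, here is a minimal sketch of the widely published 32-bit variant (PCG32, "XSH-RR 64/32"), translated to Python from the reference C implementation. It shows the two parts that distinguish a PCG from a plain LCG: a linear state update modulo 2^64, followed by a permutation of the output (an xorshift, then a rotation whose amount depends on the state). This is an assumption about which variant to illustrate; the paper evaluates several PCG variants at various moduli m, which this sketch does not reproduce. The `next_bit` method, which keeps only the top output bit, is a hypothetical illustration of single-bit truncation; which bit the paper keeps is not stated here.

```python
MASK32 = (1 << 32) - 1
MASK64 = (1 << 64) - 1
PCG_MULT = 6364136223846793005  # standard 64-bit PCG/LCG multiplier


class PCG32:
    """Sketch of PCG32 (XSH-RR 64/32), following the reference C code."""

    def __init__(self, initstate: int, initseq: int) -> None:
        self.state = 0
        # The increment must be odd; initseq selects the output stream.
        self.inc = ((initseq << 1) | 1) & MASK64
        self.next_u32()
        self.state = (self.state + initstate) & MASK64
        self.next_u32()

    def next_u32(self) -> int:
        old = self.state
        # Linear part: an ordinary LCG step, state <- a * state + c (mod 2^64).
        self.state = (old * PCG_MULT + self.inc) & MASK64
        # Permutation part (XSH): xorshift high bits down, keep 32 bits.
        xorshifted = (((old >> 18) ^ old) >> 27) & MASK32
        # Permutation part (RR): rotate right by the top 5 bits of the old state.
        rot = old >> 59
        return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32

    def next_bit(self) -> int:
        # Hypothetical single-bit truncation: keep only the most significant bit.
        return self.next_u32() >> 31


if __name__ == "__main__":
    rng = PCG32(42, 54)
    print([hex(rng.next_u32()) for _ in range(3)])
    print([rng.next_bit() for _ in range(16)])
```

Outputting the raw LCG state would expose the linear recurrence directly; the state-dependent rotation is what hides it, which is why predicting PCG output is harder than predicting LCG output.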