







The article presents δ-mem, a lightweight memory mechanism for large language models that augments frozen full-attention backbones with a compact online state of associative memory. It compresses past information into a fixed-size state matrix updated by delta-rule learning, gene
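To make the idea of a fixed-size state matrix updated by delta-rule learning concrete, here is a minimal sketch of the classic delta rule for an associative memory. It is an illustration under assumptions, not δ-mem's actual update: the names `read` and `delta_update`, the step size `beta`, and the plain-list matrix layout are all hypothetical.

```python
def read(S, k):
    """Retrieve the value associated with key k: the matrix-vector product S @ k."""
    return [sum(s_ij * k_j for s_ij, k_j in zip(row, k)) for row in S]


def delta_update(S, k, v, beta=1.0):
    """One delta-rule step: S <- S + beta * (v - S k) k^T.

    S has a fixed shape (len(v) x len(k)) no matter how many pairs are written.
    beta is an assumed step size, not a parameter taken from the article.
    """
    pred = read(S, k)                                   # what the memory currently returns for k
    err = [v_i - p_i for v_i, p_i in zip(v, pred)]      # prediction error
    return [
        [s_ij + beta * e_i * k_j for s_ij, k_j in zip(row, k)]
        for row, e_i in zip(S, err)
    ]


# Usage: write a pair, then overwrite it under the same key.
S0 = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]                # 2-dim values, 3-dim keys
key = [1.0, 0.0, 0.0]                                   # unit-norm key
S1 = delta_update(S0, key, [2.0, -1.0])
S2 = delta_update(S1, key, [5.0, 0.5])                  # corrects the old association
```

With a unit-norm key and `beta=1.0`, a single step makes the memory return the new value for that key, and writes under an orthogonal key leave it untouched. The state never grows with the length of the history, which is the property the summary describes.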
This research paper investigates whether Transformer models can learn to predict sequences generated by Permuted Congruential Generators (PCGs), a family of pseudorandom number generators more complex than linear congruential generators (LCGs). The authors demonstrate that Transf
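For readers unfamiliar with the difference between the two generator families, the sketch below pairs a plain LCG state update with an XSH-RR output permutation (xorshift, then a state-dependent rotation) in the style of the widely distributed 64-bit-state, 32-bit-output PCG32 variant. This is an assumption for illustration only: the paper may study other moduli, bit widths, or permutations, and the helper names `lcg_step`, `xsh_rr`, and `generate` are hypothetical.

```python
MASK64 = (1 << 64) - 1
MASK32 = (1 << 32) - 1
MULT = 6364136223846793005  # multiplier used by the common PCG32 reference code


def lcg_step(state, inc):
    """Plain LCG update modulo 2**64."""
    return (state * MULT + inc) & MASK64


def xsh_rr(state):
    """XSH-RR permutation: xorshift the high bits, then rotate right by the top 5 bits."""
    xorshifted = (((state >> 18) ^ state) >> 27) & MASK32
    rot = state >> 59                                   # rotation amount in 0..31
    return ((xorshifted >> rot) | (xorshifted << ((-rot) & 31))) & MASK32


def generate(seed, inc, n):
    """Return (raw LCG states, permuted 32-bit outputs) for n steps."""
    inc |= 1                                            # increment must be odd
    state = seed & MASK64
    raw, out = [], []
    for _ in range(n):
        raw.append(state)
        out.append(xsh_rr(state))
        state = lcg_step(state, inc)
    return raw, out


raw_states, outputs = generate(seed=42, inc=54, n=8)
low_bits = [s & 1 for s in raw_states]                  # lowest bit of the raw LCG state
```

The lowest bit of a power-of-two-modulus LCG with odd multiplier and odd increment simply alternates, which is the kind of regularity that makes raw LCG output easy to predict. The permutation step discards the low bits and scrambles the rest with a data-dependent rotation, so the emitted sequence no longer exposes that pattern directly.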






