
Using Curriculum Learning and PufferLib to Train Superhuman AI Agents for 2048 and Tetris

By a1k0n

5mo ago · 5 min read

Summary

The article describes using PufferLib, a reinforcement learning framework, to train gaming agents that achieve superhuman performance in 2048 and Tetris. The author details how curriculum learning (gradually increasing difficulty) and Pareto sweeps (systematic hyperparameter optimization) enabled a 15MB policy to beat massive search-based solutions in 2048 after just 75 minutes of training. The article also discusses discovering that bugs in Tetris can become features for AI agents, and emphasizes the importance of speed, iteration, and systematic experimentation in reinforcement learning research.
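As a rough illustration of the "gradually increasing difficulty" idea, here is a minimal curriculum scheduler sketch. It is not PufferLib's API and not the author's actual setup: the `CurriculumScheduler` class, its `window` and `promote_at` parameters, and the rule of promoting on recent success rate are all assumptions made for this example.

```python
from collections import deque


class CurriculumScheduler:
    """Hypothetical helper: raise difficulty once recent episodes mostly succeed."""

    def __init__(self, max_level, window=100, promote_at=0.8):
        self.level = 0                       # current difficulty, starts at the easiest
        self.max_level = max_level           # hardest difficulty available
        self.promote_at = promote_at         # success rate needed to move up
        self.results = deque(maxlen=window)  # rolling record of recent outcomes

    def record(self, success):
        """Log one episode outcome and return the (possibly updated) level."""
        self.results.append(bool(success))
        window_full = len(self.results) == self.results.maxlen
        if window_full and self.level < self.max_level:
            rate = sum(self.results) / len(self.results)
            if rate >= self.promote_at:
                self.level += 1
                self.results.clear()         # re-measure from scratch at the new level
        return self.level
```

In a training loop, the environment would be reset at `scheduler.level` each episode, and `scheduler.record(...)` called with whatever success criterion fits the game (for example, reaching a target tile in 2048).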

Key quotes

PufferLib allows anyone with a gaming computer to play the RL game, but getting from 'pretty good' to 'superhuman' requires tweaking every lever, repeatedly.
This is the story of how I trained agents that beat massive (few-TB) search-based solutions on 2048 using a 15MB policy trained for 75 minutes and discovered that bugs can be features in Tetris.
Training gaming agents is an addictive game. A game of sleepless nights, grinds, explorations, sweeps, and prayers.
PufferLib's C-based environments run at 1M+ st
TLDR? PufferLib, Pareto sweeps, and curriculum learning.
