A 13-year-old's $500 journey training a custom language model on a Mamba-2 backbone

Faris Allafi · July 2026 · Model: hr-diffuse-1-nano on Hugging Face · Discussion on Hacker News

technology machine learning programming ai research

Summary

A 13-year-old developer documents training a custom language model, DIMBA II, with a masked diffusion objective on a bidirectional Mamba-2 backbone, spending $500 of his own money. The model gets basic facts wrong (for example, it answers that the capital of Japan is Paris), and the author reports this openly as part of the results. The article covers six self-correction methods that failed, the honest results, and background on the transformer architecture, diffusion models, and Mamba. It is part technical report and part personal narrative about the difficulty of training an LLM from scratch.
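For readers unfamiliar with masked diffusion, the general idea can be sketched in a few lines of Python. This is a toy illustration, not the author's code: the names `mask_tokens`, `unmask_step`, `generate`, and the `predict` callback are hypothetical. In a real model the predictor would be a neural network (here, a bidirectional Mamba-2 backbone) that sees the whole partially masked sequence.

```python
import random

MASK = "[MASK]"  # placeholder symbol for a hidden token (hypothetical choice)


def mask_tokens(tokens, mask_ratio, rng):
    """Forward (corruption) step: hide each token independently with
    probability mask_ratio. Training asks the model to recover them."""
    return [MASK if rng.random() < mask_ratio else tok for tok in tokens]


def unmask_step(tokens, predict, k):
    """Reverse step: ask the predictor for a (token, confidence) guess at
    every masked position, then commit only the k most confident guesses."""
    masked = [i for i, tok in enumerate(tokens) if tok == MASK]
    guesses = {i: predict(tokens, i) for i in masked}
    chosen = sorted(masked, key=lambda i: guesses[i][1], reverse=True)[:k]
    out = list(tokens)
    for i in chosen:
        out[i] = guesses[i][0]
    return out


def generate(length, predict, steps):
    """Start from an all-masked sequence and unmask it over several steps."""
    tokens = [MASK] * length
    per_step = -(-length // steps)  # ceiling division
    for _ in range(steps):
        tokens = unmask_step(tokens, predict, per_step)
    return tokens
```

Unlike left-to-right generation, every position can be filled in any order, which is why a backbone that reads the sequence in both directions fits this setup.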

Source

Hacker News · A 13-year-old's $500 journey training a custom language model on a Mamba-2 backbone · hamiltonianresearch.xyz

Key quotes

I am 13, and I spent hours of my time, and my own money, to train a language model that thinks the capital of Japan is Paris.

First thing you should know: contrary to common belief, the capital of Japan is in fact Tokyo.

You might think I am just building another ChatGPT wrapper, and that could not be farther from the truth.

The transformer architecture, popularized by the paper Attention Is All You Need (Vaswani et al., 2017), is the current SOTA architecture in LLMs.

Snippet from the RSS feed

DIMBA II: masked diffusion on a bidirectional Mamba-2 backbone, trained for $500. Honest results, six failed self-correction methods, and one dial.
