How replacing a 3 GB SQLite database with a 10 MB FST binary achieved a 300x memory reduction for a Finnish-English dictionary
By hiAndrewQuinn
Summary
A developer describes replacing a 3 GB SQLite database for a Finnish-English dictionary with a 10 MB finite state transducer (FST) binary, achieving a ~300x memory reduction. The article details the technical journey of moving from a hacked-together database solution to a specialized, static data structure optimized for incremental search. Key insights include using FSTs for efficient string lookup, the trade-offs between dynamic databases and static specialized structures, and the importance of choosing the right tool for specific use cases rather than defaulting to familiar solutions.
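To make the idea of a small, static structure built for incremental search concrete, here is a minimal sketch. It is not an FST and not the author's implementation: it swaps in a plain sorted list with binary search, which gives the same kind of prefix lookup without the compression an FST provides. The sample entries and the names `ENTRIES`, `KEYS`, and `prefix_search` are hypothetical and chosen only for illustration.

```python
from bisect import bisect_left

# Hypothetical sample entries; the real dictionary data is not shown in the source.
# Sorting once up front is what makes the structure "static": it is built ahead
# of time and only ever read afterwards.
ENTRIES = sorted([
    ("kissa", "cat"),
    ("kirja", "book"),
    ("kirjasto", "library"),
    ("koira", "dog"),
    ("talo", "house"),
])
KEYS = [word for word, _ in ENTRIES]


def prefix_search(prefix, limit=10):
    """Return up to `limit` (word, gloss) pairs whose word starts with `prefix`."""
    # Binary search for the first key that is >= prefix; every match, if any,
    # sits in one contiguous run starting at that position.
    start = bisect_left(KEYS, prefix)
    results = []
    for word, gloss in ENTRIES[start:start + limit]:
        if not word.startswith(prefix):
            break  # past the end of the matching run
        results.append((word, gloss))
    return results


if __name__ == "__main__":
    # Simulate incremental search: re-query as each character is typed.
    for typed in ("k", "ki", "kir"):
        print(typed, prefix_search(typed))
```

A sorted list keeps every key in full, so it does not shrink the data the way an FST does by sharing prefixes and suffixes between keys. What it does show is the trade-off the summary describes: giving up inserts and updates in exchange for a read-only structure that does one job.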
Key quotes
It's much more valuable to walk away with the heuristic 'some dude got a 300x memory reduction by swapping out a database he hacked together for a tiny, static, specialized data structure that does exactly what he needs it to and no more.'
I found myself with an increasingly rare opportunity to work this weekend on Taskusanakirja, also often called tsk, a Finnish-English dictionary with incremental search
All numbers have been rounded to their first significant digit, because I'm a fan of Rob Eastaway's 'zequals' method of getting to the point when it comes to estimation.