PostgreSQL work_mem Configuration Pitfalls: How Low Memory Settings Can Cause Catastrophic Outages
By
enz
If you only eat one bagel today, this is the bagel.
Summary
The article discusses a critical PostgreSQL database performance issue where a production cluster with 2 TB of RAM was killed by the OOM (Out of Memory) killer despite having work_mem set to only 2 MB. The author investigates this counterintuitive scenario where a seemingly low memory setting caused catastrophic failure, exploring how PostgreSQL's work_mem parameter interacts with query execution, sorting operations, and memory allocation. The piece serves as a technical analysis of database memory management pitfalls and provides insights for database administrators on properly configuring PostgreSQL memory settings to avoid production outages.
Key quotes
· 5 pulledHer production cluster had just been killed by the OOM killer after eating 2 TB of RAM. work_mem was set to 2 MB.
Something didn't add up. Hetty, like me, likes playing with monster hardware. 2 TB of RAM is not unusual in her world.
But losing the whole cluster to a single query during peak operations is a very different kind of problem from a 3am outage.
The work_mem setting is a trap because it's not what you think it is. It's not a limit on how much memory a query can use.
When PostgreSQL needs to sort data, it can use work_mem for each sort operation, and if you have multiple sorts in a query, each can allocate work_mem separately.
You might also wanna read
PostgreSQL VACUUM Limitations: Why Indexes Become Bloated and How to Fix Them
This article debunks common misconceptions about PostgreSQL's VACUUM command, explaining that while VACUUM cleans dead tuples from tables, i
PostgreSQL Locks: Deadlocks, Object-Level Locks, and Predicate Locks
This technical article is part of a series on PostgreSQL locks, focusing on three main topics: deadlocks, remaining object-level locks, and
PostgreSQL Double Buffering: Why OS RAM and Buffer Cache Compete and How the 25% Rule Helps
This article explains the problem of double buffering in PostgreSQL, where the OS filesystem cache and PostgreSQL's shared_buffers both cach
PostgreSQL Transaction ID Wraparound Incident: A Production Database Outage Case Study
This article details a real-world PostgreSQL production incident caused by transaction ID wraparound, a critical database failure mode. The
Performance Implications of UUID Version 4 as Primary Keys in PostgreSQL
The article discusses performance issues with using UUID Version 4 as primary keys in PostgreSQL databases, explaining that random UUIDs cau

PostgreSQL Data Modeling: Comparing Lookup Tables vs Enum Types for Limited Value Columns
This technical article compares three approaches for implementing string columns with limited distinct values in PostgreSQL: simple strings
