PostgreSQL VACUUM Limitations: Why Indexes Become Bloated and How to Fix Them
By
birdculture
Hand-rolled, kettle-boiled, baked to perfection. Worth every minute at the bakery.
Summary
This article debunks common misconceptions about PostgreSQL's VACUUM command, explaining that while VACUUM cleans dead tuples from tables, it doesn't properly clean up index bloat. The article details how B-tree indexes become fragmented over time as rows are deleted and updated, leaving behind 'dead' index entries that waste space and degrade performance. It covers the storage anatomy of PostgreSQL, the difference between table and index cleanup, and provides practical solutions for properly maintaining indexes through techniques like REINDEX, CREATE INDEX CONCURRENTLY, and pg_repack.
Key quotes
· 5 pulledVACUUM is lying to you about your indexes
When you delete a row in PostgreSQL, it is just marked as a 'dead tuple'
B-trees fragment and how to fix them properly
Only when all transactions referencing the dead tuple are gone, VACUUM can finally remove it
The dirty secret: VACUUM doesn't clean up indexes the same way it cleans tables
You might also wanna read
PostgreSQL work_mem Configuration Pitfalls: How Low Memory Settings Can Cause Catastrophic Outages
The article discusses a critical PostgreSQL database performance issue where a production cluster with 2 TB of RAM was killed by the OOM (Ou
PostgreSQL Locks: Deadlocks, Object-Level Locks, and Predicate Locks
This technical article is part of a series on PostgreSQL locks, focusing on three main topics: deadlocks, remaining object-level locks, and
PostgreSQL Double Buffering: Why OS RAM and Buffer Cache Compete and How the 25% Rule Helps
This article explains the problem of double buffering in PostgreSQL, where the OS filesystem cache and PostgreSQL's shared_buffers both cach
PostgreSQL Transaction ID Wraparound Incident: A Production Database Outage Case Study
This article details a real-world PostgreSQL production incident caused by transaction ID wraparound, a critical database failure mode. The
Performance Implications of UUID Version 4 as Primary Keys in PostgreSQL
The article discusses performance issues with using UUID Version 4 as primary keys in PostgreSQL databases, explaining that random UUIDs cau

PostgreSQL Data Modeling: Comparing Lookup Tables vs Enum Types for Limited Value Columns
This technical article compares three approaches for implementing string columns with limited distinct values in PostgreSQL: simple strings
