Quick take
Stop copying postgresql.conf snippets from blog posts. Measure your actual workload, change one thing, measure again. Most Postgres performance problems are bad queries, not bad config.
I’ve been running PostgreSQL in production since my fintech startup days, where our workload was 90% reads with occasional write spikes when market data flooded in. Then at Dropbyke in Seoul we had the opposite problem – constant GPS pings from bikes generating a firehose of writes. Same database engine, completely different tuning.
That experience taught me something that most “Postgres tuning guides” skip: there’s no universal config. The defaults ship conservative on purpose. Your job is to understand your workload and nudge the knobs that matter for your traffic pattern.
Here is the playbook I actually follow.
Step Zero: Know Where Time Goes
You can’t tune what you can’t see. Before touching any config, enable pg_stat_statements. This single extension has saved me more hours than any other Postgres tool.
# postgresql.conf -- changing shared_preload_libraries requires a restart
shared_preload_libraries = 'pg_stat_statements'

-- then, after the restart, in psql:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;
Once it’s running, this query shows you exactly where your database spends its time:
SELECT
    substring(query, 1, 80) AS short_query,
    calls,
    round(total_time::numeric, 2) AS total_ms,  -- total_exec_time on PG 13+
    round(mean_time::numeric, 2) AS mean_ms,    -- mean_exec_time on PG 13+
    rows
FROM pg_stat_statements
ORDER BY total_time DESC
LIMIT 20;
The top 5 results from this query drive 80% of your tuning decisions. Not config file changes. Not hardware upgrades. Query fixes and index work.
At the fintech startup, this query revealed that a single unindexed join in our news aggregation pipeline was responsible for 40% of total database time. We added one composite index and response times dropped by half. No config changes needed.
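Part of "measure again" is starting from a clean slate. The extension ships a reset function, so you can zero the counters right before a deploy or config change and compare fresh numbers afterward:

```sql
-- Clear accumulated statistics before a new measurement window
SELECT pg_stat_statements_reset();
```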
Connection Management: The Silent Killer
PostgreSQL forks a process per connection. That’s the model. It’s reliable but it doesn’t scale to hundreds of idle connections sitting around doing nothing.
I’ve seen this pattern at every startup I’ve worked at: the app pool defaults to 10 connections per instance, you scale to 50 instances, and suddenly Postgres is managing 500 processes. Most of them idle. All of them eating memory.
Put PgBouncer in Front
[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 20
Transaction mode is what you want for web workloads. The connection returns to the pool after each transaction, not after the client disconnects. This means 1000 application connections share 20 actual Postgres connections.
The catch: transaction mode breaks session-level features like prepared statements and temp tables. If you need those, use session mode for that specific pool and transaction mode for everything else.
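PgBouncer lets you set the mode per database entry, so mixing is straightforward. A sketch of that layout – the `app` and `reports` entry names are illustrative:

```ini
[databases]
; bulk of web traffic: transaction pooling
app = host=127.0.0.1 dbname=app pool_mode=transaction

; jobs that need temp tables or session-level prepared statements
reports = host=127.0.0.1 dbname=app pool_mode=session pool_size=5
```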
Set max_connections Deliberately
With PgBouncer handling the fan-in, your Postgres max_connections can be surprisingly low. I typically run 50-100 even on busy systems.
-- Check how many connections you actually use
SELECT count(*) FROM pg_stat_activity;
SELECT count(*) FROM pg_stat_activity WHERE state = 'idle';
If your idle count is more than half your total, you have a pooling problem. Fix that before tuning anything else.
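To find out which service is hoarding the idle connections, group by `application_name` (set it in each client's connection string if it comes back empty):

```sql
SELECT application_name, state, count(*)
FROM pg_stat_activity
GROUP BY application_name, state
ORDER BY count(*) DESC;
```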
Memory: Four Settings That Matter
Memory configuration in Postgres is a balancing act between the database buffer cache and the OS page cache. Get it wrong and they fight each other.
shared_buffers
Start at 25% of total RAM. This is Postgres’s own cache.
shared_buffers = 4GB # on a 16GB server
Going higher than 25-30% rarely helps and can hurt because you’re stealing memory from the OS cache, which Postgres also relies on for reads.
work_mem
This one is tricky. It’s allocated per sort or hash operation, not per connection. A single complex query with multiple joins can allocate work_mem several times over.
work_mem = 64MB
Before you bump this, check if queries are actually spilling to disk:
EXPLAIN (ANALYZE, BUFFERS)
SELECT o.*, u.name
FROM orders o
JOIN users u ON u.id = o.user_id
WHERE o.created_at > '2019-01-01'
ORDER BY o.created_at DESC;
Look for `Sort Method: external merge Disk: ...kB` in the output. That means the sort exceeded work_mem and spilled to disk. But the fix is usually a better query or a better index, not a bigger work_mem. Only raise it when you have confirmed the query is already optimal.
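When a raise really is warranted, it doesn't have to be global. You can scope the increase to one session, or to a role that runs heavy analytics – the `reporting` role here is hypothetical:

```sql
-- For the current session only
SET work_mem = '256MB';

-- Or pin it to a dedicated analytics role (hypothetical name)
ALTER ROLE reporting SET work_mem = '256MB';
```

Everything else on the server keeps the conservative default, so one reporting query can't multiply a big allocation across every connection.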
effective_cache_size
A planner hint, not an allocation. Tell the planner how much total cache is available (Postgres buffers + OS cache).
effective_cache_size = 12GB # on a 16GB server
maintenance_work_mem
Used by VACUUM and CREATE INDEX. These operations run infrequently so you can be generous here without affecting query concurrency.
maintenance_work_mem = 1GB
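Because it's a session-level setting, you can also raise it just for a one-off index build without touching the config file. The index name below is illustrative:

```sql
SET maintenance_work_mem = '2GB';
CREATE INDEX CONCURRENTLY idx_orders_created_at ON orders(created_at);
```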
Index Discipline: Where the Real Wins Are
I’ll say this plainly: indexing strategy matters more than every postgresql.conf setting combined. A missing index on a hot query path will dominate your performance profile no matter how perfectly tuned your memory settings are.
Find Tables That Need Indexes
SELECT
    schemaname,
    relname AS table_name,
    seq_scan,
    idx_scan,
    n_live_tup AS row_count
FROM pg_stat_user_tables
WHERE seq_scan > idx_scan
  AND n_live_tup > 10000
ORDER BY seq_scan - idx_scan DESC;
Tables with high sequential scans and low index scans on more than 10k rows are your targets.
Index Patterns I Use Constantly
-- Simple lookup (most common)
CREATE INDEX idx_users_email ON users(email);
-- Partial index: only index what you query
CREATE INDEX idx_orders_pending
ON orders(created_at)
WHERE status = 'pending';
-- Covering index: avoid hitting the heap
CREATE INDEX idx_orders_user_covering
ON orders(user_id)
INCLUDE (status, total);
-- Expression index: for case-insensitive lookups
CREATE INDEX idx_users_email_lower
ON users(lower(email));
Partial indexes are underused. If you have a million-row orders table but you only ever query pending orders, indexing only the pending rows saves space and speeds up both writes and reads.
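The flip side of missing indexes is dead weight: every index slows writes and takes disk. It's worth periodically checking for indexes that never get scanned (keep in mind these stats are per-server, so check replicas too before dropping anything):

```sql
SELECT schemaname, relname, indexrelname, idx_scan,
       pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE idx_scan = 0
ORDER BY pg_relation_size(indexrelid) DESC;
```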
Read the Execution Plan
Don’t guess. Run EXPLAIN (ANALYZE, BUFFERS) on every query you care about.
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE user_id = 123
AND created_at > '2019-01-01';
Red flags to look for:
- `Seq Scan` on tables with more than a few thousand rows
- Large gaps between `estimated rows` and `actual rows` (means stale statistics)
- `Buffers: read` numbers that are disproportionately high (means cache misses)
Query Habits That Scale
- Keyset pagination over OFFSET. `WHERE id > last_seen_id ORDER BY id LIMIT 20` stays fast at any depth. `OFFSET 10000` doesn't.
- Avoid `SELECT *` on wide tables. If a table has 40 columns and you need 3, select 3. The planner can use covering indexes and you reduce I/O.
- Keep transactions short. Long transactions hold locks and prevent vacuum from cleaning up dead rows. This is the number one cause of table bloat I see in practice.
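The keyset pattern in full, sketched against the orders table from earlier (column names assumed):

```sql
-- First page
SELECT id, user_id, total
FROM orders
ORDER BY id
LIMIT 20;

-- Next page: pass the last id you saw instead of an OFFSET
SELECT id, user_id, total
FROM orders
WHERE id > 1040   -- last id from the previous page
ORDER BY id
LIMIT 20;
```

The `WHERE id > ...` clause lets Postgres jump straight to the right spot in the primary-key index, so page 500 costs the same as page 1.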
Vacuum: The Maintenance You Can’t Skip
MVCC means every UPDATE creates a new row version and leaves the old one behind. VACUUM reclaims that space. If autovacuum falls behind your write rate, tables bloat, indexes bloat, and everything slows down.
Check Bloat Right Now
SELECT
    schemaname,
    relname,
    n_dead_tup,
    n_live_tup,
    round(100.0 * n_dead_tup / nullif(n_live_tup, 0), 2) AS dead_pct,
    last_autovacuum
FROM pg_stat_user_tables
WHERE n_dead_tup > 1000
ORDER BY n_dead_tup DESC;
A dead_pct above 10-20% means autovacuum isn’t keeping up.
Tune Autovacuum for Your Write Load
The defaults assume modest write traffic. For busy tables, lower the scale factor and raise the cost limit so vacuum runs more frequently and more aggressively.
# Global settings
autovacuum_vacuum_scale_factor = 0.02
autovacuum_analyze_scale_factor = 0.01
autovacuum_vacuum_cost_limit = 1000
For specific hot tables, set per-table overrides:
ALTER TABLE orders SET (
autovacuum_vacuum_scale_factor = 0.01,
autovacuum_vacuum_cost_limit = 2000
);
At Dropbyke, our GPS tracking table was accumulating dead tuples faster than default autovacuum could clean them. We dropped the scale factor to 0.005 for that one table and the problem disappeared.
WAL and Checkpoints
Write-heavy systems often hit checkpoint spikes – moments where the system pauses to flush dirty pages to disk. Smoothing checkpoints reduces latency jitter.
checkpoint_completion_target = 0.9
wal_buffers = 16MB  # values beyond the 16MB WAL segment size rarely help
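To tell whether checkpoints are firing on their timer or being forced early by WAL volume, compare the two counters Postgres keeps (they live in `pg_stat_bgwriter` up through PG 16; PG 17 moves them to `pg_stat_checkpointer`):

```sql
-- On PG 16 and earlier; use pg_stat_checkpointer on PG 17+
SELECT checkpoints_timed, checkpoints_req
FROM pg_stat_bgwriter;
```

If `checkpoints_req` dominates, your WAL is filling faster than the schedule allows and raising max_wal_size will space the checkpoints out.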
If you’re tempted to set synchronous_commit = off for speed, understand what you’re trading. You can lose the last few hundred milliseconds of committed transactions after a crash. For some workloads that’s acceptable. For fintech, it isn’t. Know your domain.
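The setting also doesn't have to be all-or-nothing: Postgres lets you relax durability per transaction, so only the writes you can afford to lose opt out. The `gps_pings` table here is illustrative, in the spirit of the bike-tracking workload above:

```sql
BEGIN;
SET LOCAL synchronous_commit = off;  -- applies to this transaction only
INSERT INTO gps_pings (bike_id, lat, lng)
VALUES (42, 37.5665, 126.9780);
COMMIT;
```

Payment writes keep full durability; telemetry gets the fast path.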
The Monitoring Minimum
You don’t need a fancy dashboard to start. These five checks, run daily or hooked into alerts, catch most problems early:
- Connection count and idle sessions – `pg_stat_activity`
- Cache hit ratio – should be above 99% for OLTP workloads (check `pg_statio_user_tables`)
- Top queries by total time – `pg_stat_statements`
- Dead tuple ratio on hot tables – `pg_stat_user_tables`
- Checkpoint frequency and duration – `pg_stat_bgwriter`
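The cache hit ratio check, for instance, is a single query; anything sustained below ~99% on an OLTP box deserves a look:

```sql
SELECT round(
    100.0 * sum(heap_blks_hit)
    / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0), 2
) AS cache_hit_pct
FROM pg_statio_user_tables;
```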
If your cache hit ratio drops below 99%, you either need more RAM or your working set has outgrown your instance. If checkpoint writes spike, your write load has changed and you need to revisit WAL settings.
It never ends
Postgres tuning isn’t a one-time task. Workloads change. Traffic grows. New features add new query patterns. The cycle is always the same: measure, identify the bottleneck, fix the bottleneck, measure again.
From what I’ve seen, the fix is a better query or a missing index about 80% of the time. Config tuning accounts for maybe 15%. Hardware for the remaining 5%. Spend your time accordingly.