Database Replication Patterns That Actually Matter


A practical breakdown of replication modes, topologies, and the tradeoffs between consistency, availability, and not losing your users' data at 3am.

Quick take

Every replication pattern is a bet on which failure you can tolerate. Pick wrong and you either lose data or lose availability. There’s no option C.


At a fintech startup I worked at, we ingested financial news and market data from dozens of sources into PostgreSQL. Millions of rows per day. The kind of dataset where “eventually consistent” means a trader sees yesterday’s price and you get a very angry phone call.

That experience shaped how I think about replication. Not as a feature you toggle on, but as an architectural decision that determines how your system fails. Because it will fail.

Why you replicate

The reasons are fewer than people think:

  • Availability. Your primary dies, a replica takes over, users keep working.
  • Read scaling. Heavy reporting queries stop competing with writes.
  • Latency. Put a copy closer to users so reads don’t cross the Atlantic.
  • Isolation. Backups, analytics, migrations – all without touching production writes.

That’s basically it. If your reason isn’t one of these, you probably don’t need replication yet. You need a better backup strategy.

The three modes

Asynchronous

The primary commits, returns success to the client, then ships WAL records to replicas whenever it gets around to it.

# postgresql.conf on the primary
wal_level = replica
max_wal_senders = 5
# No synchronous_standby_names. That's the point.

The replica connects with a primary_conninfo and streams changes:

# postgresql.conf on the replica in PG12+, alongside an empty
# standby.signal file (pre-12, this lived in recovery.conf)
primary_conninfo = 'host=primary-db port=5432 user=replicator'

This is the fastest mode for writes. The primary doesn’t wait for anyone. But there’s a window – could be milliseconds, could be seconds under load – where a replica is behind. If the primary dies in that window, those transactions are gone.
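You can see how wide that window currently is on a given replica. One common approximation (note it only means something while writes are flowing; on an idle primary the value just grows):

```sql
-- On the replica: time since the last replayed commit
SELECT now() - pg_last_xact_replay_timestamp() AS replication_lag;
```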

At the fintech startup, async replication was fine for our read replicas serving the news feed. A 200ms lag on a news article? Nobody notices. A 200ms lag on a stock price used for trading signals? Different conversation entirely.

Synchronous

The primary waits for at least one replica to confirm it wrote the WAL to disk before telling the client “committed.”

# postgresql.conf
synchronous_standby_names = 'replica1'
synchronous_commit = on

Zero data loss on failover. Sounds great. The cost: every single write now includes a network round trip to the replica. In the same datacenter, maybe 1-2ms added latency. Across regions? 50-150ms. Per write.

Worse: if that replica goes down, your primary blocks all writes until the replica comes back or you reconfigure. I’ve seen this take down production systems that were “highly available.” The replication designed for availability became the single point of failure.

Quorum (semi-synchronous)

The pragmatic middle ground. Wait for any N out of M replicas to acknowledge, not all of them.

# postgresql.conf: wait for any 1 of these 3 replicas
synchronous_standby_names = 'ANY 1 (replica1, replica2, replica3)'

One replica can die and writes keep flowing. You still get the durability guarantee because at least one standby has the data. This is what I’d recommend for most production PostgreSQL setups that need strong durability.

The math is simple: if you have 3 standbys and require ANY 1, you can lose 2 replicas before writes stall. With ANY 2, you can lose 1. Pick based on how many failures you want to survive simultaneously.
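The arithmetic as a throwaway helper (illustrative only, not any PostgreSQL API):

```go
package main

import "fmt"

// tolerableFailures: with m standbys and a quorum of ANY n in
// synchronous_standby_names, commits stall once more than m-n
// standbys are down.
func tolerableFailures(m, n int) int {
	return m - n
}

func main() {
	fmt.Println(tolerableFailures(3, 1)) // ANY 1 of 3: survives 2 failures
	fmt.Println(tolerableFailures(3, 2)) // ANY 2 of 3: survives 1 failure
}
```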

Topologies

Primary-replica (the default)

One writer, N readers. Dead simple. Works for the vast majority of applications.

  Writes           Reads
    |                |
    v                v
[Primary] ----> [Replica 1]
    |
    +---------> [Replica 2]

If you’re building a new system and someone suggests multi-primary on day one, push back hard. Start here.

Cascading replicas

A replica can feed other replicas downstream instead of every replica pulling from the primary. This matters when you have many replicas or they span continents.

# postgresql.conf on the cascading replica
# (replica2 feeds from replica1, not the primary)
primary_conninfo = 'host=replica1 port=5432 user=replicator'

The tradeoff: more lag for downstream replicas. Each hop adds latency. But the primary’s max_wal_senders slots aren’t exhausted by a fleet of read replicas. We used this at the fintech startup to feed analytics replicas off a primary standby – kept the main failover target clean while analytics could thrash their copy however they wanted.

Multi-primary

Multiple nodes accept writes and sync with each other. Sounds appealing. Is usually painful.

The fundamental problem: two nodes write conflicting data to the same row at the same time. Now what?

PostgreSQL’s logical replication can do this with BDR or similar extensions, but you’re signing up for conflict resolution logic, operational complexity, and debugging sessions that make you question your career choices. MySQL Group Replication and Galera have the same fundamental tradeoffs.

Use multi-primary when you have offices in multiple continents that all need local write latency and the business has accepted the cost. Not because it sounds cool on a whiteboard.

Read-your-writes consistency

This is where async replication bites you in the application layer.

User writes a comment. POST succeeds (hits the primary). Page refreshes. GET goes to a replica. Comment isn’t there. User writes the comment again. Now you have duplicates.

Three approaches, from simple to robust:

1. Sticky routing after writes

// primaryDB and replicaDB are *sql.DB handles initialized at startup.
func handleRequest(w http.ResponseWriter, r *http.Request) {
    db := replicaDB // default: read from a replica
    if cookie, err := r.Cookie("last_write"); err == nil {
        if t, err := time.Parse(time.RFC3339, cookie.Value); err == nil &&
            time.Since(t) < 5*time.Second {
            // Recent write -- route to the primary
            db = primaryDB
        }
    }
    // ... run the query against db
}

Crude but effective. After a write, force reads to the primary for a few seconds. The window just needs to exceed your typical replication lag.

2. Session affinity to a specific replica

Pin a user session to one replica. They won’t see inconsistency within their own session because they always read from the same copy. Doesn’t help with cross-user consistency but handles the most visible problem.
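A minimal sketch of the pinning itself, using a stable hash of the session ID (the replica addresses and the FNV choice are illustrative):

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// replicaFor pins a session to one replica by hashing the session ID,
// so every read in that session hits the same copy.
func replicaFor(sessionID string, replicas []string) string {
	h := fnv.New32a()
	h.Write([]byte(sessionID))
	return replicas[h.Sum32()%uint32(len(replicas))]
}

func main() {
	replicas := []string{"replica1:5432", "replica2:5432", "replica3:5432"}
	// The same session always maps to the same replica.
	fmt.Println(replicaFor("session-42", replicas) == replicaFor("session-42", replicas))
}
```

If the pinned replica dies, its sessions get rehashed elsewhere and briefly lose their affinity; consistent hashing reduces that churn but doesn't eliminate it.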

3. LSN tracking

The proper solution. After a write, capture the WAL position (LSN) from the primary. Before reading from a replica, check if the replica has replayed past that LSN.

-- After write, on the primary:
SELECT pg_current_wal_lsn();
-- Returns something like: 0/16B9188

-- Before read, on the replica:
SELECT pg_last_wal_replay_lsn();
-- If this >= the saved LSN, the replica is caught up

More work to implement. More correct.
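The comparison itself can live client-side. An LSN is two hex numbers separated by a slash; a sketch of parsing and comparing them in Go (you can equally let the server do it with a `::pg_lsn` cast):

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseLSN converts a PostgreSQL LSN like "0/16B9188" into a single
// comparable 64-bit value (the two parts are hex: high/low 32 bits).
func parseLSN(s string) (uint64, error) {
	parts := strings.SplitN(s, "/", 2)
	if len(parts) != 2 {
		return 0, fmt.Errorf("malformed LSN %q", s)
	}
	hi, err := strconv.ParseUint(parts[0], 16, 32)
	if err != nil {
		return 0, err
	}
	lo, err := strconv.ParseUint(parts[1], 16, 32)
	if err != nil {
		return 0, err
	}
	return hi<<32 | lo, nil
}

// replicaCaughtUp reports whether a replica that has replayed up to
// replayLSN has caught up to a write captured at writeLSN.
func replicaCaughtUp(replayLSN, writeLSN string) (bool, error) {
	r, err := parseLSN(replayLSN)
	if err != nil {
		return false, err
	}
	w, err := parseLSN(writeLSN)
	if err != nil {
		return false, err
	}
	return r >= w, nil
}

func main() {
	ok, _ := replicaCaughtUp("0/16B9190", "0/16B9188")
	fmt.Println(ok) // true: the replica has replayed past the write
}
```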

Conflict resolution in multi-primary

If you went down the multi-primary path despite my warning, you need a conflict strategy.

Last write wins (LWW). Timestamp comparison. Whichever write has the later timestamp survives. Simple, lossy. A perfectly valid update gets silently dropped because a clock was 1ms behind. If your data can tolerate silent overwrites, fine. Financial data can’t.
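A toy LWW register makes the failure mode concrete: the node whose clock runs 1ms slow loses, regardless of which write actually happened later (all names here are illustrative):

```go
package main

import "fmt"

// lwwValue is a value versioned under last-write-wins.
type lwwValue struct {
	val string
	ts  int64 // wall-clock millis as reported by the writing node
}

// merge resolves a conflict: the higher timestamp survives.
func merge(a, b lwwValue) lwwValue {
	if b.ts > a.ts {
		return b
	}
	return a
}

func main() {
	// Node A writes at its clock time 1000. Node B writes afterwards,
	// but its clock is skewed 1ms behind, so it reports 999.
	a := lwwValue{"price=101.50", 1000}
	b := lwwValue{"price=101.75", 999} // the genuinely later write
	fmt.Println(merge(a, b).val)       // the later write is silently dropped
}
```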

Application-level merge. You define the merge logic per table or per field. A counter gets added. A set gets unioned. A text field takes the longer version. This works but every schema change means updating merge rules.

CRDTs. Conflict-free replicated data types. Mathematically guaranteed to converge. Limited to specific data structures: counters, sets, registers. You can’t CRDT your way through an arbitrary relational schema. Good for specific use cases, not a general solution.
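For a flavor of what CRDTs buy you, here's a sketch of a G-counter (grow-only counter), one of the simplest: each node increments its own slot, and merge takes the per-node maximum, so replicas converge no matter the order or number of merges:

```go
package main

import "fmt"

// GCounter is a grow-only counter CRDT: one slot per node ID.
type GCounter map[string]int

// Inc records an increment made on the given node.
func (g GCounter) Inc(node string) { g[node]++ }

// Merge folds another replica's state in by taking per-node maxima.
// Merge is commutative, associative, and idempotent, which is what
// guarantees convergence.
func (g GCounter) Merge(other GCounter) {
	for node, n := range other {
		if n > g[node] {
			g[node] = n
		}
	}
}

// Value is the counter's total across all nodes.
func (g GCounter) Value() int {
	total := 0
	for _, n := range g {
		total += n
	}
	return total
}

func main() {
	a := GCounter{}
	b := GCounter{}
	a.Inc("node-a")
	a.Inc("node-a")
	b.Inc("node-b")
	a.Merge(b)
	b.Merge(GCounter{"node-a": 2})
	fmt.Println(a.Value(), b.Value()) // both replicas converge on 3
}
```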

Failover

Promoting a replica sounds easy. pg_ctl promote, or SELECT pg_promote() from SQL in PG12+. Done. Except no.

The hard parts:

Fencing the old primary. If the old primary isn’t actually dead – just slow, or network-partitioned – you now have two nodes accepting writes. Split brain. The nightmare scenario. You need STONITH (shoot the other node in the head) or at minimum, revoke the old primary’s ability to accept connections.

Client reconnection. Your application’s connection string points to the old primary. Now it needs to point to the new one. Options: DNS update (slow propagation), virtual IP (fast but requires infra), connection proxy like PgBouncer or HAProxy that handles routing, or application-level logic.

# HAProxy config for automatic failover routing (the health check
# assumes a Patroni-style REST endpoint on port 8008 that returns
# 200 only on the current primary)
listen postgres
    bind *:5432
    mode tcp
    option httpchk GET /primary
    server pg1 10.0.1.1:5432 check port 8008
    server pg2 10.0.1.2:5432 check port 8008

Rebuilding the old primary. After failover, the old primary has diverged. It accepted some writes the new primary doesn’t have (or vice versa). You need to either pg_rewind it or rebuild from scratch with pg_basebackup. Test this process before you need it.
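Sketched as commands (paths and connection strings are placeholders; note pg_rewind only works if wal_log_hints or data checksums were enabled before the failover):

```shell
# Resync the old primary's data directory from the new primary
pg_rewind --target-pgdata=/var/lib/postgresql/data \
          --source-server='host=new-primary user=postgres dbname=postgres'

# If pg_rewind isn't an option, rebuild from scratch instead
# (-R writes the standby configuration for you)
pg_basebackup -h new-primary -U replicator -D /var/lib/postgresql/data -R
```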

I’ve run failover drills at every company I’ve worked at. The drill always surfaces something the documentation missed. Always.

Monitoring replication

Replication lag is a ticking bomb with a variable fuse. You need to watch it continuously.

-- On the primary: check all connected replicas
SELECT
    client_addr,
    state,
    sent_lsn,
    write_lsn,
    flush_lsn,
    replay_lsn,
    pg_wal_lsn_diff(sent_lsn, replay_lsn) AS replay_lag_bytes
FROM pg_stat_replication;

Key things to alert on:

  • Replay lag exceeding your tolerance. If you promised “reads within 1 second of writes,” alert before that threshold.
  • Replica disconnection. A replica that silently falls off is worse than one that loudly crashes.
  • WAL accumulation on the primary. If a replica can’t keep up, the primary retains WAL segments. Disk fills. Primary crashes. You’ve now lost the thing that was supposed to protect you.

-- WAL retained on the primary for each replication slot
-- (if you use slots; a stalled slot pins WAL forever)
SELECT slot_name,
       pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn) AS retained_bytes
FROM pg_replication_slots;

-- Or more practically, watch the pg_wal directory size
-- and alert when it grows beyond expected bounds.

Choosing your pattern

Skip the decision matrix. Ask three questions:

Can you lose any committed transactions? If no, synchronous or quorum replication. Full stop. This was non-negotiable for the financial data at the fintech startup. A lost trade record isn’t a bug, it’s a regulatory incident.

Do you need writes in multiple regions? If no – and it’s almost always no – use primary-replica. If yes, accept the operational cost of multi-primary and budget engineering time accordingly. At Decloud, we keep writes centralized and replicate reads out. Simpler. Fewer 3am pages.

What’s your read-to-write ratio? If reads dominate (10:1 or more), async replicas for reads with synchronous replication to a single failover standby gives you the best of both: fast reads, safe failover, and write performance that’s only slightly penalized.


Replication isn’t a feature flag. It’s a contract between your system and your users about what happens when hardware fails, networks partition, and Murphy’s Law does its thing. Pick the pattern that matches the promises you’ve actually made, not the ones you wish you could make.