Common Errors When Setting Up PostgreSQL Streaming Replication

I have been setting up PostgreSQL streaming replication lately, so here are some notes on common errors I bumped into during the setup.

1. The Missing `standby.signal` File

The Symptom: You complete your base backup, start up the secondary instance, and realize it's accepting write traffic or acting like a completely independent primary database instead of a read-only follower.

The Fix: PostgreSQL needs an explicit flag to boot up in standby mode. If you use pg_basebackup, always include the -R flag. It creates a file named standby.signal in your data directory and appends the connection settings (primary_conninfo) to postgresql.auto.conf. If you forgot it, you can create the file manually, but you then also have to set primary_conninfo yourself:

sudo -u postgres touch /var/lib/postgresql/17/main/standby.signal
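To double-check a data directory before starting the secondary, something along these lines can help. This is a minimal sketch, not an official tool: the function name check_standby is made up for illustration, and it only looks at files on disk, so it tells you nothing about a running server.

```shell
#!/bin/sh
# Hypothetical helper: report whether a data directory is marked as a standby.
# It only inspects files on disk; it does not talk to a running server.
check_standby() {
    datadir="$1"
    if [ ! -f "$datadir/standby.signal" ]; then
        echo "missing standby.signal"
        return 1
    fi
    # pg_basebackup -R appends primary_conninfo to postgresql.auto.conf.
    # Assumption: the setting lives there; it could also be in postgresql.conf.
    if ! grep -q '^[[:space:]]*primary_conninfo' "$datadir/postgresql.auto.conf" 2>/dev/null; then
        echo "standby.signal present, but no primary_conninfo in postgresql.auto.conf"
        return 2
    fi
    echo "standby configured"
}
```

Call it as check_standby /var/lib/postgresql/17/main. Once the instance is up, SELECT pg_is_in_recovery(); should return true on a standby.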

2. The `pg_hba.conf` Security Wall

The Symptom: Your logs on the secondary server show a frustrating loop of: FATAL: no pg_hba.conf entry for host "10.104.0.5", user "replication_user"

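The Fix: PostgreSQL rejects any remote connection that has no matching pg_hba.conf entry, and replication connections are matched separately: the database column must be the keyword replication (an entry for all does not cover them). Add a line for your replica's IP address to your primary server's /etc/postgresql/17/main/pg_hba.conf. Rules are checked top to bottom and the first match wins, so make sure no earlier line rejects the same address: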

# Allow replication traffic from the secondary IP
host    replication     replication_user     10.104.0.5/32           scram-sha-256

Don't forget to reload the configuration on the primary afterward by running SELECT pg_reload_conf(); as a superuser.
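If you are not sure whether the rule actually made it into the file, a quick scan like the one below can confirm it. Treat it as a rough sketch: has_replication_rule is a hypothetical name, and the check ignores the address column and rule ordering, so a match only means a candidate line exists.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a pg_hba.conf file has a non-comment host* line
# whose database column is "replication" and whose user column is the given role
# (or "all"). It does not evaluate the address column or rule ordering.
has_replication_rule() {
    hba_file="$1"
    role="$2"
    awk -v role="$role" '
        $1 ~ /^host/ && $2 == "replication" && ($3 == role || $3 == "all") { found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$hba_file"
}
```

For example: has_replication_rule /etc/postgresql/17/main/pg_hba.conf replication_user && echo "rule found".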

3. Forgetting `wal_log_hints` Before a Failover

The Symptom: Your primary crashes, you successfully promote your secondary to primary, but when the old primary comes back online, pg_rewind fails with: pg_rewind: error: target server needs to use either data checksums or "wal_log_hints = on"

The Fix: pg_rewind is a lifesaver: it lets you resynchronize the old primary without re-copying a massive database after a failover, but it needs extra information in the WAL to do its job. Set wal_log_hints = on in postgresql.conf on both servers before any disaster happens (or initialize the cluster with data checksums, as the error message suggests). Changing wal_log_hints requires a server restart, not just a reload. If neither was enabled before the failover, your only option is to delete the old primary's data and run a slow, fresh pg_basebackup.
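A small pre-flight check on the config file can catch this before a failover does. The sketch below is only that: wal_log_hints_on is a hypothetical name, it assumes the usual name = value form, and it reads the file rather than asking the server, so the running value may differ (for example before a restart, or if postgresql.auto.conf overrides it).

```shell
#!/bin/sh
# Hypothetical helper: succeed if the last uncommented "wal_log_hints = ..." line
# in a postgresql.conf-style file turns the setting on.
# Assumption: the "name = value" form is used; the file is read, not the server.
wal_log_hints_on() {
    conf_file="$1"
    # Extract the value (optionally single-quoted), keep the last one, lowercase it.
    value=$(sed -n "s/^[[:space:]]*wal_log_hints[[:space:]]*=[[:space:]]*'\{0,1\}\([A-Za-z0-9]*\).*/\1/p" "$conf_file" \
        | tail -n 1 | tr 'A-Z' 'a-z')
    case "$value" in
        on|true|yes|1) return 0 ;;
        *) return 1 ;;
    esac
}
```

On a running server, SHOW wal_log_hints; and SHOW data_checksums; give the authoritative answer.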

4. Underestimating `wal_keep_size`

The Symptom: Replication works beautifully for a few hours, but suddenly breaks under heavy write loads with an error such as "requested WAL segment ... has already been removed", meaning the secondary has fallen behind and the primary has already recycled the WAL it still needs.

The Fix: The primary server automatically purges older Write-Ahead Log (WAL) segments to save disk space. If your secondary hits a brief network hiccup, or the primary is under a massive write-heavy load such as a pgbench run, the primary might delete the logs before the secondary can read them. Give your replica a safety buffer by increasing this value on your primary:

# In postgresql.conf (PostgreSQL 13+)
wal_keep_size = 1024MB  # Adjust higher based on your transaction volume
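How much higher? One rough way to pick a number is WAL generation rate times the longest replica outage you want to survive, plus some headroom. The helper below is a back-of-the-envelope sketch: suggest_wal_keep_mb is a hypothetical name, and all three inputs are assumptions you would have to measure or choose yourself.

```shell
#!/bin/sh
# Hypothetical sizing helper: WAL to retain (in MB) so a replica can be offline
# for a given number of minutes at a given WAL generation rate, plus headroom.
# Inputs must be whole numbers; all of them are assumptions, not measured values.
suggest_wal_keep_mb() {
    wal_mb_per_min="$1"       # WAL generated at peak, MB per minute
    outage_min="$2"           # longest replica outage to tolerate, minutes
    headroom_pct="${3:-50}"   # extra safety margin in percent (default 50)
    echo $(( $wal_mb_per_min * $outage_min * (100 + $headroom_pct) / 100 ))
}
```

For example, echo "wal_keep_size = $(suggest_wal_keep_mb 20 60)MB" prints wal_keep_size = 1800MB. To estimate the rate, you can sample pg_current_wal_lsn() twice on the primary and compare the two values with pg_wal_lsn_diff().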

Summary Checklist

Before you pull your hair out debugging, remember that PostgreSQL's error logs are incredibly descriptive. On Debian systems, bypass the systemd journal and check the direct cluster log file:

sudo tail -n 50 /var/log/postgresql/postgresql-17-main.log
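If the log is noisy, filtering for the messages from this post narrows things down. This is just a sketch with a hypothetical function name, and the patterns are substrings of the errors discussed above, not an exhaustive list.

```shell
#!/bin/sh
# Hypothetical helper: print lines from a PostgreSQL log file that match the
# replication errors discussed in this post. Returns non-zero if none are found.
replication_errors() {
    log_file="$1"
    grep -E 'no pg_hba\.conf entry|requested WAL segment .* has already been removed' "$log_file"
}
```

For example: replication_errors /var/log/postgresql/postgresql-17-main.log | tail -n 20 (add sudo if the log is not readable by your user).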

Nine times out of ten, your fix is just a quick IP tweak in pg_hba.conf or a forgotten parameter in postgresql.conf away!
