Common Errors When Setting Up PostgreSQL Streaming Replication
I am a web developer focusing on building web applications using Python and Django. Full profile at https://kamal.koditi.my/.
I have been setting up PostgreSQL streaming replication lately, so here are some notes on common errors I bumped into during the setup.
1. The Missing standby.signal File
The Symptom: You complete your base backup, start up the secondary instance, and realize it's accepting write traffic or acting like a completely independent primary database instead of a read-only follower.
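The Fix: PostgreSQL needs an explicit flag to boot up in standby mode. If you use pg_basebackup, always include the -R flag. It creates an empty file named standby.signal in your data directory and writes the primary_conninfo connection settings into postgresql.auto.conf. If you forgot it, you can create the signal file manually and restart, as long as primary_conninfo is also set (in postgresql.auto.conf or postgresql.conf); otherwise the standby has nothing to stream from: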
sudo -u postgres touch /var/lib/postgresql/17/main/standby.signal
sudo systemctl restart postgresql@17-main
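If a standby keeps behaving like a primary, it can help to confirm that the two things pg_basebackup -R is supposed to leave behind are actually there. The check_standby_files function below is a hypothetical helper (not part of PostgreSQL), sketched under the assumption of the Debian data directory used above. It only looks for primary_conninfo in postgresql.auto.conf, so a setting placed in postgresql.conf instead would be reported as missing.

```shell
# Hypothetical helper: check_standby_files DATADIR
# Reports whether the two pieces that pg_basebackup -R sets up are present:
#   1. the standby.signal file in the data directory
#   2. a primary_conninfo line in DATADIR/postgresql.auto.conf
# Returns 0 only if both are found.
check_standby_files() {
  local datadir=$1 status=0

  if [ -f "$datadir/standby.signal" ]; then
    echo "standby.signal: present"
  else
    echo "standby.signal: missing"
    status=1
  fi

  # grep exits non-zero if the file is absent or the setting is not there.
  if grep -q '^[[:space:]]*primary_conninfo' "$datadir/postgresql.auto.conf" 2>/dev/null; then
    echo "primary_conninfo: present"
  else
    echo "primary_conninfo: missing"
    status=1
  fi

  return "$status"
}
```

Run it as a user that can read the data directory (for example postgres) against /var/lib/postgresql/17/main. On a running server, SELECT pg_is_in_recovery(); returning t is the authoritative confirmation that it actually booted as a standby.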
2. The pg_hba.conf Security Wall
The Symptom: Your logs on the secondary server show a frustrating loop of connection failures ending in: FATAL: no pg_hba.conf entry for replication connection from host "10.104.0.5", user "replication_user"
The Fix: The primary only accepts connections that match an entry in pg_hba.conf, and replication connections only match entries whose database field is the replication keyword (a rule for all databases does not cover them). You must explicitly allow your replica's IP address to connect using the replication keyword. Add this to the bottom of your primary server's /etc/postgresql/17/main/pg_hba.conf:
# Allow replication traffic from the secondary IP
host replication replication_user 10.104.0.5/32 scram-sha-256
Don't forget to reload the configuration on the primary afterward, for example by running SELECT pg_reload_conf(); in psql. A reload is enough for pg_hba.conf changes; no restart is needed.
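Before reloading, a quick textual check can catch a typo in the user name or address. The has_replication_rule function below is a hypothetical helper and deliberately naive: it compares fields literally, so it does not expand comma-separated lists, evaluate wider CIDR ranges, or account for rule order (the first matching line wins). On the primary, the pg_hba_file_rules system view shows how the server itself parses the file, including an error column for lines it rejects.

```shell
# Hypothetical helper: has_replication_rule HBA_FILE USER ADDRESS
# Returns 0 if some non-comment host* line in HBA_FILE has "replication"
# as its database field, USER (or "all") as its user field, and exactly
# ADDRESS (e.g. 10.104.0.5/32) as its address field. Purely textual.
has_replication_rule() {
  local hba_file=$1 user=$2 addr=$3

  # Lines whose first field starts with "#" are comments and are skipped.
  # $1 ~ /^host/ covers host, hostssl, hostnossl, hostgssenc, hostnogssenc.
  awk -v user="$user" -v addr="$addr" '
    substr($1, 1, 1) == "#" { next }
    $1 ~ /^host/ && $2 == "replication" && ($3 == user || $3 == "all") && $4 == addr { found = 1 }
    END { exit (found ? 0 : 1) }
  ' "$hba_file"
}
```

For example, has_replication_rule /etc/postgresql/17/main/pg_hba.conf replication_user 10.104.0.5/32 should succeed once the entry above is in place.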
3. Forgetting wal_log_hints Before a Failover
The Symptom: Your primary crashes, you successfully promote your secondary to primary, but when the old primary comes back online, pg_rewind fails with: pg_rewind: error: target server needs to use either data checksums or "wal_log_hints = on"
The Fix: pg_rewind is a lifesaver that spares you from re-copying a massive database after a failover, but it needs extra information in the WAL to do its job: with wal_log_hints on, the server writes full pages to the WAL even for changes that only touch hint bits. You must enable wal_log_hints = on in your postgresql.conf on both servers before any disaster happens (changing it requires a restart). A cluster initialized with data checksums also satisfies pg_rewind. If neither was enabled on the old primary, your only option is to delete its data and run a slow, fresh pg_basebackup.
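pg_rewind's documented preconditions on the target (the old primary) boil down to: data checksums or wal_log_hints enabled, and full_page_writes on (the default). The rewind_ready function below is a hypothetical helper that encodes just that rule from the values SHOW reports, so you can check each server ahead of time. It is a sketch, not a substitute for pg_rewind's own verification.

```shell
# Hypothetical helper: rewind_ready DATA_CHECKSUMS WAL_LOG_HINTS FULL_PAGE_WRITES
# Each argument is what "SHOW <setting>" prints on the server ("on" or "off").
# Rule: (data checksums OR wal_log_hints) AND full_page_writes.
rewind_ready() {
  local checksums=$1 hints=$2 fpw=$3

  if [ "$fpw" != "on" ]; then
    echo "not ready: full_page_writes is off"
    return 1
  fi

  if [ "$checksums" = "on" ] || [ "$hints" = "on" ]; then
    echo "ready"
    return 0
  fi

  echo "not ready: enable wal_log_hints (restart required) or data checksums"
  return 1
}
```

On each server you could feed it live values, for example rewind_ready "$(sudo -u postgres psql -Atc 'SHOW data_checksums')" "$(sudo -u postgres psql -Atc 'SHOW wal_log_hints')" "$(sudo -u postgres psql -Atc 'SHOW full_page_writes')".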
4. Underestimating wal_keep_size
The Symptom: Replication works beautifully for a few hours, then suddenly breaks under heavy write load, and the secondary's log shows an error such as: FATAL: could not receive data from WAL stream: ERROR: requested WAL segment <segment name> has already been removed
The Fix: Unless a replication slot is holding it back, the primary removes or recycles old Write-Ahead Log (WAL) segments after checkpoints to save disk space, whether or not a replica has streamed them yet. If your secondary experiences a brief network hiccup or falls behind during a write-heavy benchmark (like pgbench), the primary might delete the segments before the secondary can read them. Give your replica a safety buffer by increasing wal_keep_size on your primary (a reload is enough to apply it):
# In postgresql.conf (PostgreSQL 13+)
wal_keep_size = 1024MB # Adjust higher based on your transaction volume
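To pick a value, it helps to know roughly how much WAL the primary writes per minute at peak. The sketch below assumes psql access on the primary via sudo -u postgres; wal_mb_last_minute and wal_buffer_minutes are hypothetical helpers. The first samples pg_current_wal_lsn() one minute apart and uses pg_wal_lsn_diff() to get the bytes written; the second turns a wal_keep_size in MB into a rough lower bound on how many minutes a replica can lag before WAL it still needs may be removed.

```shell
# Hypothetical helper: wal_mb_last_minute
# Prints the WAL generated over the next 60 seconds, in whole MB.
# Must run on the primary: pg_current_wal_lsn() errors out on a standby.
wal_mb_last_minute() {
  local start
  start=$(sudo -u postgres psql -Atc "SELECT pg_current_wal_lsn()")
  sleep 60
  # pg_wal_lsn_diff returns the number of bytes between two LSNs.
  sudo -u postgres psql -Atc \
    "SELECT round(pg_wal_lsn_diff(pg_current_wal_lsn(), '$start') / 1048576)"
}

# Hypothetical helper: wal_buffer_minutes KEEP_MB WAL_MB_PER_MIN
# Rough minimum minutes of lag that wal_keep_size alone can absorb.
wal_buffer_minutes() {
  local keep_mb=$1 rate_mb_per_min=$2
  if [ "$rate_mb_per_min" -le 0 ]; then
    echo "rate must be greater than 0" >&2
    return 1
  fi
  # Integer division rounds down, which errs on the safe side.
  echo $(( keep_mb / rate_mb_per_min ))
}
```

For example, if a pgbench run shows about 50MB of WAL per minute, wal_buffer_minutes 1024 50 prints 20, so an outage much longer than that at the same rate risks breaking replication again.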
Summary Checklist
Before you pull your hair out debugging, remember that PostgreSQL's error logs are incredibly descriptive. On Debian systems, bypass the systemd journal and check the direct cluster log file:
sudo tail -n 50 /var/log/postgresql/postgresql-17-main.log
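When the log is long, filtering for the failures described above saves some scrolling. scan_replication_log is a hypothetical helper, and its patterns are assumptions based on the messages quoted in this post rather than an exhaustive list. Note that pg_rewind's wal_log_hints complaint is printed by pg_rewind itself, not written to the server log.

```shell
# Hypothetical helper: scan_replication_log LOGFILE [COUNT]
# Prints the last COUNT (default 20) lines that are FATAL/PANIC messages
# or that match the pg_hba.conf and removed-WAL errors quoted in this post.
scan_replication_log() {
  local logfile=$1 count=${2:-20}
  grep -E 'FATAL|PANIC|no pg_hba\.conf entry|has already been removed' "$logfile" |
    tail -n "$count"
}
```

For example: sudo bash -c "$(declare -f scan_replication_log); scan_replication_log /var/log/postgresql/postgresql-17-main.log".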
Nine times out of ten, the fix is just a quick IP tweak in pg_hba.conf or a forgotten parameter in postgresql.conf!



