HOWTO: Execute a PostgreSQL Failover with Patroni
Purpose: Execute a controlled or emergency PostgreSQL failover in a Patroni-managed cluster with structured run-record capture.
Difficulty: Advanced
Track: Disaster Recovery Automation
Overview
Both paths (planned switchover and emergency failover) produce the same output: before/after Patroni cluster state, replication lag at the point of promotion, and application reconnection confirmation. Capture these at each step; they form the run record.
1. Pre-Failover State Capture
- Patroni cluster status snapshot.
- Replication lag baseline.
- Application connection pool state.
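The three captures above can be combined into one snapshot. The following is a minimal sketch, not a definitive implementation. It assumes `patronictl list -f json` returns a list of member objects with `Member`, `Role`, and `Lag in MB` keys; key names differ between Patroni versions, so check them against your own output. The helper names and the shape of `pool_state` are hypothetical, and pool state is treated as opaque because it depends on the pooler in use.

```python
import json
import subprocess
from datetime import datetime, timezone

CONFIG = "/etc/patroni/config.yml"  # config path used by the switchover command in this HOWTO


def fetch_members(config=CONFIG):
    """Return the parsed output of `patronictl list -f json` (needs a live cluster)."""
    result = subprocess.run(
        ["patronictl", "-c", config, "list", "-f", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)


def build_snapshot(members, pool_state=None, label="before"):
    """Condense a member list into a run-record snapshot.

    Assumes each member dict has "Member" and "Role" keys and that replicas
    carry "Lag in MB"; adjust the key names to match your Patroni version.
    """
    leaders = [m["Member"] for m in members if m.get("Role") == "Leader"]
    replica_lag = {
        m["Member"]: m.get("Lag in MB")
        for m in members
        if m.get("Role") != "Leader"
    }
    return {
        "label": label,
        "captured_at": datetime.now(timezone.utc).isoformat(),
        # Exactly one leader is expected; anything else is recorded as None.
        "leader": leaders[0] if len(leaders) == 1 else None,
        "replica_lag_mb": replica_lag,
        "pool_state": pool_state,  # opaque, application-specific
        "members": members,  # raw status kept for the run record
    }
```

On a cluster node, `build_snapshot(fetch_members(), pool_state=...)` would produce the "before" entry; calling it again with `label="after"` once the role change completes gives the matching "after" entry.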
2. Controlled Switchover
Command: patronictl -c /etc/patroni/config.yml switchover
- Pre-switchover health check.
- Execution and timing.
- Post-switchover state confirmation.
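The health check and timed execution can be sketched as follows. This is one possible approach under stated assumptions: member dicts use the `Member`, `Role`, `State`, and `Lag in MB` keys of `patronictl list -f json`, and the `--leader`, `--candidate`, and `--force` options are available (older Patroni releases name the leader option differently, so confirm with `patronictl switchover --help`). The function names and the zero-lag default threshold are hypothetical.

```python
import subprocess
import time

CONFIG = "/etc/patroni/config.yml"
# Replica state labels differ between Patroni versions; both are accepted here.
HEALTHY_REPLICA_STATES = {"streaming", "running"}


def preflight_problems(members, candidate, max_lag_mb=0):
    """Return a list of reasons not to switch over; an empty list means go."""
    problems = []
    leaders = [m for m in members if m.get("Role") == "Leader"]
    if len(leaders) != 1:
        problems.append(f"expected exactly one leader, found {len(leaders)}")
    target = next((m for m in members if m.get("Member") == candidate), None)
    if target is None:
        problems.append(f"candidate {candidate} is not a cluster member")
        return problems
    if target.get("Role") == "Leader":
        problems.append(f"candidate {candidate} is already the leader")
        return problems
    if target.get("State") not in HEALTHY_REPLICA_STATES:
        problems.append(f"candidate {candidate} is in state {target.get('State')}")
    lag = target.get("Lag in MB")
    if not isinstance(lag, (int, float)) or lag > max_lag_mb:
        problems.append(f"candidate {candidate} lag is {lag}, limit {max_lag_mb} MB")
    return problems


def switchover_command(leader, candidate, config=CONFIG):
    """Build a non-interactive switchover command (--force skips the prompts)."""
    return ["patronictl", "-c", config, "switchover",
            "--leader", leader, "--candidate", candidate, "--force"]


def timed_run(command):
    """Run a command and return its result with wall-clock duration."""
    started = time.monotonic()
    result = subprocess.run(command, capture_output=True, text=True)
    return {
        "command": command,
        "returncode": result.returncode,
        "stdout": result.stdout,
        "stderr": result.stderr,
        "elapsed_seconds": round(time.monotonic() - started, 3),
    }
```

Only call `timed_run(switchover_command(...))` when `preflight_problems(...)` returns an empty list. The elapsed time covers the `patronictl` call only, so confirm the post-switchover state separately with a fresh cluster listing.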
3. Emergency Failover
- Detecting primary failure: Patroni leader election log.
- Understanding the DCS-based election process.
- Manual trigger if automatic election is inhibited.
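Patroni holds the leader lock as a key with a TTL in the DCS. When the primary stops renewing it, the key expires and the healthy replicas compete for it. Automatic election can be inhibited, for example when the cluster is paused, when candidates carry the `nofailover` tag, or when every replica exceeds `maximum_lag_on_failover`. The sketch below shows one way to detect a leaderless cluster from `patronictl list -f json` output and build a manual `patronictl failover` command. The key names, the accepted state labels, and the candidate ranking (highest timeline, then lowest reported lag) are assumptions for illustration, not Patroni's own election logic.

```python
CONFIG = "/etc/patroni/config.yml"
# State labels a usable replica may report; these vary by Patroni version.
USABLE_STATES = {"streaming", "running", "in archive recovery"}


def has_leader(members):
    """True when any member currently holds the Leader role."""
    return any(m.get("Role") == "Leader" for m in members)


def pick_candidate(members):
    """Pick the replica on the highest timeline with the lowest reported lag.

    Members whose lag is missing or non-numeric (for example "unknown")
    are skipped. Returns the member name, or None if nothing qualifies.
    """
    usable = [
        m for m in members
        if m.get("Role") != "Leader"
        and m.get("State") in USABLE_STATES
        and isinstance(m.get("Lag in MB"), (int, float))
    ]
    if not usable:
        return None
    best = min(usable, key=lambda m: (-m.get("TL", 0), m["Lag in MB"]))
    return best["Member"]


def failover_command(candidate, config=CONFIG):
    """Build a non-interactive manual failover command."""
    return ["patronictl", "-c", config, "failover",
            "--candidate", candidate, "--force"]
```

Before triggering manually, record why automatic election did not happen; that reason belongs in the run record alongside the command that was issued.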
4. Post-Failover Validation
- New primary confirmed in Patroni cluster list.
- Old primary rejoined as a replica and replicating from the new primary.
- Application reconnection confirmed.
- pgBackRest reconfigured for new primary.
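The checks above can be expressed as a comparison between the "before" and "after" member lists. This is a hedged sketch: it assumes the `Member`, `Role`, `State`, and `TL` keys of `patronictl list -f json`, treats a timeline increase as evidence of promotion, and takes application reconnection as a caller-supplied boolean because that check is application-specific. `pgbackrest check` is a real command, but the stanza name passed to it is a placeholder you must supply.

```python
HEALTHY_REPLICA_STATES = {"streaming", "running"}  # labels vary by Patroni version


def leader_of(members):
    """Return the member dict holding the Leader role, or None."""
    return next((m for m in members if m.get("Role") == "Leader"), None)


def validate_failover(before_members, after_members, app_reconnected):
    """Compare cluster state before and after; return named boolean checks."""
    old = leader_of(before_members)
    new = leader_of(after_members)
    old_name = old["Member"] if old else None
    after_by_name = {m["Member"]: m for m in after_members}
    old_now = after_by_name.get(old_name, {})
    old_tl = old.get("TL") if old else None
    new_tl = new.get("TL") if new else None
    return {
        "new_primary_elected": new is not None and new["Member"] != old_name,
        "old_primary_is_replica": (
            old_now.get("Role") not in (None, "Leader")
            and old_now.get("State") in HEALTHY_REPLICA_STATES
        ),
        "timeline_advanced": (
            isinstance(old_tl, int) and isinstance(new_tl, int) and new_tl > old_tl
        ),
        "application_reconnected": bool(app_reconnected),
    }


def pgbackrest_check_command(stanza):
    """Build a `pgbackrest check` call; the stanza name is site-specific."""
    return ["pgbackrest", f"--stanza={stanza}", "check"]
```

A failed `old_primary_is_replica` check after an emergency failover is expected until the old primary has been brought back and rejoined, so record it as pending rather than as an error.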
5. Run Record Capture
- Storing before/after cluster state under <runtime-root>/logs/dr/postgresql-failover/.
- Timing data for RTO measurement.
- Linking to the DR drill run record.
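A minimal sketch of writing the run record follows. The runtime root is passed in by the caller because this HOWTO does not fix its value. The JSON layout, the `run_id` naming, the `drill_record` link field, and measuring RTO from failure detection to confirmed application reconnection are all assumptions for illustration.

```python
import json
from datetime import datetime
from pathlib import Path

# Relative location named in this HOWTO; the runtime root is site-specific.
RECORD_SUBDIR = Path("logs/dr/postgresql-failover")


def rto_seconds(failure_detected_at, service_restored_at):
    """Seconds between two ISO 8601 timestamps (e.g. 2024-01-01T00:00:00+00:00)."""
    start = datetime.fromisoformat(failure_detected_at)
    end = datetime.fromisoformat(service_restored_at)
    return (end - start).total_seconds()


def write_run_record(runtime_root, run_id, before, after, timings, drill_record=None):
    """Write one JSON run record and return its path.

    `drill_record` is a free-form reference (path or ID) to the DR drill
    run record this failover belongs to.
    """
    target_dir = Path(runtime_root) / RECORD_SUBDIR
    target_dir.mkdir(parents=True, exist_ok=True)
    record = {
        "run_id": run_id,
        "before": before,
        "after": after,
        "timings": timings,
        "drill_record": drill_record,
    }
    path = target_dir / f"{run_id}.json"
    path.write_text(json.dumps(record, indent=2, sort_keys=True))
    return path
```

Storing the raw timestamps next to the derived RTO value lets a reviewer recompute the figure later if the definition of "service restored" changes.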
References
- ADR-0501 – PostgreSQL on Dedicated VM with DR Replication
- HOWTO: Provision PostgreSQL HA
- HOWTO: Run the DR Failback
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.