HOWTO: Run a DR Failback After PostgreSQL Recovery
Purpose: Restore the original PostgreSQL primary as a replica after a failover, re-synchronise it, and optionally promote it back to primary with full run-record capture.
Difficulty: Advanced
Track: Disaster Recovery Automation
Overview
Failback is often neglected in DR rehearsals, but it is the step that proves you can return to a known-good topology without data loss or extended downtime. In HybridOps, failback is a structured module operation with checkpoints at each stage and run records produced throughout. This HOWTO covers the full path.
1. Pre-Failback Assessment
- Current cluster topology after failover.
- Original primary node state: is it recoverable?
- Decision: rebuild from backup or re-sync from replica.
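The decision in the last bullet can be written down as an explicit rule so that it is recorded with the drill. The following is a minimal sketch, assuming three illustrative inputs; the `NodeAssessment` fields, the returned labels, and the rule itself are hypothetical and not taken from HybridOps.

```python
from dataclasses import dataclass


@dataclass
class NodeAssessment:
    """Hypothetical summary of the original primary's state after failover."""
    host_reachable: bool    # can the node be reached at all?
    data_dir_intact: bool   # is the PostgreSQL data directory still usable?
    backup_available: bool  # does a usable backup exist for this cluster?


def choose_recovery_path(node: NodeAssessment) -> str:
    """Return an illustrative recovery decision for the original primary."""
    if not node.host_reachable:
        # Nothing can be recovered in place on an unreachable host.
        return "not-recoverable"
    if node.backup_available and not node.data_dir_intact:
        # Damaged data directory, but a backup exists: restore it.
        return "rebuild-from-backup"
    # Otherwise re-synchronise from the cluster that is currently serving traffic.
    return "resync-from-replica"
```

For example, `choose_recovery_path(NodeAssessment(True, False, True))` yields `"rebuild-from-backup"`. Real criteria (backup age, timeline divergence, disk health) would likely be richer than this.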
2. Rebuilding the Original Node
- Stopping PostgreSQL on the original primary.
- `pgbackrest restore` from the latest backup.
- Patroni configuration for standby mode.
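The first two steps can be expressed as an ordered command plan that is printed for review before anything runs. This is a sketch under assumptions: the stanza name `main` and the systemd unit name `patroni` are placeholders, and it assumes PostgreSQL is managed by Patroni, so stopping the Patroni service normally stops PostgreSQL as well. The Patroni configuration step is site-specific and is not shown.

```python
import shlex


def rebuild_plan(stanza: str, patroni_service: str = "patroni") -> list[list[str]]:
    """Return the ordered rebuild commands as argv lists (nothing is executed)."""
    return [
        # Assumption: Patroni runs as a systemd unit and stops PostgreSQL with it.
        ["systemctl", "stop", patroni_service],
        # --delta lets pgBackRest reuse files already present in the data directory.
        ["pgbackrest", f"--stanza={stanza}", "--delta", "restore"],
    ]


if __name__ == "__main__":
    # Print the plan for review; "main" is a placeholder stanza name.
    for argv in rebuild_plan("main"):
        print(shlex.join(argv))
```

Keeping the plan as data rather than running it directly makes it easy to attach the exact commands to the run record.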
3. Re-attaching as a Replica
- Starting Patroni on the rebuilt node.
- Confirming Patroni member registration in DCS.
- Monitoring WAL catch-up and lag.
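WAL catch-up can be quantified by comparing the current WAL position on the primary with the replay position on the rebuilt node, for example the values returned by `pg_current_wal_lsn()` and `pg_last_wal_replay_lsn()`. The sketch below shows the arithmetic only, assuming the two LSN strings have already been collected; the function names are illustrative.

```python
def lsn_to_int(lsn: str) -> int:
    """Convert a PostgreSQL LSN such as '0/3000060' to an absolute byte position."""
    high, low = lsn.split("/")
    # An LSN is a 64-bit value printed as two 32-bit hexadecimal halves.
    return (int(high, 16) << 32) | int(low, 16)


def lag_bytes(primary_lsn: str, replica_lsn: str) -> int:
    """Bytes of WAL the replica still has to replay (clamped at zero)."""
    return max(0, lsn_to_int(primary_lsn) - lsn_to_int(replica_lsn))
```

For instance, `lag_bytes("0/3000060", "0/3000000")` is 96 bytes. Inside the database, `pg_wal_lsn_diff()` performs the same subtraction.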
4. Validation Before Switchback
- Replication lag below threshold.
- Original node listed as healthy replica in Patroni.
- pgBackRest stanza check on the rebuilt node.
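The first two checks can be combined into a single gate over the member list reported by `patronictl list -f json`. This is a sketch under assumptions: the key names (`Member`, `Role`, `State`, `Lag in MB`) and state values follow recent Patroni releases and may differ in yours, the 1 MB default threshold is arbitrary, and the sample data is invented for illustration. The stanza check is run separately with `pgbackrest --stanza=<name> check`.

```python
import json


def replica_ready(members: list[dict], name: str, max_lag_mb: float = 1) -> bool:
    """Return True if `name` is a healthy replica with lag at or below the threshold."""
    for member in members:
        if member.get("Member") != name:
            continue
        role_ok = member.get("Role") == "Replica"
        # Assumption: healthy replicas report "running" or "streaming".
        state_ok = member.get("State") in ("running", "streaming")
        lag = member.get("Lag in MB")
        # Lag may be missing or non-numeric (e.g. "unknown"); treat that as not ready.
        lag_ok = isinstance(lag, (int, float)) and lag <= max_lag_mb
        return role_ok and state_ok and lag_ok
    return False  # the node is not registered in the cluster at all


# Illustrative sample only, not captured from a real cluster.
SAMPLE = json.loads("""
[
  {"Member": "pg-dr",   "Role": "Leader",  "State": "running"},
  {"Member": "pg-orig", "Role": "Replica", "State": "streaming", "Lag in MB": 0}
]
""")
```

With the sample above, `replica_ready(SAMPLE, "pg-orig")` is `True`, while an unregistered node name returns `False`.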
5. Optional Controlled Switchback
- `patronictl switchover` to promote the original primary.
- Application connection pool drain and reconnect.
- Post-switchback cluster state snapshot.
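The switchover command can be assembled explicitly so that the current leader and the candidate are never left to an interactive prompt. This is a minimal sketch: the configuration path, cluster name, and member names are placeholders, and the `--leader` option is assumed from recent Patroni releases (older releases used `--master`).

```python
def switchover_argv(
    cluster: str,
    current_leader: str,
    candidate: str,
    config: str = "/etc/patroni/patroni.yml",  # placeholder path
    force: bool = False,
) -> list[str]:
    """Build the patronictl switchover command line (nothing is executed)."""
    argv = [
        "patronictl", "-c", config, "switchover", cluster,
        "--leader", current_leader,  # assumption: spelled --master on older Patroni
        "--candidate", candidate,
    ]
    if force:
        # --force skips the interactive confirmation; off by default on purpose.
        argv.append("--force")
    return argv
```

Leaving `force` off by default keeps a human confirmation in the loop, which suits a controlled switchback.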
6. Closing the DR Drill Record
- Final cluster topology run record.
- Total elapsed time from failover to clean failback.
- DR drill run record closure with all linked records.
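The elapsed-time figure and the closing record can be derived from two timestamps and the list of linked record identifiers. The sketch below is illustrative only: the field names and record identifiers are hypothetical and do not reflect the actual HybridOps run-record schema.

```python
import json
from datetime import datetime


def close_drill_record(
    failover_started: str, failback_completed: str, linked_records: list[str]
) -> dict:
    """Build a closing record from ISO 8601 timestamps (hypothetical schema)."""
    start = datetime.fromisoformat(failover_started)
    end = datetime.fromisoformat(failback_completed)
    return {
        "status": "closed",
        "failover_started": failover_started,
        "failback_completed": failback_completed,
        # Total elapsed time from failover to clean failback, in whole seconds.
        "elapsed_seconds": int((end - start).total_seconds()),
        "linked_records": list(linked_records),
    }


if __name__ == "__main__":
    # Placeholder timestamps and record identifiers for illustration.
    record = close_drill_record(
        "2024-01-01T10:00:00+00:00",
        "2024-01-01T11:30:00+00:00",
        ["failover-record", "topology-snapshot"],
    )
    print(json.dumps(record, indent=2))
```

Timestamps carry an explicit UTC offset so that the subtraction is unambiguous across hosts in different time zones.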
References
- ADR-0501 – PostgreSQL on Dedicated VM with DR Replication
- HOWTO: Execute a PostgreSQL Failover
- HOWTO: Set Up pgBackRest Backup
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.