HOWTO: Run a DR Failback After PostgreSQL Recovery

Purpose: Restore the original PostgreSQL primary as a replica after a failover, re-synchronise it, and optionally promote it back to primary with full run-record capture.

Difficulty: Advanced

Track: Disaster Recovery Automation

Overview

Failback is often neglected in DR rehearsals, yet it is the step that proves you can return to a known-good topology without data loss or extended downtime. In HybridOps, failback is a structured module operation with a checkpoint at each stage and run records produced throughout. This HOWTO covers the full path, from assessing the failed primary to closing the DR drill record.
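This HOWTO does not show the HybridOps module interface, so the following is only a minimal Python sketch of the checkpoint idea under stated assumptions. StageRecord, FailbackRun and run_stage are hypothetical names, not HybridOps APIs. Each stage runs a check, the outcome is stored with UTC timestamps, and any failed checkpoint blocks later stages.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Callable


@dataclass
class StageRecord:
    """Outcome of one checkpointed failback stage (hypothetical shape)."""
    name: str
    started_at: datetime
    finished_at: datetime
    passed: bool
    detail: str = ""


@dataclass
class FailbackRun:
    """Hypothetical run container: records every stage, halts after a failure."""
    run_id: str
    stages: list[StageRecord] = field(default_factory=list)

    def run_stage(self, name: str, check: Callable[[], tuple[bool, str]]) -> bool:
        # A failed checkpoint blocks every later stage until an operator intervenes.
        if any(not s.passed for s in self.stages):
            raise RuntimeError(f"refusing to run {name!r}: an earlier checkpoint failed")
        started = datetime.now(timezone.utc)
        passed, detail = check()
        self.stages.append(
            StageRecord(name, started, datetime.now(timezone.utc), passed, detail)
        )
        return passed

    @property
    def completed(self) -> bool:
        # True only when at least one stage ran and every stage passed.
        return bool(self.stages) and all(s.passed for s in self.stages)
```

Each numbered section of this HOWTO could map to one run_stage call. Raising instead of silently skipping keeps a half-finished failback visible in the run record.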

1. Pre-Failback Assessment

Current cluster topology after failover.
Original primary node state: is it recoverable?
Decision: rebuild from backup or re-sync from replica.
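The assessment can be sketched in Python, assuming the topology comes from the Patroni REST API (GET /cluster, which lists members with name and role fields). Node names such as pg-1 are hypothetical. The decision rule is an illustrative policy, not HybridOps logic: re-sync only when the old data directory is intact and the WAL needed to reconcile its diverged history is still retained, otherwise rebuild from backup.

```python
import json


def summarise_topology(cluster_json: str, original_primary: str) -> dict:
    """Reduce a Patroni GET /cluster response to the facts the assessment needs."""
    members = json.loads(cluster_json).get("members", [])
    by_name = {m["name"]: m for m in members}
    leaders = [m["name"] for m in members if m.get("role") == "leader"]
    return {
        # None signals no leader or an ambiguous one; stop and investigate.
        "leader": leaders[0] if len(leaders) == 1 else None,
        "replicas": sorted(
            n for n, m in by_name.items() if m.get("role") in ("replica", "sync_standby")
        ),
        # The failed primary is usually absent until Patroni runs on it again.
        "original_primary_registered": original_primary in by_name,
    }


def choose_failback_path(data_dir_intact: bool, wal_retained_since_divergence: bool) -> str:
    """Illustrative policy only: re-sync when history can be reconciled, else rebuild."""
    if data_dir_intact and wal_retained_since_divergence:
        return "resync-from-replica"
    return "rebuild-from-backup"
```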

2. Rebuilding the Original Node

Stopping PostgreSQL on the original primary.
Running pgbackrest restore from the latest backup.
Configuring Patroni to start the node in standby mode.
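The rebuild steps can be sketched as an ordered command plan that is printed by default and executed only on request. This assumes PostgreSQL is managed by a systemd unit named patroni and a pgBackRest stanza named main. Both names are placeholders. The pgBackRest options used (--stanza, --delta, --type=standby) are standard restore options. Patroni typically writes its own replication settings when it starts, so verify the resulting configuration against your Patroni setup.

```python
import shlex
import subprocess


def rebuild_plan(stanza: str, patroni_unit: str = "patroni", delta: bool = True) -> list[list[str]]:
    """Ordered commands to rebuild the old primary from the latest pgBackRest backup."""
    restore = ["pgbackrest", f"--stanza={stanza}"]
    if delta:
        # --delta restores only files whose checksums differ, reusing the rest.
        restore.append("--delta")
    # --type=standby makes PostgreSQL start as a standby instead of promoting.
    restore += ["--type=standby", "restore"]
    return [
        # Stop Patroni first so it cannot restart PostgreSQL mid-restore.
        ["systemctl", "stop", patroni_unit],
        restore,
    ]


def run_plan(plan: list[list[str]], dry_run: bool = True) -> list[str]:
    """Print each command; execute only when dry_run is False."""
    rendered = []
    for argv in plan:
        line = shlex.join(argv)
        print(("DRY-RUN " if dry_run else "RUN ") + line)
        rendered.append(line)
        if not dry_run:
            # check=True aborts the plan on the first non-zero exit code.
            subprocess.run(argv, check=True)
    return rendered
```

Defaulting to a dry run lets the plan be attached to the run record for review before anything touches the node.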

3. Re-attaching as a Replica

Starting Patroni on the rebuilt node.
Confirming Patroni member registration in the DCS.
Monitoring WAL catch-up and lag.
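Registration and catch-up can be watched with a polling loop over the Patroni /cluster endpoint. This is a sketch, assuming the REST API listens on Patroni's default port 8008 and that the member's lag field is reported in bytes (it can also be a non-numeric value such as "unknown"). The fetch function is injectable so the loop can be exercised without a live cluster. Hostnames, thresholds, and poll intervals are placeholders.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_cluster(api_base: str) -> dict:
    """GET /cluster from any member's Patroni REST API, e.g. http://pg-2:8008."""
    with urllib.request.urlopen(f"{api_base}/cluster", timeout=5) as resp:
        return json.load(resp)


def wait_for_replica(fetch: Callable[[], dict], name: str, max_lag_bytes: int,
                     attempts: int = 30, interval_s: float = 10.0) -> dict:
    """Poll until `name` is registered as a replica with lag <= max_lag_bytes."""
    for attempt in range(attempts):
        members = {m["name"]: m for m in fetch().get("members", [])}
        member = members.get(name)
        if member is not None and member.get("role") in ("replica", "sync_standby"):
            lag = member.get("lag")
            # Treat a non-numeric lag (e.g. "unknown") as not yet caught up.
            if isinstance(lag, int) and lag <= max_lag_bytes:
                return member
        if attempt < attempts - 1:
            time.sleep(interval_s)
    raise TimeoutError(f"{name} did not register and catch up within {attempts} polls")
```

A typical call would be wait_for_replica(lambda: fetch_cluster("http://pg-2:8008"), "pg-1", max_lag_bytes=16 * 1024 * 1024), with the threshold set to whatever your runbook agrees.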

4. Validation Before Switchback

Replication lag below threshold.
Original node listed as a healthy replica in Patroni.
pgBackRest stanza check on the rebuilt node.
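The three validation checks can be combined into one gate that returns a list of blocking problems. This sketch assumes the member dictionary comes from the Patroni /cluster response and that "running" or "streaming" mean healthy (state names vary between Patroni versions). It also assumes the stanza check is pgbackrest --stanza=NAME check, run on the rebuilt node, where a zero exit code means success. The lag threshold is a placeholder.

```python
import subprocess

REPLICA_ROLES = {"replica", "sync_standby"}
HEALTHY_STATES = {"running", "streaming"}


def run_stanza_check(stanza: str) -> int:
    """Run pgbackrest --stanza=<stanza> check on this node; return its exit code."""
    return subprocess.run(["pgbackrest", f"--stanza={stanza}", "check"]).returncode


def switchback_gate(member: dict, max_lag_bytes: int, stanza_check_rc: int) -> list[str]:
    """Return blocking problems; an empty list means the switchback may proceed."""
    problems = []
    if member.get("role") not in REPLICA_ROLES:
        problems.append(f"role is {member.get('role')!r}, expected a replica")
    if member.get("state") not in HEALTHY_STATES:
        problems.append(f"state is {member.get('state')!r}")
    lag = member.get("lag")
    if not isinstance(lag, int) or lag > max_lag_bytes:
        problems.append(f"lag {lag!r} is not within {max_lag_bytes} bytes")
    if stanza_check_rc != 0:
        problems.append(f"pgbackrest check exited {stanza_check_rc}")
    return problems
```

Returning every problem at once, rather than stopping at the first, gives the run record a complete picture of why a switchback was refused.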

5. Optional Controlled Switchback

patronictl switchover to promote the original primary back to leader.
Application connection pool drain and reconnect.
Post-switchback cluster state snapshot.
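The switchover command and the post-switchback check can be sketched as follows. The argument list assumes a Patroni 3.x-style patronictl with -c, --leader, --candidate and --force. Older releases spell the leader flag --master, so confirm with patronictl switchover --help. The config path and node names are placeholders. The verification compares two /cluster snapshots and expects the original node to lead on a newer timeline.

```python
from typing import Optional


def switchover_argv(config_path: str, current_leader: str, candidate: str) -> list[str]:
    """patronictl switchover from the interim leader back to the original primary."""
    # --force skips the interactive confirmation; drain application pools first.
    return ["patronictl", "-c", config_path, "switchover",
            "--leader", current_leader, "--candidate", candidate, "--force"]


def leader_of(snapshot: dict) -> Optional[dict]:
    """Return the single leader member of a /cluster snapshot, or None."""
    leaders = [m for m in snapshot.get("members", []) if m.get("role") == "leader"]
    return leaders[0] if len(leaders) == 1 else None


def verify_switchback(before: dict, after: dict, expected_leader: str) -> list[str]:
    """Compare pre- and post-switchback snapshots; empty list means success."""
    old, new = leader_of(before), leader_of(after)
    new_name = new.get("name") if new else None
    if new_name != expected_leader:
        return [f"leader is {new_name!r}, expected {expected_leader!r}"]
    if old is not None and new.get("timeline", 0) <= old.get("timeline", 0):
        # A completed switchover promotes the candidate onto a new timeline.
        return ["timeline did not advance"]
    return []
```

Storing both snapshots alongside the result makes the post-switchback state auditable from the run record alone.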

6. Closing the DR Drill Record

Final cluster topology run record.
Total elapsed time from failover to clean failback.
DR drill run record closure with all linked records.
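Closing the drill can be sketched as building one JSON-serialisable record that carries the final topology, the elapsed time, and the IDs of every linked run record. The field names and record shape are hypothetical, not the HybridOps run-record schema. The timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
import json
from datetime import datetime


def close_drill_record(drill_id: str, failover_at: str, failback_done_at: str,
                       linked_run_ids: list[str], final_topology: dict) -> dict:
    """Build a closing record; timestamps are ISO 8601 strings with UTC offsets."""
    start = datetime.fromisoformat(failover_at)
    end = datetime.fromisoformat(failback_done_at)
    if end < start:
        raise ValueError("failback cannot finish before the failover started")
    return {
        "drill_id": drill_id,
        "failover_at": start.isoformat(),
        "failback_completed_at": end.isoformat(),
        "elapsed_seconds": int((end - start).total_seconds()),
        # De-duplicated and sorted so the record is stable across re-exports.
        "linked_runs": sorted(set(linked_run_ids)),
        "final_topology": final_topology,
        "status": "closed",
    }


if __name__ == "__main__":
    record = close_drill_record(
        "drill-example", "2024-05-01T10:00:00+00:00", "2024-05-01T11:30:15+00:00",
        ["run-assess", "run-rebuild"], {"leader": "pg-1", "replicas": ["pg-2", "pg-3"]},
    )
    print(json.dumps(record, indent=2))
```

With the timestamps in the example, elapsed_seconds is 5415 (1 h 30 min 15 s).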

License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.