Failback PostgreSQL Cloud SQL DR to On-Prem (HyOps Blueprint)

Purpose: Gate the failback decision and then repoint the stable PostgreSQL service endpoint back to the on-prem PostgreSQL HA lane.
Owner: Platform engineering / SRE
Trigger: End of a managed-cloud DR event or failback drill
Impact: Applications are redirected back to the on-prem PostgreSQL HA endpoint.
Severity: P1
Pre-reqs: The managed cloud primary has been fenced, the on-prem PostgreSQL HA lane has already been rebuilt or reseeded, and DNS authority credentials are available.
Rollback strategy: If the manual gate is not confirmed, do nothing. If cutback is unsafe, keep service on the managed DR primary until the on-prem target is re-verified.

Context

Blueprint ref: dr/postgresql-cloudsql-failback-onprem@v1
Location: hybridops-core/blueprints/dr/postgresql-cloudsql-failback-onprem@v1/blueprint.yml

Default step flow:

  1. core/shared/manual-gate
  2. platform/network/dns-routing

Important:

  • this blueprint does not rebuild the on-prem cluster for you
  • rebuild, reseed, or reverse-sync work must already be complete before the manual gate is confirmed
  • the blueprint deliberately leaves that work manual until reverse replication automation is shipped and tested
  • the DNS cutback step reads endpoint_host from the on-prem PostgreSQL HA state because the route is published as an A record

Manual gate expectations

Confirm the manual gate only after all of the following are true:

  • managed_primary_fenced=true
  • onprem_target_rebuilt=true
  • onprem_primary_writable=true
  • failback_approved=true
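
The four conditions above can be checked mechanically before anyone confirms the gate. The following is a minimal sketch, assuming the operator passes the flags as key=value arguments. The function name check_failback_gate is hypothetical and is not part of the hyops CLI.

```shell
# Hypothetical helper, not part of hyops.
# Returns 0 only when every required gate flag is passed as key=true.
check_failback_gate() {
  local required="managed_primary_fenced onprem_target_rebuilt onprem_primary_writable failback_approved"
  local key arg found
  for key in $required; do
    found=0
    for arg in "$@"; do
      if [ "$arg" = "${key}=true" ]; then
        found=1
        break
      fi
    done
    if [ "$found" -ne 1 ]; then
      # A missing flag is treated the same as a false one.
      echo "gate blocked: ${key} is not true" >&2
      return 1
    fi
  done
  echo "gate ok"
}
```

Treating a missing flag as false keeps the check conservative: the gate stays closed unless all four conditions are stated explicitly.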

Validate and execute

hyops blueprint validate --ref dr/postgresql-cloudsql-failback-onprem@v1
hyops blueprint preflight --env dev --ref dr/postgresql-cloudsql-failback-onprem@v1
hyops blueprint deploy --env dev --ref dr/postgresql-cloudsql-failback-onprem@v1 --execute
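
The three commands are meant to run in order, and a failed validate or preflight should stop the deploy. A minimal wrapper sketch follows, assuming hyops is on PATH and exits non-zero on failure (this page does not confirm that behavior). The name run_failback and its single argument are hypothetical.

```shell
# Hypothetical wrapper: runs validate, preflight, and deploy in order
# and stops at the first command that fails.
# Assumes hyops returns a non-zero exit status on failure.
run_failback() {
  local env_name="$1"
  local ref="dr/postgresql-cloudsql-failback-onprem@v1"

  hyops blueprint validate --ref "$ref" || return 1
  hyops blueprint preflight --env "$env_name" --ref "$ref" || return 1
  hyops blueprint deploy --env "$env_name" --ref "$ref" --execute
}
```

Calling run_failback dev reproduces the sequence shown above for the dev environment.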

Verify

Confirm:

  • manual gate state is cap.control.manual_gate = confirmed
  • DNS routing state is cap.network.dns_routing = ready
  • the published record now targets the on-prem PostgreSQL HA endpoint contract
  • application writes land only on the restored on-prem primary
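
The published-record check can be partly scripted by comparing what the record resolves to against the expected on-prem endpoint address. This is a minimal sketch, assuming dig is installed. The helper names answers_match and check_record are hypothetical, and the record name and address in the usage line are placeholders.

```shell
# Hypothetical helpers, not part of hyops.

# Succeeds when the resolved answer ($2) is non-empty and is exactly
# the expected address ($1). Multiple or different answers fail.
answers_match() {
  local expected="$1"
  local answers="$2"
  [ -n "$answers" ] && [ "$answers" = "$expected" ]
}

# Looks up the A record for name ($1) with dig and compares it with
# the expected address ($2).
check_record() {
  local name="$1"
  local expected="$2"
  answers_match "$expected" "$(dig +short "$name" A)"
}
```

For example, check_record db.example.internal 10.0.0.5 succeeds only when the record returns that single address. Resolvers may keep serving the previous answer until the record's TTL expires, so a mismatch right after cutback does not by itself mean the DNS step failed.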