HOWTO: Configure the DR Decision Service
Purpose: Configure the HybridOps DR decision service to evaluate probe signals and produce structured failover recommendations.
Difficulty: Advanced
Track: Disaster Recovery Automation
Overview
Automated DR depends on the quality of the decision that triggers it. The HybridOps DR decision service replaces ad hoc alerting with a structured evaluation: defined signal sources, weighted thresholds, and a reasoned decision record that a runner can consume or an operator can review. This HOWTO covers the full configuration.
1. Decision Service Architecture
- Signal sources: Prometheus federation, probe records, health endpoints.
- Evaluation engine: threshold rules and confidence scoring.
- Output: structured decision record with supporting signal context.
2. Configuring Signal Sources
- Prometheus federation endpoint configuration.
- Probe result feed integration.
- Health check endpoint polling interval.
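The three source settings above can be pictured as one small configuration object. This is a minimal sketch in Python, not the service's actual configuration format, which this HOWTO does not show. The class name, field names, and the default polling interval are all hypothetical. The one grounded detail is that Prometheus serves federation at `/federate` and selects series through repeated `match[]` query parameters.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class SignalSources:
    """Hypothetical container for the three signal source settings."""

    federation_base_url: str        # Prometheus server to federate from
    federation_matchers: list[str]  # series selectors sent as match[] params
    probe_feed_path: str            # location of the probe result feed (assumed)
    health_endpoints: list[str] = field(default_factory=list)
    health_poll_interval_s: int = 15  # assumed default, not a documented value

    def federation_url(self) -> str:
        # Prometheus federation: GET /federate with one match[] per selector.
        query = urlencode([("match[]", m) for m in self.federation_matchers])
        return f"{self.federation_base_url.rstrip('/')}/federate?{query}"

    def validate(self) -> None:
        # Reject settings that would make the source silently useless.
        if not self.federation_matchers:
            raise ValueError("at least one match[] selector is required")
        if self.health_poll_interval_s <= 0:
            raise ValueError("health poll interval must be positive")


# Example values only; the host, selector, and paths are placeholders.
sources = SignalSources(
    federation_base_url="http://prom.example:9090",
    federation_matchers=['{job="pg-probe"}'],
    probe_feed_path="probe-feed.jsonl",
    health_endpoints=["http://db-primary.example:8008/health"],
)
sources.validate()
print(sources.federation_url())
```

Validating at load time keeps a misconfigured source from being mistaken for a healthy signal with no data.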
3. Authoring Evaluation Rules
- Rule schema: signal, threshold, weight, window.
- Combining multiple signals into a confidence score.
- Hysteresis configuration to prevent flapping.
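One way to read the rule schema, the confidence score, and hysteresis together is sketched below. It assumes the score is the weighted fraction of breached rules and that hysteresis is a two-threshold latch; the service's real scoring formula is not documented here, so treat both as illustrative choices. The signal names and numbers are hypothetical, and the `window_s` field is carried on the rule but not applied in this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    signal: str       # name of the signal being evaluated
    threshold: float  # value at or above which the rule counts as breached
    weight: float     # relative contribution to the confidence score
    window_s: int     # evaluation window in seconds (not applied in this sketch)


def confidence(rules: list[Rule], observed: dict[str, float]) -> float:
    """Return the weighted fraction of breached rules, between 0.0 and 1.0."""
    total = sum(r.weight for r in rules)
    if total <= 0:
        raise ValueError("rule weights must sum to a positive value")
    breached = sum(
        r.weight for r in rules
        if observed.get(r.signal, 0.0) >= r.threshold  # missing signal = no breach
    )
    return breached / total


class Hysteresis:
    """Two-threshold latch: turn on at enter_at, turn off only below exit_at."""

    def __init__(self, enter_at: float, exit_at: float) -> None:
        if exit_at >= enter_at:
            raise ValueError("exit_at must be lower than enter_at")
        self.enter_at = enter_at
        self.exit_at = exit_at
        self.active = False

    def update(self, score: float) -> bool:
        if not self.active and score >= self.enter_at:
            self.active = True
        elif self.active and score < self.exit_at:
            self.active = False
        return self.active


# Hypothetical rules and observations.
rules = [
    Rule("replication_lag_s", threshold=30.0, weight=2.0, window_s=300),
    Rule("probe_failure_ratio", threshold=0.5, weight=1.0, window_s=120),
    Rule("health_check_failures", threshold=3.0, weight=1.0, window_s=60),
]
observed = {"replication_lag_s": 45.0, "probe_failure_ratio": 0.2,
            "health_check_failures": 4.0}
score = confidence(rules, observed)  # (2.0 + 1.0) / 4.0

latch = Hysteresis(enter_at=0.7, exit_at=0.4)
states = [latch.update(s) for s in (0.5, score, 0.6, 0.45, 0.3, 0.5)]
print(score, states)
```

The gap between the two thresholds is what prevents flapping: a score that hovers around a single cutoff would toggle the recommendation on every evaluation, whereas here it must fall well below the entry point before the state clears.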
4. Recommendation Output Format
- Recommendation levels: monitor, recommend-failover, trigger-failover.
- Decision record written under <runtime-root>/state/records/.
- Service state and log path under <runtime-root>/state/ and <runtime-root>/logs/.
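A decision record of this kind might be modeled and written as follows. The three level names and the `state/records/` location come from the list above; everything else is an assumption made for illustration, including the field names, the JSON encoding, and the `decision-<id>.json` file name.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path

LEVELS = ("monitor", "recommend-failover", "trigger-failover")


@dataclass
class DecisionRecord:
    level: str                 # one of LEVELS
    confidence: float          # score that produced the level
    signals: dict[str, float]  # supporting signal context
    # Hypothetical identifier and timestamp fields.
    record_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    decided_at: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def __post_init__(self) -> None:
        if self.level not in LEVELS:
            raise ValueError(f"unknown recommendation level: {self.level!r}")


def write_record(runtime_root: Path, record: DecisionRecord) -> Path:
    """Write the record as JSON under <runtime-root>/state/records/."""
    records_dir = runtime_root / "state" / "records"
    records_dir.mkdir(parents=True, exist_ok=True)
    path = records_dir / f"decision-{record.record_id}.json"  # assumed file name
    path.write_text(json.dumps(asdict(record), indent=2), encoding="utf-8")
    return path
```

Rejecting unknown levels at construction time means a consumer never has to decide what to do with a record whose level it does not recognize.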
5. Integration with the Failover Module
- Decision dispatcher consumes the emitted trigger-failover decision record and writes an approval-gated dispatch request.
- Runner or a later consumer reads the approved dispatch request.
- Manual gate insertion on recommend-failover level.
- Decision record becomes part of the failover run record.
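The handoff described above could look roughly like this. It is a sketch under several assumptions: the dispatch request is a JSON file with an `approved` flag, a recommend-failover record stops at a manual gate without producing a request, and the decision record carries a `record_id` field. None of these details are confirmed by this HOWTO, and the function names are hypothetical.

```python
import json
from pathlib import Path
from typing import Optional


def dispatch(record: dict, dispatch_dir: Path) -> tuple[str, Optional[Path]]:
    """Turn a decision record into an outcome and, if warranted, a request file."""
    level = record["level"]
    if level == "recommend-failover":
        return "manual-gate", None  # assumed: operator review, no request written
    if level != "trigger-failover":
        return "none", None         # monitor: nothing to dispatch
    dispatch_dir.mkdir(parents=True, exist_ok=True)
    path = dispatch_dir / f"dispatch-{record['record_id']}.json"
    # The request starts unapproved and embeds the decision record, so the
    # record can later be copied into the failover run record.
    request = {"approved": False, "approved_by": None, "decision": record}
    path.write_text(json.dumps(request, indent=2), encoding="utf-8")
    return "dispatch-requested", path


def approve(path: Path, approver: str) -> None:
    """Mark a pending dispatch request as approved by a named operator."""
    request = json.loads(path.read_text(encoding="utf-8"))
    request["approved"] = True
    request["approved_by"] = approver
    path.write_text(json.dumps(request, indent=2), encoding="utf-8")


def read_approved(path: Path) -> Optional[dict]:
    """Runner side: return the request only once it has been approved."""
    request = json.loads(path.read_text(encoding="utf-8"))
    return request if request.get("approved") else None
```

Keeping the approval flag inside the request file means the runner needs no knowledge of how approval was granted; it only checks the state of the file it is handed.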
References
- ADR-0402 – Prometheus Federation DR Signal Plane
- HOWTO: Execute a PostgreSQL Failover
- HOWTO: Interpret Probe Output
License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.