
HOWTO: Configure the DR Decision Service

Purpose: Configure the HybridOps DR decision service to evaluate probe signals and produce structured failover recommendations.

Difficulty: Advanced

Track: Disaster Recovery Automation


Overview

Automated DR depends on the quality of the decision that triggers it. The HybridOps DR decision service replaces ad hoc alerting with a structured evaluation: defined signal sources, weighted thresholds, and a reasoned decision record that a runner can consume or an operator can review. This HOWTO covers the full configuration.


1. Decision Service Architecture

  • Signal sources: Prometheus federation, probe records, health endpoints.
  • Evaluation engine: threshold rules and confidence scoring.
  • Output: structured decision record with supporting signal context.
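The three stages can be pictured as data flowing from signals to a decision record. The following is a minimal sketch of those shapes, assuming Python dataclasses; the type and field names (`Signal`, `DecisionRecord`, `observed_at`, `reasons`) are hypothetical and are not the HybridOps schema.

```python
from dataclasses import dataclass, field, asdict
from typing import Literal

# Hypothetical types illustrating the three stages; not the HybridOps API.
SourceKind = Literal["prometheus", "probe", "health"]
Level = Literal["monitor", "recommend-failover", "trigger-failover"]


@dataclass(frozen=True)
class Signal:
    """One observation collected from a signal source."""
    source: SourceKind
    name: str            # e.g. "http_ok" (illustrative name)
    value: float
    observed_at: float   # Unix timestamp in seconds


@dataclass
class DecisionRecord:
    """Output of the evaluation engine, carrying its supporting signals."""
    level: Level
    confidence: float                           # 0.0 .. 1.0
    signals: list[Signal] = field(default_factory=list)
    reasons: list[str] = field(default_factory=list)

    def to_dict(self) -> dict:
        # asdict() recurses into the nested Signal dataclasses in the list.
        return asdict(self)
```

Keeping the signals inside the record means a reviewer sees the evidence alongside the decision rather than having to reconstruct it from dashboards.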

2. Configuring Signal Sources

  • Prometheus federation endpoint configuration.
  • Probe result feed integration.
  • Health check endpoint polling interval.
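A signal-source configuration can be sketched as a single validated object. This sketch assumes Python; the class and field names, and the 15-second polling default, are illustrative only. The one external detail it relies on is Prometheus's `/federate` endpoint, which selects series through one or more repeated `match[]` query parameters.

```python
from dataclasses import dataclass, field
from urllib.parse import urlencode


@dataclass
class SignalSourceConfig:
    """Hypothetical signal-source settings; names are illustrative."""
    prometheus_url: str                  # base URL of the Prometheus server
    probe_feed_path: str                 # where probe results are read from (assumed)
    federation_match: list[str] = field(default_factory=list)  # match[] selectors
    health_endpoints: list[str] = field(default_factory=list)
    health_poll_interval_s: float = 15.0  # illustrative default, not a HybridOps value

    def validate(self) -> None:
        if not self.federation_match:
            raise ValueError("at least one match[] selector is required")
        if self.health_poll_interval_s <= 0:
            raise ValueError("health_poll_interval_s must be positive")

    def federation_url(self) -> str:
        # /federate accepts repeated match[] parameters, one per selector.
        query = urlencode([("match[]", m) for m in self.federation_match])
        return f"{self.prometheus_url.rstrip('/')}/federate?{query}"
```

Validating at load time means a misconfigured source fails loudly at startup instead of silently contributing no signal during an incident.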

3. Authoring Evaluation Rules

  • Rule schema: signal, threshold, weight, window.
  • Combining multiple signals into a confidence score.
  • Hysteresis configuration to prevent flapping.
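The rule schema, confidence scoring, and hysteresis can be sketched together. This is a minimal illustration under stated assumptions, not the HybridOps engine: a rule fires only if every sample inside its window breaches the threshold (a sustained breach); confidence is the weighted fraction of firing rules; and hysteresis uses separate enter and exit thresholds. `Rule`, `rule_fires`, `confidence`, and `Hysteresis` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Literal


@dataclass(frozen=True)
class Rule:
    """Hypothetical rule: signal, threshold, weight, window."""
    signal: str
    threshold: float
    weight: float
    window_s: float
    op: Literal["lt", "gt"] = "lt"   # breach when value < threshold (or > for "gt")


def rule_fires(rule: Rule, samples: list[tuple[float, float]], now: float) -> bool:
    """True if every (timestamp, value) sample in the window breaches.

    An empty window never fires, so a missing signal is not treated as
    evidence for failover.
    """
    in_window = [v for t, v in samples if now - rule.window_s <= t <= now]
    if not in_window:
        return False
    if rule.op == "lt":
        return all(v < rule.threshold for v in in_window)
    return all(v > rule.threshold for v in in_window)


def confidence(rules: list[Rule], series: dict[str, list[tuple[float, float]]],
               now: float) -> float:
    """Weighted fraction of rules that fire, in [0, 1]."""
    total = sum(r.weight for r in rules)
    if total <= 0:
        return 0.0
    fired = sum(r.weight for r in rules
                if rule_fires(r, series.get(r.signal, []), now))
    return fired / total


class Hysteresis:
    """Enter the degraded state at enter_at; leave only below exit_at."""

    def __init__(self, enter_at: float, exit_at: float) -> None:
        if not exit_at < enter_at:
            raise ValueError("exit_at must be below enter_at")
        self.enter_at, self.exit_at = enter_at, exit_at
        self.degraded = False

    def update(self, score: float) -> bool:
        if self.degraded:
            if score < self.exit_at:
                self.degraded = False
        elif score >= self.enter_at:
            self.degraded = True
        return self.degraded
```

With enter at 0.6 and exit at 0.4, a score oscillating around 0.5 holds whatever state it is already in instead of toggling on every evaluation, which is the flapping the hysteresis setting is meant to prevent.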

4. Recommendation Output Format

  • Recommendation levels: monitor, recommend-failover, trigger-failover.
  • Decision record written under <runtime-root>/state/records/.
  • Service state under <runtime-root>/state/ and logs under <runtime-root>/logs/.
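Mapping a confidence score to a recommendation level and writing the record under `<runtime-root>/state/records/` can be sketched as follows. The cut-offs (0.5 and 0.8), the `<decision_id>.json` file name, and the record fields are assumptions for illustration. The write goes to a temporary file in the same directory and is then renamed with `os.replace`, so a consumer never reads a partially written record.

```python
import json
import os
import tempfile
from pathlib import Path


def level_for(confidence: float, recommend_at: float = 0.5,
              trigger_at: float = 0.8) -> str:
    # Cut-off values are illustrative, not HybridOps defaults.
    if confidence >= trigger_at:
        return "trigger-failover"
    if confidence >= recommend_at:
        return "recommend-failover"
    return "monitor"


def write_decision_record(runtime_root: Path, decision_id: str,
                          confidence: float, signals: list[dict]) -> Path:
    """Write a decision record atomically; file naming is hypothetical."""
    records = runtime_root / "state" / "records"
    records.mkdir(parents=True, exist_ok=True)
    record = {
        "decision_id": decision_id,
        "level": level_for(confidence),
        "confidence": round(confidence, 3),
        "signals": signals,
    }
    target = records / f"{decision_id}.json"
    fd, tmp = tempfile.mkstemp(dir=str(records), suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as fh:
            json.dump(record, fh, indent=2, sort_keys=True)
        os.replace(tmp, target)  # atomic rename on the same filesystem
    except BaseException:
        # Don't leave a stray temp file behind if serialisation fails.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
    return target
```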

5. Integration with the Failover Module

  • Decision dispatcher consumes the emitted trigger-failover decision record and writes an approval-gated dispatch request.
  • The runner, or a later consumer, reads the approved dispatch request.
  • Manual gate inserted at the recommend-failover level.
  • Decision record becomes part of the failover run record.
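The hand-off from decision record to failover run can be sketched as a small router plus an approval check. This is a minimal illustration under stated assumptions: `DispatchRequest`, `ManualGate`, `route`, `approve`, and `start_run` are hypothetical names, the status strings are illustrative, and everything is held in memory rather than written to disk. A trigger-failover record becomes an approval-gated dispatch request, a recommend-failover record raises a manual gate for an operator, and monitor produces nothing.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class DispatchRequest:
    """Approval-gated request from the dispatcher (hypothetical shape)."""
    decision_id: str
    decision_record: dict
    status: str = "pending-approval"   # becomes "approved" once signed off
    approved_by: Optional[str] = None


@dataclass
class ManualGate:
    """Operator review item raised for recommend-failover (hypothetical)."""
    decision_id: str
    decision_record: dict


def route(record: dict) -> Union[DispatchRequest, ManualGate, None]:
    level = record["level"]
    if level == "trigger-failover":
        return DispatchRequest(record["decision_id"], record)
    if level == "recommend-failover":
        return ManualGate(record["decision_id"], record)
    return None  # "monitor": nothing to hand off


def approve(req: DispatchRequest, approver: str) -> None:
    req.status = "approved"
    req.approved_by = approver


def start_run(req: DispatchRequest) -> dict:
    """Refuse unapproved requests, then seed the run record with the decision."""
    if req.status != "approved":
        raise PermissionError(f"dispatch request {req.decision_id} is not approved")
    return {
        "decision_id": req.decision_id,
        "approved_by": req.approved_by,
        "decision": req.decision_record,  # the decision record travels with the run
    }
```

Because the runner checks approval itself rather than trusting the dispatcher, an unapproved request cannot start a failover even if it is picked up early.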

License: MIT-0 for code, CC-BY-4.0 for documentation unless otherwise stated.