Disaster Recovery
Observo supports two DR models for Site deployments: "Pilot Light" and "Warm Standby". Each offers a different trade-off between recovery point objective (RPO), recovery time objective (RTO), and cost, so customers can choose the model that fits their deployment.

Pilot Light
Site metadata is kept in sync and ready, which minimizes RPO
Resources such as compute are shut off to reduce cost
This increases RTO, because compute instances for the clusters must be initialized when a failover is triggered
Recommended when a longer RTO is acceptable and cost optimization is more important
Typical RPO: < 15 minutes
Typical RTO: 1-2 hours
Warm Standby
Site metadata is kept in sync and ready, which minimizes RPO
To reduce RTO, a fraction of the compute instances is kept initialized
This option costs more than "Pilot Light"
Recommended for critical workloads
Typical RPO: < 5 minutes
Typical RTO: 10-15 minutes
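As a rough illustration of how the typical figures above can drive the choice between the two models, the following sketch picks the lower-cost model whose typical worst case still meets a given target. It is not part of Observo: the function name, the data structure, and the use of the upper end of each typical range as a worst case are assumptions made for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrModel:
    name: str
    worst_rpo_minutes: int  # upper end of the typical RPO quoted above
    worst_rto_minutes: int  # upper end of the typical RTO quoted above


# Ordered cheapest first, so the first match is the lowest-cost fit.
MODELS = [
    DrModel("Pilot Light", worst_rpo_minutes=15, worst_rto_minutes=120),
    DrModel("Warm Standby", worst_rpo_minutes=5, worst_rto_minutes=15),
]


def choose_dr_model(target_rpo_minutes: int, target_rto_minutes: int) -> Optional[str]:
    """Return the cheapest model whose typical worst case meets both targets."""
    for model in MODELS:
        if (model.worst_rpo_minutes <= target_rpo_minutes
                and model.worst_rto_minutes <= target_rto_minutes):
            return model.name
    return None  # neither typical profile meets the targets


print(choose_dr_model(target_rpo_minutes=30, target_rto_minutes=240))  # Pilot Light
print(choose_dr_model(target_rpo_minutes=10, target_rto_minutes=30))   # Warm Standby
```

A result of `None` means the targets are tighter than either typical profile, in which case the requirements should be discussed before choosing a model.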
Implementation Details
Standby Site Configuration
In the standby site (as shown in the diagram), the dataplane is scaled to 0 to reduce cost while maintaining readiness. The standby site regularly pulls all configuration data from the manager, just as the active site does, so that it can become operational quickly when needed. When a failover is triggered, the dataplane is scaled up and the site starts operating with full functionality.
Scaling Commands
Scale Dataplane to Zero (For Standby Site)
Scale Up Dataplane (During Failover)
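A minimal sketch of both scaling steps, assuming the dataplane runs as a Kubernetes Deployment managed with `kubectl`. The namespace `observo`, the deployment name `dataplane`, and the replica count used for failover are hypothetical placeholders; substitute the values from your own Site deployment. The helper only builds the command line and does not run it.

```python
import shlex
from typing import List

# Hypothetical names: replace with those used in your Site deployment.
NAMESPACE = "observo"
DATAPLANE_DEPLOYMENT = "dataplane"


def scale_dataplane_cmd(replicas: int,
                        namespace: str = NAMESPACE,
                        deployment: str = DATAPLANE_DEPLOYMENT) -> List[str]:
    """Build a `kubectl scale` command that sets the dataplane replica count."""
    if replicas < 0:
        raise ValueError("replicas must be zero or greater")
    return ["kubectl", "scale", f"deployment/{deployment}",
            f"--replicas={replicas}", "--namespace", namespace]


# Standby site: scale the dataplane to zero.
print(shlex.join(scale_dataplane_cmd(0)))
# Failover: scale the dataplane back up (3 is a placeholder count).
print(shlex.join(scale_dataplane_cmd(3)))
```

To execute a command rather than print it, pass the returned list to `subprocess.run(cmd, check=True)` on a machine whose `kubectl` context points at the intended site.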
HPA Management for Dataplane
Disable HPA (For Standby Site)
Enable HPA (During Failover)
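A HorizontalPodAutoscaler has no on/off switch, so one common way to handle these two steps is to delete the HPA on the standby site and re-apply its manifest during failover. The sketch below assumes that approach; it is not necessarily the procedure Observo uses. The namespace `observo`, the HPA name `dataplane`, and the manifest file `dataplane-hpa.yaml` are hypothetical placeholders. As with the scaling helper, the functions only build the command lines.

```python
import shlex
from typing import List

# Hypothetical names: replace with those used in your Site deployment.
NAMESPACE = "observo"
HPA_NAME = "dataplane"
HPA_MANIFEST = "dataplane-hpa.yaml"


def disable_hpa_cmd(hpa_name: str = HPA_NAME,
                    namespace: str = NAMESPACE) -> List[str]:
    """Build a command that removes the HPA so it cannot scale the dataplane."""
    # --ignore-not-found keeps the command safe to repeat on a standby site.
    return ["kubectl", "delete", "hpa", hpa_name,
            "--namespace", namespace, "--ignore-not-found"]


def enable_hpa_cmd(manifest: str = HPA_MANIFEST,
                   namespace: str = NAMESPACE) -> List[str]:
    """Build a command that recreates the HPA from its saved manifest."""
    return ["kubectl", "apply", "--filename", manifest,
            "--namespace", namespace]


# Standby site: remove the HPA after scaling the dataplane to zero.
print(shlex.join(disable_hpa_cmd()))
# Failover: restore the HPA once the dataplane has been scaled up.
print(shlex.join(enable_hpa_cmd()))
```

Keeping the HPA manifest under version control alongside the other site configuration ensures that the same autoscaling limits are restored at failover.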
Key Features
Automated failover capabilities
Regular health checks and monitoring
Cross-region data replication
Automated backup and restore procedures
Configurable sync intervals for metadata
Flexible compute scaling options
Considerations
Network latency between primary and DR sites
Data consistency requirements
Cost implications of chosen strategy
Compliance and regulatory requirements
Testing and validation procedures
Regular testing of failover procedures to ensure recovery works as expected

