Disk-Based Buffering
Overview
Observo Pipeline provides disk-based buffering capabilities to handle backpressure events and ensure data reliability during destination outages. When a destination becomes unavailable, the pipeline can temporarily store data on disk and automatically resume forwarding once the destination is available again.
How It Works
When a backpressure event occurs (such as a destination outage or slowdown), the pipeline responds as follows (a minimal sketch of the flow appears after this list):
Data Storage: Observo Pipeline engages a backpressure queue that writes incoming data to disk
Temporary Buffering: Data accumulates in the disk buffer during the outage period
Automatic Recovery: Once the destination becomes available, the pipeline automatically forwards buffered data from disk
Continuous Operation: The pipeline maintains data flow and prevents data loss during temporary disruptions
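The following is a minimal sketch of this flow, not Observo's actual implementation: it assumes a simple append-only JSON-lines file as the on-disk queue, and the send_to_destination stub and file name are hypothetical stand-ins used to simulate an outage and recovery.

```python
import json
import os

BUFFER_PATH = "backpressure_buffer.jsonl"   # hypothetical on-disk queue file
destination_available = False               # simulated destination outage


def send_to_destination(event: dict) -> bool:
    """Stand-in for the real destination client; fails while the outage lasts."""
    if not destination_available:
        return False
    print("delivered:", event)
    return True


def handle_event(event: dict) -> None:
    """Deliver directly, or spill the event to disk when the destination is down."""
    if not send_to_destination(event):
        with open(BUFFER_PATH, "a", encoding="utf-8") as f:
            f.write(json.dumps(event) + "\n")   # persisted, survives restarts


def flush_buffer() -> None:
    """Replay buffered events once the destination recovers."""
    if not os.path.exists(BUFFER_PATH):
        return
    with open(BUFFER_PATH, encoding="utf-8") as f:
        pending = [json.loads(line) for line in f if line.strip()]
    remaining = [e for e in pending if not send_to_destination(e)]
    if remaining:
        with open(BUFFER_PATH, "w", encoding="utf-8") as f:
            f.writelines(json.dumps(e) + "\n" for e in remaining)
    else:
        os.remove(BUFFER_PATH)


handle_event({"msg": "event during outage"})    # spilled to the disk buffer
destination_available = True                    # destination comes back online
flush_buffer()                                  # buffered data is forwarded
```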
Getting Started
Navigate to Destinations
Select the destination where you want to enable disk-based buffering
Go to Buffering Configuration
Fill in the required fields as shown below:
Buffering Configuration
Buffer Type: Select where buffered data will be stored during backpressure events. This configuration supports two storage mechanisms:
Memory: Utilizes RAM for faster buffer operations but offers no persistence guarantees. Data is lost if the pipeline restarts. Suitable for scenarios prioritizing speed over durability.
Disk: Persists data to disk storage, providing durability across pipeline restarts and ensuring data survives system failures. Recommended for production environments where data retention is critical.
Note: This document covers the configuration details for disk-based buffering.
Max Bytes Size: The maximum number of bytes allowed in the buffer.
The minimum allowed value is 268435488 bytes (approximately 256 MB).
This setting determines the total capacity of the disk buffer. Once this threshold is reached, the "When Full" policy determines how the pipeline handles additional incoming data.
Example: a value of 1073741824 configures a 1 GB buffer; once roughly 1 GB of buffered data has accumulated on disk, the "When Full" policy below takes effect.
When Full: Determines how the pipeline handles incoming events when the buffer reaches its maximum capacity.
You have two options (a short sketch contrasting them follows the descriptions below):
1. Block (Default)
Behavior: Stops accepting new data when the buffer is full
Impact: Creates upstream backpressure, causing data sources to reduce their event transmission rate until buffer space becomes available
Use Case: When data integrity is critical and every event must be preserved
Best For: Mission-critical logs, compliance data, financial transactions
2. Drop Newest
Behavior: Discards new incoming data when buffer capacity is exhausted
Impact: Maintains consistent pipeline throughput by sacrificing newer events
Use Case: When maintaining pipeline throughput is more important than preserving every data point
Best For: High-volume metrics, non-critical telemetry, sampling scenarios
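The snippet below is an illustrative sketch, not Observo's implementation: a small bounded in-memory class stands in for the buffer to show how the two policies respond when capacity is exhausted. The class name and policy labels are hypothetical.

```python
from collections import deque


class BoundedBuffer:
    """Toy stand-in for the destination buffer, illustrating the two policies."""

    def __init__(self, capacity: int, when_full: str = "block"):
        self.events = deque()
        self.capacity = capacity
        self.when_full = when_full   # "block" or "drop_newest" (hypothetical labels)

    def offer(self, event) -> str:
        if len(self.events) < self.capacity:
            self.events.append(event)
            return "buffered"
        if self.when_full == "block":
            # Block: the source is signaled to slow down until space frees up,
            # so no data is lost but upstream throughput drops temporarily.
            return "blocked (source must wait and retry)"
        # Drop Newest: the incoming event is discarded, preserving throughput
        # at the cost of losing the newest data.
        return "dropped"


buf = BoundedBuffer(capacity=2, when_full="drop_newest")
print([buf.offer(e) for e in ("a", "b", "c")])   # ['buffered', 'buffered', 'dropped']
```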
Configuration Example
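As an illustrative example, the settings below describe one possible disk-buffering configuration. The field names mirror the UI labels above; the dictionary form is only a convenient notation for this sketch, not Observo's actual configuration format.

```python
# Illustrative only: field names mirror the UI labels above; values are examples.
disk_buffer_config = {
    "buffer_type": "disk",          # persist buffered data to disk for durability
    "max_bytes_size": 1073741824,   # 1 GB capacity (minimum allowed is 268435488)
    "when_full": "block",           # apply backpressure instead of dropping data
}
```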
Best Practices
Sizing Your Buffer
Assess Data Volume: Calculate your average data ingestion rate (bytes/second)
Estimate Outage Duration: Consider typical outage windows for your destinations (see the sizing sketch below)
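A rough capacity estimate multiplies these two figures and adds headroom for spikes. The calculation below is an illustrative sketch with hypothetical numbers, not a prescribed formula.

```python
# Illustrative sizing calculation; the rate, outage window, and headroom are examples.
ingestion_rate_bytes_per_sec = 5 * 1024 * 1024   # average ingest of 5 MiB/s
expected_outage_seconds = 30 * 60                # plan for a 30-minute outage
headroom = 1.5                                   # safety margin for traffic spikes

max_bytes_size = int(ingestion_rate_bytes_per_sec * expected_outage_seconds * headroom)
print(max_bytes_size)   # 14155776000 bytes, roughly 13.2 GiB
```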
Choosing the Right "When Full" Policy
Choose Block when:
Data loss is unacceptable
Compliance or audit requirements mandate complete data retention
You can tolerate temporary pipeline pauses
Downstream systems can handle delayed data delivery
Choose Drop Newest when:
Real-time data flow is the highest priority
Your use case can tolerate some data loss during peak load periods
Preventing upstream source blockage is critical
You're working with high-volume, less critical data streams
Troubleshooting
Buffer Frequently Full
Symptoms: The buffer reaches capacity often, causing incoming data to be blocked or dropped
Solutions:
Increase Max Bytes Size
Optimize destination performance
Review data volume patterns for spikes
Disk Space Issues
Symptoms: Disk space exhaustion, buffer write failures
Solutions:
Ensure sufficient disk space (3-4x the buffer size is recommended; see the check below)
Configure appropriate buffer size for available disk resources
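One way to sanity-check the 3-4x recommendation is to compare the configured buffer size against the free space on the volume that holds the buffer. The sketch below is illustrative; the mount point and buffer size are hypothetical values.

```python
import shutil

buffer_max_bytes = 1073741824          # example: a 1 GB buffer
buffer_volume = "/"                    # hypothetical mount point holding the buffer

free_bytes = shutil.disk_usage(buffer_volume).free
if free_bytes < 3 * buffer_max_bytes:  # 3-4x the buffer size is recommended
    print("Warning: less than 3x the buffer size is free on", buffer_volume)
else:
    print("Sufficient disk headroom for the configured buffer")
```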