Disk-Based Buffering

Overview

Observo Pipeline provides disk-based buffering capabilities to handle backpressure events and ensure data reliability during destination outages. When a destination becomes unavailable, the pipeline can temporarily store data on disk and automatically resume forwarding once the destination is available again.

How It Works

When a backpressure event occurs (such as a destination outage or slowdown):

  1. Data Storage: Observo Pipeline engages a backpressure queue that writes incoming data to disk

  2. Temporary Buffering: Data accumulates in the disk buffer during the outage period

  3. Automatic Recovery: Once the destination becomes available, the pipeline automatically forwards buffered data from disk

  4. Continuous Operation: The pipeline maintains data flow and prevents data loss during temporary disruptions
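The four steps above can be illustrated with a small toy model. This is a minimal Python sketch, not Observo Pipeline's implementation: the `DiskBuffer` and `Pipeline` classes, the newline-delimited file format, and the `buffer.log` file name are all hypothetical. While the destination is down, the sketch appends events to a local file. On recovery it replays them in arrival order. Because the file is opened in append mode, a new `DiskBuffer` on the same path still sees the backlog, which mimics data surviving a restart.

```python
import os
import tempfile


class DiskBuffer:
    """Hypothetical disk-backed FIFO buffer (illustration only).

    Assumes one event per line, so events must not contain newlines.
    """

    def __init__(self, path: str) -> None:
        self.path = path
        # "ab" creates the file if missing but keeps existing contents,
        # so events buffered before a restart are still there afterwards.
        open(self.path, "ab").close()

    def write(self, event: bytes) -> None:
        # Step 1: append the event to disk and fsync so it is durable.
        with open(self.path, "ab") as f:
            f.write(event + b"\n")
            f.flush()
            os.fsync(f.fileno())

    def drain(self, send) -> int:
        # Step 3: replay buffered events in arrival order, then empty the file.
        count = 0
        with open(self.path, "rb") as f:
            for line in f:
                send(line.rstrip(b"\n"))
                count += 1
        open(self.path, "wb").close()
        return count


class Pipeline:
    """Toy pipeline that buffers to disk while its destination is down."""

    def __init__(self, buffer: DiskBuffer) -> None:
        self.buffer = buffer
        self.destination_up = True
        self.delivered = []

    def ingest(self, event: bytes) -> None:
        if self.destination_up:
            self.delivered.append(event)
        else:
            # Step 2: events accumulate on disk during the outage.
            self.buffer.write(event)

    def recover(self) -> None:
        # Steps 3 and 4: the destination is back, so flush the backlog first.
        self.destination_up = True
        self.buffer.drain(self.delivered.append)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "buffer.log")
    pipeline = Pipeline(DiskBuffer(path))
    pipeline.ingest(b"event-1")
    pipeline.destination_up = False  # simulated outage
    pipeline.ingest(b"event-2")
    pipeline.ingest(b"event-3")
    # Simulated restart: a new DiskBuffer on the same file sees the backlog.
    pipeline.buffer = DiskBuffer(path)
    pipeline.recover()
    print(pipeline.delivered)  # [b'event-1', b'event-2', b'event-3']
```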

Getting Started

  1. Navigate to Destinations

  2. Select a destination where you want to enable disk-based buffering

  3. Go to Buffering Configuration

  4. Fill in the required fields as shown below:

Buffering Configuration

  • Buffer Type: Select where buffered data will be stored during backpressure events. This configuration supports two storage mechanisms:

    • Memory: Utilizes RAM for faster buffer operations but offers no persistence guarantees. Data is lost if the pipeline restarts. Suitable for scenarios prioritizing speed over durability.

    • Disk: Persists data to disk storage, providing durability across pipeline restarts and ensuring data survives system failures. Recommended for production environments where data retention is critical.

Note: This document covers the configuration details for disk-based buffering.

  • Max Bytes Size: The maximum number of bytes allowed in the buffer.

    • The minimum value is 268435488 bytes (approximately 256 MB).

    • This setting determines the total capacity of the disk buffer. Once the buffer reaches this size, the "When Full" policy determines how the pipeline handles additional incoming data.

  • When Full: Determines how the pipeline handles incoming events when the buffer reaches its maximum capacity. You have two options:

    1. Block (Default)

      • Behavior: Stops accepting new data when the buffer is full

      • Impact: Creates upstream backpressure, causing data sources to reduce their event transmission rate until buffer space becomes available

      • Use Case: When data integrity is critical and every event must be preserved

      • Best For: Mission-critical logs, compliance data, financial transactions

    2. Drop Newest

      • Behavior: Discards new incoming data when buffer capacity is exhausted

      • Impact: Maintains consistent pipeline throughput by sacrificing newer events

      • Use Case: When maintaining pipeline throughput is more important than preserving every data point

      • Best For: High-volume metrics, non-critical telemetry, sampling scenarios
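To make the difference between the two policies concrete, here is a minimal Python sketch under stated assumptions. The `BoundedBuffer` class, the `produce` helper, and the policy strings `"block"` and `"drop_newest"` are hypothetical names for illustration, and capacity is counted in raw event bytes. Under Block, the producer stops at the first refused event and keeps the remaining events upstream. Under Drop Newest, each event that does not fit is discarded and the producer keeps going.

```python
from collections import deque


class BoundedBuffer:
    """Hypothetical byte-bounded buffer illustrating the two When Full policies."""

    def __init__(self, max_bytes: int, when_full: str = "block") -> None:
        if when_full not in ("block", "drop_newest"):
            raise ValueError(f"unknown when_full policy: {when_full!r}")
        self.max_bytes = max_bytes
        self.when_full = when_full
        self.used = 0          # bytes currently held
        self.events = deque()  # FIFO of buffered events
        self.dropped = 0       # events discarded under drop_newest

    def offer(self, event: bytes) -> bool:
        """Enqueue the event if it fits; return False if it was not accepted."""
        if self.used + len(event) <= self.max_bytes:
            self.events.append(event)
            self.used += len(event)
            return True
        if self.when_full == "drop_newest":
            self.dropped += 1  # the incoming (newest) event is discarded
        return False


def produce(buffer: BoundedBuffer, events: list) -> tuple:
    """Send events in order; return (accepted, still_waiting_upstream)."""
    accepted, pending = [], list(events)
    while pending:
        if buffer.offer(pending[0]):
            accepted.append(pending.pop(0))
        elif buffer.when_full == "block":
            break           # Block: the source pauses; the rest waits upstream
        else:
            pending.pop(0)  # Drop Newest: the event is gone; move on
    return accepted, pending


events = [b"aaaa", b"bbbb", b"cccc", b"dddd"]  # four 4-byte events
results = {}
for policy in ("block", "drop_newest"):
    buf = BoundedBuffer(max_bytes=10, when_full=policy)
    accepted, waiting = produce(buf, events)
    results[policy] = (len(accepted), len(waiting), buf.dropped)
print(results)  # {'block': (2, 2, 0), 'drop_newest': (2, 0, 2)}
```

In a real pipeline, the Block case resumes once the destination drains space from the buffer. The toy model stops at the pause to show which events are held upstream rather than lost.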

Configuration Example
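The three fields can be summarized as a small, hypothetical Python model. The `BufferingConfig` class and its field names (`buffer_type`, `max_bytes`, `when_full`) are illustrative assumptions, not Observo Pipeline's actual schema. The rules it enforces come from this page: there are two buffer types, Block is the default When Full policy, and disk buffers have a 268435488-byte minimum. The sketch assumes that minimum applies to disk buffers only.

```python
from dataclasses import dataclass

# Documented minimum for Max Bytes Size on a disk buffer.
MIN_DISK_BUFFER_BYTES = 268_435_488


@dataclass(frozen=True)
class BufferingConfig:
    """Hypothetical model of the Buffering Configuration fields."""

    buffer_type: str           # "memory" or "disk"
    max_bytes: int             # Max Bytes Size
    when_full: str = "block"   # "block" (default) or "drop_newest"

    def __post_init__(self) -> None:
        if self.buffer_type not in ("memory", "disk"):
            raise ValueError(f"buffer_type must be 'memory' or 'disk', got {self.buffer_type!r}")
        if self.when_full not in ("block", "drop_newest"):
            raise ValueError(f"when_full must be 'block' or 'drop_newest', got {self.when_full!r}")
        if self.buffer_type == "disk" and self.max_bytes < MIN_DISK_BUFFER_BYTES:
            raise ValueError(
                f"max_bytes must be at least {MIN_DISK_BUFFER_BYTES} bytes for a disk buffer"
            )


# A 1 GiB disk buffer that blocks upstream sources when full.
config = BufferingConfig(buffer_type="disk", max_bytes=1 << 30, when_full="block")
print(config)

# A 100 MiB disk buffer is below the documented minimum and is rejected.
try:
    BufferingConfig(buffer_type="disk", max_bytes=100 * 1024 * 1024)
    rejected = False
except ValueError as err:
    rejected = True
    print("invalid:", err)
```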

Best Practices

Sizing Your Buffer

  1. Assess Data Volume: Calculate your average data ingestion rate (bytes/second)

  2. Estimate Outage Duration: Consider typical outage windows for your destinations
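The two sizing steps combine into simple arithmetic: required bytes ≈ ingestion rate × outage duration. The helper below is a sketch. `recommended_max_bytes` and its optional `headroom` multiplier are hypothetical, and the result is raised to the documented 268435488-byte minimum whenever the estimate falls below it.

```python
import math

MIN_DISK_BUFFER_BYTES = 268_435_488  # documented minimum for Max Bytes Size


def recommended_max_bytes(ingest_bytes_per_sec: float, outage_seconds: float,
                          headroom: float = 1.0) -> int:
    """Bytes needed to absorb an outage: rate x duration, scaled by an optional
    headroom factor (an assumption, not a documented setting), never below
    the documented minimum."""
    if ingest_bytes_per_sec < 0 or outage_seconds < 0 or headroom < 1.0:
        raise ValueError("rate and duration must be non-negative; headroom must be >= 1")
    needed = math.ceil(ingest_bytes_per_sec * outage_seconds * headroom)
    return max(needed, MIN_DISK_BUFFER_BYTES)


# Example: 2 MiB/s of sustained ingestion, planning for a 30-minute outage.
rate = 2 * 1024 * 1024
size = recommended_max_bytes(rate, 30 * 60)
print(size, f"({size / 1024**3:.2f} GiB)")  # 3774873600 (3.52 GiB)
```

For a low-volume source (for example 10 KiB/s over 10 minutes), the raw estimate is well under the minimum, so the helper returns 268435488.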

Choosing the Right "When Full" Policy

Choose Block when:

  • Data loss is unacceptable

  • Compliance or audit requirements mandate complete data retention

  • You can tolerate temporary pipeline pauses

  • Downstream systems can handle delayed data delivery

Choose Drop Newest when:

  • Real-time data flow is the highest priority

  • Your use case can tolerate some data loss during peak load periods

  • Preventing upstream source blockage is critical

  • You're working with high-volume, less critical data streams

Troubleshooting

Buffer Frequently Full

Symptoms: The buffer frequently reaches capacity, causing upstream blocking (Block) or dropped events (Drop Newest)

Solutions:

  • Increase Max Bytes Size

  • Improve destination throughput so the buffer drains faster

  • Review data volume patterns for spikes

Disk Space Issues

Symptoms: Disk space exhaustion, buffer write failures

Solutions:

  • Ensure sufficient free disk space: 3 to 4 times the configured Max Bytes Size is recommended

  • Set Max Bytes Size to fit within the disk capacity available on the host
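A quick pre-flight check for the disk-space recommendation can use Python's standard `shutil.disk_usage`. The helper name `has_room_for_buffer`, its default factor of 3, and the use of the system temp directory as the buffer location are assumptions. Point the check at whichever volume actually holds the pipeline's buffer data.

```python
import shutil
import tempfile


def has_room_for_buffer(path: str, max_bytes: int, factor: float = 3.0) -> bool:
    """Return True if the filesystem holding `path` has at least
    `factor` times `max_bytes` of free space."""
    free = shutil.disk_usage(path).free
    return free >= factor * max_bytes


# Hypothetical buffer location; replace with the volume your pipeline writes to.
buffer_dir = tempfile.gettempdir()
print(has_room_for_buffer(buffer_dir, 268_435_488))
```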
