Pipeline Overview

This document outlines how to monitor your pipeline within the Observo platform, focusing on metrics such as pipeline input, pipeline output, and optimization percentage, so that you can assess data flow, optimization, and data output.

Key Metrics

Pipeline Input

  • Raw Input Bytes: The total volume of data ingested at the source, measured after decompression.

  • Parsed Input Bytes: This indicates the volume of data after initial parsing and metadata addition.

Pipeline Output

  • Total Output: The final volume of data that exits the pipeline towards its destination.

Optimization Percentage

This metric shows how efficiently the pipeline reduces data volume during processing.

  • Positive optimization means data reduction (e.g., through filtering or aggregation).

  • Negative optimization means data expansion due to enrichment.
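
The sign convention can be sketched as a small Python calculation (the function and variable names here are illustrative, not an Observo API):

```python
def optimization_pct(input_bytes: int, output_bytes: int) -> float:
    """Percent reduction from pipeline input to pipeline output."""
    return (input_bytes - output_bytes) / input_bytes * 100

# Filtering dropped most events: positive optimization (data reduction).
print(optimization_pct(1_000_000, 250_000))    # 75.0

# Enrichment added fields: negative optimization (data expansion).
print(optimization_pct(1_000_000, 1_200_000))  # -20.0
```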

Overview Tab

From the Overview tab:

  • Optimization: Overall efficiency of the pipeline for the time period specified.

  • Pipeline Errors: Should be 0 for a healthy pipeline.

  • Total Input: Sum of all input data for the time period specified.

  • Total Output: Sum of all output data for the time period specified.

Component Level View

The component-level view provides drill-down information on each component:

  • Track each component’s input and output volumes.

  • Understand how each transform stage contributes to overall optimization.

  • Identify where data might be expanded (negative optimization).

Source

The Source is the entry point of data ingestion in an Observo pipeline. It’s responsible for receiving raw data from external systems (e.g., log agents like Datadog, Beats, or direct API calls) and preparing it for processing.

Raw Input Bytes

  • Represents the size of data after decompression of any compressed formats (e.g., gzip).

  • This means that if logs arrive compressed from a source (like a Datadog agent), Observo decompresses them before measuring this metric.
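
A hypothetical sketch of this measurement point, using Python's gzip module: the metric reflects the decompressed size, not the smaller number of bytes that arrived on the wire.

```python
import gzip

# A repetitive log payload, as an agent might batch it.
payload = b'{"level":"info","msg":"request completed"}\n' * 500
compressed = gzip.compress(payload)

# Bytes received on the wire (compressed).
print(len(compressed))

# Raw Input Bytes is measured after decompression.
raw_input_bytes = len(gzip.decompress(compressed))
print(raw_input_bytes)  # equals len(payload)
```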

Parsed Input Bytes

  • Represents the data size after parsing into a structured format, such as converting JSON strings into JSON objects.

  • Parsing might expand the size slightly, depending on the format and structure of the incoming data.

  • For example: A JSON string might get split into fields and expanded.

  • Note: If a source doesn't have a parser configured, Parsed Input Bytes may be higher than Raw Input Bytes because of metadata addition. This overhead becomes negligible when working with larger amounts of data.
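
The expansion can be sketched with Python's json module; the metadata field names below are purely illustrative, not Observo's actual tags:

```python
import json

raw_line = '{"ts":"2024-01-01T00:00:00Z","msg":"login ok","user":"alice"}'

event = json.loads(raw_line)          # parse into a structured object
event["_pipeline"] = "auth-logs"      # illustrative metadata tags --
event["_source_id"] = "src-01"        # not Observo's actual field names

parsed = json.dumps(event)
print(len(raw_line), len(parsed))     # parsed size exceeds the raw size
```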

Why might the Parsed Input be larger than Raw Input?

Parsing Overhead: Converting raw logs into structured formats can add metadata and expand nested structures.

Metadata Injection: Observo adds small metadata tags to support analytics, error tracking, and usage insights. This ensures that features like analytics dashboards and pipeline error detection work seamlessly.

Destinations

The Destination, or sink, is the final stage in an Observo pipeline, where processed data is sent to its target system, for example Splunk, S3, or a downstream analytics platform.

Optimization

  • This metric measures the overall efficiency of the entire pipeline in reducing (or enriching) the data volume.

  • It is calculated as the percentage difference between the raw input (after decompression) at the source and the volume delivered to the sink.

  • Optimization = ((Raw Input - Sink Input) / Raw Input) * 100%

  • It shows how effective the entire pipeline was in reducing data volume from entry to exit.

  • This is useful for estimating storage costs and impact.

Optimization on Parsed Input

  • This metric focuses specifically on the pipeline transforms, measuring how much data was reduced (or increased) between the parsed input (at the source) and the sink input.

  • It excludes the parser itself (since parsing happens at the source, before the pipeline transforms are applied).

  • Parsed Optimization = ((Parsed Input - Sink Input) / Parsed Input) * 100%

  • This isolates the efficiency of your transforms (e.g., filtering, enrichment, aggregation) from parsing overhead.

  • It gives a clear picture of how well your pipeline can optimize the data you actually work with.
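
Both formulas can be collected into one small sketch; the function, argument names, and the sample byte counts below are illustrative, not an Observo API:

```python
def pipeline_metrics(raw_input: int, parsed_input: int, sink_input: int) -> dict:
    """Compute both optimization metrics from byte counts."""
    return {
        # Overall efficiency: source raw input vs. data delivered to the sink.
        "optimization_pct": (raw_input - sink_input) / raw_input * 100,
        # Transform-only efficiency: excludes parsing/metadata overhead.
        "parsed_optimization_pct": (parsed_input - sink_input) / parsed_input * 100,
    }

m = pipeline_metrics(raw_input=10_000_000, parsed_input=10_500_000, sink_input=2_100_000)
print(m)  # {'optimization_pct': 79.0, 'parsed_optimization_pct': 80.0}
```

Note that the parsed-input figure is slightly higher than the raw input (metadata addition), so the two percentages differ even though the sink volume is the same.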

FAQ

  1. Why does my pipeline show negative optimization?

    • A negative optimization percentage means that the output volume at a stage is larger than the input volume. This can occur due to:

    • Data Enrichment: Certain transforms (like indexing, annotation, or enrichment) may add metadata or additional fields to your logs, increasing the data size.

    • Index Expansion: Some pipelines expand logs to include index-friendly structures or data for easier querying in downstream systems (e.g., Splunk).

    • Format Conversion: Converting logs to a more verbose format (e.g., JSON or CSV) can increase their size compared to the original compressed or less verbose format.

  2. Why does the input volume sometimes seem larger than expected?

    • In observo.ai, the input volume is calculated after decompressing the data. Here’s why:

    • Compression: Many log sources send data compressed to reduce bandwidth (e.g., gzip, deflate). When this data enters observo.ai, it’s decompressed so that transformations and optimizations can be accurately measured.

    • Parsed Input Bytes: After parsing (e.g., splitting into structured logs), the pipeline reflects the actual volume that will be processed, even if it’s larger than the raw compressed size.

    • This ensures that pipeline metrics reflect the true volume of data the pipeline must handle, rather than an artificially small compressed representation.

  3. Why does a transform show negative optimization?

    • A negative optimization percentage in a transform means the output data size increased compared to the input. Common causes include:

    • Data Enrichment: The transform might add additional fields, tags, or metadata (e.g., adding geolocation data, user context, or correlation IDs) to logs.

    • Format Conversion: Converting from a compact format (like binary) to a verbose one (like JSON) can lead to larger data.
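
A toy enrichment transform (the field names are invented for illustration) shows how a single stage's output can exceed its input, producing a negative optimization percentage:

```python
import json

def enrich(event: dict) -> dict:
    """Toy enrichment: adds context fields, so the event grows."""
    return {**event,
            "geo": {"country": "US", "city": "Austin"},  # illustrative fields
            "correlation_id": "req-8f3a2b"}

event = {"msg": "checkout failed", "user": "bob"}
in_size = len(json.dumps(event))
out_size = len(json.dumps(enrich(event)))

opt_pct = (in_size - out_size) / in_size * 100
print(opt_pct < 0)  # True: this transform shows negative optimization
```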
