Aggregate Metrics

The Aggregate Metrics function aggregates metric data from multiple events into a single event, summarizing the data for efficient storage and analysis. This is useful for reducing the volume of metric data while preserving key insights.

Purpose

Use the Aggregate Metrics function when you need to consolidate metric data from multiple events into a single event. Common use cases include:

  • Summarizing time-series metrics such as CPU usage or request latency.

  • Reducing the volume of metric data for storage or visualization.

  • Preparing metric data for downstream processing or analysis.

Usage

Select the Aggregate Metrics transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform is bypassed entirely, and events pass through without any modifications.

  • Add Filter Conditions: Defaults to disabled. When enabled, events are evaluated against the filter conditions. Only events for which the conditions evaluate to true are processed; all other events bypass this transform. Conditions are combined with AND/OR logic and built with the "+Rule" and "+Group" buttons.
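The AND/OR rule-and-group logic described above can be sketched in Python. This is a minimal illustration, not the product's implementation: the event shape (a `name`, a `labels` map, and a `value`), the `Rule` and `Group` classes, the pseudo-label `name` (standing for the metric name), and the `route` helper are all hypothetical, and only the `equals` and `exists` label conditions are modeled.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Union

# Hypothetical event shape: {"name": ..., "labels": {...}, "value": ...}
Event = Dict[str, object]


@dataclass
class Rule:
    """One "+Rule" entry: compare a label (or the metric name) to a value."""
    label: str
    condition: str  # only "equals" and "exists" are modeled in this sketch
    value: Optional[str] = None

    def matches(self, event: Event) -> bool:
        labels = event.get("labels", {})
        # The pseudo-label "name" refers to the metric name itself.
        actual = event.get("name") if self.label == "name" else labels.get(self.label)
        if self.condition == "exists":
            return actual is not None
        if self.condition == "equals":
            return actual == self.value
        raise ValueError(f"unsupported condition: {self.condition}")


@dataclass
class Group:
    """One "+Group" entry: child rules or groups combined with AND or OR."""
    operator: str  # "AND" or "OR"
    children: List[Union[Rule, "Group"]]

    def matches(self, event: Event) -> bool:
        results = [child.matches(event) for child in self.children]
        if self.operator == "AND":
            return all(results)
        if self.operator == "OR":
            return any(results)
        raise ValueError(f"unsupported operator: {self.operator}")


def route(event: Event, conditions: Group) -> str:
    """Matching events are processed; all others bypass the transform."""
    return "process" if conditions.matches(event) else "bypass"


# Hypothetical labels: a metric name must exist AND env must be prod OR staging.
conditions = Group("AND", [
    Rule("name", "exists"),
    Group("OR", [
        Rule("env", "equals", "prod"),
        Rule("env", "equals", "staging"),
    ]),
])
event = {"name": "cpu_usage", "labels": {"env": "prod"}, "value": 0.42}
print(route(event, conditions))  # process
```

Nesting a `Group` inside another `Group` mirrors how "+Group" lets one AND/OR block contain another.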

Aggregate Metrics: Enabled: Defaults to enabled, meaning the transform evaluates all events. Toggle Enabled off to stop this transform from processing events and feeding data to downstream transforms.

Aggregate Conditions Rules: A set of event fields to evaluate. The first rule entry (one key-value pair) is added by default. Click Add to add more rules, each with the following optional inputs:

  • Aggregation Conditions: Defaults to empty. When set, events are evaluated against the conditions. Only events for which the conditions evaluate to true are processed; all other events bypass this transform. Conditions are combined with AND/OR logic and built with the "+Rule" and "+Group" buttons.

  • Aggregation Interval (seconds): The time interval, in seconds, over which metrics are aggregated. At the end of each interval, the aggregated metrics are flushed.
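A minimal sketch of interval-based aggregation, assuming tumbling windows keyed by metric name plus labels. How this transform combines values is not stated here, so the sketch assumes incremental values (such as counter deltas) are summed and absolute values (such as gauges) keep the latest sample. `IntervalAggregator`, `series_key`, and the `kind` field are hypothetical names. A real implementation would flush on a timer; this sketch only checks the clock when an event arrives, so the caller calls `flush()` for the final window.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical event shape: name, labels, value, and an optional "kind".
Event = Dict[str, object]
SeriesKey = Tuple[str, Tuple[Tuple[str, str], ...]]


def series_key(event: Event) -> SeriesKey:
    """Identify a series by its metric name plus its sorted label pairs."""
    labels = event.get("labels", {})
    return (str(event["name"]), tuple(sorted(labels.items())))


class IntervalAggregator:
    """Tumbling-window aggregation: one output event per series per interval."""

    def __init__(self, interval_secs: float) -> None:
        self.interval_secs = interval_secs
        self.window_start: Optional[float] = None
        self.buffer: Dict[SeriesKey, Event] = {}

    def add(self, event: Event, now: float) -> List[Event]:
        """Buffer one event; return whatever was flushed because the interval elapsed."""
        flushed: List[Event] = []
        if self.window_start is None:
            self.window_start = now
        elif now - self.window_start >= self.interval_secs:
            flushed = self.flush()
            self.window_start = now  # the next window starts at this event

        key = series_key(event)
        current = self.buffer.get(key)
        if current is None or event.get("kind", "incremental") == "absolute":
            # First sample for this series in the window, or a gauge-like value: keep the latest.
            self.buffer[key] = dict(event)
        else:
            # Incremental value: add it to the running total for this series.
            current["value"] = current["value"] + event["value"]
        return flushed

    def flush(self) -> List[Event]:
        """Emit one aggregated event per series and clear the buffer."""
        out = list(self.buffer.values())
        self.buffer.clear()
        return out


agg = IntervalAggregator(interval_secs=60)
agg.add({"name": "ops", "labels": {"operation": "upload"}, "value": 2}, now=0)
agg.add({"name": "ops", "labels": {"operation": "upload"}, "value": 3}, now=30)
flushed = agg.add({"name": "ops", "labels": {"operation": "upload"}, "value": 1}, now=61)
print(flushed)  # [{'name': 'ops', 'labels': {'operation': 'upload'}, 'value': 5}]
```

Three input events collapse into one output event for the first window; the event at `now=61` opens the next window and waits in the buffer.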

Examples

When using Thanos with prometheus-remote-write, several key metrics help monitor ingestion, querying, compaction, and object store performance. This example focuses on the object store metrics.

Thanos Object Storage Metrics (S3, GCS, Azure, etc.)

| Metric Name | Description | Example Value | Labels |
|---|---|---|---|
| thanos_objstore_bucket_operations_total | Total number of operations performed on the object storage (uploads, downloads, deletions) | 10,235 | operation="upload" or operation="download" |
| thanos_objstore_bucket_operation_failures_total | Number of failed object storage operations (the example labels show a failed upload) | 45 | operation="upload", error="timeout" |
| thanos_objstore_bucket_size_bytes | Total size of the object storage bucket, in bytes | 1.2e+12 (1.2 TB) | None required |

Total Object Store Operations (Uploads or Downloads)

Scenario: Captures total object store operations based on the operation label: upload or download.

Aggregate Conditions

  • Group Rule

| Condition | Label | Label Condition | Value |
|---|---|---|---|
| AND | thanos_objstore_bucket_operations_total | exists | N/A |

  • Rule1: Aggregation Conditions

| Condition | Label | Label Condition | Value |
|---|---|---|---|
| OR | operation | equals | upload |

  • Rule2: Aggregation Conditions

| Condition | Label | Label Condition | Value |
|---|---|---|---|
| OR | operation | equals | download |

Aggregation Interval (seconds): 60

Outcome: Shows the rate of object store operations (uploads and downloads), aggregated over 1-minute intervals.
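The example configuration can be approximated in Python to show which events end up in each 1-minute bucket. This is an illustrative sketch, not the transform itself: it reads the rules as metric AND (operation equals upload OR operation equals download), and it treats the group rule's `exists` check on the metric name as "the event is this metric". It also assumes each event carries an increment (delta); Prometheus remote write normally sends cumulative counter values, which would need a different combine step. `matches_example`, `aggregate_per_interval`, and the `timestamp` field are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Hypothetical event shape: name, labels, value (an increment), timestamp (seconds).
Event = Dict[str, object]

METRIC = "thanos_objstore_bucket_operations_total"


def matches_example(event: Event) -> bool:
    """Group rule AND (Rule1 OR Rule2), as read from the example configuration."""
    labels = event.get("labels", {})
    is_metric = event.get("name") == METRIC  # group rule: the metric "exists"
    is_upload_or_download = labels.get("operation") in ("upload", "download")  # Rule1 OR Rule2
    return is_metric and is_upload_or_download


def aggregate_per_interval(events: Iterable[Event], interval_secs: int = 60) -> Dict[int, Counter]:
    """Sum matching increments per operation within each interval bucket."""
    buckets: Dict[int, Counter] = {}
    for event in events:
        if not matches_example(event):
            continue  # non-matching events bypass aggregation
        # Align each event to the start of its interval, e.g. t=65 -> bucket 60.
        window = int(event["timestamp"]) // interval_secs * interval_secs
        operation = event["labels"]["operation"]
        buckets.setdefault(window, Counter())[operation] += event["value"]
    return buckets


events = [
    {"name": METRIC, "labels": {"operation": "upload"}, "value": 4, "timestamp": 5},
    {"name": METRIC, "labels": {"operation": "download"}, "value": 10, "timestamp": 20},
    {"name": METRIC, "labels": {"operation": "delete"}, "value": 2, "timestamp": 30},
    {"name": METRIC, "labels": {"operation": "upload"}, "value": 6, "timestamp": 65},
]
result = aggregate_per_interval(events)
print(result[0]["upload"], result[0]["download"], result[60]["upload"])  # 4 10 6
```

The `delete` event fails both OR rules and is skipped. Dividing a bucket's total by 60 gives a per-second rate; for the first minute that is (4 + 10) / 60, roughly 0.23 operations per second.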

Advantages

  • The major advantage of aggregation is the reduction of volume. It may reduce costs directly where pricing is based on metric event volume, or indirectly by requiring less CPU to process events and less network bandwidth to transmit and receive them.

  • In systems constrained by the processing required to ingest metric events, aggregation may help reduce processing overhead. This may also apply to transforms and sinks downstream of the aggregate transform.
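To make the volume reduction concrete, the arithmetic below uses hypothetical numbers: 100 series, each emitting one event per second, aggregated over a 60-second interval. It assumes the transform emits exactly one event per series per interval, so the reduction factor equals the per-series event rate multiplied by the interval length. `events_per_interval` is a hypothetical helper.

```python
from typing import Tuple


def events_per_interval(series: int, samples_per_sec: int, interval_secs: int) -> Tuple[int, int]:
    """Return (events in, events out) for one aggregation interval.

    Assumes every series reports at a steady rate and the transform emits
    exactly one aggregated event per series per interval.
    """
    events_in = series * samples_per_sec * interval_secs
    events_out = series  # one aggregated event per series
    return events_in, events_out


# Hypothetical workload: 100 series, 1 event/second each, 60-second interval.
events_in, events_out = events_per_interval(series=100, samples_per_sec=1, interval_secs=60)
print(events_in, events_out, events_in / events_out)  # 6000 100 60.0
```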
