Trace Summarization

The Trace Summarization Optimizer in Observo AI allows users to aggregate and condense distributed trace data by combining related trace spans into a single, coherent summary that streamlines performance analysis and troubleshooting.

Purpose

The Trace Summarization Optimizer processes distributed trace data by grouping related spans, reducing the volume of raw trace events into a concise and coherent summary. It highlights key performance metrics, such as latency and error occurrences, enabling teams to quickly pinpoint bottlenecks and anomalies within complex systems. This streamlined view accelerates troubleshooting and performance optimization by distilling essential insights from otherwise overwhelming trace data.

Usage

Select the Trace Summarization Optimizer transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Disabled by default. When enabled, the transform is bypassed entirely and events pass through without any modifications.

  • Add Filter Conditions: Disabled by default. When enabled, only events that match the configured conditions are processed; all other events bypass this transform. Build conditions with AND/OR logic using the "+Rule" and "+Group" buttons.

Trace Summarization: TraceID Columns (pulldown):

  • First column name: Name of the first column from which to extract the TraceID. Leave empty if not required.

    Examples:
    msg.col1
  • Regex for First column: Regex used to extract the TraceID from the value of the First column name field. Leave empty if not required.

    Examples:
    req:([0-9]+)
  • Second column name: Name of the second column from which to extract the TraceID. Leave empty if not required.

    Examples:
    msg.col1
  • Regex for Second column: Regex used to extract the TraceID from the value of the Second column name field. Leave empty if not required.

    Examples:
    req:([0-9]+)
  • TraceID columns: Add the names of columns whose values are the TraceID itself. Leave empty if not required.

    Examples:
    msg.request_id

Trace Summarization Configs (pulldown):

  • Field Names to keep last value: Click the Add button to list the columns for which only the last value should be kept. Leave empty if not required.

    Examples:
    kubernetes.pod_id
  • Field Name which contains raw log: Column that contains the raw log. Defaults to log.

    Examples:
    logs
  • Reduction Strategy for raw log: The method used to aggregate the values of the field. For numeric fields, you can use strategies like 'sum', 'max', or 'min'. For string fields, you can use strategies like 'concat', 'retain', or 'discard'.

    Methods
    Append value to array
    Concatenate strings with space separator
    Concatenate strings with newline separator
    Concatenate strings without separator
  • Prefix logs with timestamp: Log message field will be prefixed with a timestamp value. Disabled by default. Toggle on to enable.

  • Max Events: The maximum number of events to group together. Defaults to 30.

  • Flush Time (seconds): The maximum amount of time, in seconds, to wait before flushing events to the Destination. Defaults to 30 seconds.
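
The four reduction methods listed above can be pictured with a small sketch. This is only an illustration of what each method does to a list of raw log values; the strategy keys (`array`, `concat_space`, `concat_newline`, `concat_raw`) and the function name are hypothetical and are not Observo AI identifiers.

```python
from typing import Callable, Dict, List, Union

Reduced = Union[str, List[str]]

# Hypothetical keys: the UI names these methods by description only.
STRATEGIES: Dict[str, Callable[[List[str]], Reduced]] = {
    "array": lambda values: list(values),                # Append value to array
    "concat_space": lambda values: " ".join(values),     # space separator
    "concat_newline": lambda values: "\n".join(values),  # newline separator
    "concat_raw": lambda values: "".join(values),        # no separator
}


def reduce_raw_log(values: List[str], strategy: str = "concat_newline") -> Reduced:
    """Combine the raw log values of one group of events into a single value."""
    return STRATEGIES[strategy](values)


if __name__ == "__main__":
    lines = ["BEGIN", "SELECT 1", "COMMIT"]
    for name in STRATEGIES:
        print(name, repr(reduce_raw_log(lines, name)))
```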

Examples

Trace Summarization

This transform summarizes (collates) data based on trace information in the incoming logs.

Consider the example below of two incoming log lines which are logging data for a single SQL query and emitted as follows.

In the above case, storing the two logs individually is costly because both carry exactly the same metadata.

Fields marked in yellow are the raw log lines emitted and those marked in red are fields used to uniquely identify those streams.

Running these through the Trace Summarization transform returns the following:

Trace Summarization works by concatenating the raw log lines into one, separated by newlines.
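
The grouping and newline concatenation can be sketched roughly as follows. The events here are made up for illustration and are not the ones from the example above; the field names `request_id`, `host`, and `log`, and the function `summarize`, are assumptions. The sketch keeps the metadata of the first event in each group, which matches the default described under Trace Summarization Configs.

```python
from typing import Dict, List


def summarize(events: List[dict], trace_field: str = "request_id",
              raw_field: str = "log") -> List[dict]:
    """Merge events that share a trace ID into one event per trace."""
    groups: Dict[str, dict] = {}
    for event in events:
        trace_id = event[trace_field]
        if trace_id not in groups:
            # The first event of a trace supplies the shared metadata.
            groups[trace_id] = dict(event)
        else:
            # Later events only contribute their raw log line.
            groups[trace_id][raw_field] += "\n" + event[raw_field]
    return list(groups.values())


if __name__ == "__main__":
    # Hypothetical events: two lines for one query, identical metadata.
    incoming = [
        {"request_id": "42", "host": "db-1", "log": "Query started"},
        {"request_id": "42", "host": "db-1", "log": "Query completed in 4ms"},
    ]
    for merged in summarize(incoming):
        print(merged)
```

Two events with the same metadata become one event, so the metadata is stored once instead of twice.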

There are two aspects to configuring Trace Summarization:

  1. Identification: Identify groups of incoming log streams based on trace information.

  2. Trace Summarization Configs: Once the groups are identified, configure how their events are summarized.

Identification

An incoming trace can be identified based on field values or values extracted from those field values.

There are two ways that information can be used.

  1. Direct Columns: If the field value is directly usable for identification, the column is called a direct column.

    Examples:
    • kubernetes.pod_id
    • docker.id
  2. Indirect Columns: If the field value is not directly usable and the identifier must be extracted from it, the column is called an indirect column.

    Example: "message":"Query with ID:df4gyhb completed in 4ms"

In the above example, the ID needs to be extracted from the message field with a regex:

```
{message, regex "ID:([a-z0-9]+)"}
```

Notes:

  • At most two indirect columns can be specified.

  • The supplied regex must contain a capturing group; the value of the first capturing group is used as the extracted field.

  • Regex can be tested at Test Regex
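
A minimal sketch of how direct and indirect columns could combine into one grouping key, assuming events are nested dictionaries addressed by dotted paths. The helper names `get_path` and `trace_key` are hypothetical; only the regex and the sample message come from the text above.

```python
import re
from typing import Any, Optional, Sequence, Tuple


def get_path(event: dict, path: str) -> Any:
    """Resolve a dotted path such as 'kubernetes.pod_id'; None if missing."""
    current: Any = event
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def trace_key(event: dict, direct_columns: Sequence[str],
              indirect_columns: Sequence[Tuple[str, str]]) -> tuple:
    """Build a grouping key from direct values and regex-extracted values."""
    if len(indirect_columns) > 2:
        raise ValueError("at most two indirect columns can be specified")
    parts = [get_path(event, column) for column in direct_columns]
    for column, pattern in indirect_columns:
        value = get_path(event, column)
        match = re.search(pattern, value) if isinstance(value, str) else None
        # The first capturing group is the extracted identifier.
        extracted: Optional[str] = match.group(1) if match else None
        parts.append(extracted)
    return tuple(parts)


if __name__ == "__main__":
    event = {
        "kubernetes": {"pod_id": "pod-1"},  # hypothetical value
        "message": "Query with ID:df4gyhb completed in 4ms",
    }
    key = trace_key(event, ["kubernetes.pod_id"], [("message", r"ID:([a-z0-9]+)")])
    print(key)
```

Events that produce the same key would be treated as belonging to the same trace.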

Trace Summarization Configs

  • Keep Last values: For the listed fields, keep the last value seen across the summarized events. By default, only the first value of each field is kept.

  • Max Events: Maximum number of events to collate together.

  • Flush Time: Maximum time a stream of matching events will be kept in memory before it is flushed to the destination.

  • Raw Log Field: The field that holds the raw log. Its values are concatenated with newlines to form a merged log line. This is the field marked in yellow in the example above.

  • Prefix logs with Timestamp: Setting this config prefixes a timestamp to each of the collated log lines.
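
The way these settings interact can be sketched with a small in-memory buffer. This is an illustrative model under stated assumptions, not the product's implementation: the class `TraceBuffer`, its parameter names, the `timestamp` field, and the choice to measure Flush Time from the first event of a group are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class Group:
    started_at: float  # time the first event of the group arrived
    summary: dict      # merged event built so far
    count: int = 1


class TraceBuffer:
    def __init__(self, max_events: int = 30, flush_seconds: float = 30.0,
                 keep_last: Sequence[str] = (), raw_field: str = "log",
                 prefix_timestamp: bool = False) -> None:
        self.max_events = max_events
        self.flush_seconds = flush_seconds
        self.keep_last = tuple(keep_last)
        self.raw_field = raw_field
        self.prefix_timestamp = prefix_timestamp
        self.groups: Dict[str, Group] = {}

    def _raw(self, event: dict) -> str:
        line = event[self.raw_field]
        if self.prefix_timestamp:
            # Assumes a 'timestamp' field exists on the event.
            line = f"{event['timestamp']} {line}"
        return line

    def add(self, trace_id: str, event: dict, now: float) -> Optional[dict]:
        """Add an event; return the summary if the group reached max_events."""
        group = self.groups.get(trace_id)
        if group is None:
            summary = dict(event)  # first values are kept by default
            summary[self.raw_field] = self._raw(event)
            group = self.groups[trace_id] = Group(started_at=now, summary=summary)
        else:
            group.summary[self.raw_field] += "\n" + self._raw(event)
            for name in self.keep_last:
                if name in event:
                    group.summary[name] = event[name]  # last value wins
            group.count += 1
        if group.count >= self.max_events:
            return self.groups.pop(trace_id).summary
        return None

    def flush_expired(self, now: float) -> List[dict]:
        """Return and drop every group older than flush_seconds."""
        expired = [tid for tid, g in self.groups.items()
                   if now - g.started_at >= self.flush_seconds]
        return [self.groups.pop(tid).summary for tid in expired]
```

Passing `now` explicitly keeps the sketch deterministic; a real pipeline would read a clock and call something like `flush_expired` periodically.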

Trace Summarization Best Practices

Here are the top best practices for Trace Summarization:

  1. Define the Raw Trace Field Clearly: Specify the field that contains your raw trace data (for example, trace.message) so that the transform correctly targets the trace events.

  2. Configure "Max Events" Appropriately: Set the "Max Events" parameter to a value that captures sufficient detail for a complete trace while preventing overly large groups that might hinder processing performance.

  3. Set Optimal Flush Time: Choose a "Flush Time (seconds)" that provides a balance between collecting enough events for accurate summarization and delivering near-real-time insights for timely troubleshooting.

  4. Employ Effective Regex for Span Identification: Use a precise regular expression to detect the beginning of a trace line in multi-line logs, ensuring that each new trace event is correctly recognized and grouped.

  5. Include Relevant Metadata: Integrate additional metadata (such as service names, host IDs, or pod identifiers) to help correlate related events across distributed systems and provide context to the summarized trace.

  6. Validate with Representative Data: Test your configuration using real-world trace samples to confirm that the summarization accurately reflects key performance metrics, error conditions, and overall system behavior.

  7. Monitor and Refine Continuously: Regularly review the output and adjust parameters like "Max Events" and "Flush Time" as trace patterns and system requirements evolve, ensuring that the summarization remains effective over time.

