Reduce

The Reduce function aggregates multiple events into a single event based on specified criteria. This is useful for summarizing data, calculating metrics, or reducing the volume of events for downstream processing.

Purpose

Use this function when you need to consolidate multiple events into a single event. Common use cases include:

  • Calculating sums, averages, or other aggregations over a time window.

  • Grouping events by a specific field, such as user ID or location.

  • Reducing the volume of data for storage or analysis.
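The consolidation described above can be sketched in Python. This transform is configured in the UI rather than in code; the snippet below only illustrates the idea of grouping events and summing a field (the event values are hypothetical):

```python
from collections import defaultdict

# Hypothetical events; in practice these arrive from upstream transforms.
events = [
    {"user": "a", "bytes": 100},
    {"user": "a", "bytes": 250},
    {"user": "b", "bytes": 50},
]

# Group by "user" and sum "bytes", mirroring a sum reduction strategy.
totals = defaultdict(int)
for event in events:
    totals[event["user"]] += event["bytes"]

print(dict(totals))  # {'a': 350, 'b': 50}
```

Six input events collapse into two aggregated results, one per user.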

Usage

Select the Reduce transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform is bypassed entirely, and events pass through without modification.

  • Add Filter Conditions: Defaults to disabled. When enabled, incoming events are filtered by the specified conditions. Only events that evaluate to true are processed; all others bypass this transform. Conditions are combined with AND/OR logic using the "+Rule" and "+Group" buttons.

Reduce: Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to prevent this transform from processing events and feeding data to downstream transforms.

Max Events: Maximum number of events to group together before triggering a flush. If the number of events in a transaction exceeds this value, the current transaction will be flushed. Defaults to 100.

Batch Flush Timeout (seconds): Maximum amount of time in seconds to wait before triggering a flush. If this time elapses before Max Events is reached, the current transaction will be flushed. Defaults to 60.
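The two flush triggers above can be sketched as a simple predicate. This is an illustration only, not the product's implementation; `should_flush` and its parameters are hypothetical names:

```python
import time

MAX_EVENTS = 100          # Max Events setting
FLUSH_TIMEOUT = 60.0      # Batch Flush Timeout (seconds)

def should_flush(event_count, window_started_at, now=None):
    """Return True when either flush trigger described above fires."""
    now = time.monotonic() if now is None else now
    return event_count >= MAX_EVENTS or (now - window_started_at) >= FLUSH_TIMEOUT

# The count trigger fires at 100 events even if the timeout has not elapsed.
assert should_flush(100, window_started_at=0.0, now=1.0)
# The timeout trigger fires after 60 s even with a single buffered event.
assert should_flush(1, window_started_at=0.0, now=61.0)
# Neither trigger has fired: keep buffering.
assert not should_flush(5, window_started_at=0.0, now=10.0)
```

Whichever trigger fires first flushes the current transaction.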

Reduce Conditions: Defaults to empty. When set, incoming events are filtered by the specified conditions. Only events that evaluate to true are processed; all others bypass this transform. Conditions are combined with AND/OR logic using the "+Rule" and "+Group" buttons.

Group By: An ordered list of fields based on which events should be grouped. Events with matching values for the specified keys are grouped together, enabling independent reduction of each group.

Note on Group By: This separation of event streams facilitates the isolation of distinct data sets. When no fields are specified, all events are consolidated into a single group. For example, if "group_by" is set to ["host", "region"], events sharing the same values for both the "host" and "region" fields will be grouped for reduction.

Reduction Methods Rules: A list of field names and their corresponding custom reduction strategies, defining the set of event fields to evaluate and add/set. Click the Add button to add a new field as a key-value pair, with the following inputs:

  • Field Name: The name of the field whose value is being aggregated.

  • Reduction Strategy: The method used to aggregate the values of the field. For numeric fields, use strategies such as Sum values, Keep maximum value, or Keep minimum value. For string fields, use strategies such as the concatenation options, Keep first value, or Keep last value.

     Available options:

       • Append value to array
       • Append value to array and squash
       • Concatenate strings with space separator
       • Concatenate strings with newline separator
       • Concatenate and squash strings with newline separator
       • Concatenate strings without separator
       • Keep first value
       • Keep last value
       • Flattened array of unique values
       • Keep maximum value
       • Keep minimum value
       • Sum values

For each designated field, the provided strategy is applied to combine events, deviating from the default process.

The default procedure follows these guidelines:

  • For string fields, the initial value is retained, while subsequent values are disregarded.

  • Numeric values are summed.

  • Only one reduction strategy is allowed per field name.

  • Only root-level field names are supported.
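The default procedure above can be illustrated with a short sketch. This is not the product's code; `default_merge` is a hypothetical helper showing "keep first string, sum numerics":

```python
def default_merge(acc, event):
    """Apply the documented defaults: keep the first string value, sum numerics."""
    for field, value in event.items():
        if field not in acc:
            acc[field] = value
        elif isinstance(value, (int, float)) and not isinstance(value, bool):
            acc[field] += value
        # String (and other) fields: the initial value is retained,
        # subsequent values are disregarded.
    return acc

events = [
    {"host": "web-1", "bytes": 100},
    {"host": "web-2", "bytes": 200},
]
reduced = {}
for e in events:
    reduced = default_merge(reduced, e)

print(reduced)  # {'host': 'web-1', 'bytes': 300}
```

The string field keeps its first value ("web-1") while the numeric field is summed.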

Advanced: A list of data types and their corresponding custom reduction strategies. Each entry pairs a Data Type Name with a Reduction Strategy.

  • Data Type Name: String, Timestamp, Integer, Float, Boolean, Map, Array, or null.

  • Reduction Strategy: Any of the strategies listed above, from Append value to array through Sum values.

Examples

Examples require that Enabled is toggled on.

Filtering By Traffic Type

Scenario: Focus only on PAN Traffic Logs, reducing the src_ip, dest_ip, src_port, and dest_port fields when repeated patterns are discovered. Reduce src_ip and dest_ip to a single unique value each, and maintain an array of all src_port and dest_port values.

Reduce Conditions:

  • Condition: AND

  • Label: palo_alto.log_type

  • Label Condition: equals

  • Value: TRAFFIC

Reduce Rules:

  • Max Events: 100

  • Batch Flush Timeout (seconds): 100

  • Group By: src_port, dest_port

Reduction Methods:

  • Rule 1: Field Name: src_ip; Reduction Strategy: Keep first value

  • Rule 2: Field Name: src_port; Reduction Strategy: Append value to array

  • Rule 3: Field Name: dest_ip; Reduction Strategy: Keep first value

  • Rule 4: Field Name: dest_port; Reduction Strategy: Append value to array

Outcome: src_ip and dest_ip are reduced to a single unique value each. An array of all src_port and dest_port values is maintained even when those fields repeat.

PAN Traffic Log (8 similar events reduced into one)

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "cgen",
  "palo_alto": {
    "action": "allow",
    "action_flags": "0x0",
    "action_source": "from-policy",
    "app": "incomplete",
    "bytes": 9259,
    "bytes_in": 0,
    "bytes_out": 9259,
    "dest_interface": "ethernet1/2",
    "dest_ip": "192.168.5.187",
    "dest_location": "10.0.0.0-10.255.255.255",
    "dest_port": ["769", "594", "903", "367", "765", "134", "836", "728"],
    "dest_translated_ip": "192.168.5.187",
    "dest_translated_port": "769",
    "dest_user": "",
    "dest_vm": "",
    "dest_zone": "trusted",
    ...
    "src_interface": "ethernet1/3",
    "src_ip": "172.16.1.67",
    "src_location": "United States",
    "src_port": [
      "57105",
      "58251",
      "52901",
      "52056",
      "52709",
      "51497",
      "54692",
      "59294"
    ],
    "src_translated_ip": "172.16.1.67",
    "src_translated_port": "57105"
    ...
  }
}
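The per-field strategies from this example can be sketched as a dispatch over field names. This is an illustration only; `reduce_events` and the `STRATEGIES` mapping are hypothetical, and the sample events are abbreviated:

```python
# Mirrors the Reduction Methods table: keep first IPs, append ports.
STRATEGIES = {
    "src_ip": "keep_first",
    "dest_ip": "keep_first",
    "src_port": "append",
    "dest_port": "append",
}

def reduce_events(events):
    out = {}
    for event in events:
        for field, value in event.items():
            strategy = STRATEGIES.get(field, "keep_first")
            if field not in out:
                out[field] = [value] if strategy == "append" else value
            elif strategy == "append":
                out[field].append(value)
            # keep_first: subsequent values are discarded.
    return out

events = [
    {"src_ip": "172.16.1.67", "src_port": "57105",
     "dest_ip": "192.168.5.187", "dest_port": "769"},
    {"src_ip": "172.16.1.67", "src_port": "58251",
     "dest_ip": "192.168.5.187", "dest_port": "594"},
]
print(reduce_events(events))
```

As in the log above, the IP fields collapse to single values while the port fields accumulate into arrays.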

Sum Field Values

Scenario: Focus only on PAN Traffic Logs. Sum the bytes, bytes_in, and bytes_out fields, and maintain an array of action values.

Reduce Conditions:

  • Condition: AND

  • Label: palo_alto.log_type

  • Label Condition: equals

  • Value: TRAFFIC

Reduce Rules:

  • Max Events: 100

  • Batch Flush Timeout (seconds): 100

  • Group By: src_port, dest_port, action

Reduction Methods:

  • Rule 1: Field Name: bytes; Reduction Strategy: Sum values

  • Rule 2: Field Name: bytes_in; Reduction Strategy: Sum values

  • Rule 3: Field Name: bytes_out; Reduction Strategy: Sum values

  • Rule 4: Field Name: action; Reduction Strategy: Append value to array

Outcome: The bytes, bytes_in, and bytes_out fields are summed across the 8 events. The bytes_in field remains zero because every source event had a bytes_in of 0. An array of action values is maintained even when the field repeats.

PAN Traffic Log (8 similar events summed into one)

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "pan_syslog-1",
  "palo_alto": {
    "action": [
      "allow",
      "allow",
      "allow",
      "allow",
      "allow",
      "allow",
      "allow",
      "allow"
    ],
    "action_flags": "0x0",
    "action_source": "from-policy",
    "app": "incomplete",
    "bytes": 36059,
    "bytes_in": 0,
    "bytes_out": 36059,
    "dest_interface": "ethernet1/2",
    "dest_ip": "192.168.5.187",
    "dest_location": "10.0.0.0-10.255.255.255",
    "dest_port": "769"
    ...
  }
}
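The summing behavior from this example can be sketched as follows. The byte values here are illustrative, not the figures from the log above, and `reduce_sum` is a hypothetical helper:

```python
SUM_FIELDS = ("bytes", "bytes_in", "bytes_out")

def reduce_sum(events):
    """Sum the byte fields and append each action, per the rules above."""
    out = {"bytes": 0, "bytes_in": 0, "bytes_out": 0, "action": []}
    for e in events:
        for field in SUM_FIELDS:
            out[field] += e[field]
        out["action"].append(e["action"])
    return out

events = [
    {"bytes": 100, "bytes_in": 0, "bytes_out": 100, "action": "allow"},
    {"bytes": 250, "bytes_in": 0, "bytes_out": 250, "action": "allow"},
    {"bytes": 75,  "bytes_in": 0, "bytes_out": 75,  "action": "allow"},
]
print(reduce_sum(events))
# {'bytes': 425, 'bytes_in': 0, 'bytes_out': 425, 'action': ['allow', 'allow', 'allow']}
```

As in the log above, bytes_in stays zero because every input event contributed zero.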

Behavior

  • Group By: If no group-by fields are specified, all events are aggregated together.

  • Aggregations: Supported strategies include summing, keeping minimum or maximum values, keeping first or last values, concatenation, and appending to arrays.

  • Flush Window: Events are aggregated until either Max Events or the Batch Flush Timeout triggers a flush; all matching events within a flush window are reduced together.

Limitations

  • Aggregating large volumes of data can impact performance.

  • The Reduce function operates within the constraints of the pipeline’s processing capacity.

Best Practices

  1. Use Centralized Logging

  • Why: Centralized logging simplifies management and analysis.

  • Best Practice: Aggregate logs from multiple sources into a single system such as ELK Stack or Splunk.

  • Example: Send logs from servers, applications, and network devices to a centralized log management platform.

  2. Normalize Log Formats

  • Why: Inconsistent log formats make aggregation and analysis difficult.

  • Best Practice: Standardize log formats across all sources such as JSON or key-value pairs.

  • Example: Convert all logs to a common schema:

     {
       "timestamp": "2023-10-01T12:00:00Z",
       "level": "ERROR",
       "message": "Database connection failed"
     }

  3. Aggregate by Common Fields

  • Why: Aggregating by common fields (such as timestamp or error type) reduces redundancy and improves analysis.

  • Best Practice: Group logs by shared attributes, such as counting errors by type or summarizing logs by time intervals.

  • Example: Aggregate all HTTP 404 errors into a single metric: HTTP 404 errors: 1000.
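Counting errors by type, as in the example above, can be sketched with a counter over hypothetical parsed log events:

```python
from collections import Counter

# Hypothetical parsed log events.
logs = [
    {"status": 404, "path": "/a"},
    {"status": 404, "path": "/b"},
    {"status": 500, "path": "/c"},
]

# Collapse individual error logs into per-type counts.
error_counts = Counter(f"HTTP {log['status']}" for log in logs)
print(error_counts["HTTP 404"])  # 2
```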

  4. Use Metrics for High-Volume Logs

  • Why: High-volume logs can overwhelm systems and increase costs.

  • Best Practice: Convert repetitive logs into metrics such as count, average, or sum.

  • Example: Instead of logging each API request, log the total number of requests per minute.
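The requests-per-minute example above can be sketched by truncating each timestamp to minute precision and counting (the timestamps are hypothetical):

```python
from collections import Counter

# Hypothetical request timestamps; instead of logging each request,
# emit one count per minute.
requests = [
    "2023-10-01T12:00:05Z",
    "2023-10-01T12:00:40Z",
    "2023-10-01T12:01:10Z",
]

# Truncate ISO 8601 timestamps to YYYY-MM-DDTHH:MM and count per minute.
per_minute = Counter(ts[:16] for ts in requests)
print(dict(per_minute))  # {'2023-10-01T12:00': 2, '2023-10-01T12:01': 1}
```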

  5. Implement Real-Time Aggregation

  • Why: Real-time aggregation provides immediate insights and reduces storage needs.

  • Best Practice: Use streaming platforms such as Apache Kafka, AWS Kinesis to aggregate logs in real time.

  • Example: Aggregate logs by severity level in real time to trigger alerts for critical issues.

  6. Archive and Summarize Historical Data

  • Why: Storing raw logs indefinitely is costly and inefficient.

  • Best Practice: Summarize and archive historical logs such as daily or weekly summaries.

  • Example: Archive raw logs after 30 days but retain aggregated summaries such as error counts by day.

Related Transforms

  • Filter Event: Apply conditions to filter data before or after removing fields.

  • Aggregate Metrics: Aggregate multiple metrics into a single metric based on a set of conditions.

  • Explode: Transform a single event containing an array into multiple events.
