Filter Events

The Filter Events function in Observo AI allows you to selectively include or exclude events from your data pipeline based on defined conditions. This helps reduce noise, improve data quality, and optimize storage and processing costs.

Purpose

Use the Filter Events function to control which events flow through your pipeline by:

  • Removing unwanted data such as debug logs, health checks, or test environment events

  • Isolating specific events for targeted processing, routing, or analysis

  • Reducing data volume to lower ingestion and storage costs

  • Improving data quality by filtering out noisy or irrelevant events

  • Meeting compliance requirements by excluding sensitive or regulated data

How It Works

The Filter Events function evaluates each incoming event against your defined conditions:

  1. Condition Evaluation: Each event is tested against your filter conditions

  2. Drop Data Setting: Determines whether matching events are kept or removed

    • Drop Data OFF (default): Events that match conditions pass through; others are blocked

    • Drop Data ON: Events that match conditions are dropped; others pass through

  3. Event Processing: Allowed events continue to downstream transforms and destinations
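The interaction between condition matching and the Drop Data setting can be sketched in a few lines of Python. This is an illustration of the semantics described above, not Observo's implementation; `filter_event` and the lambda predicates are hypothetical helpers.

```python
def filter_event(event, matches_conditions, drop_data=False):
    """Return True if the event should pass through the filter.

    drop_data=False (include mode): only matching events pass.
    drop_data=True  (exclude mode): matching events are dropped.
    """
    matched = matches_conditions(event)
    return not matched if drop_data else matched

# Include mode: keep only critical logs
is_critical = lambda e: e.get("level") == "critical"
print(filter_event({"level": "critical"}, is_critical))            # True: passes
print(filter_event({"level": "info"}, is_critical))                # False: blocked

# Exclude mode: drop debug logs
is_debug = lambda e: e.get("level") == "debug"
print(filter_event({"level": "debug"}, is_debug, drop_data=True))  # False: dropped
```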

Configuration

To configure the Filter Events function:

  1. Select Filter Events transform from the function library

  2. Add a Name (required) and Description (optional)

  3. Configure the filter settings:

Filter Settings

Enabled

  • Default: ON (enabled)

  • Purpose: Controls whether the filter actively evaluates events

  • When ON: All events are evaluated against filter conditions

  • When OFF: All events bypass this transform without evaluation

Drop Data

  • Default: OFF (disabled)

  • Purpose: Determines whether matching events are kept or removed

  • When OFF: Events matching conditions pass through (include mode)

  • When ON: Events matching conditions are dropped (exclude mode)

Filter Conditions

  • Default: Empty

  • Purpose: Define the logic that determines which events are affected

  • Options: Build conditions using:

    • +Rule: Add a single condition (field, operator, value)

    • +Group: Add a nested group of conditions with AND/OR logic

  • Operators: Select from the list of available operators; the options shown depend on the field type.
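The AND/OR semantics of rules and nested groups can be sketched as follows. The tree representation, the `eval_node` helper, and the operator names in `OPS` are hypothetical; they illustrate how +Rule and +Group combine, not Observo's internal format.

```python
import re

# A +Rule is a (field, operator, value) tuple; a +Group is a dict that
# combines child nodes with AND/OR logic. Groups may nest arbitrarily.
OPS = {
    "equals":        lambda actual, expected: actual == expected,
    "contains":      lambda actual, expected: expected in (actual or ""),
    "matches regex": lambda actual, expected: re.search(expected, actual or "") is not None,
}

def eval_node(event, node):
    if isinstance(node, dict):          # +Group: combine children with AND/OR
        results = (eval_node(event, child) for child in node["children"])
        return all(results) if node["logic"] == "AND" else any(results)
    field, op, value = node             # +Rule: single condition
    return OPS[op](event.get(field), value)

# (level equals "error") OR (level equals "warn" AND source contains "db")
group = {"logic": "OR", "children": [
    ("level", "equals", "error"),
    {"logic": "AND", "children": [("level", "equals", "warn"),
                                  ("source", "contains", "db")]},
]}
print(eval_node({"level": "warn", "source": "db-primary"}, group))  # True
```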

Usage

Include Mode (Drop Data OFF)

Use this mode when you want to keep only specific events.

The filter acts as a whitelist: only events matching your conditions pass through.

Common Use Cases:

  • Keep only error or critical severity logs

  • Process events from specific sources or applications

  • Include only production environment data

  • Filter events within a specific time range or value threshold

Example: Keep only critical severity events

Drop Data: OFF
Conditions: log.level equals "critical"
Result: Only critical logs pass through; all other severity levels are blocked

Exclude Mode (Drop Data ON)

Use this mode when you want to remove specific events.

The filter acts as a blacklist: events matching your conditions are dropped.

Common Use Cases:

  • Remove debug or verbose logging

  • Exclude health check or monitoring probe events

  • Drop test environment data

  • Filter out events with null or empty values

Example: Remove debug logs

Drop Data: ON
Conditions: log.level equals "debug"
Result: Debug logs are dropped; all other severity levels pass through

Examples

Example 1: Drop Events with Specific Pattern (Single Condition)

Scenario: Drop events where the log.message field starts with "Bad" and contains the word "entry".

Configuration:

Setting     Value
Enabled     ON
Drop Data   ON

Filter Conditions:

Operator   Field         Condition       Value
OR         log.message   matches regex   ^Bad.+entry.+$

Result: Any log.message that starts with "Bad" and contains "entry" is dropped from the pipeline.
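The regex condition above can be verified against sample messages with Python's `re` module (a hedged illustration; the pattern is the one from the example, the sample messages are invented):

```python
import re

# The "matches regex" condition from Example 1.
pattern = re.compile(r"^Bad.+entry.+$")

messages = [
    "Bad request entry received",  # starts with "Bad", contains "entry" -> dropped
    "Bad request received",        # no "entry" -> kept
    "Found bad entry in log",      # does not start with "Bad" -> kept
]
dropped = [m for m in messages if pattern.match(m)]
print(dropped)  # ['Bad request entry received']
```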

Example 2: Drop Events with Multiple Conditions (AND Logic)

Sample Log Event:

{
  "log": {
    "level": "info",
    "message": "Request processed successfully",
    "source": "api"
  }
}

Scenario: Drop events that are both "info" level AND from "api" source.

Configuration:

Setting     Value
Enabled     ON
Drop Data   ON

Filter Conditions:

Operator   Field        Condition   Value
AND        log.level    equals      info
AND        log.source   contains    api

Result: Events with log.level = "info" AND log.source containing "api" are dropped. Both conditions must be true for the event to be dropped.
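The AND logic above can be sketched as a single predicate (hypothetical helper; `should_drop` mirrors the two conditions, it is not part of the product):

```python
def should_drop(event):
    # Both conditions must hold (AND): level equals "info" AND source contains "api"
    log = event.get("log", {})
    return log.get("level") == "info" and "api" in log.get("source", "")

event = {"log": {"level": "info",
                 "message": "Request processed successfully",
                 "source": "api"}}
print(should_drop(event))                                         # True: dropped
print(should_drop({"log": {"level": "error", "source": "api"}}))  # False: only one condition holds
```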

Example 3: Separate PAN Traffic & Threat Logs

Scenario: You have a mixed stream of Palo Alto Networks (PAN) logs containing both Traffic and Threat events. You want to route them separately for different processing and destinations.

Solution: Create two Filter Events transforms in your pipeline:

Transform 1: Get Threat Events

Configuration:

Setting     Value
Enabled     ON
Drop Data   OFF

Filter Conditions:

Operator   Field                Condition   Value
AND        palo_alto.log_type   equals      THREAT

Sample Output:

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "cgen",
  "palo_alto": {
    "future_use1": "1",
    "log_subtype": "end",
    "log_type": "THREAT",
    "receive_time": "2025/02/20 16:42:00",
    "serial_number": "007051000113358",
    "version": "0"
  },
  "severity": "alert",
  "source_ip": "192.168.3.48",
  "timestamp": "2025-02-20T16:42:00.735Z"
}

Transform 2: Get Traffic Events

Configuration:

Setting     Value
Enabled     ON
Drop Data   OFF

Filter Conditions:

Operator   Field                Condition   Value
AND        palo_alto.log_type   equals      TRAFFIC

Sample Output:

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "cgen",
  "palo_alto": {
    "Action": "allow",
    "log_type": "TRAFFIC",
    "packets": "42",
    "packets_in": "0"
  },
  "severity": "alert",
  "source_ip": "192.168.3.48",
  "timestamp": "2025-02-20T16:41:53.851Z"
}
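The effect of the two include-mode filters can be sketched as a split over the mixed stream (hypothetical helper `split_pan_logs`; in the product, each filter is a separate transform on its own pipeline branch):

```python
# Each branch runs in include mode (Drop Data OFF) against the same stream.
def split_pan_logs(events):
    threat, traffic = [], []
    for e in events:
        log_type = e.get("palo_alto", {}).get("log_type")
        if log_type == "THREAT":
            threat.append(e)    # -> e.g. Splunk
        elif log_type == "TRAFFIC":
            traffic.append(e)   # -> e.g. S3
    return threat, traffic

mixed = [{"palo_alto": {"log_type": "THREAT"}},
         {"palo_alto": {"log_type": "TRAFFIC"}},
         {"palo_alto": {"log_type": "THREAT"}}]
threat, traffic = split_pan_logs(mixed)
print(len(threat), len(traffic))  # 2 1
```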

Benefits of Separation:

  • Apply custom transformations specific to each log type (enrichment, reduction, sampling)

  • Route events to different destinations (e.g., Splunk for Threat, S3 for Traffic)

  • Support compliance and retention policies based on log category

  • Improve performance and data fidelity

Best Practices

1. Test Before Enabling in Production

Always test your filter conditions in a development or staging environment before deploying to production. This prevents accidental data loss from overly aggressive filters.

2. Start with Include Mode (Drop Data OFF)

When defining new filters, start with Drop Data OFF to explicitly specify what you want to keep.

3. Use Specific Conditions

Be as specific as possible in your filter conditions to avoid unintended consequences:

  • Avoid overly broad filters (e.g., dropping all "info" logs without considering source)

  • Combine multiple conditions using AND logic for precision

4. Combine with Other Optimization Functions

Maximize cost savings and efficiency by combining Filter Events with complementary functions:

Sample Function

Use Sample Function to retain only a percentage of high-volume log streams while maintaining visibility for analysis.

Remove Fields Function

Use the Remove Fields function to drop unnecessary fields from events, reducing log size and ingestion costs.

Dedupe Function

Use Dedupe Function to eliminate duplicate events based on specified fields, improving storage efficiency.

5. Use Regex Carefully

While regex patterns are powerful, they can impact performance:

  • Keep regex patterns simple and efficient

  • Test regex performance with expected data volumes

  • Consider using simpler operators (equals, contains) when possible

Limitations

Data Recovery

Dropped events cannot be recovered once they are removed from the pipeline. Always ensure your filter conditions are correct before enabling in production.

Performance Impact

Complex filter conditions may impact pipeline performance, especially:

  • Deeply nested condition groups

  • Multiple regex pattern matching

  • High-cardinality field evaluations

  • Very large numbers of OR conditions

Field Availability

Filter conditions can only reference fields that exist in events at the time of filtering:

  • Cannot filter on fields created by downstream transforms

  • Field names must exactly match (case-sensitive)

  • Nested fields require proper path notation
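These constraints follow from how a dotted path such as log.level is resolved against the event at filter time. A minimal sketch (hypothetical resolver, not Observo's implementation):

```python
# Resolve a dotted field path against a nested event. Field names are
# case-sensitive, and a missing segment means the condition cannot match.
def get_field(event, path):
    value = event
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # field does not exist at filter time
        value = value[part]
    return value

event = {"log": {"level": "critical"}}
print(get_field(event, "log.level"))  # critical
print(get_field(event, "log.Level"))  # None: case mismatch
print(get_field(event, "level"))      # None: missing path prefix
```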

Troubleshooting

Problem: No Events Passing Through

Solutions:

  • Temporarily disable the filter (toggle Enabled OFF) to verify upstream data flow

  • Review sample events to confirm field names and values

  • Check the Drop Data setting matches your intent (include vs. exclude mode)

  • Simplify conditions to test one at a time

  • Use the pipeline preview or test mode to see which conditions are matching

Problem: Unexpected Events Being Dropped

Solutions:

  • Review the exact operators used (equals vs. contains, etc.)

  • Test conditions against known event samples

  • Add null checks for optional fields

  • Verify data types match expected values

  • Check for typos in field names or values

Problem: Filter Not Evaluating

Solutions:

  • Verify Enabled toggle is ON

  • Check that filter conditions are properly configured and saved

  • Review pipeline flow to ensure transform is in correct position

  • Check for any configuration errors or validation warnings

Problem: Performance Degradation

Solutions:

  • Simplify condition logic where possible

  • Reduce the use of regex; use simpler operators when possible

  • Move filter earlier in pipeline to reduce downstream processing

  • Optimize regex patterns for performance

  • Consider splitting complex filters into multiple stages
