Filter Events

The Filter Events function in Observo AI allows you to selectively include or exclude events from your data pipeline based on defined conditions. This helps reduce noise, improve data quality, and optimize storage and processing costs.

Purpose

Use the Filter Events function to control which events flow through your pipeline by:

  • Removing unwanted data such as debug logs, health checks, or test environment events

  • Isolating specific events for targeted processing, routing, or analysis

  • Reducing data volume to lower ingestion and storage costs

  • Improving data quality by filtering out noisy or irrelevant events

  • Meeting compliance requirements by excluding sensitive or regulated data

How It Works

The Filter Events function evaluates each incoming event against your defined conditions:

  1. Condition Evaluation: Each event is tested against your filter conditions

  2. Drop Data Setting: Determines whether matching events are kept or removed

    • Drop Data OFF (default): Events that match conditions pass through; others are blocked

    • Drop Data ON: Events that match conditions are dropped; others pass through

  3. Event Processing: Allowed events continue to downstream transforms and destinations
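The interaction between condition matching and the Drop Data setting can be sketched in a few lines of Python. This is an illustration of the semantics described above, not Observo's implementation; `filter_event` and the lambda predicates are hypothetical helpers.

```python
def filter_event(event, matches_conditions, drop_data=False):
    """Return True if the event should pass through the filter.

    drop_data=False (include mode): only matching events pass.
    drop_data=True  (exclude mode): matching events are dropped.
    """
    matched = matches_conditions(event)
    return not matched if drop_data else matched

# Include mode: keep only critical logs
is_critical = lambda e: e.get("level") == "critical"
print(filter_event({"level": "critical"}, is_critical))            # True: passes
print(filter_event({"level": "info"}, is_critical))                # False: blocked

# Exclude mode: drop debug logs
is_debug = lambda e: e.get("level") == "debug"
print(filter_event({"level": "debug"}, is_debug, drop_data=True))  # False: dropped
```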

Configuration

To configure the Filter Events function:

  1. Select Filter Events transform from the function library

  2. Add a Name (required) and Description (optional)

  3. Configure the filter settings:

Filter Settings

Enabled

  • Default: ON (enabled)

  • Purpose: Controls whether the filter actively evaluates events

  • When ON: All events are evaluated against filter conditions

  • When OFF: All events bypass this transform without evaluation

Drop Data

  • Default: OFF (disabled)

  • Purpose: Determines whether matching events are kept or removed

  • When OFF: Events matching conditions pass through (include mode)

  • When ON: Events matching conditions are dropped (exclude mode)

Filter Conditions

  • Default: Empty

  • Purpose: Define the logic that determines which events are affected

  • Options: Build conditions using:

    • +Rule: Add a single condition (field, operator, value)

    • +Group: Add a nested group of conditions with AND/OR logic

  • Operators: Select from the list of available operators; the options shown depend on the field type.
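The AND/OR semantics of rules and nested groups can be sketched as follows. The tree representation, the `eval_node` helper, and the operator names in `OPS` are hypothetical; they illustrate how +Rule and +Group combine, not Observo's internal format.

```python
import re

# A +Rule is a (field, operator, value) tuple; a +Group is a dict that
# combines child nodes with AND/OR logic. Groups may nest arbitrarily.
OPS = {
    "equals":        lambda actual, expected: actual == expected,
    "contains":      lambda actual, expected: expected in (actual or ""),
    "matches regex": lambda actual, expected: re.search(expected, actual or "") is not None,
}

def eval_node(event, node):
    if isinstance(node, dict):          # +Group: combine children with AND/OR
        results = (eval_node(event, child) for child in node["children"])
        return all(results) if node["logic"] == "AND" else any(results)
    field, op, value = node             # +Rule: single condition
    return OPS[op](event.get(field), value)

# (level equals "error") OR (level equals "warn" AND source contains "db")
group = {"logic": "OR", "children": [
    ("level", "equals", "error"),
    {"logic": "AND", "children": [("level", "equals", "warn"),
                                  ("source", "contains", "db")]},
]}
print(eval_node({"level": "warn", "source": "db-primary"}, group))  # True
```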

Usage

Include Mode (Drop Data OFF)

Use this mode when you want to keep only specific events.

The filter acts as a whitelist: only events matching your conditions pass through.

Common Use Cases:

  • Keep only error or critical severity logs

  • Process events from specific sources or applications

  • Include only production environment data

  • Filter events within a specific time range or value threshold

Example: Keep only critical severity events

Drop Data: OFF
Conditions: log.level equals "critical"
Result: Only critical logs pass through; all other severity levels are blocked

Exclude Mode (Drop Data ON)

Use this mode when you want to remove specific events.

The filter acts as a blacklist: events matching your conditions are dropped.

Common Use Cases:

  • Remove debug or verbose logging

  • Exclude health check or monitoring probe events

  • Drop test environment data

  • Filter out events with null or empty values

Example: Remove debug logs

Drop Data: ON
Conditions: log.level equals "debug"
Result: Debug logs are dropped; all other severity levels pass through

Examples

Example 1: Drop Events with Specific Pattern (Single Condition)

Scenario: Drop events where the log.message field starts with "Bad" and contains the word "entry".

Configuration:

Setting     Value
Enabled     ON
Drop Data   ON

Filter Conditions:

Operator   Field         Condition       Value
OR         log.message   matches regex   ^Bad.+entry.+$

Result: Any log.message that starts with "Bad" and contains "entry" is dropped from the pipeline.
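The regex condition above can be verified against sample messages with Python's `re` module (a hedged illustration; the pattern is the one from the example, the sample messages are invented):

```python
import re

# The "matches regex" condition from Example 1.
pattern = re.compile(r"^Bad.+entry.+$")

messages = [
    "Bad request entry received",  # starts with "Bad", contains "entry" -> dropped
    "Bad request received",        # no "entry" -> kept
    "Found bad entry in log",      # does not start with "Bad" -> kept
]
dropped = [m for m in messages if pattern.match(m)]
print(dropped)  # ['Bad request entry received']
```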

Example 2: Drop Events with Multiple Conditions (AND Logic)

Sample Log Event:

{
  "log": {
    "level": "info",
    "message": "Request processed successfully",
    "source": "api"
  }
}

Scenario: Drop events that are both "info" level AND from "api" source.

Configuration:

Setting     Value
Enabled     ON
Drop Data   ON

Filter Conditions:

Operator   Field        Condition   Value
AND        log.level    equals      info
AND        log.source   contains    api

Result: Events with log.level = "info" AND log.source containing "api" are dropped. Both conditions must be true for the event to be dropped.
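The AND logic above can be sketched as a single predicate (hypothetical helper; `should_drop` mirrors the two conditions, it is not part of the product):

```python
def should_drop(event):
    # Both conditions must hold (AND): level equals "info" AND source contains "api"
    log = event.get("log", {})
    return log.get("level") == "info" and "api" in log.get("source", "")

event = {"log": {"level": "info",
                 "message": "Request processed successfully",
                 "source": "api"}}
print(should_drop(event))                                         # True: dropped
print(should_drop({"log": {"level": "error", "source": "api"}}))  # False: only one condition holds
```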

Example 3: Separate PAN Traffic & Threat Logs

Scenario: You have a mixed stream of Palo Alto Networks (PAN) logs containing both Traffic and Threat events. You want to route them separately for different processing and destinations.

Solution: Create two Filter Events transforms in your pipeline:

Transform 1: Get Threat Events

Configuration:

Setting     Value
Enabled     ON
Drop Data   OFF

Filter Conditions:

Operator   Field                Condition   Value
AND        palo_alto.log_type   equals      THREAT

Sample Output:

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "cgen",
  "palo_alto": {
    "future_use1": "1",
    "log_subtype": "end",
    "log_type": "THREAT",
    "receive_time": "2025/02/20 16:42:00",
    "serial_number": "007051000113358",
    "version": "0"
  },
  "severity": "alert",
  "source_ip": "192.168.3.48",
  "timestamp": "2025-02-20T16:42:00.735Z"
}

Transform 2: Get Traffic Events

Configuration:

Setting     Value
Enabled     ON
Drop Data   OFF

Filter Conditions:

Operator   Field                Condition   Value
AND        palo_alto.log_type   equals      TRAFFIC

Sample Output:

{
  "appname": "pan",
  "facility": "lpr",
  "hostname": "cgen",
  "palo_alto": {
    "Action": "allow",
    "log_type": "TRAFFIC",
    "packets": "42",
    "packets_in": "0"
  },
  "severity": "alert",
  "source_ip": "192.168.3.48",
  "timestamp": "2025-02-20T16:41:53.851Z"
}
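The effect of the two include-mode filters can be sketched as a split over the mixed stream (hypothetical helper `split_pan_logs`; in the product, each filter is a separate transform on its own pipeline branch):

```python
# Each branch runs in include mode (Drop Data OFF) against the same stream.
def split_pan_logs(events):
    threat, traffic = [], []
    for e in events:
        log_type = e.get("palo_alto", {}).get("log_type")
        if log_type == "THREAT":
            threat.append(e)    # -> e.g. Splunk
        elif log_type == "TRAFFIC":
            traffic.append(e)   # -> e.g. S3
    return threat, traffic

mixed = [{"palo_alto": {"log_type": "THREAT"}},
         {"palo_alto": {"log_type": "TRAFFIC"}},
         {"palo_alto": {"log_type": "THREAT"}}]
threat, traffic = split_pan_logs(mixed)
print(len(threat), len(traffic))  # 2 1
```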

Benefits of Separation:

  • Apply custom transformations specific to each log type (enrichment, reduction, sampling)

  • Route events to different destinations (e.g., Splunk for Threat, S3 for Traffic)

  • Support compliance and retention policies based on log category

  • Improve performance and data fidelity

Best Practices

1. Test Before Enabling in Production

Always test your filter conditions in a development or staging environment before deploying to production. This prevents accidental data loss from overly aggressive filters.

2. Start with Include Mode (Drop Data OFF)

When defining new filters, start with Drop Data OFF to explicitly specify what you want to keep.

3. Use Specific Conditions

Be as specific as possible in your filter conditions to avoid unintended consequences:

  • Avoid overly broad filters (e.g., dropping all "info" logs without considering source)

  • Combine multiple conditions using AND logic for precision

4. Combine with Other Optimization Functions

Maximize cost savings and efficiency by combining Filter Events with complementary functions:

Sample Function

Use Sample Function to retain only a percentage of high-volume log streams while maintaining visibility for analysis.

Remove Fields Function

Use the Remove Fields function to drop unnecessary fields from events, reducing log size and ingestion costs.

Dedupe Function

Use Dedupe Function to eliminate duplicate events based on specified fields, improving storage efficiency.

5. Use Regex Carefully

While regex patterns are powerful, they can impact performance:

  • Keep regex patterns simple and efficient

  • Test regex performance with expected data volumes

  • Consider using simpler operators (equals, contains) when possible

Limitations

Data Recovery

Dropped events cannot be recovered once they are removed from the pipeline. Always ensure your filter conditions are correct before enabling in production.

Performance Impact

Complex filter conditions may impact pipeline performance, especially:

  • Deeply nested condition groups

  • Multiple regex pattern matching

  • High-cardinality field evaluations

  • Very large numbers of OR conditions

Field Availability

Filter conditions can only reference fields that exist in events at the time of filtering:

  • Cannot filter on fields created by downstream transforms

  • Field names must exactly match (case-sensitive)

  • Nested fields require proper path notation
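These constraints follow from how a dotted path such as log.level is resolved against the event at filter time. A minimal sketch (hypothetical resolver, not Observo's implementation):

```python
# Resolve a dotted field path against a nested event. Field names are
# case-sensitive, and a missing segment means the condition cannot match.
def get_field(event, path):
    value = event
    for part in path.split("."):
        if not isinstance(value, dict) or part not in value:
            return None  # field does not exist at filter time
        value = value[part]
    return value

event = {"log": {"level": "critical"}}
print(get_field(event, "log.level"))  # critical
print(get_field(event, "log.Level"))  # None: case mismatch
print(get_field(event, "level"))      # None: missing path prefix
```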

Troubleshooting

Problem: No Events Passing Through

Solutions:

  • Temporarily disable the filter (toggle Enabled OFF) to verify upstream data flow

  • Review sample events to confirm field names and values

  • Check the Drop Data setting matches your intent (include vs. exclude mode)

  • Simplify conditions to test one at a time

  • Use the pipeline preview or test mode to see which conditions are matching

Problem: Unexpected Events Being Dropped

Solutions:

  • Review the exact operators used (equals vs. contains, etc.)

  • Test conditions against known event samples

  • Add null checks for optional fields

  • Verify data types match expected values

  • Check for typos in field names or values

Problem: Filter Not Evaluating

Solutions:

  • Verify Enabled toggle is ON

  • Check that filter conditions are properly configured and saved

  • Review pipeline flow to ensure transform is in correct position

  • Check for any configuration errors or validation warnings

Problem: Performance Degradation

Solutions:

  • Simplify condition logic where possible

  • Reduce the use of regex; use simpler operators when possible

  • Move filter earlier in pipeline to reduce downstream processing

  • Optimize regex patterns for performance

  • Consider splitting complex filters into multiple stages
