Pattern Extractor

The Observo AI data pipeline features an intelligent module that rapidly extracts pattern clusters from streaming data. In the context of log data, pattern mining involves identifying recurring sequences or structures in logs generated by software systems, networks, or applications—key elements that help understand system behavior, diagnose issues, and enhance overall reliability.

Purpose

Observo AI Pattern Extractor leverages memory-efficient algorithms to process petabyte-scale log data in real time, using machine learning to identify and apply patterns that transform raw streams into structured insights. By condensing recurring event sequences into compact representations, the system reduces data volume while grouping similar log entries to simplify analysis. These pattern clusters establish baselines of normal behavior, enabling dynamic anomaly detection when deviations occur. At the same time, detailed logs are abstracted into higher-level behaviors and trends, allowing analysts to focus on system dynamics rather than sifting through individual entries.

Pattern Extractor

The Observo AI Pattern Extractor is a multi-faceted process that combines three integrated components to transform raw log data into actionable insights:

  • Log Metadata Enricher Configurations: This initial step enriches incoming log data with contextual metadata—such as timestamps, log levels, host details, and event identifiers—based on predefined or dynamic configurations. By tagging logs with this additional context, the system sets the stage for more accurate pattern recognition.

  • Pattern Extractor: Leveraging memory-efficient algorithms, the Pattern Extractor processes the enriched logs in real time to identify recurring patterns and group similar events. This step not only condenses multiple instances of similar events into a single representation, reducing data dimensionality, but also establishes a baseline of normal behavior that is critical for detecting anomalies.

  • Enricher Sentiment Analyzer: Once patterns are identified, this module applies deep learning–based sentiment analysis to assess the contextual tone of each pattern. By assigning sentiment scores (such as positive, neutral, or negative), it highlights patterns that might indicate potential issues or significant events, helping teams prioritize critical alerts and streamline incident response.

Pattern Extractor Configuration

The Observo AI Pattern Extractor transforms raw log data into actionable insights through intelligent configuration at the source level. The process begins by enriching incoming log data with essential contextual metadata—including timestamps, host details, and event identifiers—using flexible, customizable configurations. This metadata enrichment creates the foundation for accurate and meaningful pattern identification.

How It Works

Using memory-efficient algorithms, the Pattern Extractor processes enriched logs in real time to identify recurring patterns and intelligently group similar events. This real-time analysis enables immediate detection of operational trends, security threats, and system anomalies as they occur.

Configuration Steps

To configure Pattern Extractor for your data source:

  1. Open your Source in edit mode within any pipeline

  2. Navigate to the Pattern Extractor tab

  3. Complete the three required sections:

    • Log Metadata Enricher Configuration: Define which log fields to enrich with metadata

    • Pattern Extractor Enricher: Specify the fields to analyze for pattern identification

    • Sentiment Analyzer: Configure sentiment analysis parameters for detected patterns

Log Metadata Enricher Configuration (LogSchema)

  • Log Payload Paths (Add as needed): Defines a list of paths where the log payload may exist in an event. If more than one path is defined, the first path that exists in the event will be used to parse the log payload. Leave empty if there is no log payload in the event. Example: message.

  • Timestamp Paths (Add as needed): Defines a list of paths where the timestamp may exist in an event. If more than one path is defined, the first path that exists in the event will be used to parse the timestamp. If no path exists in the event or if there are no paths specified by the user, the timestamp will be set by Observo as the current time. Example: time.

  • Host Paths (Add as needed): Defines a list of paths where the host may exist in an event. If more than one path is defined, the first path that exists in the event will be used as the hostname. Leave empty if there is no host in the event. Example: hostname.
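The "first path that exists wins" behavior described above can be sketched as follows. This is a minimal illustration only; the field names and the dotted-path convention are assumptions, not Observo's internal API:

```python
# Sketch of path-list resolution: the first configured path present in the
# event supplies the value; if none match, the value is absent (None).

def resolve_first_path(event: dict, paths: list[str]):
    """Return the value at the first dotted path present in the event, else None."""
    for path in paths:
        node = event
        found = True
        for key in path.split("."):
            if isinstance(node, dict) and key in node:
                node = node[key]
            else:
                found = False
                break
        if found:
            return node
    return None

event = {"message": "disk full", "time": "2025-09-03 11:17:44", "hostname": "web-01"}
payload = resolve_first_path(event, ["log.payload", "message"])  # falls through to "message"
host = resolve_first_path(event, ["host", "hostname"])           # falls through to "hostname"
```

If no Timestamp Path resolves, the caller would fall back to the current time, mirroring the Timestamp Paths behavior described above.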

Pattern Extractor Enricher (Pattern Extractor Configs)

  • Add tags as required. These tags are used for clustering pattern extractions.

  • Examples include kubernetes.labels.app, kubernetes.labels.env, or sentiment, which instruct the Pattern Extractor to perform pattern extraction at that granularity.

  • Keep in mind that the cardinality of these tag tuples should be < 500.

  • Advanced:

    • Truncate log lines bigger than 2KB (False): When set to true, log lines larger than 2KB are truncated before pattern extraction.

Note: The Pattern Extractor Enricher can detect text-based or regex-based patterns.
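The cardinality guidance above can be checked against a sample of events before configuring tags. A minimal sketch, assuming flattened tag keys; the helper names are hypothetical:

```python
# Count distinct tag tuples across a sample of events to verify the
# recommended cardinality limit (< 500) before enabling these tags.

TAGS = ["kubernetes.labels.app", "kubernetes.labels.env"]

def tag_tuple(event: dict, tags: list[str]) -> tuple:
    # Events are assumed flattened: each tag name is a top-level key.
    return tuple(event.get(t, "<missing>") for t in tags)

def tuple_cardinality(events, tags) -> int:
    return len({tag_tuple(e, tags) for e in events})

events = [
    {"kubernetes.labels.app": "connect-api", "kubernetes.labels.env": "prod"},
    {"kubernetes.labels.app": "connect-api", "kubernetes.labels.env": "prod"},
    {"kubernetes.labels.app": "billing", "kubernetes.labels.env": "dev"},
]
cardinality = tuple_cardinality(events, TAGS)  # 2 distinct tuples here
assert cardinality < 500  # within the recommended limit
```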

Sentiment Analyzer (Config)

  • Enabled (False): Enable or disable this config group.

  • Use logpath already set for log locations (False): Enable this if you want the logpath to be used in addition to Locations for raw log.

  • Locations for raw log (Add as needed): Locations where raw log can be found. Leave empty if not required. Example: log.

  • Advanced:

    • Field name which will be populated with the sentiment value. Example: sentiment

    • Negative Words (Add as needed): Define negative words that signify negative sentiment in your log data. The configuration list is fully customizable, allowing you to add additional terms or remove any that aren't relevant to your use case.

    • Negative Regexes (Add as needed): Define regular expression patterns that identify negative sentiment in your log data. The configuration list is fully customizable, allowing you to add additional regex patterns or remove any that aren't relevant to your use case.
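The negative-word and negative-regex lists above drive sentiment tagging. A minimal sketch of how such a classifier might behave; the scoring logic here is an assumption for illustration, not Observo's implementation:

```python
import re

# Tag an event with a sentiment value based on configurable negative words
# and regexes, writing the result to a "sentiment" field as in the examples above.

NEGATIVE_WORDS = {"error", "failed", "denied", "timeout"}
NEGATIVE_REGEXES = [
    re.compile(r"insufficient\s+resources"),
    re.compile(r"status\s+5\d\d"),
]

def tag_sentiment(event: dict, raw_log_field: str = "log") -> dict:
    text = str(event.get(raw_log_field, "")).lower()
    negative = any(w in text.split() for w in NEGATIVE_WORDS) or any(
        rx.search(text) for rx in NEGATIVE_REGEXES
    )
    event["sentiment"] = "negative" if negative else "neutral"
    return event

tag_sentiment({"log": "Query failed due to insufficient resources"})
```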

Pattern Generation Volume Requirement

The Pattern Extractor requires a minimum of 40,000 log entries to initially generate and display patterns in the Data Insights Dashboard. This baseline volume is necessary to capture sufficient variability in the data for reliable pattern extraction.

Timeline by Log Ingestion Rate:

  • High volume (≥1,000 logs/sec): Patterns generated within a few minutes

  • Medium volume (100–500 logs/sec): Patterns generated in 2–8 minutes

  • Low volume (10–50 logs/sec): Patterns generated in 15 minutes to over 1 hour

  • Very low volume (<10 logs/sec): Patterns generated in several hours or longer
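These timelines follow directly from the 40,000-entry threshold. A rough estimate, ignoring processing overhead:

```python
# Back-of-the-envelope time to reach the 40,000-entry pattern threshold
# at a given ingestion rate; real latency also depends on processing overhead.

MIN_ENTRIES = 40_000

def minutes_to_first_patterns(logs_per_sec: float) -> float:
    return MIN_ENTRIES / logs_per_sec / 60

print(round(minutes_to_first_patterns(1000), 1))  # 0.7 min at high volume
print(round(minutes_to_first_patterns(100), 1))   # 6.7 min at medium volume
print(round(minutes_to_first_patterns(10), 1))    # 66.7 min at low volume
```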

Examples

The following examples demonstrate specific implementation scenarios for different log types, showing how to optimize Pattern Extractor settings for maximum insight generation:

  • Network Log Example - Security-focused network traffic analysis

  • Windows Standard Log Example - Windows event log pattern identification

  • Kubernetes Log Example - Kubernetes application log pattern extraction

Each example provides detailed field mappings and configuration settings tailored to extract the most valuable patterns from that specific log type.

Network Log Example

In this example, Log Payload Paths in the Log Metadata Enricher Configurations is set to the parsed threat_category field, while Pattern Extractor Configs in the Pattern Extractor Enricher is configured with sentiment, threat_category, threat_severity, threat_sub_name, and message. The Sentiment Analyzer is enabled, the Use logpath already set for log locations option is turned on, and Locations for raw log is set to threat_category and threat_severity.

Network Log Entry

{
  "action": "BLOCK",
  "application": "SSH",
  "application_category": "Remote Access",
  "bytes_received": "2388",
  "bytes_sent": "1167",
  "dest_ip": "91.132.75.115",
  "dest_port": "22",
  "device": "MACBOOK-15",
  "hostname": "zscaler-nss-fw",
  "location": "Berlin",
  "location_type": "Office",
  "nss_sub_type": "FIREWALL",
  "protocol": "TCP",
  "service": "observo.ai",
  "src_ip": "10.151.180.212",
  "src_port": "12662",
  "syslog_priority": "134",
  "syslog_version": "1",
  "threat_category": "Reconnaissance",
  "threat_name": "Port Scan",
  "threat_severity": "Medium",
  "timestamp": "2025-09-03 11:17:44",
  "user": "asmith",
  "user_domain": "example.com",
  "vendor_id": "32473",
  "zscaler_tag": "NSS-FW"
}

Log Metadata Enricher Configurations (LogSchema)

  • Log Payload Paths: threat_category

Pattern Extractor Enricher (Pattern Extractor Configs)

  • Pattern Extractor Configs: sentiment, threat_category, threat_severity, threat_sub_name, message

Sentiment Analyzer (Config)

  • Enabled: Toggled on

  • Use logpath already set for log locations: Toggled on

  • Locations for raw log: threat_category, threat_severity

The results of this data pipeline telemetry analysis are reflected in the Data Insights Dashboard. See the Data Insights Dashboard section for more details about each panel.

The resulting output is reflected in the accompanying panels:

  • Tags Trends for Patterns: Tracks trends in tagged log data to identify recurring behaviors. These recurring behaviors reflect all pattern tags defined in the Pattern Extractor Enricher (Pattern Extractor Configs) section and can be selected by date and time.

  • Patterns Trend: Analyzes recurring log patterns over time to detect operational shifts. All pattern tags are reflected in the Pattern Insights tab and can be selected by date and time.

Windows Standard Log Example

In this example, Log Payload Paths in the Log Metadata Enricher Configurations is set to the parsed Message field, while Pattern Extractor Configs in the Pattern Extractor Enricher is configured with sentiment, Message, EventCode, Keywords, user, and tail_data. The Sentiment Analyzer is enabled, the Use logpath already set for log locations option is turned on, and Locations for raw log is set to Keywords, tail_data, and Message.

Windows Standard Log Entry

{
  "EventCode": "4624",
  "EventCount": 1,
  "EventType": "0",
  "Keywords": "Audit Success",
  "LogName": "Security",
  "Message": "An account was successfully logged on.",
  "OpCode": "Info",
  "SourceName": "Microsoft Windows security auditing.",
  "Type": "Information",
  "authentication_keylength": "0",
  "authentication_method": "Negotiate",
  "category": "Logon",
  "dest": "sh-windataserver",
  "event_id": "14018015808",
  "eventtime": "08/29/2025 04:49:52 pm",
  "host": "sh-windataserver",
  "logon_elevated_token": "Yes",
  "logon_impersonation_level": "Impersonation",
  "logon_process": "Advapi",
  "logon_type": "5",
  "logon_virtual_account": "No",
  "metadata": {
    "event_timestamp": 1757073054,
    "host": "sh-windataserver",
    "index": "obai_win",
    "path": "C:\\Program Files\\SplunkUniversalForwarder\\bin\\splunk-winevtlog.exe",
    "source": "WinEventLog:Security",
    "sourcetype": "WinEventLog"
  },
  "process": "C:\\Windows\\System32\\services.exe",
  "process_id": "0x27c",
  "service": "observo.ai",
  "session_id": "0x3E7",
  "src_nt_domain": "NT AUTHORITY",
  "src_nt_domain_type": "WORKGROUP",
  "src_user": "SH-WINDATASERVE$",
  "src_user_id": "NT AUTHORITY\\SYSTEM",
  "tail_data": "This event is generated when a logon session is created. It is generated on the computer that was accessed.|The subject fields indicate the account on the local system which requested the logon. This is most commonly a service such as the Server service, or a local process such as Winlogon.exe or Services.exe.|The logon type field indicates the kind of logon that occurred. The most common types are 2 (interactive) and 3 (network).|The New Logon fields indicate the account for whom the new logon was created, i.e. the account that was logged on.|The network fields indicate where a remote logon request originated. Workstation name is not always available and may be left blank in some cases.|The impersonation level field indicates the extent to which a process in the logon session can impersonate.|The authentication information fields provide detailed information about this specific logon request.|- Logon GUID is a unique identifier that can be used to correlate this event with a KDC event.|- Transited services indicate which intermediate services have participated in this logon request.|- Package name indicates which sub-protocol was used among the NTLM protocols.|- Key length indicates the length of the generated session key. This will be 0 if no session key was requested.",
  "timestamp": "2025-08-29T16:49:52.688224033Z",
  "user": "SYSTEM",
  "user_nt_domain": "NT AUTHORITY"
}

Log Metadata Enricher Configurations (LogSchema)

  • Log Payload Paths: Message

Pattern Extractor Enricher (Pattern Extractor Configs)

  • Pattern Extractor Configs: sentiment, Message, EventCode, Keywords, user, tail_data

Sentiment Analyzer (Config)

  • Enabled: Toggled on

  • Use logpath already set for log locations: Toggled on

  • Locations for raw log: Keywords, tail_data, Message

The results of this data pipeline telemetry analysis are reflected in the Data Insights Dashboard. See the Data Insights Dashboard section for more details about each panel.

The resulting output is reflected in the accompanying panels:

  • Tags Trends for Patterns: Tracks trends in tagged log data to identify recurring behaviors. These recurring behaviors reflect all pattern tags defined in the Pattern Extractor Enricher (Pattern Extractor Configs) section and can be selected by date and time.

  • Patterns Trend: Analyzes recurring log patterns over time to detect operational shifts. All pattern tags are reflected in the Pattern Insights tab and can be selected by date and time.

Kubernetes Log Example

In this example, Log Payload Paths in the Log Metadata Enricher Configurations is set to the parsed message field, while Pattern Extractor Configs in the Pattern Extractor Enricher is configured with sentiment and kubernetes.labels.app. The Sentiment Analyzer is enabled, the Use logpath already set for log locations option is turned on, and Locations for raw log is set to message.

Kubernetes Log Entry

{
  "docker": {
    "container_id": "4b032691614a46b3af7004889e9d8365"
  },
  "kubernetes": {
    "container_image": "api.amazonaws.com/connect:connect-api",
    "container_image_id": "docker-pullable://api.amazonaws.com/connect@sha256:032c5d82eed3b2780e094adaf5810c3000faf8d1396c96fe93810fd0df5bca61",
    "container_name": "connect-api",
    "host": "ip-73.86.135.168.ec2.internal",
    "labels": {
      "app": "connect-api",
      "pod-template-hash": "7bb79bd977"
    },
    "master_url": "https://42.232.251.188:443/api",
    "namespace_id": "2045db6a-21e9-425c-917b-eeac43fe5122",
    "namespace_name": "tools-v3",
    "pod_id": "44a703a2-249b-4eee-b6b7-14d2b410a1ed",
    "pod_name": "connect-api-6d2a984253cf4ea39a8185c7c203ec86"
  },
  "message": "method TRACE for path category in from src 217.27.65.151 took 34ms",
  "timestamp": "2024-03-21T07:58:09Z"
}

Log Metadata Enricher Configurations (LogSchema)

  • Log Payload Paths: message

Pattern Extractor Enricher (Pattern Extractor Configs)

  • Pattern Extractor Configs: sentiment, message

Sentiment Analyzer (Config)

  • Enabled: Toggled on

  • Use logpath already set for log locations: Toggled on

  • Locations for raw log: message

The results of this data pipeline telemetry analysis are reflected in the Data Insights Dashboard. See the Data Insights Dashboard section for more details about each panel.

The resulting output is reflected in the accompanying panels:

  • Tags Trends for Patterns: Tracks trends in tagged log data to identify recurring behaviors. These recurring behaviors reflect all pattern tags defined in the Pattern Extractor Enricher (Pattern Extractor Configs) section and can be selected by date and time.

  • Patterns Trend: Analyzes recurring log patterns over time to detect operational shifts. All pattern tags are reflected in the Pattern Insights tab and can be selected by date and time.

Note: For this log type, the Pattern Insights are reflected as regex-based patterns.

Patterns in Action

Observo AI Patterns can be applied to optimize data pipelines using two primary methods:

  1. Direct Patterns

  2. Query Patterns

Below is a streamlined guide to leveraging these approaches, with examples for Windows and Kubernetes logs.

Direct Patterns

Direct Patterns allow users to copy patterns from the Observo AI Pattern Insights panel and apply them to transforms, such as Filter Events transforms, for immediate data processing optimization.

Patterns can be text-based or regex-based, enabling flexible and targeted data filtering.

  • Text-Based: Plain text for straightforward matching.

    • Example: For Windows logs, use “An account was successfully logged on” to filter successful logon events (EventCode 4624).

  • Regex-Based: Regular expressions for complex matching.

    • Example: For Kubernetes logs, use DEBUG\s+-\s+Query\s+ID:[0-9a-fA-F]{8,}\s+started\s+on\s+host\s+[^\s]+ to filter debug events for query starts.


Steps to Apply Patterns Directly

  1. Access the Pattern Insights Panel: Navigate to the Pattern Insights panel in the Data Insights dashboard.

  2. Select a Pattern: Choose a pattern from the list of identified patterns (text-based or regex-based).

  3. Copy the Pattern: Copy the pattern text or regex expression (use the copy icon).

  4. Apply to Transform: Paste the pattern into a transform, such as a Filter Events transform, to filter or process data based on the selected pattern.

  5. Test and Validate: Run the transform to ensure the pattern correctly filters or processes the data as intended.
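Step 4 above can be sketched as a simple regex filter. This is a minimal illustration only; the actual Filter Events transform is configured in the Observo UI, and the sample log lines here are hypothetical:

```python
import re

# Apply a copied regex pattern (the Kubernetes example from this page)
# to keep only matching log lines.

PATTERN = re.compile(
    r"DEBUG\s+-\s+Query\s+ID:[0-9a-fA-F]{8,}\s+started\s+on\s+host\s+[^\s]+"
)

def filter_events(lines: list[str]) -> list[str]:
    return [ln for ln in lines if PATTERN.search(ln)]

lines = [
    "DEBUG - Query ID:9f3ab2c41d started on host ip-10-0-0-12",
    "INFO - healthcheck ok",
]
kept = filter_events(lines)  # keeps only the DEBUG query-start line
```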

Direct Pattern Examples

  • Windows Logs:

    • Pattern: "An account failed to log on"

    • Action: Apply to a Filter Events transform to isolate failed logon attempts for security monitoring.

  • Kubernetes Logs:

    • Pattern: ERROR\s+-\s+Query\s+ID:[0-9a-fA-F]{8,}\s+failed\s+due\s+to\s+insufficient\s+resources.*

    • Action: Apply to a Filter Events transform to capture resource-related error events for debugging.

Query Patterns

Use targeted queries with the Orion AI Agent to unlock pattern capabilities for pipeline optimization and event detection.

Pipeline Optimization Query

  • Query: How can I optimize this pipeline?

  • Output: Returns the top 5 patterns by frequency with actionable recommendations.

  • Action: Apply a Filter Events transform to focus on high-impact patterns.

Detection Query

  • Query Example: Use the threat_category field to get all events that are Anomalous

  • Output: Applies pre-trained anomaly detection patterns, isolates anomalous events, and creates a Filter Events transform to capture them.

Query Patterns Examples

Windows Logs

  • Query: How can I optimize this pipeline? Results (Top 5 Patterns):

Pattern | Percentage
--- | ---
An account was successfully logged on | 36.24%
Special privileges assigned to new logons | 24.88%
An account failed to log on | 14.51%
A logon was attempted using explicit credentials | 7.13%
An account was logged off | 7.12%

  • Action: Apply a Detection Query using the pattern with the highest percentage. This produces a Filter Events transform that identifies EventCode 4624 as associated with the “An account was successfully logged on” pattern and captures only those events.

Kubernetes Logs

  • Query: How can I optimize this pipeline? Results (Top 5 Patterns):

Pattern | Percentage
--- | ---
DEBUG - Query ID:<:hex:> started on host <*> | 19.93%
DEBUG - Query ID:<:hex:> enqueued on host <*> | 19.93%
DEBUG - Query ID:<:hex:> in-progress on host <*> | 19.93%
ERROR - Query ID:<:hex:> failed due to insufficient resources on host <*> took <*> | 13.50%
ERROR - Query ID:<:hex:> failed on host <*> took <*> | 13.43%

  • Action: Apply a Detection Query: Use the message field to get all events that have debug. This creates a Filter Events transform that captures only debug-level application events.
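The templates above use <:hex:> and <*> placeholders; how such a template maps to a concrete regex can be sketched as follows. The placeholder grammar here is inferred from the examples on this page, not an official specification:

```python
import re

# Convert a Pattern Insights-style template into a regex:
# <:hex:> matches a run of hex digits, <*> matches any non-space token.

def template_to_regex(template: str) -> re.Pattern:
    escaped = re.escape(template)
    escaped = escaped.replace(re.escape("<:hex:>"), r"[0-9a-fA-F]+")
    escaped = escaped.replace(re.escape("<*>"), r"\S+")
    return re.compile(escaped)

rx = template_to_regex("DEBUG - Query ID:<:hex:> started on host <*>")
assert rx.search("DEBUG - Query ID:9f3ab2c4 started on host ip-10-0-0-12")
```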

Benefits of Both Methods

  • Efficiency: Direct Pattern Application enables rapid filtering without queries, while Query Patterns use the Orion AI Agent to automate pattern discovery and application.

  • Precision: Regex-based patterns and anomaly detection queries ensure accurate targeting of complex or critical events.

  • Scalability: Both methods support diverse data sources such as Windows and Kubernetes logs and transform types.

  • Actionable Insights: Patterns focus on high-impact or anomalous events, streamlining analysis and debugging.

Notes

  • Patterns enhance pipeline efficiency by prioritizing significant event patterns

  • Combine both methods for maximum flexibility:

    • Use Direct Pattern Application for quick manual filtering

    • Use Query Pattern for automated, query-driven optimization via Orion AI Agent

Key Benefits

The Observo AI Pattern Extractor not only identifies patterns in massive streams of log data but transforms them into actionable intelligence. The following key benefits highlight how it improves scale, efficiency, and decision-making across security, DevOps, and IT operations.

  • High-Speed Pattern Discovery: Extracts recurring structures from streaming log data in real time, enabling teams to quickly understand system behavior, track trends, and uncover operational insights.

  • Massive Scale with Full Fidelity: Processes petabyte-scale log data using memory-efficient algorithms, while preserving complete raw data in the data lake for compliance, audit, and rehydration into any analytics tool on demand.

  • Smarter Anomaly & Event Detection: Establishes baselines of normal activity and flags deviations dynamically. Deep learning–based sentiment analysis further prioritizes patterns that indicate risk, failure, or negative impact.

  • Simplified Analysis & Faster Response: Groups and abstracts millions of log lines into high-level clusters, surfacing only the most relevant signals. Analysts can focus on meaningful events rather than sifting through noise, accelerating troubleshooting and incident response.

  • Cross-Functional Value: Provides actionable insights for Security, DevOps, and IT Operations teams alike—helping each group detect anomalies, reduce risk, and optimize system reliability from a shared view of telemetry data.
