Extract Regex

The Extract Regex function in Observo AI uses regular expressions to match patterns and extract information from your data. This is useful for parsing unstructured or semi-structured data into structured fields for further processing.

Purpose

To use the Extract Regex function, define a regular expression pattern and specify the fields to capture. The function will apply the regex to the input data and extract the matched groups into new fields.

Usage

Select the Extract Regex transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform will be bypassed entirely, allowing the event to pass through without any modifications.

  • Add Filter Conditions: Defaults to disabled. When enabled, only events that satisfy the configured conditions are processed; all other events bypass this transform. Build conditions with AND/OR logic using the "+Rule" and "+Group" buttons.

Extract: Enabled: Defaults to enabled, meaning the transform evaluates all events. Toggle Enabled off to stop this transform from processing events before they are passed to the downstream transforms.

Extract Fields Rules: The set of event fields to evaluate and the new fields to add or set. One rule entry is present by default. Click the Add button to add more rules, each with the following inputs:

  • Field Name: Enter the name of the field whose value will be used for extraction.

  • Regular Expression for Extraction: The regular expression that identifies the part to extract. Use an unnamed capture group; the value of the first capture group is extracted.

  • New Field Name: Enter the name of the field that will hold the extracted value. The name must not contain a hyphen ('-').
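The rule inputs above can be modeled with a short sketch. This is an illustration only, not Observo AI's implementation: the function name apply_extract_rule is hypothetical, Python's re module stands in for whatever regex engine the product uses (syntax and edge-case behavior may differ), and passing the event through unchanged when nothing matches is an assumption.

```python
import re


def apply_extract_rule(event, field_name, pattern, new_field_name):
    """Hypothetical model of one Extract Fields rule.

    Searches event[field_name] with `pattern` and stores the first
    capture group under `new_field_name`. Returns a new dict.
    """
    # The documentation says the new field name must not contain '-'.
    if "-" in new_field_name:
        raise ValueError("New Field Name must not contain '-'")

    value = event.get(field_name)
    if not isinstance(value, str):
        return dict(event)  # assumption: nothing to extract, pass through

    match = re.search(pattern, value)
    # Require a match and a first capture group that actually participated.
    if match is None or match.re.groups < 1 or match.group(1) is None:
        return dict(event)  # assumption: no match leaves the event unchanged

    result = dict(event)
    result[new_field_name] = match.group(1)
    return result


sample = {"message": "status=200 path=/health"}
extracted = apply_extract_rule(sample, "message", r"status=(\d+)", "status_code")
print(extracted)
```

The sketch copies the event instead of mutating it so that the input and output can be compared side by side, as in the examples in this page.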

Examples

Examples require that Enabled is toggled on.

Extracting a Single Field

Scenario: Extract the IP address from a log message.

Extract Fields Rule

  • Field Name: message

  • Regular Expression for Extraction: (\b(?:\d{1,3}\.){3}\d{1,3}\b)

  • New Field Name: source_ip

Input Data

{
  "message":"[2025-04-21 12:01:33] INFO: Connection established from IP: 192.168.1.10",
  "timestamp":"2025-04-21T12:12:54Z"
}

Output Data

{
  "message":"[2025-04-21 12:01:33] INFO: Connection established from IP: 192.168.1.10",
  "source_ip":"192.168.1.10",
  "timestamp":"2025-04-21T12:12:54Z"
}

Results: New field source_ip is added to the log entry.
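To try the pattern from this example outside the product, a minimal sketch with Python's re module follows. Python is an assumption here; the regex engine used by Observo AI may treat some constructs differently, so treat this as a quick local check, not a reproduction of the transform.

```python
import json
import re

# Pattern copied from the Extract Fields Rule above.
IP_PATTERN = r"(\b(?:\d{1,3}\.){3}\d{1,3}\b)"

event = {
    "message": "[2025-04-21 12:01:33] INFO: Connection established from IP: 192.168.1.10",
    "timestamp": "2025-04-21T12:12:54Z",
}

output = dict(event)
match = re.search(IP_PATTERN, event["message"])
if match:
    # Only the first capture group is stored in the new field.
    output["source_ip"] = match.group(1)

print(json.dumps(output, indent=2, sort_keys=True))
```

Note that this pattern checks only the dotted four-number shape. It does not validate that each number is in the 0 to 255 range, so a string such as 999.999.999.999 would also match.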

Extracting Multiple Fields

Scenario: Extract the username and domain from an email address.

Extract Fields Rule 1

  • Field Name: message

  • Regular Expression for Extraction: Email:\s([\w.+-]+)@[\w.-]+\.\w+

  • New Field Name: username

Input Data

{
  "message":"[2025-04-21 09:13:45] INFO: User login successful - Email: alice.smith@example.com",
  "timestamp":"2025-04-21T11:29:39Z"
}

Output Data

{
  "message":"[2025-04-21 09:13:45] INFO: User login successful - Email: alice.smith@example.com",
  "timestamp":"2025-04-21T11:29:39Z",
  "username":"alice.smith"
}

Results: New field username is added to the log entry.

Extract Fields Rule 2

  • Field Name: message

  • Regular Expression for Extraction: Email:\s[\w.+-]+@([\w.-]+\.\w+)

  • New Field Name: domain

Input Data

{
  "message":"[2025-04-21 09:13:45] INFO: User login successful - Email: alice.smith@example.com",
  "timestamp":"2025-04-21T11:29:39Z"
}

Output Data

{
  "domain":"example.com",
  "message":"[2025-04-21 09:13:45] INFO: User login successful - Email: alice.smith@example.com",
  "timestamp":"2025-04-21T11:29:39Z",
  "username":"alice.smith"
}

Results: New field domain is added to the log entry, and username is retained from the previous extract field rule.
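The two rules can be checked together with a small sketch that applies them in order to the same event. As before, this is a local approximation in Python, not the product's implementation: run_rules is a hypothetical helper, and the sample address alice.smith@example.com is the one implied by the documented username and domain outputs.

```python
import re

# (Field Name, Regular Expression for Extraction, New Field Name)
RULES = [
    ("message", r"Email:\s([\w.+-]+)@[\w.-]+\.\w+", "username"),
    ("message", r"Email:\s[\w.+-]+@([\w.-]+\.\w+)", "domain"),
]


def run_rules(event, rules):
    """Hypothetical helper: apply each rule in order, keeping earlier results."""
    out = dict(event)
    for field_name, pattern, new_field_name in rules:
        value = out.get(field_name)
        if not isinstance(value, str):
            continue  # assumption: skip rules whose source field is absent
        match = re.search(pattern, value)
        if match:
            # Each rule contributes the value of its first capture group.
            out[new_field_name] = match.group(1)
    return out


event = {
    "message": "[2025-04-21 09:13:45] INFO: User login successful - Email: alice.smith@example.com",
    "timestamp": "2025-04-21T11:29:39Z",
}

result = run_rules(event, RULES)
print(result)
```

Each rule uses a single capture group around the part it wants, which is why the same message needs two rules with the parentheses placed differently: one around the local part and one around the domain.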

Best Practices

  • Filter Event: Apply conditions to filter data before or after extracting fields.

  • Rename Fields: Rename fields to standardize naming conventions.

  • Add Fields: Add new fields to your data.
