XML

The XML Parser in Observo AI lets users selectively parse XML content, with configurable options such as defining new field names, retaining or removing original fields, and parsing numbers and booleans.

Purpose

The XML Parser reads, analyzes, and converts XML data into a format that destinations can process. It checks that the XML document follows the correct syntax and structure while extracting meaningful data for further use. With support for including attributes in the parsed structure, version control for updates, and a user-friendly configuration interface, the XML Parser handles XML data across diverse data processing scenarios.

Usage

Select the XML Parser transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform is bypassed entirely, and events pass through without any modifications.

  • Add Filter Conditions: Defaults to disabled. When enabled, events are filtered through conditions. Only events that match the conditions are processed; all others bypass this transform. Build conditions with AND/OR logic using the "+Rule" or "+Group" buttons.

XML Parser: Enabled: Defaults to enabled, meaning the transform evaluates all events. Toggle Enabled off to stop this transform from processing events before they reach the downstream transforms.

Fields to Parse Rules: The set of event fields to evaluate and add or set. One rule (the first field entry) is added by default. Click the Add button to add another rule, with the following inputs:

  • Field Name: Name of the field whose value the XML parser is applied to.

  • New Field Name: New field name for storing the parsed XML structure. If left blank, the parsed XML is stored at the root level. If the new field already exists, the parsing results are merged with the existing field's values.

  • Keep Original Field: Keep the original field after parsing. If false, the original field will be removed.

  • Parse Numbers: Parse numbers into integers or floats.

  • Parse Booleans: Parse boolean values.

  • Include Attributes: Include attributes in the parsed XML structure. Note: attribute keys are prefixed with '@'.
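
As a rough illustration of how these inputs interact, the following Python sketch mimics the documented behavior using only the standard library. It is not Observo AI's implementation. The function names (`coerce`, `element_to_value`, `apply_rule`), the `text` key used for element text that sits next to attributes, the choice to leave attribute values as strings, and the collection of repeated tags into lists are all assumptions made for this example.

```python
import xml.etree.ElementTree as ET


def coerce(text, parse_numbers, parse_booleans):
    """Optionally convert a string to a bool, int, or float."""
    if parse_booleans and text.lower() in ("true", "false"):
        return text.lower() == "true"
    if parse_numbers:
        for cast in (int, float):
            try:
                return cast(text)
            except ValueError:
                pass
    return text


def element_to_value(elem, parse_numbers, parse_booleans, include_attributes):
    """Convert an element to a dict, or to a scalar if it only holds text."""
    node = {}
    if include_attributes:
        for key, value in elem.attrib.items():
            node["@" + key] = value  # attribute keys are prefixed with '@'
    for child in elem:
        value = element_to_value(child, parse_numbers, parse_booleans,
                                 include_attributes)
        if child.tag in node:
            # Assumption: repeated tags are collected into a list.
            if not isinstance(node[child.tag], list):
                node[child.tag] = [node[child.tag]]
            node[child.tag].append(value)
        else:
            node[child.tag] = value
    text = (elem.text or "").strip()
    if not node:
        return coerce(text, parse_numbers, parse_booleans)
    if text:
        node["text"] = coerce(text, parse_numbers, parse_booleans)
    return node


def apply_rule(event, field_name, new_field_name="", keep_original=True,
               parse_numbers=False, parse_booleans=False,
               include_attributes=False):
    """Apply one hypothetical 'Fields to Parse' rule and return a new event."""
    root = ET.fromstring(event[field_name])
    parsed = {root.tag: element_to_value(root, parse_numbers, parse_booleans,
                                         include_attributes)}
    result = dict(event)
    if not keep_original:
        del result[field_name]
    if new_field_name:
        existing = result.get(new_field_name)
        # Merge with an existing dict-valued field, otherwise overwrite.
        result[new_field_name] = (
            {**existing, **parsed} if isinstance(existing, dict) else parsed
        )
    else:
        result.update(parsed)  # blank New Field Name: store at the root level
    return result


event = {
    "message": '<book category="CHILDREN"><title lang="en">Harry Potter</title>'
               "<author>J K. Rowling</author><year>2005</year></book>",
    "timestamp": "2025-05-18T13:35:37Z",
}
result = apply_rule(event, "message", keep_original=False,
                    parse_numbers=True, include_attributes=True)
print(result["book"]["year"])   # 2005
print(result["book"]["title"])  # {'@lang': 'en', 'text': 'Harry Potter'}
```

In this sketch, Parse Numbers and Include Attributes are switched on explicitly; with both off, `year` would remain the string "2005" and the `@`-prefixed keys would be omitted.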

Examples

XML Parse Windows Security Logs

Scenario: Parse Windows Security logs in XML format and convert them into structured JSON.

Fields to Parse Rules

  • Field Name: message

  • New Field Name: [Empty]

  • Keep Original Field: disabled

  • Parse Numbers: disabled

  • Parse Booleans: disabled

  • Include Attributes: disabled

Input

{
  "message": "<book category=\"CHILDREN\"><title lang=\"en\">Harry Potter</title><author>J K. Rowling</author><year>2005</year></book>",
  "timestamp": "2025-05-18T13:35:37Z"
}

Output

{
  "book": {
    "@category": "CHILDREN",
    "author": "J K. Rowling",
    "title": {
      "@lang": "en",
      "text": "Harry Potter"
    },
    "year": 2005
  },
  "timestamp": "2025-05-18T13:35:37Z"
}

Best Practices for XML Parsing

When using an XML parser in Observo AI pipelines, it's important to follow best practices to ensure efficiency, security, and scalability. Here are some key guidelines:

  1. Efficient Parsing Strategy

    • Use streaming parsers such as SAX or StAX for large XML log files to avoid memory overload.

    • Opt for DOM parsers only when the entire XML structure needs to be manipulated.

    • Leverage indexed lookups and XPath queries to quickly extract relevant log entries.

  2. Data Normalization & Structuring

    • Convert XML logs into a structured format such as JSON for easier processing in AI models.

    • Use consistent naming conventions and schema validation (XSD) to ensure uniformity.

    • Implement log enrichment, adding metadata such as timestamps, system info, or correlation IDs.

  3. Security Considerations

    • Prevent XML External Entity (XXE) attacks by disabling external entity processing.

    • Implement input validation to reject malformed or oversized XML logs.

    • Use schema validation (XSD) to ensure data integrity before processing.

  4. Performance Optimization

    • Batch process logs instead of parsing them one by one in real time.

    • Utilize parallel processing for high-throughput environments.

    • Cache frequently accessed log structures to reduce redundant parsing operations.

  5. Integration with Observability Pipelines

    • Ensure compatibility with SIEM systems such as Splunk or the ELK stack by mapping parsed logs to standard formats.

    • Use event-driven processing, triggering actions based on parsed log insights.

    • Implement logging levels (INFO, WARNING, ERROR) to filter out unnecessary data for analysis.
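
The streaming approach from item 1 can be sketched in Python with the standard library's `xml.etree.ElementTree.iterparse`, which plays a role similar to SAX or StAX by handing over elements as they are completed instead of building the whole tree first. This is a generic illustration, not part of Observo AI: the `iter_records` helper and the `<events>`, `<event>`, `<id>`, and `<level>` tag names are made up for the example.

```python
import io
import xml.etree.ElementTree as ET


def iter_records(source, record_tag):
    """Yield one flat dict per <record_tag> element as the input is read."""
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            yield {child.tag: (child.text or "").strip() for child in elem}
            # Drop this element's children and text once it has been handled,
            # so memory use does not grow with the number of records.
            elem.clear()


# Hypothetical sample input; a file path or open file object works as well.
sample = io.BytesIO(
    b"<events>"
    b"<event><id>1</id><level>INFO</level></event>"
    b"<event><id>2</id><level>ERROR</level></event>"
    b"</events>"
)
records = list(iter_records(sample, "event"))
print(records)
# [{'id': '1', 'level': 'INFO'}, {'id': '2', 'level': 'ERROR'}]
```

Because `iter_records` is a generator, a caller can process or forward each record before the next one is parsed, which is the property that matters for large log files.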

By following these best practices, the XML parser in Observo AI pipelines can efficiently process security logs, detect anomalies, and enhance system observability.
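
For the security guidelines in item 3, a minimal pre-parse check might look like the following Python sketch. It assumes XML arrives as a string, uses a made-up size limit (`MAX_BYTES`), and takes a deliberately conservative approach: any payload that mentions a DOCTYPE or entity declaration is rejected outright, even though that can also reject harmless documents that contain those strings in comments or CDATA. The `parse_untrusted` name is hypothetical and does not correspond to an Observo AI setting.

```python
import xml.etree.ElementTree as ET

MAX_BYTES = 1_000_000  # hypothetical limit; tune to the expected log size


def parse_untrusted(xml_text, max_bytes=MAX_BYTES):
    """Return the root element, or raise ValueError if the input is rejected."""
    if len(xml_text.encode("utf-8")) > max_bytes:
        raise ValueError("XML payload exceeds the size limit")
    # Entities can only be declared inside a DTD, so refusing DTDs
    # rules out external entity (XXE) declarations in the payload.
    if "<!DOCTYPE" in xml_text or "<!ENTITY" in xml_text:
        raise ValueError("DTDs and entity declarations are not accepted")
    try:
        return ET.fromstring(xml_text)
    except ET.ParseError as exc:
        raise ValueError(f"malformed XML: {exc}") from exc


root = parse_untrusted("<log><level>INFO</level></log>")
print(root.find("level").text)  # INFO
```

Rejected input surfaces as a single exception type, so a caller can route such events to a dead-letter path instead of failing the whole batch.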

  • Syslog Parser: Parses Syslog events into structured JSON.

  • CEF Parser: Extracts and normalizes fields from CEF-formatted logs, enabling efficient search, correlation, and analysis in SIEM systems.
