File

The Observo AI File Destination allows users to output telemetry data to local or NFS filesystems. It supports configurable encoding, compression options such as Gzip, and format options such as JSON and Parquet, along with high-performance staging and validation for efficient, reliable log storage and downstream integration.

Purpose

The following best practices, aligned with the Filesystem/NFS Destination guide, apply when configuring the Observo AI File Destination:

  • Use Gzip Compression: Enable Gzip compression (available in the Optional Settings) to reduce file sizes and optimize storage usage. This is particularly important for high-volume data to minimize disk space requirements and improve transfer efficiency when moving files to the output location.

  • Configure a High-Performance Staging Location: Specify a high-performance, stable disk for the Staging Location (e.g., /var/log/observo/staging) to buffer files before compression and transfer. Use fast storage (e.g., SSDs) to avoid bottlenecks, especially for large datasets. Ensure the staging location has sufficient disk space and is local to the Observo Site to prevent network latency issues.

  • Leverage Parquet for Analytics (Linux Only): If the downstream system supports analytics, use the Parquet data format (available on Linux-based Observo Sites) for efficient columnar storage and querying. Ensure the Parquet schema aligns with the target system’s requirements to avoid data format issues.

  • Test and Validate: Before deploying in production, test the configuration by sending sample data through the pipeline. Verify that files are created in the correct output location with the expected format (e.g., JSON, Raw, or Parquet) and content. Check Observo console logs for any errors during file creation.

These practices optimize performance, reliability, and manageability when using the Observo AI File Destination for outputting telemetry data to local or NFS filesystems.
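The impact of the Gzip recommendation can be sanity-checked offline. Below is a minimal Python sketch; the sample records are hypothetical stand-ins for telemetry events, not Observo output:

```python
import gzip
import json

# Hypothetical telemetry-like records; repetitive JSON logs typically
# compress very well with Gzip.
records = [{"host": f"web-{i % 4:02d}", "level": "INFO",
            "message": f"request {i} completed in {i % 100} ms"}
           for i in range(1000)]

# Newline-delimited JSON, as a file destination would typically write it.
raw = ("\n".join(json.dumps(r) for r in records) + "\n").encode("utf-8")
compressed = gzip.compress(raw)

print(f"raw: {len(raw)} B, gzip: {len(compressed)} B, "
      f"ratio: {len(raw) / len(compressed):.1f}x")
```

Actual ratios depend on how repetitive your log payloads are, but structured telemetry usually shrinks severalfold.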

Prerequisites

Before configuring a File destination in Observo AI, ensure the following requirements are met:

  • Observo AI Account: You must have an active Observo AI account with administrative access to the Observo console.

  • Filesystem Access: A local filesystem or network-attached file system (NFS) must be available and accessible to the Observo Site (data plane). Ensure write permissions are granted for the target output location.

  • Storage Capacity: Verify that the target filesystem has sufficient disk space for storing output files, especially for high-volume data pipelines.

  • Observo Site Deployment: A functional Observo Site must be deployed in your environment (on-premises or cloud) to handle data routing. Refer to the Observo AI documentation for deployment instructions.

  • Data Format: Determine the desired output data format (e.g., JSON, Raw, or Parquet). Note that Parquet is supported only on Linux-based Observo Sites, not Windows.

  • Network Configuration (for NFS): If using NFS, ensure network connectivity to the NFS server, with no firewall rules blocking access to the NFS mount point. Confirm that the Observo Site has the necessary NFS client software installed.

| Prerequisite | Description | Notes |
| --- | --- | --- |
| Observo AI Platform | The Observo AI Site must be installed and available. | Verify support for JSON, Raw, or Parquet formats. |
| Filesystem | Local or NFS location for output files. | Ensure write permissions and sufficient disk space. |
| Network (NFS) | Connectivity to NFS server if used. | Default port: TCP 2049; check firewall rules. |
| Storage | High-performance storage for staging and output. | Recommended for large datasets to avoid bottlenecks. |
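Storage capacity can be checked programmatically before deployment. A minimal Python sketch using only the standard library; the path is a placeholder for your configured staging or output directory:

```python
import shutil

# Placeholder path: substitute your configured Staging Location or
# output directory (e.g. /var/log/observo/staging). "/" is used here
# only so the sketch runs anywhere.
usage = shutil.disk_usage("/")

print(f"total: {usage.total / 1e9:.1f} GB, "
      f"used: {usage.used / 1e9:.1f} GB, "
      f"free: {usage.free / 1e9:.1f} GB")
```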

Integration

To configure a File destination in Observo AI for outputting telemetry data to a local filesystem or NFS, follow these steps:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab in the Observo console.

    • Click Add Destinations and select Create New.

    • Choose "File" from the list of available destinations.

  2. General Settings:

    • Name: Enter a unique identifier, such as file-dest-1.

    • Description (Optional): Provide a description for the destination.

    • File Path: Enter the path of the file starting with /.

      Example

      /my/path/file.txt

  3. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      Options and their sub-options:

      • JSON Encoding
        • Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: None

      • Apache Avro Encoding
        • Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: None

      • No encoding: None

      • Plain text encoding: None

      • Parquet
        • Include Raw Log (False): Capture the complete log message as an additional field (observo_record) alongside the given schema. Example: in addition to the fields defined by the Parquet schema, the Parquet file will contain a field named "observo_record".
        • Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF)
        • CEF Device Event Class ID: Provide a unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure
        • CEF Device Product: Specify the product name that generated the event (maximum 63 characters). Example: Log Analyzer
        • CEF Device Vendor: Specify the vendor name that produced the event (maximum 63 characters). Example: Observo
        • CEF Device Version: Specify the version of the product that generated the event (maximum 31 characters). Example: 1.0.0
        • CEF Extensions (Add): Define custom key-value pairs for additional event data fields in CEF format.
        • CEF Name: Provide a human-readable description of the event (maximum 512 characters). Example: cef.name
        • CEF Severity: Indicate the importance of the event with a value from 0 (lowest) to 10 (highest). Example: 5
        • CEF Version (Select): Specify which version of the CEF specification to use for formatting: CEF specification version 0.1 or CEF specification version 1.x.

      • CSV Format
        • CSV Fields (Add): Specify the field names to include as columns in the CSV output and their order. Examples: timestamp, host, message
        • CSV Buffer Capacity (Optional): Set the internal buffer size (in bytes) used when writing CSV data. Example: 8192
        • CSV Delimiter (Optional): Set the character that separates fields in the CSV output. Example: ,
        • Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.
        • CSV Escape Character (Optional): Set the character used to escape quotes when double-quote escaping is disabled.
        • CSV Quote Character (Optional): Set the character used for quoting fields in the CSV output. Example: "
        • CSV Quoting Style (Optional): Control when field values should be wrapped in quote characters. Options: Always quote all fields; Quote only when necessary; Never use quotes; Quote all non-numeric fields.

      • Protocol Buffers
        • Protobuf Message Type: Specify the fully qualified message type name for Protobuf serialization. Example: package.Message
        • Protobuf Descriptor File: Specify the path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

      • Graylog Extended Log Format (GELF): None

  4. Advanced Settings:

    • Compression: Select one of Gzip compression, Zstd compression, or No compression. Default: No compression.

  5. Test the Integration:

    • Save the configuration settings in Observo AI.

    • Send test data through the pipeline and verify that files are created in the specified output location with the expected format and content.

    • Check the Observo console’s logs for pipeline status and confirm successful file creation.
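As an illustration of how the CSV sub-options in step 3 (delimiter, quote character, double-quote escaping, quoting style) interact, here is a sketch using Python's csv module. This is an approximation for intuition only; Observo's CSV encoder is not Python's csv writer:

```python
import csv
import io

# A hypothetical event whose last field contains both the delimiter
# (a comma) and an embedded quote character.
fields = ["timestamp", "host", "message"]
event = ["2024-01-01T00:00:00Z", "web-01",
         'disk at 91%, threshold "90%" exceeded']

# Rough mapping of two documented quoting styles onto Python's csv
# constants (illustrative only).
styles = {
    "Always quote all fields": csv.QUOTE_ALL,
    "Quote only when necessary": csv.QUOTE_MINIMAL,
}

for name, style in styles.items():
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter=",", quotechar='"',
                        doublequote=True,  # "Enable Double Quote Escapes"
                        quoting=style)
    writer.writerow(fields)  # header row from "CSV Fields"
    writer.writerow(event)
    print(f"{name}:\n{buf.getvalue()}")
```

With "Quote only when necessary", only the message field gets quoted (it contains the delimiter), and its embedded quotes are doubled because double-quote escaping is enabled.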

Example Scenarios

DataVault Enterprises, a fictitious company managing large volumes of telemetry data, wants to configure Observo to output security logs to a local filesystem on a Linux-based Observo Site for archival and analysis. The logs should be stored in JSON format with Gzip compression to optimize storage and facilitate downstream processing. The configuration ensures reliable file creation in a high-performance storage location.

Standard File Destination Setup

Here is a standard File Destination configuration example. Only the required sections and their associated field updates are displayed in the tables below:

General Settings

| Field | Value | Description |
| --- | --- | --- |
| Name | file-dest-datavault-1 | Unique identifier for the destination. |
| Description | Outputs security logs to a local filesystem for DataVault Enterprises' archival and analysis. | Provides context for the destination's purpose. |
| File Path | /var/log/observo/security-logs/datavault-logs.txt | Specifies the full path for output files, starting with /, on the local filesystem. |

Encoding

| Field | Value | Description |
| --- | --- | --- |
| Encoding Codec | JSON Encoding | Specifies JSON as the encoding format for events written to the file. |

Advanced Settings

| Field | Value | Description |
| --- | --- | --- |
| Compression | Gzip compression | Enables Gzip compression to reduce file sizes and optimize storage usage. |

Test the Integration:

  • Save the configuration, send test data through the pipeline, and verify that files are created at /var/log/observo/security-logs/datavault-logs.txt in JSON format with Gzip compression.

  • Check the Observo console logs to confirm data flow and successful generation of compressed files.

Notes:

  • Ensure the Observo Site has write permissions to the directory /var/log/observo/security-logs/ and sufficient disk space for storing Gzip-compressed files.

  • Monitor Observo logs in the console to confirm successful file creation and check for errors like permission issues or disk space shortages.

  • Use a high-performance storage location such as SSD for the file path to avoid bottlenecks, as recommended in best practices.

  • Confirm the downstream system can decompress Gzip files and process JSON payloads.

This configuration enables DataVault Enterprises to output security logs from Observo to a local filesystem in JSON format with Gzip compression for efficient archival and analysis.
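The note above about confirming that downstream systems can decompress Gzip files and process JSON payloads can be rehearsed with a short Python sketch. The temporary file here stands in for the real output at the configured File Path:

```python
import gzip
import json
import os
import tempfile

# Write a stand-in output file the way the destination would: one JSON
# record per line, Gzip-compressed. The record itself is hypothetical.
sample = [{"event": "login-failure", "severity": 5, "host": "web-01"}]
path = os.path.join(tempfile.mkdtemp(), "datavault-logs.txt.gz")
with gzip.open(path, "wt", encoding="utf-8") as f:
    for rec in sample:
        f.write(json.dumps(rec) + "\n")

# Downstream check: decompress and parse every line back as JSON.
with gzip.open(path, "rt", encoding="utf-8") as f:
    parsed = [json.loads(line) for line in f]

print(f"parsed {len(parsed)} record(s); first: {parsed[0]}")
```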

Troubleshooting

Common issues and solutions when configuring a File destination:

  • File Write Failures:

    • Issue: Observo Site cannot write files to the output or staging location.

    • Solution: Verify that the specified paths (output and staging) exist and have write permissions for the Observo Site process. Check disk space availability using df -h.

  • No Data Written:

    • Issue: No files are created in the output location.

    • Solution: Confirm the pipeline is active and correctly configured to route data to the File destination. Check Observo logs for errors. Ensure the source is sending data.

  • NFS Connectivity Issues:

    • Issue: Unable to access the NFS output location.

    • Solution: Verify network connectivity to the NFS server (e.g., ping or telnet to port 2049). Ensure the NFS mount point is correctly configured and accessible. Check firewall rules.

  • Parquet Format Errors:

    • Issue: Parquet files are not generated or contain errors (Linux only).

    • Solution: Confirm that the Observo Site is running on Linux. Verify the Parquet schema configuration in the Parquet Settings tab. Check logs for schema mismatch errors.

  • Backpressure Issues:

    • Issue: Pipeline stalls due to backpressure from the File destination.

    • Solution: Check the backpressure behavior setting (Block or Drop). Increase Max File Size or Max Open Files to handle larger datasets. Ensure sufficient disk space.
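The NFS connectivity check mentioned above (ping or telnet to port 2049) can also be scripted. A minimal Python sketch; the hostname is a placeholder for your NFS server:

```python
import socket

def nfs_reachable(host: str, port: int = 2049, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to the NFS port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# "localhost" is a placeholder; substitute your NFS server's hostname.
print("reachable" if nfs_reachable("localhost") else "unreachable")
```

A False result narrows the problem to DNS, routing, a firewall, or the NFS service itself, rather than the Observo configuration.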

| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| No files written | Incorrect path or permissions | Verify output/staging paths and permissions |
| NFS errors | Network or mount issues | Check NFS server connectivity and mount settings |
| Large file sizes | Exceeds Max File Size | Increase Max File Size or adjust partitioning |
| High CPU usage | Excessive file operations | Optimize Max Open Files or use faster storage |
| Parquet errors | Schema mismatch or non-Linux system | Verify Parquet settings; ensure Linux environment |

