Elasticsearch

The Observo AI Elasticsearch destination sends log and event data to Elasticsearch clusters for advanced search, analytics, and visualization. It supports customizable encoding, secure authentication (Basic or AWS), and Gzip compression for efficient data integration.

Purpose

The Observo AI Elasticsearch destination enables the transmission of log and event data to Elasticsearch clusters, facilitating advanced search, analytics, and visualization capabilities. This integration allows organizations to harness Elasticsearch's powerful indexing and querying features for comprehensive observability and operational insights.

Prerequisites

Before configuring the Elasticsearch destination in Observo AI, ensure the following requirements are met:

Observo AI Platform Setup

  • Observo AI Site: Ensure that the Observo AI Site is installed and operational.

Elasticsearch Cluster Configuration

  • Elasticsearch Endpoint URL: Determine the Elasticsearch endpoint URL, typically in the format http://<elasticsearch-host>:9200.

  • Authentication Credentials: If authentication is enabled, obtain the necessary username and password or API key.

  • Index Settings: Decide on the index or index pattern where the data will be stored.

  • TLS Configuration: If using HTTPS, ensure that the necessary TLS certificates are in place.

Network and Connectivity

  • HTTP/HTTPS Access: Ensure that Observo AI can communicate with the Elasticsearch endpoint over HTTP or HTTPS.

  • Firewall Rules: If using firewalls or network security groups, configure them to allow outbound traffic from Observo AI to the Elasticsearch endpoint.

Integration

To configure Elasticsearch as a destination in Observo AI, follow these steps:

  1. Access Observo AI Destinations:

    • Navigate to the Destinations tab in the Observo AI interface.

    • Click on the "Add Destination" button and select "Create New".

    • Choose "Elasticsearch" from the list of available destinations.

  2. General Settings:

    • Name: Provide a unique identifier for the destination, such as elasticsearch-dest-1.

    • Description (Optional): Add a description for the destination.

    • Elasticsearch Endpoint (Add as needed): The Elasticsearch endpoints to send logs to. Each endpoint must contain an HTTP scheme, and may specify a hostname or IP address and port.

      Examples

      https://127.0.0.1:9200

      http://my-elasticsearch-endpoint

    • Mode: Determines which Elasticsearch Bulk API indexing mode to use.

      Options

      • Bulk: Batch process multiple operations in a single request

      • Data Stream: Continuous flow of timestamped data, optimized for ingestion

    • Id Key: The name of the event key that should map to Elasticsearch’s _id field. By default, the _id field is not set, which allows Elasticsearch to set it automatically. Setting your own Elasticsearch IDs can impact performance.

      Examples

      id

      _id

    • Pipeline: The name of the ingest pipeline to apply in Elasticsearch.
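
The Bulk mode above uses Elasticsearch's Bulk API, which expects a newline-delimited JSON (NDJSON) body in which each document is preceded by an action line. A minimal sketch of how events could be rendered into such a payload, and how an Id Key (here assumed to be id) maps onto the _id field; the helper name is illustrative, not an Observo internal:

```python
import json

def to_bulk_ndjson(events, index, id_key=None):
    """Render events as an Elasticsearch Bulk API payload (NDJSON).

    Each event becomes an action line followed by the document itself.
    If id_key is set, its value is promoted to the document _id,
    mirroring this destination's "Id Key" option.
    """
    lines = []
    for event in events:
        action = {"index": {"_index": index}}
        if id_key is not None and id_key in event:
            action["index"]["_id"] = event[id_key]
        lines.append(json.dumps(action))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"  # the Bulk API requires a trailing newline

payload = to_bulk_ndjson(
    [{"id": "evt-1", "message": "login ok"}],
    index="app-logs",
    id_key="id",
)
```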

  3. Authentication (Optional):

    • Auth Strategy: The authentication strategy to use. Choose between the following mechanisms:

      Options

      • Amazon OpenSearch Service-specific authentication

      • HTTP Basic Authentication

      • No selection: choose this option if you are using API-token-based authentication. The token must be specified as an HTTP header.

    • Auth Access Key Id: The AWS access key ID.

      Example

      AKIAIOSFODNN7EXAMPLE

    • Auth Secret Access Key: The AWS secret access key.

      Example

      wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

    • Auth User: Basic authentication username.

      Examples

      ${ELASTICSEARCH_USERNAME}

      username

    • Auth Password: Basic authentication password.

      Examples

      ${ELASTICSEARCH_PASSWORD}

      password

    • Auth Region: The AWS region to send STS requests to. Defaults to the configured region for the service itself.

      Example

      us-west-2

    • Auth Assume Role: The ARN of an IAM role to assume.

      Example

      arn:aws:iam::123456789098:role/my_role

    • Auth Load Timeout Secs: Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.

      Example

      30

    • Auth IMDS Connect Timeout Seconds: Connect timeout for IMDS.

    • Auth IMDS Max Attempts: Number of IMDS retries for fetching tokens and metadata.

    • Auth IMDS Read Timeout Seconds: Read timeout for IMDS.
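
For HTTP Basic Authentication, the Auth User and Auth Password values are ultimately combined into a standard Authorization header. A small sketch of that encoding (the function name is illustrative):

```python
import base64

def basic_auth_header(user, password):
    """Build the Authorization header that HTTP Basic Authentication sends.

    The Auth User / Auth Password settings produce a header of this form;
    an API token would instead be passed as its own HTTP header.
    """
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    return {"Authorization": f"Basic {token}"}

headers = basic_auth_header("username", "password")
```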

  4. Acknowledgement (Optional):

    • Acknowledgements Enabled (Disabled): Whether or not end-to-end acknowledgements are enabled. When enabled, any source connected to this sink that supports end-to-end acknowledgements will wait for events to be acknowledged by the sink before acknowledging them at the source.

  5. Encoding (Optional):

    • Fields to Exclude (Add): List any fields that should be excluded from the serialized payload. Default: host

      Examples

      fields1

      date

      host

    • Encoding Timestamp Format: Specify the timestamp format (default is RFC 3339).

      Options

      • RFC 3339 timestamp: Human-readable date-time format with timezone (ISO 8601-based)

      • Unix timestamp: Seconds since January 1, 1970 (UTC epoch)
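
The Encoding options above control which fields survive serialization and how timestamps are rendered. A sketch of field exclusion and the two timestamp formats, assuming events carry a datetime under a timestamp key (the helper is illustrative):

```python
import json
from datetime import datetime, timezone

def encode_event(event, exclude=("host",), timestamp_format="rfc3339"):
    """Serialize an event, dropping excluded fields and formatting the
    timestamp as either RFC 3339 or a Unix epoch, as in the Encoding step."""
    out = {k: v for k, v in event.items() if k not in exclude}
    ts = out.get("timestamp")
    if isinstance(ts, datetime):
        if timestamp_format == "rfc3339":
            out["timestamp"] = ts.isoformat()       # e.g. 2024-01-01T00:00:00+00:00
        else:
            out["timestamp"] = int(ts.timestamp())  # seconds since the UTC epoch
    return json.dumps(out)

event = {"host": "web-1", "message": "ok",
         "timestamp": datetime(2024, 1, 1, tzinfo=timezone.utc)}
rfc3339 = encode_event(event)
unix = encode_event(event, timestamp_format="unix")
```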

  6. Bulk Mode Configuration:

    • Bulk Index: The name of the index to write events to. Relevant when Mode=Bulk.

      Examples

      application-{{ application_id }}-%Y-%m-%d

      {{ index }}

    • Bulk Action: Action to use when making requests to the Elasticsearch Bulk API. Currently, Observo supports only index and create; the update and delete actions are not supported. Relevant when Mode=Bulk.

      Options

      • Create: Adds a new document only if it doesn’t exist

      • Index: Adds or replaces a document with the same ID
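
Bulk Index templates mix {{ field }} placeholders with strftime date specifiers. A hedged sketch of how such a template could be expanded; the render_index helper is illustrative, not Observo's implementation:

```python
import re
from datetime import datetime, timezone

def render_index(template, event, now=None):
    """Expand a Bulk Index template such as
    "application-{{ application_id }}-%Y-%m-%d": {{ field }} placeholders
    are filled from the event, strftime specifiers from the event time."""
    now = now or datetime.now(timezone.utc)
    # Substitute {{ field }} placeholders with event values
    rendered = re.sub(
        r"\{\{\s*(\w+)\s*\}\}",
        lambda m: str(event.get(m.group(1), "")),
        template,
    )
    # Then expand strftime date specifiers like %Y-%m-%d
    return now.strftime(rendered)

name = render_index(
    "application-{{ application_id }}-%Y-%m-%d",
    {"application_id": "checkout"},
    now=datetime(2024, 5, 17, tzinfo=timezone.utc),
)
```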

  7. Request Configurations:

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

      Options

      • Adaptive concurrency: Adjusts parallelism based on system load

      • A fixed concurrency of 1: Processes one task at a time only

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.

    • Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
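
The retry options above describe an initial backoff followed by Fibonacci growth, capped by Request Retry Max Duration Secs. The resulting wait schedule can be computed as follows (a sketch of the described behavior, not Observo's internal code):

```python
def backoff_schedule(initial=1, max_duration=3600, attempts=10):
    """Compute successive retry waits: the first retry waits `initial`
    seconds, subsequent waits follow the Fibonacci sequence, and every
    wait is capped at `max_duration` seconds."""
    waits = []
    a, b = initial, initial
    for _ in range(attempts):
        waits.append(min(a, max_duration))
        a, b = b, a + b  # advance the Fibonacci sequence
    return waits

schedule = backoff_schedule(initial=1, max_duration=30, attempts=10)
```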

  8. Data Stream Mode Configuration:

    • Data Stream Auto Routing (True): Automatically routes events by deriving the data stream name from event fields. The data stream name is <type>-<dataset>-<namespace>, where each value comes from the data_stream configuration field of the same name. If enabled, the values of the Data Stream Type, Data Stream Dataset, and Data Stream Namespace event fields are used when present; otherwise, the values set here in the configuration are used.

    • Data Stream Dataset: The data stream dataset used to construct the data stream at index time. Default: generic

      Examples

      generic

      nginx

      {{ service }}

    • Data Stream Namespace: The data stream namespace used to construct the data stream at index time. Default: default

      Example

      {{ environment }}

    • Data Stream Sync Fields (True): Automatically adds and syncs the data_stream.* event fields if they are missing from the event. This ensures that fields match the name of the data stream that is receiving events.

    • Data Stream Type: The data stream type used to construct the data stream at index time. Default: log

      Examples

      metrics

      synthetics

      {{ type }}

  9. Batching Requirements (Optional):

    • Batch Max Bytes (Increment as needed): The maximum size of a batch that will be processed by a sink. This is based on the uncompressed size of the batched events, before they are serialized / compressed.

    • Batch Max Events (Increment as needed): The maximum size of a batch before it is flushed.

    • Batch Timeout Seconds (Increment as needed): The maximum age of a batch before it is flushed. Default: 1
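
The three batching limits interact: a batch is flushed as soon as any one of max bytes, max events, or the timeout is reached. A simplified model of that logic (not Observo's actual implementation; byte size is approximated via JSON length):

```python
import json
import time

class Batcher:
    """Accumulate events and flush when any batching limit is hit:
    max bytes (uncompressed size), max events, or batch timeout."""

    def __init__(self, max_bytes=10_485_760, max_events=1000, timeout_secs=1.0):
        self.max_bytes = max_bytes
        self.max_events = max_events
        self.timeout_secs = timeout_secs
        self.events, self.bytes, self.started = [], 0, None

    def add(self, event):
        if self.started is None:
            self.started = time.monotonic()  # batch age starts at first event
        self.events.append(event)
        self.bytes += len(json.dumps(event).encode("utf-8"))

    def should_flush(self):
        if not self.events:
            return False
        return (self.bytes >= self.max_bytes
                or len(self.events) >= self.max_events
                or time.monotonic() - self.started >= self.timeout_secs)

b = Batcher(max_events=2)
b.add({"message": "a"})
b.add({"message": "b"})
```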

  10. AWS Configuration (Optional):

    • AWS Endpoint: Custom endpoint for use with AWS-compatible services.

      Example

      http://127.0.0.0:5000/path/to/service

    • AWS Region: The AWS Region of the target service.

      Example

      us-east-1

  11. TLS Configuration (Optional):

    • TLS CA: Provide the CA certificate in PEM format.

    • TLS Certificate: Provide the client certificate in PEM format.

    • TLS Key: Provide the private key in PEM format.

    • TLS Key Passphrase: If the key is encrypted, provide the passphrase.

    • Verify Certificate: Enable or disable certificate verification.

    • Verify Hostname: Enable or disable hostname verification.

  12. Buffering Configuration:

    • Buffer Type: Specifies the buffering mechanism for event delivery.

      Options

      • Memory: High-performance, in-memory buffering.

        • Max Events: The maximum number of events allowed in the buffer. Default: 500

        • When Full: Event handling behavior when the buffer is full. Default: Block

          • Block: Wait for free space in the buffer. This applies backpressure up the topology, signalling that sources should slow down the acceptance/consumption of events. No data is lost, but data piles up at the edge.

          • Drop Newest: Drop the event instead of waiting for free space in the buffer. The event is intentionally dropped. This mode is typically used when performance is the highest priority and it is preferable to temporarily lose events rather than slow the acceptance/consumption of events.

      • Disk: Lower-performance, less costly, on-disk buffering.

        • Max Bytes Size: The maximum number of bytes allowed in the buffer. Must be at least 268435488.

        • When Full: Same Block and Drop Newest behavior as described for the Memory buffer. Default: Block

  13. Advanced Settings (Optional):

    • Metrics Timezone: The name of the timezone to apply to timestamp conversions that do not contain an explicit time zone. The time zone name may be any name in the TZ database, or local to indicate system local time.

      Examples

      local

      America/New_York

      EST5EDT

    • Metrics Host Tag: Name of the tag in the metric to use for the source host.

      Examples

      host

      hostname

    • Api Version: The API version of Elasticsearch.

      Options

      • Auto-detect the API version

      • Elasticsearch 6.x API

      • Elasticsearch 7.x API

      • Elasticsearch 8.x API

    • Compression: Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression.

      Options

      • Gzip compression: Widely used DEFLATE-based compression format

      • No compression: No compression applied to data

      • Zlib compression: DEFLATE-based, lightweight compression library

    • Distribution Retry Initial Backoff Secs (Increment as needed): Initial delay between attempts to reactivate endpoints once they become unhealthy.

    • Distribution Retry Max Duration Secs (Increment as needed): Maximum delay between attempts to reactivate endpoints once they become unhealthy.

    • Doc Type: The doc_type for your index data. Only relevant for Elasticsearch <= 6.X. Deprecated for version >= 7.0. Default: _doc

    • Metrics Metric Tag Values: Controls how metric tag values are encoded.

      Options

      • Tags exposed as single strings: When set to single, only the last non-bare value of tags is displayed with the metric.

      • Tags exposed as arrays of strings: When set to full, all metric tags are exposed as separate assignments.

    • Query (add as needed): A query string parameter and its value to add to the query string.

      Example

      key: X-Powered-By, value: Observo

    • Request Retry Partial (False): Whether to retry successful requests containing partial failures. To avoid duplicates in Elasticsearch, use the id_key option.

    • Suppress Type Name (False): Whether to send the type field to Elasticsearch. Deprecated in Elasticsearch 7.x and removed in Elasticsearch 8.x. If enabled, the doc_type option will be ignored.

    • Rejection Reporting: Elasticsearch may reject some events due to internal constraints, such as non-adherence to the schema. Turning rejection reporting on (temporarily) can help isolate the cause of rejections. Smaller batch sizes often make corner cases easier to debug while keeping overhead in check.

      Options

      • Report stats but drop request and response payloads

      • Report response payload but drop request (significant overhead)

      • Report both request and response (very high overhead)
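
When Gzip compression is selected, the serialized payload is compressed before transmission and the request carries a matching Content-Encoding header. A sketch using Python's standard gzip module (the function name is illustrative):

```python
import gzip
import json

def compress_payload(events, compression="gzip"):
    """Serialize events to NDJSON and optionally Gzip-compress the body,
    as the Compression option does before sending. Returns the body and
    the headers a compressed request would carry."""
    body = ("\n".join(json.dumps(e) for e in events) + "\n").encode("utf-8")
    if compression == "gzip":
        return gzip.compress(body), {"Content-Encoding": "gzip"}
    return body, {}

raw, raw_headers = compress_payload([{"m": "x"}], compression="none")
packed, headers = compress_payload([{"m": "x" * 1000}])
```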

  14. Save and Test Configuration:

    • Save the configuration settings.

    • Send sample data to verify that it reaches the specified Elasticsearch index.

Example Scenarios

Apex Financial Services, a fictitious enterprise in the financial services sector, specializes in wealth management and transaction processing. To enhance observability and gain actionable insights into their transaction logs and customer interaction data, Apex decides to integrate their Observo AI platform with an Elasticsearch cluster. This integration will enable advanced search, analytics, and visualization of their financial data, helping them monitor market trends, detect anomalies, and ensure regulatory compliance. Below is the detailed configuration process for setting up Elasticsearch as a destination in Observo AI, based on the provided documentation, with all required fields specified.

Standard Elasticsearch Destination Setup

Here is a standard Elasticsearch destination configuration example. Only the required sections and their associated field updates are shown below:

General Settings

  • Name: apex-es-transactions (unique identifier for the Elasticsearch destination)

  • Description: Elasticsearch destination for transaction logs and customer interactions (optional description for clarity)

  • Elasticsearch Endpoint: https://es-cluster.apexfin.com:9200 (the secure endpoint for the Elasticsearch cluster)

  • Mode: Bulk (uses the Elasticsearch Bulk API to batch multiple operations in a single request)

  • Id Key: transaction_id (maps to Elasticsearch’s _id field for unique transaction identification)

  • Pipeline: apex-transaction-pipeline (the ingest pipeline to apply for data preprocessing in Elasticsearch)

Authentication

  • Auth Strategy: HTTP Basic Authentication (uses username and password for secure access)

  • Auth User: apex_admin (username for Elasticsearch authentication)

  • Auth Password: ${ELASTICSEARCH_PASSWORD} (password stored in an environment variable for security)

Encoding

  • Fields to Exclude: host, client_ip (excludes sensitive fields from the serialized payload)

  • Encoding Timestamp Format: RFC 3339 timestamp (human-readable ISO 8601-based format with timezone)

Bulk Mode Configuration

  • Bulk Index: transactions-%Y-%m-%d (dynamic index name based on date for daily transaction logs)

  • Bulk Action: Index (adds or replaces documents with the same transaction_id)

Request Configuration

  • Request Concurrency: Adaptive concurrency (adjusts parallelism based on system load for optimal performance)

  • Request Rate Limit Duration Secs: 1 (time window for rate limiting; default)

  • Request Rate Limit Num: 100 (maximum requests allowed within the time window)

  • Request Retry Attempts: 3 (maximum retries for failed requests)

  • Request Retry Initial Backoff Secs: 1 (initial wait time before retrying a failed request)

  • Request Retry Max Duration Secs: 3600 (maximum wait time between retries; default)

  • Request Timeout Secs: 60 (time before a request is aborted; default)

Data Stream Mode Configuration

  • Data Stream Auto Routing: True (automatically derives the data stream name from event fields)

  • Data Stream Dataset: transactions (dataset used to construct the data stream name)

  • Data Stream Namespace: production (namespace for the data stream, reflecting the environment)

  • Data Stream Sync Fields: True (ensures data_stream.* fields match the receiving data stream)

  • Data Stream Type: log (type used to construct the data stream name)

Batching Configuration

  • Batch Max Bytes: 10485760 (maximum batch size of 10 MB for uncompressed events)

  • Batch Max Events: 1000 (maximum number of events in a batch before flushing)

  • Batch Timeout Seconds: 1 (maximum age of a batch before flushing; default)

TLS Configuration

  • TLS CA: /certs/apex_ca.pem (path to the CA certificate in PEM format)

  • TLS Certificate: /certs/apex_client_cert.pem (path to the client certificate in PEM format)

  • TLS Key: /certs/apex_client_key.pem (path to the private key in PEM format)

  • TLS Key Passphrase: ${TLS_KEY_PASSPHRASE} (passphrase for the encrypted private key, stored securely)

  • Verify Certificate: True (enables certificate verification for secure communication)

  • Verify Hostname: True (enables hostname verification for added security)

Buffering Configuration

  • Buffer Type: Memory (uses high-performance, in-memory buffering)

  • Max Events: 500 (maximum number of events in the buffer; default)

  • When Full: Block (applies backpressure to wait for free space, preventing data loss)

Advanced Settings

  • Metrics Timezone: America/New_York (timezone for timestamp conversions, matching Apex’s primary location)

  • Metrics Host Tag: hostname (tag used for the source host in metrics)

  • Api Version: Auto-detect (automatically detects the Elasticsearch API version)

  • Compression: Gzip compression (applies Gzip compression to reduce data transfer size)

  • Distribution Retry Initial Backoff Secs: 1 (initial delay for retrying unhealthy endpoints)

  • Distribution Retry Max Duration Secs: 3600 (maximum delay for retrying unhealthy endpoints)

  • Doc Type: _doc (default document type for Elasticsearch; relevant for <= 6.x)

  • Metrics Metric Tag Values: Tags exposed as arrays of strings (exposes all metric tags as separate assignments)

  • Query: X-Powered-By: Observo (adds a query string parameter to identify the source)

  • Request Retry Partial: False (does not retry requests with partial failures, avoiding duplicates)

  • Suppress Type Name: False (sends the type field to Elasticsearch; relevant for <= 6.x)

  • Rejection Reporting: Report stats but drop request and response payloads (reports stats to help debug rejections with minimal overhead)

Test Configuration

  • Save the configuration in the Observo AI interface.

  • Send sample transaction data (e.g., a mock transaction log) to verify ingestion.

  • Use Elasticsearch’s search functionality to confirm that data appears in the transactions-%Y-%m-%d index.

  • Monitor Observo AI’s Notifications tab for any errors or warnings.

Scenario Troubleshooting

  • Authentication Issues: Verify that apex_admin and the password stored in ${ELASTICSEARCH_PASSWORD} are valid and have write permissions.

  • Index Not Found: Ensure the transactions-%Y-%m-%d index pattern is correctly configured in Elasticsearch.

  • Network Issues: Confirm that Observo AI can reach https://es-cluster.apexfin.com:9200 and that firewall rules allow outbound HTTPS traffic.

  • Data Format Errors: Validate that transaction logs match the expected schema and that the apex-transaction-pipeline ingest pipeline is correctly set up.

This configuration enables Apex Financial Services to efficiently stream and analyze their financial data in Elasticsearch, supporting their goals of monitoring market trends, detecting anomalies, and ensuring regulatory compliance.

Troubleshooting

If issues arise with the Elasticsearch destination in Observo AI, use the following steps to diagnose and resolve them:

Verify Configuration Settings

  • Ensure that the Elasticsearch Endpoint URL, Authentication Credentials, and Index are correctly entered and match the Elasticsearch setup.

  • Confirm that the Elasticsearch cluster is operational and accessible.

Check Authentication

  • Verify that the provided credentials are valid and have the necessary permissions to write to the specified index.

  • Ensure that the credentials have not expired or been revoked.

Monitor Logs

  • Check Observo AI’s Notifications tab for errors or warnings related to data transmission.

  • In the Elasticsearch interface, search the specified index to confirm data arrival.

Validate Data Format and Schema

  • Ensure that the data sent from Observo AI matches the expected format and schema in Elasticsearch.

  • If using custom mappings, verify that they are properly configured in Elasticsearch.

Network and Connectivity

  • Ensure that Observo AI can reach the Elasticsearch endpoint over the network.

  • If using firewalls or proxies, verify their configurations to allow necessary traffic.

Common Error Messages

  • "Authentication failed": Indicates invalid or missing credentials. Verify the credentials' validity and permissions.

  • "Index not found": Check that the specified index exists in Elasticsearch and that the credentials have write permissions.

  • "Error in creation of index": If you encounter this error, this is because the index that Observo is writing to does not exist. To fix this issue, do one of the following:

    • Create the index in Elasticsearch

    • Give create_index permissions to Observo.

  • "No data ingested": Confirm that data is being sent and matches the expected format.

  • "Error in writing document to Index": Indicates that the index's limit of total fields [1000] has been exceeded while adding new fields. Increase the index.mapping.total_fields.limit setting on the index in Elasticsearch to resolve this issue.

Test Data Flow

  • Send sample data from Observo AI and verify its ingestion in Elasticsearch.

  • Use Elasticsearch's search functionality to locate and analyze the ingested data.

Resources

For additional guidance and detailed information, refer to the following resources:

  • Observo AI Elasticsearch Documentation: Comprehensive guide to configuring Elasticsearch destination in Observo AI.

  • Elasticsearch Documentation: Instructions for setting up and managing Elasticsearch clusters.

  • Observo AI Support: Contact support for assistance with configuration and troubleshooting.

  • Elasticsearch Community: Engage with the Elasticsearch community for best practices and solutions.
