Elasticsearch
The Observo AI Elasticsearch destination enables the transmission of log and event data to Elasticsearch clusters for advanced search, analytics, and visualization, supporting customizable encoding, secure authentication (Basic or AWS), and Gzip compression for efficient data integration.
Purpose
The Observo AI Elasticsearch destination enables the transmission of log and event data to Elasticsearch clusters, facilitating advanced search, analytics, and visualization capabilities. This integration allows organizations to harness Elasticsearch's powerful indexing and querying features for comprehensive observability and operational insights.
Prerequisites
Before configuring the Elasticsearch destination in Observo AI, ensure the following requirements are met:
Observo AI Platform Setup
Observo AI Site: Ensure that the Observo AI Site is installed and operational.
Elasticsearch Cluster Configuration
Elasticsearch Endpoint URL: Determine the Elasticsearch endpoint URL, typically in the format http://<elasticsearch-host>:9200.
Authentication Credentials: If authentication is enabled, obtain the necessary username and password or API key.
Index Settings: Decide on the index or index pattern where the data will be stored.
TLS Configuration: If using HTTPS, ensure that the necessary TLS certificates are in place.
Network and Connectivity
HTTP/HTTPS Access: Ensure that Observo AI can communicate with the Elasticsearch endpoint over HTTP or HTTPS.
Firewall Rules: If using firewalls or network security groups, configure them to allow outbound traffic from Observo AI to the Elasticsearch endpoint.
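A quick way to confirm the connectivity and authentication prerequisites is to probe the cluster-health endpoint with the same credentials Observo AI will use. The sketch below (Python; the host and credentials are hypothetical placeholders) only builds the request pieces, so you can send them with any HTTP client:

```python
import base64

# Hypothetical values -- substitute your own cluster details.
ES_ENDPOINT = "https://es-cluster.example.com:9200"
ES_USER, ES_PASSWORD = "observo_writer", "s3cret"

def health_check_request(endpoint: str, user: str, password: str) -> dict:
    """Build the pieces of the probe Observo AI must be able to perform:
    a GET against /_cluster/health with HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {
        "method": "GET",
        "url": f"{endpoint}/_cluster/health",
        "headers": {"Authorization": f"Basic {token}"},
    }

req = health_check_request(ES_ENDPOINT, ES_USER, ES_PASSWORD)
# Equivalent check from a shell:
#   curl -u observo_writer:s3cret https://es-cluster.example.com:9200/_cluster/health
```

A `green` or `yellow` cluster status in the response indicates the endpoint is reachable and the credentials are accepted.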
Integration
To configure Elasticsearch as a destination in Observo AI, follow these steps:
Access Observo AI Destinations:
Navigate to the Destinations tab in the Observo AI interface.
Click on the "Add Destination" button and select "Create New".
Choose "Elasticsearch" from the list of available destinations.
General Settings:
Name: Provide a unique identifier for the destination, such as elasticsearch-dest-1.
Description (Optional): Add a description for the destination.
Elasticsearch Endpoint (Add as needed): The Elasticsearch endpoints to send logs to. Each endpoint must contain an HTTP scheme, and may specify a hostname or IP address and port.
Examples:
https://127.0.0.1:9200
http://my-elasticsearch-endpoint
Mode: Determines which Elasticsearch Bulk API indexing mode to use.
Options:
Bulk: Batch-process multiple operations in a single request
Data Stream: Continuous flow of timestamped data, optimized for ingestion
Id Key: The name of the event key that should map to Elasticsearch’s id field. By default, the _id field is not set, which allows Elasticsearch to set this automatically. Setting your own Elasticsearch IDs can impact performance.
Examples:
id
_id
Pipeline: The name of the ingest pipeline to apply in Elasticsearch.
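Bulk mode maps directly onto the Elasticsearch Bulk API's newline-delimited format: one action metadata line followed by one document line per event. A minimal sketch of what such a payload looks like (Python; the index name and documents are illustrative):

```python
import json

def bulk_payload(index: str, docs: list, action: str = "index") -> str:
    """Serialize documents into the newline-delimited (NDJSON) body the
    Elasticsearch Bulk API expects. Observo supports the `index` and
    `create` actions; `update` and `delete` are not supported."""
    lines = []
    for doc in docs:
        lines.append(json.dumps({action: {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                          # source line
    return "\n".join(lines) + "\n"  # Bulk API requires a trailing newline

payload = bulk_payload("app-logs", [{"msg": "started"}, {"msg": "ready"}])
```

To set the document `_id` from an event field (the Id Key option), the action line would carry an `"_id"` entry alongside `"_index"`.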
Authentication (Optional):
Auth Strategy: The authentication strategy to use. Choose between the following authentication mechanisms:
Options:
Amazon OpenSearch Service-specific authentication
HTTP Basic Authentication
No selection: choose this option if you are using API-token-based authentication; the token must then be supplied as an HTTP header.
Auth Access Key Id: The AWS access key ID.
Example: AKIAIOSFODNN7EXAMPLE
Auth Secret Access Key: The AWS secret access key.
Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Auth User: Basic authentication username.
Examples:
${ELASTICSEARCH_USERNAME}
username
Auth Password: Basic authentication password.
Examples:
${ELASTICSEARCH_PASSWORD}
password
Auth Region: The AWS region to send STS requests to. Defaults to the configured region for the service itself.
Example: us-west-2
Auth Assume Role: The ARN of an IAM role to assume.
Example: arn:aws:iam::123456789098:role/my_role
Auth Load Timeout Secs: Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.
Example: 30
Auth IMDS Connect Timeout Seconds: Connect timeout for IMDS.
Auth IMDS Max Attempts: Number of IMDS retries for fetching tokens and metadata.
Auth IMDS Read Timeout Seconds: Read timeout for IMDS.
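Independent of the fields above, the Basic and API-token strategies ultimately reduce to different Authorization headers on each request. A hedged sketch of both (the "No selection" case uses Elasticsearch's ApiKey header scheme; all credentials here are placeholders):

```python
import base64

def basic_auth_header(user: str, password: str) -> dict:
    """HTTP Basic Authentication: base64-encoded user:password pair."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    return {"Authorization": f"Basic {token}"}

def api_key_header(encoded_key: str) -> dict:
    """With 'No selection', supply an Elasticsearch API key yourself as
    an HTTP header on the destination, using the ApiKey scheme."""
    return {"Authorization": f"ApiKey {encoded_key}"}

basic = basic_auth_header("apex_admin", "s3cret")
apikey = api_key_header("VnVhQ2ZHY0JDZGJrU...")  # truncated placeholder key
```

AWS authentication is different in kind: requests are SigV4-signed with the access key / secret key (or an assumed role) rather than carrying a static header.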
Acknowledgement (Optional):
Acknowledgements Enabled (Disabled): Whether end-to-end acknowledgements are enabled. When enabled, any connected source that supports end-to-end acknowledgements will wait for events to be acknowledged by the sink before acknowledging them at the source.
Encoding (Optional):
Fields to Exclude (Add): List any fields that should be excluded from the serialized payload. Default: host
Examples:
fields1
date
host
Encoding Timestamp Format: Specify the timestamp format (default is RFC 3339).
Options:
RFC 3339 timestamp: Human-readable date-time format with timezone (ISO 8601-based)
Unix timestamp: Seconds since January 1, 1970 (UTC epoch)
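The two encodings represent the same instant differently; a quick Python sketch of how one timestamp serializes under each:

```python
from datetime import datetime, timezone

ts = datetime(2024, 5, 1, 12, 30, 0, tzinfo=timezone.utc)

# RFC 3339: human-readable, ISO 8601-based, with explicit timezone
rfc3339 = ts.isoformat().replace("+00:00", "Z")

# Unix: integer seconds since the UTC epoch (1970-01-01T00:00:00Z)
unix = int(ts.timestamp())
```

Here `rfc3339` is `"2024-05-01T12:30:00Z"` and `unix` is `1714566600`.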
Bulk Mode Configuration:
Bulk Index: The name of the index to write events to. Relevant when Mode=Bulk.
Examples:
application-{{ application_id }}-%Y-%m-%d
{{ index }}
Bulk Action: Action to use when making requests to the Elasticsearch Bulk API. Currently, Observo only supports index and create. update and delete actions are not supported. Relevant when Mode=Bulk.
Options:
Create: Adds a new document only if it doesn’t exist
Index: Adds or replaces a document with the same ID
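Index templates such as application-{{ application_id }}-%Y-%m-%d combine event-field substitution with strftime date specifiers. The sketch below illustrates the idea; it is an assumed interpretation for illustration, not Observo's actual template engine:

```python
import re
from datetime import datetime, timezone

def render_index(template: str, event: dict, now: datetime) -> str:
    """Fill {{ field }} tokens from the event, then expand strftime
    specifiers (%Y, %m, %d, ...) from the event's timestamp."""
    def sub(match):
        return str(event[match.group(1).strip()])
    rendered = re.sub(r"\{\{(.*?)\}\}", sub, template)
    return now.strftime(rendered)

idx = render_index("application-{{ application_id }}-%Y-%m-%d",
                   {"application_id": "billing"},
                   datetime(2024, 5, 1, tzinfo=timezone.utc))
# idx == "application-billing-2024-05-01"
```

Date-based index names like this give you one index per day, which keeps retention and deletion simple.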
Request Configurations:
Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.
Options:
Adaptive concurrency: Adjusts parallelism based on system load
A fixed concurrency of 1: Processes one task at a time only
Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.
Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.
Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.
Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.
Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.
Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
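Taken together, the retry options describe a Fibonacci backoff starting at the initial backoff and capped by the maximum duration. A sketch of the resulting wait schedule, under the assumption that each wait is the next Fibonacci multiple of the initial backoff:

```python
def backoff_schedule(initial: float, max_duration: float, attempts: int) -> list:
    """Sketch of the documented retry timing: the first retry waits
    `initial` seconds, subsequent waits follow the Fibonacci sequence,
    and every wait is capped at `max_duration` seconds."""
    waits, a, b = [], initial, initial
    for _ in range(attempts):
        waits.append(min(a, max_duration))
        a, b = b, a + b  # advance the Fibonacci pair
    return waits

schedule = backoff_schedule(initial=1, max_duration=3600, attempts=8)
# schedule == [1, 1, 2, 3, 5, 8, 13, 21]
```

With the defaults (initial 1 s, cap 3600 s), backoff grows quickly but never exceeds an hour between attempts.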
Data Stream Mode Configuration:
Data Stream Auto Routing (True): Automatically routes events by deriving the data stream name from event fields. The data stream name is <type>-<dataset>-<namespace>, where each value comes from the data_stream configuration field of the same name. If enabled, the values of the Data Stream Type, Data Stream Dataset, and Data Stream Namespace event fields are used if they are present; otherwise, the values set here in the configuration are used.
Data Stream Dataset: The data stream dataset used to construct the data stream at index time. Default: generic
Examples:
generic
nginx
{{ service }}
Data Stream Namespace: The data stream namespace used to construct the data stream at index time. Default: default
Example: {{ environment }}
Data Stream Sync Fields (True): Automatically adds and syncs the data_stream.* event fields if they are missing from the event. This ensures that fields match the name of the data stream that is receiving events.
Data Stream Type: The data stream type used to construct the data stream at index time. Default: log
Examples:
metrics
synthetics
{{ type }}
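The routing rules above boil down to: event-level data_stream.* fields win when auto-routing is enabled, configured values fill the gaps, and the pieces join as <type>-<dataset>-<namespace>. A small sketch of that resolution:

```python
# Configured defaults, matching the documented option defaults.
CONFIG = {"type": "log", "dataset": "generic", "namespace": "default"}

def data_stream_name(event: dict, config: dict = CONFIG) -> str:
    """With auto-routing enabled, data_stream.* fields on the event
    take precedence; missing pieces fall back to the configuration."""
    ds = event.get("data_stream", {})
    parts = [ds.get(k) or config[k] for k in ("type", "dataset", "namespace")]
    return "-".join(parts)

name = data_stream_name({"data_stream": {"dataset": "nginx"}})
# name == "log-nginx-default"
```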
Batching Requirements (Optional):
Batch Max Bytes (Increment as needed): The maximum size of a batch that will be processed by a sink. This is based on the uncompressed size of the batched events, before they are serialized / compressed.
Batch Max Events (Increment as needed): The maximum size of a batch before it is flushed.
Batch Timeout Seconds (Increment as needed): The maximum age of a batch before it is flushed. Default: 1
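The three batching limits act as independent flush triggers: whichever of uncompressed size, event count, or batch age is reached first causes a flush. A minimal sketch (the numeric defaults here are illustrative):

```python
class Batcher:
    """Sketch of the three flush triggers described above: flush when
    the batch exceeds max_bytes (uncompressed, pre-serialization),
    reaches max_events, or grows older than timeout_secs."""
    def __init__(self, max_bytes=10_485_760, max_events=1000, timeout_secs=1.0):
        self.max_bytes, self.max_events, self.timeout = max_bytes, max_events, timeout_secs
        self.events, self.size, self.age = [], 0, 0.0

    def add(self, event: bytes):
        self.events.append(event)
        self.size += len(event)

    def should_flush(self) -> bool:
        return (self.size >= self.max_bytes
                or len(self.events) >= self.max_events
                or self.age >= self.timeout)

b = Batcher(max_events=2)
b.add(b'{"msg":"a"}')
b.add(b'{"msg":"b"}')
# flushes: the event-count limit was reached first
```

Small batches flush more often (lower latency, more requests); large batches amortize request overhead at the cost of latency.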
AWS Configuration (Optional):
AWS Endpoint: Custom endpoint for use with AWS-compatible services.
Example: http://127.0.0.0:5000/path/to/service
AWS Region: The AWS Region of the target service.
Example: us-east-1
TLS Configuration (Optional):
TLS CA: Provide the CA certificate in PEM format.
TLS Certificate: Provide the client certificate in PEM format.
TLS Key: Provide the private key in PEM format.
TLS Key Passphrase: If the key is encrypted, provide the passphrase.
Verify Certificate: Enable or disable certificate verification.
Verify Hostname: Enable or disable hostname verification.
Buffering Configuration:
Buffer Type: Specifies the buffering mechanism for event delivery.
Options:
Memory: High-performance, in-memory buffering.
- Max Events: The maximum number of events allowed in the buffer. Default: 500
- When Full: Event handling behavior when the buffer is full. Default: Block
  - Block: Wait for free space in the buffer. This applies backpressure up the topology, signalling that sources should slow down the acceptance/consumption of events. No data is lost, but data will pile up at the edge.
  - Drop Newest: Drop the event instead of waiting for free space in the buffer. The event is intentionally dropped. This mode is typically used when performance is the highest priority and it is preferable to temporarily lose events rather than slow down the acceptance/consumption of events.
Disk: Lower-performance, less costly, on-disk buffering.
- Max Bytes Size: The maximum number of bytes allowed in the buffer. Must be at least 268435488.
- When Full: Event handling behavior when the buffer is full. Default: Block. The Block and Drop Newest behaviors are the same as for the Memory buffer.
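The two When Full policies differ only in what happens to an incoming event once the buffer is at capacity. A simplified sketch of that decision:

```python
from collections import deque

def offer(buffer: deque, event, max_events: int, when_full: str) -> str:
    """Sketch of the When Full policies: 'block' signals the caller to
    wait (backpressure up the topology), 'drop_newest' discards the
    incoming event so the source is never slowed down."""
    if len(buffer) < max_events:
        buffer.append(event)
        return "accepted"
    if when_full == "drop_newest":
        return "dropped"       # event intentionally lost
    return "backpressure"      # 'block': source must slow down and retry

buf = deque()
offer(buf, "e1", 1, "block")           # accepted
status = offer(buf, "e2", 1, "block")  # buffer full -> backpressure
```

Block trades latency for completeness; Drop Newest trades completeness for steady throughput.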
Advanced Settings (Optional):
Metrics Timezone: The name of the timezone to apply to timestamp conversions that do not contain an explicit time zone. The time zone name may be any name in the TZ database, or local to indicate system local time.
Examples:
local
America/New_York
EST5EDT
Metrics Host Tag: Name of the tag in the metric to use for the source host.
Examples:
host
hostname
Api Version: The API version of Elasticsearch.
Options:
Auto-detect the API version
Elasticsearch 6.x API
Elasticsearch 7.x API
Elasticsearch 8.x API
Compression: Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression
Options:
Gzip compression: Widely used DEFLATE-based compression format
No compression: No compression applied to data
Zlib compression: DEFLATE-based, lightweight compression library
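With Gzip selected, each request body is compressed at the default level and the request carries a Content-Encoding header. A sketch of what that amounts to for an NDJSON bulk payload (Python standard library only):

```python
import gzip
import json

# Illustrative repetitive log payload, serialized as NDJSON.
docs = [{"level": "info", "msg": f"event {i}"} for i in range(100)]
body = "\n".join(json.dumps(d) for d in docs).encode() + b"\n"

compressed = gzip.compress(body)  # default compression level
headers = {
    "Content-Encoding": "gzip",             # tells Elasticsearch to inflate
    "Content-Type": "application/x-ndjson", # Bulk API body type
}
# Log data is repetitive, so DEFLATE-based gzip typically shrinks it well.
```

Compression trades some CPU at the edge for less bandwidth on the wire; for high-volume log traffic that is usually a good trade.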
Distribution Retry Initial Backoff Secs (Increment as needed): Initial delay between attempts to reactivate endpoints once they become unhealthy.
Distribution Retry Max Duration Secs (Increment as needed): Maximum delay between attempts to reactivate endpoints once they become unhealthy.
Doc Type: The doc_type for your index data. Only relevant for Elasticsearch <= 6.X. Deprecated for version >= 7.0. Default: _doc
Metrics Metric Tag Values: Controls how metric tag values are encoded.
Options:
Tags exposed as single strings: When set to single, only the last non-bare value of tags is displayed with the metric.
Tags exposed as arrays of strings: When set to full, all metric tags are exposed as separate assignments.
Query (Add as needed): A query string parameter and its value to add to the query string.
Example:
key: X-Powered-By
value: Observo
Request Retry Partial (False): Whether to retry successful requests containing partial failures. To avoid duplicates in Elasticsearch, please use option id_key.
Suppress Type Name (False): Whether to send the type field to Elasticsearch. Deprecated in Elasticsearch 7.x and removed in Elasticsearch 8.x. If enabled, the doc_type option will be ignored.
Rejection Reporting: Elasticsearch may reject some events due to internal constraints, such as non-adherence to the schema. Turning rejection reporting on (temporarily) may help isolate the cause of rejection. Smaller batch sizes often make corner cases easier to debug while keeping overhead in check.
Options:
Report stats but drop request and response payloads
Report response payload but drop request (significant overhead)
Report both request and response (very high overhead)
Save and Test Configuration:
Save the configuration settings.
Send sample data to verify that it reaches the specified Elasticsearch index.
Example Scenarios
Apex Financial Services, a fictitious enterprise in the financial services sector, specializes in wealth management and transaction processing. To enhance observability and gain actionable insights into their transaction logs and customer interaction data, Apex decides to integrate their Observo AI platform with an Elasticsearch cluster. This integration will enable advanced search, analytics, and visualization of their financial data, helping them monitor market trends, detect anomalies, and ensure regulatory compliance. Below is the detailed configuration process for setting up Elasticsearch as a destination in Observo AI, based on the provided documentation, with all required fields specified.
Standard Elasticsearch Destination Setup
Here is a standard Elasticsearch destination configuration example. Only the required sections and their associated field updates are displayed below:
General Settings
Name: apex-es-transactions. Unique identifier for the Elasticsearch destination.
Description: Elasticsearch destination for transaction logs and customer interactions. Optional description for clarity.
Elasticsearch Endpoint: https://es-cluster.apexfin.com:9200. The secure endpoint for the Elasticsearch cluster.
Mode: Bulk. Uses the Elasticsearch Bulk API to batch-process multiple operations in a single request.
Id Key: transaction_id. Maps to Elasticsearch’s _id field for unique transaction identification.
Pipeline: apex-transaction-pipeline. The ingest pipeline to apply for data preprocessing in Elasticsearch.
Authentication
Auth Strategy: HTTP Basic Authentication. Uses a username and password for secure access.
Auth User: apex_admin. Username for Elasticsearch authentication.
Auth Password: ${ELASTICSEARCH_PASSWORD}. Password stored in an environment variable for security.
Encoding
Fields to Exclude: host, client_ip. Excludes sensitive fields from the serialized payload.
Encoding Timestamp Format: RFC 3339 timestamp. Uses a human-readable, ISO 8601-based format with timezone.
Bulk Mode Configuration
Bulk Index: transactions-%Y-%m-%d. Dynamic index name based on the date, for daily transaction logs.
Bulk Action: Index. Adds or replaces documents with the same transaction_id.
Request Configuration
Request Concurrency: Adaptive concurrency. Adjusts parallelism based on system load for optimal performance.
Request Rate Limit Duration Secs: 1. Time window for rate limiting (default).
Request Rate Limit Num: 100. Maximum requests allowed within the time window.
Request Retry Attempts: 3. Maximum retries for failed requests.
Request Retry Initial Backoff Secs: 1. Initial wait time before retrying a failed request.
Request Retry Max Duration Secs: 3600. Maximum wait time between retries (default).
Request Timeout Secs: 60. Time before a request is aborted (default).
Data Stream Mode Configuration
Data Stream Auto Routing: True. Automatically derives the data stream name using event fields.
Data Stream Dataset: transactions. Dataset used to construct the data stream name.
Data Stream Namespace: production. Namespace for the data stream, reflecting the environment.
Data Stream Sync Fields: True. Ensures data_stream.* fields match the receiving data stream.
Data Stream Type: log. Type used to construct the data stream name.
Batching Configuration
Batch Max Bytes: 10485760. Maximum batch size (10 MB) for uncompressed events.
Batch Max Events: 1000. Maximum number of events in a batch before flushing.
Batch Timeout Seconds: 1. Maximum age of a batch before flushing (default).
TLS Configuration
TLS CA: /certs/apex_ca.pem. Path to the CA certificate in PEM format.
TLS Certificate: /certs/apex_client_cert.pem. Path to the client certificate in PEM format.
TLS Key: /certs/apex_client_key.pem. Path to the private key in PEM format.
TLS Key Passphrase: ${TLS_KEY_PASSPHRASE}. Passphrase for the encrypted private key, stored securely.
Verify Certificate: True. Enables certificate verification for secure communication.
Verify Hostname: True. Enables hostname verification for added security.
Buffering Configuration
Buffer Type: Memory. Uses high-performance, in-memory buffering.
Max Events: 500. Maximum number of events in the buffer (default).
When Full: Block. Applies backpressure to wait for free space, preventing data loss.
Advanced Settings
Metrics Timezone: America/New_York. Timezone for timestamp conversions, matching Apex’s primary location.
Metrics Host Tag: hostname. Tag used for the source host in metrics.
Api Version: Auto-detect. Automatically detects the Elasticsearch API version.
Compression: Gzip compression. Applies Gzip compression to reduce data transfer size.
Distribution Retry Initial Backoff Secs: 1. Initial delay for retrying unhealthy endpoints.
Distribution Retry Max Duration Secs: 3600. Maximum delay for retrying unhealthy endpoints.
Doc Type: _doc. Default document type for Elasticsearch (relevant for <= 6.x).
Metrics Metric Tag Values: Tags exposed as arrays of strings. Exposes all metric tags as separate assignments.
Query: X-Powered-By: Observo. Adds a query string parameter to identify the source.
Request Retry Partial: False. Does not retry requests with partial failures, to avoid duplicates.
Suppress Type Name: False. Sends the type field to Elasticsearch (relevant for <= 6.x).
Rejection Reporting: Report stats but drop request and response payloads. Reports stats to help debug rejections with minimal overhead.
Test Configuration
Save the configuration in the Observo AI interface.
Send sample transaction data (e.g., a mock transaction log) to verify ingestion.
Use Elasticsearch’s search functionality to confirm that data appears in the transactions-%Y-%m-%d index.
Monitor Observo AI’s Notifications tab for any errors or warnings.
Scenario Troubleshooting
Authentication Issues: Verify that apex_admin and the password stored in ${ELASTICSEARCH_PASSWORD} are valid and have write permissions.
Index Not Found: Ensure the transactions-%Y-%m-%d index pattern is correctly configured in Elasticsearch.
Network Issues: Confirm that Observo AI can reach https://es-cluster.apexfin.com:9200 and that firewall rules allow outbound HTTPS traffic.
Data Format Errors: Validate that transaction logs match the expected schema and that the apex-transaction-pipeline ingest pipeline is correctly set up.
This configuration enables Apex Financial Services to efficiently stream and analyze their financial data in Elasticsearch.
Troubleshooting
If issues arise with the Elasticsearch destination in Observo AI, use the following steps to diagnose and resolve them:
Verify Configuration Settings
Ensure that the Elasticsearch Endpoint URL, Authentication Credentials, and Index are correctly entered and match the Elasticsearch setup.
Confirm that the Elasticsearch cluster is operational and accessible.
Check Authentication
Verify that the provided credentials are valid and have the necessary permissions to write to the specified index.
Ensure that the credentials have not expired or been revoked.
Monitor Logs
Check Observo AI’s Notifications tab for errors or warnings related to data transmission.
In the Elasticsearch interface, search the specified index to confirm data arrival.
Validate Data Format and Schema
Ensure that the data sent from Observo AI matches the expected format and schema in Elasticsearch.
If using custom mappings, verify that they are properly configured in Elasticsearch.
Network and Connectivity
Ensure that Observo AI can reach the Elasticsearch endpoint over the network.
If using firewalls or proxies, verify their configurations to allow necessary traffic.
Common Error Messages
"Authentication failed": Indicates invalid or missing credentials. Verify the credentials' validity and permissions.
"Index not found": Check that the specified index exists in Elasticsearch and that the credentials have write permissions.
"Error in creation of index": The index that Observo is writing to does not exist. To fix this issue, do one of the following:
Create the index in Elasticsearch
Give create_index permissions to Observo.
"No data ingested": Confirm that data is being sent and matches the expected format.
“Error in writing document to Index”: The limit of total fields [1000] has been exceeded while adding new fields. Increase the index.mapping.total_fields.limit setting on the index to resolve this issue.
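When this limit is hit, the usual fix is to raise the index's index.mapping.total_fields.limit setting (Elasticsearch's default is 1000). The sketch below only builds the settings request; the host and index name are placeholders:

```python
import json

# Hypothetical index name; the real one comes from your Bulk Index setting.
INDEX = "app-logs-2024-05-01"

settings_request = {
    "method": "PUT",
    "url": f"https://<elasticsearch-host>:9200/{INDEX}/_settings",
    "headers": {"Content-Type": "application/json"},
    # Raise the per-index mapped-field limit above the 1000 default.
    "body": json.dumps({"index.mapping.total_fields.limit": 2000}),
}
# Equivalent shell command:
#   curl -XPUT 'https://<elasticsearch-host>:9200/app-logs-2024-05-01/_settings' \
#        -H 'Content-Type: application/json' \
#        -d '{"index.mapping.total_fields.limit": 2000}'
```

Consider whether a mapping with that many fields indicates an upstream schema problem before raising the limit far above the default.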
Test Data Flow
Send sample data from Observo AI and verify its ingestion in Elasticsearch.
Use Elasticsearch's search functionality to locate and analyze the ingested data.
Resources
For additional guidance and detailed information, refer to the following resources:
Observo AI Elasticsearch Documentation: Comprehensive guide to configuring Elasticsearch destination in Observo AI.
Elasticsearch Documentation: Instructions for setting up and managing Elasticsearch clusters.
Observo AI Support: Contact support for assistance with configuration and troubleshooting.
Elasticsearch Community: Engage with the Elasticsearch community for best practices and solutions.