Azure Blob Storage Archival
The Observo AI Azure Blob Storage Archival destination enables scalable, cost-effective storage of telemetry data such as logs, metrics, and traces in Microsoft Azure Blob Storage, supporting flexible formats such as JSON, CSV, and Parquet with secure authentication, compression, and customizable access controls for observability and compliance.
Purpose
The Observo AI Azure Blob Storage Archival destination enables users to send telemetry data, including logs, metrics, and traces, to Microsoft Azure Blob Storage for scalable, cost-effective storage and further analysis. This destination integrates seamlessly with Azure's cloud ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.
Prerequisites
Before configuring the Azure Blob Storage Archival destination in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:
Azure Storage Account:
Create an Azure Storage account in the Azure portal if one does not already exist. This account serves as the storage hub for your data (Create a Storage Account).
Ensure the storage account is accessible and configured for write operations.
Note the storage account name and the container name where data will be stored.
Authentication:
Register an application in Azure Active Directory (Azure AD) to handle authentication for data ingestion (Register an Application).
Navigate to "App registrations" in the Azure portal, create a new registration, and note the Application (client) ID and Directory (tenant) ID.
Create a client secret under "Certificates & secrets" and securely store its value.
Alternatively, obtain an access key for the storage account (Manage Storage Account Access Keys).
Role Assignment:
Assign the "Storage Blob Data Contributor" role to the Azure AD application or service principal for the storage account to grant necessary permissions (Assign Azure Roles).
Verify the role assignment in the storage account’s "Access control (IAM)" section.
Blob Container:
Create a blob container within the storage account to store the telemetry data (Create a Container).
Ensure the container is accessible and matches the region of your storage account for optimal performance.
Prerequisite summary:
- Azure Storage Account: Storage hub for telemetry data; must be accessible for write operations.
- Authentication: Handles secure data ingestion; store the Client ID, Tenant ID, Client Secret, or Access Key.
- Role Assignment: Grants permissions to the application; assign the "Storage Blob Data Contributor" role.
- Blob Container: Storage location for data; create the container in the storage account.
Integration
To configure Azure Blob Storage Archival as a destination in Observo AI, follow these steps:
Log in to Observo AI:
Navigate to the Destinations tab.
Click the Add Destinations button and select Create New.
Choose Azure Blob Storage from the list of available destinations to begin configuration.
Set Use as archival to true.
Select Azure Blob Storage Archival.
General Settings:
Name: Add a unique identifier such as azure-blob-storage-1.
Description (Optional): Provide a description for the destination.
Container Name: Enter the name of the container in your Azure Blob Storage account.
Examples: myblob, myobservostorage
Encoding:
Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.
Options:

All codecs share the following common settings:

Fields to exclude from serialization (Add): Transformations to prepare an event for serialization. List of fields that are excluded from the encoded event. Example: message.payload

Encoding Timestamp Format (Select):
- RFC 3339 timestamp: Formats timestamps as RFC 3339 strings. (default)
- Unix timestamp: Formats timestamps as Unix epoch values.
- Unix timestamp (Float): Formats timestamps as Unix epoch values in floating point.
- Unix timestamp (Milliseconds): Formats timestamps as Unix epoch values in milliseconds.
- Unix timestamp (Microseconds): Formats timestamps as Unix epoch values in microseconds.
- Unix timestamp (Nanoseconds): Formats timestamps as Unix epoch values in nanoseconds.

Codec-specific settings:

JSON Encoding
- Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

logfmt Encoding
- No codec-specific settings; only the common settings above apply.

Apache Avro Encoding
- Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

Newline Delimited JSON Encoding
- No codec-specific settings; only the common settings above apply.

No encoding
- No codec-specific settings; only the common settings above apply.

Plain text encoding
- No codec-specific settings; only the common settings above apply.

Parquet
- Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema. Example: in addition to the Parquet schema, the Parquet file will contain a field named "observo_record".
- Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

Common Event Format (CEF)
- CEF Device Event Class ID: Provide a unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure
- CEF Device Product: Specify the product name that generated the event (maximum 63 characters). Example: Log Analyzer
- CEF Device Vendor: Specify the vendor name that produced the event (maximum 63 characters). Example: Observo
- CEF Device Version: Specify the version of the product that generated the event (maximum 31 characters). Example: 1.0.0
- CEF Extensions (Add): Define custom key-value pairs for additional event data fields in CEF format.
- CEF Name: Provide a human-readable description of the event (maximum 512 characters). Example: cef.name
- CEF Severity: Indicate the importance of the event with a value from 0 (lowest) to 10 (highest). Example: 5
- CEF Version (Select): Specify which version of the CEF specification to use for formatting. Options:
  - CEF specification version 0.1
  - CEF specification version 1.x

CSV Format
- CSV Fields (Add): Specify the field names to include as columns in the CSV output and their order. Examples: timestamp, host, message
- CSV Buffer Capacity (Optional): Set the internal buffer size (in bytes) used when writing CSV data. Example: 8192
- CSV Delimiter (Optional): Set the character that separates fields in the CSV output. Example: ,
- Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.
- CSV Escape Character (Optional): Set the character used to escape quotes when double quote escapes are disabled.
- CSV Quote Character (Optional): Set the character used for quoting fields in the CSV output. Example: "
- CSV Quoting Style (Optional): Control when field values should be wrapped in quote characters. Options:
  - Always quote all fields
  - Quote only when necessary
  - Never use quotes
  - Quote all non-numeric fields

Protocol Buffers
- Protobuf Message Type: Specify the fully qualified message type name for Protobuf serialization. Example: package.Message
- Protobuf Descriptor File: Specify the path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

Graylog Extended Log Format (GELF)
- No codec-specific settings; only the common settings above apply.
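To make the Encoding Timestamp Format options concrete, here is a short Python sketch (illustrative only, not Observo AI code) showing how a single event time renders under each option. The epoch value 1658176486 is the example blob-key timestamp used elsewhere on this page:

```python
from datetime import datetime, timezone

# 1658176486 corresponds to 2022-07-18 20:34:46 UTC.
ts = datetime.fromtimestamp(1658176486, tz=timezone.utc)

print(ts.strftime("%Y-%m-%dT%H:%M:%SZ"))    # RFC 3339: 2022-07-18T20:34:46Z
print(int(ts.timestamp()))                  # Unix seconds: 1658176486
print(ts.timestamp())                       # Unix float: 1658176486.0
print(int(ts.timestamp() * 1_000))          # milliseconds: 1658176486000
print(int(ts.timestamp() * 1_000_000))      # microseconds: 1658176486000000
print(int(ts.timestamp() * 1_000_000_000))  # nanoseconds: 1658176486000000000
```

RFC 3339 is the default because it is human-readable and sorts lexicographically; the Unix variants are more compact for downstream numeric processing.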
Batching Requirements (Default):
Batch Max Bytes: The maximum size of a batch that will be processed by a sink. This is based on the uncompressed size of the batched events, before they are serialized/compressed. Default: 500000 (500 KB)
Batch Max Events: The maximum size of a batch before it is flushed. Default: 1000
Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 300
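The three batching limits interact as "flush on whichever trips first". A minimal sketch of that logic (a hypothetical helper to illustrate the behavior, not the actual sink implementation):

```python
import time

class Batcher:
    """Flush a batch when max bytes, max events, or the timeout is reached."""

    def __init__(self, max_bytes=500_000, max_events=1000, timeout_secs=300):
        self.max_bytes, self.max_events, self.timeout_secs = max_bytes, max_events, timeout_secs
        self.events, self.size, self.started = [], 0, time.monotonic()

    def add(self, event: bytes):
        """Add one event; return the flushed batch if any limit was hit, else None."""
        self.events.append(event)
        self.size += len(event)  # uncompressed, pre-serialization size
        if (self.size >= self.max_bytes
                or len(self.events) >= self.max_events
                or time.monotonic() - self.started >= self.timeout_secs):
            batch, self.events, self.size = self.events, [], 0
            self.started = time.monotonic()
            return batch
        return None

batcher = Batcher(max_events=3)
assert batcher.add(b"event-1") is None
assert batcher.add(b"event-2") is None
print(len(batcher.add(b"event-3")))  # third event trips max_events, prints 3
```

Tuning trade-off: larger batches mean fewer, bigger blobs (cheaper requests, better compression), while a shorter timeout bounds how stale archived data can be.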
TLS Configuration (Optional):
TLS CA: Provide the CA certificate in PEM format.
TLS CRT: Provide the client certificate in PEM format.
TLS Key: Provide the private key in PEM format.
TLS Key Pass: Passphrase used to unlock the encrypted key file. This has no effect unless key_file is set.
Examples: ${KEY_PASS_ENV_VAR}, PassWord1
TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.
TLS Verify Hostname: Enables hostname verification. If enabled, the hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the remote hostname.
Advanced Settings (Optional):
Connection String: The Azure Blob Storage Account connection string. Either 'Storage Account', or this field, must be specified.
Blob Prefix: Prefix to apply to all blob keys. Useful for partitioning objects. Must end in / to act as a directory path.
Examples:
- date=%F/hour=%H
- year=%Y/month=%m/day=%d
- application_id={{ application_id }}/date=%F
- %Y/%m/%d
- date=%F
Blob Append UUID to Timestamp (False): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique blob names in high-throughput use cases.
Example: For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.
Blob Time Format: The timestamp format for the time component of the blob key. By default, blob keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to Azure Blob Storage. The resulting blob key is the key prefix followed by the formatted timestamp, e.g. date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s
Example: %s
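Putting Blob Prefix, Blob Time Format, and the UUID option together, blob key construction can be sketched as follows (a hypothetical helper; `%Y-%m-%d` is used as the portable spelling of `%F`):

```python
import uuid
from datetime import datetime, timezone

def blob_key(prefix_fmt: str, epoch_secs: int, append_uuid: bool = False) -> str:
    """Build a blob key: strftime-expanded prefix + epoch-seconds timestamp (+ optional UUID)."""
    dt = datetime.fromtimestamp(epoch_secs, tz=timezone.utc)
    key = dt.strftime(prefix_fmt) + str(epoch_secs)  # default Blob Time Format is %s
    if append_uuid:
        key += f"-{uuid.uuid4()}"  # guarantees uniqueness under high throughput
    return key

print(blob_key("date=%Y-%m-%d/", 1658176486))
# date=2022-07-18/1658176486
print(blob_key("year=%Y/month=%m/day=%d/", 1658176486, append_uuid=True))
# e.g. year=2022/month=07/day=18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547
```

Because the prefix ends in `/`, tools that treat `/` as a directory separator (the Azure portal, Spark, etc.) will see the date components as partition folders.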
Storage Account: The Azure Blob Storage Account name. Either 'Connection String', or this field, must be specified.
Compression: Compression algorithm to use for the request body. Default: Gzip compression
Options:
- Gzip compression: DEFLATE compression with headers for file storage.
- None: Data stored and transmitted in original form.
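The effect of the Gzip option can be illustrated with Python's gzip module (a local sketch of the compression trade-off, not the sink itself):

```python
import gzip

# A batch of NDJSON-style events; repetitive telemetry compresses well.
body = b'{"level": "info", "message": "request completed", "status": 200}\n' * 200

compressed = gzip.compress(body)
assert gzip.decompress(compressed) == body  # lossless round trip
print(f"{len(body)} bytes -> {len(compressed)} bytes")
```

Gzip reduces both storage cost and transfer time at a small CPU cost; choose None only when downstream consumers cannot read gzip-compressed blobs.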
Healthcheck (False): Whether or not to check the health of the sink when Observo Agent starts up.
Time Generated Key (Optional): Use this option to customize the log field used as TimeGenerated in Azure. The setting of log_schema.timestamp_key, usually timestamp, is used here by default. This field should be used in rare cases where TimeGenerated should point to a specific log field. For example, use this field to set the log field source_timestamp as holding the value that should be used as TimeGenerated on the Azure side.
Example: time_generated
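The Connection String accepted by the Advanced Settings above is a set of semicolon-separated key=value pairs; a quick local sanity check before pasting it into the configuration can be sketched as (hypothetical helper, dummy account name and key):

```python
def parse_connection_string(cs: str) -> dict:
    """Split an Azure storage connection string into its key=value parts."""
    # maxsplit=1 keeps '=' padding inside base64 account keys intact.
    return dict(part.split("=", 1) for part in cs.strip().strip(";").split(";"))

# Dummy values for illustration only.
cs = ("DefaultEndpointsProtocol=https;AccountName=mystorageacct;"
      "AccountKey=xxxx;EndpointSuffix=core.windows.net")
parts = parse_connection_string(cs)
print(parts["AccountName"])  # mystorageacct
```

A missing AccountName or AccountKey here usually explains authentication failures at the destination.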
Save and Test Configuration:
Save the configuration settings in Observo AI.
Send sample data and verify that it reaches the specified blob container in Azure Blob Storage.
Example Scenarios
InsureTech Solutions, a fictitious North American insurance enterprise, relies on data-driven risk assessment and claims processing, generating extensive telemetry data, including policy transaction logs, claims metrics, and application traces, to ensure SOC 2 compliance, optimize workflows, and enhance fraud detection. To store this data in a scalable, secure, and cost-effective way, the company integrates Observo AI with Microsoft Azure Blob Storage, centralizing telemetry in the insuretech-telemetry-archive container using CSV format with Gzip compression for analytics integration, strict access controls, and unique object naming for high-volume data. Authentication via Azure AD, TLS for secure transfer, and the specific configurations below ensure reliability and regulatory compliance for long-term retention and advanced analytics.
Standard Azure Blob Storage Archival Destination Setup
Here is a standard Azure Blob Storage Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:
General Settings
Name
insuretech-telemetry-archive
Unique identifier for the destination.
Description
Archival destination for InsureTech Solutions telemetry data
Optional description for clarity.
Container Name
insuretech-telemetry-archive
The Azure Blob Storage container name for storing telemetry data.
Encoding
Encoding Codec
CSV Format
Uses CSV format for compatibility with analytics platforms.
CSV Fields
timestamp, policy_id, claim_status, message
Specifies columns for CSV output: timestamp, policy ID, claim status, and message.
CSV Buffer Capacity
8192
Internal buffer size of 8192 bytes for writing CSV data.
CSV Delimiter
,
Uses comma as the field separator in CSV output.
Enable Double Quote Escapes
True
Quotes in field data are escaped by doubling them.
CSV Quote Character
"
Uses double quotes for quoting fields in CSV output.
CSV Quoting Style
Quote only when necessary
Quotes fields only when required (e.g., for fields containing commas).
Encoding Avro Schema
{ "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "policy_id", "type": "string" }, { "name": "claim_status", "type": "string" }] }
Optional Avro schema for additional serialization compatibility.
Encoding Metric Tag Values
Tags exposed as arrays of strings
Exposes all metric tags as arrays for detailed analytics.
Encoding Timestamp Format
RFC 3339 timestamp
Uses RFC 3339 format for consistent timestamp formatting.
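The CSV settings in this scenario map closely onto Python's csv module; the expected row shape can be sketched as follows (sample field values are invented for illustration):

```python
import csv
import io

fields = ["timestamp", "policy_id", "claim_status", "message"]
event = {"timestamp": "2025-07-13T10:00:00Z", "policy_id": "POL-1001",
         "claim_status": "open", "message": "claim filed, pending review"}

buf = io.StringIO()
writer = csv.DictWriter(
    buf, fieldnames=fields,
    delimiter=",",               # CSV Delimiter
    quotechar='"',               # CSV Quote Character
    doublequote=True,            # Enable Double Quote Escapes
    quoting=csv.QUOTE_MINIMAL,   # "Quote only when necessary"
)
writer.writerow(event)
print(buf.getvalue().rstrip("\r\n"))
# 2025-07-13T10:00:00Z,POL-1001,open,"claim filed, pending review"
```

Note that only the message field is quoted, because it contains the delimiter character; this is exactly what "Quote only when necessary" means.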
Batching Requirements
Batch Max Bytes
10485760
Maximum batch size of 10MB (uncompressed) to balance throughput and memory usage.
Batch Max Events
1000
Flushes batches after 1000 events to manage batch size.
Batch Timeout Secs
300
Flushes batches after 300 seconds to ensure timely data transfer.
TLS Configuration
TLS CA
/path/to/ca-cert.pem
CA certificate in PEM format to verify the Azure Blob Storage endpoint.
TLS CRT
/path/to/client-cert.pem
Client certificate in PEM format for mutual TLS authentication.
TLS Key
/path/to/client-key.pem
Private key in PEM format corresponding to the client certificate.
TLS Key Pass
InsureTechTLSKey2025
Passphrase to unlock the encrypted key file.
TLS Verify Certificate
True
Enables certificate verification for secure connections.
TLS Verify Hostname
True
Ensures the hostname matches the certificate for added security.
Advanced Settings
Connection String
DefaultEndpointsProtocol=https;AccountName=insuretechstorage;AccountKey=xxxx;EndpointSuffix=core.windows.net
Azure Blob Storage connection string for authentication.
Blob Prefix
year=%Y/month=%m/day=%d/
Organizes blobs by year, month, and day for partitioning.
Blob Append UUID to Timestamp
True
Appends a UUID v4 token to the timestamp for unique blob keys (e.g., date=2025-07-13/1626196486-30f6652c-71da-4f9f-800d-a1189c47c547).
Blob Time Format
%s
Timestamps in seconds since the Unix epoch for blob keys.
Storage Account
insuretechstorage
The Azure Blob Storage account name.
Compression
Gzip compression
Uses Gzip (DEFLATE with headers) for efficient storage.
Healthcheck
True
Enables health checks to verify sink connectivity on startup.
Time Generated Key
timestamp
Uses the default timestamp field as the TimeGenerated value in Azure.
Test Configuration
Send sample data and verify that it reaches the insuretech-telemetry-archive container in Azure Blob Storage.
Troubleshooting
If issues arise with the Azure Blob Storage Archival destination, use the following steps to diagnose and resolve them:
Verify Configuration Settings:
Ensure all fields, such as Storage Account Name, Container Name, Client ID, Tenant ID, Client Secret, or Access Key, are correctly entered and match Azure configurations.
Check Authentication:
Verify that the client secret is valid and has not expired, or confirm the access key is correct.
Confirm that the Azure AD application has the "Storage Blob Data Contributor" role assigned for the storage account.
Monitor Logs:
Check Observo AI logs for errors or warnings related to data transmission.
In the Azure portal, navigate to the storage account and inspect the blob container to confirm data arrival.
Validate Container Configuration:
Ensure the specified container exists and is accessible within the storage account.
Network and Connectivity:
Check for firewall rules or network policies that may block communication between Observo AI and Azure Blob Storage.
Verify that the storage account endpoint is accessible.
Test Data Flow:
Send sample data and monitor its arrival in the blob container.
Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.
Check Quotas and Limits:
Verify that the storage account is not hitting Azure’s rate limits or quotas (Azure Blob Storage Quotas).
Common issues:
- Issue: Data not appearing in container. Possible cause: incorrect storage account or container name. Resolution: verify account and container names in the configuration.
- Issue: Authentication errors. Possible cause: expired or incorrect client secret or access key. Resolution: regenerate the secret or key and update the configuration.
- Issue: Connection failures. Possible cause: network or firewall issues. Resolution: check network policies and connectivity.
- Issue: Slow data transfer. Possible cause: backpressure or rate limiting. Resolution: adjust batching settings or check Azure quotas.
Resources
For additional guidance and detailed information, refer to the following resources: