Azure Blob Storage Archival

The Observo AI Azure Blob Storage Archival destination enables scalable, cost-effective storage of telemetry data such as logs, metrics, and traces in Microsoft Azure Blob Storage. It supports flexible formats such as JSON, CSV, and Parquet, with secure authentication, compression, and customizable access controls for observability and compliance.

Purpose

The Observo AI Azure Blob Storage Archival destination enables users to send telemetry data, including logs, metrics, and traces, to Microsoft Azure Blob Storage for scalable, cost-effective storage and further analysis. This destination integrates seamlessly with Azure's cloud ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.

Prerequisites

Before configuring the Azure Blob Storage Archival destination in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:

  • Azure Storage Account:

    • Create an Azure Storage account in the Azure portal if one does not already exist. This account serves as the storage hub for your data (Create a Storage Account).

    • Ensure the storage account is accessible and configured for write operations.

    • Note the storage account name and the container name where data will be stored.

  • Authentication:

    • Register an application in Azure Active Directory (Azure AD) to handle authentication for data ingestion (Register an Application).

    • Navigate to "App registrations" in the Azure portal, create a new registration, and note the Application (client) ID and Directory (tenant) ID.

    • Create a client secret under "Certificates & secrets" and securely store its value.

    • Alternatively, obtain an access key for the storage account (Manage Storage Account Access Keys).

  • Role Assignment:

    • Assign the "Storage Blob Data Contributor" role to the Azure AD application or service principal for the storage account to grant necessary permissions (Assign Azure Roles).

    • Verify the role assignment in the storage account’s "Access control (IAM)" section.

  • Blob Container:

    • Create a blob container within the storage account to store the telemetry data (Create a Container).

    • Ensure the container is accessible and matches the region of your storage account for optimal performance.

| Prerequisite | Description | Notes |
| --- | --- | --- |
| Azure Storage Account | Storage hub for telemetry data | Must be accessible for write operations |
| Authentication | Handles secure data ingestion | Store Client ID, Tenant ID, Client Secret, or Access Key |
| Role Assignment | Grants permissions to application | Assign "Storage Blob Data Contributor" role |
| Blob Container | Storage location for data | Create container in storage account |

Integration

To configure Azure Blob Storage Archival as a destination in Observo AI, follow these steps:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destinations button and select Create New.

    • Choose Azure Blob Storage from the list of available destinations to begin configuration.

    • Set Use as archival to true.

    • Select Azure Blob Storage Archival.

  2. General Settings:

    • Name: Add a unique identifier such as azure-blob-storage-1.

    • Description (Optional): Provide a description for the destination.

    • Container Name: Enter the Azure Blob Storage container name.

      Examples

      myblob

      myobservostorage

  3. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs share two sub-options:

      • Fields to exclude from serialization (Add): List of fields that are excluded from the encoded event. Example: message.payload

      • Encoding Timestamp Format (Select): How timestamps are serialized. Options: RFC 3339 timestamp (default); Unix timestamp (seconds); Unix timestamp (Float); Unix timestamp (Milliseconds); Unix timestamp (Microseconds); Unix timestamp (Nanoseconds).

      Codec-specific sub-options:

      • JSON Encoding: Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding, Newline Delimited JSON Encoding, No encoding, Plain text encoding, Graylog Extended Log Format (GELF): No additional sub-options beyond the shared ones above.

      • Apache Avro Encoding: Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Parquet: Include Raw Log (False): Capture the complete log message as an additional field (observo_record) alongside the given schema. Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF): CEF Device Event Class ID: A unique identifier for categorizing the type of event (maximum 1023 characters), e.g. login-failure. CEF Device Product: The product name that generated the event (maximum 63 characters), e.g. Log Analyzer. CEF Device Vendor: The vendor name that produced the event (maximum 63 characters), e.g. Observo. CEF Device Version: The version of the product that generated the event (maximum 31 characters), e.g. 1.0.0. CEF Extensions (Add): Custom key-value pairs for additional event data fields in CEF format. CEF Name: A human-readable description of the event (maximum 512 characters), e.g. cef.name. CEF Severity: The importance of the event, from 0 (lowest) to 10 (highest), e.g. 5. CEF Version (Select): The CEF specification version to use for formatting, either 0.1 or 1.x.

      • CSV Format: CSV Fields (Add): The field names to include as columns in the CSV output, in order, e.g. timestamp, host, message. CSV Buffer Capacity (Optional): The internal buffer size (in bytes) used when writing CSV data, e.g. 8192. CSV Delimiter (Optional): The character that separates fields in the CSV output, e.g. a comma. Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them; when disabled, an escape character is used instead. CSV Escape Character (Optional): The character used to escape quotes when double-quote escaping is disabled. CSV Quote Character (Optional): The character used for quoting fields in the CSV output, e.g. ". CSV Quoting Style (Optional): Controls when field values are wrapped in quote characters. Options: Always quote all fields; Quote only when necessary; Never use quotes; Quote all non-numeric fields.

      • Protocol Buffers: Protobuf Message Type: The fully qualified message type name for Protobuf serialization, e.g. package.Message. Protobuf Descriptor File: The path to the compiled protobuf descriptor file (.desc), e.g. /path/to/descriptor.desc.
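The field-exclusion and timestamp-format sub-options above can be sketched with a minimal Newline Delimited JSON encoder. This is an illustrative stdlib sketch; the `encode_ndjson` helper and its defaults are hypothetical, not Observo AI's implementation:

```python
import json
from datetime import datetime, timezone

def encode_ndjson(events, exclude_fields=(), ts_format="rfc3339"):
    """Encode events as newline-delimited JSON, mirroring the
    'Fields to exclude' and 'Encoding Timestamp Format' sub-options."""
    lines = []
    for event in events:
        # Drop excluded fields before serialization.
        record = {k: v for k, v in event.items() if k not in exclude_fields}
        ts = record.get("timestamp")
        if isinstance(ts, datetime):
            if ts_format == "rfc3339":
                record["timestamp"] = ts.isoformat()
            elif ts_format == "unix_ms":
                record["timestamp"] = int(ts.timestamp() * 1000)
            else:  # plain Unix epoch seconds
                record["timestamp"] = int(ts.timestamp())
        lines.append(json.dumps(record, sort_keys=True))
    return "\n".join(lines)

events = [{"timestamp": datetime(2025, 7, 13, tzinfo=timezone.utc),
           "message": "policy updated", "payload": {"raw": "..."}}]
print(encode_ndjson(events, exclude_fields=("payload",)))
```

With `payload` excluded and RFC 3339 timestamps, each event becomes a single JSON line without the excluded field.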

  4. Batching Requirements (Default):

    • Batch Max Bytes: The maximum size of a batch that will be processed by a sink, based on the uncompressed size of the batched events before they are serialized and compressed. Default: 500000 (500 KB)

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: 1000

    • Batch Timeout Secs: The maximum age of a batch, in seconds, before it is flushed. Default: 300
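The three limits interact as "whichever is reached first triggers a flush". A minimal sketch of that logic (a hypothetical `Batcher` class, not the Observo agent's implementation):

```python
import json
import time

class Batcher:
    """Flush a batch when any limit is hit: uncompressed bytes, event
    count, or age in seconds (mirrors Batch Max Bytes / Batch Max
    Events / Batch Timeout Secs)."""
    def __init__(self, max_bytes=500_000, max_events=1000, timeout_secs=300):
        self.max_bytes, self.max_events, self.timeout_secs = max_bytes, max_events, timeout_secs
        self.events, self.size, self.started = [], 0, None

    def add(self, event):
        if self.started is None:
            self.started = time.monotonic()
        self.events.append(event)
        # Byte limit is based on the uncompressed, serialized size.
        self.size += len(json.dumps(event).encode())
        if (self.size >= self.max_bytes or len(self.events) >= self.max_events
                or time.monotonic() - self.started >= self.timeout_secs):
            return self.flush()
        return None

    def flush(self):
        batch, self.events, self.size, self.started = self.events, [], 0, None
        return batch

b = Batcher(max_events=2)
assert b.add({"n": 1}) is None   # under all limits, no flush yet
flushed = b.add({"n": 2})        # event-count limit hit: batch flushes
print(len(flushed))
```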

  5. TLS Configuration (Optional):

    • TLS CA: Provide the CA certificate in PEM format.

    • TLS CRT: Provide the client certificate in PEM format.

    • TLS Key: Provide the private key in PEM format.

    • TLS Key Pass: Passphrase used to unlock the encrypted key file. This has no effect unless key_file is set.

      Examples

      ${KEY_PASS_ENV_VAR}

      PassWord1

    • TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • TLS Verify Hostname: Enables hostname verification. If enabled, the hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the remote hostname.
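These two toggles correspond to standard TLS client settings; in Python's ssl module they map to `verify_mode` and `check_hostname`. A sketch under that assumption (the `make_tls_context` helper is hypothetical, not part of the Observo agent):

```python
import ssl

def make_tls_context(verify_certificate=True, verify_hostname=True):
    """Build a client-side TLS context mirroring the 'TLS Verify
    Certificate' and 'TLS Verify Hostname' toggles."""
    ctx = ssl.create_default_context()
    # check_hostname must be disabled before relaxing verify_mode,
    # because hostname checks require certificate verification.
    if not verify_certificate:
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    elif not verify_hostname:
        ctx.check_hostname = False
    return ctx

ctx = make_tls_context()
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```

The default context verifies both the certificate chain and the hostname, matching the documented recommendation to leave both toggles enabled.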

  6. Advanced Settings (Optional):

    • Connection String: The Azure Blob Storage Account connection string. Either 'Storage Account', or this field, must be specified.

    • Blob Prefix: Prefix to apply to all blob keys. Useful for partitioning objects. Must end in / to act as a directory path.

      Examples

      date=%F/hour=%H

      year=%Y/month=%m/day=%d

      application_id={{ application_id }}/date=%F

      %Y/%m/%d

      date=%F

    • Blob Append UUID to Timestamp (False): Whether to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-throughput use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Blob Time Format: The timestamp format for the time component of the blob key. By default, blob keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to Azure Blob Storage. The resulting blob key is the key prefix followed by the formatted timestamp, e.g. date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s

      Example

      %s

    • Storage Account: The Azure Blob Storage Account name. Either 'Connection String', or this field, must be specified.

    • Compression: Compression algorithm to use for the request body. Default: Gzip compression

      | Options | Description |
      | --- | --- |
      | Gzip compression | DEFLATE compression with headers for file storage |
      | None | Data stored and transmitted in original form |

    • Healthcheck (False): Whether or not to check the health of the sink when Observo Agent starts up.

    • Time Generated Key (Optional): Use this option to customize the log field used as TimeGenerated in Azure. The setting of log_schema.timestamp_key, usually timestamp, is used here by default. This field should be used in rare cases where TimeGenerated should point to a specific log field. For example, use this field to set the log field source_timestamp as holding the value that should be used as TimeGenerated on the Azure side.

      Example

      time_generated
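Blob Prefix, Blob Time Format, and the UUID toggle combine to form the final object key. A sketch of that composition using the documented example key date=2022-07-18/1658176486 (the `blob_key` helper is hypothetical; the agent's exact key logic may differ):

```python
import uuid
from datetime import datetime, timezone

def blob_key(prefix_format, time_format="%s", append_uuid=False, now=None):
    """Compose an object key: strftime-expanded prefix + formatted
    timestamp, optionally suffixed with a UUID v4 for uniqueness."""
    now = now or datetime.now(timezone.utc)
    # %s (epoch seconds) is not a portable strftime specifier, so
    # compute it directly from the timestamp.
    ts = str(int(now.timestamp())) if time_format == "%s" else now.strftime(time_format)
    key = now.strftime(prefix_format) + ts
    if append_uuid:
        key += f"-{uuid.uuid4()}"
    return key

# "date=%Y-%m-%d/" is equivalent to the documented "date=%F/" prefix.
fixed = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(blob_key("date=%Y-%m-%d/", now=fixed))  # date=2022-07-18/1658176486
```

With append_uuid=True, the same key gains a `-<uuid4>` suffix, matching the Blob Append UUID to Timestamp example above.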

  7. Save and Test Configuration:

    • Save the configuration settings in Observo AI.

    • Send sample data and verify that it reaches the specified blob container in Azure Blob Storage.

Example Scenarios

InsureTech Solutions, a fictitious North American insurance enterprise, relies on data-driven risk assessment and claims processing, generating extensive telemetry data such as policy transaction logs, claims metrics, and application traces to ensure SOC 2 compliance, optimize workflows, and enhance fraud detection. To achieve scalable, secure, and cost-effective storage, the company integrates Observo AI with Microsoft Azure Blob Storage, centralizing telemetry data in the insuretech-telemetry-archive container in CSV format with Gzip compression for analytics integration, strict access controls, and unique object naming for high-volume data. Authentication via Azure AD, TLS for secure transfer, and the specific configurations below ensure reliability and regulatory compliance for long-term retention and advanced analytics.

Standard Azure Blob Storage Archival Destination Setup

Here is a standard Azure Blob Storage Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

| Field | Value | Description |
| --- | --- | --- |
| Name | insuretech-telemetry-archive | Unique identifier for the destination. |
| Description | Archival destination for InsureTech Solutions telemetry data | Optional description for clarity. |
| Container Name | insuretech-telemetry-archive | The Azure Blob Storage container name for storing telemetry data. |

Encoding

| Field | Value | Description |
| --- | --- | --- |
| Encoding Codec | CSV Format | Uses CSV format for compatibility with analytics platforms. |
| CSV Fields | timestamp, policy_id, claim_status, message | Specifies columns for CSV output: timestamp, policy ID, claim status, and message. |
| CSV Buffer Capacity | 8192 | Internal buffer size of 8192 bytes for writing CSV data. |
| CSV Delimiter | , | Uses comma as the field separator in CSV output. |
| Enable Double Quote Escapes | True | Quotes in field data are escaped by doubling them. |
| CSV Quote Character | " | Uses double quotes for quoting fields in CSV output. |
| CSV Quoting Style | Quote only when necessary | Quotes fields only when required (e.g., for fields containing commas). |
| Encoding Avro Schema | { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "policy_id", "type": "string" }, { "name": "claim_status", "type": "string" }] } | Optional Avro schema for additional serialization compatibility. |
| Encoding Metric Tag Values | Tags exposed as arrays of strings | Exposes all metric tags as arrays for detailed analytics. |
| Encoding Timestamp Format | RFC 3339 timestamp | Uses RFC 3339 format for consistent timestamp formatting. |
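The CSV settings in this scenario map closely onto Python's csv module: "Quote only when necessary" corresponds to QUOTE_MINIMAL, and "Enable Double Quote Escapes" to doublequote=True. A sketch of the resulting rows (illustrative sample data, not output from the agent):

```python
import csv
import io

fields = ["timestamp", "policy_id", "claim_status", "message"]
rows = [{"timestamp": "2025-07-13T00:00:00Z", "policy_id": "POL-1042",
         "claim_status": "open", "message": 'wind damage, roof "partial"'}]

buf = io.StringIO()
# QUOTE_MINIMAL = "Quote only when necessary"; doublequote=True mirrors
# "Enable Double Quote Escapes" (inner quotes are doubled).
writer = csv.DictWriter(buf, fieldnames=fields, delimiter=",",
                        quotechar='"', quoting=csv.QUOTE_MINIMAL,
                        doublequote=True)
writer.writerows(rows)
print(buf.getvalue().strip())
# Only the message field is quoted, because it contains the delimiter.
```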

Batching Requirements

| Field | Value | Description |
| --- | --- | --- |
| Batch Max Bytes | 10485760 | Maximum batch size of 10 MB (uncompressed) to balance throughput and memory usage. |
| Batch Max Events | 1000 | Flushes batches after 1000 events to manage batch size. |
| Batch Timeout Secs | 300 | Flushes batches after 300 seconds to ensure timely data transfer. |

TLS Configuration

| Field | Value | Description |
| --- | --- | --- |
| TLS CA | /path/to/ca-cert.pem | CA certificate in PEM format to verify the Azure Blob Storage endpoint. |
| TLS CRT | /path/to/client-cert.pem | Client certificate in PEM format for mutual TLS authentication. |
| TLS Key | /path/to/client-key.pem | Private key in PEM format corresponding to the client certificate. |
| TLS Key Pass | InsureTechTLSKey2025 | Passphrase to unlock the encrypted key file. |
| TLS Verify Certificate | True | Enables certificate verification for secure connections. |
| TLS Verify Hostname | True | Ensures the hostname matches the certificate for added security. |

Advanced Settings

| Field | Value | Description |
| --- | --- | --- |
| Connection String | DefaultEndpointsProtocol=https;AccountName=insuretechstorage;AccountKey=xxxx;EndpointSuffix=core.windows.net | Azure Blob Storage connection string for authentication. |
| Blob Prefix | year=%Y/month=%m/day=%d/ | Organizes blobs by year, month, and day for partitioning. |
| Blob Append UUID to Timestamp | True | Appends a UUID v4 token to the timestamp for unique blob keys (e.g., date=2025-07-13/1752364800-30f6652c-71da-4f9f-800d-a1189c47c547). |
| Blob Time Format | %s | Timestamps in seconds since the Unix epoch for blob keys. |
| Storage Account | insuretechstorage | The Azure Blob Storage account name. |
| Compression | Gzip compression | Uses DEFLATE compression for efficient storage. |
| Healthcheck | True | Enables health checks to verify sink connectivity on startup. |
| Time Generated Key | timestamp | Uses the default timestamp field as the TimeGenerated value in Azure. |
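The effect of the Gzip (DEFLATE) compression chosen in this scenario is easy to demonstrate on repetitive telemetry with the stdlib; actual ratios depend on the data:

```python
import gzip
import json

# Repetitive NDJSON telemetry compresses well under gzip (DEFLATE).
lines = "\n".join(json.dumps({"policy_id": f"POL-{i:05d}",
                              "claim_status": "open",
                              "message": "claim received"})
                  for i in range(1000)).encode()
compressed = gzip.compress(lines)
print(len(compressed) < len(lines) // 2)  # substantial size reduction
```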

Test Configuration

  • Send sample data and verify that it reaches the insuretech-telemetry-archive container in Azure Blob Storage.

Troubleshooting

If issues arise with the Azure Blob Storage Archival destination, use the following steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Ensure all fields, such as Storage Account Name, Container Name, Client ID, Tenant ID, Client Secret, or Access Key, are correctly entered and match Azure configurations.

  • Check Authentication:

    • Verify that the client secret is valid and has not expired, or confirm the access key is correct.

    • Confirm that the Azure AD application has the "Storage Blob Data Contributor" role assigned for the storage account.

  • Monitor Logs:

    • Check Observo AI logs for errors or warnings related to data transmission.

    • In the Azure portal, navigate to the storage account and inspect the blob container to confirm data arrival.

  • Validate Container Configuration:

    • Ensure the specified container exists and is accessible within the storage account.

  • Network and Connectivity:

    • Check for firewall rules or network policies that may block communication between Observo AI and Azure Blob Storage.

    • Verify that the storage account endpoint is accessible.

  • Test Data Flow:

    • Send sample data and monitor its arrival in the blob container.

    • Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.

  • Check Quotas and Limits:

    • Confirm that Azure Blob Storage capacity and request-rate limits for the storage account are not being exceeded.

| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not appearing in container | Incorrect storage account or container name | Verify account and container names in configuration |
| Authentication errors | Expired or incorrect client secret or access key | Regenerate secret or key and update configuration |
| Connection failures | Network or firewall issues | Check network policies and connectivity |
| Slow data transfer | Backpressure or rate limiting | Adjust batching settings or check Azure quotas |

Resources

For additional guidance and detailed information, refer to the Microsoft Azure documentation linked in the Prerequisites section, including Create a Storage Account, Register an Application, Assign Azure Roles, and Create a Container.
