Azure Blob Storage Receiver

A source that reads logs from Azure Blob Storage containers, triggered either by Azure Event Hub notifications when new blobs are created or by scheduled directory traversal of blob containers.

When the mode is set to Traversal, the receiver reads and processes files stored in Azure Blob Storage containers. It handles two types of files: regular files, which are reprocessed in full when they change, and append files (such as logs), which are read incrementally as new content is added. Both gzip-compressed and plain log files are supported.
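
The incremental handling of append files can be sketched as follows. This is an illustrative model, not the receiver's internal implementation: it tracks a per-blob byte offset so append-style files yield only new content, while gzip blobs are decompressed and processed in full each time (compressed streams cannot be resumed mid-file). The `offsets` store and function name are hypothetical.

```python
import gzip

# Hypothetical state store: blob name -> number of bytes already consumed.
offsets = {}

def read_new_content(name: str, data: bytes) -> bytes:
    """Return only the bytes added since the last read of this blob."""
    if name.endswith(".gz"):
        # Compressed blobs are decompressed and reprocessed in full each time.
        return gzip.decompress(data)
    start = offsets.get(name, 0)
    offsets[name] = len(data)
    return data[start:]
```

For example, after reading a log blob containing `line1\n`, a second read of the grown blob `line1\nline2\n` returns only `line2\n`.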

Purpose

The Observo AI Azure Blob Storage Receiver source ingests data from Azure Blob Storage containers into the Observo AI platform for processing and analysis. It supports common file formats such as JSON and CSV, allowing organizations to use stored data for observability and analytics. This integration enables efficient data retrieval from Azure’s scalable storage service for monitoring and insights.

Prerequisites

Before configuring the Azure Blob Storage Receiver source in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:

  • Observo AI Platform Setup:

    • The Observo AI Site must be installed and operational.

    • Verify that the platform can process expected file formats, such as JSON, CSV, or Parquet, if applicable.

  • Azure Storage Account:

    • An active Azure subscription with a storage account and at least one container created (Create a Storage Account).

    • The storage account must be accessible to Observo AI, either publicly or via private endpoints or firewall rules.

    • Required permissions:

      • The account or application used by Observo AI must have read access to the container, typically via a Shared Access Signature (SAS) token, Storage Account Key, or Microsoft Entra ID role with “Storage Blob Data Reader” permissions (Azure Blob Storage Access Control).

  • Authentication:

    • Prepare one of the following authentication methods:

      • Storage Account Key: Use the access key from the storage account’s “Access keys” section.

      • Shared Access Signature (SAS): Generate a SAS token with read permissions for the container (Create SAS Token).

      • OAuth (Microsoft Entra ID): Use a Client ID, Tenant ID, Client Secret, and Scope (e.g., https://storage.azure.com/.default) for authentication.

  • Network and Connectivity:

    • Ensure Observo AI can communicate with Azure Blob Storage endpoints, typically over HTTPS (port 443).

    • If using private endpoints or firewall rules, configure them to allow access from Observo AI (Azure Private Link for Blob Storage).

    • If a proxy is used, ensure it supports HTTPS traffic to Azure endpoints.
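
The connectivity prerequisite can be checked ahead of time. The sketch below is an illustrative preflight, not part of Observo AI: it builds the public blob endpoint hostname for a storage account (the `core.windows.net` suffix is the public Azure cloud default) and attempts a TCP connection on port 443.

```python
import socket

def blob_endpoint(account: str, suffix: str = "core.windows.net") -> str:
    """Build the blob service hostname for a storage account."""
    return f"{account}.blob.{suffix}"

def can_reach(host: str, port: int = 443, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For Azure US Government accounts, pass the `core.usgovcloudapi.net` suffix instead.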

| Prerequisite | Description | Notes |
| --- | --- | --- |
| Observo AI Platform | Must support Azure Blob Storage | Verify file format compatibility |
| Azure Storage Account | Active account with container | Create via Azure portal |
| Permissions | Read access to container | Use SAS, Storage Account Key, or Entra ID |
| Authentication | Key, SAS, or OAuth | Prepare credentials accordingly |
| Network | HTTPS connectivity | Allow port 443; configure private endpoints if needed |

Integration

To configure the Azure Blob Storage Receiver as a source in Observo AI, follow these steps to set up and test the data flow:

  1. Log in to Observo AI:

    • Click the “Add Sources” button and select “Create New”.

    • Choose “Azure Blob Storage Receiver” from the list of available sources to begin configuration.

  2. General Settings:

    • Name: A unique identifier for the source, such as blob-storage-source-1.

    • Description: Optional description for the source.

    • Mode (Optional): Whether to use direct directory traversal of the blob container or rely on Event Hub for blob notifications. Default: Use Event-Hub for Blob Notifications.

      | Option | Description |
      | --- | --- |
      | EventHub | Use Event Hub for blob notifications |
      | Traversal | Use directory traversal for blobs |

    • Event Hub Endpoint: Azure Event Hub endpoint that triggers on the Blob Create event. The receiver subscribes to the events published by Azure Blob Storage and delivered through Azure Event Hub. When it receives a Blob Create event, it reads the logs or traces from the corresponding blob and deletes the blob after processing. Required only when the mode is EventHub. See the Trigger Azure Event Hub on Blob Creation section for further details.

    • Authentication Method: The authentication method to use when connecting to Azure Blob Storage. Default: Connection String

      | Option | Description |
      | --- | --- |
      | Connection String (default) | Use a connection string for authentication. |
      | Service Principal | Use a service principal for authentication. |

      • Connection String (Default): The connection string to use when connecting to Azure Blob Storage.

        Example

        DefaultEndpointsProtocol=https;AccountName=accountName;AccountKey=+idLkHYcL0MUWIKYHm2j4Q==;EndpointSuffix=core.windows.net

      • Service Principal (If selected):

        • Tenant ID: The tenant ID of the service principal to use when connecting to Azure Blob Storage. Example: ${tenant_id}

        • Client ID: The client ID of the service principal to use when connecting to Azure Blob Storage. Example: ${client_id}

        • Client Secret: The Client Secret of the service principal to use when connecting to Azure Blob Storage. Example: ${env:CLIENT_SECRET}

        • Storage Account URL: The URL of the storage account to use when connecting to Azure Blob Storage.

        • Azure Cloud: Defines which Azure Cloud to use when using the service_principal authentication method. Options: Azure Cloud or Azure US Government.

  3. Advanced Settings:

    • Logs Container Name (Optional): Name of the blob container with the logs.

    • Max Event Size (Optional): Maximum size of a single event, expressed in human-readable units (e.g., 1MB). Default: 1MB

    • (Internal Queue) Batch size in events: Batch size of the intermediate queue. Default: 100

    • (Internal Queue) Max batch size in events: Maximum batch size of the intermediate queue. Default: 200

    • Processing Settings: Define JavaScript expressions for field extraction, or select a Pipeline/Pack for data transformation.

    • (Internal Queue) Flush timeout: Flush timeout for the intermediate queue. Default: 1s

  4. Parser Config:

    • Enable Source Log parser: Disabled (False) by default.

    • Toggle the Enable Source Log parser switch to enable it, then:

      • Select the appropriate parser from the Source Log Parser dropdown.

      • Add additional parsers as needed.

  5. Pattern Extractor:

  6. Archival Destination:

    • Toggle the Enable Archival on Source switch to enable it.

    • Under Archival Destination, select a destination from the list of archival destinations (required).

  7. Save and Test Configuration:

    • Save the configuration settings.

    • Use Observo AI’s testing tools to verify that data is being ingested from the specified container.
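
Before saving, the connection-string format used in step 2 can be sanity-checked offline. The helper below is illustrative and not part of Observo AI; the key names (`AccountName`, `AccountKey`, etc.) are the standard Azure Storage connection-string keys.

```python
# Keys the receiver needs at a minimum for connection-string authentication.
REQUIRED = {"AccountName", "AccountKey"}

def parse_connection_string(conn: str) -> dict:
    """Split a semicolon-separated Azure connection string into a dict,
    raising if required keys are missing."""
    parts = {}
    for segment in conn.split(";"):
        if not segment:
            continue
        # Split on the first '=' only: AccountKey values contain '=' padding.
        key, _, value = segment.partition("=")
        parts[key] = value
    missing = REQUIRED - parts.keys()
    if missing:
        raise ValueError(f"connection string missing keys: {sorted(missing)}")
    return parts
```

Running this against the example connection string shown earlier yields the account name and key as separate fields, which makes typos easy to spot before testing ingestion.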

Example Scenarios

TechCorp, a fictitious technology enterprise, wants to integrate their Azure Blob Storage container, which stores JSON log files, into the Observo platform for monitoring and analytics. They have an Azure storage account named "techcorpstorage" with a container called "logs-container" that holds the log files. They use a Connection String for authentication and rely on an Azure Event Hub to trigger blob notifications. The configuration will be set up to handle events up to 1MB, with specific batch sizes and flush timeouts for processing.

Standard Azure Blob Storage Source Setup

Here is a standard Azure Blob Storage Source configuration example. Only the required sections and their associated field updates are displayed in the tables below:

General Settings

| Field | Value | Notes |
| --- | --- | --- |
| Name | blob-storage-techcorp-logs | Unique identifier for the source, e.g., indicating TechCorp’s log ingestion. |
| Description | Ingest JSON logs from TechCorp’s Azure Blob Storage for monitoring | Optional, provides context for the source’s purpose. |
| Storage Account Name | techcorpstorage | The name of the Azure storage account containing the logs. |
| Mode | Use Event-Hub for Blob Notifications | Default mode, relying on Event Hub to trigger on Blob Create events. |
| Event Hub Endpoint | Endpoint=sb://techcorpeventhub.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=abc123xyz== | Azure Event Hub endpoint configured to trigger on Blob Create events. |
| Authentication Method | Connection String | Default method for connecting to Azure Blob Storage. |
| Connection String | DefaultEndpointsProtocol=https;AccountName=techcorpstorage;AccountKey=+xyzLkHYcL0MUWIKYHm2j4Q==;EndpointSuffix=core.windows.net | The connection string for authenticating with the Azure storage account. |

Advanced Settings

| Field | Value | Notes |
| --- | --- | --- |
| Logs Container Name | logs-container | Name of the blob container storing the JSON log files. |
| Max Event Size | 1MB | Buffer size for events, set to the default of 1MB. |
| (Internal Queue) Batch size in events | 100 | Default batch size for the intermediate queue. |
| (Internal Queue) Max batch size in events | 200 | Maximum batch size for the intermediate queue, set to the default. |
| Processing Settings | Default JavaScript expressions | Define JavaScript expressions for field extraction or select a Pipeline/Pack; default used here. |
| (Internal Queue) Flush timeout | 1s | Default flush timeout for the intermediate queue. |
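
The internal-queue settings above interact as follows: events accumulate until a full batch is ready, and a partial batch is flushed once the flush timeout elapses. The sketch below models that behaviour with illustrative parameter names; it is not the receiver's internal API.

```python
import time

class Batcher:
    """Toy model of the internal queue: batch_size triggers a flush,
    max_batch_size caps a batch, flush_timeout drains partial batches."""

    def __init__(self, batch_size=100, max_batch_size=200, flush_timeout=1.0):
        self.batch_size = batch_size
        self.max_batch_size = max_batch_size
        self.flush_timeout = flush_timeout
        self.buffer = []
        self.last_flush = time.monotonic()

    def add(self, event):
        """Buffer an event; return a full batch when one is ready, else None."""
        self.buffer.append(event)
        if len(self.buffer) >= self.batch_size:
            return self._flush()
        return None

    def poll(self):
        """Flush a partial batch if the flush timeout has elapsed."""
        if self.buffer and time.monotonic() - self.last_flush >= self.flush_timeout:
            return self._flush()
        return None

    def _flush(self):
        batch = self.buffer[: self.max_batch_size]
        self.buffer = self.buffer[self.max_batch_size :]
        self.last_flush = time.monotonic()
        return batch
```

With the defaults above (100 / 200 / 1s), a steady stream flushes every 100 events, while a trickle of events is still delivered within about a second.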

Troubleshooting

If issues arise with the Azure Blob Storage Receiver source in Observo AI, use the following steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Ensure all fields (e.g., Storage Account Name, Container, Blob Path, Authentication) are correctly entered and match the Azure setup.

    • Confirm that the container exists and contains blobs matching the Filename Filter.

  • Check Authentication:

    • For Storage Account Key, verify the key is valid and not rotated.

    • For SAS Token, ensure it has read permissions and is not expired.

    • For OAuth, confirm the Client ID, Tenant ID, Client Secret, and Scope are correct, and the application has “Storage Blob Data Reader” permissions.

  • Monitor Logs:

    • Check Observo AI’s Logs tab for errors or warnings related to data ingestion.

    • Use Azure’s Storage Analytics or Azure Monitor to verify blob access and activity (Azure Storage Analytics).

  • Validate Connectivity:

    • Ensure Observo AI can reach Azure Blob Storage endpoints over HTTPS (port 443).

    • If using private endpoints or firewall rules, verify they are correctly configured (Azure Private Link).

  • Common Error Messages:

    • “Authorization failure”: Indicates invalid credentials or insufficient permissions. Verify the Storage Account Key, SAS Token, or Entra ID role assignments.

    • “Container not found”: Check the container name and ensure it exists in the storage account.

    • “No data ingested”: Confirm blobs exist in the container and match the Filename Filter. Check the Polling Interval for delays.

  • Test Data Flow:

    • Capture real-time events and verify ingestion.

    • Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.
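
The common error messages above can be turned into a small triage helper. This is an illustrative mapping built from this guide's troubleshooting advice, not actual receiver output or API.

```python
# Message substrings and suggested fixes, taken from the guidance above.
KNOWN_ERRORS = {
    "authorization failure": "Verify the Storage Account Key, SAS token, or Entra ID role assignments.",
    "container not found": "Check the container name and confirm it exists in the storage account.",
    "no data ingested": "Confirm blobs exist and match the Filename Filter; check the polling interval.",
}

def suggest_fix(error_message: str) -> str:
    """Return the suggested resolution for a known error message."""
    msg = error_message.lower()
    for pattern, fix in KNOWN_ERRORS.items():
        if pattern in msg:
            return fix
    return "Check the Observo AI Logs tab and Azure Monitor for details."
```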

| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not ingested | Incorrect container or filter | Verify container name and Filename Filter |
| Authorization errors | Invalid or expired credentials | Check Storage Account Key, SAS, or OAuth settings |
| Connectivity issues | Firewall or private endpoint misconfiguration | Allow HTTPS on port 443, verify endpoints |
| “Container not found” | Incorrect container name | Confirm container exists in storage account |
| “Authorization failure” | Missing permissions | Update permissions or regenerate credentials |

Trigger Azure Event Hub on Blob Creation

Azure Event Hubs is a scalable event ingestion service that can process millions of events per second. It is commonly used for real-time analytics, event-driven architectures, and data streaming. One powerful use case is triggering an Event Hub endpoint when a new blob is created in Azure Blob Storage. This setup enables real-time processing of data as soon as it is uploaded to storage.
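
When Event Grid routes a Blob Created event to Event Hubs using the Event Grid schema, the consumer receives a JSON array of events whose `data.url` field points at the new blob. The sketch below shows only the fields used here; a real event carries more metadata, and the sample values are illustrative.

```python
import json

# Minimal sample of an Event Grid-schema Blob Created event payload.
sample_event = json.dumps([{
    "eventType": "Microsoft.Storage.BlobCreated",
    "subject": "/blobServices/default/containers/logs-container/blobs/app.log",
    "data": {"url": "https://techcorpstorage.blob.core.windows.net/logs-container/app.log"},
}])

def blob_urls(payload: str) -> list:
    """Extract blob URLs from Blob Created events, ignoring other event types."""
    return [
        e["data"]["url"]
        for e in json.loads(payload)
        if e.get("eventType") == "Microsoft.Storage.BlobCreated"
    ]
```

A receiver in EventHub mode would fetch each returned URL, process the blob's contents, and then delete the blob.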

Prerequisites

Before walking through the steps to configure Azure Event Hub to trigger on a Blob Create event, ensure you have the following:

  1. Azure Subscription: You need an active Azure subscription. Create an Azure account if you don’t have one.

  2. Azure Blob Storage Account: A storage account with a container to store blobs. Create a Blob Storage account.

  3. Azure Event Hub: An Event Hub namespace and Event Hub instance. Create an Event Hub.

  4. Azure Event Grid: Used to route Blob Create events to the Event Hub. Learn more about Event Grid.

Step 1: Create an Azure Event Hub

  1. Go to the Azure Portal.

  2. Navigate to Event Hubs and click + Create.

  3. Provide a Namespace Name, select a Pricing Tier, and choose a Resource Group.

  4. Click Review + Create, then Create.

  5. Once the namespace is created, go to the namespace and create an Event Hub instance.

Reference: Create an Event Hub Namespace

Step 2: Enable Event Grid on Azure Blob Storage

  1. Go to your Azure Blob Storage Account in the Azure Portal.

  2. Navigate to Events under the Settings section.

  3. Click + Event Subscription to create a new subscription.

Reference: Enable Event Grid on Blob Storage

Step 3: Configure the Event Subscription

  1. Event Subscription Details:

  2. Provide a Name for the subscription.

  3. For Event Schema, select Event Grid Schema.

  4. Topic Details:

    • Set System Topic Name to a meaningful name (e.g., BlobCreateTopic).

  5. Event Types:

    • Select Blob Created as the event type. You can deselect other event types if not needed.

  6. Endpoint Details:

    • Set Endpoint Type to Event Hub.

    • Select the Event Hub namespace and instance you created earlier.

  7. Click Create to finalize the event subscription.

Reference: Create an Event Grid Subscription

Step 4: Verify the Configuration

  1. Upload a file to the Azure Blob Storage container.

  2. Go to the Event Hub in the Azure Portal.

  3. Use the Metrics or Live Events feature to verify that the Blob Create event is being routed to the Event Hub.

Reference: Monitor Event Hubs

Step 5: Obtain the Event Hub Endpoint String

To configure the OTEL receiver, you’ll need the Event Hub connection string. Here’s how to obtain it:

  1. Go to your Event Hub Namespace in the Azure Portal.

  2. Navigate to Shared Access Policies under the Settings section.

  3. Click on the policy (e.g., RootManageSharedAccessKey) or create a new one.

  4. Copy the Connection String from the policy.

The connection string will look like this:

Endpoint=sb://oteldata.servicebus.windows.net/;SharedAccessKeyName=otelhubbpollicy;SharedAccessKey=mPJVubIK5dJ6mLfZo1ucsdkLysLSQ6N7kddvsIcmoEs=;EntityPath=otellhub

Reference: Get Event Hub Connection String

