Microsoft Entra Log Collector

The Microsoft Entra Log Collector Source in Observo AI enables the ingestion of JSON-formatted audit and sign-in logs from Microsoft Entra via its API, supporting real-time security monitoring, compliance auditing, and user activity analysis.

Purpose

The Microsoft Entra Log Collector source lets users ingest log data from Microsoft Entra (formerly Azure Active Directory) into the Observo AI platform through its API endpoints for analysis and processing. It collects audit logs, sign-in logs, and other events, typically in JSON format. This helps organizations streamline data pipelines, improve observability, and support use cases such as security monitoring, compliance auditing, and user activity analysis by processing Microsoft Entra log data in real time.

Prerequisites

Before configuring the Microsoft Entra Log Collector source in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:

  • Observo AI Platform Setup:

    • The Observo AI platform must be installed and operational, with support for the Microsoft Entra Log Collector as a data source.

    • Verify that the platform supports common data formats such as JSON, as Microsoft Entra logs are typically delivered in this format. Additional parsers may be needed for custom processing.

  • Microsoft Entra API Access:

    • An active Microsoft Entra tenant must be available as the source of the log data that Observo AI collects.

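    • Obtain the Microsoft Entra tenant ID, register an application to get a client ID, create a client secret, and grant the required Microsoft Graph application permissions with admin consent (e.g., AuditLog.Read.All; listing sign-in logs also requires Directory.Read.All) in the Microsoft Entra admin center or Azure portal.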

  • Authentication:

    • Prepare one of the following authentication methods:

      • OAuth2: Obtain a client ID, client secret, and token endpoint URL from the Microsoft Entra admin center for secure access.

      • Secret Authentication: Use a stored secret within Observo AI's secure storage for credentials.

  • Network and Connectivity:

    • Ensure Observo AI can communicate with the Microsoft Entra API endpoint (e.g., https://graph.microsoft.com/v1.0/auditLogs).

    • Check for proxy settings, firewall rules, or VPC endpoint configurations that may affect connectivity to the Microsoft Entra API.

| Prerequisite | Description | Notes |
|---|---|---|
| Observo AI Platform | Must be installed and support Microsoft Entra Log Collector | Verify support for JSON; additional parsers may be needed |
| Microsoft Entra API Access | Active Microsoft Entra tenant to collect log data from | Obtain tenant ID, client ID, client secret, and API permissions from the admin center |
| Authentication | OAuth2 or Secret Authentication | Prepare credentials as required by the Microsoft Entra API |
| Network | Connectivity to the Microsoft Entra API endpoint | Check VPC endpoints, proxies, and firewalls |
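One way to confirm that the application permissions listed above were actually granted is to inspect the `roles` claim of an app-only access token for Microsoft Graph, which lists the consented application permissions. The sketch below decodes the token payload without verifying its signature. Microsoft's guidance is that client applications should treat access tokens as opaque, so treat this strictly as a local troubleshooting aid, never as an authorization check. `decode_jwt_payload` and `missing_roles` are hypothetical helpers, not part of Observo AI.

```python
import base64
import json


def decode_jwt_payload(token: str) -> dict:
    """Decode the payload (second segment) of a JWT without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are unpadded base64url; restore padding before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    return json.loads(base64.urlsafe_b64decode(payload_b64))


def missing_roles(token: str, required: set) -> set:
    """Return required application permissions absent from the token's roles claim."""
    granted = set(decode_jwt_payload(token).get("roles", []))
    return required - granted


# Usage with a token obtained through the client credentials flow:
# missing = missing_roles(access_token, {"AuditLog.Read.All", "Directory.Read.All"})
# if missing:
#     print("Grant and consent:", sorted(missing))
```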

Integration

The Integration section outlines the configurations for the Microsoft Entra Log Collector source. To configure the Microsoft Entra Log Collector as a source in Observo AI, follow these steps to set up and test the data flow:

  1. Log in to Observo AI:

    • Navigate to the Sources tab.

    • Click the Add Source button and select Create New.

    • Choose Microsoft Entra Log Collector from the list of available sources to begin configuration.

  2. General Settings:

    • Name: A unique identifier for the source, such as entra-log-collector-source-1.

    • Description (Optional): Provide a description for the source.

    • Microsoft Endpoint: Microsoft API endpoint to collect data from. Supports templating with $LAST_VALUE$ when using checkpointing. Default: https://graph.microsoft.com/v1.0/auditLogs/directoryAudits?$filter=activityDateTime%20ge%20$LAST_VALUE$

      Examples

      https://graph.microsoft.com/v1.0/auditLogs/directoryAudits

      https://graph.microsoft.com/v1.0/auditLogs/signIns?$filter=createdDateTime%20ge%20$LAST_VALUE$

    • Collection Interval: Duration between consecutive data collection requests. Default: 10m

      Examples

      10s

      1m

      10m

    • Response Log Path: JSON path to logs array in responses. Leave empty if the response is a direct array of logs. Default: value

      Examples

      Values

      Data

      Resource.logs

    • Headers (Add as needed): Headers to include in the HTTP request. Use the format {key: value}.

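  3. Authentication (Optional):

    • Client ID: Application (client) ID registered in Microsoft Entra for API access.

    • Client Secret: Client secret generated for the registered application.

    • Token URL: OAuth2 token endpoint for the tenant, such as https://login.microsoftonline.com/<tenant-id>/oauth2/v2.0/token.

    • Scopes: OAuth2 scopes to request, such as https://graph.microsoft.com/.default.

    • Headers (Add as needed): Headers to include in the OAuth2 token request.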

  4. Checkpoint:

    • Enable Checkpoint (True): Enable incremental log collection using checkpointing.

    • Tracking Column: JSON path to the field used for tracking progress such as 'createdDateTime'. The value from the last log entry will be used. Default: activityDateTime

      Examples

      activityDateTime

      createdDateTime

      data.created_at

    • Initial Value: Starting value for the tracking column, used for the first collection.

      Example

      2025-06-02T00:00:00Z

  5. Pagination:

    • Enable Pagination (False): Enable pagination support for handling paginated responses.

    • Pagination Type: Type of pagination to use. Default: Page-Based.

      | Option | Description |
      |---|---|
      | Page-Based | For traditional page numbers |
      | Attribute-Based | For cursor or token-based pagination |

    • Page Parameter Name: Query parameter name for the page number. Default: page

      Examples

      page (results in ?page=1)

      page_number

      pageNum

    • Size Parameter Name: Query parameter name for the page size. Default: size

      Examples

      size (results in ?size=50)

      limit

      page_size

    • Page Size: Number of records to request per page. Default: 50

      Examples

      50

      100

      200

    • Start Page: Page number to start pagination from. Works in conjunction with zero-based setting. Default: 0

      Examples

      0

      1

    • Maximum Pages: Maximum number of pages to retrieve in one collection cycle. Set to 0 for unlimited. Default: 50

      Examples

      50

      100

      0

    • Total Pages Path (Empty): JSON path to total pages count in response. Example: 'meta.total_pages' for {"meta": {"total_pages": 5}}

      Examples

      meta.total_pages

      pagination.pages

      page_info.total

    • Total Count Path: JSON path to total record count in response. Example: 'meta.total' for {"meta": {"total": 150}}

      Examples

      meta.total

      pagination.total_records

      count

    • Zero-Based Indexing (False): If true, page numbering starts at 0. If false, it starts at 1.

    • Response Attributes (Add as needed): JSON paths to attributes in the response body that contain next page information. Example: 'meta.nextCursor' for cursor-based pagination.

      Examples

      meta.nextCursor

      pagination.nextPage

      links.next

    • Header Attributes (Add as needed): Names of HTTP headers that contain next page information. Example: 'X-Next-Page' or 'Link' for GitHub-style pagination.

      Examples

      X-Next-Page

      X-Next-Cursor

      Link

    • Request Interval: Time to wait between pagination requests. Use a duration string like '100ms' or '1s'. Default: 100ms

      Examples

      100ms

      500ms

      1s

  6. TLS Configuration (Optional):

    • CA File: The CA certificate provided as an inline string in PEM format.

    • Include System CA Certs Pool (True): Include the system CA certificates pool in the list of CAs used to verify the server certificate.

    • Cert File: Path to the TLS cert to use for TLS required connections.

    • Key File: Path to the TLS key to use for TLS required connections.

    • Insecure (True): Skip TLS verification when connecting to the endpoint. This is insecure and should not be used in production.

    • Insecure Skip Verify (True): Enable TLS but do not verify the certificate.

    • Server Name Override: The server name to use to verify the hostname on the returned certificates.

  7. Advanced Settings (Optional):

    • Proxy URL: URL of the proxy server to use when connecting to the endpoint.

    • Read Buffer Size: Size of the read buffer in bytes.

    • Write Buffer Size: Size of the write buffer in bytes.

    • Timeout: Timeout for the HTTP request. Use a number followed by a unit, such as '30s' or '1m'. Default: 10s

    • Compression: Compression algorithm to use for the request body.

      | Option | Description |
      |---|---|
      | Gzip | DEFLATE compression with headers for file storage |
      | Zlib | DEFLATE format with minimal wrapper and checksums |
      | Deflate | Combines LZ77 and Huffman for compression efficiency |
      | Snappy | Prioritizes speed over compression ratio and complexity |
      | Zstd | Fast compression with good ratio and dictionaries |
      | Lz4 | Ultra-fast compression with minimal resource overhead |

    • Max Idle Connections: Maximum number of idle connections to keep open to the endpoint.

    • Idle Connection Timeout: Timeout for idle connections to the endpoint. Use a number followed by a unit, such as '30s' or '1m'.

    • HTTP 2 Read Idle Timeout: If no data is received on an HTTP/2 connection for this duration, a health-check ping is sent. Use a number followed by a unit, such as '30s' or '1m'.

    • HTTP 2 Read Ping Timeout: How long to wait for a response to the HTTP/2 health-check ping before the connection is closed. Use a number followed by a unit, such as '30s' or '1m'.

    • Method: HTTP request method to use for requests. Supports GET and POST methods. Default: GET

    • Body: Request body for POST method. Supports templating with $LAST_VALUE$ when using checkpointing.

      Examples

      {"query": "fetch logs", "from": "$LAST_VALUE$"}

  8. Parser Config:

    • Enable Source Log Parser: (False)

    • Toggle Enable Source Log Parser Switch to enable.

    • Select appropriate Parser from the Source Log Parser dropdown.

    • Add additional Parsers as needed.

  9. Pattern Extractor:

    • Refer to Observo AI's Pattern Extractor documentation for details on configuring pattern-based data extraction.

  10. Archival Destination:

    • Toggle Enable Archival on Source Switch to enable.

    • Under Archival Destination, select from the list of Archival Destinations (Required).

  11. Save and Test Configuration:

    • Save the configuration settings in Observo AI.

    • Wait for at least one Collection Interval (or generate new activity in the Microsoft Entra tenant, such as a test sign-in), then verify ingestion in the Analytics tab to confirm data flow.
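Several settings above (Collection Interval, Request Interval, Timeout, and the HTTP/2 timeouts) take duration strings such as `10s`, `100ms`, or `1m`. The sketch below shows one way to sanity-check such values before entering them. It assumes Go-style units (`ms`, `s`, `m`, `h`) and also accepts compound values like `1m30s`; whether Observo AI accepts compound values is an assumption, not something this page documents. `parse_duration` is a hypothetical helper.

```python
import re
from datetime import timedelta

# Unit suffixes mapped to timedelta keyword arguments.
_UNITS = {"ms": "milliseconds", "s": "seconds", "m": "minutes", "h": "hours"}
# "ms" is listed before "m" in the alternation so "100ms" is not read as minutes.
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)(ms|s|m|h)")
_WHOLE = re.compile(r"(?:\d+(?:\.\d+)?(?:ms|s|m|h))+")


def parse_duration(text: str) -> timedelta:
    """Parse a duration string such as '100ms', '10m', or '1m30s'."""
    text = text.strip()
    if not _WHOLE.fullmatch(text):
        raise ValueError(f"invalid duration: {text!r}")
    total = timedelta()
    for number, unit in _TOKEN.findall(text):
        total += timedelta(**{_UNITS[unit]: float(number)})
    return total


# Usage: the Collection Interval default of 10m is 600 seconds.
# parse_duration("10m").total_seconds()
```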

Example Scenarios

Apex Financial Services, a fictitious mid-sized enterprise in the financial services sector, specializes in wealth management and investment advisory. To strengthen its security posture and comply with regulatory requirements like SOC 2 and GDPR, Apex aims to monitor user activities and audit logs from its Microsoft Entra tenant. By integrating the Microsoft Entra Log Collector into the Observo AI platform, Apex will ingest real-time sign-in and audit logs to detect unauthorized access, ensure compliance, and streamline incident response. The IT team, led by its Security Operations Manager, follows the integration steps outlined above to configure the log collector, specifying all required fields for a robust data pipeline.

Standard Microsoft Entra Log Collector Source Setup

Here is a standard Microsoft Entra Log Collector Source configuration example. Only the sections Apex configures and their field values are shown in the tables below:

General Settings

| Field | Value | Description |
|---|---|---|
| Name | apex-entra-log-collector | Unique identifier for the source, reflecting Apex's branding and purpose. |
| Description | Collects Microsoft Entra sign-in and audit logs for security monitoring and compliance at Apex Financial Services. | Optional description to clarify the source's purpose. |
| Microsoft Endpoint | https://graph.microsoft.com/v1.0/auditLogs/signIns?$filter=createdDateTime%20ge%20$LAST_VALUE$ | Endpoint for collecting sign-in logs, using checkpointing with $LAST_VALUE$ for incremental collection. |
| Collection Interval | 5m | Data collection every 5 minutes to balance real-time monitoring with API rate limits. |
| Response Log Path | value | JSON path to the logs array in the API response, as sign-in logs are nested under "value". |
| Headers | { "Content-Type": "application/json" } | HTTP header to ensure JSON content type for API requests. |
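To illustrate how the Microsoft Endpoint template and Response Log Path work together, the sketch below substitutes the current checkpoint into `$LAST_VALUE$` and then pulls the log array out of a Microsoft Graph-style response at a dot-separated path. It assumes a plain text substitution (the product may encode the value differently) and that an empty path means the response itself is the array. `render_endpoint` and `extract_logs` are hypothetical names.

```python
from typing import Any, List

PLACEHOLDER = "$LAST_VALUE$"


def render_endpoint(template: str, last_value: str) -> str:
    """Replace the checkpoint placeholder with the current tracking value."""
    return template.replace(PLACEHOLDER, last_value)


def extract_logs(response: Any, path: str) -> List[Any]:
    """Return the log array at a dot-separated path; an empty path means the root."""
    node = response
    if path:
        for key in path.split("."):
            if not isinstance(node, dict) or key not in node:
                return []  # path not present: treat as no logs in this response
            node = node[key]
    if not isinstance(node, list):
        raise TypeError(f"expected a list at {path or '<root>'}, got {type(node).__name__}")
    return node


template = ("https://graph.microsoft.com/v1.0/auditLogs/signIns"
            "?$filter=createdDateTime%20ge%20$LAST_VALUE$")
url = render_endpoint(template, "2025-07-01T00:00:00Z")
response = {"value": [{"id": "a1", "createdDateTime": "2025-07-01T08:00:00Z"}]}
logs = extract_logs(response, "value")
```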

Authentication

| Field | Value | Description |
|---|---|---|
| Client ID | 123e4567-e89b-12d3-a456-426614174000 | Application (client) ID registered in Microsoft Entra for API access. |
| Client Secret | gX7fH9kP2mL8qW3rT5vY1zA6bC4dE0jN | Client secret generated for secure authentication to Microsoft Graph API. |
| Token URL | https://login.microsoftonline.com/987f6543-21ba-43cd-9876-543210fedcba/oauth2/v2.0/token | OAuth2 token endpoint, with Apex's tenant ID (987f6543-21ba-43cd-9876-543210fedcba). |
| Scopes | https://graph.microsoft.com/.default | The .default scope requests the application permissions already granted to the app registration (e.g., AuditLog.Read.All and Directory.Read.All). |
| Headers | { "Accept": "application/json" } | Header to ensure JSON response for OAuth2 token requests. |
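The Token URL, Client ID, Client Secret, and Scopes above correspond to the standard OAuth 2.0 client credentials grant against the Microsoft identity platform. The sketch below only builds that token request with Python's standard library so the form fields are visible; it is an illustration, not Observo AI's implementation. `build_token_request` is a hypothetical helper, the IDs are the fictitious Apex values from the table, and the secret is a placeholder.

```python
import urllib.parse
import urllib.request


def build_token_request(token_url: str, client_id: str, client_secret: str,
                        scope: str) -> urllib.request.Request:
    """Build a client-credentials token request as a form-encoded POST."""
    form = urllib.parse.urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": scope,
    }).encode("ascii")
    return urllib.request.Request(
        token_url,
        data=form,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/json",
        },
        method="POST",
    )


req = build_token_request(
    "https://login.microsoftonline.com/987f6543-21ba-43cd-9876-543210fedcba/oauth2/v2.0/token",
    "123e4567-e89b-12d3-a456-426614174000",
    "<client-secret>",  # placeholder: load real secrets from secure storage
    "https://graph.microsoft.com/.default",
)
# Sending it: urllib.request.urlopen(req, timeout=15) returns a JSON body
# that contains access_token on success.
```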

Checkpoint

| Field | Value | Description |
|---|---|---|
| Enable Checkpoint | True | Enables incremental log collection to avoid duplicating data. |
| Tracking Column | createdDateTime | JSON path to the timestamp field in sign-in logs for tracking progress. |
| Initial Value | 2025-07-01T00:00:00Z | Starting timestamp for the first collection, set to July 1, 2025, to capture recent logs. |
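The checkpoint settings describe incremental collection: after each cycle, the Tracking Column value becomes the next `$LAST_VALUE$`. This page says the value from the last log entry is used; the sketch below instead takes the latest timestamp across the batch, a deliberate variation that also behaves correctly if an API returns newest records first. It supports dotted paths such as `data.created_at`. `parse_ts`, `get_path`, and `next_checkpoint` are hypothetical helpers.

```python
import re
from datetime import datetime
from typing import Any, Dict, List, Optional


def parse_ts(value: str) -> datetime:
    """Parse ISO 8601 timestamps such as '2025-07-01T08:00:00.1234567Z'."""
    value = value.replace("Z", "+00:00")
    # Older fromisoformat versions need 3 or 6 fractional digits; normalize to 6.
    value = re.sub(r"\.(\d+)", lambda m: "." + m.group(1)[:6].ljust(6, "0"), value)
    return datetime.fromisoformat(value)


def get_path(record: Dict[str, Any], path: str) -> Optional[Any]:
    """Follow a dot-separated path; return None if any key is missing."""
    node: Any = record
    for key in path.split("."):
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def next_checkpoint(logs: List[Dict[str, Any]], tracking_column: str, current: str) -> str:
    """Return the latest tracking value in the batch, or the current checkpoint."""
    best, best_ts = current, parse_ts(current)
    for log in logs:
        value = get_path(log, tracking_column)
        if isinstance(value, str):
            ts = parse_ts(value)
            if ts > best_ts:
                best, best_ts = value, ts
    return best


batch = [
    {"id": "b2", "createdDateTime": "2025-07-01T09:30:00.1234567Z"},
    {"id": "b1", "createdDateTime": "2025-07-01T08:00:00Z"},
]
checkpoint = next_checkpoint(batch, "createdDateTime", "2025-07-01T00:00:00Z")
```

Because the example endpoint filters with `ge`, the record at the checkpoint timestamp is fetched again on the next cycle, so deduplicating on `id` downstream may be worthwhile.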

Pagination

| Field | Value | Description |
|---|---|---|
| Enable Pagination | True | Enables pagination to handle large datasets from the Microsoft Graph API. |
| Pagination Type | Attribute-Based | Uses cursor-based pagination, common for Microsoft Graph API responses. |
| Page Parameter Name | $skiptoken | Query parameter for cursor-based pagination in Microsoft Graph API. |
| Size Parameter Name | $top | Query parameter to specify page size (e.g., ?$top=100). |
| Page Size | 100 | Requests 100 records per page to optimize API calls. |
| Start Page | 0 | Starts pagination from the first page (zero-based indexing). |
| Maximum Pages | 0 | Set to 0 for unlimited pages to ensure all logs are collected. |
| Total Pages Path | (empty) | Left empty, as Microsoft Graph API does not provide total pages in responses. |
| Total Count Path | (empty) | Left empty, as total record count is not provided in sign-in log responses. |
| Zero-Based Indexing | True | Page numbering starts at 0, aligning with API behavior. |
| Response Attributes | @odata.nextLink | JSON path to the next page cursor in the API response (e.g., {"@odata.nextLink": "url"}). |
| Header Attributes | (empty) | Left empty, as pagination info is in the response body, not headers. |
| Request Interval | 200ms | 200ms delay between pagination requests to avoid rate limiting. |
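Microsoft Graph includes an `@odata.nextLink` property (a complete URL that already carries the `$skiptoken`) whenever more results are available and omits it on the last page. The sketch below follows that link, stops at Maximum Pages (0 meaning unlimited), and waits for the Request Interval between calls. `fetch_json` is a hypothetical injected function standing in for the authenticated HTTP GET so the loop can be exercised offline; this is not Observo AI's internal implementation.

```python
import time
from typing import Any, Callable, Dict, Iterator, List


def iter_pages(fetch_json: Callable[[str], Dict[str, Any]], first_url: str,
               max_pages: int = 0, request_interval: float = 0.2) -> Iterator[List[Any]]:
    """Yield the 'value' array of each page, following @odata.nextLink."""
    url = first_url
    pages = 0
    while url:
        if pages:
            time.sleep(request_interval)  # Request Interval between page requests
        body = fetch_json(url)
        pages += 1
        yield body.get("value", [])
        if max_pages and pages >= max_pages:
            break  # Maximum Pages reached; 0 means unlimited
        # The key itself contains a dot, so read it directly rather than as a dotted path.
        url = body.get("@odata.nextLink")


# Offline usage with canned pages in place of real HTTP responses.
fake = {
    "page1": {"value": [1, 2], "@odata.nextLink": "page2"},
    "page2": {"value": [3], "@odata.nextLink": "page3"},
    "page3": {"value": [4]},
}
all_records = [r for page in iter_pages(fake.__getitem__, "page1", request_interval=0) for r in page]
```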

TLS Configuration

| Field | Value | Description |
|---|---|---|
| CA File | -----BEGIN CERTIFICATE----- MIID...== -----END CERTIFICATE----- | Inline PEM-formatted CA certificate for verifying the Microsoft Graph API server. |
| Include System CA Certs Pool | True | Includes system CA certificates to ensure broad compatibility. |
| Cert File | /path/to/apex-client-cert.pem | Path to Apex's TLS certificate for client authentication. |
| Key File | /path/to/apex-client-key.pem | Path to the TLS key paired with the client certificate. |
| Insecure | False | Ensures TLS verification is enforced for production security. |
| Insecure Skip Verify | False | Requires certificate verification for secure connections. |
| Server Name Override | graph.microsoft.com | Specifies the server name for hostname verification in TLS certificates. |
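The TLS fields map naturally onto a client-side TLS context. As an illustration only, the sketch below expresses them with Python's `ssl` module; the parameter names mirror the UI fields, but `build_tls_context` is hypothetical and Observo AI's own handling may differ. Server Name Override is not part of the context: in Python it corresponds to the `server_hostname` argument used when wrapping a socket.

```python
import ssl
from typing import Optional


def build_tls_context(ca_pem: Optional[str] = None,
                      include_system_ca: bool = True,
                      cert_file: Optional[str] = None,
                      key_file: Optional[str] = None,
                      insecure_skip_verify: bool = False) -> ssl.SSLContext:
    """Build a client TLS context from settings resembling the TLS fields above."""
    if include_system_ca:
        ctx = ssl.create_default_context()  # verifies peers using the system CA pool
    else:
        ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)  # verifies peers, no CAs loaded yet
    if ca_pem:
        ctx.load_verify_locations(cadata=ca_pem)  # inline PEM, like the CA File field
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)  # client certificate
    if insecure_skip_verify:
        # TLS stays enabled but the server certificate is not checked; never use in production.
        ctx.check_hostname = False
        ctx.verify_mode = ssl.CERT_NONE
    return ctx


# Server Name Override, Python style:
# ctx.wrap_socket(sock, server_hostname="graph.microsoft.com")
```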

Advanced Settings

| Field | Value | Description |
|---|---|---|
| Proxy URL | https://proxy.apexfs.com:8080 | URL of Apex's corporate proxy server for outbound API requests. |
| Timeout | 15s | Sets a 15-second timeout for HTTP requests to handle network latency. |
| Compression | Gzip | Gzip compression for request bodies; because GET requests carry no body, this mainly takes effect if Method is set to POST. |
| Method | GET | HTTP GET method, as required by the Microsoft Graph API for log retrieval. |
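Proxy URL and Timeout apply to every outbound request. A minimal Python illustration, assuming the proxy is used for both HTTP and HTTPS traffic and the timeout is applied per request; `build_opener_for` and `TIMEOUT_SECONDS` are hypothetical names, and the proxy host is the fictitious Apex value from the table.

```python
import urllib.request

TIMEOUT_SECONDS = 15  # the Timeout field (15s) expressed in seconds


def build_opener_for(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS requests through proxy_url."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url, "https": proxy_url})
    return urllib.request.build_opener(proxy)


opener = build_opener_for("https://proxy.apexfs.com:8080")
request = urllib.request.Request(
    "https://graph.microsoft.com/v1.0/auditLogs/signIns?$top=100",
    headers={"Accept": "application/json"},  # an Authorization: Bearer header is also needed in practice
    method="GET",
)
# opener.open(request, timeout=TIMEOUT_SECONDS) would send the request through the proxy.
```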

Test Configuration

  • After entering the above settings in Observo AI, Apex's IT team saves the configuration and tests it by letting at least one collection interval elapse (or performing a test sign-in to the tenant) so that new sign-in logs are available for the source to collect.

  • They verify successful ingestion in the Observo AI Analytics tab, ensuring logs are flowing and parsed correctly for security monitoring and compliance reporting.

Troubleshooting

If issues arise with the Microsoft Entra Log Collector source in Observo AI, use the following steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Ensure all fields, such as Endpoint, Client ID, Client Secret, and parser settings, are correctly entered and match the Microsoft Entra API setup.

    • Confirm that the HTTP method (GET or POST) matches the endpoint's requirements.

  • Check Authentication:

    • Verify the authentication method:

      • For OAuth2, ensure the client ID, client secret, and token URL are valid and not expired, with appropriate API permissions (e.g., AuditLog.Read.All).

      • For Secret Authentication, confirm the secret is accessible in Observo AI's secure storage.

  • Validate Network Connectivity:

    • Check for firewall rules, proxy settings, or VPC endpoint configurations that may block access to the Microsoft Entra API endpoint.

    • Test connectivity using tools like curl or Postman with similar proxy configurations to verify access.

  • Common Error Messages:

    • "Inaccessible host": May indicate TLS version mismatches such as TLS 1.3 issues or DNS problems. Ensure the host supports the required TLS version and check DNS settings.

    • "Authentication failed": Verify that the client ID, client secret, or stored secret is correct and has the necessary permissions for the Microsoft Entra API.

    • "Request timeout": Check the Timeout setting and network latency; consider increasing the timeout value.

  • Monitor Logs and Data:

    • Verify that data is being ingested by checking the Microsoft Entra Log Collector source for recent collection activity.

    • Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.

    • Check Observo AI logs for errors or warnings related to data ingestion from the Microsoft Entra Log Collector source.

| Issue | Possible Cause | Resolution |
|---|---|---|
| Data not ingested | Incorrect URL or parser configuration | Verify URL and parser settings |
| Authentication errors | Invalid or expired credentials | Check client ID, client secret, or secret validity |
| Connectivity issues | Firewall or proxy blocking access | Test network connectivity and VPC endpoints |
| "Inaccessible host" | TLS or DNS issues | Ensure TLS compatibility and check DNS |
| "Authentication failed" | Misconfigured credentials | Verify auth method and permissions |
| "Request timeout" | Network latency or low timeout setting | Increase Timeout or check network |
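When errors point to DNS, TCP, or TLS problems (for example, "Inaccessible host"), it helps to test each layer separately from the machine running Observo AI, in the same spirit as testing with curl. The sketch below resolves the host, opens a TCP connection, and performs a TLS handshake, recording how far it got. It connects directly rather than through a proxy, so results may differ where a proxy is required. `check_endpoint` is a hypothetical helper.

```python
import socket
import ssl
from typing import Dict, Union


def check_endpoint(host: str = "graph.microsoft.com", port: int = 443,
                   timeout: float = 5.0) -> Dict[str, Union[bool, str]]:
    """Check DNS resolution, TCP connect, and TLS handshake, stopping at the first failure."""
    result: Dict[str, Union[bool, str]] = {"dns": False, "tcp": False, "tls": False}
    try:
        socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        result["dns"] = True
        with socket.create_connection((host, port), timeout=timeout) as sock:
            result["tcp"] = True
            ctx = ssl.create_default_context()
            with ctx.wrap_socket(sock, server_hostname=host) as tls:
                result["tls"] = True
                result["tls_version"] = tls.version() or ""
    except OSError as exc:  # gaierror, timeouts, and SSLError are all OSError subclasses
        result["error"] = f"{type(exc).__name__}: {exc}"
    return result


# Usage on the collector host: print(check_endpoint())
```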

