Microsoft Entra Log Collector
The Microsoft Entra Log Collector Source in Observo AI enables the ingestion of JSON-formatted audit and sign-in logs from Microsoft Entra via its API, supporting real-time security monitoring, compliance auditing, and user activity analysis.
Purpose
The Microsoft Entra Log Collector source ingests log data from Microsoft Entra (formerly Azure Active Directory) into the Observo AI platform through the Microsoft Graph API endpoints. It collects audit logs, sign-in logs, and other events, typically in JSON format, so that organizations can streamline data pipelines, improve observability, and support security monitoring, compliance auditing, and user activity analysis in real time.
Prerequisites
Before configuring the Microsoft Entra Log Collector source in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:
Observo AI Platform Setup:
The Observo AI platform must be installed and operational, with support for the Microsoft Entra Log Collector as a data source.
Verify that the platform supports common data formats such as JSON, as Microsoft Entra logs are typically delivered in this format. Additional parsers may be needed for custom processing.
Microsoft Entra API Access:
An active Microsoft Entra tenant must be available for Observo AI to collect log data from.
Obtain the Microsoft Entra tenant ID, register an application to get a client ID and client secret, and grant the necessary API permissions (e.g., AuditLog.Read.All, which covers both directory audit and sign-in logs) via the Microsoft Entra admin center or Azure portal.
Authentication:
Prepare one of the following authentication methods:
OAuth2: Obtain a client ID, client secret, and token endpoint URL from the Microsoft Entra admin center for secure access.
Secret Authentication: Use a stored secret within Observo AI's secure storage for credentials.
Network and Connectivity:
Ensure Observo AI can communicate with the Microsoft Entra API endpoint (e.g., https://graph.microsoft.com/v1.0/auditLogs).
Check for proxy settings, firewall rules, or VPC endpoint configurations that may affect connectivity to the Microsoft Entra API.
| Prerequisite | Description | Notes |
| --- | --- | --- |
| Observo AI Platform | Must be installed and support Microsoft Entra Log Collector | Verify support for JSON; additional parsers may be needed |
| Microsoft Entra API Access | Active Microsoft Entra tenant for log data submission | Obtain tenant ID, client ID, and client secret from admin center |
| Authentication | OAuth2 or Secret Authentication | Prepare credentials as required by the Microsoft Entra API |
| Network | Connectivity to the Microsoft Entra API endpoint | Check VPC endpoints, proxies, and firewalls |
Integration
To configure the Microsoft Entra Log Collector as a source in Observo AI, follow these steps to set up and test the data flow:
Log in to Observo AI:
Navigate to the Sources tab.
Click the Add Source button and select Create New.
Choose Microsoft Entra Log Collector from the list of available sources to begin configuration.
General Settings:
Name: A unique identifier for the source, such as entra-log-collector-source-1.
Description (Optional): Provide a description for the source.
Microsoft Endpoint: Microsoft API endpoint to collect data from. Supports templating with $LAST_VALUE$ when using checkpointing. Default: https://graph.microsoft.com/v1.0/auditLogs/directoryAudits?$filter=activityDateTime%20ge%20$LAST_VALUE$
Examples:
https://graph.microsoft.com/v1.0/auditLogs/directoryAudits
https://graph.microsoft.com/v1.0/auditLogs/signIns?since=$LAST_VALUE$
Collection Interval: Duration between consecutive data collection requests. Default: 10m
Examples: 10s, 1m, 10m
Response Log Path: JSON path to logs array in responses. Leave empty if the response is a direct array of logs. Default: value
Examples: Values, Data, Resource.logs
Headers (Add as needed): Headers to include in the HTTP request. Use the format {key: value}.
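The interaction between the Microsoft Endpoint template and the Response Log Path can be illustrated with a minimal Python sketch. It is not the collector's implementation: the function names `render_endpoint` and `extract_logs` are hypothetical, the placeholder is substituted as plain text (whether the collector additionally URL-encodes the value is not stated here), and the path is assumed to be a simple dot-separated list of keys.

```python
import json


def render_endpoint(template: str, last_value: str) -> str:
    """Replace the $LAST_VALUE$ placeholder with the current checkpoint value."""
    # Assumption: plain text substitution, with no extra encoding applied.
    return template.replace("$LAST_VALUE$", last_value)


def extract_logs(response_body: str, log_path: str) -> list:
    """Return the logs array found at a dot-separated path in a JSON response.

    An empty path means the response body itself is the array of logs.
    """
    node = json.loads(response_body)
    if log_path:
        for key in log_path.split("."):
            node = node[key]
    return node


if __name__ == "__main__":
    default = ("https://graph.microsoft.com/v1.0/auditLogs/directoryAudits"
               "?$filter=activityDateTime%20ge%20$LAST_VALUE$")
    print(render_endpoint(default, "2025-06-02T00:00:00Z"))
    print(extract_logs('{"value": [{"id": "a"}]}', "value"))
```

With the default Response Log Path of value, the logs are read from the top-level "value" array, which is where Microsoft Graph list responses place their results.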
Authentication (Optional):
Client ID: Application (client) ID for authenticating to Microsoft APIs.
Client Secret: Client secret value generated for the application registered in Microsoft Entra.
Token URL: URL to get the OAuth2 token. Default: https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token
Example: https://login.microsoftonline.com/987f6543-21ba-43cd-9876-543210fedcba/oauth2/v2.0/token
Scopes (Add as needed): Scopes to request for OAuth2 authentication. Default: https://graph.microsoft.com/.default
Headers (Add as needed): Headers to include in the OAuth2 authentication HTTP request. Use the format {key: value}.
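For orientation, the Client ID, Client Secret, Token URL, and Scopes fields map onto a standard OAuth2 client credentials token request. The sketch below only builds that request with the Python standard library and does not send it. The helper name `build_token_request` is hypothetical, and how the collector issues the request internally is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_token_request(tenant_id: str, client_id: str, client_secret: str,
                        scopes=("https://graph.microsoft.com/.default",)) -> Request:
    """Build (but do not send) a client credentials token request."""
    token_url = f"https://login.microsoftonline.com/{tenant_id}/oauth2/v2.0/token"
    # The token endpoint expects a form-encoded body, not JSON.
    form = urlencode({
        "grant_type": "client_credentials",
        "client_id": client_id,
        "client_secret": client_secret,
        "scope": " ".join(scopes),
    }).encode("ascii")
    return Request(
        token_url,
        data=form,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder values; substitute the real tenant and application details.
    request = build_token_request("your-tenant-id", "your-client-id", "your-secret")
    print(request.get_method(), request.full_url)
```

Passing the request to `urllib.request.urlopen` would return a JSON document containing an `access_token`, which is then sent as a Bearer token on calls to the Microsoft Endpoint.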
Checkpoint:
Enable Checkpoint (True): Enable incremental log collection using checkpointing.
Tracking Column: JSON path to the field used for tracking progress such as 'createdDateTime'. The value from the last log entry will be used. Default: activityDateTime
Examples: activityDateTime, createdDateTime, data.created_at
Initial Value: Starting value for the tracking column. Will be used for the first collection.
Example: 2025-06-02T00:00:00Z
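The checkpoint behavior described above (the value from the last log entry becomes the next $LAST_VALUE$) can be sketched as follows. The function names are hypothetical, and keeping the previous value when a collection returns no logs is an assumption rather than documented behavior.

```python
def lookup(record: dict, path: str):
    """Follow a dot-separated JSON path such as 'data.created_at' into a record."""
    node = record
    for key in path.split("."):
        node = node[key]
    return node


def next_checkpoint(logs: list, tracking_column: str, current: str) -> str:
    """Return the tracking value of the last log entry.

    Assumption: an empty batch leaves the checkpoint unchanged.
    """
    if not logs:
        return current
    return str(lookup(logs[-1], tracking_column))


if __name__ == "__main__":
    batch = [
        {"activityDateTime": "2025-06-02T00:05:00Z"},
        {"activityDateTime": "2025-06-02T00:09:30Z"},
    ]
    print(next_checkpoint(batch, "activityDateTime", "2025-06-02T00:00:00Z"))
```

Because the default endpoint filters with `ge` (greater than or equal), a record whose timestamp equals the checkpoint may be requested again on the next cycle, so downstream deduplication may be worth considering.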
Pagination:
Enable Pagination (False): Enable pagination support for handling paginated responses.
Pagination Type: Type of pagination to use. Default: Page-Based.
Options:
Page-Based: For traditional page numbers
Attribute-Based: For cursor or token-based pagination
Page Parameter Name: Query parameter name for the page number. Default: page
Examples: page (results in ?page=1), page_number, pageNum
Size Parameter Name: Query parameter name for the page size. Default: size
Examples: size (results in ?size=50), limit, page_size
Page Size: Number of records to request per page. Default: 50
Examples: 50, 100, 200
Start Page: Page number to start pagination from. Works in conjunction with zero-based setting. Default: 0
Examples: 0, 1
Maximum Pages: Maximum number of pages to retrieve in one collection cycle. Set to 0 for unlimited. Default: 50
Examples: 50, 100, 0
Total Pages Path (Empty): JSON path to total pages count in response. Example: 'meta.total_pages' for {"meta": {"total_pages": 5}}
Examples: meta.total_pages, pagination.pages, page_info.total
Total Count Path: JSON path to total record count in response. Example: 'meta.total' for {"meta": {"total": 150}}
Examples: meta.total, pagination.total_records, count
Zero-Based Indexing (False): If true, page numbering starts at 0. If false, it starts at 1.
Response Attributes (Add as needed): JSON paths to attributes in the response body that contain next page information. Example: 'meta.nextCursor' for cursor-based pagination.
Examples: meta.nextCursor, pagination.nextPage, links.next
Header Attributes (Add as needed): Names of HTTP headers that contain next page information. Example: 'X-Next-Page' or 'Link' for GitHub-style pagination.
Examples: X-Next-Page, X-Next-Cursor, Link
Request Interval: Time to wait between pagination requests. Use a duration string like '100ms' or '1s'. Default: 100ms
Examples: 100ms, 500ms, 1s
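The two pagination types can be contrasted with a short sketch. It is an illustration under stated assumptions, not the collector's code: `page_based_urls` and `collect_attribute_based` are hypothetical names, the `fetch` callable stands in for an authenticated HTTP GET that returns parsed JSON, and the next-page attribute is treated as a single top-level key holding the full URL of the next page (as Microsoft Graph does with @odata.nextLink).

```python
import time
from typing import Callable, Iterator
from urllib.parse import urlencode


def page_based_urls(base_url: str, page_param: str = "page", size_param: str = "size",
                    page_size: int = 50, start_page: int = 0,
                    max_pages: int = 50) -> Iterator[str]:
    """Yield one URL per page; max_pages=0 means no limit (the caller must stop)."""
    separator = "&" if "?" in base_url else "?"
    page, count = start_page, 0
    while max_pages == 0 or count < max_pages:
        query = urlencode({page_param: page, size_param: page_size})
        yield f"{base_url}{separator}{query}"
        page += 1
        count += 1


def collect_attribute_based(first_url: str, fetch: Callable[[str], dict],
                            next_attribute: str = "@odata.nextLink",
                            log_path: str = "value", max_pages: int = 50,
                            request_interval_s: float = 0.0) -> list:
    """Follow the next-page attribute until it is absent or max_pages is reached."""
    logs, url, pages = [], first_url, 0
    while url and (max_pages == 0 or pages < max_pages):
        body = fetch(url)
        logs.extend(body.get(log_path, []))
        # The attribute name contains a dot, so it is read as one key, not a path.
        url = body.get(next_attribute)
        pages += 1
        if url and request_interval_s:
            time.sleep(request_interval_s)  # corresponds to Request Interval
    return logs


if __name__ == "__main__":
    fake_pages = {
        "u1": {"value": [1, 2], "@odata.nextLink": "u2"},
        "u2": {"value": [3]},
    }
    print(collect_attribute_based("u1", fake_pages.__getitem__))
```

Page-based pagination builds each URL from a counter, while attribute-based pagination lets the server dictate the next request, which is why settings such as Page Parameter Name only matter for the former.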
TLS Configuration (Optional):
CA File: The CA certificate provided as an inline string in PEM format.
Include System CA Certs Pool (True): Include the system CA certificates pool in the list of CAs used to verify the server certificate.
Cert File: Path to the TLS cert to use for TLS required connections.
Key File: Path to the TLS key to use for TLS required connections.
Insecure (True): Skip TLS verification when connecting to the endpoint. This is insecure and should not be used in production.
Insecure Skip Verify (True): Use TLS but do not verify the server certificate.
Server Name Override: The server name to use to verify the hostname on the returned certificates.
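As a rough analogy for how these TLS options combine, the sketch below builds a client-side TLS context with Python's standard `ssl` module. The function `build_tls_context` and its parameter names are hypothetical, and the mapping to the collector's behavior is an assumption.

```python
import ssl
from typing import Optional


def build_tls_context(ca_pem: Optional[str] = None,
                      include_system_cas: bool = True,
                      cert_file: Optional[str] = None,
                      key_file: Optional[str] = None,
                      insecure_skip_verify: bool = False) -> ssl.SSLContext:
    """Build a client TLS context from settings resembling the fields above."""
    if include_system_cas:
        # Loads the operating system's trusted CA certificates.
        context = ssl.create_default_context()
    else:
        # Verifying context with an empty trust store.
        context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    if ca_pem:
        # CA File: an inline PEM string added to the trusted CAs.
        context.load_verify_locations(cadata=ca_pem)
    if cert_file:
        # Cert File / Key File: client certificate for mutual TLS.
        context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    if insecure_skip_verify:
        # Hostname checking must be turned off before disabling verification.
        context.check_hostname = False
        context.verify_mode = ssl.CERT_NONE
    return context


if __name__ == "__main__":
    context = build_tls_context()
    print(context.verify_mode, context.check_hostname)
```

In this analogy, Server Name Override corresponds to the `server_hostname` argument passed when the context wraps a socket, which controls the name checked against the returned certificate.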
Advanced Settings (Optional):
Proxy URL: URL of the proxy server to use when connecting to the endpoint.
Read Buffer Size: Size of the read buffer in bytes.
Write Buffer Size: Size of the write buffer in bytes.
Timeout: Timeout for the HTTP request. Use a number followed by a unit, such as '30s' or '1m'. Default: 10s
Compression: Compression algorithm to use for the request body.
Options:
Gzip: DEFLATE compression with headers for file storage
Zlib: DEFLATE format with minimal wrapper and checksums
Deflate: Combines LZ77 and Huffman for compression efficiency
Snappy: Prioritizes speed over compression ratio and complexity
Zstd: Fast compression with good ratio and dictionaries
Lz4: Ultra-fast compression with minimal resource overhead
Max Idle Connections: Maximum number of idle connections to keep open to the endpoint.
Idle Connection Timeout: Timeout for idle connections to the endpoint. Use a number followed by a unit, such as '30s' or '1m'.
HTTP 2 Read Idle Timeout: Timeout for HTTP/2 read idle connections to the endpoint. Use a number followed by a unit, such as '30s' or '1m'.
HTTP 2 Read Ping Timeout: Timeout for HTTP/2 read ping connections to the endpoint. Use a number followed by a unit, such as '30s' or '1m'.
Method: HTTP request method to use for requests. Supports GET and POST methods. Default: GET
Body: Request body for POST method. Supports templating with $LAST_VALUE$ when using checkpointing.
Example: {"query": "fetch logs", "from": "$LAST_VALUE$"}
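Several settings above (Timeout, Idle Connection Timeout, Request Interval, Collection Interval) take a duration string made of a number followed by a unit. A minimal sketch of how such strings could be interpreted is shown below. The function name `parse_duration_ms` is hypothetical, and only the units that appear in this document (ms, s, m) are handled; other units the platform may accept are not covered.

```python
import re

# Milliseconds per unit; limited to the units shown in this document.
_UNIT_TO_MS = {"ms": 1, "s": 1_000, "m": 60_000}


def parse_duration_ms(text: str) -> int:
    """Convert a duration string such as '100ms', '30s', or '1m' to milliseconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m)", text.strip())
    if match is None:
        raise ValueError(f"unsupported duration string: {text!r}")
    number, unit = match.groups()
    return int(number) * _UNIT_TO_MS[unit]


if __name__ == "__main__":
    for value in ("100ms", "10s", "1m"):
        print(value, "->", parse_duration_ms(value), "ms")
```

A check like this can catch typos such as a missing unit before the configuration is saved.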
Parser Config:
Enable Source Log Parser (False): Toggle the Enable Source Log Parser switch to enable.
Select the appropriate parser from the Source Log Parser dropdown.
Add additional parsers as needed.
Pattern Extractor:
Refer to Observo AI's Pattern Extractor documentation for details on configuring pattern-based data extraction.
Archival Destination:
Toggle the Enable Archival on Source switch to enable.
Under Archival Destination, select from the list of Archival Destinations (Required).
Save and Test Configuration:
Save the configuration settings in Observo AI.
Generate activity in the Microsoft Entra tenant (for example, a test sign-in), wait for the next collection interval, and verify ingestion in the Analytics tab for data flow.
Example Scenarios
Apex Financial Services, a fictitious mid-sized enterprise in the financial services sector, specializes in wealth management and investment advisory. To strengthen its security posture and comply with regulatory requirements such as SOC 2 and GDPR, Apex wants to monitor user activity and audit logs from its Microsoft Entra tenant. By integrating the Microsoft Entra Log Collector into the Observo AI platform, Apex will ingest sign-in and audit logs in real time to detect unauthorized access, ensure compliance, and streamline incident response. The IT team, led by the Security Operations Manager, follows the integration steps outlined above to configure the log collector, ensuring all required fields are specified for a robust data pipeline.
Standard Microsoft Entra Log Collector Source Setup
Here is a standard Microsoft Entra Log Collector Source configuration example. Only the required sections and their associated field updates are displayed in the table below:
General Settings
| Field | Value | Description |
| --- | --- | --- |
| Name | apex-entra-log-collector | Unique identifier for the source, reflecting Apex's branding and purpose. |
| Description | Collects Microsoft Entra sign-in and audit logs for security monitoring and compliance at Apex Financial Services. | Optional description to clarify the source's purpose. |
| Microsoft Endpoint | https://graph.microsoft.com/v1.0/auditLogs/signIns?$filter=createdDateTime%20ge%20$LAST_VALUE$ | Endpoint for collecting sign-in logs, using checkpointing with $LAST_VALUE$ for incremental collection. |
| Collection Interval | 5m | Data collection every 5 minutes to balance real-time monitoring with API rate limits. |
| Response Log Path | value | JSON path to the logs array in the API response, as sign-in logs are nested under "value". |
| Headers | { "Content-Type": "application/json" } | HTTP header to ensure JSON content type for API requests. |
Authentication
| Field | Value | Description |
| --- | --- | --- |
| Client ID | 123e4567-e89b-12d3-a456-426614174000 | Application (client) ID registered in Microsoft Entra for API access. |
| Client Secret | gX7fH9kP2mL8qW3rT5vY1zA6bC4dE0jN | Client secret generated for secure authentication to Microsoft Graph API. |
| Token URL | https://login.microsoftonline.com/987f6543-21ba-43cd-9876-543210fedcba/oauth2/v2.0/token | OAuth2 token endpoint, with Apex's tenant ID (987f6543-21ba-43cd-9876-543210fedcba). |
| Scopes | https://graph.microsoft.com/.default | Default scope for Microsoft Graph API, which requests the permissions already granted to the application (e.g., AuditLog.Read.All). |
| Headers | { "Accept": "application/json" } | Header to ensure JSON response for OAuth2 token requests. |
Checkpoint
| Field | Value | Description |
| --- | --- | --- |
| Enable Checkpoint | True | Enables incremental log collection to avoid duplicating data. |
| Tracking Column | createdDateTime | JSON path to the timestamp field in sign-in logs for tracking progress. |
| Initial Value | 2025-07-01T00:00:00Z | Starting timestamp for the first collection, set to July 1, 2025, to capture recent logs. |
Pagination
| Field | Value | Description |
| --- | --- | --- |
| Enable Pagination | True | Enables pagination to handle large datasets from the Microsoft Graph API. |
| Pagination Type | Attribute-Based | Uses cursor-based pagination, common for Microsoft Graph API responses. |
| Page Parameter Name | $skiptoken | Query parameter for cursor-based pagination in Microsoft Graph API. |
| Size Parameter Name | $top | Query parameter to specify page size (e.g., ?$top=100). |
| Page Size | 100 | Requests 100 records per page to optimize API calls. |
| Start Page | 0 | Starts pagination from the first page (zero-based indexing). |
| Maximum Pages | 0 | Set to 0 for unlimited pages to ensure all logs are collected. |
| Total Pages Path | (empty) | Left empty, as Microsoft Graph API does not provide total pages in responses. |
| Total Count Path | (empty) | Left empty, as total record count is not provided in sign-in log responses. |
| Zero-Based Indexing | True | Page numbering starts at 0, aligning with API behavior. |
| Response Attributes | @odata.nextLink | JSON path to the next page cursor in the API response (e.g., {"@odata.nextLink": "url"}). |
| Header Attributes | (empty) | Left empty, as pagination info is in the response body, not headers. |
| Request Interval | 200ms | 200ms delay between pagination requests to avoid rate limiting. |
TLS Configuration
| Field | Value | Description |
| --- | --- | --- |
| CA File | -----BEGIN CERTIFICATE----- MIID...== -----END CERTIFICATE----- | Inline PEM-formatted CA certificate for verifying the Microsoft Graph API server. |
| Include System CA Certs Pool | True | Includes system CA certificates to ensure broad compatibility. |
| Cert File | /path/to/apex-client-cert.pem | Path to Apex's TLS certificate for client authentication. |
| Key File | /path/to/apex-client-key.pem | Path to the TLS key paired with the client certificate. |
| Insecure | False | Ensures TLS verification is enforced for production security. |
| Insecure Skip Verify | False | Requires certificate verification for secure connections. |
| Server Name Override | graph.microsoft.com | Specifies the server name for hostname verification in TLS certificates. |
Advanced Settings
| Field | Value | Description |
| --- | --- | --- |
| Proxy URL | https://proxy.apexfs.com:8080 | URL of Apex's corporate proxy server for outbound API requests. |
| Timeout | 15s | Sets a 15-second timeout for HTTP requests to handle network latency. |
| Compression | Gzip | Uses Gzip compression to reduce request body size and optimize bandwidth. |
| Method | GET | HTTP GET method, as required by the Microsoft Graph API for log retrieval. |
Test Configuration
After entering the above settings in Observo AI, Apex's IT team saves the configuration and tests it by performing a test sign-in against the tenant and waiting for the next collection cycle.
They verify successful ingestion in the Observo AI Analytics tab, ensuring logs are flowing and parsed correctly for security monitoring and compliance reporting.
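Before saving a configuration like the one above, a few consistency checks can be made by hand or scripted. The sketch below is only an illustration: the dictionary keys (`endpoint`, `enable_checkpoint`, and so on) are hypothetical names chosen for this example and do not reflect Observo AI's actual configuration schema.

```python
def check_source_config(config: dict) -> list:
    """Return a list of human-readable problems found in a source configuration."""
    problems = []
    endpoint = config.get("endpoint", "")
    checkpoint_enabled = config.get("enable_checkpoint", False)
    if checkpoint_enabled and "$LAST_VALUE$" not in endpoint:
        problems.append("checkpoint is enabled but the endpoint has no $LAST_VALUE$ placeholder")
    if checkpoint_enabled and not config.get("initial_value"):
        problems.append("checkpoint is enabled but no initial value is set")
    if "{tenant_id}" in config.get("token_url", ""):
        problems.append("token URL still contains the {tenant_id} placeholder")
    if config.get("insecure") or config.get("insecure_skip_verify"):
        problems.append("TLS verification is disabled")
    return problems


if __name__ == "__main__":
    apex = {
        "endpoint": "https://graph.microsoft.com/v1.0/auditLogs/signIns"
                    "?$filter=createdDateTime%20ge%20$LAST_VALUE$",
        "enable_checkpoint": True,
        "initial_value": "2025-07-01T00:00:00Z",
        "token_url": "https://login.microsoftonline.com/"
                     "987f6543-21ba-43cd-9876-543210fedcba/oauth2/v2.0/token",
        "insecure": False,
        "insecure_skip_verify": False,
    }
    print(check_source_config(apex))
```

An empty list means none of these particular checks failed; it does not guarantee that the credentials or permissions are valid.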
Troubleshooting
If issues arise with the Microsoft Entra Log Collector source in Observo AI, use the following steps to diagnose and resolve them:
Verify Configuration Settings:
Ensure all fields, such as Endpoint, Client ID, Client Secret, and parser settings, are correctly entered and match the Microsoft Entra API setup.
Confirm that the HTTP method (GET or POST) matches the endpoint's requirements.
Check Authentication:
Verify the authentication method:
For OAuth2, ensure the client ID, client secret, and token URL are valid and not expired, with appropriate API permissions (e.g., AuditLog.Read.All).
For Secret Authentication, confirm the secret is accessible in Observo AI's secure storage.
Validate Network Connectivity:
Check for firewall rules, proxy settings, or VPC endpoint configurations that may block access to the Microsoft Entra API endpoint.
Test connectivity using tools like curl or Postman with similar proxy configurations to verify access.
Common Error Messages:
"Inaccessible host": May indicate TLS version mismatches such as TLS 1.3 issues or DNS problems. Ensure the host supports the required TLS version and check DNS settings.
"Authentication failed": Verify that the client ID, client secret, or stored secret is correct and has the necessary permissions for the Microsoft Entra API.
"Request timeout": Check the Timeout setting and network latency; consider increasing the timeout value.
Monitor Logs and Data:
Verify that data is being ingested by monitoring the Microsoft Entra Log Collector endpoint activity.
Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.
Check Observo AI logs for errors or warnings related to data ingestion from the Microsoft Entra Log Collector source.
| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not ingested | Incorrect URL or parser configuration | Verify URL and parser settings |
| Authentication errors | Invalid or expired credentials | Check client ID, client secret, or secret validity |
| Connectivity issues | Firewall or proxy blocking access | Test network connectivity and VPC endpoints |
| "Inaccessible host" | TLS or DNS issues | Ensure TLS compatibility and check DNS |
| "Authentication failed" | Misconfigured credentials | Verify auth method and permissions |
| "Request timeout" | Network latency or low timeout setting | Increase Timeout or check network |
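When reproducing a failure outside Observo AI, for example with a small script instead of curl, it can help to map low-level errors onto the categories above. The sketch below does this for errors raised by Python's `urllib`. The function name `classify_failure` is hypothetical, and the mapping (401 or 403 to "Authentication failed", DNS and TLS errors to "Inaccessible host") is an assumption based on the descriptions in this section, not the platform's own error logic.

```python
import socket
import ssl
import urllib.error


def classify_failure(error: Exception) -> str:
    """Map an exception from an HTTP call onto a troubleshooting category."""
    # HTTPError is a subclass of URLError, so it must be checked first.
    if isinstance(error, urllib.error.HTTPError):
        if error.code in (401, 403):
            return "Authentication failed"
        return f"HTTP {error.code}"
    if isinstance(error, (socket.timeout, TimeoutError)):
        return "Request timeout"
    if isinstance(error, urllib.error.URLError):
        reason = error.reason
        if isinstance(reason, (socket.timeout, TimeoutError)):
            return "Request timeout"
        if isinstance(reason, (socket.gaierror, ssl.SSLError)):
            # DNS resolution or TLS handshake problems.
            return "Inaccessible host"
        return "Connectivity issue"
    return "Unknown"


if __name__ == "__main__":
    dns_error = urllib.error.URLError(socket.gaierror(-2, "Name or service not known"))
    print(classify_failure(dns_error))
```

The resulting category points to the matching row in the table above.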

