Cloudflare

Integrate Cloudflare as a source in the Observo AI platform using either the Splunk HEC Source or the S3 Source. This allows high-volume HTTP, DNS, and security logs to be securely streamed into Observo AI for intelligent routing, enrichment, and real-time analysis.

Purpose

Cloudflare's Logpush exports granular edge telemetry in near real-time. Observo AI receives this data via:

  • Splunk HEC for low-latency ingestion

  • AWS S3 for cost-effective storage and batch analytics

Observo AI processes these logs using AI-driven filters, enrichments, and destination rules—helping enterprises reduce SIEM costs, detect threats faster, and maintain audit visibility.

Prerequisites

Before configuring the Cloudflare Integration in Observo AI, ensure the following requirements are met:

Observo AI Platform Setup:

  • The Observo AI platform must be installed and operational, with support for either Splunk HEC Source or S3 Source configurations.

  • Verify that the platform supports common data formats such as JSON and compressed formats (gzip, parquet), as Cloudflare logs are typically delivered in these formats.

  • Ensure proper network connectivity and firewall configurations to allow traffic from Cloudflare's egress IP ranges.

Cloudflare Account Requirements:

  • An active Cloudflare account with the Logpush feature enabled (available on Pro, Business, and Enterprise plans).

  • Administrative access or API token permissions to configure Logpush jobs in the Cloudflare dashboard.

  • Access to Cloudflare Analytics & Logs section for configuration and monitoring.

Authentication and Security:

  • For HEC Integration: Prepare OAuth2 tokens or API keys for secure HTTP endpoint access.

  • For S3 Integration: Configure AWS IAM roles or access keys with appropriate S3 bucket permissions.

  • Valid TLS certificates (TLS 1.2+ required) with proper CA chain for secure data transmission.

  • IP allowlisting capabilities to restrict access to Cloudflare's official egress IP ranges.

Network and Connectivity:

  • Ensure Observo AI can receive HTTPS traffic on designated ports (typically 8088 or 10088 for HEC).

  • For S3 integration, verify read access to the designated S3 bucket with proper IAM policies.

  • Configure load balancers or reverse proxies for high availability and security hardening.

  • Establish proper DNS resolution for all endpoints involved in the integration.

| Prerequisite | Description | Notes |
| --- | --- | --- |
| Observo AI Platform | Must support Splunk HEC or S3 Source with JSON parsing | Verify TLS 1.2+ support and compression handling |
| Cloudflare Account | Active account with Logpush enabled and admin access | Pro/Business/Enterprise plans required for full features |
| Authentication | OAuth2/API tokens for HEC or AWS IAM for S3 | Rotate credentials every 90 days minimum |
| Network Security | HTTPS endpoints with proper TLS and IP restrictions | Use trusted CA certificates; avoid self-signed certificates in production |

Integration Options

Observo AI supports two primary integration methods for Cloudflare log ingestion, each with distinct advantages for different enterprise requirements:

Option 1: Integration via Splunk HEC

Best for: Real-time analytics, immediate alerting, and low-latency requirements

Advantages:

  • Real-time log streaming with minimal latency

  • Direct authentication through secure tokens

  • Immediate data availability for analysis and alerting

  • Simplified architecture with fewer components

Considerations:

  • Requires stable HTTPS endpoint with high availability

  • Direct exposure to internet traffic requiring robust security measures

  • Real-time processing demands higher resource allocation

Option 2: AWS S3 Bucket Integration

Best for: High-volume environments, cost optimization, and batch processing

Advantages:

  • Cost-effective for large log volumes with S3 storage economics

  • Built-in redundancy and durability through AWS S3

  • Flexible processing schedules and batch optimization

  • Natural integration with existing AWS infrastructure

Considerations:

  • Slight delay in data availability due to polling intervals

  • Additional AWS costs for S3 storage and API calls

  • Requires AWS infrastructure and IAM management

Integration

Integration Option 1: Cloudflare → Observo AI via Splunk HEC

This section outlines the configuration for direct HTTPS integration using Splunk HEC source.

Observo AI Configuration:

  1. Log in to Observo AI:

    • Navigate to the Sources tab

    • Click the Add Source button and select Create New

    • Choose Splunk HEC from the list of available sources to begin configuration

    • Refer to the Splunk HEC Source documentation for configuration details

Cloudflare Configuration:

  1. Access Cloudflare Dashboard:

    • Navigate to Analytics → Logs → Logpush

    • Click Create Job to begin configuration

  2. Dataset Selection:

    • Choose appropriate dataset(s):

      • HTTP requests for web traffic analysis

      • Firewall events for security monitoring

      • DNS queries for DNS analytics

      • Note: each Logpush job pushes a single dataset; to export multiple datasets, create a separate job per dataset

  3. Destination Configuration:

    • Type: HTTPS

    • URL: <Observo push source endpoint>/services/collector/raw

      • Retrieve the push source endpoint from the Observo UI for the Splunk HEC source created in step 1.

    • Headers: Add Authorization: <Your Auth Code>

      • Configure the same auth code in the Observo Splunk HEC source

    • Compression: Enable gzip compression

  4. Advanced Settings:

    • Frequency: Real-time or batch (recommended: real-time for security events)

    • Format: JSON (recommended) or CSV

    • Field Selection: Choose relevant fields based on use case requirements

    • Filtering: Apply filters to reduce unnecessary log volume
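The dashboard steps above can also be performed through the Cloudflare Logpush API. Below is a minimal sketch, assuming placeholder values for the zone ID, Observo endpoint, and auth code; for HTTPS destinations, Cloudflare encodes extra request headers as header_<Name> query parameters in destination_conf.

```shell
# Sketch: create the HEC Logpush job via the Cloudflare API instead of the
# dashboard. ZONE_ID, OBSERVO_ENDPOINT, and OBSERVO_AUTH are placeholders.
ZONE_ID="${ZONE_ID:-your-zone-id}"
OBSERVO_ENDPOINT="${OBSERVO_ENDPOINT:-https://observo.example.com:8088}"
OBSERVO_AUTH="${OBSERVO_AUTH:-your-auth-code}"

# HTTPS destinations carry headers as header_<Name> query parameters;
# URL-encode the auth code if it contains spaces or special characters.
DEST="${OBSERVO_ENDPOINT}/services/collector/raw?header_Authorization=${OBSERVO_AUTH}"

PAYLOAD=$(cat <<EOF
{
  "name": "observo-http-requests",
  "dataset": "http_requests",
  "destination_conf": "${DEST}",
  "enabled": true
}
EOF
)

# Only call the API when a real token is exported; otherwise print the payload.
if [ -n "${CF_API_TOKEN:-}" ]; then
  curl -sS -X POST "https://api.cloudflare.com/client/v4/zones/${ZONE_ID}/logpush/jobs" \
    -H "Authorization: Bearer ${CF_API_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "${PAYLOAD}"
else
  echo "${PAYLOAD}"
fi
```

Export a real CF_API_TOKEN to actually create the job; without one, the sketch only prints the request payload for inspection.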

Integration Option 2: Cloudflare → S3 → Observo AI via S3 Source

This section outlines the configuration for S3-based integration using batch processing.

AWS S3 Configuration:

  1. Create S3 Bucket:

    • Create a dedicated bucket for Cloudflare logs in your preferred AWS region

    • Grant Cloudflare write access to the bucket as described in Cloudflare's Logpush documentation
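A minimal AWS CLI sketch of this step; the bucket name, region, and the Cloudflare principal in the bucket policy are placeholders (consult Cloudflare's Logpush documentation for the exact AWS account to grant).

```shell
# Sketch: create a dedicated log bucket and grant Cloudflare write access.
# BUCKET and REGION are placeholders; nothing runs against AWS unless RUN_LIVE
# is set.
BUCKET="${BUCKET:-observo-cloudflare-logs}"
REGION="${REGION:-us-east-1}"

if [ -n "${RUN_LIVE:-}" ]; then
  aws s3 mb "s3://${BUCKET}" --region "${REGION}"
fi

# Bucket policy allowing Cloudflare to write into the log prefix.
# CLOUDFLARE_ACCOUNT_ID is a placeholder; substitute the account documented
# by Cloudflare for Logpush S3 delivery.
POLICY=$(cat <<EOF
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": {"AWS": "arn:aws:iam::CLOUDFLARE_ACCOUNT_ID:root"},
    "Action": "s3:PutObject",
    "Resource": "arn:aws:s3:::${BUCKET}/logs/cloudflare/*"
  }]
}
EOF
)

if [ -n "${RUN_LIVE:-}" ]; then
  aws s3api put-bucket-policy --bucket "${BUCKET}" --policy "${POLICY}"
fi
```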

Cloudflare Configuration:

  1. Logpush Job Creation:

    • Navigate to Analytics → Logs → Logpush → Create Job

    • Select desired dataset for logging

  2. S3 Destination Setup:

    • Destination Type: Amazon S3

    • Bucket: <S3 bucket name created in step 1>

    • Region: Match your S3 bucket region

    • Path Pattern: logs/cloudflare/{YYYY}/{MM}/{DD}/{HH}/

    • Filename Pattern: cf-logs-{TIMESTAMP}-{BATCH_ID}.json.gz

  3. Format and Compression:

    • Format: JSON or Parquet (JSON recommended for flexibility)

    • Compression: gzip (recommended for bandwidth optimization)

    • Field Selection: Configure based on analytical requirements
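When the same job is created through the Logpush API rather than the dashboard, the destination settings in steps 2 and 3 collapse into a single destination_conf string. A sketch with placeholder bucket and region ({DATE} is a Logpush path substitution, not a shell variable):

```shell
# Sketch: S3 destination_conf equivalent of the dashboard settings above.
# Bucket name and region are placeholders.
BUCKET="observo-cloudflare-logs"
REGION="us-east-1"

# Path prefix plus region, in the s3://<bucket>/<path>?region=<region> form.
DEST="s3://${BUCKET}/logs/cloudflare/{DATE}?region=${REGION}"
echo "${DEST}"
```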

Observo AI S3 Source Configuration:

  1. S3 Source Setup:

    • Navigate to Sources → Add Source → S3

    • Refer to the AWS S3 Source documentation for configuration details

Test Configuration

For HEC Integration:

  • Save the configuration in the Observo AI interface

  • Use curl to test the HEC endpoint with sample Cloudflare log data

  • Verify token authentication and TLS connectivity

  • Monitor Observo AI logs for successful ingestion

  • Validate log parsing and field extraction in the Analytics tab
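The curl smoke test above can be sketched as follows; HEC_URL, HEC_TOKEN, and the event fields are illustrative placeholders, not an authoritative Cloudflare schema, and no request is sent unless RUN_LIVE is set.

```shell
# Sketch: POST one sample Cloudflare-style event to the HEC endpoint.
HEC_URL="${HEC_URL:-https://observo.example.com:8088/services/collector/raw}"
HEC_TOKEN="${HEC_TOKEN:-your-auth-code}"

# Minimal HTTP-requests-style event; field names are illustrative.
EVENT='{"ClientIP":"203.0.113.7","ClientRequestHost":"example.com","ClientRequestMethod":"GET","ClientRequestURI":"/","EdgeResponseStatus":200,"EdgeStartTimestamp":1700000000000000000}'

if [ -n "${RUN_LIVE:-}" ]; then
  # A healthy endpoint should return HTTP 200 with a success body.
  curl -sS -X POST "${HEC_URL}" \
    -H "Authorization: ${HEC_TOKEN}" \
    -H "Content-Type: application/json" \
    --data "${EVENT}"
else
  echo "${EVENT}"
fi
```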

For S3 Integration:

  • Create test files in the S3 bucket with sample Cloudflare data

  • Verify IAM permissions and bucket access

  • Monitor S3 source polling and file processing

  • Confirm automatic JSON parsing and field extraction

  • Validate data flow through to downstream systems
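The first test step above can be sketched as below; the bucket and key are placeholders, the event fields are illustrative, and the upload only runs when RUN_LIVE is set.

```shell
# Sketch: create a gzipped sample log file and (optionally) upload it to the
# prefix the S3 source polls.
BUCKET="${BUCKET:-observo-cloudflare-logs}"
KEY="logs/cloudflare/test/cf-logs-sample.json.gz"

# One newline-delimited JSON event, gzip-compressed like real Logpush output.
printf '%s\n' '{"ClientIP":"203.0.113.7","EdgeResponseStatus":200}' \
  | gzip > /tmp/cf-logs-sample.json.gz

if [ -n "${RUN_LIVE:-}" ]; then
  aws s3 cp /tmp/cf-logs-sample.json.gz "s3://${BUCKET}/${KEY}"
fi

# Confirm the file round-trips as valid JSON.
gunzip -c /tmp/cf-logs-sample.json.gz
```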

Scenario Troubleshooting

HEC Integration Issues:

  • 401 Unauthorized: Verify HEC token validity and IP allowlist configuration

  • TLS Handshake Failures: Check certificate validity, CN/SAN matching, and TLS version compatibility

  • Connection Timeouts: Validate network connectivity, firewall rules, and load balancer configuration

  • High Latency: Optimize compression settings, review network path, and check processing capacity

S3 Integration Issues:

  • Access Denied: Verify IAM role permissions, bucket policies, and cross-account access

  • Files Not Processed: Check file patterns, path prefixes, and polling interval configuration

  • Parsing Errors: Validate JSON format, compression handling, and field extraction rules

  • Duplicate Processing: Ensure checkpointing is enabled and functioning correctly

Common Issues for Both Methods:

  • Log Format Mismatches: Use Cloudflare's field reference and test with sample data

  • Volume Overload: Implement rate limiting, batch processing optimization, and capacity scaling

  • Authentication Expiry: Establish token rotation procedures and monitoring

  • Network Security: Regularly update Cloudflare IP ranges and monitor for unauthorized access

Security Best Practices

Authentication and Access Control

  • Token Management: Rotate HEC tokens every 90 days minimum, use strong token generation

  • IP Restrictions: Maintain updated allowlists with official Cloudflare egress IP ranges

  • Multi-Factor Authentication: Enable MFA for all administrative access to logging configurations

  • Principle of Least Privilege: Grant minimal necessary permissions for service accounts

Transport Security

  • TLS Configuration: Use TLS 1.2 minimum, prefer TLS 1.3 for enhanced security

  • Certificate Management: Use trusted CA certificates, avoid self-signed certificates in production

  • Cipher Suites: Restrict to modern, secure cipher suites, disable legacy protocols

  • Certificate Monitoring: Implement automated certificate expiry monitoring and renewal
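The certificate-monitoring point above can be automated with a small openssl check. A sketch assuming GNU date and placeholder host, port, and threshold values:

```shell
# Sketch: warn when the HEC endpoint's certificate has under 30 days left.
# Requires GNU date (-d); host and port are placeholders.
HOST="${HOST:-observo.example.com}"
PORT="${PORT:-8088}"

days_until() {
  # Days from now until the given certificate notAfter date string.
  end_epoch=$(date -d "$1" +%s)
  echo $(( (end_epoch - $(date +%s)) / 86400 ))
}

if [ -n "${RUN_LIVE:-}" ]; then
  # Pull notAfter from the live certificate and compare against the threshold.
  end_date=$(echo | openssl s_client -connect "${HOST}:${PORT}" -servername "${HOST}" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  left=$(days_until "${end_date}")
  [ "${left}" -lt 30 ] && echo "WARN: certificate expires in ${left} days"
fi
```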

Network Security

  • Firewall Rules: Configure strict ingress rules allowing only necessary traffic

  • DDoS Protection: Implement rate limiting and DDoS mitigation at infrastructure level

  • Load Balancer Security: Use WAF rules and health checks for additional protection

  • Network Segmentation: Isolate logging infrastructure from other network segments

Data Protection

  • Encryption at Rest: Enable encryption for S3 buckets and local storage

  • Data Retention: Implement appropriate retention policies based on compliance requirements

  • Access Logging: Monitor and log all access to logging infrastructure and data

  • Data Masking: Apply data masking for sensitive information in log streams

Troubleshooting

If issues arise with the Cloudflare Integration in Observo AI, use the following comprehensive steps to diagnose and resolve them:

Configuration Validation

  • Endpoint Verification: Ensure all URLs, ports, and paths are correctly specified and accessible

  • Authentication Check: Verify tokens, certificates, and credentials are valid and not expired

  • Format Validation: Confirm log formats match expected JSON structure and field mappings

  • Network Connectivity: Test connectivity using tools like curl, telnet, or AWS CLI
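The connectivity checks above can be sketched as follows; the endpoint, port, and bucket are placeholders, and the /services/collector/health path follows the Splunk HEC convention (an assumption about how Observo AI exposes its health endpoint). Nothing runs unless RUN_LIVE is set.

```shell
# Sketch: basic reachability checks for both integration paths.
HEC_HOST="observo.example.com"
HEC_PORT="8088"
BUCKET="observo-cloudflare-logs"

if [ -n "${RUN_LIVE:-}" ]; then
  # TCP reachability, then TLS + HTTP on the HEC endpoint.
  nc -zv "${HEC_HOST}" "${HEC_PORT}"
  curl -sv "https://${HEC_HOST}:${HEC_PORT}/services/collector/health" -o /dev/null
  # IAM credentials and bucket access for the S3 path.
  aws s3 ls "s3://${BUCKET}/logs/cloudflare/"
fi
```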

Common Error Messages and Resolutions

| Error Message | Possible Cause | Resolution Steps |
| --- | --- | --- |
| "401 Unauthorized" | Invalid or expired HEC token | Verify token validity in the Cloudflare and Observo AI configuration |
| "403 Forbidden" | IP not in allowlist or insufficient permissions | Update the IP allowlist with current Cloudflare ranges; check IAM policies |
| "SSL Handshake Failed" | Certificate or TLS version mismatch | Verify certificate validity; check TLS version compatibility |
| "Connection Timeout" | Network connectivity or firewall issues | Test the network path, review firewall rules, check load balancer health |
| "Access Denied (S3)" | IAM permissions or bucket policy issue | Verify IAM role permissions and S3 bucket policies |
| "JSON Parse Error" | Log format mismatch or corruption | Validate sample logs against the expected JSON schema |
| "Rate Limit Exceeded" | Too many requests or large log volume | Adjust rate limits, implement backoff strategies, optimize batch sizes |
| "Certificate Expired" | TLS certificate has expired | Renew the certificate and update the configuration |

Monitoring and Validation

  • Log Volume Monitoring: Track expected vs. actual log volume to identify gaps

  • Error Rate Analysis: Monitor ingestion error rates and patterns

  • Latency Measurement: Measure end-to-end latency from Cloudflare to Observo AI

  • Data Quality Checks: Validate field extraction, timestamp parsing, and data completeness

Performance Optimization

  • Compression Settings: Optimize compression levels for bandwidth vs. CPU trade-offs

  • Batch Size Tuning: Adjust batch sizes for optimal throughput and latency

  • Polling Frequency: Balance polling frequency with API costs and latency requirements

  • Resource Scaling: Monitor CPU, memory, and network utilization for scaling decisions

Advanced Troubleshooting

  • Network Trace Analysis: Use packet capture tools to analyze network-level issues

  • Log Analysis: Enable debug logging for detailed troubleshooting information

  • Health Check Implementation: Implement comprehensive health checks for all components

  • Backup Procedures: Establish procedures for failover and data recovery scenarios
