AWS Security Lake

Writes events to Amazon Security Lake in the OCSF schema. Security Lake ingests only data that adheres to the OCSF schema and stores it in the Parquet file format. The OCSF version must match one of the versions detailed in the OCSF documentation: OCSF 1.0.0-rc.2 or OCSF 1.1.0.

Purpose

The purpose of the Observo AI Amazon Security Lake Destination is to enable the seamless transmission of security and telemetry data from Observo AI to Amazon Security Lake for centralized storage, analysis, and querying. It integrates with Amazon Security Lake to consolidate data in a standardized format (OCSF), facilitating advanced analytics, threat detection, and compliance monitoring using AWS services like Amazon Athena. This destination helps organizations gain actionable insights by leveraging Security Lake’s scalable data lake capabilities for security data management.

Prerequisites

Before configuring the Amazon Security Lake Destination in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:

  • Observo AI Platform Setup:

    • The Observo AI Site must be installed and available.

    • Verify that the platform can send data in formats compatible with Amazon Security Lake, such as JSON or Parquet, adhering to the Open Cybersecurity Schema Framework (OCSF).

  • Amazon Security Lake Setup:

    • Create an Amazon Security Lake account and enable the service in the desired AWS Region (Amazon Security Lake).

    • Configure a custom source in Security Lake to receive data from Observo AI (Custom Sources).

    • Note the AWS Region, Account ID, and External ID for the custom source configuration.

  • AWS IAM Role:

    • Create an IAM Role in AWS with permissions to write to Amazon Security Lake. The role must include the securitylake:UpdateDataLake permission (IAM Roles for Security Lake).

    • Ensure the role’s trust policy allows Observo AI to assume the role using the External ID noted from the custom source.

    • Record the Role ARN (Amazon Resource Name) for configuration.

  • Amazon S3 and Glue:

    • Ensure an S3 bucket is configured in Security Lake for data storage. This is typically managed automatically by Security Lake, but verify that it is accessible (Amazon S3).

    • Verify that AWS Glue tables are set up for querying Security Lake data via Amazon Athena (AWS Glue).

  • Network and Connectivity:

    • Ensure Observo AI can communicate with Amazon Security Lake endpoints over HTTPS (port 443).

    • If using private endpoints, VPC configurations, or firewall rules, configure them to allow access to Security Lake APIs and S3 endpoints (AWS PrivateLink).
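The trust policy on the IAM Role must allow Observo AI to assume it only when the External ID matches. Below is a minimal sketch of such a policy, built in Python with hypothetical values; replace the principal ARN and External ID with those from your own setup.

```python
import json

# Hypothetical values -- substitute your own: the principal Observo AI
# assumes roles from, and the External ID noted from the Security Lake
# custom source configuration.
OBSERVO_PRINCIPAL_ARN = "arn:aws:iam::111122223333:root"
EXTERNAL_ID = "cybersafe-observo-2025-uuid-1234"

# Trust policy permitting sts:AssumeRole only when the caller presents
# the matching External ID.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": OBSERVO_PRINCIPAL_ARN},
            "Action": "sts:AssumeRole",
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Attach this trust policy to the role whose ARN you record for the configuration below.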

| Prerequisite | Description | Notes |
| --- | --- | --- |
| Observo AI Platform | Must support Security Lake formats | Verify compatibility with OCSF (JSON/Parquet) |
| Amazon Security Lake | Storage and analysis hub | Create custom source; note Region, Account ID, External ID |
| AWS IAM Role | Grants write permissions | Include securitylake:UpdateDataLake; record Role ARN |
| Amazon S3 and Glue | Data storage and querying | Verify S3 bucket and Glue table setup |
| Network | HTTPS connectivity | Allow port 443; configure private endpoints if needed |

Integration

This section outlines the configuration of the Observo AI Amazon Security Lake Destination. To configure Amazon Security Lake as a destination in Observo AI, follow these steps to set up and test the data flow:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the "Add Destinations" button and select "Create New".

    • Choose "Amazon Security Lake" from the list of available destinations to begin configuration.

  2. General Settings:

    • Name: Add a unique identifier such as security-lake-dest-1.

    • Description (Optional): Add a description for the destination.

    • S3 Bucket Name: The S3 bucket name. This must not include a leading s3:// or a trailing /.

      Examples

      aws-security-data-lake-us-east-1-xxxxxxxxxxxxxxx

    • AWS Region: Specify the AWS Region for Security Lake such as us-east-1.

    • Key Prefix: The prefix applied to all object keys. It must follow the format shown in the example: use the custom source name configured when setting up AWS Security Lake, the AWS account ID from which the security data originates, the AWS Region corresponding to that account ID, and a file name prefix of your choice. The event day must use the fixed %Y%m%d format shown. Default: ext/<CUSTOM_SRC_NAME>/region=<AWS REGION>/accountId=<AWS ACCOUNT ID>/eventDay=%Y%m%d/<FILE NAME PREFIX>

      Examples

      ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=%Y%m%d/observo

      ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=%Y%m%d/
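The eventDay component of the key prefix is a strftime pattern that resolves per event day. The following Python sketch shows how such a prefix could be rendered for a given day, using the example values above (the helper name and values are illustrative, not Observo AI internals):

```python
from datetime import datetime, timezone

def build_key_prefix(custom_src, region, account_id, file_prefix, when):
    """Render the Security Lake key prefix for a given event day."""
    return (
        f"ext/{custom_src}/region={region}/accountId={account_id}/"
        f"eventDay={when.strftime('%Y%m%d')}/{file_prefix}"
    )

prefix = build_key_prefix(
    "PaloAlto", "us-east-1", "1234455555", "observo",
    datetime(2025, 1, 15, tzinfo=timezone.utc),
)
print(prefix)
# -> ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=20250115/observo
```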

  3. Authentication:

    • Auth Access Key Id (Optional): The AWS access key ID. Example: AKIAIOSFODNN7EXAMPLE

    • Auth Secret Access Key (Optional): The AWS secret access key. Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

    • Provider Assume Role ARN: The ARN of the role that Observo AI assumes. The account ID in the ARN refers to the account where AWS Security Lake is configured. This field is required, along with an External ID.

      Example

      arn:aws:iam::123456789098:role/my_role

    • External ID: External ID to use when assuming a role. This value is generated when configuring the custom source for AWS Security Lake.

      Example

      addghvggf345dguufds

    • Auth Region (Optional): The AWS region to send STS requests to. Defaults to the region configured for the service itself.

      Example

      us-west-2

    • Auth Load Timeout Secs (Optional): Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.

      Example

      30

    • Auth IMDS Connect Timeout Seconds (Optional): Connect timeout for IMDS, in seconds.

    • Auth IMDS Max Attempts (Optional): Number of IMDS retries for fetching tokens and metadata.

    • Auth IMDS Read Timeout Seconds (Optional): Read timeout for IMDS, in seconds.
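The authentication settings above amount to an sts:AssumeRole call parameterized by the Role ARN and External ID. The sketch below validates the ARN format and assembles those parameters; the helper and regex are illustrative assumptions, not Observo AI's actual internals.

```python
import re

# Loose pattern for an IAM role ARN: 12-digit account ID, then a role path.
ROLE_ARN_RE = re.compile(r"^arn:aws:iam::\d{12}:role/[\w+=,.@/-]+$")

def assume_role_params(role_arn, external_id, session_name="observo-security-lake"):
    """Build sts:AssumeRole parameters; raise early on a malformed ARN."""
    if not ROLE_ARN_RE.match(role_arn):
        raise ValueError(f"malformed role ARN: {role_arn}")
    return {
        "RoleArn": role_arn,
        "RoleSessionName": session_name,
        "ExternalId": external_id,
    }

params = assume_role_params(
    "arn:aws:iam::123456789098:role/my_role", "addghvggf345dguufds"
)
print(params["RoleArn"])
```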

  4. Parquet Settings:

    • Encoding Codec: The codec to use for encoding events. Parquet is required for Amazon Security Lake. Default: Parquet.

    • Parquet Schema: Enter the Parquet schema used for encoding. The schema must adhere to the same OCSF version that AWS Security Lake is configured for, as described in the OCSF documentation.
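As a rough illustration only, a Parquet schema is commonly expressed in the Parquet message definition syntax. The fragment below names a few core OCSF attributes (time, class_uid, activity_id, severity_id); the real schema must cover every field of the OCSF class and version you send, so treat this as a hypothetical sketch and consult the OCSF documentation for the complete, version-matched definition.

```python
# Hypothetical fragment of a Parquet schema for OCSF events, expressed
# in the Parquet "message" definition syntax. Field names are core OCSF
# attributes, but the full schema depends on your OCSF class and version.
PARQUET_SCHEMA = """
message ocsf_event {
  required int64 time;
  required int32 class_uid;
  required int32 activity_id;
  optional int32 severity_id;
  optional binary message (STRING);
}
"""
print(PARQUET_SCHEMA.strip())
```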

  5. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive Concurrency.

      Options

      A fixed concurrency of 1

      Adaptive concurrency

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries.

    • Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60
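To illustrate the retry schedule described above: successive waits grow along the Fibonacci sequence from the initial backoff and are capped by the maximum retry duration. The following is a sketch using the documented defaults, not the pipeline's exact implementation.

```python
def retry_backoffs(initial=1, max_duration=3600, attempts=10):
    """Successive retry waits: Fibonacci growth from the initial backoff,
    capped at the maximum retry duration (the documented defaults)."""
    waits, a, b = [], initial, initial
    for _ in range(attempts):
        waits.append(min(a, max_duration))
        a, b = b, a + b
    return waits

print(retry_backoffs(attempts=8))
# -> [1, 1, 2, 3, 5, 8, 13, 21]
```

After enough failures the wait saturates at Request Retry Max Duration Secs (3600 by default).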

  6. Batching Configuration (Optional):

    • Batch Max Bytes: The maximum size of a batch that will be processed, based on the uncompressed size of the batched events before they are serialized/compressed. Files sent to Security Lake should be sent in increments of between 5 minutes and 1 event day; files larger than 256 MB may be sent more often than every 5 minutes. These object size and timing requirements optimize Security Lake for query performance, and not following the custom source requirements may degrade it. Default: 100000000

    • Batch Max Events: The maximum size of a batch before it is flushed. Default: 1000

    • Batch Timeout Secs: The maximum age of a batch before it is flushed. Batches sent to Security Lake should be flushed every 5 minutes to 1 day; files over 256 MB can be flushed more often than every 5 minutes. Adhering to these size and timing guidelines is crucial for maintaining Security Lake's query performance. Default: 300
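The flush guidance above can be expressed as a simple check: intervals must stay between 5 minutes and 1 day, and sub-5-minute flushes are acceptable only when objects exceed 256 MB. An illustrative sketch (the function is an assumption, not part of the product):

```python
def check_batch_settings(max_bytes, timeout_secs):
    """Flag batch settings that fall outside the Security Lake
    custom-source guidance."""
    issues = []
    if timeout_secs > 86400:  # 1 event day
        issues.append("flush interval exceeds 1 day")
    if timeout_secs < 300 and max_bytes <= 256 * 1024 * 1024:
        issues.append("sub-5-minute flushes require objects over 256 MB")
    return issues

# Documented defaults: 100 MB batches flushed every 300 seconds -> OK.
print(check_batch_settings(100_000_000, 300))
# -> []
```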

  7. Acknowledgement (False):

    • Acknowledgements Enabled (False): Whether end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the destination before acknowledging them at the source.

  8. TLS Configurations (Optional):

    • TLS CA File: The CA certificate provided as an inline string in PEM format.

    • TLS Crt File: The certificate as a string in PEM format.

    • TLS Key: The key provided as a string in PEM format.

    • TLS Verify Certificate: (False) Enables certificate verification. Certificates must be valid: not expired and issued by a trusted issuer. Verification operates hierarchically, checking the validity of the certificate, then the issuer of that certificate, and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying certificate validity.

    • TLS Verify Hostname: (False) Enables hostname verification. The hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.

  9. Advanced Settings (Optional):

    • Object Acl: Canned ACL to apply to the created objects. Default: Bucket/object Owner All Access.

      Options

      Authenticated Users Read access

      EC2 readable

      FULL_CONTROL for object and bucket owner

      Read only for bucket owner

      Logs Writeable Bucket

      Bucket/Object Owner All Access

      AllUsers readable

      AllUsers Read Write

    • Compression: Compression configuration. All compression algorithms use the default compression level. Some cloud storage APIs and browsers will handle decompression, so files may not appear to be compressed. Default: Gzip Compression.

      Options

      Gzip compression

      Zstd compression

      No compression

    • Filename Extension: The filename extension to use in the object key. This overrides the extension derived from the configured compression. Default: gz.parquet

      Example

      parquet

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to S3. The resulting object key is the key prefix followed by the formatted timestamp, such as date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s

    • Server Side Encryption: The Server-side Encryption algorithm used when storing these objects.

      Options

      AES-256 Encryption (SSE-S3)

      AES-256 Encryption managed by AWS KMS (SSE-KMS / SSE-C)

    • Ssekms Key Id: Specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer master key (CMK) used for the created objects. Only applies when server_side_encryption is configured to use KMS. If not specified, Amazon S3 uses the AWS managed CMK to protect the data.

      Example

      abcd1234

    • Storage Class: The S3 Storage Class for the created objects. Default: Standard Redundancy

      Options

      Glacier Deep Archive

      Glacier Flexible Retrieval

      Intelligent Tiering

      Infrequently Accessed (Single Availability Zone)

      Reduced Redundancy

      Standard Redundancy

      Infrequently Accessed

    • Tags (Add): A list of tag key-value pairs
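Putting the advanced settings together: the final object key is, roughly, the rendered key prefix followed by the time component (epoch seconds under the default %s format) and the filename extension. The sketch below is simplified; real sinks may also insert a UUID before the extension, and the helper is an illustration, not the product's internals.

```python
import time

def object_key(key_prefix, extension="gz.parquet", now=None):
    """Append the default time component (epoch seconds) and the
    filename extension to an already-rendered key prefix."""
    ts = int(now if now is not None else time.time())
    return f"{key_prefix}{ts}.{extension}"

key = object_key(
    "ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=20250115/",
    now=1658176486,
)
print(key)
# -> ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=20250115/1658176486.gz.parquet
```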

  10. Save and Test Configuration:

    • Save the configuration settings.

    • Send sample data and verify that it reaches the specified Security Lake S3 bucket and is queryable via Athena.

Example Scenarios

CyberSafe Inc., a fictitious organization, wants to integrate Observo with Amazon Security Lake to centralize security telemetry data for analysis. They have set up Amazon Security Lake in the us-west-2 region, created a custom source named observo-custom-source, and configured an IAM Role for Observo to write data. The data will be stored in an S3 bucket managed by Security Lake, and they will use the OCSF schema (version 1.1.0) for compatibility.

Standard Amazon Security Lake Destination Setup

Here is a standard Amazon Security Lake Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

| Field | Value | Description |
| --- | --- | --- |
| Name | security-lake-cybersafe-1 | Unique identifier for the destination. |
| Description | Centralizes CyberSafe Inc.'s security telemetry data in Amazon Security Lake for threat detection and compliance. | Provides context for the destination's purpose. |
| S3 Bucket Name | aws-security-lake-cybersafe-us-west-2 | S3 bucket name managed by Security Lake, without s3:// or trailing /. |
| AWS Region | us-west-2 | AWS region where Security Lake is enabled. |
| Key Prefix | ext/observo-custom-source/region=us-west-2/accountId=123456789012/eventDay=%Y%m%d/cybersafe-data | Follows the format ext/<CUSTOM_SRC_NAME>/region=<AWS REGION>/accountId=<AWS ACCOUNT ID>/eventDay=%Y%m%d/<FILE NAME PREFIX>, using custom source observo-custom-source, account ID 123456789012, and prefix cybersafe-data. |

Authentication

| Field | Value | Description |
| --- | --- | --- |
| Provider Assume Role ARN | arn:aws:iam::123456789012:role/ObservoSecurityLakeRole | ARN of the IAM Role that allows Observo to write to Security Lake, with account ID 123456789012. |
| External ID | cybersafe-observo-2025-uuid-1234 | External ID from the custom source setup in Security Lake, used for secure role assumption. |
| Auth Region | us-west-2 | AWS region for sending STS requests, matching the Security Lake region. |

Parquet Settings

| Field | Value | Description |
| --- | --- | --- |
| Encoding Codec | Parquet | Specifies Parquet as the encoding format, required for Security Lake. |
| Parquet Schema | ocsf-1.1.0 | OCSF schema version 1.1.0, ensuring compatibility with Security Lake's data format. |

Save and Test:

  • Save the configuration, send sample data, and verify ingestion in the aws-security-lake-cybersafe-us-west-2 bucket.

  • Query the data via Athena to confirm OCSF compliance.

Notes:

  • Ensure the IAM Role ObservoSecurityLakeRole has securitylake:UpdateDataLake permission and a trust policy allowing role assumption with the External ID.

  • Verify HTTPS connectivity (port 443) to Security Lake and S3 endpoints.

  • Monitor Observo AI’s Notifications tab and CloudWatch Logs for errors.

Troubleshooting

If issues arise with the Amazon Security Lake Destination in Observo AI, use the following steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Ensure fields like AWS Region, Account ID, External ID, Role ARN, and Custom Source Name match the Security Lake setup.

    • Confirm that the data format (JSON or Parquet) aligns with OCSF standards.

  • Check Authentication:

    • Verify that the Role ARN and External ID are correct and that the IAM Role has the securitylake:UpdateDataLake permission.

    • Ensure the trust policy allows Observo AI to assume the role (IAM Role Trust Policy).

  • Monitor Logs:

    • Check Observo AI’s Notifications tab for errors or warnings related to data transmission.

    • Use Amazon CloudWatch Logs or Security Lake’s S3 bucket to confirm data arrival (Query Security Lake).

  • Validate Data Format and Schema:

    • Ensure data fields align with the OCSF schema to prevent dropped events.

    • Verify that the Custom Source Name matches the Security Lake configuration.

  • Network and Connectivity:

    • Ensure Observo AI can reach Security Lake and S3 endpoints over HTTPS (port 443).

    • If using private endpoints, verify their configuration (AWS PrivateLink).

  • Common Error Messages:

    • "Access Denied": Indicates invalid IAM Role permissions or incorrect Role ARN/External ID. Verify role permissions and trust policy.

    • "No data ingested": Confirm data is being sent and matches the OCSF schema. Check the Custom Source Name and S3 bucket.

    • "Invalid schema": Ensure data fields comply with OCSF standards for the custom source.

    • "Connectivity issues": Verify HTTPS access to Security Lake and S3 endpoints.

  • Test Data Flow:

    • Send sample data and verify ingestion in the Security Lake S3 bucket.

    • Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.
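A quick way to catch key-prefix misconfigurations, one of the most common causes of "No data ingested", is to check the configured value against the documented layout. The regex below is an illustrative assumption, not an official validator:

```python
import re

# Pattern for the documented key prefix layout:
# ext/<CUSTOM_SRC_NAME>/region=<REGION>/accountId=<ACCOUNT_ID>/eventDay=%Y%m%d/...
KEY_PREFIX_RE = re.compile(
    r"^ext/[^/]+/region=[a-z0-9-]+/accountId=\d+/eventDay=%Y%m%d/"
)

def prefix_matches_format(prefix):
    """Sanity-check that a configured key prefix follows the required
    Security Lake custom-source layout."""
    return bool(KEY_PREFIX_RE.match(prefix))

print(prefix_matches_format(
    "ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=%Y%m%d/observo"
))  # -> True
```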

| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not ingested | Incorrect Custom Source Name or schema | Verify configuration and OCSF compliance |
| Access Denied | Invalid Role ARN or permissions | Check IAM Role and trust policy |
| Invalid schema | Non-compliant OCSF data | Align data fields with OCSF schema |
| Connectivity issues | Firewall or private endpoint issues | Allow HTTPS on port 443, verify endpoints |

Resources

For additional guidance and detailed information, refer to the following resources:

Best Practices:

  • For Security Lake-specific configurations, ensure you have the necessary IAM permissions and follow the principle of least privilege, as recommended in the Security Lake security best practices.

  • If you encounter issues accessing these URLs or need further assistance, you can check the AWS Management Console or contact AWS Support for region-specific documentation or updates.

  • Always verify the AWS Region you are operating in, as some configurations such as S3 bucket setup or PrivateLink endpoints are region-specific.
