AWS Security Lake
Writes events in the OCSF schema to Amazon Security Lake. Security Lake only ingests data that adheres to the OCSF schema and stores it in the Parquet file format. The OCSF version must align with the specifications detailed in the OCSF documentation; OCSF version 1.0.0-rc.2 and OCSF version 1.1.0 are supported.
Purpose
The purpose of the Observo AI Amazon Security Lake Destination is to enable the seamless transmission of security and telemetry data from Observo AI to Amazon Security Lake for centralized storage, analysis, and querying. It integrates with Amazon Security Lake to consolidate data in a standardized format (OCSF), facilitating advanced analytics, threat detection, and compliance monitoring using AWS services like Amazon Athena. This destination helps organizations gain actionable insights by leveraging Security Lake’s scalable data lake capabilities for security data management.
Prerequisites
Before configuring the Amazon Security Lake Destination in Observo AI, ensure the following requirements are met to facilitate seamless data ingestion:
Observo AI Platform Setup:
The Observo AI Site must be installed and available.
Verify that the platform can send data in formats compatible with Amazon Security Lake, such as JSON or Parquet, adhering to the Open Cybersecurity Schema Framework (OCSF).
Amazon Security Lake Setup:
Create an Amazon Security Lake account and enable the service in the desired AWS Region (Amazon Security Lake).
Configure a custom source in Security Lake to receive data from Observo AI (Custom Sources).
Note the AWS Region, Account ID, and External ID for the custom source configuration.
AWS IAM Role:
Create an IAM Role in AWS with permissions to write to Amazon Security Lake. The role must include the securitylake:UpdateDataLake permission (IAM Roles for Security Lake).
Ensure the role’s trust policy allows Observo AI to assume the role using the External ID noted from the custom source.
Record the Role ARN (Amazon Resource Name) for configuration.
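As an illustrative sketch, the role's trust policy might look like the following, built here in Python for clarity. The principal ARN and External ID below are placeholders; consult Observo AI for the exact principal that will assume the role.

```python
import json

# Placeholder values -- substitute your own.
OBSERVO_PRINCIPAL = "arn:aws:iam::111111111111:root"  # principal assuming the role (hypothetical)
EXTERNAL_ID = "addghvggf345dguufds"                   # External ID from the custom source setup

trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"AWS": OBSERVO_PRINCIPAL},
            "Action": "sts:AssumeRole",
            # The External ID condition prevents the confused-deputy problem.
            "Condition": {"StringEquals": {"sts:ExternalId": EXTERNAL_ID}},
        }
    ],
}

print(json.dumps(trust_policy, indent=2))
```

Attach this trust policy to the IAM Role alongside a permissions policy that grants securitylake:UpdateDataLake and write access to the Security Lake S3 bucket.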
Network and Connectivity:
Ensure Observo AI can communicate with Amazon Security Lake endpoints over HTTPS (port 443).
If using private endpoints, VPC configurations, or firewall rules, configure them to allow access to Security Lake APIs and S3 endpoints (AWS PrivateLink).
Observo AI Platform
| Component | Role | Action |
| --- | --- | --- |
| Observo AI Platform | Must support Security Lake formats | Verify compatibility with OCSF (JSON/Parquet) |
| Amazon Security Lake | Storage and analysis hub | Create a custom source; note Region, Account ID, External ID |
| AWS IAM Role | Grants write permissions | Include securitylake:UpdateDataLake; record Role ARN |
| Amazon S3 and Glue | Data storage and querying | Verify S3 bucket and Glue table setup |
| Network | HTTPS connectivity | Allow port 443; configure private endpoints if needed |
Integration
The Integration section outlines the configurations for the Observo AI Amazon Security Lake Destination. To configure Amazon Security Lake as a destination in Observo AI, follow these steps to set up and test the data flow:
Log in to Observo AI:
Navigate to the Destinations tab.
Click the "Add Destinations" button and select "Create New".
Choose "Amazon Security Lake" from the list of available destinations to begin configuration.
General Settings:
Name: Add a unique identifier such as security-lake-dest-1.
Description (Optional): Add a description for the destination.
S3 Bucket Name: The S3 bucket name. This must not include a leading s3:// or a trailing /.
Example: aws-security-data-lake-us-east-1-xxxxxxxxxxxxxxx
AWS Region: Specify the AWS Region for Security Lake such as us-east-1.
Key Prefix: The prefix applied to all object keys. It must follow the format shown in the example: use the custom source name configured when setting up AWS Security Lake, the AWS account ID from which the security data originates, the AWS Region that corresponds to that account, and a file name prefix of your choice. The event day must follow the fixed %Y%m%d format shown. Default: ext/<CUSTOM_SRC_NAME>/region=<AWS REGION>/accountId=<AWS ACCOUNT ID>/eventDay=%Y%m%d/<FILE NAME PREFIX>
Examples: ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=%Y%m%d/observo
ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=%Y%m%d/
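The default prefix layout can be rendered concretely with a short Python sketch. The custom source name, account ID, and file name prefix below are the example values from above, and the eventDay component is expanded with strftime:

```python
from datetime import datetime, timezone

def build_key_prefix(custom_src: str, region: str, account_id: str,
                     file_prefix: str, when: datetime) -> str:
    """Render the Security Lake key prefix, expanding eventDay as %Y%m%d."""
    event_day = when.strftime("%Y%m%d")
    return (f"ext/{custom_src}/region={region}/accountId={account_id}/"
            f"eventDay={event_day}/{file_prefix}")

prefix = build_key_prefix("PaloAlto", "us-east-1", "1234455555",
                          "observo", datetime(2025, 1, 15, tzinfo=timezone.utc))
print(prefix)
# ext/PaloAlto/region=us-east-1/accountId=1234455555/eventDay=20250115/observo
```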
Authentication:
Auth Access Key Id (Optional): The AWS access key ID. Example: AKIAIOSFODNN7EXAMPLE
Auth Secret Access Key (Optional): The AWS secret access key. Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Provider Assume Role ARN: The ARN of the role to assume. The account ID refers to the account where AWS Security Lake is configured. This field is required, along with an External ID.
Example: arn:aws:iam::123456789098:role/my_role
External ID: The external ID to use when assuming the role. You receive this value when configuring the custom source for AWS Security Lake.
Example: addghvggf345dguufds
Auth Region (Optional): The AWS Region to send STS requests to. Defaults to the Region configured for the service itself.
Example: us-west-2
Auth Load Timeout Secs (Optional): Timeout, in seconds, for successfully loading any credentials. Relevant when the default credentials chain or assume_role is used.
Example: 30
Auth IMDS Connect Timeout Seconds (Optional): Connect timeout for IMDS.
Auth IMDS Max Attempts (Optional): Number of IMDS retries for fetching tokens and metadata.
Auth IMDS Read Timeout Seconds (Optional): Read timeout for IMDS.
Parquet Settings:
Encoding Codec: The codec used for encoding events. Parquet is required for Amazon Security Lake. Default: Parquet.
Parquet Schema: The Parquet schema used for encoding. The schema must adhere to the same OCSF version as AWS Security Lake, as described in the OCSF documentation.
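As an illustrative pre-flight check (not part of the product), you could verify that events carry a few core OCSF base attributes before encoding. The required-field list here is a simplified assumption drawn from common OCSF base attributes, not the full OCSF 1.1.0 schema:

```python
# Simplified sketch: check that an event carries a few core OCSF base
# attributes before it is encoded to Parquet. The field list below is
# an illustrative assumption, not the complete OCSF schema.
REQUIRED_OCSF_FIELDS = {"class_uid", "category_uid", "time", "metadata"}

def missing_ocsf_fields(event: dict) -> set:
    """Return the required OCSF fields absent from the event, if any."""
    return REQUIRED_OCSF_FIELDS - event.keys()

event = {
    "class_uid": 4001,      # e.g. Network Activity
    "category_uid": 4,      # Network Activity category
    "time": 1658176486000,  # event time in epoch milliseconds
    "metadata": {"version": "1.1.0"},
}
print(missing_ocsf_fields(event))  # set() -> event passes the check
```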
Request Configuration (Optional):
Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive Concurrency.
Options:
A fixed concurrency of 1
Adaptive concurrency
Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1
Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window.
Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries.
Request Retry Initial Backoff Secs: The amount of time, in seconds, to wait before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1
Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600
Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60
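A sketch of how the retry delays grow, assuming the Fibonacci progression starts from the configured initial backoff and is capped at the maximum retry duration (function and parameter names are illustrative, not the product's internals):

```python
def retry_backoffs(initial: int = 1, max_duration: int = 3600, attempts: int = 10):
    """Yield successive retry delays: a Fibonacci progression capped at max_duration."""
    a, b = initial, initial
    for _ in range(attempts):
        yield min(a, max_duration)
        a, b = b, a + b

print(list(retry_backoffs(attempts=8)))
# [1, 1, 2, 3, 5, 8, 13, 21]
```

With the defaults above, delays grow until they hit the 3600-second cap, after which every retry waits the maximum duration.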
Batching Configuration (Optional):
Batch Max Bytes: The maximum size of a batch before it is processed, based on the uncompressed size of the batched events before they are serialized and compressed. Files should be sent to Security Lake in increments of between 5 minutes and one event day; files larger than 256 MB may be sent more often than every 5 minutes. These object size and timing requirements optimize Security Lake query performance, and not following them may impact your Security Lake performance. Default: 100000000
Batch Max Events: The maximum size of a batch before it is flushed. Default: 1000
Batch Timeout Secs: The maximum age of a batch before it is flushed. Batches sent to Security Lake should be flushed every 5 minutes to 1 day; files over 256 MB can be flushed more often than every 5 minutes, while files under 256 MB may have their intervals adjusted between 5 minutes and 1 day. Adhering to these size and timing guidelines is crucial for maintaining Security Lake's query performance. Default: 300
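The interplay of the three batch limits can be sketched as a simple flush decision: a batch is flushed as soon as any one limit is reached (the function and constant names below are illustrative, not the product's internals):

```python
# Illustrative defaults matching the settings described above.
BATCH_MAX_BYTES = 100_000_000
BATCH_MAX_EVENTS = 1000
BATCH_TIMEOUT_SECS = 300

def should_flush(batch_bytes: int, batch_events: int, batch_age_secs: float) -> bool:
    """Flush when any of the three configured limits is reached."""
    return (batch_bytes >= BATCH_MAX_BYTES
            or batch_events >= BATCH_MAX_EVENTS
            or batch_age_secs >= BATCH_TIMEOUT_SECS)

print(should_flush(10_000, 50, 30.0))    # False: no limit reached yet
print(should_flush(10_000, 1000, 30.0))  # True: event-count limit reached
```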
Acknowledgement (Optional):
Acknowledgements Enabled (Default: False): Whether end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements waits for events to be acknowledged by the destination before acknowledging them at the source.
TLS Configurations (Optional):
TLS CA File: The CA certificate provided as an inline string in PEM format.
TLS Crt File: The certificate as a string in PEM format.
TLS Key: The key provided as a string in PEM format.
TLS Verify Certificate (Default: False): Enables certificate verification. Certificates must be valid: not expired and issued by a trusted issuer. Verification operates hierarchically, checking the validity of the certificate, then its issuer, and so on until a root certificate is reached. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.
TLS Verify Hostname (Default: False): Enables hostname verification. The hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.
Advanced Settings (Optional):
Object Acl: Canned ACL to apply to the created objects. Default: Bucket/object Owner All Access.
Options:
Authenticated Users Read access
EC2 readable
FULL_CONTROL for object and bucket owner
Read only for bucket owner
Logs Writeable Bucket
Bucket/Object Owner All Access
AllUsers readable
AllUsers Read Write
Compression: Compression configuration. All compression algorithms use the default compression level. Some cloud storage APIs and browsers will handle decompression, so files may not appear to be compressed. Default: Gzip Compression.
Options:
Gzip compression
Zstd compression
No compression
Filename Extension: The filename extension to use in the object key. This overrides setting the extension based on the configured compression. Default: gz.parquet
Example: parquet
Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with timestamp (in epoch seconds) reflecting when the objects are sent to S3. The resulting object key is the key prefix followed by the formatted timestamp such as date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s
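Putting the key prefix, timestamp format, and filename extension together, the final object key can be sketched as follows. The %s epoch format is expanded manually, since strftime("%s") is not portable across platforms; the function name is illustrative:

```python
from datetime import datetime, timezone

def object_key(prefix: str, when: datetime,
               time_format: str = "%s", extension: str = "gz.parquet") -> str:
    """Append the formatted timestamp and extension to the key prefix."""
    if time_format == "%s":
        # Epoch seconds; strftime("%s") is not portable, so compute it directly.
        stamp = str(int(when.timestamp()))
    else:
        stamp = when.strftime(time_format)
    return f"{prefix}{stamp}.{extension}"

key = object_key("date=2022-07-18/", datetime.fromtimestamp(1658176486, tz=timezone.utc))
print(key)  # date=2022-07-18/1658176486.gz.parquet
```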
Server Side Encryption: The Server-side Encryption algorithm used when storing these objects.
Options:
AES-256 Encryption (SSE-S3)
AES-256 Encryption managed by AWS KMS (SSE-KMS / SSE-C)
Ssekms Key Id: Specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer master key (CMK) used for the created objects. Only applies when server_side_encryption is configured to use KMS. If not specified, Amazon S3 uses the AWS managed CMK to protect the data.
Example: abcd1234
Storage Class: The S3 Storage Class for the created objects. Default: Standard Redundancy
Options:
Glacier Deep Archive
Glacier Flexible Retrieval
Intelligent Tiering
Infrequently Accessed (Single Availability Zone)
Reduced Redundancy
Standard Redundancy
Infrequently Accessed
Tags (Add): A list of tag key-value pairs
Save and Test Configuration:
Save the configuration settings.
Send sample data and verify that it reaches the specified Security Lake S3 bucket and is queryable via Athena.
Example Scenarios
CyberSafe Inc., a fictitious organization, wants to integrate Observo with Amazon Security Lake to centralize security telemetry data for analysis. They have set up Amazon Security Lake in the us-west-2 region, created a custom source named observo-custom-source, and configured an IAM Role for Observo to write data. The data will be stored in an S3 bucket managed by Security Lake, and they will use the OCSF schema (version 1.1.0) for compatibility.
Standard Amazon Security Lake Destination Setup
Here is a standard Amazon Security Lake Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:
General Settings

| Field | Value | Description |
| --- | --- | --- |
| Name | security-lake-cybersafe-1 | Unique identifier for the destination. |
| Description | Centralizes CyberSafe Inc.'s security telemetry data in Amazon Security Lake for threat detection and compliance. | Provides context for the destination's purpose. |
| S3 Bucket Name | aws-security-lake-cybersafe-us-west-2 | S3 bucket name managed by Security Lake, without s3:// or trailing /. |
| AWS Region | us-west-2 | AWS Region where Security Lake is enabled. |
| Key Prefix | ext/observo-custom-source/region=us-west-2/accountId=123456789012/eventDay=%Y%m%d/cybersafe-data | Follows the format ext/<CUSTOM_SRC_NAME>/region=<AWS REGION>/accountId=<AWS ACCOUNT ID>/eventDay=%Y%m%d/<FILE NAME PREFIX>, using custom source observo-custom-source, account ID 123456789012, and prefix cybersafe-data. |

Authentication

| Field | Value | Description |
| --- | --- | --- |
| Provider Assume Role ARN | arn:aws:iam::123456789012:role/ObservoSecurityLakeRole | ARN of the IAM Role that allows Observo to write to Security Lake, with account ID 123456789012. |
| External ID | cybersafe-observo-2025-uuid-1234 | External ID from the custom source setup in Security Lake for secure role assumption. |
| Auth Region | us-west-2 | AWS Region for sending STS requests, matching the Security Lake Region. |

Parquet Settings

| Field | Value | Description |
| --- | --- | --- |
| Encoding Codec | Parquet | Specifies Parquet as the encoding format, required for Security Lake. |
| Parquet Schema | ocsf-1.1.0 | OCSF schema version 1.1.0, ensuring compatibility with Security Lake's data format. |
Save and Test:
Save settings, send sample data, verify ingestion in S3 bucket, query via Athena.
Saves configuration, tests data flow to aws-security-lake-cybersafe-us-west-2, and confirms OCSF compliance using Athena.
Notes:
Ensure the IAM Role ObservoSecurityLakeRole has securitylake:UpdateDataLake permission and a trust policy allowing role assumption with the External ID.
Verify HTTPS connectivity (port 443) to Security Lake and S3 endpoints.
Monitor Observo AI’s Notifications tab and CloudWatch Logs for errors.
Troubleshooting
If issues arise with the Amazon Security Lake Destination in Observo AI, use the following steps to diagnose and resolve them:
Verify Configuration Settings:
Ensure fields like AWS Region, Account ID, External ID, Role ARN, and Custom Source Name match the Security Lake setup.
Confirm that the data format (JSON or Parquet) aligns with OCSF standards.
Check Authentication:
Verify that the Role ARN and External ID are correct and that the IAM Role has the securitylake:UpdateDataLake permission.
Ensure the trust policy allows Observo AI to assume the role (IAM Role Trust Policy).
Monitor Logs:
Check Observo AI’s Notifications tab for errors or warnings related to data transmission.
Use Amazon CloudWatch Logs or Security Lake’s S3 bucket to confirm data arrival (Query Security Lake).
Validate Data Format and Schema:
Ensure data fields align with the OCSF schema to prevent dropped events.
Verify that the Custom Source Name matches the Security Lake configuration.
Network and Connectivity:
Ensure Observo AI can reach Security Lake and S3 endpoints over HTTPS (port 443).
If using private endpoints, verify their configuration (AWS PrivateLink).
Common Error Messages:
"Access Denied": Indicates invalid IAM Role permissions or incorrect Role ARN/External ID. Verify role permissions and trust policy.
"No data ingested": Confirm data is being sent and matches the OCSF schema. Check the Custom Source Name and S3 bucket.
"Invalid schema": Ensure data fields comply with OCSF standards for the custom source.
"Connectivity issues": Verify HTTPS access to Security Lake and S3 endpoints.
Test Data Flow:
Send sample data and verify ingestion in the Security Lake S3 bucket.
Use the Analytics tab in the targeted Observo AI pipeline to monitor data volume and ensure expected throughput.
| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not ingested | Incorrect Custom Source Name or schema | Verify configuration and OCSF compliance |
| Access Denied | Invalid Role ARN or permissions | Check IAM Role and trust policy |
| Invalid schema | Non-compliant OCSF data | Align data fields with OCSF schema |
| Connectivity issues | Firewall or private endpoint issues | Allow HTTPS on port 443; verify endpoints |
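For connectivity checks, one way to pin down which hostnames must be reachable on port 443 is to enumerate the standard AWS regional endpoint formats (the patterns below follow AWS's usual service.region.amazonaws.com convention; verify the exact endpoints for your Region in the AWS documentation):

```python
def regional_endpoints(region: str) -> list:
    """AWS regional endpoint hostnames that should be reachable on port 443,
    following the standard service.region.amazonaws.com naming convention."""
    return [
        f"s3.{region}.amazonaws.com",
        f"securitylake.{region}.amazonaws.com",
        f"sts.{region}.amazonaws.com",
    ]

for host in regional_endpoints("us-east-1"):
    print(host)
```

Each of these hostnames should be resolvable and reachable over HTTPS from the Observo AI Site; with PrivateLink, the corresponding interface endpoints must cover the same services.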
Resources
For additional guidance and detailed information, refer to the following resources:
Best Practices:
For Security Lake-specific configurations, ensure you have the necessary IAM permissions and follow the principle of least privilege, as recommended in the Security Lake security best practices.
If you encounter issues accessing these resources or need further assistance, check the AWS Management Console or contact AWS Support for Region-specific documentation or updates.
Always verify the AWS Region you are operating in, as some configurations such as S3 bucket setup or PrivateLink endpoints are region-specific.