AWS S3 Archival
The Observo AI AWS S3 Archival destination enables cost-effective, secure, and scalable long-term storage of observability and security data in Amazon S3. It supports formats such as JSON, CSV, and Parquet, with customizable storage classes and lifecycle policies for optimized data retention and compliance.
Purpose
The AWS S3 Archival Destination in Observo AI enables the secure and efficient storage of event data, logs, and metrics in Amazon S3, supporting long-term retention and compliance requirements. It facilitates data archival in formats like Parquet, with robust AWS authentication, server-side encryption, and structured key prefixes for organized storage. This destination is ideal for organizations needing cost-effective, scalable, and compliant archival solutions, such as for audit logs or transactional data. It integrates seamlessly with Observo AI’s data pipelines to ensure reliable data transfer and accessibility for analytics or regulatory purposes.
Permissions Required for Archival and Hydration
The following IAM policy grants the permissions needed for archival and hydration (replace <bucket-name> with your bucket):
{
"Version": "2012-10-17",
"Statement": [
{
"Sid": "ListBucket",
"Effect": "Allow",
"Action": "s3:ListBucket",
"Resource": "arn:aws:s3:::<bucket-name>"
},
{
"Sid": "ReadWriteObjects",
"Effect": "Allow",
"Action": [
"s3:GetObject",
"s3:PutObject",
"s3:PutObjectAcl"
],
"Resource": "arn:aws:s3:::<bucket-name>/*"
}
]
}
Prerequisites
Before configuring the AWS S3 Archival Destination in Observo AI, ensure the following requirements are met to facilitate seamless data archival:
Observo AI Platform Setup:
The Observo AI platform must be installed, operational, and configured to support AWS S3 as an archival destination.
For data archiving in Parquet format (.parquet, .parq, .pqt), confirm platform support and configure any necessary settings for Parquet compatibility.
AWS Account and Permissions:
An active AWS account with access to the target S3 bucket designated for archival is required.
Required IAM permissions for S3 archival operations:
s3:PutObject for uploading archived data.
s3:ListBucket to list bucket contents.
s3:GetBucketLocation to verify the bucket’s region.
For server-side encryption, include kms:GenerateDataKey and kms:Decrypt permissions for AWS KMS keys, if applicable.
Authentication:
Configure one of the following authentication methods:
Auto Authentication: Utilize IAM roles, shared credentials, environment variables, or a JSON credentials file.
Manual Authentication: Provide an AWS access key and secret key.
Secret Authentication: Use a stored secret within Observo AI’s secure storage for archival purposes.
Network and Connectivity:
Ensure Observo AI can connect to AWS S3 services for archival. If using VPC endpoints, verify their configuration for S3 access.
Check for proxy settings or firewall rules that may impact connectivity to AWS S3 archival endpoints.
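Before attaching the IAM policy shown under Permissions Required, you can generate and sanity-check it programmatically. A minimal stdlib sketch (the bucket name is a placeholder; substitute your own):

```python
import json

bucket = "my-bucket"  # placeholder bucket name
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {"Sid": "ListBucket", "Effect": "Allow",
         "Action": "s3:ListBucket",
         "Resource": f"arn:aws:s3:::{bucket}"},
        {"Sid": "ReadWriteObjects", "Effect": "Allow",
         "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
         "Resource": f"arn:aws:s3:::{bucket}/*"},
    ],
}

# Collect every granted action, normalizing string vs. list Action fields
granted = {a for s in policy["Statement"]
           for a in ([s["Action"]] if isinstance(s["Action"], str) else s["Action"])}

# The archival path requires at least write and list access
assert {"s3:PutObject", "s3:ListBucket"} <= granted
print(json.dumps(policy, indent=2))
```

This only validates the document's shape locally; actual access still depends on how the policy is attached in AWS.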
Requirement summary:
Observo AI Platform: Must be installed and support S3 archival destinations; verify Parquet support for archival, if needed.
AWS Account: Active account with S3 bucket access for archival; ensure the bucket is created and accessible.
IAM Permissions: Required for S3 archival operations; include KMS permissions for encryption.
Authentication: Auto, Manual, or Secret for archival access; prepare credentials for secure access.
Network: Connectivity to AWS S3 archival services; verify VPC endpoints and proxy settings.
Integration
To configure AWS S3 Archival as a destination in Observo AI, follow these steps to set up and test the data flow:
Log in to Observo AI:
Navigate to the Destinations tab.
Click the Add Destination button and select Create New.
Choose AWS S3 from the list of available destinations to begin configuration.
Set Use as Archival to true, then select AWS S3 Archival.
General Settings:
Name: Provide a unique identifier for the destination, e.g., s3-destination-1.
Description (Optional): Add a description for the destination.
Bucket: Enter the name of the target S3 bucket.
Example: my-bucket
Region: Specify the AWS region of the S3 bucket.
Example: us-east-1
Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: {{ _ob.source }}/year=%Y/month=%m/day=%d/
Examples:
date=%F/hour=%H
year=%Y/month=%m/day=%d
application_id={{ application_id }}/date=%F
%Y/%m/%d
date=%F
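The key prefix examples above combine template variables with strftime specifiers. A small sketch of how such a prefix expands for a given event time (the rendering order, template substitution before strftime expansion, is an assumption for illustration):

```python
from datetime import datetime, timezone

def render_key_prefix(prefix: str, source: str, ts: datetime) -> str:
    # Substitute the {{ _ob.source }} template variable, then expand
    # strftime specifiers; the exact order inside Observo AI is assumed.
    return ts.strftime(prefix.replace("{{ _ob.source }}", source))

ts = datetime(2025, 7, 12, 14, 30, tzinfo=timezone.utc)
print(render_key_prefix("{{ _ob.source }}/year=%Y/month=%m/day=%d/", "syslog", ts))
# syslog/year=2025/month=07/day=12/
```

Because the prefix ends in /, the expanded value behaves like a directory path, which is what partitions objects by source and date in the bucket.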
ACL: Canned ACL to apply to the created objects.
Select one of the following options:
Authenticated Users Read access: Read access granted to authenticated AWS users
EC2 readable: Allows Amazon EC2 instances to read objects
FULL_CONTROL for object and bucket owner: Grants full control to both object and bucket owners
Read only for bucket owner: Only the bucket owner can read objects
Logs Writeable Bucket: Allows write access for S3 log delivery
Bucket/Object Owner All Access: Grants full access to object and bucket owner
AllUsers readable: Public read access for everyone on the internet
AllUsers Read Write: Public read/write access for everyone globally
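These UI labels correspond to Amazon S3's standard canned ACLs. The pairing below is an assumption based on the descriptions (the canned ACL names themselves are the standard S3 set):

```python
# Assumed mapping from the UI labels above to S3 canned ACL values.
# The two "full control" labels likely both correspond to
# bucket-owner-full-control; only one is listed here.
CANNED_ACL = {
    "Authenticated Users Read access": "authenticated-read",
    "EC2 readable": "aws-exec-read",
    "Read only for bucket owner": "bucket-owner-read",
    "Bucket/Object Owner All Access": "bucket-owner-full-control",
    "Logs Writeable Bucket": "log-delivery-write",
    "AllUsers readable": "public-read",
    "AllUsers Read Write": "public-read-write",
}

print(CANNED_ACL["AllUsers readable"])  # public-read
```

Avoid the AllUsers options for compliance workloads; they make archived objects publicly accessible.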
Acknowledgments:
Acknowledgements Enabled (False): Whether end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the sink before acknowledging them at the source.
Authentication (Optional):
Auth Access Key Id: Enter the AWS access key ID.
Example: AKIAIOSFODNN7EXAMPLE
Auth Secret Access Key: Enter the AWS secret access key.
Example: wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY
Auth Assume Role: Enter the ARN of an IAM role to assume.
Example: arn:aws:iam::123456789098:role/my_role
Auth Region: Enter the AWS region to send STS requests to. Defaults to the configured region for the service itself.
Example: us-east-1
Auth Load Timeout Secs: Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.
Example: 30
Auth IMDS Connect Timeout Seconds (Optional): Connect timeout for IMDS, in seconds. Default: Empty
Auth IMDS Max Attempts: Enter the number of IMDS retries for fetching tokens and metadata. Default: None
Auth IMDS Read Timeout Seconds: Read timeout for IMDS, in seconds. Default: None
Request Configuration (Optional):
Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.
Options:
Adaptive concurrency: Adjusts parallelism based on system load
A fixed concurrency of 1: Processes one task at a time only
Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.
Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window.
Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.
Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.
Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.
Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
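The documented retry behavior (an initial backoff that grows along the Fibonacci sequence, capped at the maximum retry duration) can be sketched as follows. Seeding the sequence with two copies of the initial backoff is an assumption about the exact growth rule:

```python
def retry_backoffs(initial, max_duration, attempts):
    # Delays before each retry: start at the initial backoff, grow
    # along the Fibonacci sequence, cap at max_duration.
    a, b = initial, initial
    delays = []
    for _ in range(attempts):
        delays.append(min(a, max_duration))
        a, b = b, a + b
    return delays

print(retry_backoffs(1, 3600, 6))
# [1, 1, 2, 3, 5, 8]
```

With the defaults (initial backoff 1 s, cap 3600 s), the waits stay short for the first few retries and never exceed an hour, which is why Request Retry Max Duration Secs matters mostly for long outages.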
Encoding:
Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.
All codecs support the following common sub-options:
Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }
Encoding Metric Tag Values (Select): Controls how metric tag values are encoded. Tag values are exposed either as single strings (default) or as arrays of strings. Note: When set to single, only the last non-bare value of tags is displayed with the metric. When set to full, all metric tags are exposed as separate assignments.
Encoding Timestamp Format (Select): RFC3339 format or UNIX format.
Codec-specific sub-options:
JSON Encoding: Pretty JSON (False): Format JSON with indentation and line breaks for better readability.
logfmt Encoding: No additional sub-options.
Apache Avro Encoding: Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }
Newline Delimited JSON Encoding: No additional sub-options.
No encoding: No additional sub-options.
Plain text encoding: No additional sub-options.
Parquet: Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema; in addition to the Parquet schema, there will be a field named "observo_record" in the Parquet file. Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }
Common Event Format (CEF): CEF Device Event Class ID: Provide a unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure. CEF Device Product: Specify the product name that generated the event (maximum 63 characters). Example: Log Analyzer. CEF Device Vendor: Specify the vendor name that produced the event (maximum 63 characters). Example: Observo. CEF Device Version: Specify the version of the product that generated the event (maximum 31 characters). Example: 1.0.0. CEF Extensions (Add): Define custom key-value pairs for additional event data fields in CEF format. CEF Name: Provide a human-readable description of the event (maximum 512 characters). Example: cef.name. CEF Severity: Indicate the importance of the event with a value from 0 (lowest) to 10 (highest). Example: 5. CEF Version (Select): Specify which version of the CEF specification to use for formatting: CEF specification version 0.1 or 1.x.
CSV Format: CSV Fields (Add): Specify the field names to include as columns in the CSV output and their order. Examples: timestamp, host, message. CSV Buffer Capacity (Optional): Set the internal buffer size (in bytes) used when writing CSV data. Example: 8192. CSV Delimiter (Optional): Set the character that separates fields in the CSV output. Example: ,. Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them; when disabled, an escape character is used instead. CSV Escape Character (Optional): Set the character used to escape quotes when double quoting is disabled. CSV Quote Character (Optional): Set the character used for quoting fields in the CSV output. Example: ". CSV Quoting Style (Optional): Control when field values should be wrapped in quote characters. Options: Always quote all fields; Quote only when necessary; Never use quotes; Quote all non-numeric fields.
Protocol Buffers: Protobuf Message Type: Specify the fully qualified message type name for Protobuf serialization. Example: package.Message. Protobuf Descriptor File: Specify the path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc.
Graylog Extended Log Format (GELF): No additional sub-options.
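The CSV quoting styles listed above map directly onto behaviors found in standard CSV libraries. A short Python illustration using stdlib csv (the field names and values are invented for the example):

```python
import csv
import io

def encode_csv(rows, quoting):
    # Serialize rows with the requested quoting style.
    buf = io.StringIO()
    writer = csv.writer(buf, quoting=quoting, lineterminator="\n")
    writer.writerows(rows)
    return buf.getvalue()

row = [["2025-07-12T14:00:00Z", "host-1", "login failed, retrying"]]
# "Quote only when necessary": only the field containing the delimiter is quoted
print(encode_csv(row, csv.QUOTE_MINIMAL))
# "Always quote all fields": every field is wrapped in quotes
print(encode_csv(row, csv.QUOTE_ALL))
```

Quoting only when necessary keeps archived CSV files smaller; quoting all fields makes them easier for strict downstream parsers to consume.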
Buffering Configuration (Optional):
Buffer Max Events: The maximum number of events the on-disk buffer can hold. Default: 1000
Batching Requirements (Default):
Batch Max Bytes: The maximum size of a batch before it is flushed. Default: 100000000
Batch Max Events: The maximum number of events in a batch before it is flushed. Default: 1000
Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 300
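A batch is flushed as soon as any one of the three limits above is reached. A sketch of that condition (the parameter names mirror the settings; the logic is an illustration, not Observo AI's implementation):

```python
def should_flush(batch_bytes, batch_events, batch_age_secs,
                 max_bytes=100_000_000, max_events=1000, timeout_secs=300):
    # Flush when size, event count, or age reaches its limit,
    # whichever comes first.
    return (batch_bytes >= max_bytes
            or batch_events >= max_events
            or batch_age_secs >= timeout_secs)

print(should_flush(5_000, 1000, 10))  # True: hit the event limit
print(should_flush(5_000, 200, 10))   # False: no limit reached
```

In practice, Batch Timeout Secs bounds how stale data can get in a quiet pipeline, while the byte and event limits bound object size under heavy load.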
TLS Configuration (Optional):
TLS CA: Provide the CA certificate in PEM format.
TLS CRT: Provide the client certificate in PEM format.
TLS Key: Provide the private key in PEM format.
Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.
Verify Hostname: Enables hostname verification. If enabled, the hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the remote hostname.
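Verify Certificate and Verify Hostname correspond to standard TLS client settings. A sketch using Python's ssl module (Observo AI's internals may differ; this shows the equivalent knobs in a common TLS stack):

```python
import ssl

# create_default_context() already enables both verifications;
# the explicit assignments mirror the two settings above.
ctx = ssl.create_default_context()
ctx.verify_mode = ssl.CERT_REQUIRED  # Verify Certificate
ctx.check_hostname = True            # Verify Hostname

# To trust a private CA, load the PEM bundle from the TLS CA field:
# ctx.load_verify_locations(cafile="ca.pem")  # hypothetical path

print(ctx.verify_mode == ssl.CERT_REQUIRED)  # True
```

Disabling either check exposes the connection to man-in-the-middle attacks, which is why the settings above carry explicit warnings.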
Framing (Default):
Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: Empty
Framing Method: The framing method. Default: Newline Delimited
Options:
Raw Event data (not delimited): No framing is applied. This method is best when each event is self-contained.
Single Character Delimited: Each event is separated by a specific single character (ASCII value).
Prefixed with Byte Length: Each event is prefixed with its byte length, ensuring precise separation between events.
Newline Delimited: Each event is followed by a newline character (\n), which is commonly used for logging formats.
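The difference between newline-delimited and length-prefixed framing can be shown in a few lines. The 4-byte big-endian length prefix is an assumption for the sketch; the actual wire format may differ:

```python
import struct

def frame_newline(events):
    # Newline Delimited: each event followed by \n.
    return b"".join(e + b"\n" for e in events)

def frame_length_prefixed(events):
    # Prefixed with Byte Length: assumed 4-byte big-endian length
    # before each event's payload.
    return b"".join(struct.pack(">I", len(e)) + e for e in events)

events = [b"event-1", b"event-2"]
print(frame_newline(events))              # b'event-1\nevent-2\n'
print(frame_length_prefixed(events)[:4])  # b'\x00\x00\x00\x07'
```

Length prefixing is safer for payloads that may themselves contain newlines; newline delimiting keeps the objects human-readable.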
Advanced Settings (Optional):
Endpoint: Custom endpoint for use with AWS-compatible services.
Example: http://127.0.0.0:5000/path/to/service
Compression: Compression algorithm to use for the request body. Default: Gzip compression
Options:
Gzip compression: DEFLATE compression with headers for file storage
No compression: Data stored and transmitted in original form
Zlib compression: DEFLATE format with minimal wrapper and checksums
Filename Extension: The filename extension to use in the object key. This overrides setting the extension based on the configured compression.
Example: json
Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to S3. The resulting object key is the key prefix followed by the formatted timestamp, e.g., date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s
Example: %s
Content Encoding: Overrides what content encoding has been applied to the object. Directly comparable to the Content-Encoding HTTP header. If not specified, the compression scheme used dictates this value.
Example: gzip
Content Type: Overrides the MIME type of the object. Directly comparable to the Content-Type HTTP header. If not specified, the compression scheme used dictates this value. When compression is set to none, the value text/x-log is used.
Example: application/gzip
Grant Full Control: Grants READ, READ_ACP, and WRITE_ACP permissions on the created objects to the named grantee. This allows the grantee to read the created objects and their metadata, as well as read and modify the ACL on the created objects.
Examples: 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be, http://acs.amazonaws.com/groups/global/AllUsers
Grant Read: Grants READ permissions on the created objects to the named grantee. This allows the grantee to read the created objects and their metadata.
Examples: 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be, http://acs.amazonaws.com/groups/global/AllUsers
Grant Read Acp: Grants READ_ACP permissions on the created objects to the named grantee. This allows the grantee to read the ACL on the created objects.
Examples: 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be, http://acs.amazonaws.com/groups/global/AllUsers
Grant Write Acp: Grants WRITE_ACP permissions on the created objects to the named grantee. This allows the grantee to modify the ACL on the created objects.
Examples: 79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be, http://acs.amazonaws.com/groups/global/AllUsers
Server Side Encryption: The server-side encryption algorithm used when storing these objects.
Select from options:
AES-256 Encryption (SSE-S3)
AES-256 Encryption managed by AWS KMS (SSE-KMS / SSE-C)
Ssekms Key Id: Specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer master key (CMK) used for the created objects. Only applies when server_side_encryption is configured to use KMS. If not specified, Amazon S3 uses the AWS managed CMK to protect the data.
Example: abcd1234
Storage Class: The S3 Storage Class for the created objects. Default: Standard Redundancy
Select one of the options:
Glacier Deep Archive: Lowest cost, long-term archival storage option
Glacier Flexible Retrieval: Low-cost archive with flexible access speeds
Intelligent Tiering: Automatically moves data between cost tiers
Infrequently Accessed (Single Availability Zone): Low-cost storage in one availability zone
Reduced Redundancy: Lower durability, lower cost for duplicates
Standard Redundancy: High durability, multi-zone general-purpose storage
Infrequently Accessed: Low-cost, high-durability storage for less frequently accessed data
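These UI labels correspond to the standard S3 API StorageClass values. The pairing below is an assumption based on the descriptions (the API values themselves are the standard S3 storage classes):

```python
# Assumed mapping from the UI labels above to S3 API StorageClass values.
STORAGE_CLASS = {
    "Glacier Deep Archive": "DEEP_ARCHIVE",
    "Glacier Flexible Retrieval": "GLACIER",
    "Intelligent Tiering": "INTELLIGENT_TIERING",
    "Infrequently Accessed (Single Availability Zone)": "ONEZONE_IA",
    "Reduced Redundancy": "REDUCED_REDUNDANCY",
    "Standard Redundancy": "STANDARD",
    "Infrequently Accessed": "STANDARD_IA",
}

print(STORAGE_CLASS["Glacier Deep Archive"])  # DEEP_ARCHIVE
```

Note that Glacier-class objects have retrieval delays and minimum storage durations, so they suit true archival rather than frequently rehydrated data.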
Tags: A list of tag key-value pairs. (Add key-value pairs as needed)
Save and Test Configuration:
Save the configuration settings.
Send sample data to the S3 bucket and verify that it is stored correctly.
Example Scenarios
FinSecure Solutions, a fictitious financial services enterprise specializing in wealth management and investment banking, generates vast transactional data, audit logs, and compliance records that must meet SEC Rule 17a-4 requirements. To ensure secure, cost-effective, and compliant long-term storage, FinSecure integrates Observo AI with AWS S3 in the us-east-1 region, using the Glacier Deep Archive storage class and Parquet format for efficient querying, server-side encryption for security, and structured data organization for compliance audits. The configuration leverages robust AWS authentication and TLS security to seamlessly integrate with FinSecure’s existing AWS infrastructure.
Standard AWS S3 Archival Destination Setup
Here is a standard AWS S3 Archival Destination configuration example. Only the required sections and their associated field updates are listed below:
General Settings
Name: finsecure-archival (unique identifier for the archival destination)
Description: Archival destination for FinSecure transactional logs and compliance records (optional description for clarity)
Bucket: finsecure-compliance-archive (target S3 bucket for archival)
Region: us-east-1 (AWS region where the bucket is located)
Key Prefix: compliance/{{ _ob.source }}/year=%Y/month=%m/day=%d/ (organizes archived objects by source, year, month, and day)
ACL: Read only for bucket owner (ensures only the bucket owner can access archived objects for compliance)
Acknowledgements Enabled: True (enables end-to-end acknowledgments to ensure data is archived before source acknowledgment)
Authentication
Auth Access Key Id: AKIAIOSFODNN7FINSEC (AWS access key ID for authentication)
Auth Secret Access Key: wJalrXUtnFEMI/K7MDENG/bPxRfiCYFINSECKEY (AWS secret access key for authentication)
Auth Assume Role: arn:aws:iam::987654321098:role/FinSecureArchivalRole (IAM role ARN for secure access to the archival bucket)
Auth Region: us-east-1 (region for STS requests, matching the bucket region)
Auth Load Timeout Secs: 30 (timeout for loading credentials, in seconds)
Auth IMDS Connect Timeout Seconds: 5 (connect timeout for the Instance Metadata Service)
Auth IMDS Max Attempts: 3 (number of retries for fetching IMDS tokens/metadata)
Auth IMDS Read Timeout Seconds: 5 (read timeout for IMDS requests)
Request Configuration
Request Concurrency: Adaptive concurrency (adjusts parallelism based on system load for efficient archival)
Request Rate Limit Duration Secs: 1 (time window for rate limiting requests)
Request Rate Limit Num: 100 (maximum number of requests allowed within the time window)
Request Retry Attempts: 3 (maximum retries for failed archival requests)
Request Retry Initial Backoff Secs: 1 (initial wait before the first retry, in seconds)
Request Retry Max Duration Secs: 3600 (maximum wait between retries, in seconds)
Request Timeout Secs: 60 (timeout before aborting a request to prevent orphaned requests)
Encoding
Encoding Codec: Parquet (uses Parquet format for efficient storage and querying of archived data)
Include Raw Log: True (captures the complete log message as an observo_record field in the Parquet file)
Parquet Schema: message root { optional binary stream; optional binary time; optional group transaction { optional binary account_id; optional binary transaction_id; optional binary amount; optional binary timestamp; } } (defines the schema for transactional data archival)
Encoding Avro Schema: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "account_id", "type": "string" }, { "name": "transaction_id", "type": "string" }] } (optional Avro schema for additional serialization)
Encoding Metric Tag Values: Tags exposed as arrays of strings (exposes all metric tags as arrays for compliance auditing)
Encoding Timestamp Format: RFC3339 format (standard timestamp format for consistency)
Buffering Configuration
Buffer Max Events: 5000 (maximum size of the disk buffer to handle large transactional datasets)
Batching Configuration
Batch Max Bytes: 50000000 (maximum batch size before flushing, 50 MB, to optimize archival)
Batch Max Events: 2000 (maximum events in a batch before flushing to balance throughput)
Batch Timeout Secs: 600 (maximum batch age before flushing, 10 minutes, for timely archival)
TLS Configuration
TLS CA: <PEM-encoded CA certificate> (CA certificate in PEM format for secure connections)
TLS CRT: <PEM-encoded client certificate> (client certificate in PEM format for authentication)
TLS Key: <PEM-encoded private key> (private key in PEM format for secure communication)
Verify Certificate: True (enables certificate verification to ensure valid, trusted certificates)
Verify Hostname: True (ensures the hostname matches the TLS certificate for outgoing connections)
Framing
Framing Character Delimited Delimiter: \n (uses newline as the delimiter for byte sequences in archived data)
Framing Method: Newline Delimited (each event is separated by a newline character)
Advanced Settings
Endpoint: https://s3.us-east-1.amazonaws.com (AWS S3 endpoint for the us-east-1 region)
Compression: Gzip compression (compresses archived data for cost efficiency)
Filename Extension: parquet (specifies the Parquet extension for archived objects)
Filename Time Format: %Y-%m-%d-%H (timestamp format for the time component of object keys, e.g., 2025-07-12-14)
Content Encoding: gzip (specifies Gzip as the content encoding for archived objects)
Content Type: application/x-parquet (MIME type for Parquet files)
Grant Full Control: Grants full control to the compliance team's grantee for audit access
Server Side Encryption: AES-256 Encryption (SSE-KMS) (uses KMS-managed encryption for compliance)
Ssekms Key Id: finsecure-kms-key-1234 (KMS key ID for encryption)
Storage Class: Glacier Deep Archive (lowest-cost storage for long-term compliance archival)
Tags: Project=Compliance, Environment=Production (tags for organizing and tracking archived objects)
Test Configuration
Send sample transactional data to the S3 bucket (finsecure-compliance-archive).
Check the bucket contents to confirm data is archived correctly in compliance/source/year=2025/month=07/day=12/ with the .parquet extension.
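The expected object key in this test can be derived from the example's Key Prefix, Filename Time Format, and Filename Extension settings. A sketch of the assembly (prefix + formatted timestamp + extension; the exact join rules are assumptions based on the Filename Time Format notes, and the source name "transactions" is invented):

```python
from datetime import datetime, timezone

def object_key(prefix, source, time_format, extension, ts):
    # Render the prefix (template substitution, then strftime),
    # then append the formatted timestamp and the extension.
    rendered = ts.strftime(prefix.replace("{{ _ob.source }}", source))
    return f"{rendered}{ts.strftime(time_format)}.{extension}"

ts = datetime(2025, 7, 12, 14, 0, tzinfo=timezone.utc)
key = object_key("compliance/{{ _ob.source }}/year=%Y/month=%m/day=%d/",
                 "transactions", "%Y-%m-%d-%H", "parquet", ts)
print(key)
# compliance/transactions/year=2025/month=07/day=12/2025-07-12-14.parquet
```

If the objects landing in the bucket do not match the key you expect, recheck the Key Prefix's trailing / and the Filename Time Format before digging into permissions.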
Troubleshooting
If issues occur with the AWS S3 Archival destination in Observo AI, use these steps to diagnose and resolve them:
Verify Configuration Settings:
Confirm that Bucket Name, Region, and Authentication fields match the AWS S3 archival setup.
Ensure the S3 bucket exists, is accessible, and is configured for archival in the specified region.
Check Authentication:
Auto Authentication: Verify IAM roles, shared credentials, or environment variables are correctly set up for archival access.
Manual Authentication: Ensure the access key and secret key are valid for the archival bucket.
Secret Authentication: Confirm the secret is accessible in Observo AI’s secure storage.
Validate Permissions:
Verify credentials have required permissions: s3:PutObject, s3:ListBucket, s3:GetBucketLocation.
For KMS encryption, ensure kms:GenerateDataKey and kms:Decrypt permissions are granted.
If using an IAM role, confirm it is correctly assumed for archival operations.
Network and Connectivity:
Check for firewall rules, VPC endpoint configurations, or proxy settings blocking access to AWS S3 archival services.
Test connectivity using the AWS CLI with similar proxy settings to confirm S3 archival access.
Common Error Messages:
"Access Denied": Indicates insufficient permissions. Verify IAM permissions for the archival bucket and KMS keys.
"Bucket does not exist": Check the bucket name and region. Ensure no certificate validation issues.
"Inaccessible host": May indicate TLS version mismatches or DNS issues. Confirm TLS compatibility and DNS settings.
Monitor Data:
Verify archived data is written to the S3 bucket by checking bucket contents.
Use Observo AI’s Analytics tab to monitor data volume and ensure expected archival throughput.
Quick reference (issue: likely cause; resolution):
Data not archived: Incorrect bucket name or region; verify the bucket name and region.
Authentication errors: Invalid credentials or role; check the authentication method and permissions.
Connectivity issues: Firewall or proxy blocking archival access; test network connectivity and VPC endpoints.
"Access Denied": Insufficient permissions; verify IAM permissions for S3 and KMS.
"Bucket does not exist": Incorrect bucket name or certificate issues; check the bucket name and certificate settings.
"Inaccessible host": TLS or DNS issues; ensure TLS compatibility and check DNS.
Resources
For additional guidance on AWS S3 Archival setup, refer to these resources:
AWS Documentation:
Amazon S3 Bucket Configuration: Guide to configuring S3 buckets for archival storage.
Amazon S3 Permissions: Details on setting up permissions for archival operations.
AWS KMS Encryption: Information on configuring server-side encryption for archived data.
Best Practices:
Optimize bucket organization for archival with lifecycle policies to transition objects to Glacier or Deep Archive.
Enable versioning to protect archived data from overwrites or deletions.
Use S3 Storage Lens for insights into archival storage usage and cost optimization.