AWS S3 Archival

The Observo AI AWS S3 Archival destination enables cost-effective, secure, and scalable long-term storage of observability and security data in Amazon S3, supporting formats like JSON, CSV, and Parquet with customizable storage classes and lifecycle policies for optimized data retention and compliance.

Purpose

The AWS S3 Archival Destination in Observo AI enables the secure and efficient storage of event data, logs, and metrics in Amazon S3, supporting long-term retention and compliance requirements. It facilitates data archival in formats like Parquet, with robust AWS authentication, server-side encryption, and structured key prefixes for organized storage. This destination is ideal for organizations needing cost-effective, scalable, and compliant archival solutions, such as for audit logs or transactional data. It integrates seamlessly with Observo AI’s data pipelines to ensure reliable data transfer and accessibility for analytics or regulatory purposes.

Permissions Required for Archival and Hydration

Attach an IAM policy like the following to the identity Observo AI uses, replacing <bucket-name> with your bucket name:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "ListBucket",
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::<bucket-name>"
    },
    {
      "Sid": "ReadWriteObjects",
      "Effect": "Allow",
      "Action": [
        "s3:GetObject",
        "s3:PutObject",
        "s3:PutObjectAcl"
      ],
      "Resource": "arn:aws:s3:::<bucket-name>/*"
    }
  ]
}
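As a quick sanity check, the policy document above can be validated before wiring up the destination. This is a minimal pure-Python sketch (the helper names are our own); it only confirms the policy grants the four S3 actions shown, not that the attached identity can actually reach the bucket.

```python
import json

# The actions the archival/hydration policy above grants.
REQUIRED_ACTIONS = {"s3:ListBucket", "s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"}

def granted_actions(policy: dict) -> set:
    """Collect every action allowed by the policy document."""
    actions = set()
    for stmt in policy.get("Statement", []):
        if stmt.get("Effect") != "Allow":
            continue
        act = stmt.get("Action", [])
        actions.update([act] if isinstance(act, str) else act)
    return actions

policy = json.loads("""
{
  "Version": "2012-10-17",
  "Statement": [
    {"Sid": "ListBucket", "Effect": "Allow",
     "Action": "s3:ListBucket", "Resource": "arn:aws:s3:::my-bucket"},
    {"Sid": "ReadWriteObjects", "Effect": "Allow",
     "Action": ["s3:GetObject", "s3:PutObject", "s3:PutObjectAcl"],
     "Resource": "arn:aws:s3:::my-bucket/*"}
  ]
}
""")

missing = REQUIRED_ACTIONS - granted_actions(policy)
print("missing actions:", sorted(missing))  # -> missing actions: []
```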

Prerequisites

Before configuring the AWS S3 Archival Destination in Observo AI, ensure the following requirements are met to facilitate seamless data archival:

  • Observo AI Platform Setup:

    • The Observo AI platform must be installed, operational, and configured to support AWS S3 as an archival destination.

    • For data archiving in Parquet format (.parquet, .parq, .pqt), confirm platform support and configure any necessary settings for Parquet compatibility.

  • AWS Account and Permissions:

    • An active AWS account with access to the target S3 bucket designated for archival is required.

    • Required IAM permissions for S3 archival operations:

      • s3:PutObject for uploading archived data.

      • s3:ListBucket to list bucket contents.

      • s3:GetBucketLocation to verify the bucket’s region.

      • For server-side encryption, include kms:GenerateDataKey and kms:Decrypt permissions for AWS KMS keys, if applicable.

  • Authentication:

    • Configure one of the following authentication methods:

      • Auto Authentication: Utilize IAM roles, shared credentials, environment variables, or a JSON credentials file.

      • Manual Authentication: Provide an AWS access key and secret key.

      • Secret Authentication: Use a stored secret within Observo AI’s secure storage for archival purposes.

  • Network and Connectivity:

    • Ensure Observo AI can connect to AWS S3 services for archival. If using VPC endpoints, verify their configuration for S3 access.

    • Check for proxy settings or firewall rules that may impact connectivity to AWS S3 archival endpoints.
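Before configuring the destination, a small probe can confirm the bucket is reachable and sits in the expected region. A hedged sketch using boto3 (assumed installed; it is imported lazily so the endpoint helper stays usable without it; function names are our own):

```python
def regional_endpoint(region: str) -> str:
    """S3 regional endpoint in the form used by the Endpoint setting."""
    return f"https://s3.{region}.amazonaws.com"

def check_bucket_reachable(bucket: str, region: str) -> bool:
    """Probe the bucket with the same kind of credentials Observo AI would use."""
    import boto3  # lazy import so regional_endpoint stays usable without boto3
    s3 = boto3.client("s3", region_name=region)
    s3.head_bucket(Bucket=bucket)  # raises on 403/404 or a network failure
    loc = s3.get_bucket_location(Bucket=bucket)["LocationConstraint"]
    return (loc or "us-east-1") == region  # the API reports us-east-1 as None
```

Running `check_bucket_reachable` through the same proxy or VPC endpoint path that Observo AI will use is the most direct way to surface firewall or endpoint misconfiguration early.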

Prerequisite | Description | Notes
Observo AI Platform | Must be installed and support S3 archival destinations | Verify Parquet support for archival, if needed
AWS Account | Active account with S3 bucket access for archival | Ensure bucket is created and accessible
IAM Permissions | Required for S3 archival operations | Include KMS permissions for encryption
Authentication | Auto, Manual, or Secret for archival access | Prepare credentials for secure access
Network | Connectivity to AWS S3 archival services | Verify VPC endpoints and proxy settings

Integration

To configure AWS S3 Archival as a destination in Observo AI, follow these steps to set up and test the data flow:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destination button and select Create New.

    • Choose AWS S3 from the list of available destinations to begin configuration.

    • Set Use as Archival to true.

    • Select AWS S3 Archival.

  2. General Settings:

    • Name: Provide a unique identifier for the destination, e.g., s3-destination-1.

    • Description (Optional): Add a description for the destination.

    • Bucket: Enter the name of the target S3 bucket.

      Example

      my-bucket

    • Region: Specify the AWS region of the S3 bucket.

      Example

      us-east-1

    • Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: {{ _ob.source }}/year=%Y/month=%m/day=%d/

      Examples

      date=%F/hour=%H

      year=%Y/month=%m/day=%d

      application_id={{ application_id }}/date=%F

      %Y/%m/%d

      date=%F

    • ACL: Canned ACL to apply to the created objects. Select one of:

      Option | Description
      Authenticated Users Read access | Read access granted to authenticated AWS users
      EC2 readable | Allows Amazon EC2 instances to read objects
      FULL_CONTROL for object and bucket owner | Grants full control to both object and bucket owners
      Read only for bucket owner | Only the bucket owner can read objects
      Logs Writeable Bucket | Allows write access for S3 log delivery
      Bucket/Object Owner All Access | Grants full access to object and bucket owner
      AllUsers readable | Public read access for everyone on the internet
      AllUsers Read Write | Public read/write access for everyone globally
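The Key Prefix examples above combine {{ field }} event templates with strftime specifiers. A rough sketch of how such a template expands into an object key prefix (the rendering logic here is illustrative, not Observo AI's actual implementation):

```python
import re
from datetime import datetime, timezone

def render_key_prefix(template: str, event: dict, now: datetime) -> str:
    """Expand {{ field }} placeholders from the event, then strftime specifiers."""
    expanded = re.sub(r"\{\{\s*([\w.]+)\s*\}\}",
                      lambda m: str(event.get(m.group(1), "")), template)
    return now.strftime(expanded)

now = datetime(2025, 7, 12, 14, 30, tzinfo=timezone.utc)
event = {"_ob.source": "syslog", "application_id": "billing"}

# Default prefix from the docs:
print(render_key_prefix("{{ _ob.source }}/year=%Y/month=%m/day=%d/", event, now))
# -> syslog/year=2025/month=07/day=12/
```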

  3. Acknowledgments:

    • Acknowledgements Enabled (False): Whether end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the sink before acknowledging them at the source.

  4. Authentication (Optional):

    • Auth Access Key Id: Enter the AWS access key ID.

      Example

      AKIAIOSFODNN7EXAMPLE

    • Auth Secret Access Key: Enter the AWS secret access key.

      Example

      wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

    • Auth Assume Role: Enter the ARN of an IAM role to assume.

      Example

      arn:aws:iam::123456789098:role/my_role

    • Auth Region: Enter the AWS region to send STS requests to. Defaults to the configured region for the service itself.

      Example

      us-east-1

    • Auth Load Timeout Secs: Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.

      Example

      30

    • Auth IMDS Connect Timeout Seconds (Optional): Connect timeout for IMDS. Default: None

    • Auth IMDS Max Attempts: The number of IMDS retries for fetching tokens and metadata. Default: None

    • Auth IMDS Read Timeout Seconds: Read timeout for IMDS. Default: None

  5. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

Option | Description
Adaptive concurrency | Adjusts parallelism based on system load
A fixed concurrency of 1 | Processes one task at a time only

  • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

  • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window.

  • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.

  • Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

  • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

  • Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
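The retry settings above describe an initial backoff followed by Fibonacci growth, capped by the max duration. One plausible reading of that schedule, as a sketch:

```python
def retry_backoffs(initial: float = 1.0, max_secs: float = 3600.0, attempts: int = 10):
    """Backoff schedule: the first retry waits `initial` seconds, then waits
    follow the Fibonacci sequence, capped at `max_secs` (Request Retry Max
    Duration Secs). This is an illustration of the documented behavior, not
    the product's actual retry code."""
    a, b = initial, initial
    waits = []
    for _ in range(attempts):
        waits.append(min(a, max_secs))
        a, b = b, a + b
    return waits

print(retry_backoffs(attempts=8))
# -> [1.0, 1.0, 2.0, 3.0, 5.0, 8.0, 13.0, 21.0]
```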

  6. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs support these common sub-options:

      • Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Encoding Metric Tag Values (Select): Controls how metric tag values are encoded: as single strings (default) or as arrays of strings. When set to single, only the last non-bare value of tags is displayed with the metric. When set to full, all metric tags are exposed as separate assignments.

      • Encoding Timestamp Format (Select): RFC3339 format or UNIX format.

      Codec-specific sub-options:

      • JSON Encoding: Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: No codec-specific sub-options.

      • Apache Avro Encoding: Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: No codec-specific sub-options.

      • No encoding: No codec-specific sub-options.

      • Plain text encoding: No codec-specific sub-options.

      • Parquet: Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema; in addition to the Parquet schema, there will be a field named "observo_record" in the Parquet file. Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF): CEF Device Event Class ID: A unique identifier for categorizing the type of event (maximum 1023 characters), e.g., login-failure. CEF Device Product: The product name that generated the event (maximum 63 characters), e.g., Log Analyzer. CEF Device Vendor: The vendor name that produced the event (maximum 63 characters), e.g., Observo. CEF Device Version: The version of the product that generated the event (maximum 31 characters), e.g., 1.0.0. CEF Extensions (Add): Custom key-value pairs for additional event data fields in CEF format. CEF Name: A human-readable description of the event (maximum 512 characters), e.g., cef.name. CEF Severity: The importance of the event, from 0 (lowest) to 10 (highest), e.g., 5. CEF Version (Select): The CEF specification version to use, 0.1 or 1.x.

      • CSV Format: CSV Fields (Add): The field names to include as columns in the CSV output, in order, e.g., timestamp, host, message. CSV Buffer Capacity (Optional): The internal buffer size (in bytes) used when writing CSV data, e.g., 8192. CSV Delimiter (Optional): The character that separates fields in the CSV output, e.g., a comma. Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them; when disabled, an escape character is used instead. CSV Escape Character (Optional): The character used to escape quotes when double quoting is disabled. CSV Quote Character (Optional): The character used for quoting fields in the CSV output, e.g., ". CSV Quoting Style (Optional): Controls when field values are wrapped in quote characters: always quote all fields, quote only when necessary, never use quotes, or quote all non-numeric fields.

      • Protocol Buffers: Protobuf Message Type: The fully qualified message type name for Protobuf serialization, e.g., package.Message. Protobuf Descriptor File: The path to the compiled protobuf descriptor file (.desc), e.g., /path/to/descriptor.desc.

      • Graylog Extended Log Format (GELF): No codec-specific sub-options.
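With the default settings (JSON codec, newline-delimited framing, Gzip compression, covered below), an archived object body is effectively gzipped NDJSON. A minimal sketch of that encoding path (illustrative, not the actual implementation):

```python
import gzip
import json

def encode_batch_ndjson_gzip(events):
    """Encode events as newline-delimited JSON and gzip-compress the payload,
    mirroring the destination's default codec, framing, and compression."""
    body = "".join(json.dumps(e, separators=(",", ":")) + "\n" for e in events)
    return gzip.compress(body.encode("utf-8"))

events = [{"host": "web-1", "message": "login ok"},
          {"host": "web-2", "message": "login failed"}]
blob = encode_batch_ndjson_gzip(events)

# Round-trip to confirm the object body decodes back to the same events.
lines = gzip.decompress(blob).decode("utf-8").splitlines()
assert [json.loads(l) for l in lines] == events
```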

  7. Buffering Configuration (Optional):

    • Buffer Max Events: The maximum number of events the buffer can hold. Default: 1000

  8. Batching Requirements (Default):

    • Batch Max Bytes: The maximum size of a batch, in bytes, before it is flushed. Default: 100000000

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: 1000

    • Batch Timeout Secs: The maximum age of a batch, in seconds, before it is flushed. Default: 300
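The three batching limits interact as "flush on whichever limit is hit first". A toy sketch of that logic (class and method names are our own, not the product's API):

```python
import json
import time

class Batcher:
    """Flush when any limit is hit, mirroring Batch Max Bytes,
    Batch Max Events, and Batch Timeout Secs."""

    def __init__(self, max_bytes=100_000_000, max_events=1000, timeout_secs=300):
        self.max_bytes = max_bytes
        self.max_events = max_events
        self.timeout_secs = timeout_secs
        self.events, self.size, self.started = [], 0, None

    def add(self, event: dict) -> bool:
        """Add an event; return True when the batch should be flushed."""
        if self.started is None:
            self.started = time.monotonic()
        self.events.append(event)
        self.size += len(json.dumps(event))
        return (self.size >= self.max_bytes
                or len(self.events) >= self.max_events
                or time.monotonic() - self.started >= self.timeout_secs)

b = Batcher(max_events=3)
assert b.add({"n": 1}) is False
assert b.add({"n": 2}) is False
assert b.add({"n": 3}) is True   # third event hits Batch Max Events
```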

  9. TLS Configuration (Optional):

    • TLS CA: Provide the CA certificate in PEM format.

    • TLS CRT: Provide the client certificate in PEM format.

    • TLS Key: Provide the private key in PEM format.

    • Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • Verify Hostname: Enables hostname verification. If enabled, the hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the remote hostname

  10. Framing (Default):

    • Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: Empty

    • Framing Method: The framing method. Default: Newline Delimited

      Option | Description
      Raw Event data (not delimited) | No framing is applied. This method is best when each event is self-contained.
      Single Character Delimited | Each event is separated by a specific single character (ASCII value).
      Prefixed with Byte Length | Each event is prefixed with its byte length, ensuring precise separation between events.
      Newline Delimited | Each event is followed by a newline character (\n), which is commonly used for logging formats.
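The "Prefixed with Byte Length" and "Newline Delimited" methods can be sketched as follows; the 4-byte big-endian length prefix is an assumption for illustration, not a documented wire format:

```python
import struct

def frame_length_prefixed(payload: bytes) -> bytes:
    """'Prefixed with Byte Length': length header, then the event bytes.
    (A 4-byte big-endian length is assumed here for illustration.)"""
    return struct.pack(">I", len(payload)) + payload

def frame_newline(payload: bytes) -> bytes:
    """'Newline Delimited': the event followed by a newline character."""
    return payload + b"\n"

framed = frame_length_prefixed(b"hello")
assert framed[:4] == b"\x00\x00\x00\x05" and framed[4:] == b"hello"
assert frame_newline(b"hello") == b"hello\n"
```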

  11. Advanced Settings (Optional):

    • Endpoint: Custom endpoint for use with AWS-compatible services.

      Example

      http://127.0.0.0:5000/path/to/service

    • Compression: Compression algorithm to use for the request body. Default: Gzip compression

      Option | Description
      Gzip compression | DEFLATE compression with headers for file storage
      No compression | Data stored and transmitted in original form
      Zlib compression | DEFLATE format with minimal wrapper and checksums

    • Filename Extension: The filename extension to use in the object key. This overrides setting the extension based on the configured compression.

      Example

      json

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to S3. The resulting object key is the key prefix followed by the formatted timestamp, e.g., date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s

      Example

      %s

    • Content Encoding: Overrides what content encoding has been applied to the object. Directly comparable to the Content-Encoding HTTP header. If not specified, the compression scheme used dictates this value.

      Example

      gzip

    • Content Type: Overrides the MIME type of the object. Directly comparable to the Content-Type HTTP header. If not specified, the compression scheme used dictates this value. When compression is set to none, the value text/x-log is used.

      Example

      application/gzip

    • Grant Full Control: Grants READ, READ_ACP, and WRITE_ACP permissions on the created objects to the named grantee. This allows the grantee to read the created objects and their metadata, as well as read and modify the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Read: Grants READ permissions on the created objects to the named grantee. This allows the grantee to read the created objects and their metadata.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Read Acp: Grants READ_ACP permissions on the created objects to the named grantee. This allows the grantee to read the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Write Acp: Grants WRITE_ACP permissions on the created objects to the named grantee. This allows the grantee to modify the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Server Side Encryption: The Server-side Encryption algorithm used when storing these objects.

      Select from options:

      AES-256 Encryption (SSE-S3)

      AES-256 Encryption managed by AWS KMS (SSE-KMS / SSE-C)

    • Ssekms Key Id: Specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer master key (CMK) used for the created objects. Only applies when server_side_encryption is configured to use KMS. If not specified, Amazon S3 uses the AWS managed CMK to protect the data.

      Example

      abcd1234

    • Storage Class: The S3 Storage Class for the created objects. Default: Standard Redundancy

      Option | Description
      Glacier Deep Archive | Lowest cost, long-term archival storage option
      Glacier Flexible Retrieval | Low-cost archive with flexible access speeds
      Intelligent Tiering | Automatically moves data between cost tiers
      Infrequently Accessed (Single Availability Zone) | Low-cost storage in one availability zone
      Reduced Redundancy | Lower durability, lower cost for duplicates
      Standard Redundancy | High durability, multi-zone general-purpose storage
      Infrequently Accessed | Low-cost, high-durability storage for less frequent access

    • Tags: A list of tag key-value pairs. (Add key-value pairs as needed)
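For the Server Side Encryption settings above, the equivalent S3 PutObject parameters look like this. A hedged boto3 sketch (boto3 assumed installed and imported lazily; the helper names are our own):

```python
def put_object_kwargs(bucket, key, body, kms_key_id=None):
    """Build PutObject parameters for server-side encryption: with a KMS key id,
    request SSE-KMS; otherwise fall back to SSE-S3 (AES-256)."""
    kwargs = {"Bucket": bucket, "Key": key, "Body": body}
    if kms_key_id:
        kwargs["ServerSideEncryption"] = "aws:kms"
        kwargs["SSEKMSKeyId"] = kms_key_id
    else:
        kwargs["ServerSideEncryption"] = "AES256"
    return kwargs

def upload(bucket, key, body, kms_key_id=None):
    import boto3  # lazy import so the builder above stays usable without boto3
    boto3.client("s3").put_object(**put_object_kwargs(bucket, key, body, kms_key_id))

kw = put_object_kwargs("my-bucket", "logs/batch-0001.json.gz", b"...",
                       kms_key_id="abcd1234")
assert kw["ServerSideEncryption"] == "aws:kms" and kw["SSEKMSKeyId"] == "abcd1234"
```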

  12. Save and Test Configuration:

    • Save the configuration settings.

    • Send sample data to the S3 bucket and verify that it is stored correctly.
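When verifying the test data, it helps to predict the object key: per the Filename Time Format description above, the key is the rendered key prefix, then the formatted timestamp, then the filename extension. A sketch (helper names are our own; the extension depends on your codec and compression):

```python
from datetime import datetime, timezone

def expected_object_key(key_prefix: str, when: datetime,
                        time_format: str = "%s", extension: str = "json.gz") -> str:
    """Compose the object key: rendered key prefix + formatted timestamp + extension."""
    if time_format == "%s":                 # default: epoch seconds
        stamp = str(int(when.timestamp()))
    else:
        stamp = when.strftime(time_format)
    return f"{key_prefix}{stamp}.{extension}"

def list_archived(bucket: str, key_prefix: str):
    """List object keys under the prefix to confirm data landed."""
    import boto3  # lazy import; only needed for the live check
    resp = boto3.client("s3").list_objects_v2(Bucket=bucket, Prefix=key_prefix)
    return [obj["Key"] for obj in resp.get("Contents", [])]

# Reproduces the docs' example key date=2022-07-18/1658176486
when = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(expected_object_key("date=2022-07-18/", when))
# -> date=2022-07-18/1658176486.json.gz
```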

Example Scenarios

FinSecure Solutions, a fictitious financial services enterprise specializing in wealth management and investment banking, generates vast transactional data, audit logs, and compliance records that must meet SEC Rule 17a-4 requirements. To ensure secure, cost-effective, and compliant long-term storage, FinSecure integrates Observo AI with AWS S3 in the us-east-1 region, using the Glacier Deep Archive storage class and Parquet format for efficient querying, server-side encryption for security, and structured data organization for compliance audits. The configuration leverages robust AWS authentication and TLS security to seamlessly integrate with FinSecure’s existing AWS infrastructure.

Standard AWS S3 Archival Destination Setup

Here is a standard AWS S3 Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

Field | Value | Description
Name | finsecure-archival | Unique identifier for the archival destination.
Description | Archival destination for FinSecure transactional logs and compliance records | Optional description for clarity.
Bucket | finsecure-compliance-archive | Target S3 bucket for archival.
Region | us-east-1 | AWS region where the bucket is located.
Key Prefix | compliance/{{ _ob.source }}/year=%Y/month=%m/day=%d/ | Organizes archived objects by source, year, month, and day.
ACL | Read only for bucket owner | Ensures only the bucket owner can access archived objects for compliance.
Acknowledgements Enabled | True | Enables end-to-end acknowledgments to ensure data is archived before source acknowledgment.

Authentication

Field | Value | Description
Auth Access Key Id | AKIAIOSFODNN7FINSEC | AWS access key ID for authentication.
Auth Secret Access Key | wJalrXUtnFEMI/K7MDENG/bPxRfiCYFINSECKEY | AWS secret access key for authentication.
Auth Assume Role | arn:aws:iam::987654321098:role/FinSecureArchivalRole | IAM role ARN for secure access to the archival bucket.
Auth Region | us-east-1 | Region for STS requests, matching the bucket region.
Auth Load Timeout Secs | 30 | Timeout for loading credentials (seconds).
Auth IMDS Connect Timeout Seconds | 5 | Connect timeout for IMDS (Instance Metadata Service).
Auth IMDS Max Attempts | 3 | Number of retries for fetching IMDS tokens/metadata.
Auth IMDS Read Timeout Seconds | 5 | Read timeout for IMDS requests.

Request Configuration

Field | Value | Description
Request Concurrency | Adaptive concurrency | Adjusts parallelism based on system load for efficient archival.
Request Rate Limit Duration Secs | 1 | Time window for rate limiting requests.
Request Rate Limit Num | 100 | Maximum number of requests allowed within the time window.
Request Retry Attempts | 3 | Maximum retries for failed archival requests.
Request Retry Initial Backoff Secs | 1 | Initial wait time before the first retry (seconds).
Request Retry Max Duration Secs | 3600 | Maximum wait time between retries (seconds).
Request Timeout Secs | 60 | Timeout before aborting a request to prevent orphaned requests.

Encoding

Field | Value | Description
Encoding Codec | Parquet | Uses Parquet format for efficient storage and querying of archived data.
Include Raw Log | True | Captures the complete log message as an observo_record field in the Parquet file.
Parquet Schema | message root { optional binary stream; optional binary time; optional group transaction { optional binary account_id; optional binary transaction_id; optional binary amount; optional binary timestamp; } } | Defines the schema for transactional data archival.
Encoding Avro Schema | { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "account_id", "type": "string" }, { "name": "transaction_id", "type": "string" }] } | Optional Avro schema for additional serialization.
Encoding Metric Tag Values | Tags exposed as arrays of strings | Exposes all metric tags as arrays for compliance auditing.
Encoding Timestamp Format | RFC3339 format | Standard timestamp format for consistency.

Buffering Configuration

Field | Value | Description
Buffer Max Events | 5000 | Maximum number of buffered events, sized to handle large transactional datasets.

Batching Configuration

Field | Value | Description
Batch Max Bytes | 50000000 | Maximum batch size before flushing (50 MB) to optimize archival.
Batch Max Events | 2000 | Maximum events in a batch before flushing to balance throughput.
Batch Timeout Secs | 600 | Maximum batch age before flushing (10 minutes) for timely archival.

TLS Configuration

Field | Value | Description
TLS CA | <PEM-encoded CA certificate> | CA certificate in PEM format for secure connections.
TLS CRT | <PEM-encoded client certificate> | Client certificate in PEM format for authentication.
TLS Key | <PEM-encoded private key> | Private key in PEM format for secure communication.
Verify Certificate | True | Enables certificate verification to ensure valid, trusted certificates.
Verify Hostname | True | Ensures the hostname matches the TLS certificate for outgoing connections.

Framing

Field | Value | Description
Framing Character Delimited Delimiter | \n | Uses newline as the delimiter for byte sequences in archived data.
Framing Method | Newline Delimited | Each event is separated by a newline character, suitable for Parquet files.

Advanced Settings

Field | Value | Description
Endpoint | https://s3.us-east-1.amazonaws.com | AWS S3 endpoint for the us-east-1 region.
Compression | Gzip compression | Uses Gzip to compress archived data for cost efficiency.
Filename Extension | parquet | Specifies Parquet extension for archived objects.
Filename Time Format | %Y-%m-%d-%H | Timestamp format for object keys (e.g., compliance/source/2025/07/12/14).
Content Encoding | gzip | Specifies Gzip as the content encoding for archived objects.
Content Type | application/x-parquet | MIME type for Parquet files.
Grant Full Control |  | Grants full control to the compliance team's email for audit access.
Grant Read |  | Grants read access to the compliance team.
Grant Read Acp |  | Grants read ACL permissions to the compliance team.
Grant Write Acp |  | Grants write ACL permissions to the compliance team.
Server Side Encryption | AES-256 Encryption (SSE-KMS) | Uses KMS-managed encryption for compliance.
Ssekms Key Id | finsecure-kms-key-1234 | KMS key ID for encryption.
Storage Class | Glacier Deep Archive | Lowest-cost storage for long-term compliance archival.
Tags | Key: Project, Value: Compliance; Key: Environment, Value: Production | Tags for organizing and tracking archived objects.

Test Configuration

  • Send sample transactional data to the S3 bucket (finsecure-compliance-archive).

  • Check the bucket contents to confirm data is archived correctly in compliance/source/year=2025/month=07/day=12/ with the .parquet extension.

Troubleshooting

If issues occur with the AWS S3 Archival destination in Observo AI, use these steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Confirm that Bucket Name, Region, and Authentication fields match the AWS S3 archival setup.

    • Ensure the S3 bucket exists, is accessible, and is configured for archival in the specified region.

  • Check Authentication:

    • Auto Authentication: Verify IAM roles, shared credentials, or environment variables are correctly set up for archival access.

    • Manual Authentication: Ensure the access key and secret key are valid for the archival bucket.

    • Secret Authentication: Confirm the secret is accessible in Observo AI’s secure storage.

  • Validate Permissions:

    • Verify credentials have required permissions: s3:PutObject, s3:ListBucket, s3:GetBucketLocation.

    • For KMS encryption, ensure kms:GenerateDataKey and kms:Decrypt permissions are granted.

    • If using an IAM role, confirm it is correctly assumed for archival operations.

  • Network and Connectivity:

    • Check for firewall rules, VPC endpoint configurations, or proxy settings blocking access to AWS S3 archival services.

    • Test connectivity using the AWS CLI with similar proxy settings to confirm S3 archival access.

  • Common Error Messages:

    • "Access Denied": Indicates insufficient permissions. Verify IAM permissions for the archival bucket and KMS keys.

    • "Bucket does not exist": Check the bucket name and region. Ensure no certificate validation issues.

    • "Inaccessible host": May indicate TLS version mismatches or DNS issues. Confirm TLS compatibility and DNS settings.

  • Monitor Data:

    • Verify archived data is written to the S3 bucket by checking bucket contents.

    • Use Observo AI’s Analytics tab to monitor data volume and ensure expected archival throughput.

Issue | Possible Cause | Resolution
Data not archived | Incorrect bucket name or region | Verify bucket name and region
Authentication errors | Invalid credentials or role | Check authentication method and permissions
Connectivity issues | Firewall or proxy blocking archival access | Test network connectivity and VPC endpoints
"Access Denied" | Insufficient permissions | Verify IAM permissions for S3 and KMS
"Bucket does not exist" | Incorrect bucket name or certificate issues | Check bucket name and certificate settings
"Inaccessible host" | TLS or DNS issues | Ensure TLS compatibility and check DNS

Resources

For additional guidance on AWS S3 Archival setup, refer to these resources:

  • AWS Documentation:

    • Amazon S3 Bucket Configuration: Guide to configuring S3 buckets for archival storage.

    • Amazon S3 Permissions: Details on setting up permissions for archival operations.

    • AWS KMS Encryption: Information on configuring server-side encryption for archived data.

  • Best Practices:

    • Optimize bucket organization for archival with lifecycle policies to transition objects to Glacier or Deep Archive.

    • Enable versioning to protect archived data from overwrites or deletions.

    • Use S3 Storage Lens for insights into archival storage usage and cost optimization.
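The lifecycle-policy suggestion above can be applied with a rule like the following. A hedged boto3 sketch (the bucket name, rule ID, and 90-day threshold are illustrative; boto3 is imported lazily so the rule builder works without it):

```python
def deep_archive_rule(prefix: str, days: int = 90) -> dict:
    """One lifecycle rule transitioning objects under `prefix` to
    Glacier Deep Archive after `days` days (values are illustrative)."""
    return {
        "ID": f"archive-{prefix.rstrip('/') or 'all'}",
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        "Transitions": [{"Days": days, "StorageClass": "DEEP_ARCHIVE"}],
    }

def apply_lifecycle(bucket: str, rules: list) -> None:
    import boto3  # lazy import; only needed to apply the configuration
    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket, LifecycleConfiguration={"Rules": rules})

rule = deep_archive_rule("compliance/", days=90)
assert rule["Transitions"][0]["StorageClass"] == "DEEP_ARCHIVE"
```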
