AWS S3

AWS S3 is a scalable object storage solution for unstructured data, such as text, binary data, logs, and media files. It is commonly used for data archiving, backup, and analytics. This document outlines the parameters required for configuring AWS S3 as a destination for event storage.

Purpose

Observo AI’s AWS S3 destination is designed to create a cost-effective, high-performance data lake for security and observability data. It can store enriched and normalized telemetry in formats such as Parquet, making it analytics-ready for querying and long-term use. This destination supports compliance needs and integrates with tools using the Open Cybersecurity Schema Framework (OCSF). It also enables advanced features like natural language search and dynamic routing to simplify security operations.

Prerequisites

Before configuring the AWS S3 destination in Observo AI, ensure the following requirements are met to facilitate seamless data export:

  • Observo AI Platform Setup:

    • The Observo AI platform must be installed and operational, with support for AWS S3 as a data destination.

    • If exporting data in Parquet format (.parquet, .parq, .pqt), verify that the platform supports this format, potentially requiring specific configurations.

  • AWS Account and Permissions:

    • An active AWS account with access to the target S3 buckets is required.

    • Required IAM permissions for S3:

      • s3:PutObject

      • s3:ListBucket

      • s3:GetBucketLocation (to determine the bucket's region)

    • If using server-side encryption, additional permissions such as kms:GenerateDataKey and kms:Decrypt may be required for AWS KMS keys.

  • Authentication:

    • Prepare one of the following authentication methods:

      • Auto Authentication: Use IAM roles, shared credentials, environment variables, or a JSON credentials file.

      • Manual Authentication: Provide an AWS access key and secret key.

      • Secret Authentication: Use a stored secret within Observo AI's secure storage.

  • Network and Connectivity:

    • Ensure Observo AI can communicate with AWS S3 services. If using VPC endpoints for S3, verify their configuration.

    • Check for proxy settings or firewall rules that may affect connectivity to AWS endpoints.
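
The permissions listed above can be captured in a small IAM policy document. The sketch below builds one in Python; the bucket name my-bucket is a placeholder, and the two-statement split (object-level vs. bucket-level actions) is standard IAM practice rather than an Observo AI requirement. KMS statements would be added separately if server-side encryption with KMS is enabled:

```python
import json

def s3_destination_policy(bucket: str) -> dict:
    """Minimal IAM policy covering the S3 permissions listed above."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Object writes apply to the objects inside the bucket ...
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
            {
                # ... while listing and region lookup apply to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
        ],
    }

print(json.dumps(s3_destination_policy("my-bucket"), indent=2))
```

Note that s3:PutObject is granted on the object ARN (bucket/*) while s3:ListBucket and s3:GetBucketLocation are granted on the bucket ARN; attaching all three actions to a single resource is a common cause of "Access Denied" errors.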

Prerequisite | Description | Notes
Observo AI Platform | Must be installed and support S3 destinations | Verify Parquet support if needed
AWS Account | Active account with S3 access | Ensure bucket exists and is accessible
IAM Permissions | Required for S3 operations | Include KMS permissions if using encryption
Authentication | Auto, Manual, or Secret | Prepare credentials accordingly
Network | Connectivity to AWS services | Check VPC endpoints and proxies

Integration

To configure AWS S3 as a destination in Observo AI, follow these steps to set up and test the data flow:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destination button and select Create New.

    • Choose AWS S3 from the list of available destinations to begin configuration.

  2. General Settings:

    • Name: Provide a unique identifier for the destination, e.g., s3-destination-1.

    • Description (Optional): Add a description for the destination.

    • Bucket: Enter the name of the target S3 bucket.

      Example

      my-bucket

    • Region: Specify the AWS region of the S3 bucket.

      Example

      us-east-1

    • Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: %Y/%m/%d

      Examples

      date=%F/hour=%H

      year=%Y/month=%m/day=%d

      application_id={{ application_id }}/date=%F

      %Y/%m/%d

      date=%F

    • Filename Append UUID to Timestamp (True): Whether to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-throughput use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`

    • ACL: Canned ACL to apply to the created objects.

      Option | Description
      Authenticated Users Read access | Read access granted to authenticated AWS users
      EC2 readable | Allows Amazon EC2 instances to read objects
      FULL_CONTROL for object and bucket owner | Grants full control to both object and bucket owners
      Read only for bucket owner | Only the bucket owner can read objects
      Logs Writeable Bucket | Allows write access for S3 log delivery
      Bucket/Object Owner All Access | Grants full access to object and bucket owner
      AllUsers readable | Public read access for everyone on the internet
      AllUsers Read Write | Public read/write access for everyone globally
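
To make the key-related settings above concrete, the sketch below assembles an object key from a strftime Key Prefix, the default epoch-seconds time format, and an optional UUID v4 suffix. The build_object_key helper is illustrative, not part of the product:

```python
import uuid
from datetime import datetime, timezone

def build_object_key(key_prefix, append_uuid=True, time_format="%s", now=None):
    """Sketch of how an S3 object key is assembled from the settings above."""
    now = now or datetime.now(timezone.utc)
    # Expand strftime specifiers in the prefix, e.g. date=%F -> date=2022-07-18.
    prefix = now.strftime(key_prefix)
    # %s (epoch seconds) is not portable across platforms, so handle it directly.
    stamp = str(int(now.timestamp())) if time_format == "%s" else now.strftime(time_format)
    key = f"{prefix}{stamp}"
    if append_uuid:
        key += f"-{uuid.uuid4()}"  # guarantees uniqueness at high throughput
    return key

when = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(build_object_key("date=%F/", now=when))
# e.g. date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547
```

With append_uuid=False the same call yields date=2022-07-18/1658176486, matching the example in the Filename Append UUID to Timestamp description.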

  3. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs share two sub-options:

      • Encoding Metric Tag Values (Select): Controls how metric tag values are encoded.

        • Tag values exposed as single strings (default). Only the last non-bare value of tags is displayed with the metric.

        • Tags exposed as arrays of strings (full). All metric tags are exposed as separate assignments.

      • Encoding Timestamp Format (Select): RFC3339 format (default) or UNIX format.

      Codec-specific sub-options:

      • JSON Encoding:

        • Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: No codec-specific sub-options.

      • Apache Avro Encoding:

        • Avro Schema: The Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: No codec-specific sub-options.

      • No encoding: No codec-specific sub-options.

      • Plain text encoding: No codec-specific sub-options.

      • Parquet:

        • Include Raw Log (False): Capture the complete log message as an additional field (observo_record) alongside the given schema. When enabled, the Parquet file contains a field named "observo_record" in addition to the fields in the Parquet schema.

        • Parquet Schema: The Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF):

        • CEF Device Event Class ID: A unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure

        • CEF Device Product: The product name that generated the event (maximum 63 characters). Example: Log Analyzer

        • CEF Device Vendor: The vendor name that produced the event (maximum 63 characters). Example: Observo

        • CEF Device Version: The version of the product that generated the event (maximum 31 characters). Example: 1.0.0

        • CEF Extensions (Add): Custom key-value pairs for additional event data fields in CEF format.

        • CEF Name: A human-readable description of the event (maximum 512 characters). Example: cef.name

        • CEF Severity: The importance of the event, from 0 (lowest) to 10 (highest). Example: 5

        • CEF Version (Select): The CEF specification version to use for formatting: 0.1 or 1.x.

      • CSV Format:

        • CSV Fields (Add): The field names to include as columns in the CSV output, in order. Examples: timestamp, host, message

        • CSV Buffer Capacity (Optional): The internal buffer size (in bytes) used when writing CSV data. Example: 8192

        • CSV Delimiter (Optional): The character that separates fields in the CSV output. Example: ,

        • Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.

        • CSV Escape Character (Optional): The character used to escape quotes when double-quote escaping is disabled.

        • CSV Quote Character (Optional): The character used for quoting fields in the CSV output. Example: "

        • CSV Quoting Style (Optional): Controls when field values are wrapped in quote characters. Options: Always quote all fields; Quote only when necessary; Never use quotes; Quote all non-numeric fields.

      • Protocol Buffers:

        • Protobuf Message Type: The fully qualified message type name for Protobuf serialization. Example: package.Message

        • Protobuf Descriptor File: The path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

      • Graylog Extended Log Format (GELF): No codec-specific sub-options.
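
As a concrete illustration of the codec choice, the default JSON and Newline Delimited JSON encodings differ mainly in how events are joined: NDJSON writes one compact JSON document per line, which is the form most S3-backed query engines expect. A minimal sketch with invented event fields:

```python
import json

events = [
    {"timestamp": "2022-07-18T20:34:46Z", "host": "web-1", "message": "login ok"},
    {"timestamp": "2022-07-18T20:34:47Z", "host": "web-2", "message": "login failed"},
]

# Newline Delimited JSON Encoding: one compact JSON document per line.
ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in events)
print(ndjson)

# Decoding is symmetric: split on newlines and parse each line.
decoded = [json.loads(line) for line in ndjson.splitlines()]
assert decoded == events
```

Plain JSON Encoding of the same list would produce a single JSON array instead, which downstream readers must parse whole rather than streaming line by line.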

  4. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

      Option | Description
      Adaptive concurrency | Adjusts parallelism based on system load
      A fixed concurrency of 1 | Processes one task at a time only

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.

    • Request Retry Initial Backoff Secs: The amount of time, in seconds, to wait before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended not to lower this value below the service’s internal timeout, as this could create orphaned requests and duplicate data downstream. Default: 60.
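
The retry settings combine into a backoff schedule: the first retry waits Request Retry Initial Backoff Secs, subsequent waits follow the Fibonacci sequence, and every wait is capped at Request Retry Max Duration Secs. The sketch below assumes the Fibonacci terms scale the initial backoff, which is a plausible reading rather than a documented formula:

```python
def retry_backoffs(initial_secs, max_duration_secs, attempts):
    """Fibonacci-style backoff schedule, capped at max_duration_secs.

    Assumption: each wait is a Fibonacci multiple of the initial backoff.
    """
    waits = []
    a, b = 1, 1  # Fibonacci terms: 1, 1, 2, 3, 5, 8, ...
    for _ in range(attempts):
        waits.append(min(a * initial_secs, max_duration_secs))
        a, b = b, a + b
    return waits

print(retry_backoffs(initial_secs=1, max_duration_secs=3600, attempts=8))
# → [1, 1, 2, 3, 5, 8, 13, 21]
```

With the defaults (initial backoff 1 s, cap 3600 s), the cap only matters after roughly the seventeenth retry, so unlimited retries will eventually settle into hourly attempts.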

  5. TLS Configuration (Optional):

    • TLS CA: Provide the CA certificate in PEM format.

    • TLS CRT: Provide the client certificate in PEM format.

    • TLS Key: Provide the private key in PEM format.

    • Verify Certificate (False): Enables certificate verification. Certificates must be valid: not expired and issued by a trusted issuer. Verification operates hierarchically, checking the certificate, its issuer, and so on up to a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • Verify Hostname: Enables hostname verification. If enabled, the hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the remote hostname.

  6. Batching Requirements (Default):

    • Batch Max Bytes: The maximum size of a batch, in bytes, before it is flushed. Default: 100000000

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: 1000

    • Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 300
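
The three batching limits interact: a batch is flushed as soon as any one of them is reached. The Batcher class below is a simplified, single-threaded illustration of that trigger logic, not the product's implementation:

```python
import time

class Batcher:
    """Flush when any limit is reached: max bytes, max events, or max age."""

    def __init__(self, max_bytes=100_000_000, max_events=1000, timeout_secs=300):
        self.max_bytes, self.max_events, self.timeout_secs = max_bytes, max_events, timeout_secs
        self.events, self.size, self.started = [], 0, None

    def add(self, payload: bytes) -> bool:
        """Add an event; return True if the batch should now be flushed."""
        if self.started is None:
            self.started = time.monotonic()  # batch age starts at first event
        self.events.append(payload)
        self.size += len(payload)
        return self.should_flush()

    def should_flush(self) -> bool:
        aged = self.started is not None and time.monotonic() - self.started >= self.timeout_secs
        return self.size >= self.max_bytes or len(self.events) >= self.max_events or aged

b = Batcher(max_bytes=1_000, max_events=3, timeout_secs=300)
assert b.add(b"x" * 100) is False   # under all limits
assert b.add(b"y" * 950) is True    # byte limit (1050 >= 1000) trips first
```

In practice the timeout matters most for low-volume sources: without it, a trickle of events could sit unflushed indefinitely while waiting for the byte or event limits.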

  7. Acknowledgments:

    • Acknowledgements Enabled (False): Whether end-to-end acknowledgements are enabled. When enabled, any source connected to this sink that supports end-to-end acknowledgements will wait for the sink to acknowledge events before acknowledging them at the source.

  8. Framing (Default):

    • Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: Empty

    • Framing Method: The framing method. Default: Newline Delimited

      Option | Description
      Raw Event data (not delimited) | No framing is applied. Best when each event is self-contained.
      Single Character Delimited | Each event is separated by a specific single character (ASCII value)
      Prefixed with Byte Length | Each event is prefixed with its byte length, ensuring precise separation between events
      Newline Delimited | Each event is followed by a newline character (\n), commonly used for logging formats
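
The two most common framing methods can be sketched directly. The length prefix below uses a 4-byte big-endian header purely for illustration; the actual wire format of the Prefixed with Byte Length option is not specified here:

```python
import struct

def frame_newline(events):
    """Newline Delimited: each event is followed by a newline character."""
    return b"".join(e + b"\n" for e in events)

def frame_length_prefixed(events):
    """Prefixed with Byte Length: each event is preceded by its size
    (4-byte big-endian here, an illustrative choice)."""
    return b"".join(struct.pack(">I", len(e)) + e for e in events)

def unframe_length_prefixed(data):
    """Inverse of frame_length_prefixed: walk the length headers."""
    events, pos = [], 0
    while pos < len(data):
        (length,) = struct.unpack_from(">I", data, pos)
        pos += 4
        events.append(data[pos:pos + length])
        pos += length
    return events

msgs = [b'{"a":1}', b'{"b":2}']
assert frame_newline(msgs) == b'{"a":1}\n{"b":2}\n'
assert unframe_length_prefixed(frame_length_prefixed(msgs)) == msgs
```

Length prefixing keeps events containing embedded newlines intact, which is why it suits binary payloads better than newline delimiting.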

  9. Authentication (Optional):

    • Auth Access Key Id: Enter the AWS access key ID.

      Example

      AKIAIOSFODNN7EXAMPLE

    • Auth Secret Access Key: Enter the AWS secret access key.

      Example

      wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY

    • Auth Assume Role: Enter the ARN of an IAM role to assume.

      Example

      arn:aws:iam::123456789098:role/my_role

    • Auth Region: Enter the AWS region to send STS requests to. Defaults to the configured region for the service itself.

      Example

      us-east-1

    • Auth Load Timeout Secs: Timeout for successfully loading any credentials, in seconds. Relevant when the default credentials chain or assume_role is used.

      Example

      30

    • Auth Imds Connect Timeout Seconds (Optional): Connect timeout for IMDS. Default: Empty

    • Auth Imds Max Attempts: Enter the number of IMDS retries for fetching tokens and metadata. Default: None

    • Auth Imds Read Timeout Seconds: Read timeout for IMDS. Default: None

  10. Buffering Configuration (Optional):

    • Buffer Type: Specifies the buffering mechanism for event delivery.

      • Memory: High-performance, in-memory buffering.

        • Max Events: The maximum number of events allowed in the buffer. Default: 500

        • When Full: Event handling behavior when the buffer is full. Default: Block

          • Block: Wait for free space in the buffer. This applies backpressure up the topology, signalling that sources should slow down the acceptance/consumption of events. No data is lost, but data piles up at the edge.

          • Drop Newest: Drop the event instead of waiting for free space in the buffer. The event is intentionally dropped. Typically used when performance is the highest priority and it is preferable to temporarily lose events rather than slow the acceptance/consumption of events.

      • Disk: Lower-performance, less costly, on-disk buffering.

        • Max Bytes Size: The maximum number of bytes allowed in the buffer. Must be at least 268435488.

        • When Full: Event handling behavior when the buffer is full. Default: Block, with the same Block / Drop Newest options described above.

  11. Advanced Settings (Optional):

    • Endpoint: Custom endpoint for use with AWS-compatible services.

      Example

      http://127.0.0.0:5000/path/to/service

    • Compression: Compression algorithm to use for the request body. Default: Gzip compression

      Option | Description
      Gzip compression | DEFLATE compression with headers for file storage
      No compression | Data stored and transmitted in original form
      Zlib compression | DEFLATE format with minimal wrapper and checksums

    • Filename Extension: The filename extension to use in the object key. This overrides setting the extension based on the configured compression.

      Example

      json

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp (in epoch seconds) reflecting when the objects are sent to S3. The resulting object key is the key prefix followed by the formatted timestamp, e.g. date=2022-07-18/1658176486. Supports strftime specifiers. Default: %s

      Example

      %s

    • Content Encoding: Overrides what content encoding has been applied to the object. Directly comparable to the Content-Encoding HTTP header. If not specified, the compression scheme used dictates this value.

      Example

      gzip

    • Content Type: Overrides the MIME type of the object. Directly comparable to the Content-Type HTTP header. If not specified, the compression scheme used dictates this value. When compression is set to none, the value text/x-log is used.

      Example

      application/gzip

    • Grant Full Control: Grants READ, READ_ACP, and WRITE_ACP permissions on the created objects to the named [grantee]. This allows the grantee to read the created objects and their metadata, as well as read and modify the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Read: Grants READ permissions on the created objects to the named [grantee]. This allows the grantee to read the created objects and their metadata.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Read Acp: Grants READ_ACP permissions on the created objects to the named [grantee]. This allows the grantee to read the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Grant Write Acp: Grants WRITE_ACP permissions on the created objects to the named [grantee]. This allows the grantee to modify the ACL on the created objects.

      Examples

      79a59df900b949e55d96a1e698fbacedfd6e09d98eacf8f8d5218e7cd47ef2be

      http://acs.amazonaws.com/groups/global/AllUsers

    • Server Side Encryption: The Server-side Encryption algorithm used when storing these objects.

      Select from options:

      AES-256 Encryption (SSE-S3)

      AES-256 Encryption managed by AWS KMS (SSE-KMS / SSE-C)

    • Ssekms Key Id: Specifies the ID of the AWS Key Management Service (AWS KMS) symmetric customer master key (CMK) used for the created objects. Only applies when server_side_encryption is configured to use KMS. If not specified, Amazon S3 uses the AWS managed CMK to protect the data.

      Example

      abcd1234

    • Storage Class: The S3 Storage Class for the created objects. Default: Standard Redundancy

      Option | Description
      Glacier Deep Archive | Lowest-cost, long-term archival storage
      Glacier Flexible Retrieval | Low-cost archive with flexible retrieval speeds
      Intelligent Tiering | Automatically moves data between cost-optimized tiers
      Infrequently Accessed (Single Availability Zone) | Low-cost storage in a single availability zone
      Reduced Redundancy | Lower durability at lower cost
      Standard Redundancy | High-durability, multi-zone general-purpose storage
      Infrequently Accessed | Low-cost, high-durability storage for less frequently accessed data

    • Tags: A list of tag key-value pairs. (Add key-value pairs as needed)

  12. Save and Test Configuration:

    • Save the configuration settings.

    • Send sample data to the S3 bucket and verify that it is stored correctly.

Example Scenarios

HealthCarePlus, a fictitious healthcare enterprise, manages a network of hospitals and telehealth services, generating extensive patient data, compliance logs, and audit trails in JSON and Parquet formats. To support regulatory compliance and long-term analysis, HealthCarePlus exports this telemetry to an Amazon S3 bucket named healthcareplus-data-archive using the Observo AI platform. The bucket resides in the us-east-1 AWS region, and an IAM role with the necessary permissions ensures secure data writes. The configuration below sets up the AWS S3 destination in Observo AI, following the required fields from the Integration section above, so HealthCarePlus can centralize data for compliance and analytics.

Standard AWS S3 Destination Setup

Here is a standard AWS S3 Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

Field | Value | Description
Name | healthcareplus-s3-archive | Unique identifier for the S3 destination
Description | Export patient and compliance data to S3 for HealthCarePlus | Optional description of the destination
Bucket | healthcareplus-data-archive | Name of the target S3 bucket
Region | us-east-1 | AWS region of the S3 bucket
Key Prefix | year=%Y/month=%m/day=%d/ | Partitions objects by year, month, and day
ACL | Bucket/Object Owner All Access | Grants full access to bucket and object owner

Encoding

Field | Value | Description
Encoding Codec | Parquet | Encodes events in Parquet format for structured data
Parquet Include Raw Log | True | Includes complete log as observo_record field
Parquet Schema | message root { optional binary patient_id; optional binary timestamp; optional binary event_type; optional binary diagnosis_code; } | Parquet schema for patient data
Encoding Metric Tag Values | Single | Exposes metric tag values as single strings
Encoding Timestamp Format | RFC3339 | Formats timestamps in RFC3339 format

Request Configuration

Field | Value | Description
Request Concurrency | Adaptive concurrency | Adjusts parallelism based on system load
Request Rate Limit Duration Secs | 1 | Time window for rate limiting
Request Rate Limit Num | 1000 | Maximum requests within the time window
Request Retry Attempts | 3 | Maximum retries for failed requests
Request Retry Initial Backoff Secs | 1 | Initial wait time before first retry
Request Retry Max Duration Secs | 3600 | Maximum wait time between retries
Request Timeout Secs | 60 | Time before aborting a request

TLS Configuration

Field | Value | Description
TLS CA | /opt/observo/certs/ca.crt | Path to CA certificate for server verification
TLS CRT | /opt/observo/certs/healthcareplus.crt | Path to client certificate for authentication
TLS Key | /opt/observo/certs/healthcareplus.key | Path to private key for authentication
Verify Certificate | True | Enables certificate verification
Verify Hostname | True | Verifies hostname in the TLS certificate

Batching Configuration

Field | Value | Description
Batch Max Bytes | 100000000 | Maximum batch size (100 MB) before flushing
Batch Max Events | 1000 | Maximum number of events in a batch
Batch Timeout Secs | 300 | Maximum age of a batch before flushing

Acknowledgements

Field | Value | Description
Acknowledgements Enabled | True | Enables end-to-end acknowledgements for data delivery

Framing

Field | Value | Description
Framing Character Delimited Delimiter | Empty | Not used, as newline-delimited framing is selected
Framing Method | Newline Delimited | Frames events with newline characters

Authentication

Field | Value | Description
Auth Access Key Id | AKIAHEALTHCAREPLUS123 | AWS access key ID for authentication
Auth Secret Access Key | wJalrXUtnHEALTHCAREPLUSKEY | AWS secret access key for authentication
Auth Assume Role | arn:aws:iam::123456789012:role/healthcareplus-s3-role | IAM role ARN for S3 access
Auth Region | us-east-1 | AWS region for STS requests
Auth Load Timeout Secs | 30 | Timeout for loading credentials
Auth Imds Connect Timeout Seconds | 5 | Connect timeout for IMDS
Auth Imds Max Attempts | 3 | Number of IMDS retries for fetching tokens
Auth Imds Read Timeout Seconds | 5 | Read timeout for IMDS

Buffering Configuration

Field | Value | Description
Buffer Type | Disk | Uses disk-based buffering for reliability
Max Bytes Size | 268435488 | Maximum buffer size (256 MB)
When Full | Block | Applies backpressure when buffer is full

Advanced Settings

Field | Value | Description
Endpoint | None | Uses standard AWS S3 endpoints
Compression | Gzip compression | Applies Gzip compression to request body
Filename Extension | .parquet | Specifies Parquet extension for object keys
Filename Time Format | %s | Timestamps in seconds since Unix epoch
Content Encoding | gzip | Specifies Gzip content encoding
Content Type | application/x-parquet | Specifies Parquet MIME type
Grant Full Control | None | No additional grantees for full control
Grant Read | None | No additional grantees for read access
Grant Read Acp | None | No additional grantees for read ACL access
Grant Write Acp | None | No additional grantees for write ACL access
Server Side Encryption | AES-256 Encryption (SSE-S3) | Uses S3-managed AES-256 encryption
Ssekms Key Id | None | Not used, as SSE-S3 is selected
Storage Class | Standard Redundancy | High durability, multi-zone storage
Tags | environment=production, data_type=healthcare | Key-value pairs for object metadata

Additional Configuration

  • Save and Test: Save the configuration and send sample patient data to the healthcareplus-data-archive bucket.

  • Verify data presence in the S3 bucket using the Observo AI Analytics tab to confirm successful data flow.

Outcome

With this configuration, HealthCarePlus successfully exports patient data, compliance logs, and audit trails to its S3 bucket via Observo AI, enabling secure, long-term storage for regulatory compliance and advanced analytics, thereby enhancing operational efficiency and data governance.

Troubleshooting

If issues arise with the AWS S3 destination in Observo AI, use the following steps to diagnose and resolve them:

  • Verify Configuration Settings:

    • Ensure all fields, such as Bucket Name, Region, and Authentication, are correctly entered and match the AWS setup.

    • Confirm that the S3 bucket exists and is accessible in the specified region.

  • Check Authentication:

    • For Auto Authentication, verify that IAM roles, shared credentials, or environment variables are correctly configured.

    • For Manual Authentication, ensure the access key and secret key are valid.

    • For Secret Authentication, confirm the secret is accessible in Observo AI.

  • Validate Permissions:

    • Ensure the credentials have the required permissions:

      • s3:PutObject, s3:ListBucket, s3:GetBucketLocation.

      • If using KMS encryption, verify kms:GenerateDataKey and kms:Decrypt permissions.

    • Check that the IAM role (if used) is correctly assumed.

  • Network and Connectivity:

    • Check for firewall rules, VPC endpoint configurations, or proxy settings that may block access to AWS S3 services.

    • Test connectivity using the AWS CLI with similar proxy configurations to verify access to S3.

  • Common Error Messages:

    • "Access Denied": Indicates insufficient permissions. Verify IAM permissions for the bucket and KMS keys (if used).

    • "Bucket does not exist": Check the bucket name and region. Ensure there are no certificate validation issues.

    • "Inaccessible host": May indicate TLS version mismatches or DNS issues. Ensure the host supports the required TLS version and check DNS settings.

  • Monitor Data:

    • Verify that data is being written to the S3 bucket by checking the bucket contents.

    • Use the Observo AI Analytics tab to monitor data volume and ensure expected throughput.

Issue | Possible Cause | Resolution
Data not written | Incorrect bucket name or region | Verify bucket name and region
Authentication errors | Invalid credentials or role | Check authentication method and permissions
Connectivity issues | Firewall or proxy blocking access | Test network connectivity and VPC endpoints
"Access Denied" | Insufficient permissions | Verify IAM permissions for S3 and KMS
"Bucket does not exist" | Incorrect bucket name or certificate issues | Check bucket name and certificate settings
"Inaccessible host" | TLS or DNS issues | Ensure TLS compatibility and check DNS

Resources

For additional guidance and detailed information, refer to the following resources:

  • AWS Documentation:

  • Best Practices:

    • Refer to general best practices for integrating S3 with data streaming platforms, such as optimizing bucket organization, enabling versioning, and using lifecycle policies for cost optimization.
