GCP Cloud Storage Archival

The Observo AI GCP Cloud Storage Archival destination enables scalable, cost-effective storage of telemetry data such as logs, metrics, and traces in Google Cloud Storage. It supports formats such as JSON, CSV, and Parquet, with secure authentication, compression, and customizable access controls for observability and compliance.

Purpose

The Observo AI GCP Cloud Storage Archival destination enables users to send telemetry data, including logs, metrics, and traces, to Google Cloud Storage for scalable, cost-effective storage and further analysis. This destination supports flexible data formats and integrates seamlessly with Google Cloud's ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.

Prerequisites

Before configuring the GCP Cloud Storage Archival destination in Observo AI, ensure the following requirements are met:

  • Google Cloud Project:

    • A Google Cloud project must be created and linked to your GCP Cloud Storage instance. It’s recommended to use a dedicated project for isolation, but an existing project can be used if permissions are correctly configured (Create a Google Cloud Project).

    • The Cloud Storage API must be enabled in the project (Enable Cloud Storage API).

    • Configure Essential Contacts for notifications to receive updates from Google Cloud (Manage Notification Contacts).

  • Authentication:

    • A service account with write access to the target bucket (for example, the Storage Object Admin role) is required. Download its JSON key file; the path to this file is supplied as the Credentials Path during configuration.

  • GCP Cloud Storage Bucket:

    • Ensure an active GCP Cloud Storage bucket is available for data storage. The bucket must be accessible and properly configured for write operations (Creating Storage Buckets).

    • Verify the bucket’s region aligns with your performance and compliance requirements.

Integration

To configure GCP Cloud Storage Archival as a destination in Observo AI, follow these steps:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destinations button and select Create New.

    • Choose GCP Cloud Storage from the list of available destinations to begin configuration.

    • Set Use as Archival to true

    • Select GCP Cloud Storage Archival

  2. General Settings:

    • Name: Add a unique identifier such as gcp-cloud-storage-1.

    • Description (Optional): Provide a description for the destination.

    • Bucket: The GCS bucket name.

      Example

      my-bucket

    • Credentials Path: The path to the service account credentials JSON file used for authentication.

      Example

      /my/path/credentials.json
    • Compression (Optional): Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression

      | Option | Description |
      | :--- | :--- |
      | Gzip compression | Widely used DEFLATE-based compression format |
      | No compression | No compression applied to data |
      | Zlib compression | DEFLATE-based, lightweight compression library |
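The practical difference between the two DEFLATE-based options can be sketched with Python's standard gzip and zlib modules (the event payload below is purely illustrative):

```python
import gzip
import json
import zlib

# Build a newline-delimited JSON payload of repetitive log events.
events = [json.dumps({"msg": "payment processed", "status": 200}) for _ in range(1000)]
payload = "\n".join(events).encode()

gzipped = gzip.compress(payload, 6)  # gzip container: DEFLATE plus header and CRC
zlibbed = zlib.compress(payload, 6)  # zlib container: DEFLATE with a lighter header

# Both DEFLATE formats shrink repetitive telemetry dramatically; at the same
# compression level, zlib output is slightly smaller than gzip because its
# container framing overhead is lower.
print(len(gzipped) < len(payload))   # True
print(len(zlibbed) < len(gzipped))   # True
```

Repetitive telemetry compresses extremely well, which is why enabling compression is usually worthwhile for archival storage.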

    • Acl (Optional): The predefined ACL to apply to created objects. For more information, see Predefined ACLs. Default: Bucket/object private to project

      | Option | Description |
      | :--- | :--- |
      | Bucket/object can be read by authenticated users | Any authenticated GCP user can read the object |
      | Object and bucket owner granted OWNER permission | The owner of the bucket and the object will have full control (owner access) over the object |
      | Object is private to bucket owner | Only the bucket owner can access the object |
      | Bucket/object are private | Both the bucket and object are private to the owner |
      | Bucket/object private to project | Access is restricted to the project and its members |
      | Bucket/object can be read publicly | Anyone can access the object without authentication |

    • Filename Append UUID to Timestamp (False): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-performance use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp reflecting when the objects are sent to Cloud Storage, so the resulting object key is the key prefix joined with the formatted timestamp, such as date=2022-07-18/1658176486. That example uses a key_prefix of date=%F/ and the timestamp Mon Jul 18 2022 20:34:44 GMT+0000, with filename_time_format set to %s, which renders timestamps as seconds since the Unix epoch. The common strftime specifiers found in most languages are supported. When set to an empty string, no timestamp is appended to the key prefix. Default: %s

      Example

      %s

    • Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: year=%Y/month=%m/day=%d/

      Examples

      date=%F/hour=%H/

      year=%Y/month=%m/day=%d/

      application_id={{ application_id }}/date=%F/

      %Y/%m/%d/

      date=%F/
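How Key Prefix, Filename Time Format, and the append-UUID option combine into an object key can be sketched in Python. This is a hypothetical helper, not Observo AI code; `%s` is handled explicitly because it is not a portable strftime specifier:

```python
import uuid
from datetime import datetime, timezone

def object_key(key_prefix: str, time_format: str, append_uuid: bool,
               now: datetime) -> str:
    """Compose an object key from a strftime key prefix and a timestamp."""
    prefix = now.strftime(key_prefix)        # e.g. date=%F/ -> date=2022-07-18/
    if time_format == "%s":
        stamp = str(int(now.timestamp()))    # seconds since the Unix epoch
    elif time_format:
        stamp = now.strftime(time_format)
    else:
        stamp = ""                           # empty format: key prefix only
    if append_uuid and stamp:
        stamp += f"-{uuid.uuid4()}"          # uniqueness for high-volume writers
    return prefix + stamp

ts = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(object_key("date=%F/", "%s", False, ts))  # date=2022-07-18/1658176486
```

With append_uuid set to True, the same call yields a key like date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547.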

    • Storage Class (Optional): The storage class for created objects. For more information, see the storage classes documentation. Default: Standard

      | Option | Description |
      | :--- | :--- |
      | Archive | Cheapest; for data that is rarely accessed (long-term storage) |
      | Coldline | Low-cost storage for infrequently accessed data, available within milliseconds |
      | Nearline | Suitable for data that is accessed less than once a month |
      | Standard | For frequently accessed data, offering low latency and high availability |

  3. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs share the following common sub-options:

      • Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Encoding Metric Tag Values (Select): Controls how metric tag values are encoded. Options: Tag values will be exposed as single strings (default), or Tags exposed as arrays of strings. Note: When set to single, only the last non-bare value of tags will be displayed with the metric. When set to full, all metric tags will be exposed as separate assignments.

      • Encoding Timestamp Format (Select): RFC3339 format or UNIX format.

      Codec-specific sub-options:

      • JSON Encoding

        • Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: No codec-specific sub-options.

      • Apache Avro Encoding

        • Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: No codec-specific sub-options.

      • No encoding: No codec-specific sub-options.

      • Plain text encoding: No codec-specific sub-options.

      • Parquet

        • Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema. In addition to the Parquet schema, there will be a field named "observo_record" in the Parquet file.

        • Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF)

        • CEF Device Event Class ID: Provide a unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure

        • CEF Device Product: Specify the product name that generated the event (maximum 63 characters). Example: Log Analyzer

        • CEF Device Vendor: Specify the vendor name that produced the event (maximum 63 characters). Example: Observo

        • CEF Device Version: Specify the version of the product that generated the event (maximum 31 characters). Example: 1.0.0

        • CEF Extensions (Add): Define custom key-value pairs for additional event data fields in CEF format.

        • CEF Name: Provide a human-readable description of the event (maximum 512 characters). Example: cef.name

        • CEF Severity: Indicate the importance of the event with a value from 0 (lowest) to 10 (highest). Example: 5

        • CEF Version (Select): Specify which version of the CEF specification to use for formatting: CEF specification version 0.1 or CEF specification version 1.x.

      • CSV Format

        • CSV Fields (Add): Specify the field names to include as columns in the CSV output and their order. Examples: timestamp, host, message

        • CSV Buffer Capacity (Optional): Set the internal buffer size (in bytes) used when writing CSV data. Example: 8192

        • CSV Delimiter (Optional): Set the character that separates fields in the CSV output. Example: ,

        • Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.

        • CSV Escape Character (Optional): Set the character used to escape quotes when double quote escapes are disabled.

        • CSV Quote Character (Optional): Set the character used for quoting fields in the CSV output. Example: "

        • CSV Quoting Style (Optional): Control when field values should be wrapped in quote characters. Options: Always quote all fields, Quote only when necessary, Never use quotes, or Quote all non-numeric fields.

      • Protocol Buffers

        • Protobuf Message Type: Specify the fully qualified message type name for Protobuf serialization. Example: package.Message

        • Protobuf Descriptor File: Specify the path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

      • Graylog Extended Log Format (GELF): No codec-specific sub-options.
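As an illustration of how the CSV sub-options map onto standard CSV behavior, here is a sketch using Python's csv module (the field names and event contents are made up):

```python
import csv
import io

fields = ["timestamp", "host", "message"]          # CSV Fields: columns and order
event = {"timestamp": "2025-07-13T10:00:00Z",
         "host": "web-1",
         "message": 'disk is 92% full, marked "critical"'}

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=fields,
    delimiter=",",              # CSV Delimiter
    quotechar='"',              # CSV Quote Character
    doublequote=True,           # Enable Double Quote Escapes: "" instead of \"
    quoting=csv.QUOTE_MINIMAL,  # CSV Quoting Style: quote only when necessary
)
writer.writeheader()
writer.writerow(event)
print(buf.getvalue(), end="")
```

Only the message field gets quoted (it contains the delimiter), and its embedded quotes are doubled, matching the default double-quote escape behavior described above.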

  4. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

      | Option | Description |
      | :--- | :--- |
      | Adaptive concurrency | Adjusts parallelism based on system load |
      | A fixed concurrency of 1 | Processes one task at a time only |

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.

    • Request Retry Initial Backoff Secs: The amount of time to wait in seconds before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
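The retry timing described above can be sketched as a small generator (a simplified model, not Observo AI's implementation):

```python
def retry_backoffs(initial: float, max_duration: float, attempts: int):
    """Yield Fibonacci-spaced wait times, capped at max_duration seconds."""
    a, b = initial, initial
    for _ in range(attempts):
        yield min(a, max_duration)   # Request Retry Max Duration Secs caps each wait
        a, b = b, a + b              # Fibonacci progression of backoffs

# With the defaults (initial backoff 1s, cap 3600s), the first six waits are:
print(list(retry_backoffs(1, 3600, 6)))  # [1, 1, 2, 3, 5, 8]
```

Lowering max_duration simply flattens the tail of the sequence once the Fibonacci values exceed the cap.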

  5. Batching Requirements (Default):

    • Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 1

    • Batch Max Bytes: The maximum size of a batch that will be processed by a sink. This is based on the uncompressed size of the batched events, before they are serialized / compressed. Default: Empty

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: Empty
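The interaction between Batch Max Bytes and Batch Max Events can be sketched like this (a simplified model; real batching also flushes on the timeout, which is omitted here):

```python
def batch_events(events, max_events=None, max_bytes=None):
    """Group serialized events into batches bounded by count and size."""
    batch, size = [], 0
    for event in events:
        event_size = len(event)  # uncompressed size, before compression
        full = (max_events is not None and len(batch) >= max_events) or \
               (max_bytes is not None and size + event_size > max_bytes)
        if batch and full:
            yield batch              # flush when either limit would be exceeded
            batch, size = [], 0
        batch.append(event)
        size += event_size
    if batch:
        yield batch  # final partial batch (flushed by the timeout in practice)

events = [b"x" * 40] * 5
print([len(b) for b in batch_events(events, max_events=2)])   # [2, 2, 1]
print([len(b) for b in batch_events(events, max_bytes=100)])  # [2, 2, 1]
```

Whichever limit is reached first triggers the flush, which is why both can be left empty (unbounded) and the timeout alone will drive flushing.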

  6. Acknowledgement (False):

    • Acknowledgements Enabled (False): Whether or not end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the destination before acknowledging them at the source.

  7. Framing (Default):

    • Framing Method: The framing method. Default: Newline Delimited

      | Option | Description |
      | :--- | :--- |
      | Raw Event data (not delimited) | No framing is applied. This method is best when each event is self-contained. |
      | Single Character Delimited | Each event is separated by a specific single character (ASCII value) |
      | Prefixed with Byte Length | Each event is prefixed with its byte length, ensuring precise separation between events |
      | Newline Delimited | Each event is followed by a newline character (\n), which is commonly used for logging formats |

    • Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: (Empty)
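The framing methods can be sketched in a few lines of Python (hypothetical helpers; the 4-byte big-endian length prefix is an illustrative assumption, not a documented wire format):

```python
import struct

def frame_newline(events: list[bytes]) -> bytes:
    """Newline Delimited: each event is followed by \\n."""
    return b"".join(e + b"\n" for e in events)

def frame_char_delimited(events: list[bytes], delimiter: bytes) -> bytes:
    """Single Character Delimited: events separated by one ASCII character."""
    return delimiter.join(events)

def frame_length_prefixed(events: list[bytes]) -> bytes:
    """Prefixed with Byte Length: each event preceded by its byte length."""
    return b"".join(struct.pack(">I", len(e)) + e for e in events)

events = [b'{"a":1}', b'{"b":2}']
print(frame_newline(events))                  # b'{"a":1}\n{"b":2}\n'
print(frame_char_delimited(events, b"\x1e"))  # record-separator delimited
print(frame_length_prefixed(events)[:4])      # b'\x00\x00\x00\x07'
```

Newline delimiting is the simplest and pairs naturally with newline-delimited JSON, while length prefixing avoids any ambiguity when events may themselves contain the delimiter character.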

  8. TLS Configuration (Optional):

    • TLS CA: Provides the CA (Certificate Authority) certificate in PEM format. This certificate is used to verify the authenticity of the server being connected to during a TLS handshake. If not provided, the system will use the default CA certificates available on the host machine.

    • TLS CRT: The TLS certificate (in PEM format) used to authenticate the client with the GCS endpoint. This is part of the mutual TLS (mTLS) configuration if you are using client authentication.

    • TLS Key Pass: The private key (in PEM format) corresponding to the TLS certificate (TLS CRT). This key is used in combination with the certificate to authenticate the client when establishing a secure connection.

      Examples

      ${KEY_PASS_ENV_VAR}

      PassWord1

    • TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • TLS Verify Hostname (False): Enables hostname verification. Hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.
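Conceptually, these settings correspond to Python's standard ssl options, as sketched below (paths and the password reference are placeholders; an mTLS client would additionally load its certificate and key):

```python
import ssl

# A default context verifies certificates and hostnames, using the host's
# default CA bundle when no TLS CA file is supplied.
ctx = ssl.create_default_context()
ctx.check_hostname = True            # TLS Verify Hostname
ctx.verify_mode = ssl.CERT_REQUIRED  # TLS Verify Certificate

# For mutual TLS, the client certificate (TLS CRT), its private key, and the
# key password (TLS Key Pass) would be loaded like this (placeholder paths):
# ctx.load_cert_chain(certfile="/path/to/client-cert.pem",
#                     keyfile="/path/to/client-key.pem",
#                     password="secret")

print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Disabling either verification setting weakens the connection's protection against man-in-the-middle attacks, which is why the defaults above mirror the documented warnings.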

  9. Advanced Settings (Optional):

    • Filename Append UUID to Timestamp (True): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-performance use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Filename Extension: The filename extension to use in the object key, appended based on the compression or encoding format used. If not specified, the extension is determined by the compression scheme. For example, with Gzip compression you may set this to .gz, or with Parquet encoding to .parquet. The extension helps identify the format of the files stored in the GCS bucket. Default: None

    • Metadata (Add as needed): A key/value pair. Allows you to specify additional metadata for each object stored in GCS. Metadata is key-value pairs that can store useful information, such as the source of the data or the encoding format used. This metadata is included with the object and can be queried or used for auditing and monitoring purposes.

  10. Save and Test Configuration:

    • Save the configuration settings.

    • Test the connection to verify that Observo AI can successfully write data to the specified GCP Cloud Storage bucket.

Example Scenarios

MediCare Analytics, a fictitious U.S.-based healthcare enterprise, delivers advanced analytics and observability for hospitals and clinics. It collects telemetry data such as patient monitoring logs, system metrics, and application traces to ensure HIPAA compliance, optimize operations, and enhance patient care. To achieve scalable, secure, and cost-effective storage, the company integrates Observo AI with Google Cloud Platform (GCP) Cloud Storage, centralizing telemetry data in the medicare-telemetry-archive bucket. It uses Parquet format with Gzip compression for optimized storage and querying, strict access controls, and unique object naming for high-volume data. Authentication via a service account, TLS for secure data transfer, and tailored configurations ensure high reliability, minimal latency, and regulatory compliance for long-term storage and analytics.

Standard GCP Cloud Storage Archival Destination Setup

Here is a standard GCP Cloud Storage Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

| Field | Value | Description |
| :--- | :--- | :--- |
| Name | medicare-telemetry-archive | Unique identifier for the destination. |
| Description | Archival destination for MediCare Analytics telemetry data | Optional description for clarity. |
| Bucket | medicare-telemetry-archive | The GCS bucket name for storing telemetry data. |
| Api Key | (Not specified) | Not used; authentication via service account credentials. |
| Credentials Path | /path/to/medicare-service-account.json | Path to the service account credentials JSON file for authentication. |
| Compression | Gzip compression | Uses DEFLATE-based compression for efficient storage. |
| Acl | Object is private to bucket owner | Restricts access to the bucket owner for HIPAA compliance. |
| Filename Append UUID to Timestamp | True | Appends a UUID v4 token to the timestamp for unique object keys (e.g., date=2025-07-13/1626196486-30f6652c-71da-4f9f-800d-a1189c47c547). |
| Filename Time Format | %s | Timestamps in seconds since the Unix epoch for object keys. |
| Key Prefix | year=%Y/month=%m/day=%d/ | Organizes objects by year, month, and day for partitioning. |
| Storage Class | Archive | Cheapest storage class for rarely accessed, long-term data storage. |

Encoding

| Field | Value | Description |
| :--- | :--- | :--- |
| Encoding Codec | Parquet | Uses Parquet format for optimized storage and querying. |
| Parquet Schema | message root { optional binary stream; optional binary time; optional group patient { optional binary patient_id; optional binary record_type; } } | Defines the Parquet schema for patient telemetry data. |
| Include Raw Log | True | Captures the complete log message as an additional observo_record field. |
| Encoding Avro Schema | { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "patient_id", "type": "string" }] } | Optional Avro schema for additional serialization compatibility. |
| Encoding Metric Tag Values | Tags exposed as arrays of strings | Exposes all metric tags as arrays for detailed analytics. |
| Encoding Timestamp Format | RFC3339 format | Uses RFC3339 for consistent timestamp formatting. |

Request Configuration

| Field | Value | Description |
| :--- | :--- | :--- |
| Request Concurrency | Adaptive concurrency | Adjusts parallelism based on system load for optimal performance. |
| Request Rate Limit Duration Secs | 1 | Time window for rate limiting, set to 1 second. |
| Request Rate Limit Num | 1000 | Maximum of 1000 requests per second to prevent overloading. |
| Request Retry Attempts | 3 | Retries failed requests up to 3 times. |
| Request Retry Initial Backoff Secs | 1 | Waits 1 second before the first retry, using the Fibonacci sequence for subsequent retries. |
| Request Retry Max Duration Secs | 3600 | Maximum wait time between retries, set to 1 hour. |
| Request Timeout Secs | 60 | Requests time out after 60 seconds to avoid orphaned requests. |

Batching Requirements

| Field | Value | Description |
| :--- | :--- | :--- |
| Batch Timeout Secs | 1 | Flushes batches after 1 second to ensure timely data transfer. |
| Batch Max Bytes | 10485760 | Maximum batch size of 10 MB (uncompressed) to balance throughput and memory usage. |
| Batch Max Events | 1000 | Flushes batches after 1000 events to manage batch size. |

Acknowledgement

| Field | Value | Description |
| :--- | :--- | :--- |
| Acknowledgements Enabled | True | Enables end-to-end acknowledgements to ensure data delivery. |

Framing

| Field | Value | Description |
| :--- | :--- | :--- |
| Framing Method | Newline Delimited | Each event is followed by a newline character for compatibility with Parquet. |
| Framing Character Delimited Delimiter | (Empty) | Not used, as newline delimited is selected. |

TLS Configuration

| Field | Value | Description |
| :--- | :--- | :--- |
| TLS CA | /path/to/ca-cert.pem | CA certificate in PEM format to verify the GCS endpoint. |
| TLS CRT | /path/to/client-cert.pem | Client certificate in PEM format for mutual TLS authentication. |
| TLS Key Pass | MediCareTLSKey2025 | Private key password for the client certificate. |
| TLS Verify Certificate | True | Enables certificate verification for secure connections. |
| TLS Verify Hostname | True | Ensures the hostname matches the certificate for added security. |

Advanced Settings

| Field | Value | Description |
| :--- | :--- | :--- |
| Filename Append UUID to Timestamp | True | Ensures unique object keys by appending a UUID (redundant with General Settings for emphasis). |
| Filename Extension | .parquet | Specifies the Parquet file extension for stored objects. |
| Metadata | source=patient_monitoring, format=parquet | Adds metadata for auditing and identifying data source/format. |

Test Configuration

  • Test the connection to verify that Observo AI can write data to the medicare-telemetry-archive bucket.

Troubleshooting

If you encounter issues with the GCP Cloud Storage Archival destination, use the following steps to diagnose and resolve them:

  • Verify Service Account Permissions:

    • Ensure the service account has the "Storage Object Admin" role. Check the IAM page in the Google Cloud Console and enable the "Include Google-provided role grants" option to view service accounts such as [email protected].

  • Check Connection Status:

    • In the Observo AI interface, verify the destination’s connection status to confirm it is active.

  • Review Logs:

    • Check Observo AI logs for errors or warnings related to data transmission to GCP Cloud Storage.

  • Validate Bucket Configuration:

    • Confirm the bucket exists, is accessible, and matches the specified region and name.

  • Check Data Format:

    • Ensure the selected encoding format (such as JSON or Parquet) is compatible with downstream processes.

  • Proxy Configuration:

    • If using a proxy, verify the proxy settings are correctly configured (Proxy Configuration).

  • Test Data Flow:

    • Send sample data and verify it appears in the GCP Cloud Storage bucket.

  • Monitor Data Volume:

    • Use the Analytics tab in the Observo AI pipeline to monitor data volume and ensure expected throughput.

| Issue | Possible Cause | Resolution |
| :--- | :--- | :--- |
| Data not reaching bucket | Incorrect service account credentials | Verify the JSON key file and permissions |
| Connection errors | Cloud Storage API not enabled or wrong region | Enable the Cloud Storage API and confirm the bucket region |
| Serialization errors | Incorrect encoding format | Ensure the correct codec is selected (such as JSON or Parquet) |
| Slow data transfer | Backpressure or rate limiting | Adjust batching settings or check GCP quotas |

Resources

For additional guidance and detailed information, refer to the following resources:
