GCP Cloud Storage Archival

The Observo AI GCP Cloud Storage Archival destination enables scalable, cost-effective storage of telemetry data such as logs, metrics, and traces in Google Cloud Storage. It supports formats such as JSON, CSV, and Parquet, with secure authentication, compression, and customizable access controls for observability and compliance.

Purpose

The Observo AI GCP Cloud Storage Archival destination enables users to send telemetry data, including logs, metrics, and traces, to Google Cloud Storage for scalable, cost-effective storage and further analysis. This destination supports flexible data formats and integrates seamlessly with Google Cloud's ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.

Prerequisites

Before configuring the GCP Cloud Storage Archival destination in Observo AI, ensure the following requirements are met:

  • Google Cloud Project:

    • A Google Cloud project must be created and linked to your GCP Cloud Storage instance. It’s recommended to use a dedicated project for isolation, but an existing project can be used if permissions are correctly configured (Create a Google Cloud Project).

    • The Cloud Storage API must be enabled in the project (Enable Cloud Storage API).

    • Configure Essential Contacts for notifications to receive updates from Google Cloud (Manage Notification Contacts).

  • Authentication:

    • A service account with write access to the target bucket (for example, the Storage Object Admin role) is required. Download its JSON key file; the path to this file is supplied as the Credentials Path during configuration.

  • GCP Cloud Storage Bucket:

    • Ensure an active GCP Cloud Storage bucket is available for data storage. The bucket must be accessible and properly configured for write operations (Creating Storage Buckets).

    • Verify the bucket’s region aligns with your performance and compliance requirements.

Integration

To configure GCP Cloud Storage Archival as a destination in Observo AI, follow these steps:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destinations button and select Create New.

    • Choose GCP Cloud Storage from the list of available destinations to begin configuration.

    • Set Use as Archival to true

    • Select GCP Cloud Storage Archival

  2. General Settings:

    • Name: Add a unique identifier such as gcp-cloud-storage-1.

    • Description (Optional): Provide a description for the destination.

    • Bucket: The GCS bucket name.

      Example

      my-bucket

    • Credentials Path: The path to the service account credentials JSON file used for authentication.

      Example

      /my/path/credentials.json
    • Compression (Optional): Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression

      | Option | Description |
      | :--- | :--- |
      | Gzip compression | Widely used DEFLATE-based compression format |
      | No compression | No compression applied to data |
      | Zlib compression | DEFLATE-based, lightweight compression library |
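The practical difference between the two DEFLATE-based options can be sketched with Python's standard gzip and zlib modules (the event payload below is purely illustrative):

```python
import gzip
import json
import zlib

# Build a newline-delimited JSON payload of repetitive log events.
events = [json.dumps({"msg": "payment processed", "status": 200}) for _ in range(1000)]
payload = "\n".join(events).encode()

gzipped = gzip.compress(payload, 6)  # gzip container: DEFLATE plus header and CRC
zlibbed = zlib.compress(payload, 6)  # zlib container: DEFLATE with a lighter header

# Both DEFLATE formats shrink repetitive telemetry dramatically; at the same
# compression level, zlib output is slightly smaller than gzip because its
# container framing overhead is lower.
print(len(gzipped) < len(payload))   # True
print(len(zlibbed) < len(gzipped))   # True
```

Repetitive telemetry compresses extremely well, which is why enabling compression is usually worthwhile for archival storage.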

    • Acl (Optional): The predefined ACL to apply to created objects. For more information, see Predefined ACLs. Default: Bucket/object private to project

      | Option | Description |
      | :--- | :--- |
      | Bucket/object can be read by authenticated users | Any authenticated GCP user can read the object |
      | Object and bucket owner granted OWNER permission | The owner of the bucket and the object will have full control (owner access) over the object |
      | Object is private to bucket owner | Only the bucket owner can access the object |
      | Bucket/object are private | Both the bucket and object are private to the owner |
      | Bucket/object private to project | Access is restricted to the project and its members |
      | Bucket/object can be read publicly | Anyone can access the object without authentication |

    • Filename Append UUID to Timestamp (False): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-performance use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp reflecting when the objects are sent to Cloud Storage, so the resulting object key is the key prefix joined with the formatted timestamp, such as date=2022-07-18/1658176486. That example uses a key_prefix of date=%F/ and the timestamp Mon Jul 18 2022 20:34:44 GMT+0000, with filename_time_format set to %s, which renders timestamps as seconds since the Unix epoch. The common strftime specifiers found in most languages are supported. When set to an empty string, no timestamp is appended to the key prefix. Default: %s

      Example

      %s

    • Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: year=%Y/month=%m/day=%d/

      Examples

      date=%F/hour=%H/

      year=%Y/month=%m/day=%d/

      application_id={{ application_id }}/date=%F/

      %Y/%m/%d/

      date=%F/
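How Key Prefix, Filename Time Format, and the append-UUID option combine into an object key can be sketched in Python. This is a hypothetical helper, not Observo AI code; `%s` is handled explicitly because it is not a portable strftime specifier:

```python
import uuid
from datetime import datetime, timezone

def object_key(key_prefix: str, time_format: str, append_uuid: bool,
               now: datetime) -> str:
    """Compose an object key from a strftime key prefix and a timestamp."""
    prefix = now.strftime(key_prefix)        # e.g. date=%F/ -> date=2022-07-18/
    if time_format == "%s":
        stamp = str(int(now.timestamp()))    # seconds since the Unix epoch
    elif time_format:
        stamp = now.strftime(time_format)
    else:
        stamp = ""                           # empty format: key prefix only
    if append_uuid and stamp:
        stamp += f"-{uuid.uuid4()}"          # uniqueness for high-volume writers
    return prefix + stamp

ts = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(object_key("date=%F/", "%s", False, ts))  # date=2022-07-18/1658176486
```

With append_uuid set to True, the same call yields a key like date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547.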

    • Storage Class (Optional): The storage class for created objects. For more information, see the storage classes documentation. Default: Standard

      | Option | Description |
      | :--- | :--- |
      | Archive | Cheapest; for data that is rarely accessed (long-term storage) |
      | Coldline | Low-cost storage for infrequently accessed data, available within milliseconds |
      | Nearline | Suitable for data that is accessed less than once a month |
      | Standard | For frequently accessed data, offering low latency and high availability |

  3. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs share the following common sub-options:

      • Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Encoding Metric Tag Values (Select): Controls how metric tag values are encoded. Options: Tag values will be exposed as single strings (default), or Tags exposed as arrays of strings. Note: When set to single, only the last non-bare value of tags will be displayed with the metric. When set to full, all metric tags will be exposed as separate assignments.

      • Encoding Timestamp Format (Select): RFC3339 format or UNIX format.

      Codec-specific sub-options:

      • JSON Encoding

        • Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: No codec-specific sub-options.

      • Apache Avro Encoding

        • Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: No codec-specific sub-options.

      • No encoding: No codec-specific sub-options.

      • Plain text encoding: No codec-specific sub-options.

      • Parquet

        • Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema. In addition to the Parquet schema, there will be a field named "observo_record" in the Parquet file.

        • Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF)

        • CEF Device Event Class ID: Provide a unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure

        • CEF Device Product: Specify the product name that generated the event (maximum 63 characters). Example: Log Analyzer

        • CEF Device Vendor: Specify the vendor name that produced the event (maximum 63 characters). Example: Observo

        • CEF Device Version: Specify the version of the product that generated the event (maximum 31 characters). Example: 1.0.0

        • CEF Extensions (Add): Define custom key-value pairs for additional event data fields in CEF format.

        • CEF Name: Provide a human-readable description of the event (maximum 512 characters). Example: cef.name

        • CEF Severity: Indicate the importance of the event with a value from 0 (lowest) to 10 (highest). Example: 5

        • CEF Version (Select): Specify which version of the CEF specification to use for formatting: CEF specification version 0.1 or CEF specification version 1.x.

      • CSV Format

        • CSV Fields (Add): Specify the field names to include as columns in the CSV output and their order. Examples: timestamp, host, message

        • CSV Buffer Capacity (Optional): Set the internal buffer size (in bytes) used when writing CSV data. Example: 8192

        • CSV Delimiter (Optional): Set the character that separates fields in the CSV output. Example: ,

        • Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.

        • CSV Escape Character (Optional): Set the character used to escape quotes when double quote escapes are disabled.

        • CSV Quote Character (Optional): Set the character used for quoting fields in the CSV output. Example: "

        • CSV Quoting Style (Optional): Control when field values should be wrapped in quote characters. Options: Always quote all fields, Quote only when necessary, Never use quotes, or Quote all non-numeric fields.

      • Protocol Buffers

        • Protobuf Message Type: Specify the fully qualified message type name for Protobuf serialization. Example: package.Message

        • Protobuf Descriptor File: Specify the path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

      • Graylog Extended Log Format (GELF): No codec-specific sub-options.
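As an illustration of how the CSV sub-options map onto standard CSV behavior, here is a sketch using Python's csv module (the field names and event contents are made up):

```python
import csv
import io

fields = ["timestamp", "host", "message"]          # CSV Fields: columns and order
event = {"timestamp": "2025-07-13T10:00:00Z",
         "host": "web-1",
         "message": 'disk is 92% full, marked "critical"'}

buf = io.StringIO()
writer = csv.DictWriter(
    buf,
    fieldnames=fields,
    delimiter=",",              # CSV Delimiter
    quotechar='"',              # CSV Quote Character
    doublequote=True,           # Enable Double Quote Escapes: "" instead of \"
    quoting=csv.QUOTE_MINIMAL,  # CSV Quoting Style: quote only when necessary
)
writer.writeheader()
writer.writerow(event)
print(buf.getvalue(), end="")
```

Only the message field gets quoted (it contains the delimiter), and its embedded quotes are doubled, matching the default double-quote escape behavior described above.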

  4. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

      | Option | Description |
      | :--- | :--- |
      | Adaptive concurrency | Adjusts parallelism based on system load |
      | A fixed concurrency of 1 | Processes one task at a time only |

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.

    • Request Retry Initial Backoff Secs: The amount of time to wait in seconds before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended that this value is not lowered below the service’s internal timeout, as this could create orphaned requests, and duplicate data downstream. Default: 60.
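The retry timing described above can be sketched as a small generator (a simplified model, not Observo AI's implementation):

```python
def retry_backoffs(initial: float, max_duration: float, attempts: int):
    """Yield Fibonacci-spaced wait times, capped at max_duration seconds."""
    a, b = initial, initial
    for _ in range(attempts):
        yield min(a, max_duration)   # Request Retry Max Duration Secs caps each wait
        a, b = b, a + b              # Fibonacci progression of backoffs

# With the defaults (initial backoff 1s, cap 3600s), the first six waits are:
print(list(retry_backoffs(1, 3600, 6)))  # [1, 1, 2, 3, 5, 8]
```

Lowering max_duration simply flattens the tail of the sequence once the Fibonacci values exceed the cap.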

  5. Batching Requirements (Default):

    • Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 1

    • Batch Max Bytes: The maximum size of a batch that will be processed by a sink. This is based on the uncompressed size of the batched events, before they are serialized / compressed. Default: Empty

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: Empty
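The interaction between Batch Max Bytes and Batch Max Events can be sketched like this (a simplified model; real batching also flushes on the timeout, which is omitted here):

```python
def batch_events(events, max_events=None, max_bytes=None):
    """Group serialized events into batches bounded by count and size."""
    batch, size = [], 0
    for event in events:
        event_size = len(event)  # uncompressed size, before compression
        full = (max_events is not None and len(batch) >= max_events) or \
               (max_bytes is not None and size + event_size > max_bytes)
        if batch and full:
            yield batch              # flush when either limit would be exceeded
            batch, size = [], 0
        batch.append(event)
        size += event_size
    if batch:
        yield batch  # final partial batch (flushed by the timeout in practice)

events = [b"x" * 40] * 5
print([len(b) for b in batch_events(events, max_events=2)])   # [2, 2, 1]
print([len(b) for b in batch_events(events, max_bytes=100)])  # [2, 2, 1]
```

Whichever limit is reached first triggers the flush, which is why both can be left empty (unbounded) and the timeout alone will drive flushing.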

  6. Acknowledgement (False):

    • Acknowledgements Enabled (False): Whether or not end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the destination before acknowledging them at the source.

  7. Framing (Default):

    • Framing Method: The framing method. Default: Newline Delimited

      | Option | Description |
      | :--- | :--- |
      | Raw Event data (not delimited) | No framing is applied. This method is best when each event is self-contained. |
      | Single Character Delimited | Each event is separated by a specific single character (ASCII value) |
      | Prefixed with Byte Length | Each event is prefixed with its byte length, ensuring precise separation between events |
      | Newline Delimited | Each event is followed by a newline character (\n), which is commonly used for logging formats |

    • Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: (Empty)
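The framing methods can be sketched in a few lines of Python (hypothetical helpers; the 4-byte big-endian length prefix is an illustrative assumption, not a documented wire format):

```python
import struct

def frame_newline(events: list[bytes]) -> bytes:
    """Newline Delimited: each event is followed by \\n."""
    return b"".join(e + b"\n" for e in events)

def frame_char_delimited(events: list[bytes], delimiter: bytes) -> bytes:
    """Single Character Delimited: events separated by one ASCII character."""
    return delimiter.join(events)

def frame_length_prefixed(events: list[bytes]) -> bytes:
    """Prefixed with Byte Length: each event preceded by its byte length."""
    return b"".join(struct.pack(">I", len(e)) + e for e in events)

events = [b'{"a":1}', b'{"b":2}']
print(frame_newline(events))                  # b'{"a":1}\n{"b":2}\n'
print(frame_char_delimited(events, b"\x1e"))  # record-separator delimited
print(frame_length_prefixed(events)[:4])      # b'\x00\x00\x00\x07'
```

Newline delimiting is the simplest and pairs naturally with newline-delimited JSON, while length prefixing avoids any ambiguity when events may themselves contain the delimiter character.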

  8. TLS Configuration (Optional):

    • TLS CA: Provides the CA (Certificate Authority) certificate in PEM format. This certificate is used to verify the authenticity of the server being connected to during a TLS handshake. If not provided, the system will use the default CA certificates available on the host machine.

    • TLS CRT: The TLS certificate (in PEM format) used to authenticate the client with the GCS endpoint. This is part of the mutual TLS (mTLS) configuration if you are using client authentication.

    • TLS Key Pass: The private key (in PEM format) corresponding to the TLS certificate (TLS CRT). This key is used in combination with the certificate to authenticate the client when establishing a secure connection.

      Examples

      ${KEY_PASS_ENV_VAR}

      PassWord1

    • TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • TLS Verify Hostname (False): Enables hostname verification. Hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.
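Conceptually, these settings correspond to Python's standard ssl options, as sketched below (paths and the password reference are placeholders; an mTLS client would additionally load its certificate and key):

```python
import ssl

# A default context verifies certificates and hostnames, using the host's
# default CA bundle when no TLS CA file is supplied.
ctx = ssl.create_default_context()
ctx.check_hostname = True            # TLS Verify Hostname
ctx.verify_mode = ssl.CERT_REQUIRED  # TLS Verify Certificate

# For mutual TLS, the client certificate (TLS CRT), its private key, and the
# key password (TLS Key Pass) would be loaded like this (placeholder paths):
# ctx.load_cert_chain(certfile="/path/to/client-cert.pem",
#                     keyfile="/path/to/client-key.pem",
#                     password="secret")

print(ctx.verify_mode == ssl.CERT_REQUIRED, ctx.check_hostname)  # True True
```

Disabling either verification setting weakens the connection's protection against man-in-the-middle attacks, which is why the defaults above mirror the documented warnings.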

  9. Advanced Settings (Optional):

    • Filename Append UUID to Timestamp (True): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures unique object names in high-performance use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Filename Extension: The filename extension to use in the object key, appended based on the compression or encoding format used. If not specified, the extension is determined by the compression scheme. For example, with Gzip compression you may set this to .gz, or with Parquet encoding to .parquet. The extension helps identify the format of the files stored in the GCS bucket. Default: None

    • Metadata (Add as needed): A key/value pair. Allows you to specify additional metadata for each object stored in GCS. Metadata is key-value pairs that can store useful information, such as the source of the data or the encoding format used. This metadata is included with the object and can be queried or used for auditing and monitoring purposes.

  10. Save and Test Configuration:

    • Save the configuration settings.

    • Test the connection to verify that Observo AI can successfully write data to the specified GCP Cloud Storage bucket.

Example Scenarios

MediCare Analytics, a fictitious U.S.-based healthcare enterprise, delivers advanced analytics and observability for hospitals and clinics. It collects telemetry data such as patient monitoring logs, system metrics, and application traces to ensure HIPAA compliance, optimize operations, and enhance patient care. To achieve scalable, secure, and cost-effective storage, the company integrates Observo AI with Google Cloud Platform (GCP) Cloud Storage, centralizing telemetry data in the medicare-telemetry-archive bucket. It uses Parquet format with Gzip compression for optimized storage and querying, strict access controls, and unique object naming for high-volume data. Authentication via a service account, TLS for secure data transfer, and tailored configurations ensure high reliability, minimal latency, and regulatory compliance for long-term storage and analytics.

Standard GCP Cloud Storage Archival Destination Setup

Here is a standard GCP Cloud Storage Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

| Field | Value | Description |
| :--- | :--- | :--- |
| Name | medicare-telemetry-archive | Unique identifier for the destination. |
| Description | Archival destination for MediCare Analytics telemetry data | Optional description for clarity. |
| Bucket | medicare-telemetry-archive | The GCS bucket name for storing telemetry data. |
| Api Key | (Not specified) | Not used; authentication via service account credentials. |
| Credentials Path | /path/to/medicare-service-account.json | Path to the service account credentials JSON file for authentication. |
| Compression | Gzip compression | Uses DEFLATE-based compression for efficient storage. |
| Acl | Object is private to bucket owner | Restricts access to the bucket owner for HIPAA compliance. |
| Filename Append UUID to Timestamp | True | Appends a UUID v4 token to the timestamp for unique object keys (e.g., date=2025-07-13/1626196486-30f6652c-71da-4f9f-800d-a1189c47c547). |
| Filename Time Format | %s | Timestamps in seconds since the Unix epoch for object keys. |
| Key Prefix | year=%Y/month=%m/day=%d/ | Organizes objects by year, month, and day for partitioning. |
| Storage Class | Archive | Cheapest storage class for rarely accessed, long-term data storage. |

Encoding

| Field | Value | Description |
| :--- | :--- | :--- |
| Encoding Codec | Parquet | Uses Parquet format for optimized storage and querying. |
| Parquet Schema | message root { optional binary stream; optional binary time; optional group patient { optional binary patient_id; optional binary record_type; } } | Defines the Parquet schema for patient telemetry data. |
| Include Raw Log | True | Captures the complete log message as an additional observo_record field. |
| Encoding Avro Schema | { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "patient_id", "type": "string" }] } | Optional Avro schema for additional serialization compatibility. |
| Encoding Metric Tag Values | Tags exposed as arrays of strings | Exposes all metric tags as arrays for detailed analytics. |
| Encoding Timestamp Format | RFC3339 format | Uses RFC3339 for consistent timestamp formatting. |

Request Configuration

| Field | Value | Description |
| :--- | :--- | :--- |
| Request Concurrency | Adaptive concurrency | Adjusts parallelism based on system load for optimal performance. |
| Request Rate Limit Duration Secs | 1 | Time window for rate limiting, set to 1 second. |
| Request Rate Limit Num | 1000 | Maximum of 1000 requests per second to prevent overloading. |
| Request Retry Attempts | 3 | Retries failed requests up to 3 times. |
| Request Retry Initial Backoff Secs | 1 | Waits 1 second before the first retry, using the Fibonacci sequence for subsequent retries. |
| Request Retry Max Duration Secs | 3600 | Maximum wait time between retries, set to 1 hour. |
| Request Timeout Secs | 60 | Requests time out after 60 seconds to avoid orphaned requests. |

Batching Requirements

| Field | Value | Description |
| :--- | :--- | :--- |
| Batch Timeout Secs | 1 | Flushes batches after 1 second to ensure timely data transfer. |
| Batch Max Bytes | 10485760 | Maximum batch size of 10 MB (uncompressed) to balance throughput and memory usage. |
| Batch Max Events | 1000 | Flushes batches after 1000 events to manage batch size. |

Acknowledgement

| Field | Value | Description |
| :--- | :--- | :--- |
| Acknowledgements Enabled | True | Enables end-to-end acknowledgements to ensure data delivery. |

Framing

| Field | Value | Description |
| :--- | :--- | :--- |
| Framing Method | Newline Delimited | Each event is followed by a newline character for compatibility with Parquet. |
| Framing Character Delimited Delimiter | (Empty) | Not used, as newline delimited is selected. |

TLS Configuration

| Field | Value | Description |
| :--- | :--- | :--- |
| TLS CA | /path/to/ca-cert.pem | CA certificate in PEM format to verify the GCS endpoint. |
| TLS CRT | /path/to/client-cert.pem | Client certificate in PEM format for mutual TLS authentication. |
| TLS Key Pass | MediCareTLSKey2025 | Private key password for the client certificate. |
| TLS Verify Certificate | True | Enables certificate verification for secure connections. |
| TLS Verify Hostname | True | Ensures the hostname matches the certificate for added security. |

Advanced Settings

| Field | Value | Description |
| :--- | :--- | :--- |
| Filename Append UUID to Timestamp | True | Ensures unique object keys by appending a UUID (redundant with General Settings for emphasis). |
| Filename Extension | .parquet | Specifies the Parquet file extension for stored objects. |
| Metadata | source=patient_monitoring, format=parquet | Adds metadata for auditing and identifying data source/format. |

Test Configuration

  • Test the connection to verify that Observo AI can write data to the medicare-telemetry-archive bucket.

Troubleshooting

If you encounter issues with the GCP Cloud Storage Archival destination, use the following steps to diagnose and resolve them:

  • Verify Service Account Permissions:

    • Ensure the service account has the "Storage Object Admin" role. Check the IAM page in the Google Cloud Console and enable the "Include Google-provided role grants" option to view service accounts such as [email protected].

  • Check Connection Status:

    • In the Observo AI interface, verify the destination’s connection status to confirm it is active.

  • Review Logs:

    • Check Observo AI logs for errors or warnings related to data transmission to GCP Cloud Storage.

  • Validate Bucket Configuration:

    • Confirm the bucket exists, is accessible, and matches the specified region and name.

  • Check Data Format:

    • Ensure the selected encoding format (such as JSON or Parquet) is compatible with downstream processes.

  • Proxy Configuration:

    • If using a proxy, verify the proxy settings are correctly configured (Proxy Configuration).

  • Test Data Flow:

    • Send sample data and verify it appears in the GCP Cloud Storage bucket.

  • Monitor Data Volume:

    • Use the Analytics tab in the Observo AI pipeline to monitor data volume and ensure expected throughput.

| Issue | Possible Cause | Resolution |
| :--- | :--- | :--- |
| Data not reaching bucket | Incorrect service account credentials | Verify the JSON key file and permissions |
| Connection errors | Cloud Storage API not enabled or wrong region | Enable the Cloud Storage API and confirm the bucket region |
| Serialization errors | Incorrect encoding format | Ensure the correct codec is selected (such as JSON or Parquet) |
| Slow data transfer | Backpressure or rate limiting | Adjust batching settings or check GCP quotas |

Resources

For additional guidance and detailed information, refer to the following resources:
