GCP Cloud Storage Archival
The Observo AI GCP Cloud Storage Archival destination provides scalable, cost-effective storage of telemetry data (logs, metrics, and traces) in Google Cloud Storage. It supports formats such as JSON, CSV, and Parquet, with secure authentication, compression, and customizable access controls for observability and compliance.
Purpose
The Observo AI GCP Cloud Storage Archival destination enables users to send telemetry data, including logs, metrics, and traces, to Google Cloud Storage for scalable, cost-effective storage and further analysis. This destination supports flexible data formats and integrates seamlessly with Google Cloud's ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.
Prerequisites
Before configuring the GCP Cloud Storage Archival destination in Observo AI, ensure the following requirements are met:
Google Cloud Project:
A Google Cloud project must be created and linked to your GCP Cloud Storage instance. It’s recommended to use a dedicated project for isolation, but an existing project can be used if permissions are correctly configured (Create a Google Cloud Project).
The Cloud Storage API must be enabled in the project (Enable Cloud Storage API).
Configure Essential Contacts for notifications to receive updates from Google Cloud (Manage Notification Contacts).
Authentication:
Set up authentication using a service account with the "Storage Object Admin" role to allow Observo AI to write to GCP Cloud Storage buckets (Service Accounts).
Obtain a service account JSON key file for authentication (Creating and Managing Service Account Keys).
Optionally, configure Google Cloud Identity or a third-party Identity Provider (IdP) for enhanced security (Configure Cloud Identity, Configure Third-Party IdP).
GCP Cloud Storage Bucket:
Ensure an active GCP Cloud Storage bucket is available for data storage. The bucket must be accessible and properly configured for write operations (Creating Storage Buckets).
Verify the bucket’s region aligns with your performance and compliance requirements.
Integration
To configure GCP Cloud Storage Archival as a destination in Observo AI, follow these steps:
Log in to Observo AI:
Navigate to the Destinations tab.
Click the Add Destinations button and select Create New.
Choose GCP Cloud Storage from the list of available destinations to begin configuration.
Set Use as Archival to true, then select GCP Cloud Storage Archival.
General Settings:
Name: Add a unique identifier such as gcp-cloud-storage-1.
Description (Optional): Provide a description for the destination.
Bucket: The GCS bucket name. Example: my-bucket
Credentials Path: The path to the service account credentials JSON file used for authentication. Example: /my/path/credentials.json
Compression (Optional): Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression
Options:
- Gzip compression: Widely used DEFLATE-based compression format
- No compression: No compression applied to data
- Zlib compression: DEFLATE-based, lightweight compression library
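The two DEFLATE-based options differ mainly in framing. A minimal sketch (the payload contents are illustrative) of how either option shrinks a repetitive telemetry batch:

```python
import gzip
import zlib

# Sample NDJSON-style payload, similar to what the destination batches before upload.
payload = b'{"message": "login-failure", "host": "web-1"}\n' * 1000

gz = gzip.compress(payload)  # DEFLATE body with a gzip header and trailer
zl = zlib.compress(payload)  # DEFLATE body with a lighter zlib header

# Both use the default compression level; repetitive telemetry compresses well.
print(len(payload), len(gz), len(zl))
assert len(gz) < len(payload) and len(zl) < len(payload)
```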
Acl (Optional): The Predefined ACL to apply to created objects. For more information, see Predefined ACLs. Default: Bucket/object private to project
Options:
- Bucket/object can be read by authenticated users: Any authenticated GCP user can read the object
- Object and bucket owner granted OWNER permission: The owner of the bucket and the object will have full control (owner access) over the object
- Object is private to bucket owner: Only the bucket owner can access the object
- Bucket/object are private: Both the bucket and object are private to the owner
- Bucket/object private to project: Access is restricted to the project and its members
- Bucket/object can be read publicly: Anyone can access the object without authentication
Filename Append UUID to Timestamp (False): Whether or not to append a UUID v4 token to the end of the object key's timestamp portion. Ensures unique object names in high-performance use cases.
Example: For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key such as `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.
Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp that reflects when the objects are sent to Cloud Storage, so the resulting object key is functionally equivalent to joining the key prefix with the formatted timestamp, such as date=2022-07-18/1658176486. This represents a key_prefix of date=%F/ and the timestamp Mon Jul 18 2022 20:34:46 GMT+0000, with filename_time_format set to %s, which renders timestamps as seconds since the Unix epoch. Supports the common strftime specifiers found in most languages. When set to an empty string, no timestamp is appended to the key prefix. Default: %s
Example: %s
Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: year=%Y/month=%m/day=%d/
Examples:
date=%F/hour=%H/
year=%Y/month=%m/day=%d/
application_id={{ application_id }}/date=%F/
%Y/%m/%d/
date=%F/
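The key prefix, timestamp format, and UUID settings above compose the final object key. A minimal sketch of how they combine; `object_key` is a hypothetical helper for illustration, not part of the product:

```python
import re
import uuid
from datetime import datetime, timezone

def object_key(key_prefix: str, time_format: str, append_uuid: bool,
               now: datetime) -> str:
    # Key Prefix supports strftime specifiers, e.g. "date=%F/".
    prefix = now.strftime(key_prefix)
    # Filename Time Format defaults to %s (seconds since the Unix epoch).
    ts = str(int(now.timestamp())) if time_format == "%s" else now.strftime(time_format)
    key = prefix + ts
    if append_uuid:
        # Filename Append UUID to Timestamp: a UUID v4 suffix keeps keys unique.
        key += "-" + str(uuid.uuid4())
    return key

now = datetime(2022, 7, 18, 20, 34, 46, tzinfo=timezone.utc)
print(object_key("date=%F/", "%s", False, now))  # date=2022-07-18/1658176486
```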
Storage Class (Optional): The storage class for created objects. For more information, see the storage classes documentation. Default: Standard
Options:
- Archive: Cheapest, for data that is rarely accessed (long-term storage)
- Coldline: Low-cost storage for infrequently accessed data, but available within milliseconds
- Nearline: Suitable for data that is accessed less than once a month
- Standard: For frequently accessed data, offering low latency and high availability
Encoding:
Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.
Options and codec-specific sub-options:

JSON Encoding
- Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

logfmt Encoding
- No codec-specific sub-options.

Apache Avro Encoding
- Avro Schema: Specify the Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

Newline Delimited JSON Encoding
- No codec-specific sub-options.

No encoding
- No codec-specific sub-options.

Plain text encoding
- No codec-specific sub-options.

Parquet
- Include Raw Log (False): Capture the complete log message as an additional field (observo_record) apart from the given schema. In addition to the Parquet schema, there will be a field named "observo_record" in the Parquet file.
- Parquet Schema: Enter the Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

Common Event Format (CEF)
- CEF Device Event Class ID: A unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure
- CEF Device Product: The product name that generated the event (maximum 63 characters). Example: Log Analyzer
- CEF Device Vendor: The vendor name that produced the event (maximum 63 characters). Example: Observo
- CEF Device Version: The version of the product that generated the event (maximum 31 characters). Example: 1.0.0
- CEF Extensions (Add): Custom key-value pairs for additional event data fields in CEF format.
- CEF Name: A human-readable description of the event (maximum 512 characters). Example: cef.name
- CEF Severity: The importance of the event, from 0 (lowest) to 10 (highest). Example: 5
- CEF Version (Select): The version of the CEF specification to use for formatting. Options: CEF specification version 0.1, CEF specification version 1.x

CSV Format
- CSV Fields (Add): The field names to include as columns in the CSV output, in order. Examples: timestamp, host, message
- CSV Buffer Capacity (Optional): The internal buffer size (in bytes) used when writing CSV data. Example: 8192
- CSV Delimiter (Optional): The character that separates fields in the CSV output. Example: ,
- Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them. When disabled, an escape character is used instead.
- CSV Escape Character (Optional): The character used to escape quotes when double quote escapes are disabled.
- CSV Quote Character (Optional): The character used for quoting fields in the CSV output. Example: "
- CSV Quoting Style (Optional): Controls when field values should be wrapped in quote characters. Options: Always quote all fields, Quote only when necessary, Never use quotes, Quote all non-numeric fields

Protocol Buffers
- Protobuf Message Type: The fully qualified message type name for Protobuf serialization. Example: package.Message
- Protobuf Descriptor File: The path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc

Graylog Extended Log Format (GELF)
- No codec-specific sub-options.

Common sub-options (available for every codec):
- Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }
- Encoding Metric Tag Values (Select): Controls how metric tag values are encoded. Options: Tag values will be exposed as single strings (default), Tags exposed as arrays of strings. Note: When set to single, only the last non-bare value of tags will be displayed with the metric. When set to full, all metric tags will be exposed as separate assignments.
- Encoding Timestamp Format (Select): Options: RFC3339 format, UNIX format
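To make the codec choice concrete, here is a small sketch (with illustrative events) contrasting Newline Delimited JSON Encoding with JSON Encoding plus Pretty JSON:

```python
import json

events = [
    {"timestamp": "2022-07-18T20:34:46Z", "message": "login-failure"},
    {"timestamp": "2022-07-18T20:34:47Z", "message": "login-success"},
]

# Newline Delimited JSON: one compact JSON object per line, the form most
# downstream query engines expect when reading archived logs from a bucket.
ndjson = "\n".join(json.dumps(e, separators=(",", ":")) for e in events) + "\n"

# JSON Encoding with Pretty JSON enabled: indentation and line breaks,
# easier to read by eye but larger and harder to split per event.
pretty = json.dumps(events[0], indent=2)

print(ndjson)
```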
Request Configuration (Optional):
Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.
Options:
- Adaptive concurrency: Adjusts parallelism based on system load
- A fixed concurrency of 1: Processes one task at a time only
Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.
Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.
Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an infinite number of retries. Default: Unlimited.
Request Retry Initial Backoff Secs: The amount of time to wait, in seconds, before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select subsequent backoffs. Default: 1.
Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.
Request Timeout Secs: The time a request waits before being aborted. It is recommended not to lower this value below the service's internal timeout, as doing so could create orphaned requests and duplicate data downstream. Default: 60.
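The retry settings combine into a backoff schedule. This sketch shows the Fibonacci waits capped by the maximum duration; `backoff_schedule` is a hypothetical helper for illustration:

```python
def backoff_schedule(initial: float, max_duration: float, attempts: int) -> list:
    """Fibonacci backoff as described above: the first retry waits `initial`
    seconds, later waits follow the Fibonacci sequence, and every wait is
    capped at `max_duration` seconds (Request Retry Max Duration Secs)."""
    a, b = initial, initial
    waits = []
    for _ in range(attempts):
        waits.append(min(a, max_duration))
        a, b = b, a + b
    return waits

print(backoff_schedule(1, 3600, 8))  # [1, 1, 2, 3, 5, 8, 13, 21]
```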
Batching Requirements (Default):
Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 1
Batch Max Bytes: The maximum size of a batch that will be processed by a sink, based on the uncompressed size of the batched events before they are serialized or compressed. Default: Empty
Batch Max Events: The maximum size of a batch before it is flushed. Default: Empty
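The three batching limits interact as sketched below. `Batcher` is a simplified illustration, not the actual sink implementation; for example, it approximates event size by serialized JSON length and only checks the timeout when an event arrives:

```python
import json
import time

class Batcher:
    """Flush when the batch reaches max_events or max_bytes, or when
    timeout_secs has elapsed since the batch was started."""
    def __init__(self, timeout_secs=1.0, max_bytes=None, max_events=None):
        self.timeout_secs = timeout_secs
        self.max_bytes = max_bytes
        self.max_events = max_events
        self.events, self.bytes, self.started = [], 0, None

    def push(self, event) -> list:
        if self.started is None:
            self.started = time.monotonic()
        self.events.append(event)
        self.bytes += len(json.dumps(event))
        if (self.max_events and len(self.events) >= self.max_events) or \
           (self.max_bytes and self.bytes >= self.max_bytes) or \
           (time.monotonic() - self.started >= self.timeout_secs):
            flushed, self.events, self.bytes, self.started = self.events, [], 0, None
            return flushed  # batch is flushed to the destination
        return []           # batch still accumulating

b = Batcher(max_events=3)
assert b.push({"n": 1}) == []
assert b.push({"n": 2}) == []
assert b.push({"n": 3}) == [{"n": 1}, {"n": 2}, {"n": 3}]
```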
Acknowledgement (False):
Acknowledgements Enabled (False): Whether or not end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the destination before acknowledging them at the source.
Framing (Default):
Framing Method: The framing method. Default: Newline Delimited
Options:
- Raw Event data (not delimited): No framing is applied. This method is best when each event is self-contained.
- Single Character Delimited: Each event is separated by a specific single character (ASCII value).
- Prefixed with Byte Length: Each event is prefixed with its byte length, ensuring precise separation between events.
- Newline Delimited: Each event is followed by a newline character (`\n`), which is commonly used for logging formats.
Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: (Empty)
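The framing methods can be sketched as small encoders. Both helpers are hypothetical, and the 4-byte big-endian width in `frame_length_prefixed` is an assumption for illustration, not the destination's documented wire format:

```python
def frame_newline(events: list) -> bytes:
    # Newline Delimited: each event is followed by b"\n".
    return b"".join(e + b"\n" for e in events)

def frame_length_prefixed(events: list) -> bytes:
    # Prefixed with Byte Length: each event is preceded by its length
    # (here, a 4-byte big-endian integer, chosen for this sketch).
    return b"".join(len(e).to_bytes(4, "big") + e for e in events)

events = [b'{"a":1}', b'{"b":2}']
print(frame_newline(events))  # b'{"a":1}\n{"b":2}\n'
```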
TLS Configuration (Optional):
TLS CA: Provides the CA (Certificate Authority) certificate in PEM format. This certificate is used to verify the authenticity of the server being connected to during a TLS handshake. If not provided, the system will use the default CA certificates available on the host machine.
TLS CRT: The TLS certificate (in PEM format) used to authenticate the client with the GCS endpoint. This is part of the mutual TLS (mTLS) configuration if you are using client authentication.
TLS Key Pass: The password used to unlock the private key (in PEM format) that corresponds to the TLS certificate (TLS CRT). The key is used in combination with the certificate to authenticate the client when establishing a secure connection.
Examples:
${KEY_PASS_ENV_VAR}
PassWord1
TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid: not expired and issued by a trusted issuer. Verification operates hierarchically, checking the certificate, that certificate's issuer, and so on up to a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.
TLS Verify Hostname (False): Enables hostname verification. The hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.
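The two verification toggles map onto standard client-side TLS settings, as in this Python `ssl` sketch; the destination performs the equivalent checks internally, and the paths in the comment are placeholders:

```python
import ssl

# Build a client-side TLS context; by default it loads the host's CA store,
# which matches the TLS CA behavior when no CA certificate is provided.
ctx = ssl.create_default_context()
ctx.check_hostname = True            # TLS Verify Hostname
ctx.verify_mode = ssl.CERT_REQUIRED  # TLS Verify Certificate
# For mutual TLS, the TLS CRT, its private key, and TLS Key Pass would be
# loaded with something like:
#   ctx.load_cert_chain(certfile="/path/client-cert.pem",
#                       keyfile="/path/client-key.pem", password="...")
print(ctx.check_hostname, ctx.verify_mode == ssl.CERT_REQUIRED)
```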
Advanced Settings (Optional):
Filename Append UUID to Timestamp (True): Whether or not to append a UUID v4 token to the end of the object key's timestamp portion. Ensures unique object names in high-performance use cases.
Example: For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key such as `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.
Filename Extension: The filename extension to use in the object key. If not specified, the extension is determined by the compression scheme used. For example, with Gzip compression you may set this to .gz; with Parquet encoding, .parquet. The extension helps identify the format of the files stored in the GCS bucket. Default: None
Metadata (Add as needed): Key/value pairs that attach additional metadata to each object stored in GCS, such as the source of the data or the encoding format used. This metadata is included with the object and can be queried or used for auditing and monitoring purposes.
Save and Test Configuration:
Save the configuration settings.
Test the connection to verify that Observo AI can successfully write data to the specified GCP Cloud Storage bucket.
Example Scenarios
MediCare Analytics, a fictitious U.S.-based healthcare enterprise, delivers advanced analytics and observability for hospitals and clinics. It collects telemetry data such as patient monitoring logs, system metrics, and application traces to ensure HIPAA compliance, optimize operations, and enhance patient care. To achieve scalable, secure, and cost-effective storage, the company integrates Observo AI with Google Cloud Platform (GCP) Cloud Storage, centralizing telemetry data in the medicare-telemetry-archive bucket: Parquet format with Gzip compression for optimized storage and querying, strict access controls, and unique object naming for high-volume data. Authentication via a service account, TLS for secure data transfer, and tailored configurations ensure high reliability, minimal latency, and regulatory compliance for long-term storage and analytics.
Standard GCP Cloud Storage Archival Destination Setup
Here is a standard GCP Cloud Storage Archival Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:
General Settings

| Field | Value | Notes |
| --- | --- | --- |
| Name | medicare-telemetry-archive | Unique identifier for the destination. |
| Description | Archival destination for MediCare Analytics telemetry data | Optional description for clarity. |
| Bucket | medicare-telemetry-archive | The GCS bucket name for storing telemetry data. |
| Api Key | (Not specified) | Not used; authentication via service account credentials. |
| Credentials Path | /path/to/medicare-service-account.json | Path to the service account credentials JSON file for authentication. |
| Compression | Gzip compression | Uses DEFLATE-based compression for efficient storage. |
| Acl | Object is private to bucket owner | Restricts access to the bucket owner for HIPAA compliance. |
| Filename Append UUID to Timestamp | True | Appends a UUID v4 token to the timestamp for unique object keys (e.g., date=2025-07-13/1752364800-30f6652c-71da-4f9f-800d-a1189c47c547). |
| Filename Time Format | %s | Timestamps in seconds since the Unix epoch for object keys. |
| Key Prefix | year=%Y/month=%m/day=%d/ | Organizes objects by year, month, and day for partitioning. |
| Storage Class | Archive | Cheapest storage class for rarely accessed, long-term data storage. |

Encoding

| Field | Value | Notes |
| --- | --- | --- |
| Encoding Codec | Parquet | Uses Parquet format for optimized storage and querying. |
| Parquet Schema | message root { optional binary stream; optional binary time; optional group patient { optional binary patient_id; optional binary record_type; } } | Defines the Parquet schema for patient telemetry data. |
| Include Raw Log | True | Captures the complete log message as an additional observo_record field. |
| Encoding Avro Schema | { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "patient_id", "type": "string" }] } | Optional Avro schema for additional serialization compatibility. |
| Encoding Metric Tag Values | Tags exposed as arrays of strings | Exposes all metric tags as arrays for detailed analytics. |
| Encoding Timestamp Format | RFC3339 format | Uses RFC3339 for consistent timestamp formatting. |

Request Configuration

| Field | Value | Notes |
| --- | --- | --- |
| Request Concurrency | Adaptive concurrency | Adjusts parallelism based on system load for optimal performance. |
| Request Rate Limit Duration Secs | 1 | Time window for rate limiting, set to 1 second. |
| Request Rate Limit Num | 1000 | Maximum of 1000 requests per second to prevent overloading. |
| Request Retry Attempts | 3 | Retries failed requests up to 3 times. |
| Request Retry Initial Backoff Secs | 1 | Waits 1 second before the first retry, using the Fibonacci sequence for subsequent retries. |
| Request Retry Max Duration Secs | 3600 | Maximum wait time between retries, set to 1 hour. |
| Request Timeout Secs | 60 | Requests time out after 60 seconds to avoid orphaned requests. |

Batching Requirements

| Field | Value | Notes |
| --- | --- | --- |
| Batch Timeout Secs | 1 | Flushes batches after 1 second to ensure timely data transfer. |
| Batch Max Bytes | 10485760 | Maximum batch size of 10MB (uncompressed) to balance throughput and memory usage. |
| Batch Max Events | 1000 | Flushes batches after 1000 events to manage batch size. |

Acknowledgement

| Field | Value | Notes |
| --- | --- | --- |
| Acknowledgements Enabled | True | Enables end-to-end acknowledgements to ensure data delivery. |

Framing

| Field | Value | Notes |
| --- | --- | --- |
| Framing Method | Newline Delimited | Each event is followed by a newline character for compatibility with Parquet. |
| Framing Character Delimited Delimiter | (Empty) | Not used, as newline delimited is selected. |

TLS Configuration

| Field | Value | Notes |
| --- | --- | --- |
| TLS CA | /path/to/ca-cert.pem | CA certificate in PEM format to verify the GCS endpoint. |
| TLS CRT | /path/to/client-cert.pem | Client certificate in PEM format for mutual TLS authentication. |
| TLS Key Pass | MediCareTLSKey2025 | Private key password for the client certificate. |
| TLS Verify Certificate | True | Enables certificate verification for secure connections. |
| TLS Verify Hostname | True | Ensures the hostname matches the certificate for added security. |

Advanced Settings

| Field | Value | Notes |
| --- | --- | --- |
| Filename Append UUID to Timestamp | True | Ensures unique object keys by appending a UUID (repeats the General Settings value, as the field also appears under Advanced Settings). |
| Filename Extension | .parquet | Specifies the Parquet file extension for stored objects. |
| Metadata | source=patient_monitoring, format=parquet | Adds metadata for auditing and identifying data source/format. |
Test Configuration
Test the connection to verify that Observo AI can write data to the medicare-telemetry-archive bucket.
Troubleshooting
If you encounter issues with the GCP Cloud Storage Archival destination, use the following steps to diagnose and resolve them:
Verify Service Account Permissions:
Ensure the service account has the "Storage Object Admin" role. Check the IAM page in the Google Cloud Console and enable the "Include Google-provided role grants" option to view all service account role grants, including the service account used by Observo AI.
Check Connection Status:
In the Observo AI interface, verify the destination’s connection status to confirm it is active.
Review Logs:
Check Observo AI logs for errors or warnings related to data transmission to GCP Cloud Storage.
Validate Bucket Configuration:
Confirm the bucket exists, is accessible, and matches the specified region and name.
Check Data Format:
Ensure the selected encoding format (such as JSON or Parquet) is compatible with downstream processes.
Proxy Configuration:
If using a proxy, verify the proxy settings are correctly configured (Proxy Configuration).
Test Data Flow:
Send sample data and verify it appears in the GCP Cloud Storage bucket.
Monitor Data Volume:
Use the Analytics tab in the Observo AI pipeline to monitor data volume and ensure expected throughput.
| Issue | Possible Cause | Resolution |
| --- | --- | --- |
| Data not reaching bucket | Incorrect service account credentials | Verify the JSON key file and permissions |
| Connection errors | Cloud Storage API not enabled or wrong region | Enable the Cloud Storage API and confirm the bucket region |
| Serialization errors | Incorrect encoding format | Ensure the correct codec (such as JSON or Parquet) is selected |
| Slow data transfer | Backpressure or rate limiting | Adjust batching settings or check GCP quotas |