GCP Cloud Storage

The GCP Cloud Storage configuration allows you to write events into Google Cloud Storage buckets. This destination supports compression, batching, TLS, and customizable metadata for objects. Below are the detailed configuration parameters to set up a GCP Cloud Storage destination.

Purpose

The Observo AI GCP Cloud Storage destination enables users to send telemetry data, including logs, metrics, and traces, to Google Cloud Storage for scalable, cost-effective storage and further analysis. This destination supports flexible data formats and integrates seamlessly with Google Cloud's ecosystem, allowing organizations to centralize telemetry data for observability, compliance, and analytics purposes.

Prerequisites

Before configuring the GCP Cloud Storage destination in Observo AI, ensure the following requirements are met:

  • Google Cloud Project:

    • A Google Cloud project must be created and linked to your GCP Cloud Storage instance. It’s recommended to use a dedicated project for isolation, but an existing project can be used if permissions are correctly configured (Create a Google Cloud Project).

    • The Cloud Storage API must be enabled in the project (Enable Cloud Storage API).

    • Configure Essential Contacts for notifications to receive updates from Google Cloud (Manage Notification Contacts).

  • Authentication:

    • A service account with write access to the target bucket (for example, the "Storage Object Admin" role) and its credentials JSON key file, or an API key, must be available for Observo AI to authenticate to Cloud Storage (see the Api Key and Credentials Path fields below).

  • GCP Cloud Storage Bucket:

    • Ensure an active GCP Cloud Storage bucket is available for data storage. The bucket must be accessible and properly configured for write operations (Creating Storage Buckets).

    • Verify the bucket’s region aligns with your performance and compliance requirements.

Integration

To configure GCP Cloud Storage as a destination in Observo AI, follow these steps:

  1. Log in to Observo AI:

    • Navigate to the Destinations tab.

    • Click the Add Destinations button and select Create New.

    • Choose GCP Cloud Storage from the list of available destinations to begin configuration.

  2. General Settings:

    • Name: Add a unique identifier such as gcp-cloud-storage-1.

    • Description (Optional): Provide a description for the destination.

    • Bucket: The GCS bucket name.

      Example

      my-bucket

    • Api Key (Optional): An API key for authenticating to Cloud Storage. Either an API key or a path to a service account credentials JSON file can be specified. If both are unset, the GOOGLE_APPLICATION_CREDENTIALS environment variable is checked for a filename. If that variable is also unset, an attempt is made to fetch the instance service account of the compute instance the program is running on. If the program is not running on a GCE instance, you must supply an API key or a service account credentials JSON file.

    • Credentials Path: Path to a service account credentials JSON file. Either an API key or a path to a service account credentials JSON file can be specified. If both are unset, the GOOGLE_APPLICATION_CREDENTIALS environment variable is checked for a filename. If that variable is also unset, an attempt is made to fetch the instance service account of the compute instance the program is running on. If the program is not running on a GCE instance, you must supply an API key or a service account credentials JSON file.

      Example

      /my/path/credentials.json

    • Compression (Optional): Compression configuration. All compression algorithms use the default compression level unless otherwise specified. Default: No compression

      Options:

      • No compression: No compression applied to data
      • Gzip compression: Widely used DEFLATE-based compression format
      • Zlib compression: DEFLATE-based, lightweight compression library

    • Acl (Optional): The Predefined ACL to apply to created objects. For more information, see Predefined ACLs. Default: Bucket/object private to project

      Options:

      • Bucket/object can be read by authenticated users: Any authenticated GCP user can read the object
      • Object and bucket owner granted OWNER permission: The owner of the bucket and the object have full control over the object
      • Object is private to bucket owner: Only the bucket owner can access the object
      • Bucket/object are private: Both the bucket and object are private to the owner
      • Bucket/object private to project: Access is restricted to the project and its members
      • Bucket/object can be read publicly: Anyone can access the object without authentication

    • Filename Append UUID to Timestamp (False): Whether or not to append a UUID v4 token to the end of the object key’s timestamp portion. This ensures object key uniqueness in high-throughput use cases.

      Example

      For object key `date=2022-07-18/1658176486`, setting this field to `true` would result in an object key that looked like `date=2022-07-18/1658176486-30f6652c-71da-4f9f-800d-a1189c47c547`.

    • Filename Time Format: The timestamp format for the time component of the object key. By default, object keys are appended with a timestamp that reflects when the objects are sent to Cloud Storage; the resulting object key is the key prefix joined with the formatted timestamp, such as date=2022-07-18/1658176486. That example corresponds to a key_prefix of date=%F/, a filename_time_format of %s (which renders timestamps in seconds since the Unix epoch), and the timestamp Mon Jul 18 2022 20:34:46 GMT+0000. Supports the common strftime specifiers found in most languages. When set to an empty string, no timestamp is appended to the key prefix. Default: %s

      Example

      %s

    • Key Prefix: Prefix to apply to all object keys. Useful for partitioning objects. Must end in / to act as a directory path. Default: year=%Y/month=%m/day=%d/

      Examples

      date=%F/hour=%H/

      year=%Y/month=%m/day=%d/

      application_id={{ application_id }}/date=%F/

      %Y/%m/%d/

      date=%F/

    • Storage Class (Optional): The storage class for created objects. For more information, see the storage classes documentation. Default: Standard

      Options:

      • Standard: For frequently accessed data, offering low latency and high availability
      • Nearline: Suitable for data that is accessed less than once a month
      • Coldline: Low-cost storage for infrequently accessed data, still available within milliseconds
      • Archive: Cheapest option, for data that is rarely accessed (long-term storage)
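
      The credential resolution order and object key construction described above can be sketched in Python. This is an illustrative model, not Observo AI's implementation: resolve_gcs_credentials mirrors the documented fallback chain for the Api Key and Credentials Path fields, and build_object_key shows how Key Prefix, Filename Time Format, and the UUID toggle combine into an object key. Both function names are hypothetical.

      ```python
      import os
      import uuid
      from datetime import datetime, timezone

      def resolve_gcs_credentials(api_key=None, credentials_path=None):
          # Documented resolution order: explicit API key, explicit credentials
          # file, GOOGLE_APPLICATION_CREDENTIALS, then the instance service account.
          if api_key:
              return ("api_key", api_key)
          if credentials_path:
              return ("credentials_file", credentials_path)
          env_path = os.environ.get("GOOGLE_APPLICATION_CREDENTIALS")
          if env_path:
              return ("credentials_file", env_path)
          return ("instance_service_account", None)  # GCE metadata fallback

      def build_object_key(key_prefix="year=%Y/month=%m/day=%d/",
                           time_format="%s", append_uuid=False, now=None):
          # Key = strftime-expanded prefix + formatted timestamp (+ optional UUID v4).
          now = now or datetime.now(timezone.utc)
          key = now.strftime(key_prefix)
          if not time_format:          # empty format: no timestamp appended
              return key
          if time_format == "%s":      # seconds since the Unix epoch
              stamp = str(int(now.timestamp()))
          else:
              stamp = now.strftime(time_format)
          key += stamp
          if append_uuid:
              key += "-" + str(uuid.uuid4())
          return key
      ```

      For example, a key_prefix of date=%Y-%m-%d/ with the default %s time format yields keys like date=2022-07-18/1658176486.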

  3. Acknowledgement (False):

    • Acknowledgements Enabled (False): Whether or not end-to-end acknowledgements are enabled. When enabled, any source connected to this destination that supports end-to-end acknowledgements will wait for events to be acknowledged by the destination before acknowledging them at the source.

  4. Encoding:

    • Encoding Codec: The codec to use for encoding events. Default: JSON Encoding.

      All codecs share these common sub-options:

      • Encoding Avro Schema (Optional): The Avro schema. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }
      • Encoding Metric Tag Values (Select): Controls how metric tag values are encoded. Tag values can be exposed as single strings (default) or as arrays of strings. Note: When set to single, only the last non-bare value of tags is displayed with the metric. When set to full, all metric tags are exposed as separate assignments.
      • Encoding Timestamp Format (Select): RFC3339 format or UNIX format.

      Codec-specific sub-options:

      • JSON Encoding: Pretty JSON (False): Format JSON with indentation and line breaks for better readability.

      • logfmt Encoding: No additional sub-options.

      • Apache Avro Encoding: Avro Schema: The Apache Avro schema definition for serializing events. Example: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }] }

      • Newline Delimited JSON Encoding: No additional sub-options.

      • No encoding: No additional sub-options.

      • Plain text encoding: No additional sub-options.

      • Parquet: Include Raw Log (False): Capture the complete log message as an additional field (observo_record) in addition to the given schema; the Parquet file then contains a field named "observo_record" alongside the schema fields. Parquet Schema: The Parquet schema for encoding. Example: message root { optional binary stream; optional binary time; optional group kubernetes { optional binary pod_name; optional binary pod_id; optional binary docker_id; optional binary container_hash; optional binary container_image; optional group labels { optional binary pod-template-hash; } } }

      • Common Event Format (CEF): CEF Device Event Class ID: A unique identifier for categorizing the type of event (maximum 1023 characters). Example: login-failure. CEF Device Product: The product name that generated the event (maximum 63 characters). Example: Log Analyzer. CEF Device Vendor: The vendor name that produced the event (maximum 63 characters). Example: Observo. CEF Device Version: The version of the product that generated the event (maximum 31 characters). Example: 1.0.0. CEF Extensions (Add): Custom key-value pairs for additional event data fields in CEF format. CEF Name: A human-readable description of the event (maximum 512 characters). Example: cef.name. CEF Severity: The importance of the event, from 0 (lowest) to 10 (highest). Example: 5. CEF Version (Select): The CEF specification version to use for formatting: 0.1 or 1.x.

      • CSV Format: CSV Fields (Add): The field names to include as columns in the CSV output, in order. Examples: timestamp, host, message. CSV Buffer Capacity (Optional): The internal buffer size (in bytes) used when writing CSV data. Example: 8192. CSV Delimiter (Optional): The character that separates fields in the CSV output. Example: ,. Enable Double Quote Escapes (True): When enabled, quotes in field data are escaped by doubling them; when disabled, an escape character is used instead. CSV Escape Character (Optional): The character used to escape quotes when double quote escapes are disabled. CSV Quote Character (Optional): The character used for quoting fields in the CSV output. Example: ". CSV Quoting Style (Optional): Controls when field values are wrapped in quote characters: always quote all fields, quote only when necessary, never use quotes, or quote all non-numeric fields.

      • Protocol Buffers: Protobuf Message Type: The fully qualified message type name for Protobuf serialization. Example: package.Message. Protobuf Descriptor File: The path to the compiled protobuf descriptor file (.desc). Example: /path/to/descriptor.desc.

      • Graylog Extended Log Format (GELF): No additional sub-options.
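
      To make the interplay of codec and compression concrete, here is a minimal Python sketch of the Newline Delimited JSON codec combined with Gzip compression. The function name is hypothetical and this is not Observo AI's implementation; it only illustrates what a stored object's bytes would look like with these settings.

      ```python
      import gzip
      import json

      def encode_ndjson_gzip(events):
          # Each event becomes one JSON line; the whole payload is then
          # gzipped, which is what the Cloud Storage object would contain.
          ndjson = "".join(json.dumps(event) + "\n" for event in events)
          return gzip.compress(ndjson.encode("utf-8"))

      payload = encode_ndjson_gzip([
          {"message": "login ok", "host": "web-1"},
          {"message": "login failed", "host": "web-2"},
      ])
      ```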

  5. Request Configuration (Optional):

    • Request Concurrency: Configuration for outbound request concurrency. Default: Adaptive concurrency.

      Options:

      • Adaptive concurrency: Adjusts parallelism based on system load
      • A fixed concurrency of 1: Processes one request at a time

    • Request Rate Limit Duration Secs: The time window used for the rate_limit_num option. Default: 1.

    • Request Rate Limit Num: The maximum number of requests allowed within the rate_limit_duration_secs time window. Default: Unlimited.

    • Request Retry Attempts: The maximum number of retries to make for failed requests. The default represents an unlimited number of retries. Default: Unlimited.

    • Request Retry Initial Backoff Secs: The amount of time, in seconds, to wait before attempting the first retry for a failed request. After the first retry has failed, the Fibonacci sequence is used to select future backoffs. Default: 1.

    • Request Retry Max Duration Secs: The maximum amount of time to wait between retries. Default: 3600.

    • Request Timeout Secs: The time a request waits before being aborted. It is recommended not to lower this value below the service’s internal timeout, as this could create orphaned requests and duplicate data downstream. Default: 60.
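
    The retry and rate limit semantics above can be modeled in Python. This is an illustrative sketch with hypothetical names, assuming a fixed-window interpretation of the rate limit and an initial backoff multiplied by Fibonacci numbers, capped at the retry max duration.

    ```python
    def retry_backoffs(initial_backoff_secs=1, max_duration_secs=3600, attempts=8):
        # First retry waits initial_backoff_secs; later waits follow the
        # Fibonacci sequence, capped at max_duration_secs.
        delays, a, b = [], 1, 1
        for _ in range(attempts):
            delays.append(min(initial_backoff_secs * a, max_duration_secs))
            a, b = b, a + b
        return delays

    class FixedWindowRateLimiter:
        # Allows at most `limit` requests per `window_secs` seconds.
        def __init__(self, limit, window_secs=1):
            self.limit, self.window = limit, window_secs
            self.window_start, self.count = None, 0

        def allow(self, now):
            if self.window_start is None or now - self.window_start >= self.window:
                self.window_start, self.count = now, 0
            if self.count < self.limit:
                self.count += 1
                return True
            return False
    ```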

  6. Batching Requirements (Default):

    • Batch Timeout Secs: The maximum age of a batch before it is flushed. Default: 1

    • Batch Max Bytes: The maximum size of a batch that will be processed by a sink, based on the uncompressed size of the batched events before they are serialized or compressed. Default: Empty

    • Batch Max Events: The maximum number of events in a batch before it is flushed. Default: Empty
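
    A batch is flushed when any of the three limits is reached. Here is a minimal sketch (class name hypothetical; sizes are measured as UTF-8 JSON bytes, which is an assumption):

    ```python
    import json

    class Batcher:
        # Flush when max_events or max_bytes (uncompressed) is reached,
        # or when the oldest buffered event exceeds timeout_secs.
        def __init__(self, timeout_secs=1, max_bytes=None, max_events=None):
            self.timeout, self.max_bytes, self.max_events = timeout_secs, max_bytes, max_events
            self.events, self.size, self.started = [], 0, None

        def add(self, event, now):
            if self.started is None:
                self.started = now
            self.events.append(event)
            self.size += len(json.dumps(event).encode("utf-8"))

        def should_flush(self, now):
            if not self.events:
                return False
            if self.max_events is not None and len(self.events) >= self.max_events:
                return True
            if self.max_bytes is not None and self.size >= self.max_bytes:
                return True
            return now - self.started >= self.timeout
    ```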

  7. Framing (Default):

    • Framing Method: The framing method. Default: Newline Delimited

      Options:

      • Raw Event data (not delimited): No framing is applied. This method is best when each event is self-contained.
      • Single Character Delimited: Each event is separated by a specific single character (ASCII value).
      • Prefixed with Byte Length: Each event is prefixed with its byte length, ensuring precise separation between events.
      • Newline Delimited: Each event is followed by a newline character (\n), which is commonly used for logging formats.

    • Framing Character Delimited Delimiter: The ASCII (7-bit) character that delimits byte sequences. Default: (Empty)
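
    The framing methods can be sketched as byte-level transforms. These are illustrative only: the 4-byte big-endian length prefix is an assumption, since the actual wire format for length prefixing is not specified here.

    ```python
    import struct

    def frame_newline(encoded_events):
        # Newline-delimited: each encoded event is followed by b"\n".
        return b"".join(e + b"\n" for e in encoded_events)

    def frame_char_delimited(encoded_events, delimiter):
        # Single-character-delimited: events separated by one ASCII byte.
        return delimiter.join(encoded_events)

    def frame_length_prefixed(encoded_events):
        # Byte-length-prefixed: each event preceded by its length
        # (4-byte big-endian here, as an illustrative choice).
        return b"".join(struct.pack(">I", len(e)) + e for e in encoded_events)
    ```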

  8. TLS Configuration (Optional):

    • TLS CA: Provides the CA (Certificate Authority) certificate in PEM format. This certificate is used to verify the authenticity of the server being connected to during a TLS handshake. If not provided, the system will use the default CA certificates available on the host machine.

    • TLS CRT: The TLS certificate (in PEM format) used to authenticate the client with the GCS endpoint. This is part of the mutual TLS (mTLS) configuration if you are using client authentication.

    • TLS Key Pass: The passphrase used to decrypt the TLS private key, if the key is encrypted. Used together with the TLS certificate (TLS CRT) and its private key to authenticate the client when establishing a secure connection.

      Examples

      ${KEY_PASS_ENV_VAR}

      PassWord1

    • TLS Verify Certificate (False): Enables certificate verification. Certificates must be valid in terms of not being expired, and being issued by a trusted issuer. This verification operates in a hierarchical manner, checking validity of the certificate, the issuer of that certificate and so on until reaching a root certificate. Relevant for both incoming and outgoing connections. Do NOT set this to false unless you understand the risks of not verifying the validity of certificates.

    • TLS Verify Hostname (False): Enables hostname verification. Hostname used to connect to the remote host must be present in the TLS certificate presented by the remote host, either as the Common Name or as an entry in the Subject Alternative Name extension. Only relevant for outgoing connections. NOT recommended to set this to false unless you understand the risks.

  9. Advanced Settings (Optional):

    • Filename Extension: The filename extension to use in the object key. If not specified, the extension is determined by the compression scheme or encoding format used. For example, with Gzip compression you may set this to .gz, or with Parquet encoding to .parquet. The extension helps identify the format of the files stored in the GCS bucket. Default: None

    • Metadata (Add as needed): A key/value pair. Allows you to specify additional metadata for each object stored in GCS. Metadata is key-value pairs that can store useful information, such as the source of the data or the encoding format used. This metadata is included with the object and can be queried or used for auditing and monitoring purposes.
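
    Choosing a Filename Extension consistent with the codec and compression can be sketched as a lookup. The mapping below is a plausible assumption for illustration, not the product's exact rules:

    ```python
    def object_key_extension(codec="ndjson", compression="none"):
        # Hypothetical mapping: base extension from the codec, with a
        # compression suffix appended when compression is enabled.
        codec_ext = {"ndjson": "log", "json": "log", "csv": "csv", "parquet": "parquet"}
        comp_ext = {"gzip": "gz", "zlib": "zz"}
        ext = codec_ext.get(codec, "log")
        if compression in comp_ext:
            ext += "." + comp_ext[compression]
        return "." + ext
    ```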

  10. Save and Test Configuration:

    • Save the configuration settings.

    • Test the connection to verify that Observo AI can successfully write data to the specified GCP Cloud Storage bucket.

Example Scenarios

FinSecure, a financial services enterprise, manages a vast portfolio of transactional data, compliance logs, and fraud detection metrics generated by its trading platforms and customer banking systems. To ensure regulatory compliance and enable advanced analytics, FinSecure wants to send this telemetry data, stored in JSON and Parquet formats, to a Google Cloud Storage (GCS) bucket named finsecure-telemetry-data using the Observo AI platform. The bucket is configured within a dedicated Google Cloud project, finsecure-project-2025, with a service account assigned the "Storage Object Admin" role for secure write operations. The configuration below walks through setting up the GCS destination in Observo AI, following the required fields described in the Integration section, so that FinSecure can centralize data for observability, compliance, and fraud analysis.

Standard GCP Cloud Storage Destination Setup

Here is a standard GCP Cloud Storage Destination configuration example. Only the required sections and their associated field updates are displayed in the table below:

General Settings

  • Name: finsecure-gcs-telemetry (unique identifier for the GCS destination)
  • Description: Store transactional and compliance telemetry in GCS for FinSecure (optional description of the destination)
  • Bucket: finsecure-telemetry-data (GCS bucket name for data storage)
  • Api Key: None (service account credentials are used instead)
  • Credentials Path: /opt/observo/credentials/finsecure-service-account.json (path to the service account JSON key file)
  • Compression: Gzip (applies Gzip compression to stored objects)
  • Acl: Bucket/object private to project (restricts access to the project and its members)
  • Filename Append UUID to Timestamp: True (appends a UUID to the timestamp for unique object keys)
  • Filename Time Format: %s (timestamps in seconds since the Unix epoch)
  • Key Prefix: year=%Y/month=%m/day=%d/ (partitions objects by year, month, and day)
  • Storage Class: Nearline (suitable for infrequently accessed compliance data)

Acknowledgement

  • Acknowledgements Enabled: True (enables end-to-end acknowledgements for data delivery)

Encoding

  • Encoding Codec: Parquet (encodes events in Parquet format for structured data)
  • Parquet Include Raw Log: True (includes the complete log as an observo_record field)
  • Parquet Schema: message root { optional binary transaction_id; optional binary timestamp; optional binary account_id; optional binary amount; optional binary fraud_score; } (Parquet schema for the transactional data)
  • Encoding Avro Schema: { "type": "record", "name": "log", "fields": [{ "name": "message", "type": "string" }, { "name": "transaction_id", "type": "string" }, { "name": "timestamp", "type": "string" }, { "name": "account_id", "type": "string" }, { "name": "amount", "type": "double" }, { "name": "fraud_score", "type": "double" }] } (Avro schema for additional serialization)
  • Encoding Metric Tag Values: Single (exposes metric tag values as single strings)
  • Encoding Timestamp Format: RFC3339 (formats timestamps in RFC3339 format)

Request Configuration

  • Request Concurrency: Adaptive concurrency (adjusts parallelism based on system load)
  • Request Rate Limit Duration Secs: 1 (time window for rate limiting)
  • Request Rate Limit Num: 1000 (maximum requests within the time window)
  • Request Retry Attempts: 3 (maximum retries for failed requests)
  • Request Retry Initial Backoff Secs: 1 (initial wait time before the first retry)
  • Request Retry Max Duration Secs: 3600 (maximum wait time between retries)
  • Request Timeout Secs: 60 (time before aborting a request)

Batching Configuration

  • Batch Timeout Secs: 1 (maximum age of a batch before flushing)
  • Batch Max Bytes: 10485760 (maximum batch size of 10 MB before flushing)
  • Batch Max Events: 1000 (maximum number of events in a batch)

Framing

  • Framing Method: Newline Delimited (frames events with newline characters)
  • Framing Character Delimited Delimiter: Empty (not used, since newline-delimited framing is selected)

TLS Configuration

  • TLS CA: /opt/observo/certs/ca.crt (path to the CA certificate for server verification)
  • TLS CRT: /opt/observo/certs/finsecure.crt (path to the client certificate for mTLS)
  • TLS Key: /opt/observo/certs/finsecure.key (path to the private key for mTLS)
  • TLS Key Pass: FinSecure2025 (passphrase to unlock the encrypted key file)
  • TLS Verify Certificate: True (enables certificate verification)
  • TLS Verify Hostname: True (verifies the hostname in the TLS certificate)

Advanced Settings

  • Filename Extension: .parquet (specifies the Parquet extension for object keys)
  • Metadata: source=finsecure, encoding=parquet (adds metadata for auditing and querying)

Additional Configuration

  • Save and Test: Save the configuration and send sample transactional data to the finsecure-telemetry-data bucket. Verify data presence in the GCS bucket using the Observo AI Analytics tab to confirm successful data flow.

Outcome

With this configuration, FinSecure successfully sends transactional data, compliance logs, and fraud detection metrics to its GCS bucket via Observo AI, enabling real-time fraud analysis, regulatory compliance, and optimized financial operations through centralized, scalable storage and advanced analytics.

Troubleshooting

If you encounter issues with the GCP Cloud Storage destination, use the following steps to diagnose and resolve them:

  • Verify Service Account Permissions:

    • Ensure the service account has the "Storage Object Admin" role. Check the IAM page in the Google Cloud Console and enable the "Include Google-provided role grants" option to view service accounts such as [email protected].

  • Check Connection Status:

    • In the Observo AI interface, verify the destination’s connection status to confirm it is active.

  • Review Logs:

    • Check Observo AI logs for errors or warnings related to data transmission to GCP Cloud Storage.

  • Validate Bucket Configuration:

    • Confirm the bucket exists, is accessible, and matches the specified region and name.

  • Check Data Format:

    • Ensure the selected encoding format (such as JSON or Parquet) is compatible with downstream processes.

  • Proxy Configuration:

    • If using a proxy, verify the proxy settings are correctly configured (Proxy Configuration).

  • Test Data Flow:

    • Send sample data and verify it appears in the GCP Cloud Storage bucket.

  • Monitor Data Volume:

    • Use the Analytics tab in the Observo AI pipeline to monitor data volume and ensure expected throughput.

Common issues, possible causes, and resolutions:

  • Data not reaching bucket: Incorrect service account credentials. Verify the JSON key file and permissions.

  • Connection errors: Cloud Storage API not enabled or wrong region. Enable the Cloud Storage API and confirm the bucket region.

  • Serialization errors: Incorrect encoding format. Ensure the correct codec (such as JSON or Parquet) is selected.

  • Slow data transfer: Backpressure or rate limiting. Adjust batching settings or check GCP quotas.

Resources

For additional guidance and detailed information, refer to the following resources:
