GCP Flow Logs

The GCP Flow Logs Optimizer in Observo AI allows users to perform various aggregations, including smart summarization, on GCP Flow Logs data.

Purpose

The GCP Flow Logs Optimizer is designed to optimize and manage the vast amounts of data generated by Google Cloud Platform's Virtual Private Cloud (VPC) Flow Logs. VPC Flow Logs capture samples of network traffic to and from virtual machine (VM) instances, providing insights into network activities, which are essential for security analysis, troubleshooting, and compliance monitoring.

GCP VPC Flow Logs records a sample of network flows sent from and received by VM instances, including instances used as Google Kubernetes Engine nodes.

For detailed information on Flow Logs, refer to the Google Cloud VPC Flow Logs documentation.

Usage

Select the GCP Flow Logs Optimizer transform, then add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform is bypassed entirely, and events pass through without any modifications.

  • Add Filter Conditions: Defaults to disabled. When enabled, events are filtered through conditions: only events that satisfy the conditions are processed, and all others bypass this transform. Build conditions with AND/OR logic using the "+Rule" and "+Group" buttons.
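The rule and group behavior described above can be pictured as a small tree of predicates. The following is only an illustrative sketch, not Observo AI's implementation: the `Group` class, the `apply_transform` helper, and the `protocol` and `dest_port` field names are all hypothetical.

```python
from typing import Any, Callable, Dict, List, Union

Event = Dict[str, Any]
Rule = Callable[[Event], bool]  # a rule is a predicate over one event


class Group:
    """Hypothetical AND/OR group of rules or nested groups."""

    def __init__(self, op: str, children: List[Union["Group", Rule]]):
        if op not in ("AND", "OR"):
            raise ValueError("op must be 'AND' or 'OR'")
        self.op = op
        self.children = children

    def matches(self, event: Event) -> bool:
        results = (
            child.matches(event) if isinstance(child, Group) else child(event)
            for child in self.children
        )
        return all(results) if self.op == "AND" else any(results)


def apply_transform(event: Event, condition: Group,
                    transform: Callable[[Event], Event]) -> Event:
    """Process matching events; everything else bypasses unchanged."""
    return transform(event) if condition.matches(event) else event


# Example: TCP (protocol 6) AND (port 443 OR port 8080). Field names are assumed.
condition = Group("AND", [
    lambda e: e.get("protocol") == 6,
    Group("OR", [
        lambda e: e.get("dest_port") == 443,
        lambda e: e.get("dest_port") == 8080,
    ]),
])


def mark(event: Event) -> Event:
    # Stand-in for the real transform: tag the event as processed.
    return {**event, "processed": True}
```

An event that fails the condition is returned as-is, which mirrors the "all others bypass this transform" behavior.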

GCP Flow Logs Optimizer: Filter Traffic (pulldown):

  • Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to skip this processing and pass events to the downstream transforms unchanged.

  • Filter traffic within private subnets: Defaults to enabled. Filters traffic whose source and destination are both within private subnets. Toggle off to prevent this filtering.

  • Filter traffic for CIDR pairs: Set of CIDR pairs to evaluate. Click the Add button to add a new pair, with the following inputs:

    • First CIDR: The first CIDR range of the pair. Example: 192.168.2.1/16.

    • Second CIDR: The second CIDR range of the pair. Example: 192.168.2.1/16.
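The filter settings above can be approximated with Python's standard `ipaddress` module. This is a minimal sketch under stated assumptions: "filter" is taken to mean that a matching flow is dropped, "private" means the RFC 1918 IPv4 ranges, a CIDR pair matches in either direction, and the `srcaddr` and `dstaddr` field names follow the illustrative example later on this page rather than the real GCP record format.

```python
import ipaddress

# RFC 1918 private IPv4 ranges (assumed definition of "private subnets").
PRIVATE_NETS = [
    ipaddress.ip_network(n)
    for n in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]


def is_private(addr: str) -> bool:
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in PRIVATE_NETS)


def parse_cidr(cidr: str):
    # strict=False accepts values with host bits set, such as 192.168.2.1/16,
    # and normalizes them to the enclosing network (192.168.0.0/16).
    return ipaddress.ip_network(cidr, strict=False)


def matches_pair(src: str, dst: str, first: str, second: str) -> bool:
    a, b = parse_cidr(first), parse_cidr(second)
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    # Assumption: the pair is unordered, so either direction matches.
    return (s in a and d in b) or (s in b and d in a)


def should_filter(flow: dict, cidr_pairs, filter_private: bool = True) -> bool:
    """Return True if the flow would be filtered out under these assumptions."""
    src, dst = flow["srcaddr"], flow["dstaddr"]
    if filter_private and is_private(src) and is_private(dst):
        return True
    return any(matches_pair(src, dst, f, s) for f, s in cidr_pairs)
```

For example, `should_filter({"srcaddr": "10.1.2.3", "dstaddr": "192.168.5.6"}, [])` is true because both endpoints are private, while a flow from `10.1.2.3` to `8.8.8.8` is kept unless a configured CIDR pair covers it.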

GCP Flow Logs Optimizer: Smart Summarization (pulldown):

  • Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to skip this processing and pass events to the downstream transforms unchanged.

  • Aggregation interval (seconds): The length of the time window, in seconds, over which flows are summarized. Default: 60.

Examples

Smart Summarization

Smart summarization condenses network flows by identifying the ephemeral ports within a VPC network flow.

Consider the scenario where two IP addresses, namely 'ip1' and 'ip2,' are engaged in communication. Let's assume that 'ip2' serves as a server, actively listening on port 8080, while 'ip1' initiates the connection.

Within the same capture window for a flow log, there can be multiple instances of network interactions between 'ip1' and 'ip2.' However, what remains constant in all of these interactions is that 'ip1' is communicating with 'ip2' on port 8080. Other details, such as the ephemeral ports used for this communication, become less significant.

Utilizing this insight, we can treat these flows uniformly as instances of 'ip1' communicating with 'ip2' on port 8080.

The original data, which includes source and destination addresses (srcaddr, dstaddr), source and destination ports (srcport, dstport), start and end times, as well as packet and byte counts, appears as follows:

| srcaddr | dstaddr | srcport | dstport | start    | end      | packets | bytes |
|---------|---------|---------|---------|----------|----------|---------|-------|
| ip1     | ip2     | 32456   | 8080    | 01:00:23 | 01:00:25 | 15      | 200   |
| ip1     | ip2     | 32458   | 8080    | 01:00:24 | 01:00:27 | 5       | 100   |

After summarization, the data is transformed into the following format:

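| srcaddr | dstaddr | srcport | dstport | start    | end      | packets | bytes |
|---------|---------|---------|---------|----------|----------|---------|-------|
| ip1     | ip2     | -       | 8080    | 01:00:23 | 01:00:27 | 20      | 300   |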

Before aggregation, the flow logs are grouped by start time. Because incoming flow logs start at irregular timestamps, each entry's start timestamp is aligned to the nearest aggregation interval boundary (e.g., 01:00, 02:00, 03:00, etc.).

During aggregation, the earliest start time and the latest end time in each group are kept, and the packet and byte counts are summed.

Note that the data and column names here are not identical to the GCP VPC Flow Logs format. They are used for illustration purposes only.
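The summarization steps described above can be sketched in a few lines of Python. This is an illustration under assumptions, not the optimizer's actual algorithm: the higher-numbered port of each flow is treated as the ephemeral one, start times are rounded down to the interval boundary for grouping, and records use the illustrative column names from this example.

```python
def to_seconds(hms: str) -> int:
    """Convert a zero-padded HH:MM:SS string to seconds since midnight."""
    h, m, s = (int(part) for part in hms.split(":"))
    return h * 3600 + m * 60 + s


def summarize(flows, interval=60):
    """Merge flows that differ only in their ephemeral port within one interval."""
    groups = {}
    for f in flows:
        # Assumption: the higher-numbered port is ephemeral and is blanked out.
        if f["srcport"] > f["dstport"]:
            srcport, dstport = "-", f["dstport"]
        else:
            srcport, dstport = f["srcport"], "-"
        # The aligned start time is used for grouping only (rounded down here).
        bucket = to_seconds(f["start"]) // interval
        key = (f["srcaddr"], f["dstaddr"], srcport, dstport, bucket)
        g = groups.get(key)
        if g is None:
            groups[key] = {
                "srcaddr": f["srcaddr"], "dstaddr": f["dstaddr"],
                "srcport": srcport, "dstport": dstport,
                "start": f["start"], "end": f["end"],
                "packets": f["packets"], "bytes": f["bytes"],
            }
        else:
            # Zero-padded HH:MM:SS strings compare correctly as text.
            g["start"] = min(g["start"], f["start"])
            g["end"] = max(g["end"], f["end"])
            g["packets"] += f["packets"]
            g["bytes"] += f["bytes"]
    return list(groups.values())


flows = [
    {"srcaddr": "ip1", "dstaddr": "ip2", "srcport": 32456, "dstport": 8080,
     "start": "01:00:23", "end": "01:00:25", "packets": 15, "bytes": 200},
    {"srcaddr": "ip1", "dstaddr": "ip2", "srcport": 32458, "dstport": 8080,
     "start": "01:00:24", "end": "01:00:27", "packets": 5, "bytes": 100},
]
result = summarize(flows)
```

With the two sample flows, `result` holds a single record for ip1 to ip2 on port 8080, spanning 01:00:23 to 01:00:27, with the packet and byte counts added together (15 + 5 = 20 packets, 200 + 100 = 300 bytes).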

GCP Flow Logs Optimizer Best Practices

Best practices for optimizing GCP Flow Logs with Observo AI’s Optimizer focus on filtering traffic and smart summarization:

Filter Traffic:

  • Define High‐Value Traffic:

    • Start by identifying which flows are meaningful for your security, compliance, or performance objectives.

    • Exclude internal or routine flows (for example, traffic among internal services) that add noise without analytical benefit.

  • Apply Granular Filtering:

    • Use metadata such as source/destination IPs, ports, or subnet ranges to set precise rules.

    • Discard flows that match “low-value” criteria such as very short sessions, heartbeats, or flows that show no anomalous behavior.

  • Automate Filtering in Ingestion Pipelines:

    • Implement rules at the data ingestion stage so that only high-value traffic is preserved, lowering storage and processing costs without compromising critical insights.

Smart Summarization:

  • Leverage Machine Learning for Summarization:

    • Allow the optimizer to automatically group similar network flows based on a key tuple such as source IP, source port, destination IP, destination port, and protocol.

    • This grouping allows summarization based on total bytes, packet counts, and duration while maintaining context.

  • Preserve Critical Insights:

    • Ensure the summarization process retains essential details for troubleshooting and forensic analysis.

    • Validate that the summarization does not mask anomalies or hide important patterns.

  • Achieve Significant Volume Reduction:

    • The smart summarization engine should aim to reduce data volume substantially (often by 80% or more) without sacrificing the fidelity needed for effective network analysis.
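One way to check these goals on a sample of data is to compare the records before and after summarization. The sketch below is a hypothetical helper, not part of the product: it assumes records are dictionaries with `packets` and `bytes` fields, and it uses record count as a rough proxy for data volume.

```python
def reduction_report(original, summarized):
    """Compare record counts and totals before and after summarization."""
    if not original:
        raise ValueError("original must contain at least one record")

    def total(records, field):
        return sum(r[field] for r in records)

    return {
        # Record count is only a proxy for the stored data volume.
        "record_reduction_pct": 100.0 * (1 - len(summarized) / len(original)),
        # Summing must not lose or invent traffic.
        "bytes_preserved": total(original, "bytes") == total(summarized, "bytes"),
        "packets_preserved": total(original, "packets") == total(summarized, "packets"),
    }


before = [{"packets": 15, "bytes": 200}, {"packets": 5, "bytes": 100}]
after = [{"packets": 20, "bytes": 300}]
report = reduction_report(before, after)
```

If either of the preserved flags is false, the summarized output no longer accounts for all of the original traffic and the configuration deserves a closer look.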

Additional Recommendations

  • Iterate and Fine-Tune: Regularly review your filter rules and summarization parameters to adapt to changing network patterns and business requirements.

  • Balance Efficiency with Visibility: While reducing log volume, confirm that your optimized logs still provide enough context to diagnose issues and monitor performance.

By combining targeted traffic filtering with intelligent summarization, Observo AI’s Optimizer for GCP Flow Logs helps reduce noise, cut costs, and ensure your data pipelines are both efficient and rich in actionable insights.

