AWS VPC Flow Logs

The AWS VPC Flow Logs Optimizer in Observo AI allows users to perform various optimizations including smart summarization on AWS VPC Flow Logs data.

Purpose

The AWS VPC Flow Logs Optimizer refines raw VPC Flow Logs data by applying advanced aggregation techniques, ensuring that only high-value, actionable information is forwarded for analysis. It supports various aggregations—including smart summarization—to reduce noise and volume while highlighting critical network flow patterns. This optimization improves query performance, lowers storage costs, and enhances the overall efficiency of SIEM integration for robust security monitoring.

Usage

Select the AWS VPC Flow Logs Optimizer transform. Add a Name (required) and a Description (optional).

General Configuration:

  • Bypass Transform: Defaults to disabled. When enabled, this transform is bypassed entirely and events pass through without any modifications.

  • Add Filter Conditions: Defaults to disabled. When enabled, events are filtered by conditions: only events that satisfy the conditions are processed, and all others bypass this transform. Build AND/OR conditions with the "+Rule" and "+Group" buttons.

AWS VPC Flow Logs Optimizer: Drop Fields (pulldown):

  • Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to disable this processing step.

  • Fields to Drop: Add field names which can be dropped. Click the Add button to add a new field to drop.
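As a rough illustration of what Drop Fields does to a single event, here is a minimal Python sketch. The function name `drop_fields` and the field names `account_id` and `interface_id` are assumptions chosen for the example, not part of the Observo AI configuration.

```python
def drop_fields(event, fields_to_drop):
    """Return a copy of the event without the listed fields (illustrative only)."""
    to_drop = set(fields_to_drop)
    return {name: value for name, value in event.items() if name not in to_drop}


# Hypothetical flow log event; field names are assumed for the example.
event = {
    "srcaddr": "10.0.0.5",
    "dstaddr": "10.0.1.9",
    "account_id": "123456789012",
    "interface_id": "eni-0abc",
}
slim = drop_fields(event, ["account_id", "interface_id"])
print(slim)
```

Fields that are not listed are left untouched, and listing a field that is absent from the event has no effect.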

Filter Traffic (pulldown):

  • Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to disable this processing step.

  • Filter events with non OK log_status: Defaults to enabled. Filters out events whose log_status is not OK, which effectively removes SKIPDATA and NODATA events. Toggle Enabled off to prevent filtering. For more information, see the AWS documentation.

  • Filter traffic within private subnets: Defaults to enabled. Filters out traffic within private subnets. Toggle Enabled off to prevent filtering.

  • Filter traffic for CIDR pairs: Set of CIDR pairs whose traffic is filtered. Click the Add button to add a new pair, with the following inputs:

    • First CIDR: The first CIDR of the pair. Example: 192.168.2.1/16.

    • Second CIDR: The second CIDR of the pair. Example: 192.168.2.1/16.
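The three Filter Traffic options can be pictured with the following Python sketch. It is an approximation under stated assumptions, not the optimizer's actual logic: `should_filter` is a hypothetical helper, "private" is approximated with the standard library's `is_private` flag (which covers more than the RFC 1918 ranges), and a CIDR pair is assumed to match traffic in either direction.

```python
import ipaddress


def should_filter(event, cidr_pairs=(), non_ok=True, private=True):
    """Return True when the event would be filtered out (illustrative only)."""
    # Option 1: drop anything whose log_status is not OK (NODATA, SKIPDATA).
    if non_ok and event.get("log_status") != "OK":
        return True
    try:
        src = ipaddress.ip_address(event["srcaddr"])
        dst = ipaddress.ip_address(event["dstaddr"])
    except ValueError:
        # Addresses such as "-" cannot be evaluated, so keep the event.
        return False
    # Option 2 (assumption): both endpoints are private addresses.
    if private and src.is_private and dst.is_private:
        return True
    # Option 3 (assumption): a pair matches in either direction.
    for first, second in cidr_pairs:
        # strict=False accepts values with host bits set, e.g. 192.168.2.1/16.
        a = ipaddress.ip_network(first, strict=False)
        b = ipaddress.ip_network(second, strict=False)
        if (src in a and dst in b) or (src in b and dst in a):
            return True
    return False


pairs = [("192.168.2.1/16", "8.8.8.0/24")]
events = [
    {"srcaddr": "10.0.0.5", "dstaddr": "10.0.1.9", "log_status": "OK"},
    {"srcaddr": "10.0.0.5", "dstaddr": "1.1.1.1", "log_status": "OK"},
    {"srcaddr": "192.168.10.4", "dstaddr": "8.8.8.8", "log_status": "OK"},
    {"srcaddr": "-", "dstaddr": "-", "log_status": "NODATA"},
]
kept = [e for e in events if not should_filter(e, pairs)]
print(kept)  # only the 10.0.0.5 -> 1.1.1.1 event remains
```

Note that a CIDR written with host bits set, such as the 192.168.2.1/16 example above, is treated here as the enclosing network 192.168.0.0/16.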

Smart Summarization (pulldown):

  • Enabled: Defaults to enabled, meaning all events are evaluated. Toggle Enabled off to disable this processing step.

  • Aggregation interval(seconds): The interval, in seconds, over which flows are summarized. Default: 60.

Aggregation (pulldown):

  • Enabled: Defaults to disabled, meaning events are not evaluated. Toggle Enabled on to enable this processing step and feed its output to the downstream Transforms.

  • Field names to Aggregate By: A comma-separated list of columns to group by and merge. Use the Add button to add as needed.

    Default examples: srcaddr, dstaddr, srcport, dstport, action

  • Max Events: The maximum number of events to group together. Default 100.

  • Flush Time(seconds): The maximum amount of time in seconds to wait before flushing events to Destination. Default 30.

  • Aggregation Methods: Set of fields to aggregate, each paired with the method used to aggregate it. Default: start. Click the Add button to add a new field as a key-value pair, with the following inputs:

    • Field Name: The name of the field whose value is being aggregated.

    • Aggregation Method: The method used to aggregate the values of the field. For example, if the field is an integer, you can sum the values, or keep the maximum value. If the field is a string, you can keep the first value, or keep the latest value. Here are the possible methods:

      • Keep first value

      • Keep last value

      • Keep maximum value

      • Keep minimum value

      • Sum values

      Field Name (Defaults)    Aggregation Method
      start                    Keep minimum value
      end                      Keep maximum value
      protocol                 Keep last value
      tcp_flags                Keep maximum value
      traffic_path             Keep last value
      version                  Keep maximum value
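One way to read the Max Events and Flush Time(seconds) settings is that a buffer is flushed as soon as either limit is reached. The Python sketch below models that reading. The class `FlushBuffer` is hypothetical, timestamps are passed in explicitly to keep the example deterministic, and whether the limits apply per group or to the whole buffer is not stated in this page, so a single buffer is assumed.

```python
class FlushBuffer:
    """Buffer events and flush on whichever limit is hit first (illustrative only)."""

    def __init__(self, max_events=100, flush_time=30.0):
        self.max_events = max_events
        self.flush_time = flush_time
        self.events = []
        self.opened_at = None  # time the first buffered event arrived

    def add(self, event, now):
        """Buffer one event; return the flushed batch, or None if still buffering."""
        if self.opened_at is None:
            self.opened_at = now
        self.events.append(event)
        full = len(self.events) >= self.max_events
        expired = now - self.opened_at >= self.flush_time
        if full or expired:
            batch, self.events, self.opened_at = self.events, [], None
            return batch
        return None


buf = FlushBuffer(max_events=3, flush_time=30)
first = [buf.add(name, now) for name, now in [("a", 0), ("b", 1), ("c", 2)]]
second = [buf.add(name, now) for name, now in [("d", 10), ("e", 45)]]
print(first[-1])   # ['a', 'b', 'c'] flushed by the event limit
print(second[-1])  # ['d', 'e'] flushed by the time limit
```

This sketch only checks the time limit when a new event arrives. A real pipeline would presumably also flush on a timer when no events are coming in.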

Examples

Aggregate Logs

Scenario: Aggregate a set of fields based on (1) the Field Names to Aggregate By and (2) the Aggregation Methods settings. Max Events and Flush Time(seconds) control how often aggregated events are flushed.

Aggregation (pulldown)

Enabled               Max Events    Flush Time(seconds)
Toggled to enabled    100           30

Field Names to Aggregate By: srcaddr, dstaddr, srcport, dstport

Aggregation Methods

Field Name    Aggregation Method
srcaddr       Keep first value
dstaddr       Keep first value
srcport       Keep first value
dstport       Keep first value
start         Keep first value
end           Keep last value
packets       Sum values
bytes         Sum values

Input (fields within the 3 log entries)

srcaddr  dstaddr  srcport  dstport  start     end       packets  bytes
Ip1      Ip2      32000    8080     01:00:23  01:00:25  10       200
Ip1      Ip2      32000    8080     01:00:24  01:00:27  5        100
Ip1      Ip2      40000    8080     01:00:23  01:00:24  5        50

Output (Aggregated log entries)

srcaddr  dstaddr  srcport  dstport  start     end       packets  bytes
Ip1      Ip2      32000    8080     01:00:23  01:00:27  15       300
Ip1      Ip2      40000    8080     01:00:23  01:00:24  5        50

Results: Log entries whose Field Names to Aggregate By values match are merged into one entry, and their packets and bytes fields are summed.
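The scenario above can be reproduced with a short Python sketch. It is a minimal model of group-and-merge, not the product's implementation: the `aggregate` function and the method keys ("first", "last", "sum", and so on) are names invented for this example.

```python
from collections import defaultdict

# Hypothetical names for the five aggregation methods listed on this page.
METHODS = {
    "first": lambda values: values[0],
    "last": lambda values: values[-1],
    "max": max,
    "min": min,
    "sum": sum,
}


def aggregate(events, group_by, methods):
    """Group events by the group_by fields, then merge each group field by field."""
    groups = defaultdict(list)
    for event in events:
        groups[tuple(event[name] for name in group_by)].append(event)
    merged = []
    for members in groups.values():
        merged.append({
            field: METHODS[method]([m[field] for m in members])
            for field, method in methods.items()
        })
    return merged


def flow(srcport, start, end, packets, nbytes):
    return {"srcaddr": "Ip1", "dstaddr": "Ip2", "srcport": srcport, "dstport": 8080,
            "start": start, "end": end, "packets": packets, "bytes": nbytes}


events = [
    flow(32000, "01:00:23", "01:00:25", 10, 200),
    flow(32000, "01:00:24", "01:00:27", 5, 100),
    flow(40000, "01:00:23", "01:00:24", 5, 50),
]
methods = {"srcaddr": "first", "dstaddr": "first", "srcport": "first",
           "dstport": "first", "start": "first", "end": "last",
           "packets": "sum", "bytes": "sum"}
result = aggregate(events, ["srcaddr", "dstaddr", "srcport", "dstport"], methods)
for row in result:
    print(row)
```

The first two input entries share all four group-by values and collapse into one entry with 15 packets and 300 bytes. The third has a different srcport and is emitted unchanged.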

Smart Summarization

Smart summarization condenses network flows by identifying the ephemeral ports within a VPC network flow.

Consider the scenario where two IP addresses, namely 'ip1' and 'ip2,' are engaged in communication. Let's assume that 'ip2' serves as a server, actively listening on port 8080, while 'ip1' initiates the connection.

Within the same capture window for a flow log, there can be multiple instances of network interactions between 'ip1' and 'ip2.' However, what remains constant in all of these interactions is that 'ip1' is communicating with 'ip2' on port 8080. Other details, such as the ephemeral ports used for this communication, become less significant.

Utilizing this insight, we can treat these flows uniformly as instances of 'ip1' communicating with 'ip2' on port 8080.

The original data, which includes source and destination addresses (srcaddr, dstaddr), source and destination ports (srcport, dstport), start and end times, and packet and byte counts, appears as follows:

srcaddr  dstaddr  srcport  dstport  start     end       packets  bytes
Ip1      Ip2      32456    8080     01:00:23  01:00:25  15       200
Ip1      Ip2      32458    8080     01:00:24  01:00:27  5        100

After summarization, the data is transformed into the following format:

srcaddr  dstaddr  srcport  dstport  start     end       packets  bytes
Ip1      Ip2      -        8080     01:00:23  01:00:27  20       300
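The idea can be sketched in Python as follows. This page does not describe how the optimizer actually detects ephemeral ports, so the sketch substitutes a simple stand-in heuristic: within one (srcaddr, dstaddr, dstport) group, a source port that varies between flows is treated as ephemeral and replaced with "-". The function name `summarize` is hypothetical.

```python
from collections import defaultdict


def summarize(flows):
    """Collapse flows that differ only in their (assumed ephemeral) source port."""
    groups = defaultdict(list)
    for f in flows:
        groups[(f["srcaddr"], f["dstaddr"], f["dstport"])].append(f)
    summary = []
    for (src, dst, dstport), members in groups.items():
        srcports = {m["srcport"] for m in members}
        summary.append({
            "srcaddr": src,
            "dstaddr": dst,
            # Stand-in heuristic: a varying source port is considered ephemeral.
            "srcport": "-" if len(srcports) > 1 else srcports.pop(),
            "dstport": dstport,
            # Zero-padded HH:MM:SS strings sort correctly as text.
            "start": min(m["start"] for m in members),
            "end": max(m["end"] for m in members),
            "packets": sum(m["packets"] for m in members),
            "bytes": sum(m["bytes"] for m in members),
        })
    return summary


flows = [
    {"srcaddr": "Ip1", "dstaddr": "Ip2", "srcport": 32456, "dstport": 8080,
     "start": "01:00:23", "end": "01:00:25", "packets": 15, "bytes": 200},
    {"srcaddr": "Ip1", "dstaddr": "Ip2", "srcport": 32458, "dstport": 8080,
     "start": "01:00:24", "end": "01:00:27", "packets": 5, "bytes": 100},
]
summary = summarize(flows)
print(summary)
```

Summing the two input rows gives 20 packets (15 + 5) and 300 bytes (200 + 100). The sketch only covers the client-to-server direction; return traffic, where the server port is the source port, would need the mirrored rule.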

Before aggregation, the flow logs are organized by start time. Because incoming flow logs start at irregular timestamps, the start timestamp of each flow log entry is aligned to the nearest boundary (e.g., 01:00, 02:00, 03:00, etc.).

During the aggregation process, the earliest start time and the latest end time are selected. For aggregating packet and byte counts, the method used is addition.
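The alignment step can be illustrated with a small Python sketch. It assumes start times are expressed in whole seconds and that boundaries fall on multiples of the aggregation interval (60 seconds by default). The function name `align_to_boundary` is hypothetical, and rounding an exact halfway point upward is an arbitrary choice made for this example.

```python
def align_to_boundary(start, interval=60):
    """Snap a timestamp in seconds to the nearest multiple of interval."""
    return ((start + interval // 2) // interval) * interval


# 01:00:23, 01:00:24 and 01:00:55 expressed as seconds (minutes * 60 + seconds).
starts = [3623, 3624, 3655]
buckets = [align_to_boundary(s) for s in starts]
print(buckets)  # [3600, 3600, 3660]
```

Flows whose aligned start times are equal fall into the same bucket and can then be summarized together, taking the earliest start, the latest end, and the summed packet and byte counts.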

AWS VPC Flow Logs Optimizer Best Practices

Here’s a breakdown of best practices when using Observo AI’s VPC Flow Logs Optimizer, which leverages techniques like dropping fields, filtering traffic, smart summarization, and aggregation:

  1. Drop Fields

  • Identify Low-Value Data: Review the default 29 fields emitted by VPC flow logs and determine which ones are not used for your security, troubleshooting, or compliance needs.

  • Early Data Reduction: Drop extraneous fields at the ingestion stage to reduce data volume and processing cost without impacting key insights.

  2. Filter Traffic

  • Focus on High-Value Flows: Set rules to exclude internal or redundant traffic that does not contribute to your analytical objectives.

  • Tailor Filtering by Context: Use criteria like subnet, CIDR ranges, or specific interface IDs to drop traffic that is known to be “noisy” or irrelevant.

  • Reduce Unnecessary Log Entries: For example, filter out flows with minimal activity or those that simply indicate “NODATA” events (if applicable), ensuring that your logs only include actionable traffic.

  3. Smart Summarization

  • Automated Flow Grouping: Leverage ML-powered smart summarization to automatically identify network flows (using the key tuple: source IP, source port, destination IP, destination port, and protocol).

  • Volume Reduction: By aggregating similar flows, you can reduce log volume by over 80% while preserving important statistics like packet counts, bytes transferred, and time ranges.

  • Zero-Click Efficiency: This feature works without manual intervention, meaning your system continually adapts and maintains high-level insight with lower data noise.

  4. Aggregation

  • Custom Aggregation Semantics: In addition to smart summarization, use custom aggregations to define how network flows are grouped based on your domain or infrastructure specifics.

  • Improved Query Performance: Aggregated data not only reduces storage costs but also speeds up downstream queries and analysis, as smaller, summarized datasets are much faster to process.

Overall Recommendations

  • Combine Techniques for Maximum Efficiency: By first dropping non-essential fields and filtering out low-value traffic, you minimize the volume before applying smart summarization and aggregation.

  • Automate Where Possible: Use Observo AI’s dynamic pipelines that automatically adjust to the incoming data, reducing the need for constant manual tuning and boosting developer productivity.

  • Retain Analytical Integrity: Ensure that any reduction in data volume does not compromise critical insights required for security monitoring, troubleshooting, or cost analysis.

These best practices help you achieve a more efficient observability pipeline, lower storage and processing costs, and improve the overall performance of your AWS VPC Flow Log analysis.

