Flow Analysis (Intermediate)

Flow analysis transforms raw NetFlow/IPFIX data into meaningful intelligence about who is talking to whom, how much bandwidth they consume, and whether the traffic patterns are normal.

Top-N Analysis

Identify the largest consumers of network resources:

Python
import pandas as pd

# Analyze top talkers from NetFlow data
flows = pd.read_parquet('netflow_records.parquet')

# Top 10 source IPs by total bytes
top_sources = flows.groupby('src_ip')['bytes'].sum().nlargest(10)

# Top 10 application ports by flow count
top_apps = flows.groupby('dst_port').size().nlargest(10)

# Top 10 conversation pairs by bandwidth
top_pairs = flows.groupby(['src_ip', 'dst_ip'])['bytes'].sum().nlargest(10)

Traffic Profiling

Build profiles of normal traffic to detect deviations. Key dimensions include:

  • Volume profile — Typical bytes/flows per hour, day, week
  • Application mix — Percentage breakdown by protocol and port
  • Geographic patterns — Normal destination countries and ASNs
  • Session characteristics — Typical flow duration and packet counts
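
The volume and application-mix dimensions above can be sketched with pandas as follows. This is only an illustration under stated assumptions: the records are synthetic, and the `start_time` and `protocol` columns are assumed fields that may be named differently in a real NetFlow/IPFIX export.

```python
import pandas as pd

# Synthetic flow records. 'dst_port' and 'bytes' mirror the Top-N example;
# 'start_time' and 'protocol' are assumed column names.
flows = pd.DataFrame({
    "start_time": pd.to_datetime([
        "2024-01-01 09:05", "2024-01-01 09:40", "2024-01-01 10:15",
        "2024-01-02 09:10", "2024-01-02 10:30", "2024-01-02 10:45",
    ]),
    "protocol": ["TCP", "TCP", "UDP", "TCP", "TCP", "UDP"],
    "dst_port": [443, 443, 53, 443, 22, 53],
    "bytes": [1000, 3000, 200, 2000, 500, 300],
})

# Volume profile: total bytes for each (day, hour) bucket, then the mean and
# standard deviation per hour of day across the observed days.
# Caveat: hours with no flows at all are absent rather than counted as zero.
flows = flows.assign(
    day=flows["start_time"].dt.date,
    hour=flows["start_time"].dt.hour,
)
bytes_per_hour = flows.groupby(["day", "hour"])["bytes"].sum()
volume_profile = bytes_per_hour.groupby(level="hour").agg(["mean", "std"])

# Application mix: percentage of total bytes per (protocol, destination port).
app_bytes = flows.groupby(["protocol", "dst_port"])["bytes"].sum()
app_mix = (app_bytes / app_bytes.sum() * 100).sort_values(ascending=False)

print(volume_profile)
print(app_mix.round(1))
```

A later measurement can then be compared against `volume_profile`, for example by flagging an hour whose byte count lies several standard deviations from the mean for that hour of day. The right cutoff depends on the network and is not prescribed here.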

Flow-Based Anomaly Detection

Detect unusual patterns that may indicate security threats or performance issues:

Anomaly Type      | Detection Method                   | Indicator
DDoS Attack       | Volume spike to single destination | Sudden 10x increase in flows to one IP
Port Scan         | One source, many destination ports | 100+ unique dst ports from single src
Data Exfiltration | Unusual outbound volume            | Large upload to unusual destination
Lateral Movement  | New internal communication pairs   | Server talking to servers it never contacted

Pro Tip: Store flow data in a columnar format or database (for example, Parquet files or ClickHouse) for fast analytical queries. Row-based databases become impractical with millions of flow records per hour.
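
Three of the detection methods listed in this section (port scan, volume spike, and new communication pairs) can be sketched as simple pandas checks. This is a minimal illustration, not a production detector: the function names are hypothetical, the data is synthetic, and the thresholds simply reuse the 100-port and 10x figures from the indicators.

```python
import pandas as pd

# Thresholds taken from the indicators in this section; tune them per network.
PORT_SCAN_MIN_PORTS = 100
FLOW_SPIKE_FACTOR = 10

def find_port_scanners(flows, min_ports=PORT_SCAN_MIN_PORTS):
    """Sources that contacted at least `min_ports` distinct destination ports."""
    ports_per_src = flows.groupby("src_ip")["dst_port"].nunique()
    return ports_per_src[ports_per_src >= min_ports]

def find_flow_spikes(current, baseline, factor=FLOW_SPIKE_FACTOR):
    """Destinations whose flow count grew at least `factor` times.

    Destinations missing from the baseline yield NaN and are skipped here;
    find_new_pairs() covers previously unseen traffic instead.
    """
    now = current.groupby("dst_ip").size()
    before = baseline.groupby("dst_ip").size().reindex(now.index)
    ratio = now / before
    return ratio[ratio >= factor]

def find_new_pairs(current, baseline):
    """(src_ip, dst_ip) pairs present in `current` but never in `baseline`."""
    seen = set(zip(baseline["src_ip"], baseline["dst_ip"]))
    now = set(zip(current["src_ip"], current["dst_ip"]))
    return now - seen

# Synthetic baseline window: two ordinary flows to one server.
baseline = pd.DataFrame({
    "src_ip": ["10.0.0.5", "10.0.0.6"],
    "dst_ip": ["10.0.1.10", "10.0.1.10"],
    "dst_port": [443, 443],
})

# Synthetic current window: one host probing 150 ports, plus 25 flows
# aimed at the server that saw only 2 flows in the baseline.
scan = pd.DataFrame({
    "src_ip": "10.0.0.99",
    "dst_ip": "10.0.1.20",
    "dst_port": list(range(1, 151)),
})
flood = pd.DataFrame({
    "src_ip": "10.0.0.5",
    "dst_ip": "10.0.1.10",
    "dst_port": [443] * 25,
})
current = pd.concat([scan, flood], ignore_index=True)

print(find_port_scanners(current))
print(find_flow_spikes(current, baseline))
print(find_new_pairs(current, baseline))
```

Comparing a current window against a baseline window keeps each check stateless. In practice the baseline would come from the traffic profile rather than from a single earlier sample, and fixed thresholds like these tend to need per-network tuning.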

Next Step

Learn how to visualize network analytics data effectively with dashboards and charts.