Flow Analysis (Intermediate)
Flow analysis transforms raw NetFlow/IPFIX data into meaningful intelligence about who is talking to whom, how much bandwidth they consume, and whether the traffic patterns are normal.
Top-N Analysis
Identify the largest consumers of network resources:
```python
import pandas as pd

# Analyze top talkers from NetFlow data
flows = pd.read_parquet('netflow_records.parquet')

# Top 10 source IPs by total bytes
top_sources = flows.groupby('src_ip')['bytes'].sum().nlargest(10)

# Top 10 application ports by flow count
top_apps = flows.groupby('dst_port').size().nlargest(10)

# Top 10 conversation pairs by bandwidth
top_pairs = flows.groupby(['src_ip', 'dst_ip'])['bytes'].sum().nlargest(10)
```
Traffic Profiling
Build profiles of normal traffic to detect deviations. Key dimensions include:
- Volume profile — Typical bytes/flows per hour, day, week
- Application mix — Percentage breakdown by protocol and port
- Geographic patterns — Normal destination countries and ASNs
- Session characteristics — Typical flow duration and packet counts
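As a rough illustration of the first two dimensions, the sketch below builds an hour-of-day volume baseline and an application mix from a flow table. It assumes the same `bytes` and `dst_port` columns used in the Top-N example plus a datetime column named `timestamp`. That column name and both function names are assumptions made for this example, not part of any NetFlow schema.

```python
import pandas as pd


def hourly_volume_profile(flows: pd.DataFrame) -> pd.DataFrame:
    """Mean and standard deviation of bytes per hour, keyed by hour of day.

    Assumes `flows` has a datetime column named `timestamp` (hypothetical name).
    """
    ts = flows["timestamp"]
    # Total bytes for each (calendar date, hour of day) bucket.
    # Hours with no flows at all are absent rather than counted as zero.
    per_hour = flows.groupby(
        [ts.dt.date.rename("date"), ts.dt.hour.rename("hour")]
    )["bytes"].sum()
    # Collapse across dates so each hour of day gets a baseline.
    return per_hour.groupby(level="hour").agg(["mean", "std"])


def application_mix(flows: pd.DataFrame) -> pd.Series:
    """Percentage of total bytes per destination port, largest first."""
    by_port = flows.groupby("dst_port")["bytes"].sum()
    return (by_port / by_port.sum() * 100).sort_values(ascending=False)
```

A new hour's byte count can then be compared against the `mean` and `std` for the same hour of day. How many standard deviations count as a deviation is a tuning choice that depends on the network.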
Flow-Based Anomaly Detection
Detect unusual patterns that may indicate security threats or performance issues:
| Anomaly Type | Detection Method | Indicator |
|---|---|---|
| DDoS Attack | Volume spike to single destination | Sudden 10x increase in flows to one IP |
| Port Scan | One source, many destination ports | 100+ unique dst ports from single src |
| Data Exfiltration | Unusual outbound volume | Large upload to unusual destination |
| Lateral Movement | New internal communication pairs | Server talking to servers it never contacted |
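Three of the detection methods in the table can be sketched with the same pandas grouping used in the Top-N example. This is a minimal illustration, not a production detector. The function names are hypothetical, the thresholds simply mirror the indicators in the table, and the baseline and current windows are assumed to cover equal lengths of time.

```python
import pandas as pd


def find_port_scanners(flows: pd.DataFrame, min_ports: int = 100) -> pd.Series:
    """Sources that contacted at least `min_ports` distinct destination ports."""
    ports_per_src = flows.groupby("src_ip")["dst_port"].nunique()
    return ports_per_src[ports_per_src >= min_ports].sort_values(ascending=False)


def find_flow_spikes(
    baseline_flows: pd.DataFrame,
    current_flows: pd.DataFrame,
    factor: float = 10.0,
) -> pd.Series:
    """Destinations whose flow count grew by at least `factor` over the baseline.

    Assumes both windows span the same duration. Destinations that do not
    appear in the baseline have no ratio and are not reported here.
    """
    base = baseline_flows.groupby("dst_ip").size()
    cur = current_flows.groupby("dst_ip").size()
    ratio = cur / base.reindex(cur.index)
    return ratio[ratio >= factor].sort_values(ascending=False)


def find_new_pairs(
    baseline_flows: pd.DataFrame, current_flows: pd.DataFrame
) -> set:
    """(src_ip, dst_ip) pairs seen in the current window but not in the baseline."""
    known = set(zip(baseline_flows["src_ip"], baseline_flows["dst_ip"]))
    current = set(zip(current_flows["src_ip"], current_flows["dst_ip"]))
    return current - known
```

For lateral movement, `find_new_pairs` is most useful when both inputs are first filtered to internal address ranges. Otherwise ordinary traffic to new external hosts dominates the result.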
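Pro Tip: Store flow data in a columnar file format such as Parquet, or in a column-oriented database such as ClickHouse, for fast analytical queries. Aggregations like the ones above touch only a few columns, and columnar storage reads just those columns. Row-oriented databases must read whole rows, which becomes impractical with millions of flow records per hour.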
Next Step
Learn how to visualize network analytics data effectively with dashboards and charts.
Lilly Tech Systems