Feature Engineering (Intermediate)

Feature engineering transforms raw network data into inputs that an ML model can learn from. With well-chosen features, a simple model can outperform a complex one trained on raw data.

Common Network Features

Category        | Raw Data            | Engineered Features
Traffic Volume  | Byte counters       | Bytes/sec, rate of change, rolling average, peak-to-mean ratio
Timing          | Timestamps          | Hour of day, day of week, is_business_hours, minutes_since_last_event
Errors          | Error counters      | Error rate, error ratio (errors/total packets), error trend
Flows           | Flow records        | Unique src/dst pairs, flow duration distribution, new flow rate
Topology        | Device connections  | Hop count, path diversity, device centrality score
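
As a rough illustration of the Traffic Volume and Errors rows, the sketch below derives bytes/sec, error ratio, and a rolling peak-to-mean ratio from cumulative counters. The column names (bytes_total, packets_total, errors_total), the 60-second polling interval, and the 5-sample window are assumptions made for this example, not part of any particular collector's schema.

```python
import pandas as pd

def add_counter_features(df: pd.DataFrame, interval_s: int = 60) -> pd.DataFrame:
    """Derive rate and ratio features from cumulative interface counters.

    Assumes evenly spaced polls every `interval_s` seconds and hypothetical
    columns 'bytes_total', 'packets_total', 'errors_total'.
    """
    out = df.copy()
    # Cumulative counters -> per-interval deltas (the first row has no predecessor).
    # Counter resets or wraps would show up as negative deltas and need handling.
    byte_delta = out["bytes_total"].diff()
    pkt_delta = out["packets_total"].diff()
    err_delta = out["errors_total"].diff()

    out["bytes_per_sec"] = byte_delta / interval_s
    # Error ratio = errors / total packets; intervals with no packets become NaN.
    out["error_ratio"] = err_delta / pkt_delta.where(pkt_delta > 0)
    # Peak-to-mean ratio over a rolling window of 5 samples.
    roll = out["bytes_per_sec"].rolling(5)
    out["peak_to_mean_5"] = roll.max() / roll.mean()
    return out

# Made-up sample data, for illustration only.
samples = pd.DataFrame({
    "bytes_total":   [0, 6000, 18000, 24000, 30000, 90000, 96000],
    "packets_total": [0, 100, 300, 400, 500, 1500, 1600],
    "errors_total":  [0, 1, 1, 3, 3, 13, 13],
})
features = add_counter_features(samples)
print(features[["bytes_per_sec", "error_ratio", "peak_to_mean_5"]])
```

The first rows of the rolling column are NaN until the window has 5 valid samples; whether to drop or fill them depends on the model.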

Time-Series Feature Extraction

Python
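import pandas as pd

def create_network_features(df):
    """Create ML features from network time-series data.

    Assumes one row per minute, a datetime-like 'timestamp' column,
    and a numeric 'bytes_in' column.
    """
    df = df.copy()  # avoid mutating the caller's DataFrame
    df['timestamp'] = pd.to_datetime(df['timestamp'])

    # Rolling statistics over 5 samples (5 minutes at 1-minute sampling)
    df['bytes_in_avg_5m'] = df['bytes_in'].rolling(5).mean()
    df['bytes_in_std_5m'] = df['bytes_in'].rolling(5).std()

    # Rate of change (pct_change yields inf when the previous value is 0)
    df['bytes_in_delta'] = df['bytes_in'].diff()
    df['bytes_in_pct_change'] = df['bytes_in'].pct_change()

    # Temporal features
    df['hour'] = df['timestamp'].dt.hour
    df['day_of_week'] = df['timestamp'].dt.dayofweek  # Monday=0
    # 08:00-17:59; between(8, 18) is inclusive and would also count the 18:00 hour
    df['is_business_hours'] = ((df['hour'] >= 8) & (df['hour'] < 18)).astype(int)

    # Global z-score for anomaly detection (uses the whole series, so fit the
    # mean and std on training data only to avoid leaking future values)
    df['bytes_in_zscore'] = (df['bytes_in'] - df['bytes_in'].mean()) / df['bytes_in'].std()
    return df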
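
The z-score above compares each sample against the mean and standard deviation of the entire series. For streaming or online anomaly detection, one possible variant is a rolling z-score that scores each point only against the samples that came before it. The sketch below is one way to do that; the function name rolling_zscore and the 5-sample window are illustrative choices, not part of the original function.

```python
import pandas as pd

def rolling_zscore(series: pd.Series, window: int = 5) -> pd.Series:
    """Z-score of each point against the preceding `window` samples only."""
    # shift(1) excludes the current sample from its own baseline.
    baseline = series.shift(1).rolling(window)
    mean = baseline.mean()
    std = baseline.std()
    # A flat baseline has zero std; mask it to NaN instead of dividing by zero.
    return (series - mean) / std.where(std > 0)

# Made-up traffic values with one obvious spike, for illustration only.
bytes_in = pd.Series([100, 102, 98, 101, 99, 500, 100])
z = rolling_zscore(bytes_in)
print(z)
```

The first `window` values are NaN because there is not yet enough history. Note that a spike also inflates the baseline of the samples that follow it, so a robust statistic such as the median may be worth considering for noisy links.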

Feature Selection

Not all features improve model performance. Use these techniques to select the best ones:

  • Correlation Analysis — Remove highly correlated features (redundant information)
  • Feature Importance — Use Random Forest or XGBoost to rank features by predictive power
  • Recursive Feature Elimination — Iteratively remove the least important features
  • Domain Knowledge — Network engineers know which metrics matter most for specific problems
Pro Tip: Combine domain knowledge with automated feature selection. Start with features that make sense from a networking perspective, then use ML techniques to validate and refine your choices.
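
The first three techniques can be sketched roughly as follows, assuming pandas, NumPy, and scikit-learn are available. The data is synthetic and the feature names, the 0.95 correlation threshold, and the helper drop_correlated are assumptions chosen for this example; real thresholds and estimators should be tuned to the problem.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE

def drop_correlated(X: pd.DataFrame, threshold: float = 0.95) -> pd.DataFrame:
    """Drop one feature from each pair whose absolute correlation exceeds threshold."""
    corr = X.corr().abs()
    # Keep only the upper triangle so each pair is inspected once.
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [col for col in upper.columns if (upper[col] > threshold).any()]
    return X.drop(columns=to_drop)

# Synthetic data, for illustration only.
rng = np.random.default_rng(0)
n = 400
bytes_per_sec = rng.random(n)
X = pd.DataFrame({
    "error_ratio": rng.random(n),
    "bytes_per_sec": bytes_per_sec,
    "bits_per_sec": bytes_per_sec * 8,  # redundant: perfectly correlated
    "hour": rng.integers(0, 24, n),
})
y = (X["error_ratio"] > 0.5).astype(int)  # label depends on error_ratio only

# 1. Correlation analysis: remove the redundant column.
X_pruned = drop_correlated(X)

# 2. Feature importance: rank the remaining features with a Random Forest.
forest = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_pruned, y)
importance = pd.Series(forest.feature_importances_, index=X_pruned.columns)
importance = importance.sort_values(ascending=False)

# 3. Recursive feature elimination: keep the single strongest feature.
rfe = RFE(RandomForestClassifier(n_estimators=50, random_state=0),
          n_features_to_select=1).fit(X_pruned, y)
selected = list(X_pruned.columns[rfe.support_])

print(importance)
print("RFE selected:", selected)
```

Because the label here is constructed from error_ratio alone, both the importance ranking and RFE should single it out; on real data the ranking is a starting point to check against domain knowledge rather than a final answer.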

Next Step

Learn the best practices for deploying and maintaining ML models in network environments.
