# Best Practices
Follow industry best practices for experiment tracking, reproducibility, model management, monitoring, and avoiding common ML mistakes.
## Experiment Tracking
Track every experiment systematically to compare results and reproduce successes:
- Log all hyperparameters, metrics, and artifacts for every run.
- Use tools like MLflow, Weights & Biases, or Neptune.ai.
- Tag experiments with meaningful names and descriptions.
- Save the exact code version (git commit hash) used for each experiment.
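The points above can be sketched tool-agnostically. This is a minimal, illustrative run logger (the run name, parameters, and metric values are made up); in practice a tracker such as MLflow or Weights & Biases replaces the JSONL file:

```python
# Per-run experiment logging sketch: every run records its hyperparameters,
# metrics, and git commit hash to an append-only JSONL file.
import json
import subprocess
import time
from pathlib import Path

def current_git_commit():
    """Return the current git commit hash, or 'unknown' outside a repo."""
    try:
        return subprocess.check_output(
            ["git", "rev-parse", "HEAD"], text=True
        ).strip()
    except (subprocess.CalledProcessError, FileNotFoundError, OSError):
        return "unknown"

def log_run(name, params, metrics, log_file="runs.jsonl"):
    """Append one experiment record so runs stay comparable later."""
    record = {
        "name": name,
        "timestamp": time.time(),
        "git_commit": current_git_commit(),
        "params": params,
        "metrics": metrics,
    }
    with Path(log_file).open("a") as f:
        f.write(json.dumps(record) + "\n")
    return record

run = log_run(
    "baseline-lr-0.01",
    params={"learning_rate": 0.01, "batch_size": 64},
    metrics={"val_accuracy": 0.87},
)
```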
## Reproducibility

Fix every source of randomness and pin dependencies so results can be repeated exactly:

```python
import random

import numpy as np
import torch

def set_seed(seed=42):
    """Set all random seeds for reproducibility."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    # Trade speed for determinism in cuDNN operations.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False

set_seed(42)

# Pin exact library versions:
# pip freeze > requirements.txt
```
## Feature Store
## Model Registry

A model registry tracks model versions through their lifecycle:

```python
import mlflow

# Register a model from a completed run
mlflow.register_model("runs:/<run_id>/model", "ProductionClassifier")

# Transition model stages
client = mlflow.tracking.MlflowClient()
client.transition_model_version_stage(
    name="ProductionClassifier",
    version=3,
    stage="Production",
)
```
## A/B Testing Models
Before fully replacing a model in production, run A/B tests:
- Route a small percentage of traffic to the new model (canary deployment).
- Compare business metrics (not just ML metrics) between old and new models.
- Ensure statistical significance before declaring a winner.
- Monitor for unexpected side effects on downstream systems.
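The significance check in the steps above can be sketched with a two-proportion z-test, stdlib only. The traffic counts and conversion rates below are made up for illustration:

```python
# Compare conversion rates of the old and new model under H0: equal rates.
import math

def two_proportion_z_test(success_a, total_a, success_b, total_b):
    """Return (z, two-sided p-value) for a difference in proportions."""
    p_a = success_a / total_a
    p_b = success_b / total_b
    pooled = (success_a + success_b) / (total_a + total_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (p_b - p_a) / se
    # Two-sided p-value from the standard normal CDF.
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Old model serves 95% of traffic; new model gets a 5% canary slice.
z, p = two_proportion_z_test(success_a=1900, total_a=19000,  # old: 10.0% CVR
                             success_b=115, total_b=1000)    # new: 11.5% CVR
significant = p < 0.05
```

Here the observed lift is not yet significant at the 5% level, so the canary should keep running rather than be declared a winner.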
## Monitoring Drift

Production models degrade as the world changes; watch for three kinds of drift:
| Type | What Changes | Detection |
|---|---|---|
| Data Drift | Input feature distributions | KS test, PSI, distribution plots |
| Concept Drift | Relationship between X and y | Monitor prediction accuracy over time |
| Label Drift | Target variable distribution | Track target distribution changes |
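The PSI check from the table can be sketched with NumPy alone; the data here is synthetic, and the 0.1 / 0.25 thresholds are common rules of thumb rather than hard limits:

```python
# Population Stability Index (PSI) between a baseline sample and a live
# sample of a single feature, binned on the baseline distribution.
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Return the PSI between baseline ('expected') and live ('actual')."""
    # Bin edges come from the baseline (training-time) distribution.
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    # Avoid log(0) and division by zero for empty bins.
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct)
                        * np.log(actual_pct / expected_pct)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 10_000)
shifted = rng.normal(0.5, 1.0, 10_000)   # simulated data drift

psi_same = population_stability_index(baseline, baseline)
psi_drift = population_stability_index(baseline, shifted)
# Rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 major shift.
```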
## Common Mistakes
- Data leakage: Using test data information during training (e.g., fitting scaler on full dataset before splitting).
- Not stratifying: Random splits can create unbalanced folds for imbalanced datasets.
- Optimizing the wrong metric: Accuracy on imbalanced data is misleading.
- No baseline model: Always compare against a simple baseline (majority class, mean prediction).
- Ignoring feature importance: Understanding which features matter leads to better models.
- Not versioning data: Code versioning without data versioning breaks reproducibility.
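Two of the points above, stratified splits and baseline comparison, can be sketched together with scikit-learn on synthetic data (the dataset and thresholds are made up for illustration):

```python
# Stratify the split and always compare against a trivial baseline.
import numpy as np
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
# Imbalanced synthetic target (~19% positives), driven mostly by feature 0.
y = (X[:, 0] + 0.5 * rng.normal(size=1000) > 1.0).astype(int)

# stratify=y keeps the class ratio identical in train and test folds.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
model = LogisticRegression().fit(X_train, y_train)

baseline_acc = baseline.score(X_test, y_test)
model_acc = model.score(X_test, y_test)
# The model only adds value if it clearly beats the majority-class baseline.
```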
## Frequently Asked Questions
**When should I use deep learning instead of traditional ML?**

Use traditional ML (sklearn, XGBoost) for tabular data, small-to-medium datasets, and when interpretability matters. Use deep learning for images, text, audio, and very large datasets where feature engineering is impractical.
**How often should I retrain models in production?**

It depends on how fast your data distribution changes. E-commerce recommendations may need daily retraining, while fraud detection might retrain weekly. Monitor performance metrics and retrain when they degrade below your threshold.
**Should I use PyTorch or TensorFlow?**

PyTorch is more popular in research and increasingly in industry due to its Pythonic API and dynamic graphs. TensorFlow has stronger deployment tools (TF Serving, TFLite). For new projects, PyTorch is generally recommended unless you need specific TF ecosystem features.
**How do I handle imbalanced datasets?**

Options include: oversampling the minority class (SMOTE), undersampling the majority class, using class weights in the loss function, using stratified sampling for train/test splits, or optimizing for F1/AUC instead of accuracy.
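One of these options, class weights in the loss, can be sketched with scikit-learn's `class_weight="balanced"`, which weights classes inversely to their frequency. The imbalanced dataset below is synthetic:

```python
# Effect of class weights on an imbalanced binary problem (~10% positives).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 4))
# Noisy threshold on feature 0 produces a rare positive class.
y = (X[:, 0] + 0.7 * rng.normal(size=2000) > 1.6).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

plain = LogisticRegression().fit(X_train, y_train)
weighted = LogisticRegression(class_weight="balanced").fit(X_train, y_train)

# The weighted model trades precision for recall on the minority class;
# evaluate with F1 rather than accuracy.
f1_plain = f1_score(y_test, plain.predict(X_test))
f1_weighted = f1_score(y_test, weighted.predict(X_test))
```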
**What is data leakage and how do I prevent it?**

Data leakage occurs when information from the test set "leaks" into training. Common causes: fitting preprocessors on the full dataset, using future data for prediction, or including target-derived features. Prevention: always split data first, use sklearn Pipelines, and think carefully about what information would be available at prediction time.
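The prevention advice can be sketched as follows: split first, then let a Pipeline fit the scaler on training data only, so no test-set statistics leak into training (the data here is synthetic):

```python
# Leakage-safe preprocessing with a scikit-learn Pipeline.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
y = (X[:, 0] > 0).astype(int)

# 1. Split BEFORE any preprocessing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

# 2. The pipeline fits the scaler on X_train only; at predict time it
#    reuses those training statistics to transform X_test.
pipe = make_pipeline(StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)
accuracy = pipe.score(X_test, y_test)
```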
Lilly Tech Systems