Intermediate

Ongoing Model Monitoring

Implement continuous model performance monitoring with drift detection, threshold management, automated alerting, and structured escalation procedures.

Monitoring Dimensions

Dimension     | What to Monitor                           | Alert Trigger
Performance   | Accuracy, AUC, precision, recall          | Metric drops below threshold
Data Drift    | Input feature distributions               | PSI or KS statistic exceeds limit
Concept Drift | Relationship between features and target  | Prediction distribution shift
Stability     | Score distribution consistency            | Population stability index change
Operational   | Latency, errors, throughput               | SLA breach or error spike

Key Principle: Monitoring should detect problems before they materialize as business losses. Set early warning thresholds well before the point where model output would become unreliable for decision-making.

Drift Detection Methods

  1. Population Stability Index (PSI)

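    Compare the distribution of model scores or input features between a reference period and the current period. As a common rule of thumb, PSI below 0.10 indicates no meaningful shift, 0.10 to 0.25 a moderate shift worth investigating, and above 0.25 significant drift.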

  2. Kolmogorov-Smirnov Test

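    A two-sample statistical test that compares the cumulative distributions of a feature in training and production data. The test statistic is the largest gap between the two empirical distribution functions; a large value indicates a distributional change.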

  3. Feature Importance Drift

    Monitor changes in feature importance rankings over time. Significant reordering may indicate fundamental changes in data relationships.

  4. Prediction Distribution Monitoring

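    Track the distribution of model predictions against a reference period. A shift that is not explained by changes in the monitored input features is an early signal of possible concept drift; confirm it by comparing predictions against realized outcomes once they become available.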
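The first three methods above can be sketched in a few lines of standard-library Python. This is a minimal illustration, not a production implementation: the function names are hypothetical, PSI bins are assumed to be equal-frequency bins cut from the reference sample, a small eps stands in for empty bins, and the rank correlation assumes both periods report the same features with no tied importances. PSI is computed as the sum over bins of (current share - reference share) x ln(current share / reference share).

```python
import math
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of a numeric variable."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles (assumption: equal-frequency bins).
    edges = [ref_sorted[int(len(ref_sorted) * i / n_bins)] for i in range(1, n_bins)]

    def proportions(values):
        counts = [0] * n_bins
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # eps avoids log(0) and division by zero when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    p_ref, p_cur = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(p_ref, p_cur))


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_critical_value(n, m, c_alpha=1.358):
    """Large-sample critical value; c_alpha=1.358 corresponds to alpha=0.05."""
    return c_alpha * math.sqrt((n + m) / (n * m))


def rank_correlation(importance_ref, importance_cur):
    """Spearman rank correlation of two {feature: importance} dicts (no ties)."""
    features = sorted(importance_ref)

    def ranks(importance):
        ordered = sorted(features, key=lambda f: importance[f], reverse=True)
        return {f: i for i, f in enumerate(ordered)}

    r_ref, r_cur = ranks(importance_ref), ranks(importance_cur)
    n = len(features)
    d2 = sum((r_ref[f] - r_cur[f]) ** 2 for f in features)
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```

In use, psi(training_scores, production_scores) would be compared against the PSI limit, ks_statistic against ks_critical_value for the two sample sizes, and a rank correlation well below 1 would flag a reordering of feature importances. In practice a statistics library would normally replace the hand-written KS test.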

Escalation Framework

  • Level 1 - Watch: Metrics approaching thresholds. Increase monitoring frequency and notify the model owner.
  • Level 2 - Warning: Thresholds breached. Conduct root cause analysis, engage the validation team, and implement compensating controls.
  • Level 3 - Action Required: Significant degradation. Initiate a model review and consider fallback to a previous version or a manual process.
  • Level 4 - Critical: Model producing harmful outputs. Immediately deactivate the model, switch to the contingency process, and notify senior management.
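One way to make the four levels operational is to map each monitored metric to a level through explicit cut-offs. The sketch below assumes a "higher is worse" metric such as PSI; the Level enum, the escalation_level function, and the cut-off values in the usage note are illustrative assumptions, and the Critical level is driven by a separate flag because harmful output is a judgment rather than a metric value.

```python
from enum import IntEnum


class Level(IntEnum):
    NORMAL = 0
    WATCH = 1
    WARNING = 2
    ACTION_REQUIRED = 3
    CRITICAL = 4


def escalation_level(value, watch, warning, action, harmful=False):
    """Map a 'higher is worse' metric (e.g. PSI) to an escalation level.

    watch < warning < action are hypothetical cut-offs chosen per model.
    """
    if harmful:                 # set by human review or a dedicated control
        return Level.CRITICAL
    if value >= action:
        return Level.ACTION_REQUIRED
    if value >= warning:
        return Level.WARNING
    if value >= watch:
        return Level.WATCH
    return Level.NORMAL
```

For example, escalation_level(0.18, watch=0.10, warning=0.25, action=0.50) returns Level.WATCH: the metric is approaching the 0.25 limit but has not breached it. For metrics where lower is worse, such as AUC, the comparisons would be reversed.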

Monitoring Infrastructure

Automated Reports

Scheduled monitoring reports for model owners, validation teams, and risk committees with standardized metrics and trend analysis.

Real-Time Dashboards

Live dashboards showing model health across the portfolio with drill-down capability to individual model metrics.

Alert Management

Configurable alerting with severity levels, routing rules, acknowledgment tracking, and escalation timelines.
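As a rough sketch of what such a configuration might hold, the example below models a rule with a severity, a first recipient, an acknowledgment deadline, and an escalation target. All names here (AlertRule, Alert, current_recipient, and the recipient strings in the usage note) are hypothetical and not taken from any particular alerting product.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class AlertRule:
    metric: str
    severity: str            # e.g. "watch", "warning", "critical"
    route_to: str            # first recipient of the alert
    ack_within: timedelta    # acknowledgment deadline
    escalate_to: str         # notified if the deadline passes unacknowledged


@dataclass
class Alert:
    rule: AlertRule
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


def current_recipient(alert: Alert, now: datetime) -> Optional[str]:
    """Return who should be notified now, or None once the alert is acknowledged."""
    if alert.acknowledged_at is not None:
        return None
    if now - alert.raised_at > alert.rule.ack_within:
        return alert.rule.escalate_to
    return alert.rule.route_to
```

A rule such as AlertRule("psi", "warning", "model-owner", timedelta(hours=4), "validation-team") would route a PSI warning to the model owner first and to the validation team if it is still unacknowledged after four hours.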

Outcome Tracking

Long-term outcome tracking to compare predictions against realized results as ground truth becomes available.
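A minimal sketch of this comparison, assuming each prediction carries a case identifier and a period label and that realized outcomes arrive later keyed by the same identifier (the function name and data layout are illustrative assumptions):

```python
from collections import defaultdict


def accuracy_by_period(predictions, outcomes):
    """Compare predictions with realized outcomes, grouped by period.

    predictions: iterable of (case_id, period, predicted_label)
    outcomes:    dict of case_id -> realized label, for matured cases only
    returns:     {period: (accuracy or None, n_matured, n_total)}
    """
    hits = defaultdict(int)
    matured = defaultdict(int)
    total = defaultdict(int)
    for case_id, period, predicted in predictions:
        total[period] += 1
        if case_id in outcomes:          # ground truth is available
            matured[period] += 1
            hits[period] += int(predicted == outcomes[case_id])
    return {
        p: (hits[p] / matured[p] if matured[p] else None, matured[p], total[p])
        for p in total
    }
```

Reporting the matured count next to the accuracy matters: recent periods usually have few realized outcomes, so their accuracy figures are less reliable than those of older periods.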

Next Up: In the final lesson, we will cover best practices for extending SR 11-7 to modern AI/ML models and building a model risk culture.