Ongoing Model Monitoring
Implement continuous model performance monitoring with drift detection, threshold management, automated alerting, and structured escalation procedures.
Monitoring Dimensions
| Dimension | What to Monitor | Alert Trigger |
|---|---|---|
| Performance | Accuracy, AUC, precision, recall | Metric drops below threshold |
| Data Drift | Input feature distributions | PSI or KS statistic exceeds limit |
| Concept Drift | Relationship between features and target | Predictions diverge from realized outcomes; prediction distribution shift as an early proxy |
| Stability | Score distribution consistency | PSI of model scores exceeds limit |
| Operational | Latency, errors, throughput | SLA breach or error spike |
Drift Detection Methods
Population Stability Index (PSI)
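Compare the distribution of model scores or input features between a reference period and the current period, using the same bins for both. As a common rule of thumb, PSI below 0.1 indicates a stable population, 0.1 to 0.25 a moderate shift worth investigating, and above 0.25 significant drift.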
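The PSI calculation can be sketched as follows, using the standard definition: the sum over bins of (current share - reference share) × ln(current share / reference share). The function name `psi`, the choice of ten bins cut at reference quantiles, and the `eps` floor for empty bins are assumptions made for illustration, not details from the text.

```python
import math
from bisect import bisect_right


def psi(reference, current, n_bins=10, eps=1e-6):
    """Population Stability Index between two samples of a numeric variable."""
    ref_sorted = sorted(reference)
    # Interior bin edges at reference quantiles, so each bin holds
    # roughly the same share of the reference sample.
    edges = [ref_sorted[int(len(ref_sorted) * k / n_bins)] for k in range(1, n_bins)]
    edges = sorted(set(edges))  # drop duplicate edges caused by repeated values

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[bisect_right(edges, v)] += 1
        # Floor each share at eps so empty bins do not produce log(0).
        return [max(c / len(values), eps) for c in counts]

    ref_share = shares(reference)
    cur_share = shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_share, cur_share))


# Illustrative data: scores spread over [0, 1) versus scores squeezed into [0.5, 1).
reference_scores = [i / 1000 for i in range(1000)]
current_scores = [0.5 + i / 2000 for i in range(1000)]
drifted = psi(reference_scores, current_scores) > 0.25
```

The same function applies to a single input feature or to the model's output scores. Because the `eps` floor inflates PSI when a bin is empty in one sample, very small samples or too many bins can overstate drift.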
Kolmogorov-Smirnov Test
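A nonparametric two-sample test that compares the empirical cumulative distributions of a feature in training and production data. The test statistic is the largest vertical gap between the two distributions; a large gap relative to the sample sizes indicates a distributional change. The test is designed for continuous features.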
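A minimal sketch of the two-sample KS check, written with the standard library only. The decision rule uses the large-sample critical value c(α)·sqrt((n + m) / (n·m)); the default `c_alpha=1.358` corresponds to a 5% significance level and is an assumed setting, as are the function names. In practice a library routine such as `scipy.stats.ks_2samp` would normally be used and also returns a p-value.

```python
import math


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Step past every observation equal to x in both samples (handles ties).
        while i < len(a) and a[i] == x:
            i += 1
        while j < len(b) and b[j] == x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_drift_detected(train, prod, c_alpha=1.358):
    """Return (flag, statistic, critical value) using the asymptotic threshold."""
    n, m = len(train), len(prod)
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    d = ks_statistic(train, prod)
    return d > critical, d, critical
```

With large production samples the test flags even tiny, practically irrelevant shifts, so it can help to pair the flag with a minimum value of the statistic itself.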
Feature Importance Drift
Monitor changes in feature importance rankings over time. Significant reordering may indicate fundamental changes in data relationships.
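One way to quantify reordering is a rank correlation between the baseline and current importance rankings. The sketch below computes Spearman's coefficient over the features present in both snapshots; the function names and the tie-breaking rule (alphabetical by feature name) are assumptions for illustration, and the importance values themselves would come from whatever method the model already uses.

```python
def ranks(importances):
    """Map feature -> rank (1 = most important). Ties are broken by feature name."""
    ordered = sorted(importances, key=lambda f: (-importances[f], f))
    return {f: r for r, f in enumerate(ordered, start=1)}


def rank_correlation(baseline, current):
    """Spearman correlation of importance ranks over the shared features."""
    features = sorted(set(baseline) & set(current))
    n = len(features)
    if n < 2:
        raise ValueError("need at least two shared features")
    rb = ranks({f: baseline[f] for f in features})
    rc = ranks({f: current[f] for f in features})
    d2 = sum((rb[f] - rc[f]) ** 2 for f in features)
    # Valid because the ranks are distinct after tie-breaking.
    return 1 - 6 * d2 / (n * (n * n - 1))


def moved_features(baseline, current, min_shift=2):
    """Features whose rank changed by at least min_shift positions."""
    shared = set(baseline) & set(current)
    rb = ranks({f: baseline[f] for f in shared})
    rc = ranks({f: current[f] for f in shared})
    return sorted(f for f in shared if abs(rb[f] - rc[f]) >= min_shift)
```

A coefficient near 1 means the ordering is essentially unchanged, while values near 0 or below point to substantial reordering. What counts as "significant" is a threshold each team has to set for itself.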
Prediction Distribution Monitoring
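Track the distribution of model predictions over time. Because a fixed model's predictions are a function of its inputs, a shift that is not explained by drift in the monitored features warrants investigation: it may reflect changes in unmonitored inputs or upstream data processing. Concept drift itself is confirmed only when predictions are compared against realized outcomes.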
Escalation Framework
- Level 1 - Watch: Metrics are approaching thresholds. Increase monitoring frequency and notify the model owner.
- Level 2 - Warning: Thresholds are breached. Conduct root cause analysis, engage the validation team, and implement compensating controls.
- Level 3 - Action Required: Degradation is significant. Initiate a model review and consider falling back to the previous version or a manual process.
- Level 4 - Critical: The model is producing harmful outputs. Immediately deactivate the model, switch to the contingency process, and notify senior management.
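The four levels above can be sketched as a classification of a monitored metric against its threshold. This sketch assumes a higher-is-better metric such as AUC. The margins that separate "approaching", "breached", and "significant degradation" (`watch_margin`, `severe_margin`) are hypothetical values, and the `harmful` flag stands in for whatever separate check identifies harmful outputs.

```python
from enum import IntEnum


class Level(IntEnum):
    OK = 0
    WATCH = 1
    WARNING = 2
    ACTION_REQUIRED = 3
    CRITICAL = 4


def escalation_level(value, threshold, watch_margin=0.02, severe_margin=0.05, harmful=False):
    """Classify a higher-is-better metric against its threshold.

    watch_margin and severe_margin are assumed, illustrative settings.
    """
    if harmful:
        # Level 4 is driven by output harm, not by the metric value.
        return Level.CRITICAL
    if value < threshold - severe_margin:
        return Level.ACTION_REQUIRED
    if value < threshold:
        return Level.WARNING
    if value < threshold + watch_margin:
        return Level.WATCH
    return Level.OK
```

For example, with a threshold of 0.75, an AUC of 0.76 maps to `Level.WATCH` and an AUC of 0.74 to `Level.WARNING`. Using an `IntEnum` lets the levels be compared directly, for instance to report the worst level across several metrics with `max(...)`.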
Monitoring Infrastructure
Automated Reports
Scheduled monitoring reports for model owners, validation teams, and risk committees with standardized metrics and trend analysis.
Real-Time Dashboards
Live dashboards showing model health across the portfolio with drill-down capability to individual model metrics.
Alert Management
Configurable alerting with severity levels, routing rules, acknowledgment tracking, and escalation timelines.
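As a rough illustration of how severity, routing, acknowledgment tracking, and escalation timelines fit together, the sketch below models an alert as a small record. Everything named here is hypothetical: the `Alert` fields, the `ROUTES` recipients, and the `ACK_WINDOW` durations are placeholder configuration, not values from the text.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class Alert:
    model_id: str
    severity: str  # one of the keys in ROUTES
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None


# Hypothetical routing rules: who is notified at each severity.
ROUTES = {
    "watch": ["model_owner"],
    "warning": ["model_owner", "validation_team"],
    "action_required": ["model_owner", "validation_team"],
    "critical": ["model_owner", "validation_team", "senior_management"],
}

# Hypothetical escalation timelines: how long an alert may stay unacknowledged.
ACK_WINDOW = {
    "watch": timedelta(days=2),
    "warning": timedelta(hours=4),
    "action_required": timedelta(hours=1),
    "critical": timedelta(minutes=15),
}


def recipients(alert):
    """Recipients for an alert according to the routing table."""
    return ROUTES[alert.severity]


def needs_escalation(alert, now):
    """True if the alert is still unacknowledged after its window has passed."""
    if alert.acknowledged_at is not None:
        return False
    return now - alert.raised_at > ACK_WINDOW[alert.severity]
```

Keeping routes and windows in plain dictionaries means they can be loaded from configuration and changed without touching the logic.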
Outcome Tracking
Long-term outcome tracking to compare predictions against realized results as ground truth becomes available.
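A minimal sketch of outcome tracking for a binary classifier: predictions are stored by record ID, outcomes arrive later, and accuracy is computed only over records whose outcome is known. The function name, the dictionary-based storage, and the 0.5 decision cutoff are assumptions for illustration.

```python
def realized_accuracy(predictions, outcomes, cutoff=0.5):
    """Compare stored predictions with outcomes that have since been observed.

    predictions: record id -> predicted probability of the positive class
    outcomes:    record id -> realized label (0 or 1), only for matured records
    Returns (accuracy over matured records, share of predictions that have matured).
    """
    matured = [rid for rid in predictions if rid in outcomes]
    if not matured:
        return None, 0.0
    correct = sum((predictions[rid] >= cutoff) == bool(outcomes[rid]) for rid in matured)
    return correct / len(matured), len(matured) / len(predictions)


# Illustrative data: four scored records, three with known outcomes so far.
scored = {"a": 0.9, "b": 0.2, "c": 0.7, "d": 0.4}
observed = {"a": 1, "b": 0, "c": 0}
accuracy, coverage = realized_accuracy(scored, observed)
```

Reporting coverage alongside accuracy matters because early-maturing outcomes are often not representative: in credit models, for instance, defaults become known sooner than successful repayments.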
Lilly Tech Systems