Optimization & Continuous Improvement
A lead scoring model is never finished. Learn how to monitor performance, detect drift, A/B test scoring approaches, and build feedback loops that keep your system accurate and trusted over time.
Model Monitoring
| Metric | Frequency | Alert Threshold |
|---|---|---|
| PR-AUC | Weekly | Drop >5% from baseline |
| Score Distribution | Daily | Shift in mean score >10 points |
| Feature Drift | Weekly | KS statistic >0.1 on key features |
| Conversion Rate by Tier | Monthly | Tier conversion outside expected range |
| Sales Feedback | Ongoing | Rep override rate >20% |
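The thresholds in the table can be wired into a simple automated check. Below is a minimal sketch, assuming metrics like PR-AUC, mean score, and rep override rate are already computed upstream; the function and dictionary keys are illustrative, not part of any specific tooling.

```python
def monitoring_alerts(baseline: dict, current: dict) -> list:
    """Return alert messages for any metric breaching the thresholds above."""
    alerts = []
    # PR-AUC: alert on a relative drop of more than 5% from baseline.
    if current["pr_auc"] < baseline["pr_auc"] * 0.95:
        alerts.append("PR-AUC dropped >5% from baseline")
    # Score distribution: alert on a mean-score shift of more than 10 points.
    if abs(current["mean_score"] - baseline["mean_score"]) > 10:
        alerts.append("Mean score shifted >10 points")
    # Sales feedback: alert when reps override more than 20% of scores.
    if current["override_rate"] > 0.20:
        alerts.append("Rep override rate >20%")
    return alerts
```

Running this on each monitoring cycle and routing the returned messages to an alerting channel keeps the thresholds in one auditable place.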
Common Causes of Score Drift
- Market Changes: New competitors, economic shifts, or industry trends that change buyer behavior patterns
- Product Changes: New features, pricing changes, or market repositioning that attract different buyer profiles
- Data Source Changes: CRM field modifications, tracking code updates, or third-party data provider changes
- Seasonal Patterns: Budget cycles, fiscal year timing, and seasonal demand fluctuations
- Selection Bias: Scoring changes how leads are treated, creating feedback loops that alter future training data
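The feature-drift check in the monitoring table uses the two-sample Kolmogorov-Smirnov statistic, which is simply the largest gap between the empirical CDFs of a reference window and a current window of a feature. A minimal stdlib-only sketch (no SciPy dependency assumed):

```python
from bisect import bisect_right

def ks_statistic(reference, current):
    """Two-sample KS statistic: max gap between the two empirical CDFs."""
    reference, current = sorted(reference), sorted(current)
    d = 0.0
    for v in sorted(set(reference) | set(current)):
        cdf_ref = bisect_right(reference, v) / len(reference)
        cdf_cur = bisect_right(current, v) / len(current)
        d = max(d, abs(cdf_ref - cdf_cur))
    return d
```

A value above the 0.1 threshold on a key feature (e.g. a shifted distribution of employee counts or page views) flags that feature for investigation before it silently degrades the model.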
A/B Testing Scoring Models
Champion-Challenger
Route 80% of leads through the current model (champion) and 20% through the new model (challenger). Compare conversion rates once each arm has accumulated a sufficient sample size.
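The 80/20 split should be deterministic so a given lead is always scored by the same model. One common approach, sketched here with illustrative names, is to hash a stable lead identifier into buckets:

```python
import hashlib

def route_model(lead_id: str, challenger_share: float = 0.20) -> str:
    """Deterministically assign a lead to the champion or challenger arm."""
    # Hash the lead id into one of 100 buckets; the same id always
    # lands in the same bucket, so assignment is stable across sessions.
    bucket = int(hashlib.sha256(lead_id.encode()).hexdigest(), 16) % 100
    return "challenger" if bucket < challenger_share * 100 else "champion"
```

Hash-based assignment avoids the bookkeeping of storing arm membership and keeps the split reproducible for later analysis.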
Shadow Scoring
Run the new model in parallel without affecting routing. Compare predictions against outcomes to validate before deployment.
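In shadow mode, only the champion's score affects routing; the challenger's prediction is logged for offline comparison. A minimal sketch, with hypothetical function and field names:

```python
def score_with_shadow(lead, champion, challenger, shadow_log):
    """Score a lead with the champion; log the challenger's score without using it."""
    live_score = champion(lead)
    shadow_log.append({
        "lead": lead,
        "champion": live_score,
        "shadow": challenger(lead),  # recorded, never routed on
    })
    return live_score
```

Once outcomes arrive, the logged pairs let you compare the two models on identical leads before the challenger ever touches production routing.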
Holdout Testing
Randomly assign a small percentage of leads to receive no AI scoring, relying on manual qualification. This measures the true incremental value of your model.
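Measuring incremental value comes down to comparing conversion rates between scored leads and the unscored holdout. A stdlib sketch of the lift and a two-proportion z statistic (roughly, |z| > 1.96 indicates significance at the 5% level):

```python
import math

def incremental_lift(scored_conv, scored_n, holdout_conv, holdout_n):
    """Conversion-rate lift of scored leads over the holdout, with a z statistic."""
    p1 = scored_conv / scored_n          # conversion rate with AI scoring
    p2 = holdout_conv / holdout_n        # conversion rate without it
    pooled = (scored_conv + holdout_conv) / (scored_n + holdout_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / scored_n + 1 / holdout_n))
    return p1 - p2, (p1 - p2) / se
```

For example, 120 conversions from 1,000 scored leads against 80 from a 1,000-lead holdout yields a 4-point lift with z near 3, comfortably significant.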
Multi-Armed Bandits
Dynamically allocate more traffic to the better-performing model over time, balancing exploration of new models with exploitation of proven ones.
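Thompson sampling is one common bandit strategy for this: track each model's conversions and non-conversions, sample from the corresponding Beta posteriors, and route the next lead to the model with the highest sample. A minimal sketch, assuming `stats` maps model name to `(conversions, non_conversions)`:

```python
import random

def thompson_choice(stats):
    """Route to the model whose sampled Beta posterior conversion rate is highest."""
    # Beta(conversions + 1, non_conversions + 1) is the posterior under a
    # uniform prior; sampling from it balances exploration and exploitation.
    return max(stats, key=lambda m: random.betavariate(stats[m][0] + 1,
                                                       stats[m][1] + 1))
```

Early on, wide posteriors send meaningful traffic to every model; as evidence accumulates, traffic concentrates on the winner automatically.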
Continuous Improvement Playbook
- Monthly Model Reviews: Analyze performance metrics, score distributions, and feature importance changes with data science and sales leadership
- Quarterly Retraining: Retrain models on the latest 12-18 months of data to capture evolving buyer behavior and market conditions
- Sales Feedback Integration: Collect structured feedback from reps on score accuracy and incorporate it into model refinement
- New Feature Exploration: Continuously test new data sources and features that might improve predictive power
- Bias Audits: Regularly check that scoring does not systematically disadvantage leads from specific industries, company sizes, or geographies
- Documentation: Maintain a model changelog that tracks all retraining events, feature changes, and performance impacts
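The changelog in the last item works best as structured records rather than free text, so performance impacts can be queried later. A sketch with illustrative fields:

```python
import datetime

def changelog_entry(version, change_type, description, pr_auc_before, pr_auc_after):
    """One structured model-changelog record (field names are illustrative)."""
    return {
        "version": version,
        "date": datetime.date.today().isoformat(),
        "type": change_type,  # e.g. "retrain", "feature_add", "threshold_change"
        "description": description,
        "pr_auc_before": pr_auc_before,
        "pr_auc_after": pr_auc_after,
    }
```

Appending one such record per retraining event or feature change gives reviewers a queryable history of what changed and what it did to performance.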