Design Ride ETA Prediction
A complete walkthrough of designing an ML system that predicts how long a ride will take. This is one of the most technically challenging ML system design problems because it combines spatial data, temporal patterns, real-time signals, and strict accuracy requirements.
Step 1: Clarify Requirements
- “When is ETA predicted?” — At ride request time (before matching) and during the ride (live updates)
- “Scale?” — 20M rides/day, 500K concurrent rides, global coverage
- “Accuracy target?” — Within 2 minutes for 80% of trips, within 5 minutes for 95%
- “Latency?” — Pre-ride ETA: <100ms; In-ride updates: every 30 seconds
- “Does ETA affect pricing?” — Yes, ETA directly impacts fare calculation and driver matching
ML Problem Formulation
# Problem formulation
# Business goal: Accurate ride duration for pricing, matching, and user trust
# ML task: Regression (predict trip duration in seconds)
# Input: Origin, destination, route, time, traffic, weather
# Output: Predicted duration (seconds) + confidence interval
# Training data: Completed trips with actual duration
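# Loss function: Huber loss (robust to outliers from unusual trips)
# Key constraint: Under-prediction is worse than over-prediction (user trust).
# Plain Huber loss is symmetric, so honoring this needs an asymmetric weight or a quantile objective.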
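One way to reconcile the Huber loss with the under-prediction constraint is to scale the loss when the prediction comes in short. The sketch below is an illustration only: the function name, `delta`, and `under_weight` are assumptions, not values from the formulation above.

```python
def asymmetric_huber(actual_s, predicted_s, delta=60.0, under_weight=2.0):
    """Huber loss with an extra penalty when the predicted duration is too short.

    delta (seconds) and under_weight are illustrative, untuned values.
    """
    residual = actual_s - predicted_s           # > 0 means we under-predicted
    abs_r = abs(residual)
    if abs_r <= delta:
        loss = 0.5 * residual ** 2              # quadratic near zero
    else:
        loss = delta * (abs_r - 0.5 * delta)    # linear in the tails (outlier-robust)
    return under_weight * loss if residual > 0 else loss


# The same 90-second miss costs more when the trip ran longer than predicted.
late = asymmetric_huber(actual_s=1290, predicted_s=1200)    # under-prediction
early = asymmetric_huber(actual_s=1200, predicted_s=1290)   # over-prediction
```

With these assumed settings the under-predicted trip is charged twice the loss of the over-predicted one, which nudges a trained model toward slightly conservative ETAs.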
Step 2: High-Level Architecture
# Architecture overview
#
# [Ride Request]
# |
# [Route Planning Service] --> Returns candidate routes (from mapping service)
# |
# [Feature Assembly]
# |-- Static features: road segments, speed limits, intersections
# |-- Real-time features: current traffic, weather, events
# |-- Historical features: typical travel time for this route/time
# |
# [Segment-Level ETA Model] --> Predict time for each road segment
# |
# [Route-Level Aggregation] --> Sum segment ETAs + transition penalties
# |
# [Post-Processing] --> Calibration, confidence intervals
# |
# [ETA Response: 23 min (21-27 min range)]
#
# In-ride updates:
# [GPS Telemetry] --> [Remaining Route] --> [Updated ETA]
Step 3: Deep Dive — Spatial-Temporal Features
Road Segment Features (Static)
| Feature | Type | Description |
|---|---|---|
| segment_length_m | Numerical | Length of road segment in meters |
| road_type | Categorical | Highway, arterial, residential, unpaved |
| speed_limit_kmh | Numerical | Posted speed limit |
| num_traffic_lights | Numerical | Number of traffic signals on segment |
| num_lanes | Numerical | Number of lanes in travel direction |
| has_turn | Binary | Whether segment involves a turn (adds delay) |
| elevation_change | Numerical | Elevation difference (uphill is slower) |
Real-Time Features
| Feature | Type | Description |
|---|---|---|
| current_speed_ratio | Numerical | Current avg speed / free-flow speed on segment |
| congestion_level | Categorical | Free flow, light, moderate, heavy, standstill |
| active_incidents | Binary | Accidents, road closures on or near segment |
| weather_condition | Categorical | Clear, rain, snow, fog (affects driving speed) |
| precipitation_mm | Numerical | Current rainfall/snowfall intensity |
| nearby_ride_speeds | Numerical | Average speed of rides currently on nearby segments |
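As a rough illustration of how two of these features relate, the sketch below derives `current_speed_ratio` from raw speeds and buckets it into a `congestion_level`. The cut points are hypothetical placeholders; a real system would calibrate them per road type.

```python
def current_speed_ratio(current_avg_speed_kmh, free_flow_speed_kmh):
    """Current average speed divided by free-flow speed on the segment."""
    if free_flow_speed_kmh <= 0:
        raise ValueError("free-flow speed must be positive")
    return current_avg_speed_kmh / free_flow_speed_kmh


# Hypothetical lower bounds for each bucket, checked from fastest to slowest.
CONGESTION_BUCKETS = [
    (0.85, "free_flow"),
    (0.65, "light"),
    (0.40, "moderate"),
    (0.10, "heavy"),
]


def congestion_level(speed_ratio):
    """Map a speed ratio to one of the five categorical congestion levels."""
    for lower_bound, label in CONGESTION_BUCKETS:
        if speed_ratio >= lower_bound:
            return label
    return "standstill"


level = congestion_level(current_speed_ratio(30, 60))  # ratio 0.5
```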
Temporal Features
| Feature | Type | Description |
|---|---|---|
| hour_of_day | Cyclical (sin/cos) | Rush hour patterns: 7–9am, 5–7pm are slowest |
| day_of_week | Categorical | Weekday vs. weekend traffic patterns differ significantly |
| is_holiday | Binary | Holidays have different traffic patterns |
| minutes_since_event | Numerical | Post-event congestion near stadiums, concert venues |
| historical_segment_time | Numerical | Median travel time for this segment at this hour/day |
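The sin/cos encoding for `hour_of_day` exists so that 23:00 and 00:00 end up close together instead of 23 units apart. A minimal sketch (the helper names are illustrative):

```python
import math


def encode_hour(hour_of_day):
    """Map an hour in [0, 24) onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * (hour_of_day % 24) / 24
    return math.sin(angle), math.cos(angle)


def encoded_distance(hour_a, hour_b):
    """Euclidean distance between two encoded hours."""
    (sa, ca), (sb, cb) = encode_hour(hour_a), encode_hour(hour_b)
    return math.hypot(sa - sb, ca - cb)


# 23:00 and 00:00 are neighbors on the circle; 00:00 and 12:00 are opposite.
near = encoded_distance(23, 0)
far = encoded_distance(0, 12)
```

The same trick applies to any periodic feature, such as day of week or month of year, if a categorical encoding is not preferred.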
Deep Dive — Model Architecture
Approach: Segment-Level Prediction + Aggregation
Rather than predicting trip ETA directly, predict the travel time for each road segment and sum them. This approach is more accurate because:
- Each segment has its own traffic pattern (highway vs. residential)
- The model generalizes to new routes it has never seen (novel combinations of known segments)
- You can update segment-level predictions in real-time as traffic changes
# Model architecture
#
# For each road segment in the route:
# segment_features = [static_features, real_time_features, temporal_features]
# segment_eta = model.predict(segment_features) # seconds
#
# Route ETA = sum(segment_etas) + sum(transition_penalties)
#
# transition_penalty = f(turn_angle, traffic_light, stop_sign)
#
# Model choices:
# V1: Gradient Boosted Trees (XGBoost/LightGBM)
# - Fast inference, handles mixed feature types
# - Predict log(travel_time) to handle skewed distribution
#
# V2: Graph Neural Network (GNN)
# - Model road network as a graph (intersections = nodes, roads = edges)
# - Message passing captures traffic propagation effects
# - A congested segment slows adjacent segments too
# - Significantly more complex but captures spatial dependencies
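The aggregation step above can be sketched as follows, assuming the V1 model emits log(travel_time) per segment. The penalty values inside `transition_penalty_s` are hypothetical hand-set numbers standing in for whatever function f is learned or tuned in practice.

```python
import math


def transition_penalty_s(turn_angle_deg, has_traffic_light, has_stop_sign):
    """Illustrative penalty in seconds for moving from one segment to the next."""
    penalty = 0.0
    if abs(turn_angle_deg) > 45:
        penalty += 5.0       # assumed cost of a real turn
    if has_traffic_light:
        penalty += 15.0      # assumed average wait at a signal
    if has_stop_sign:
        penalty += 8.0       # assumed stop-and-go cost
    return penalty


def route_eta_seconds(log_segment_predictions, transition_penalties_s):
    """Invert the log transform per segment, then sum segments and penalties."""
    segment_etas = [math.exp(p) for p in log_segment_predictions]
    return sum(segment_etas) + sum(transition_penalties_s)


# Two segments predicted at 60 s and 120 s, joined by a signalized straight-through.
eta = route_eta_seconds(
    [math.log(60), math.log(120)],
    [transition_penalty_s(0, True, False)],
)
```

Note that the exponential is applied per segment before summing; summing log predictions first would multiply the segment times rather than add them.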
Graph Neural Network Deep Dive
# GNN for road network ETA
#
# Graph construction:
# Nodes = road intersections (typically hundreds of thousands for a large city; ~10M is closer to a country-scale graph)
# Edges = road segments connecting intersections
# Node features = [traffic_signal, intersection_type, ...]
# Edge features = [segment_length, speed_limit, current_speed, ...]
#
# Architecture:
# 1. Initial embedding: Linear(node_features) --> h_0 (one hidden state per intersection)
# 2. Message passing (3 layers), with messages built from neighbor states and edge features:
# h_{l+1} = GRU(h_l, aggregate(neighbor_messages))
# 3. Edge prediction: MLP(concat(h_source, h_target, edge_features))
# --> predicted_travel_time for each segment
#
# Why GNN works:
# - Captures traffic propagation (congestion on I-95 affects exits)
# - Handles variable-length routes naturally
# - Shared parameters across all segments (generalizes to new routes)
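To make the propagation claim concrete, here is a toy message-passing loop in plain Python. It is deliberately simplified: each node holds a single scalar state, the update is a fixed-weight tanh rather than a learned GRU, and aggregation is a plain mean. The point is only to show that k layers let a signal travel k hops.

```python
import math


def message_passing(adjacency, h, num_layers=3, w_self=0.5, w_nbr=0.5):
    """Run num_layers rounds of mean-aggregate-then-update over the graph.

    adjacency: dict node -> list of neighbor nodes
    h: dict node -> scalar state (a stand-in for a hidden vector)
    w_self, w_nbr: fixed illustrative weights; a real GNN learns these.
    """
    for _ in range(num_layers):
        new_h = {}
        for node, neighbors in adjacency.items():
            if neighbors:
                aggregated = sum(h[n] for n in neighbors) / len(neighbors)
            else:
                aggregated = 0.0
            new_h[node] = math.tanh(w_self * h[node] + w_nbr * aggregated)
        h = new_h
    return h


# Line graph A - B - C - D where only A starts "congested".
adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B", "D"], "D": ["C"]}
h0 = {"A": 1.0, "B": 0.0, "C": 0.0, "D": 0.0}

after_one = message_passing(adjacency, h0, num_layers=1)
after_three = message_passing(adjacency, h0, num_layers=3)
```

After one layer only A's direct neighbor B has picked up any signal; after three layers it has reached D, three hops away. This is why the number of layers bounds how far congestion effects can be modeled.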
Deep Dive — Real-Time Updates
During a ride, the ETA must update as conditions change.
In-Ride ETA Update Flow
# Every 30 seconds during the ride:
#
# 1. Receive GPS telemetry from driver's phone
# 2. Map-match GPS to road segment (snap to nearest road)
# 3. Calculate actual speed on current segment
# 4. Compare actual progress vs. predicted progress
# 5. If progress deviates from the prediction (behind or ahead of schedule):
# - Remaining segments get current traffic features refreshed
# - Re-predict remaining segment ETAs
# - New ETA = elapsed_time + sum(remaining_segment_etas)
# 6. If a reroute occurred:
# - Compute ETA for new route from scratch
# 7. Smooth the ETA update (no sudden jumps):
# new_displayed_eta = 0.7 * new_prediction + 0.3 * previous_eta
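Steps 5 and 7 of the flow above can be sketched as two small functions. Here the ETA is treated as total trip duration in seconds, and the 0.7 weight comes from the smoothing rule; the function names are illustrative.

```python
def updated_eta_s(elapsed_s, remaining_segment_etas_s):
    """New total-trip ETA: time already spent plus re-predicted remaining segments."""
    return elapsed_s + sum(remaining_segment_etas_s)


def smooth_eta_s(new_prediction_s, previous_eta_s, alpha=0.7):
    """Blend the fresh prediction with the last displayed ETA to avoid jumps."""
    return alpha * new_prediction_s + (1 - alpha) * previous_eta_s


# The rider was shown 23 min (1380 s). Ten minutes in, traffic has worsened.
new_prediction = updated_eta_s(elapsed_s=600, remaining_segment_etas_s=[300, 400, 200])
displayed = smooth_eta_s(new_prediction, previous_eta_s=1380)
```

In this example the raw prediction jumps from 1380 s to 1500 s, while the displayed value moves only part of the way (to roughly 1464 s) and closes the rest of the gap over subsequent updates.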
Metrics & Evaluation
Offline Metrics
| Metric | Formula | Target |
|---|---|---|
| MAPE | mean(abs(actual - predicted) / actual) | < 12% |
| MAE | mean(abs(actual - predicted)) in seconds | < 120s for trips under 30 min |
| P80 Absolute Error | 80th percentile of abs(actual - predicted) | < 120s |
| P95 Absolute Error | 95th percentile of abs(actual - predicted) | < 300s |
| Under-prediction Rate | Fraction of trips where predicted < actual | < 40% (prefer over-predict) |
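The offline metrics in the table can be computed from paired lists of actual and predicted durations. This sketch uses a nearest-rank percentile, which is one of several common definitions; the function names are illustrative.

```python
import math


def percentile(values, q):
    """Nearest-rank percentile for an integer q in (0, 100]."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def offline_metrics(actual_s, predicted_s):
    """Compute the table's metrics from durations in seconds."""
    pairs = list(zip(actual_s, predicted_s))
    abs_errors = [abs(a - p) for a, p in pairs]
    n = len(pairs)
    return {
        "mape": sum(e / a for e, (a, _) in zip(abs_errors, pairs)) / n,
        "mae_s": sum(abs_errors) / n,
        "p80_abs_error_s": percentile(abs_errors, 80),
        "p95_abs_error_s": percentile(abs_errors, 95),
        "under_prediction_rate": sum(p < a for a, p in pairs) / n,
    }


# Five made-up trips, purely for illustration.
actual = [600, 1200, 900, 1500, 300]
predicted = [660, 1140, 900, 1800, 330]
metrics = offline_metrics(actual, predicted)
```

On this toy data a single 300-second miss dominates the P95 figure while leaving P80 untouched, which is why the percentile targets are tracked alongside MAE.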
Online Metrics
| Metric | Description | Guardrail |
|---|---|---|
| Fare accuracy | Predicted fare vs. actual fare (ETA drives pricing) | Refund requests should not increase |
| User satisfaction | Post-ride ratings correlated with ETA accuracy | Should not decrease |
| Driver acceptance rate | Drivers accept rides based on estimated duration | Should not decrease |
| ETA convergence | How quickly in-ride ETA converges to actual | Within 10% by halfway point |
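The ETA convergence guardrail can be checked per trip once the actual duration is known. The sketch below takes one possible reading of "within 10% by halfway point": every update issued at or after the halfway mark must be within 10% of the actual duration. That interpretation, and the function name, are assumptions.

```python
def converged_by_halfway(eta_updates, actual_duration_s, tolerance=0.10):
    """Check that all ETA updates from the halfway point onward are within tolerance.

    eta_updates: list of (elapsed_s, predicted_total_duration_s) tuples.
    """
    halfway_s = actual_duration_s / 2
    late_updates = [eta for elapsed, eta in eta_updates if elapsed >= halfway_s]
    return all(
        abs(eta - actual_duration_s) / actual_duration_s <= tolerance
        for eta in late_updates
    )


# A 20-minute trip (1200 s) with an update every 5 minutes.
good_trip = [(0, 1000), (300, 1150), (600, 1250), (900, 1210)]
bad_trip = [(0, 1000), (300, 1150), (600, 1400), (900, 1210)]

ok = converged_by_halfway(good_trip, actual_duration_s=1200)
not_ok = converged_by_halfway(bad_trip, actual_duration_s=1200)
```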
Step 4: Trade-Offs & Extensions
Accuracy vs. Latency
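A GNN produces more accurate ETAs but takes about 50ms per route. Ride matching needs ETAs for around 100 candidate drivers at once; scoring them one at a time with the GNN would cost roughly 5 seconds, far beyond the 100ms pre-ride budget. Use a fast segment lookup table for the candidates and reserve the GNN for the final selected route.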
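The two-stage pattern can be sketched like this. `lookup_eta_s` stands for the precomputed segment table and `precise_eta_s` for the expensive model call; both names are hypothetical.

```python
def pick_driver(candidate_routes, lookup_eta_s, precise_eta_s):
    """Rank all candidates with the cheap table, then refine only the winner.

    candidate_routes: dict driver_id -> list of segment ids to reach the rider
    lookup_eta_s: dict segment id -> typical travel time in seconds
    precise_eta_s: callable(route) -> seconds; stand-in for the slow, accurate model
    """
    rough_etas = {
        driver: sum(lookup_eta_s[segment] for segment in route)
        for driver, route in candidate_routes.items()
    }
    best_driver = min(rough_etas, key=rough_etas.get)
    return best_driver, precise_eta_s(candidate_routes[best_driver])


# Toy data; the lambda below only mimics a more careful model.
lookup = {"s1": 60, "s2": 120, "s3": 30}
candidates = {"d1": ["s1", "s2"], "d2": ["s3", "s1"]}
expensive_calls = []


def fake_precise_model(route):
    expensive_calls.append(route)
    return 1.1 * sum(lookup[segment] for segment in route)


driver, refined_eta = pick_driver(candidates, lookup, fake_precise_model)
```

However many candidates are ranked, the expensive model runs once, so its per-route cost is paid a single time per request.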
Global vs. Local Models
A single global model handles all cities, but traffic patterns vary enormously (NYC vs. rural Texas). Train city-specific models for top markets and fall back to a global model elsewhere.
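At serving time the fallback is a simple lookup. In this sketch the "models" are plain functions with made-up speeds, and the city keys are hypothetical.

```python
def select_model(city, city_models, global_model):
    """Return the city-specific model if one exists, else the global fallback."""
    return city_models.get(city, global_model)


# Hypothetical stand-ins: each "model" maps segment features to seconds.
def global_model(features):
    return features["segment_length_m"] / 10.0   # assumes about 10 m/s everywhere


def nyc_model(features):
    return features["segment_length_m"] / 6.0    # assumes slower dense-city traffic


city_models = {"nyc": nyc_model}
segment = {"segment_length_m": 600}

nyc_eta = select_model("nyc", city_models, global_model)(segment)
rural_eta = select_model("marfa_tx", city_models, global_model)(segment)
```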
Point Estimate vs. Distribution
Instead of predicting “23 minutes,” predict a distribution: “23 min (80% chance: 21–27 min).” Use quantile regression or mixture density networks. Show the range for airport trips where precision matters.
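Quantile regression trains one model per quantile with the pinball loss. A range like 21 to 27 minutes at 80% coverage would come from the 0.1 and 0.9 quantile models. A minimal sketch of the loss itself:

```python
def pinball_loss(actual_s, predicted_s, quantile):
    """Pinball (quantile) loss for a single trip, in seconds.

    For quantile=0.9, falling short of the actual is penalized 9x more than
    overshooting, so the fitted value settles near the 90th percentile.
    """
    diff = actual_s - predicted_s
    return max(quantile * diff, (quantile - 1) * diff)


# Upper-bound model (q=0.9) predicting 23 min (1380 s):
too_low = pinball_loss(actual_s=1500, predicted_s=1380, quantile=0.9)
too_high = pinball_loss(actual_s=1260, predicted_s=1380, quantile=0.9)
```

Both misses are 120 seconds, but the one where the trip outlasted the upper bound costs nine times as much.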
Multi-Modal ETA
Extend to predict ETA for ride + public transit + walking combinations. Requires integrating transit schedules, walking speed estimation, and transfer time models.
Lilly Tech Systems