Advanced

Case Study Questions

Five complete case studies that mirror real data science interview questions at top tech companies. Each includes a structured approach and a detailed model answer.

Framework for Case Studies: For every case study, follow this structure: (1) Clarify — ask questions to understand the problem scope, (2) Define metrics — what are we measuring and why, (3) Hypothesize — list potential causes or approaches, (4) Analyze — describe the data and methods you would use, (5) Recommend — provide actionable recommendations with tradeoffs.

Case Study 1: Metric Design for a Ride-Sharing App

Prompt: "You are a data scientist at a ride-sharing company. The CEO asks you to define the single most important metric (North Star metric) for the company and design a dashboard around it. What do you propose?"

💡 Model Answer:

Clarifying questions I would ask: What stage is the company at (growth vs profitability)? What geography and segments are we focused on? Are we optimizing for riders, drivers, or both?

North Star Metric: Weekly completed rides per active rider. This metric captures: (1) product engagement (users are taking rides), (2) marketplace health (both riders and drivers need to be present), (3) retention signal (weekly frequency shows habit formation), and (4) revenue proxy (more rides = more revenue). I chose this over total rides (which inflates with user growth but hides per-user trends) and revenue (which is influenced by pricing changes, not just product quality).
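The North Star metric above can be computed directly from a ride-event log. A minimal sketch, assuming a hypothetical event schema with `rider_id`, `week`, and `status` fields (the real pipeline would read from a warehouse table):

```python
from collections import defaultdict

def weekly_rides_per_active_rider(rides):
    """Completed rides per active rider, by week.

    `rides` is a list of dicts with assumed fields:
    {"rider_id": str, "week": str, "status": str}.
    A rider counts as active in a week if they completed >= 1 ride.
    """
    completed = defaultdict(int)   # week -> completed ride count
    riders = defaultdict(set)      # week -> active rider ids
    for r in rides:
        if r["status"] == "completed":
            completed[r["week"]] += 1
            riders[r["week"]].add(r["rider_id"])
    return {wk: completed[wk] / len(riders[wk]) for wk in completed}
```

Note that cancelled rides contribute neither to the numerator nor to activity, which is the point of using *completed* rides: the metric only moves when the marketplace actually delivers.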

Supporting metrics for the dashboard:

  • Supply side: Active drivers per hour, driver utilization rate, average driver earnings per hour, driver churn rate
  • Demand side: Weekly active riders, new rider activation rate, rider churn rate, rides per rider
  • Marketplace health: Average wait time, surge pricing frequency, match rate (% of ride requests fulfilled), cancellation rate
  • Financial: Revenue per ride, gross margin, customer acquisition cost (CAC), lifetime value (LTV)

Guardrail metrics: Average wait time should not exceed 5 minutes (user experience), driver earnings per hour should not drop below a threshold (supply retention), and safety incident rate should remain flat or decrease.
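Guardrails like these are easiest to enforce as an automated check over each metrics snapshot. A sketch with illustrative thresholds (the earnings floor is an assumption, not company policy):

```python
def check_guardrails(metrics):
    """Return the list of breached guardrails for a metrics snapshot.

    `metrics` uses hypothetical keys: avg_wait_min,
    driver_earnings_per_hr, and safety_incident_rate_delta
    (week-over-week change in the safety incident rate).
    """
    breaches = []
    if metrics["avg_wait_min"] > 5:
        breaches.append("avg wait time above 5 minutes")
    if metrics["driver_earnings_per_hr"] < 20:  # assumed floor
        breaches.append("driver earnings below floor")
    if metrics["safety_incident_rate_delta"] > 0:
        breaches.append("safety incident rate increased")
    return breaches
```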

Key tradeoff: Optimizing ride frequency could lead to unsustainable driver subsidies or excessive surge pricing. The dashboard should always show the North Star alongside unit economics to prevent short-sighted optimization.

Case Study 2: Root Cause Analysis — Engagement Drop

Prompt: "You wake up Monday morning and see that the 7-day active users on your social media platform dropped 8% compared to the previous week. Walk me through how you would investigate."

💡 Model Answer:

I would follow a systematic top-down investigation, starting broad and narrowing down:

Step 1: Verify the data. Before investigating the product, rule out data issues. Check if the logging pipeline had outages, if the metric definition changed, or if there was a tracking bug. Look at raw event counts, not just the derived metric. Check data freshness and completeness.

Step 2: External factors. Was there a holiday, seasonal pattern, or major external event (competitor launch, news event, app store policy change)? Compare to the same week last year if available. Check if the drop is consistent across all platforms or isolated to one (e.g., iOS only, which might indicate an App Store issue).

Step 3: Segment the drop. Break down the 8% by key dimensions:

  • Platform: iOS vs Android vs Web — if only one platform dropped, it might be a release bug
  • Geography: US vs International — could be a regional outage or localization issue
  • User tenure: New vs existing users — new user drop suggests acquisition or onboarding issues; existing user drop suggests engagement or product issues
  • User segments: Power users vs casual users — if power users dropped, it is more concerning
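The segmentation step is mechanical once the counts are available: compute the week-over-week change per segment and sort so the worst-hit segment surfaces first. A minimal sketch with made-up platform counts:

```python
def segment_drops(current, previous):
    """Week-over-week fractional change per segment, worst first.

    `current` and `previous` map segment name -> 7-day active users.
    """
    changes = {
        seg: (current[seg] - previous[seg]) / previous[seg]
        for seg in previous
    }
    return sorted(changes.items(), key=lambda kv: kv[1])
```

If one segment accounts for nearly all of the 8% drop (e.g. iOS fell 20% while everything else was flat), the investigation narrows immediately to that platform's last release.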

Step 4: Check for product changes. Was there a new release, feature flag change, or A/B test that went to 100%? Check the deployment log. A recent code push is the most common cause of sudden metric drops. If yes, correlate the timing with the metric drop.

Step 5: Check for performance issues. Did app latency increase? Did error rates spike? Did crash rates go up? Users abandon slow or buggy apps without complaint.

Step 6: Quantify and recommend. Once I identify the cause, I would quantify the impact (how many users affected, revenue impact), propose a fix, and set up monitoring to detect similar issues faster in the future.

Key communication point: Present findings as "Here is what we know, here is what we suspect, and here is what we need to investigate further" rather than waiting for a complete answer.

Case Study 3: Product Analytics — Should We Launch This Feature?

Prompt: "We ran an A/B test for a new 'Stories' feature on our e-commerce app. Treatment group shows +12% daily active users but -3% in purchases per session. Should we launch?"

💡 Model Answer:

My framework for this decision has four parts:

1. Validate the results. First, check the basics: sample ratio mismatch (SRM) to ensure randomization is correct, adequate sample size for both metrics, and whether the test ran long enough to capture weekly patterns. Check if both results are statistically significant and look at confidence intervals, not just point estimates.
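The SRM check is a chi-square goodness-of-fit test on the observed group sizes. A self-contained sketch for a 50/50 split, hard-coding the df=1 critical value (3.841 at α = 0.05) rather than pulling in scipy:

```python
def srm_check(n_control, n_treatment, chi2_crit=3.841):
    """Sample ratio mismatch check for an intended 50/50 split.

    Chi-square goodness-of-fit with 1 degree of freedom; 3.841 is
    the critical value at alpha = 0.05. Returns (chi2, srm_detected).
    """
    expected = (n_control + n_treatment) / 2
    chi2 = ((n_control - expected) ** 2
            + (n_treatment - expected) ** 2) / expected
    return chi2, chi2 > chi2_crit
```

A detected SRM means the randomization or logging is broken, and the +12%/-3% readings should not be trusted until it is explained.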

2. Decompose the metrics. The +12% DAU and -3% purchases per session could be consistent or contradictory depending on the mechanism:

  • Scenario A (consistent): Stories attract users who browse but do not buy. These users increase DAU but dilute purchases per session. Total purchases might actually be flat or even up. Calculate total purchases = DAU × sessions per user × purchases per session.
  • Scenario B (contradictory): Stories distract buying-intent users from purchasing. The engagement is cannibalistic.

To distinguish: segment by user type. Did existing buyers reduce their purchase rate? Or did the new DAU come from non-buyers?

3. Evaluate business impact. Calculate the net revenue impact: if DAU is up 12% but purchase rate per session is down 3%, overall purchase volume is up approximately 8.6% (1.12 × 0.97 ≈ 1.086), assuming sessions per user are unchanged. If average order value is unchanged, total revenue increases. However, check if the DAU increase is sustainable or a novelty effect by examining the trend over the test duration.
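The arithmetic above generalizes to any pair of offsetting lifts, and is worth writing down explicitly because the naive answer (12% − 3% = 9%) is close but wrong; the effects compound multiplicatively:

```python
def net_purchase_lift(dau_lift, per_session_change):
    """Approximate net change in total purchase volume.

    Assumes sessions per user and average order value are unchanged
    in the test, which is an assumption worth verifying.
    """
    return (1 + dau_lift) * (1 + per_session_change) - 1
```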

4. My recommendation: If the DAU increase is driven by new user segments without cannibalizing existing buyer behavior, I would recommend launching with monitoring. Set guardrails: if purchases per session drops below -5% for existing buyers or if the DAU lift fades below +5% after 4 weeks, reconsider. Also track long-term retention — Stories engagement may improve 30-day retention even if it does not directly drive purchases.

Case Study 4: Recommendation System Impact

Prompt: "You built a new recommendation algorithm for a streaming platform. How would you design the evaluation and measure its impact?"

💡 Model Answer:

I would evaluate in three phases: offline, online, and long-term.

Phase 1: Offline Evaluation. Before exposing users, evaluate on historical data:

  • Precision@K and Recall@K: Of the top K recommendations, how many did the user actually engage with? And what fraction of items the user engaged with appeared in the top K?
  • NDCG (Normalized Discounted Cumulative Gain): Measures ranking quality, giving more credit when relevant items appear higher in the list.
  • Coverage: What percentage of the catalog gets recommended? Low coverage means the algorithm is stuck in a popularity bubble.
  • Diversity: How varied are the recommendations within a single user's list? Too similar items reduce discovery.
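The first two offline metrics can be sketched in a few lines with binary relevance (1 if the user engaged with the item, 0 otherwise); production evaluators would also handle graded relevance and per-user averaging:

```python
import math

def precision_recall_at_k(recommended, relevant, k):
    """Precision@K and Recall@K with binary relevance.

    `recommended` is a ranked list; `relevant` is the set of items
    the user actually engaged with.
    """
    hits = len(set(recommended[:k]) & relevant)
    return hits / k, hits / len(relevant)

def ndcg_at_k(recommended, relevant, k):
    """Binary-relevance NDCG@K: each relevant item contributes gain 1,
    discounted by log2 of its position."""
    dcg = sum(1 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k])
              if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg
```

NDCG rewards the same hits more when they appear earlier, which is exactly the ranking-quality signal Precision@K misses.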

Phase 2: Online A/B Test. Metrics to track:

  • Primary metric: Streaming hours per user per week — this is the North Star because it directly measures whether users are finding content they enjoy.
  • Secondary metrics: Click-through rate on recommendations, percentage of streaming from recommended vs searched content, content diversity consumed.
  • Guardrails: Subscription cancellation rate, customer support tickets, new content discovery rate (to prevent filter bubbles).

Phase 3: Long-term Impact. Short-term metrics can be misleading for recommendation systems. A clickbait-optimized algorithm might boost short-term CTR but reduce long-term satisfaction. Track:

  • 30-day and 90-day retention rates for the treatment group
  • User satisfaction surveys (Net Promoter Score)
  • Content diversity consumed over time (are users getting trapped in echo chambers?)

Key consideration: Recommendation systems have strong position bias (users click the first item regardless of quality). Use interleaving experiments or position-debiased evaluation to get accurate estimates. Also, new algorithms often suffer from cold-start issues with new users or new content — measure performance separately for these segments.

Case Study 5: Churn Prediction Model

Prompt: "Your subscription business has a 5% monthly churn rate. Design a churn prediction system and explain how you would use it to reduce churn."

💡 Model Answer:

I would approach this in four stages: definition, modeling, action, and measurement.

Stage 1: Define churn precisely. For a subscription business, churn is relatively clear (cancelled subscription), but I would still clarify: (1) Is it voluntary churn (user cancelled) or involuntary (payment failed)? These need different interventions. (2) What is the prediction window? I would predict churn 30 days ahead to give the retention team time to intervene. (3) What is the observation window? I would use the past 90 days of behavior as features.
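The observation/prediction window split translates directly into how training examples are built. A sketch with an illustrative field layout (real features would be far richer than an activity count):

```python
from datetime import date, timedelta

def make_example(cutoff, activity_dates, cancel_date=None,
                 obs_days=90, pred_days=30):
    """Build one (features, label) training example at `cutoff`.

    Features come from the 90-day observation window before the
    cutoff; the label is 1 if the user cancelled within the 30-day
    prediction window after it. Field layout is illustrative.
    """
    obs_start = cutoff - timedelta(days=obs_days)
    obs_activity = [d for d in activity_dates if obs_start <= d < cutoff]
    churned = (cancel_date is not None
               and cutoff <= cancel_date < cutoff + timedelta(days=pred_days))
    features = {"active_days": len(set(obs_activity))}
    return features, int(churned)
```

Keeping the cutoff strictly between the feature window and the label window is what prevents label leakage: no post-cutoff behavior may ever appear in the features.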

Stage 2: Feature engineering and modeling.

  • Behavioral features: Login frequency trend (declining?), feature usage breadth and depth, session duration trend, days since last activity, support ticket frequency
  • Engagement features: Content consumption rate, social interactions (if applicable), notification response rate
  • Account features: Subscription plan, tenure, payment method, billing issues history
  • Derived features: Week-over-week change in activity (slope), ratio of current month activity to first month (engagement decay)

Model choice: I would start with gradient boosting (XGBoost/LightGBM) because it handles mixed feature types well, provides feature importance, and performs strongly on tabular data. Use 5% churn rate as the baseline — any model must beat this. Handle class imbalance with SMOTE or class weights, not undersampling (which discards data). Evaluate with precision-recall AUC rather than ROC-AUC because the class imbalance makes ROC-AUC misleadingly optimistic.
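A minimal end-to-end sketch of the imbalance handling and evaluation choice, using scikit-learn with a logistic regression stand-in (XGBoost/LightGBM would slot in the same way) and synthetic data with roughly 5% positives:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import average_precision_score
from sklearn.model_selection import train_test_split

# Synthetic data with ~5% positives to mimic the churn rate.
X, y = make_classification(n_samples=5000, n_features=20,
                           n_informative=8, weights=[0.95],
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y,
                                          random_state=0)

# class_weight="balanced" reweights the minority class instead of
# discarding majority rows (the objection to undersampling).
model = LogisticRegression(class_weight="balanced", max_iter=1000)
model.fit(X_tr, y_tr)

scores = model.predict_proba(X_te)[:, 1]
pr_auc = average_precision_score(y_te, scores)
baseline = y_te.mean()  # PR-AUC of a random ranker ~= positive rate
```

The comparison against `baseline` makes the "must beat 5%" criterion concrete: a random ranker's average precision is approximately the positive rate, so any useful model must clear it by a wide margin.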

Stage 3: Actionable interventions. A prediction model is worthless without action. Rank users by churn probability and design tiered interventions:

  • High risk (>70% churn probability): Personal outreach from customer success, custom retention offer (discount, extended trial of premium features)
  • Medium risk (40-70%): Targeted email campaigns, in-app prompts highlighting underused features, re-engagement notifications
  • Low risk (<40%): Standard lifecycle marketing, monitor for risk escalation
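The tiering logic above is a simple threshold map. A sketch, noting that in practice the cutoffs would be tuned to retention-team capacity and offer cost rather than fixed at 0.40 and 0.70:

```python
def intervention_tier(churn_prob):
    """Map a predicted churn probability to an intervention tier.

    Cutoffs mirror the tiers described above; they are illustrative,
    not tuned values.
    """
    if churn_prob > 0.70:
        return "high: personal outreach + retention offer"
    if churn_prob >= 0.40:
        return "medium: targeted email / in-app prompts"
    return "low: standard lifecycle marketing"
```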

Stage 4: Measure impact. The intervention itself must be A/B tested. Randomly hold out 10% of high-risk users as a control group (no intervention). Measure: (1) churn rate reduction in treatment vs control, (2) cost of intervention vs revenue saved (ROI), (3) long-term retention — did the intervention just delay churn or truly prevent it? Track false positive rate to avoid annoying happy customers with unnecessary retention offers.
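The ROI calculation from the holdout comparison can be sketched directly; `value_saved_per_retained` is an assumed LTV figure, not something the model produces:

```python
def intervention_roi(control_churn, treatment_churn, n_treated,
                     cost_per_user, value_saved_per_retained):
    """Net ROI of the retention intervention vs the holdout control.

    `value_saved_per_retained` is the assumed LTV preserved per user
    who would have churned but was retained.
    """
    users_saved = (control_churn - treatment_churn) * n_treated
    revenue_saved = users_saved * value_saved_per_retained
    cost = n_treated * cost_per_user
    return revenue_saved - cost
```

For example, cutting churn from 30% to 25% among 1,000 treated high-risk users at $2 per intervention, with $100 of LTV saved per retained user, nets about $3,000. A negative result here means the offers cost more than the revenue they protect, regardless of how accurate the model is.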

Key insight: The most impactful churn predictors are usually simple: declining login frequency and approaching subscription renewal date. Fancy features add marginal value. Focus engineering effort on the intervention system, not squeezing the last 0.5% of model accuracy.

Important: In case study interviews, the process matters more than the "right" answer. Interviewers want to see structured thinking, the ability to ask good clarifying questions, awareness of tradeoffs, and practical recommendations. Practice talking through your thought process out loud — this is where most candidates struggle.