AI Product Sense Questions
Product sense is the most heavily weighted competency in AI PM interviews. These 10 questions test your ability to identify where AI adds value, design features that handle uncertainty, and think through the full user experience of AI-powered products.
Q1: How would you design an AI-powered feature for a food delivery app?
I would start by identifying the user pain point. The biggest frustration in food delivery is inaccurate delivery time estimates, which leave users either waiting impatiently or missing the delivery when it arrives.
Proposed feature: Intelligent Delivery Time Prediction
- User problem: Current estimates are static and often wrong by 10–20 minutes, leading to poor user experience and increased support tickets.
- AI approach: Use a model that considers real-time factors: restaurant preparation speed (historical per restaurant, per dish), current order volume, driver availability, traffic conditions, weather, and time of day.
- UX design: Show a confidence range ("Arriving in 25–35 minutes") rather than a single number. Update the estimate in real time as conditions change. Send proactive notifications if the estimate shifts significantly.
- Handling errors: When the model is uncertain (new restaurant, unusual conditions), widen the range and default to conservative estimates. Never promise a time the model has low confidence in.
- Success metrics: Reduce mean absolute error of delivery time predictions by 30%. Decrease "Where is my order?" support contacts by 25%. Improve order completion rates.
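The uncertainty-handling behavior above can be sketched as a simple display rule: widen the range and lean conservative when the model's uncertainty is high. This is a minimal illustration with hypothetical names and thresholds, not a real system's logic; a production system would derive quantiles from the model itself.

```python
# Sketch: turn a point ETA prediction plus an uncertainty estimate into a
# display range, widening and skewing conservative when uncertainty is high.
# All names and thresholds here are illustrative.

def eta_display_range(predicted_minutes: float, std_minutes: float) -> tuple[int, int]:
    """Return (low, high) minutes to show the user."""
    if std_minutes > 8:  # low confidence: widen, and overshoot the late side
        low = predicted_minutes - std_minutes
        high = predicted_minutes + 2 * std_minutes
    else:                # normal confidence: symmetric range
        low = predicted_minutes - std_minutes
        high = predicted_minutes + std_minutes
    return max(1, round(low)), round(high)

print(eta_display_range(30, 5))   # confident: (25, 35)
print(eta_display_range(30, 10))  # uncertain: (20, 50)
```

Note the asymmetry: being early by 5 minutes is a pleasant surprise, while being late by 5 minutes is a broken promise, so the low-confidence branch pads the late side more.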
Why this works in an interview: It demonstrates structured thinking (problem, solution, UX, errors, metrics), AI-specific considerations (confidence intervals, handling uncertainty), and business impact.
Q2: A stakeholder wants to add an AI chatbot to your product. How do you evaluate this idea?
I would not immediately say yes or no. Instead, I would run this through a structured evaluation framework:
1. Problem validation: What specific user problem does the chatbot solve? Are users struggling with our existing support channels? What does our support ticket data show about the types of questions users ask? If 80% of tickets are simple FAQs, a chatbot makes sense. If most are complex complaints requiring empathy, it does not.
2. AI suitability: Is AI the right solution? Could we solve this with a better FAQ page, improved search, or structured help flows? AI chatbots add value when queries are diverse enough that static content cannot cover them, but predictable enough that the model can answer reliably.
3. Risk assessment: What happens when the chatbot gives a wrong answer? In healthcare: life-threatening. In e-commerce: annoying but recoverable. The risk profile determines how conservative the chatbot needs to be and whether human escalation is mandatory.
4. Data readiness: Do we have enough historical support conversations to train or evaluate the chatbot? What is our data quality? Without good data, even the best LLM will struggle to answer domain-specific questions accurately.
5. Build vs buy: Should we build a custom chatbot or integrate an existing solution (Intercom, Zendesk AI, custom GPT)? What is the total cost including maintenance, monitoring, and continuous improvement?
6. Success criteria: Define before building: What percentage of queries must the chatbot resolve without human handoff? What is the acceptable false answer rate? What is the ROI timeline?
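The problem-validation step above can be made concrete with a quick analysis of labeled support tickets: what share are simple FAQs? This sketch assumes tickets already carry a category label; the category names and the 80% threshold are illustrative.

```python
from collections import Counter

# Sketch: estimate chatbot fit by measuring what share of support tickets
# are simple, repeatable FAQs. Labels and threshold are illustrative.

tickets = (["password_reset"] * 40 + ["shipping_status"] * 30 +
           ["refund_faq"] * 15 + ["complex_complaint"] * 15)

FAQ_CATEGORIES = {"password_reset", "shipping_status", "refund_faq"}

counts = Counter(tickets)
faq_share = sum(counts[c] for c in FAQ_CATEGORIES) / len(tickets)
print(f"FAQ share: {faq_share:.0%}")  # FAQ share: 85%

recommend_chatbot = faq_share >= 0.8
print("Chatbot likely worthwhile" if recommend_chatbot
      else "Revisit: tickets too complex for a chatbot")
```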
Q3: How would you conduct user research for an AI-powered product?
User research for AI products is different from traditional products because users often cannot articulate what they want from AI, and their mental models of how AI works are frequently wrong.
Phase 1 — Discovery:
- Interview users about their current workflow and pain points, not about AI. Do not ask "Would you use an AI feature that does X?" because users will say yes to anything that sounds helpful.
- Observe users performing tasks manually to identify where AI could reduce effort, errors, or time. Look for repetitive pattern-matching tasks.
- Analyze support tickets, error logs, and drop-off points to find where users struggle most.
Phase 2 — Prototype testing:
- Use Wizard of Oz testing: simulate AI behavior with humans behind the scenes to test the UX before building the model. This avoids wasting months on ML that users do not want.
- Test error scenarios explicitly: show users what happens when the AI is wrong. Do they lose trust? Can they recover? This is unique to AI research.
- Measure mental model accuracy: Do users understand what the AI can and cannot do? Mismatched expectations lead to frustration even when the model performs well.
Phase 3 — Post-launch:
- Monitor override rates: How often do users reject AI suggestions? This is the most honest signal of trust and quality.
- Track confidence calibration: When users trust the AI, is it actually right? When they do not trust it, is it actually wrong? Misalignment here is dangerous.
- Run diary studies: Have users log their experience with the AI feature over 2–4 weeks. Trust with AI features builds (or erodes) over time, not in a single session.
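The confidence-calibration check in Phase 3 can be quantified by crossing whether users accepted an AI suggestion with whether it was actually correct. This is a minimal sketch; the event fields are assumed, and a real analysis would segment by user and suggestion type.

```python
# Sketch: quantify trust calibration by crossing user acceptance with model
# correctness. Event structure is illustrative.

events = [
    {"accepted": True,  "correct": True},   # justified trust
    {"accepted": True,  "correct": False},  # misplaced trust (dangerous)
    {"accepted": False, "correct": True},   # misplaced distrust
    {"accepted": False, "correct": False},  # justified distrust
    {"accepted": True,  "correct": True},
]

def calibration_report(events):
    total = len(events)
    aligned = sum(e["accepted"] == e["correct"] for e in events)
    misplaced_trust = sum(e["accepted"] and not e["correct"] for e in events)
    return {"alignment_rate": aligned / total,
            "misplaced_trust_rate": misplaced_trust / total}

print(calibration_report(events))
# {'alignment_rate': 0.6, 'misplaced_trust_rate': 0.2}
```

A high misplaced-trust rate is the dangerous quadrant the section calls out: users are confidently accepting wrong output.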
Q4: Design an AI feature for Gmail that helps users write better emails.
User segments and pain points:
- Professionals: Spend 2.5 hours/day on email. Pain: writing takes too long, and tone is hard to get right for different audiences.
- Non-native speakers: Worry about grammar and formality. Pain: fear of sounding unprofessional.
- Managers: Send many similar emails (feedback, follow-ups). Pain: repetitive writing tasks.
Feature: Smart Compose + Tone Advisor
- Real-time tone analysis: As you type, a subtle indicator shows the detected tone (formal, casual, urgent, apologetic). Users can adjust with a slider if the tone does not match their intent.
- Context-aware suggestions: Based on the recipient (your manager vs a peer vs an external client), suggest appropriate greetings, sign-offs, and formality level. Use prior email history with that recipient.
- Reply suggestions: For common email patterns (meeting requests, follow-ups, approvals), offer full draft replies that match the user's writing style, learned from their sent emails.
Privacy considerations: All analysis runs on-device where possible. Users can opt out entirely. No email content is used for model training without explicit consent. Enterprise admins can control feature availability.
Handling AI errors: Suggestions are never auto-applied. The original text is always preserved. If the user consistently ignores a suggestion type, reduce its frequency. Show a "not helpful" button on every suggestion for feedback.
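The "reduce its frequency" behavior above can be sketched as a multiplicative back-off on a suggestion type's show probability. The constants and function names are illustrative, not Gmail's actual mechanism.

```python
# Sketch: back off a suggestion type the user keeps ignoring, with slow
# recovery when one is accepted. All constants are illustrative.

DECAY = 0.8      # shrink show probability after each ignored suggestion
RECOVERY = 1.1   # grow it back slowly on acceptance
FLOOR, CEIL = 0.05, 1.0

def update_show_probability(p: float, accepted: bool) -> float:
    p = p * RECOVERY if accepted else p * DECAY
    return min(CEIL, max(FLOOR, p))

p = 1.0
for accepted in [False, False, False, True]:
    p = update_show_probability(p, accepted)
print(round(p, 3))  # 0.563
```

The floor keeps the suggestion type alive at low frequency so the system can still detect if the user's preferences change.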
Q5: How would you prioritize AI features on a product roadmap when model performance is uncertain?
Traditional roadmap prioritization uses frameworks like RICE (Reach, Impact, Confidence, Effort). For AI features, I modify this to account for uncertainty:
AI-RICE Framework:
- Reach: Same as traditional — how many users does this affect?
- Impact: What is the impact if the model works at target accuracy? And what is the impact if it works at 70% of target? AI features often deliver partial value even at lower accuracy.
- Confidence: This is the critical change. Split into (a) confidence that the problem matters to users, and (b) confidence that the ML model can achieve the required performance. These are independent.
- Effort: Include data collection, labeling, model training, evaluation infrastructure, monitoring, and ongoing maintenance — not just the initial build.
Additional prioritization principles for AI:
- Start with high-precision, low-recall: Ship features that are correct when they fire, even if they do not fire for all users. It is better to delight 30% of users than to annoy 100%.
- Invest in data infrastructure early: Features that generate high-quality training data for future features should be prioritized even if their standalone value is moderate.
- Plan for iteration: Unlike traditional features, AI features improve with more data. Build v1 to collect signal, v2 to improve accuracy, v3 to expand coverage.
- De-risk with prototypes: Before committing to a 6-month roadmap item, spend 2 weeks on a proof-of-concept to validate model feasibility.
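The AI-RICE framework above can be sketched as a scoring function in which the two confidence factors multiply, since both must hold for the feature to pay off. The weights, formula, and example numbers are illustrative choices, not a standard.

```python
# Sketch: AI-RICE scoring with confidence split into problem confidence and
# model-feasibility confidence, as described above. Formula is illustrative.

def ai_rice(reach: int, impact: float,
            problem_confidence: float, model_confidence: float,
            effort_person_months: float) -> float:
    confidence = problem_confidence * model_confidence  # independent factors
    return reach * impact * confidence / effort_person_months

features = {
    "smart_eta":    ai_rice(100_000, 2.0, 0.9, 0.7, 4),
    "auto_tagging": ai_rice(40_000, 1.5, 0.8, 0.9, 2),
}
for name, score in sorted(features.items(), key=lambda kv: -kv[1]):
    print(name, round(score))
# smart_eta 31500
# auto_tagging 21600
```

Multiplying the confidences captures the key point: a feature with a certain user problem but only 50% model feasibility should rank well below one where both are high.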
Q6: Your AI recommendation system has high engagement but users complain it shows them "more of the same." What do you do?
This is the classic exploration-exploitation trade-off, and it is one of the hardest product problems in AI.
Diagnosis first:
- Analyze the diversity metrics of recommendations. Are we showing items from a narrow set of categories? Is the system stuck in a filter bubble?
- Segment users: Are power users satisfied while casual users complain? Or is the problem universal?
- Check if engagement metrics are masking dissatisfaction: Users might click because we show them familiar content, but their long-term retention might be declining.
Solutions:
- Inject diversity: Reserve 10–20% of recommendation slots for content outside the user's typical preferences. Measure whether these "exploration" items get engagement over time.
- Add user controls: Let users say "Show me something different" or adjust diversity preferences. This gives users agency and provides explicit signal.
- Track long-term metrics: Add 30-day retention and category breadth as success metrics alongside click-through rate. Optimize for user lifetime value, not session engagement.
- A/B test diversity levels: Test 10%, 20%, and 30% exploration rates. Measure short-term engagement loss against long-term retention gains.
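The diversity-injection idea above can be sketched as a slot-blending function that reserves a fixed share of the feed for exploration items. The names and the 20% rate are illustrative.

```python
import random

# Sketch: reserve a share of recommendation slots for items outside the
# user's typical preferences. Names and the 20% rate are illustrative.

def blend_recommendations(exploit, explore, n_slots=10, explore_rate=0.2, seed=0):
    rng = random.Random(seed)
    n_explore = int(n_slots * explore_rate)
    slots = exploit[: n_slots - n_explore] + rng.sample(explore, n_explore)
    rng.shuffle(slots)  # avoid banishing exploration items to the bottom
    return slots

exploit = [f"familiar_{i}" for i in range(10)]
explore = [f"new_{i}" for i in range(10)]
feed = blend_recommendations(exploit, explore)
print(sum(item.startswith("new_") for item in feed))  # 2 exploration slots
```

Tagging which slots were exploration items is what makes the A/B tests above measurable: you can compare engagement on exploration vs exploitation slots over time.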
Key insight: Optimizing for clicks is easy. Optimizing for user satisfaction requires measuring what users want long-term, not just what they click right now.
Q7: How would you design the user experience for an AI feature that is occasionally wrong?
Every AI feature is occasionally wrong. The product design must account for this from day one. Here is my framework:
1. Set expectations: Never position AI as infallible. Use language like "suggestions" and "might be" rather than "results" and "is." Show confidence scores when appropriate ("85% match").
2. Make errors recoverable: Users should be able to easily correct or dismiss AI output. The undo path should be faster than the AI action. Example: Gmail's Smart Reply shows 3 options but never auto-sends.
3. Calibrate user trust: Show AI suggestions alongside traditional UI so users can compare. As trust builds, give users the option to increase automation (e.g., auto-categorize emails after manually confirming 50 times).
4. Design for the error case first: What happens when the AI is wrong? If the cost of an error is high (medical diagnosis, financial transaction), add human confirmation. If the cost is low (music recommendation), let users skip easily.
5. Collect feedback implicitly: Track when users override AI suggestions. Use this signal to improve the model without requiring users to explicitly rate suggestions, which most users never do.
6. Degrade gracefully: When model confidence is low, fall back to non-AI behavior rather than showing a bad prediction. Users prefer no suggestion to a wrong suggestion.
Q8: You are PM for a search product. How would you evaluate adding an AI-generated summary at the top of search results?
Potential value: AI summaries can answer simple factual queries instantly, sharply reducing time-to-answer. For complex queries, they provide a starting point that helps users decide which results to click.
Risks to evaluate:
- Accuracy: Hallucinated facts in search summaries directly erode trust in the entire product. One viral wrong answer can damage brand reputation.
- Traffic cannibalization: If summaries answer the query, users do not click results. This hurts publishers, advertisers, and the content ecosystem that feeds search quality.
- Query coverage: Summaries work well for factual queries but poorly for subjective, time-sensitive, or nuanced topics. Showing a confident-looking summary for a topic with no clear answer is misleading.
- Liability: Medical, legal, and financial queries have legal implications. A wrong summary about drug interactions or legal rights creates serious liability.
My recommendation: Launch in phases. Start with factual, low-risk queries (definitions, how-to instructions) where accuracy can be verified. Exclude sensitive categories (health, legal, financial). Add clear attribution to sources. A/B test impact on user satisfaction, not just engagement, because users might click less but be more satisfied.
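The phased launch above can be sketched as a category gate: summaries appear only for launched low-risk categories, and sensitive categories are blocked outright. The query categorization is assumed to come from an upstream classifier; the category names are illustrative.

```python
# Sketch: phased rollout gate for AI summaries. Only launched low-risk
# categories show a summary; sensitive ones are always blocked.
# Category names are illustrative.

LAUNCHED_CATEGORIES = {"definition", "how_to"}
BLOCKED_CATEGORIES = {"health", "legal", "financial"}

def show_ai_summary(query_category: str) -> bool:
    if query_category in BLOCKED_CATEGORIES:
        return False
    return query_category in LAUNCHED_CATEGORIES

print(show_ai_summary("definition"))  # True
print(show_ai_summary("health"))      # False
print(show_ai_summary("opinion"))     # False (not yet launched)
```

Keeping the blocklist separate from the allowlist matters: a new category defaults to "no summary" rather than accidentally shipping into a sensitive area.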
Q9: How do you decide between building an AI feature in-house vs using a third-party API?
This is a critical strategic decision that I evaluate across six dimensions:
| Factor | Build In-House | Use Third-Party API |
|---|---|---|
| Competitive advantage | The AI capability is core to your product differentiation | The AI is a commodity (OCR, basic sentiment, transcription) |
| Data sensitivity | User data cannot leave your infrastructure (healthcare, finance) | Data is non-sensitive and API provider's terms are acceptable |
| Customization needs | You need domain-specific models trained on your data | General-purpose models work well enough for your use case |
| Latency requirements | You need sub-50ms predictions at scale | 200–500ms API round-trips are acceptable |
| Team capability | You have ML engineers who can build and maintain models | No ML team, or ML team should focus on core competencies |
| Time to market | You can afford 3–6 months to build | You need the feature shipped in 2–4 weeks |
My default approach: Start with a third-party API to validate the product hypothesis. If the feature proves valuable and the API limitations become a bottleneck, plan migration to in-house. This lets you learn what users actually need before investing in custom ML infrastructure.
Q10: How would you launch an AI product in a market where users are skeptical of AI?
User skepticism of AI is healthy and should be respected, not dismissed. My launch strategy would focus on earning trust incrementally:
1. Lead with the user benefit, not the technology: Do not market it as "AI-powered." Market it as "faster," "more accurate," or "personalized." Users care about outcomes, not algorithms. Grammarly succeeded by saying "Write with confidence," not "AI language model corrects your grammar."
2. Make AI opt-in initially: Let users choose to try the AI feature rather than forcing it on everyone. Early adopters will generate positive word-of-mouth. Show side-by-side comparisons: "Here's the result with AI assistance vs without."
3. Be transparent about limitations: Proactively communicate what the AI can and cannot do. Users who encounter known limitations feel informed. Users who encounter unknown limitations feel deceived.
4. Provide human fallbacks: In high-stakes domains, always offer a path to human assistance. This acts as a safety net that makes users willing to try the AI path. Over time, as trust builds, fewer users need the fallback.
5. Measure trust, not just usage: Track user trust surveys, override rates, and opt-out rates alongside traditional engagement metrics. A feature can have high usage but low trust (users feel forced to use it), which is unsustainable.
Lilly Tech Systems