Transparency & Explainability
As AI systems make increasingly consequential decisions — in healthcare, finance, criminal justice, and hiring — the ability to explain those decisions is both an ethical imperative and a regulatory requirement. These 10 questions cover the technical tools, legal landscape, and practical trade-offs of AI transparency.
Q1: What is the difference between interpretability and explainability?
Model Answer: These terms are often used interchangeably, but there is an important distinction.
Interpretability means the model itself is inherently understandable. A linear regression, decision tree, or rule-based system is interpretable because a human can directly examine the model's structure and understand how inputs map to outputs. No additional tool is needed.
Explainability means we can provide post-hoc explanations for a model's behavior, even if the model itself is a black box. A deep neural network is not interpretable, but we can use SHAP values, LIME, or attention visualization to explain individual predictions.
Key insight: Interpretability is a property of the model. Explainability is a property of the system (model plus explanation tooling). An interpretable model is always explainable, but an explainable model is not necessarily interpretable. Post-hoc explanations can be approximate, misleading, or unfaithful to the model's actual reasoning — this is a critical limitation that interviewers want you to acknowledge.
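The contrast can be made concrete with a toy scoring rule. In the sketch below (feature names, weights, and bias are all hypothetical), the per-feature contributions can be read directly off the model's structure, with no explanation tooling at all; that is what interpretability means, and what a black box cannot offer without post-hoc methods.

```python
# A minimal interpretable model: the model IS the explanation.
# Feature names, weights, and bias are hypothetical.

WEIGHTS = {"debt_to_income": -2.0, "years_employed": 0.5, "missed_payments": -1.5}
BIAS = 1.0

def score(applicant: dict) -> float:
    """Linear scoring rule: each feature's contribution is just
    weight * value, readable straight from the model's structure."""
    return BIAS + sum(WEIGHTS[f] * applicant[f] for f in WEIGHTS)

def explain(applicant: dict) -> dict:
    """No post-hoc tooling needed: exact per-feature contributions."""
    return {f: WEIGHTS[f] * applicant[f] for f in WEIGHTS}

applicant = {"debt_to_income": 0.6, "years_employed": 4, "missed_payments": 2}
print(score(applicant))    # 1.0 - 1.2 + 2.0 - 3.0
print(explain(applicant))
```

For a neural network there is no such direct read-off, which is precisely why post-hoc tools like SHAP and LIME exist.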
Q2: Compare SHAP and LIME. When would you use each?
Model Answer:
LIME (Local Interpretable Model-agnostic Explanations) explains individual predictions by perturbing the input, observing how predictions change, and fitting a simple interpretable model (usually linear) around that local region. It is fast, model-agnostic, and intuitive. Limitations: explanations can be unstable (different perturbation samples give different explanations), and the local model may not faithfully represent the true decision boundary.
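The perturb-weight-fit loop is simple enough to sketch in a few lines. This is a one-dimensional toy, not the LIME library: the black-box function and kernel width are arbitrary choices, and the "explanation" is just the slope of a proximity-weighted linear surrogate fitted around the query point.

```python
import math
import random

def black_box(x: float) -> float:
    """Stand-in for an opaque model: nonlinear in x."""
    return math.tanh(3 * x) + 0.2 * x * x

def lime_1d(f, x0, n_samples=500, kernel_width=0.1, seed=0):
    """LIME-style local surrogate: perturb, weight by proximity,
    fit a weighted least-squares line y ~ a + b*x around x0."""
    rng = random.Random(seed)
    xs = [x0 + rng.gauss(0, kernel_width) for _ in range(n_samples)]
    ys = [f(x) for x in xs]
    # proximity kernel: closer perturbations count more
    ws = [math.exp(-((x - x0) ** 2) / kernel_width ** 2) for x in xs]
    sw = sum(ws)
    mx = sum(w * x for w, x in zip(ws, xs)) / sw
    my = sum(w * y for w, y in zip(ws, ys)) / sw
    num = sum(w * (x - mx) * (y - my) for w, x, y in zip(ws, xs, ys))
    den = sum(w * (x - mx) ** 2 for w, x in zip(ws, xs))
    b = num / den
    a = my - b * mx
    return a, b  # local intercept and slope (the "explanation")

a, b = lime_1d(black_box, x0=0.0)
print(f"local slope near x=0: {b:.2f}")  # approximates the true derivative, 3.0
```

Note how the instability limitation shows up directly: change the seed or kernel width and the slope changes, which is exactly the behavior the answer above warns about.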
SHAP (SHapley Additive exPlanations) uses game theory (Shapley values) to assign each feature a contribution to the prediction. It provides both local and global explanations. SHAP has stronger theoretical guarantees: it satisfies local accuracy, missingness, and consistency properties. Limitations: exact SHAP computation is exponential in the number of features; practical implementations (TreeSHAP, DeepSHAP, KernelSHAP) make approximations. It can be slower than LIME for complex models.
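For interviews it helps to have seen the exact computation once. The sketch below enumerates every feature coalition and applies the Shapley weighting, using a toy two-feature model and a zero baseline (both hypothetical); the O(2^n) cost of this loop is exactly why TreeSHAP and KernelSHAP approximate.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values by coalition enumeration.
    Features absent from a coalition take their baseline values."""
    features = list(x)
    n = len(features)
    phi = {f: 0.0 for f in features}
    for f in features:
        others = [g for g in features if g != f]
        for k in range(n):
            for coalition in combinations(others, k):
                # Shapley weight for a coalition of size k
                weight = factorial(k) * factorial(n - k - 1) / factorial(n)
                with_f = {g: x[g] if (g in coalition or g == f) else baseline[g]
                          for g in features}
                without_f = {g: x[g] if g in coalition else baseline[g]
                             for g in features}
                phi[f] += weight * (predict(with_f) - predict(without_f))
    return phi

# toy model with an interaction term, so contributions are non-obvious
model = lambda z: 2 * z["a"] + 3 * z["b"] + z["a"] * z["b"]
x, base = {"a": 1.0, "b": 1.0}, {"a": 0.0, "b": 0.0}
phi = shapley_values(model, x, base)
# local accuracy: contributions sum to f(x) - f(baseline)
print(phi, sum(phi.values()), model(x) - model(base))
```

The interaction term a*b is split evenly between the two features (0.5 each), which is the kind of principled credit assignment LIME's local linear fit does not guarantee.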
When to use LIME: Quick, local explanations for individual predictions. Real-time explanations in production. When you need speed over theoretical rigor. When explaining to non-technical stakeholders who need simple "this feature pushed the prediction up" narratives.
When to use SHAP: When you need theoretically grounded explanations. Global feature importance across the dataset. Regulatory compliance where mathematical rigor matters. When you need consistent, reproducible explanations. Debugging model behavior systematically.
Q3: Does GDPR require explainability? What is the "right to explanation"?
Model Answer: This is a nuanced topic and a common interview question. The GDPR does not explicitly use the phrase "right to explanation," but several articles create de facto explainability requirements.
Article 22 gives individuals the right not to be subject to decisions based solely on automated processing that significantly affect them. When automated decisions are made, Articles 13 and 14 require providing "meaningful information about the logic involved, as well as the significance and the envisaged consequences" of the processing.
What this means in practice: If your model denies someone a loan, you must be able to explain why in terms the person can understand. "The model assigned a score of 0.3" is not sufficient. "Your application was declined primarily because your debt-to-income ratio exceeds our threshold, and you have had two missed payments in the last 12 months" is meaningful.
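Turning raw contributions into that kind of "meaningful information" is usually a templating problem. The sketch below shows one way it might look; the feature names, wording, and selection logic are all hypothetical, and real reason codes would need legal and UX review.

```python
# Hypothetical reason-code templates mapping model features to plain language.
REASON_TEMPLATES = {
    "debt_to_income": "your debt-to-income ratio of {value:.0%} exceeds our threshold",
    "missed_payments": "you have had {value:.0f} missed payment(s) in the last 12 months",
}

def explain_denial(contributions: dict, values: dict, top_n: int = 2) -> str:
    """contributions: feature -> signed impact on the score (negative = hurt).
    Picks the most damaging features and renders them in plain language."""
    worst = sorted(contributions, key=contributions.get)[:top_n]
    reasons = [REASON_TEMPLATES[f].format(value=values[f]) for f in worst]
    return ("Your application was declined primarily because "
            + " and ".join(reasons) + ".")

msg = explain_denial(
    contributions={"debt_to_income": -1.2, "missed_payments": -0.9},
    values={"debt_to_income": 0.45, "missed_payments": 2},
)
print(msg)
```

The key design point: the template layer is where "score of 0.3" becomes a sentence a person can act on, and it should be versioned and reviewed like any other user-facing copy.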
The debate: Legal scholars disagree on whether GDPR creates a true "right to explanation" or merely a "right to be informed." But from a practical engineering standpoint, the distinction does not matter much. If you build an AI system that makes consequential decisions about EU citizens, you need to be able to explain those decisions. Design for explainability from the start rather than trying to add it later.
The EU AI Act goes further, explicitly requiring transparency for high-risk AI systems, including documentation of training data, model architecture, and performance metrics across demographic groups.
Q4: When is it acceptable to use a black-box model?
Model Answer: The acceptability of black-box models depends on the stakes, the domain, and the availability of recourse.
Acceptable: Content recommendations (low-stakes, easy to override), image search, spam filtering, predictive text, and entertainment applications. When the user can easily dismiss or override the decision. When the performance gain over interpretable models is significant and the consequences of errors are minimal.
Questionable: Credit scoring, insurance pricing, hiring screening. These are high-stakes decisions, but if you have robust post-hoc explainability (SHAP, LIME), audit trails, and appeals processes, black-box models can be used with appropriate safeguards.
Unacceptable: Criminal sentencing, medical diagnosis without physician oversight, child welfare decisions, and any decision where the person cannot appeal or the consequences are irreversible. In these domains, either use interpretable models or ensure a human decision-maker reviews and can override the model's recommendation.
The practical test: Ask three questions: (1) Can the affected person appeal the decision? (2) Can we explain why the decision was made? (3) Is the consequence reversible? If any answer is "no," push hard for interpretable models or mandatory human oversight.
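The three-question test reduces to a simple gate, sketched below purely as an illustration of the decision rule (the function and its flags are hypothetical, not a real governance framework).

```python
def black_box_acceptable(can_appeal: bool, can_explain: bool,
                         consequence_reversible: bool) -> bool:
    """The practical test: if ANY answer is 'no', push for an
    interpretable model or mandatory human oversight instead."""
    return can_appeal and can_explain and consequence_reversible

# spam filtering: user can override, behavior is explainable, errors reversible
print(black_box_acceptable(True, True, True))    # True
# criminal sentencing: consequences are not reversible
print(black_box_acceptable(True, True, False))   # False
```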
Q5: How would you explain a machine learning model's decision to a patient in healthcare?
Model Answer: Healthcare requires a layered explanation approach because the audience ranges from patients with no technical background to clinicians who want clinical detail.
For the patient: Use natural language and analogies. "The system analyzed your symptoms, lab results, and medical history. The three factors that most influenced the recommendation were your blood pressure trend over the last 6 months, your family history of heart disease, and your recent cholesterol levels. This is a recommendation for your doctor to review, not a final diagnosis." Avoid probability scores — "the model says 73% chance" can be anxiety-inducing and easily misinterpreted.
For the clinician: Provide feature importance rankings, confidence intervals, and similar patient comparisons. "Among patients with similar profiles (age 55-65, male, hypertensive, elevated LDL), 68% were diagnosed with coronary artery disease. The top contributing features were: systolic BP trend (+12% contribution), LDL cholesterol (+9%), family history (+7%). Note: the model has lower confidence for patients with comorbid diabetes, which applies here."
Critical safeguard: Always frame AI outputs as decision support, never as decisions. The model assists the physician; it does not replace clinical judgment. Make this explicit in the interface and the explanation.
Q6: What are the risks of post-hoc explanations?
Model Answer: Post-hoc explanations can create a false sense of understanding. Key risks include:
Unfaithfulness: The explanation may not accurately reflect the model's actual reasoning. LIME approximates the decision boundary locally with a linear model, which may miss nonlinear interactions that actually drove the prediction. You could get an explanation that sounds reasonable but is wrong.
Confirmation bias: Users tend to accept explanations that confirm their intuition and reject those that do not, regardless of the explanation's accuracy. If the model says "denied because of income" and the loan officer already suspected that, they accept it without scrutiny.
Gaming: Once users know what features the explanation highlights, they can game the system. If a credit model explains "your score was lowered because you have too many credit inquiries," people learn to avoid inquiries rather than addressing underlying creditworthiness.
Oversimplification: Complex models make decisions based on subtle feature interactions. Reducing this to "top 3 features" hides the complexity. A model might deny a loan based on 15 interacting features, but showing only the top 3 gives an incomplete and potentially misleading picture.
Liability: If an explanation is wrong and a user makes a decision based on it, who is liable? This is an emerging legal question with no clear answer yet.
Q7: What are model cards and why are they important?
Model Answer: Model cards, proposed by Mitchell et al. at Google in 2019, are standardized documentation for ML models. They serve as "nutrition labels" for AI systems.
A model card includes: (1) Model details — architecture, training procedure, version, date. (2) Intended use — primary use cases and out-of-scope uses. (3) Factors — relevant demographic groups, instruments, and environments. (4) Metrics — performance metrics disaggregated by relevant factors. (5) Evaluation data — the dataset used for evaluation and its characteristics. (6) Training data — description of training data (without exposing sensitive data). (7) Ethical considerations — known risks, limitations, and potential harms. (8) Caveats and recommendations.
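The eight sections above map naturally onto a structured record. The sketch below uses a dataclass with illustrative field contents (the model name, metrics, and dataset descriptions are invented); real model cards, such as those on Hugging Face, are typically markdown or YAML, but the same schema applies.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    """Fields follow the eight sections of Mitchell et al. (2019)."""
    model_details: dict        # architecture, version, date
    intended_use: dict         # primary and out-of-scope uses
    factors: list              # demographic groups, environments
    metrics: dict              # performance disaggregated by factor
    evaluation_data: str
    training_data: str
    ethical_considerations: list = field(default_factory=list)
    caveats: list = field(default_factory=list)

# All values below are illustrative.
card = ModelCard(
    model_details={"name": "credit-risk-v3", "version": "3.1", "date": "2024-06-01"},
    intended_use={"primary": "pre-screening", "out_of_scope": "final lending decisions"},
    factors=["age_band", "region"],
    metrics={"auc": {"overall": 0.87, "age_18_25": 0.83}},
    evaluation_data="held-out 2023 applications",
    training_data="2019-2022 applications, PII removed",
    ethical_considerations=["lower AUC for thin-file applicants"],
    caveats=["not validated outside the original deployment market"],
)
print(card.metrics["auc"])
```

Making the card a typed artifact rather than a free-form document means CI can fail a deployment whose card is missing disaggregated metrics or an out-of-scope statement.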
Why they matter: Model cards create accountability and transparency. They force teams to think about fairness, limitations, and intended use before deployment. They help downstream users understand whether a model is appropriate for their use case. And they provide a paper trail for auditing and compliance.
In practice: Google, Hugging Face, and OpenAI publish model cards. Hugging Face has made model cards a community standard, with a dedicated field on every model page. The EU AI Act will likely make something similar mandatory for high-risk AI systems.
Q8: How do you balance transparency with intellectual property protection?
Model Answer: This is a genuine tension. Companies invest millions in developing proprietary models and do not want to reveal architecture details or training data that competitors could replicate. But stakeholders, regulators, and users demand transparency.
Layered transparency: Different stakeholders need different levels of detail. (1) Users need to understand what data is collected, how it affects decisions, and how to appeal. No model internals needed. (2) Regulators may need model architecture, training data summaries, and fairness metrics under NDA or in a secure environment. (3) External auditors can be given access to the model through an API for testing without seeing weights or code. (4) The public gets model cards, aggregate performance statistics, and impact assessments.
Techniques that enable transparency without IP exposure: Differential privacy on training data descriptions. SHAP values that explain individual predictions without revealing model weights. Aggregated fairness metrics that demonstrate compliance without exposing proprietary evaluation data. Secure multi-party computation for external auditing.
Key principle: You can be transparent about what a model does and how well it performs without being transparent about how it works internally. Most regulatory requirements and user needs can be satisfied at the behavioral level.
Q9: What is attention visualization and what are its limitations?
Model Answer: Attention visualization displays the attention weights in transformer models, showing which input tokens the model "attended to" when making a prediction. For example, in sentiment analysis, visualizing attention might show that the model focused on words like "terrible" and "disappointing" when predicting negative sentiment.
Limitations are significant: (1) Attention weights are not explanations. Jain and Wallace (2019) showed that attention weights often do not correlate with feature importance as measured by gradient-based methods. A token receiving high attention may not actually influence the output. (2) Multi-head attention distributes attention across many heads, each capturing different patterns. Looking at one head gives an incomplete picture. Aggregating across heads is also problematic. (3) Attention shows correlation, not causation. A model might attend to "not" and "good" separately, but the meaning comes from their interaction, which attention visualization does not capture. (4) Users tend to over-interpret attention maps, constructing narratives about model reasoning that may not be accurate.
Better alternatives for transformers: Integrated gradients, which measure how much each input feature changes the prediction relative to a baseline. Layer-wise relevance propagation (LRP). Or simply SHAP, which works for any model including transformers.
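Integrated gradients is worth being able to sketch. The version below approximates the path integral with a midpoint Riemann sum and numerical gradients on a toy differentiable "model" (the function, baseline, and step count are arbitrary choices for illustration); real implementations backpropagate through the actual network.

```python
import math

def model(x):
    """Toy differentiable stand-in for a network's scalar output."""
    return math.tanh(x[0] + 2 * x[1])

def grad(f, x, eps=1e-5):
    """Central-difference numerical gradient."""
    g = []
    for i in range(len(x)):
        xp, xm = list(x), list(x)
        xp[i] += eps
        xm[i] -= eps
        g.append((f(xp) - f(xm)) / (2 * eps))
    return g

def integrated_gradients(f, x, baseline, steps=100):
    """IG_i = (x_i - x'_i) * integral over alpha in [0,1] of
    df/dx_i at x' + alpha*(x - x'), approximated by a midpoint sum."""
    attrs = [0.0] * len(x)
    for s in range(1, steps + 1):
        alpha = (s - 0.5) / steps
        point = [b + alpha * (xi - b) for xi, b in zip(x, baseline)]
        g = grad(f, point)
        for i in range(len(x)):
            attrs[i] += (x[i] - baseline[i]) * g[i] / steps
    return attrs

x, base = [0.5, 0.25], [0.0, 0.0]
attrs = integrated_gradients(model, x, base)
# completeness: attributions sum to f(x) - f(baseline)
print(attrs, sum(attrs), model(x) - model(base))
```

The completeness property shown in the last line is the main advantage over raw attention weights: the attributions provably account for the full change in output relative to the baseline.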
Q10: How would you design an explainability system for a production ML pipeline?
Model Answer: I would design a multi-level system with different explanations for different audiences and use cases:
Real-time individual explanations: For each prediction, generate SHAP or LIME values showing the top contributing features. Cache explanations for common input patterns to reduce latency. Display in user-facing interfaces using natural language templates: "Your application was [approved/declined] primarily because [top 3 features in plain language]."
Batch global explanations: Run SHAP analysis on representative samples nightly. Generate global feature importance rankings, partial dependence plots, and interaction effects. Surface to data scientists and model owners via dashboards.
Audit trail: Store the model version, input features, prediction, confidence score, and explanation for every prediction. This is non-negotiable for regulated industries. Enable filtering by demographic group to support fairness audits.
Explanation quality monitoring: Track explanation stability — are similar inputs getting similar explanations? Monitor explanation coverage — what percentage of predictions can be meaningfully explained? Alert on explanation drift — if the top features change significantly over time, the model may have shifted.
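One possible stability metric, sketched under the assumption that attributions arrive as feature-to-score dicts: compare the top-k features of two similar inputs with Jaccard overlap. The feature names and the 0.6 alert threshold are invented for illustration.

```python
def top_k(attribution: dict, k: int = 3) -> set:
    """Top-k features by absolute attribution magnitude."""
    return set(sorted(attribution, key=lambda f: abs(attribution[f]),
                      reverse=True)[:k])

def explanation_stability(attr_a: dict, attr_b: dict, k: int = 3) -> float:
    """Jaccard overlap of top-k features: 1.0 = identical, 0.0 = disjoint."""
    a, b = top_k(attr_a, k), top_k(attr_b, k)
    return len(a & b) / len(a | b)

# attributions for two similar applicants (hypothetical values)
e1 = {"income": 0.9, "dti": -0.7, "tenure": 0.2, "age": 0.05}
e2 = {"income": 0.8, "dti": -0.6, "age": 0.3, "tenure": 0.1}
s = explanation_stability(e1, e2)
print(s)  # 2 shared of 4 distinct top-3 features -> 0.5
if s < 0.6:
    print("alert: unstable explanations for similar inputs")
```

Tracked over time, the same metric doubles as the drift signal described above: a falling average overlap between this week's and last week's explanations for matched inputs suggests the model or the data has shifted.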
Human-in-the-loop feedback: Allow users and reviewers to flag explanations as "confusing" or "incorrect." Use this feedback to improve explanation templates and identify edge cases where the model behaves unexpectedly.
Key Takeaways
- Interpretability is a model property; explainability is a system property — they are not the same
- SHAP has stronger theoretical guarantees; LIME is faster and simpler — know when to use each
- GDPR creates de facto explainability requirements even without an explicit "right to explanation"
- Black-box models are acceptable for low-stakes decisions but inappropriate for irreversible, high-consequence ones
- Post-hoc explanations carry real risks: unfaithfulness, confirmation bias, gaming, and oversimplification
- Model cards are becoming an industry standard and will likely be regulatory requirements
- Design explainability into systems from the start — retrofitting is expensive and often inadequate
Lilly Tech Systems