Bias & Fairness Questions
Bias and fairness are the most frequently asked AI ethics topics in interviews. These 12 questions cover the full spectrum — from defining types of bias to navigating impossible trade-offs between fairness definitions. Each answer is structured to show interviewers you can reason about bias practically, not just theoretically.
Q1: What are the main types of bias in machine learning?
Model Answer: There are several categories of bias that can affect ML systems at different stages of the pipeline:
Historical bias exists in the real world and is reflected in training data. For example, if historically fewer women were hired for engineering roles, a model trained on that data will learn to prefer male candidates — even if gender is removed as a feature, because correlated features (name, hobbies, university) act as proxies.
Representation bias occurs when the training data does not reflect the population the model will serve. A facial recognition system trained primarily on lighter-skinned faces will perform poorly on darker-skinned faces — as demonstrated by the Gender Shades study, which found error rates of up to 34.7% for darker-skinned women versus under 1% for lighter-skinned men.
Measurement bias arises when the features or labels used as proxies do not accurately capture the concept being measured. Using arrest records as a proxy for criminal behavior introduces bias because arrest rates reflect policing patterns, not just crime rates.
Aggregation bias occurs when a single model is used for groups with different underlying distributions. A diabetes prediction model trained on pooled data may perform well on average but poorly for specific ethnic groups whose glucose metabolism differs.
Evaluation bias happens when the benchmark used to evaluate a model does not represent the real-world use case. A sentiment analysis model evaluated on English movie reviews may show excellent accuracy but fail on tweets with slang, code-switching, or African American Vernacular English.
Deployment bias occurs when a model is used in contexts it was not designed for, or when users interact with it in unexpected ways.
Q2: What is the difference between demographic parity and equalized odds?
Model Answer: These are two of the most important mathematical fairness definitions, and they often conflict with each other.
Demographic parity (also called statistical parity) requires that the positive prediction rate is the same across groups. If 30% of male applicants are accepted, 30% of female applicants should also be accepted. It ignores the actual qualification rates — it is purely about outcome equality.
Equalized odds requires that the true positive rate and false positive rate are equal across groups. In a loan context: among people who would actually repay, the approval rate should be equal across racial groups. And among people who would default, the denial rate should also be equal.
Key difference: Demographic parity ensures equal outcomes regardless of qualifications. Equalized odds ensures equal treatment conditional on the true outcome. In practice, you cannot satisfy both simultaneously unless the base rates are identical across groups or the classifier is perfect — a consequence of the fairness impossibility results formalized by Chouldechova and by Kleinberg, Mullainathan, and Raghavan.
When to use which: Demographic parity is appropriate when you believe historical data is too biased to trust (e.g., hiring). Equalized odds is appropriate when you trust the ground truth labels and want to ensure equal accuracy (e.g., medical diagnosis). In interviews, demonstrate you understand this trade-off rather than claiming one is always correct.
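The two definitions above reduce to simple conditional rates, so they are easy to compute and compare. Below is a minimal sketch in NumPy; the function name and the convention of returning absolute gaps between two groups are illustrative, not a standard API.

```python
# Sketch: compute demographic-parity and equalized-odds gaps between
# two groups from labels, predictions, and group membership.
import numpy as np

def fairness_gaps(y_true, y_pred, group):
    """Return demographic-parity and equalized-odds gaps for two groups.

    Assumes binary labels/predictions and that each group contains both
    positive and negative examples.
    """
    y_true, y_pred, group = map(np.asarray, (y_true, y_pred, group))
    rates = {}
    for g in np.unique(group):
        mask = group == g
        pos_rate = y_pred[mask].mean()                 # P(yhat=1 | group=g)
        tpr = y_pred[mask & (y_true == 1)].mean()      # P(yhat=1 | y=1, g)
        fpr = y_pred[mask & (y_true == 0)].mean()      # P(yhat=1 | y=0, g)
        rates[g] = (pos_rate, tpr, fpr)
    (p0, tpr0, fpr0), (p1, tpr1, fpr1) = rates.values()
    return {
        "demographic_parity_gap": abs(p0 - p1),
        # Equalized odds is violated if either the TPR or FPR differs.
        "equalized_odds_gap": max(abs(tpr0 - tpr1), abs(fpr0 - fpr1)),
    }
```

A demographic-parity gap of zero with a large equalized-odds gap (or vice versa) is exactly the conflict described above.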
Q3: How would you detect bias in a model that has already been deployed?
Model Answer: I would implement a multi-layered monitoring strategy:
Slice-based evaluation: Break down model performance by demographic groups (age, gender, race, geography, income level). Compare accuracy, false positive rate, and false negative rate across slices. Disparities that exceed a predefined threshold trigger alerts.
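As a concrete sketch of slice-based evaluation with alerting, the function below tallies false positive and false negative rates per slice and flags any slice materially worse than the overall rate. The record format, threshold, and one-sided comparison are illustrative design choices, not a fixed standard.

```python
# Sketch: per-slice FPR/FNR monitoring with a disparity alert threshold.
from collections import defaultdict

def audit_slices(records, threshold=0.10):
    """records: iterable of (slice_name, y_true, y_pred) with binary labels.

    Returns a list of (slice, fpr, fnr) for slices whose FPR or FNR
    exceeds the overall rate by more than `threshold`.
    """
    counts = defaultdict(lambda: {"fp": 0, "neg": 0, "fn": 0, "pos": 0})
    for s, y, yhat in records:
        for key in (s, "_overall"):          # tally per slice and overall
            c = counts[key]
            if y == 0:
                c["neg"] += 1
                c["fp"] += yhat              # predicted 1 on a true 0
            else:
                c["pos"] += 1
                c["fn"] += 1 - yhat          # predicted 0 on a true 1

    def rates(c):
        fpr = c["fp"] / c["neg"] if c["neg"] else 0.0
        fnr = c["fn"] / c["pos"] if c["pos"] else 0.0
        return fpr, fnr

    base_fpr, base_fnr = rates(counts.pop("_overall"))
    alerts = []
    for s, c in counts.items():
        fpr, fnr = rates(c)
        # One-sided: flag slices doing worse than the overall population.
        if fpr - base_fpr > threshold or fnr - base_fnr > threshold:
            alerts.append((s, round(fpr, 3), round(fnr, 3)))
    return alerts
```

In production this would run on a schedule over recent traffic, with the alert feeding an on-call or review queue.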
Outcome monitoring: Track real-world outcomes, not just model predictions. If a loan approval model shows equal accuracy but approved applicants from one group default at much higher rates, the model may have different calibration across groups.
Feedback loop analysis: Models that influence their own training data can amplify bias over time. A content recommendation system that shows less diverse content leads to less diverse engagement data, which trains a less diverse model. Monitor for decreasing diversity in model inputs over time.
Adversarial testing: Use red-teaming approaches where you systematically test edge cases and demographic combinations the model may handle poorly. This is especially important for language models that may produce harmful outputs for certain identity groups.
User complaints and appeals: Analyze patterns in user complaints. If a disproportionate number of appeals come from a specific demographic, that is a strong signal of bias even if your quantitative metrics look acceptable.
Causal analysis: Go beyond correlational fairness metrics. Use causal inference techniques to determine whether protected attributes actually influence predictions through illegitimate pathways.
Q4: Explain the fairness-accuracy trade-off. Is it always real?
Model Answer: The conventional wisdom is that improving fairness always reduces accuracy. This is sometimes true, but not always — and the nuance matters in interviews.
When the trade-off is real: If base rates genuinely differ between groups and you enforce demographic parity, you are asking the model to make predictions that diverge from the data distribution. This necessarily reduces overall accuracy. For example, if disease prevalence truly differs between populations, forcing equal prediction rates will increase false positives for one group and false negatives for another.
When the trade-off is illusory: Often, improving fairness actually improves real-world accuracy. If your training data is biased, a model that memorizes that bias is "accurate" on the biased data but inaccurate in the real world. Debiasing improves generalization. Additionally, models with bias often underperform on minority groups not because fairness constraints hurt performance, but because insufficient data or poor feature engineering for those groups was the problem all along. Investing in better data collection for underserved groups improves both fairness and accuracy.
In interviews, emphasize: The trade-off is context-dependent. A thoughtful answer acknowledges situations where it is real (and discusses who bears the cost) and situations where it is a false dichotomy (and discusses how to find win-win solutions through better data, features, or model architecture).
Q5: How would you debias a hiring algorithm?
Model Answer: I would approach debiasing at three stages of the ML pipeline:
Pre-processing (data-level): Remove or transform biased features. Blind the model to gender by removing names, pronouns, and gendered terms from resumes. But be careful — proxy features like university name, sports, or sorority/fraternity membership can reintroduce gender signal. Use techniques like Learning Fair Representations (Zemel et al.) to transform the feature space so protected attributes cannot be recovered. Augment training data to balance representation across groups.
In-processing (model-level): Add fairness constraints directly to the optimization objective. Adversarial debiasing trains a model to make predictions while simultaneously training an adversary that tries to predict protected attributes from the model's internal representations. If the adversary fails, the model's predictions are independent of protected attributes. Regularization-based approaches achieve a similar effect by adding a fairness penalty term to the loss.
Post-processing (output-level): Adjust decision thresholds differently for different groups to achieve desired fairness metrics. This is the simplest approach but has limitations — it does not fix the underlying model, and it can feel like quotas rather than genuine fairness.
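The post-processing idea can be sketched in a few lines: on a held-out set, pick each group's score threshold so that its false positive rate lands at a shared target. The function name and quantile-based approach are one simple way to do this, not the only one.

```python
# Sketch: per-group thresholds chosen so each group's FPR is close to a
# shared target. Inputs are expected as NumPy arrays.
import numpy as np

def equalize_fpr_thresholds(scores, y_true, group, target_fpr=0.10):
    """For each group, return the score threshold at the (1 - target_fpr)
    quantile of that group's negatives, so roughly target_fpr of the
    group's true negatives score above it."""
    thresholds = {}
    for g in np.unique(group):
        neg_scores = np.sort(scores[(group == g) & (y_true == 0)])
        thresholds[g] = np.quantile(neg_scores, 1 - target_fpr)
    return thresholds
```

Note the caveat from the answer above still applies: this equalizes a metric without fixing the underlying model, and group-specific thresholds may raise legal and policy questions that need review.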
Beyond technical fixes: Involve domain experts in defining what "qualified" means. Challenge the label itself — if "successful hire" is defined as "stayed 2+ years and got promoted," that label may itself be biased if promotion practices are unfair. Use structured interviews and rubrics to generate less biased labels.
Q6: What is proxy discrimination and how do you prevent it?
Model Answer: Proxy discrimination occurs when a model uses features that are highly correlated with protected attributes (race, gender, age) to make decisions, even when the protected attributes are not directly included as features.
Examples: ZIP code as a proxy for race in the US. First name as a proxy for gender or ethnicity. Browsing history on specific health websites as a proxy for medical conditions. University name as a proxy for socioeconomic status.
Prevention strategies: (1) Correlation analysis — measure the mutual information between each feature and protected attributes. Features with high correlation deserve scrutiny, though not automatic removal. (2) Causal reasoning — ask whether the feature's predictive power comes through a legitimate causal path or through the protected attribute. ZIP code predicting loan default through "proximity to employment" is more legitimate than through "racial demographics." (3) Feature importance auditing — use SHAP values to check whether proxy features disproportionately influence predictions for specific groups. (4) Counterfactual fairness — test whether changing a person's protected attribute (while keeping everything else the same) would change the prediction. If it would, proxy discrimination exists.
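Strategy (1) above is straightforward to implement for discrete features. Here is a minimal, dependency-free mutual information estimator; in practice you would discretize continuous features first, and the function name is illustrative.

```python
# Sketch: mutual information (in bits) between a discretized feature and
# a protected attribute. High MI means the feature can largely
# reconstruct the attribute and deserves scrutiny as a potential proxy.
import math
from collections import Counter

def mutual_information(xs, ys):
    """MI in bits between two equal-length sequences of discrete values."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log2((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )
```

A feature that perfectly determines a binary attribute yields MI equal to the attribute's entropy (1 bit for a balanced attribute); an independent feature yields 0.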
Q7: Explain the COMPAS case and what it teaches us about fairness.
Model Answer: COMPAS (Correctional Offender Management Profiling for Alternative Sanctions) is a risk assessment tool used in US courts to predict recidivism. ProPublica's 2016 investigation found that COMPAS had significantly different false positive rates across racial groups: Black defendants who did not reoffend were almost twice as likely to be classified as high-risk compared to white defendants who did not reoffend.
Northpointe (the company) responded that COMPAS was calibrated — among defendants scored as high-risk, the actual recidivism rate was similar across races. Both claims were mathematically correct, which reveals the fundamental lesson:
Key insight: When base rates differ between groups (Black defendants had higher observed recidivism rates in the data, itself partly reflecting systemic bias in policing and sentencing), it is mathematically impossible to simultaneously achieve calibration, equal false positive rates, and equal false negative rates. This is known as the impossibility theorem of fairness.
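The arithmetic behind the key insight can be shown with a toy calibrated classifier. Suppose the model emits one of two scores (0.8 or 0.2) and is perfectly calibrated within each group; the mix of scores is then pinned down by the group's base rate, and the false positive rate follows from Bayes' rule. The numbers here are hypothetical.

```python
# Numeric illustration: a classifier perfectly calibrated within each
# group still has unequal false positive rates when base rates differ.
def fpr_of_calibrated_classifier(base_rate, hi=0.8, lo=0.2):
    """Classifier emits score `hi` (treated as a positive prediction)
    or `lo`. Calibration forces base_rate = hi*q + lo*(1-q), where
    q = P(score=hi). Then FPR = P(score=hi | y=0)
    = (1 - hi) * q / (1 - base_rate) by Bayes' rule."""
    q = (base_rate - lo) / (hi - lo)
    return (1 - hi) * q / (1 - base_rate)

fpr_low  = fpr_of_calibrated_classifier(0.3)  # group with 30% base rate
fpr_high = fpr_of_calibrated_classifier(0.6)  # group with 60% base rate
```

With base rates of 30% and 60%, the same calibrated model produces false positive rates of roughly 4.8% and 33.3% — the COMPAS pattern in miniature.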
Interview takeaway: COMPAS teaches that "fairness" is not a single metric but a choice among competing definitions, and that choice is fundamentally a value judgment, not a technical one. In an interview, show you understand this by discussing who should make that choice (not just engineers), what stakeholders are affected, and how to make the trade-off explicit and transparent.
Q8: How do feedback loops amplify bias in ML systems?
Model Answer: Feedback loops occur when a model's predictions influence the data it is later trained on, creating a self-reinforcing cycle.
Predictive policing example: A model predicts higher crime in neighborhood A. Police increase patrols there. More arrests are made in neighborhood A (because more police are looking). The new data "confirms" the model's prediction. The model becomes more confident about neighborhood A. Meanwhile, crimes in neighborhood B go undetected because fewer police are deployed there. The model learns that B is safer, reducing patrols further. Over time, the model does not predict crime — it predicts policing.
Content recommendation example: A recommendation algorithm shows users content similar to what they have engaged with. Users can only engage with content they are shown. The model learns from this engagement data. Over time, recommendations become narrower and narrower, creating filter bubbles that the data "supports" but that actually reflect the model's own choices, not genuine user preferences.
Mitigation strategies: (1) Randomized exploration — show some content or make some decisions randomly to collect unbiased data. (2) Causal modeling — distinguish between "users liked this because we showed it" and "users would have liked this regardless." (3) Counterfactual evaluation — use off-policy evaluation methods to estimate what would have happened under different policies. (4) Regular retraining with fresh, independently collected data. (5) Monitor outcome diversity metrics over time.
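Mitigation (1), randomized exploration, is often implemented as a simple epsilon-greedy rule: with small probability, serve a random item instead of the model's top pick, and log that the decision was exploratory. The function and parameter names below are illustrative.

```python
# Sketch: epsilon-greedy exploration so the training data is not purely
# an echo of the model's own past recommendations.
import random

def recommend(model_scores, epsilon=0.05, rng=random):
    """model_scores: dict of item -> predicted engagement score.

    Returns (chosen_item, explored). Logging `explored` lets training
    weight or separate exploration traffic from exploitation traffic.
    """
    items = list(model_scores)
    if rng.random() < epsilon:
        return rng.choice(items), True            # uniform random pick
    return max(items, key=model_scores.get), False  # model's top pick
```

Even a few percent of exploration traffic gives off-policy evaluation methods (strategy 3) unbiased data to work with.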
Q9: What is intersectional bias and why does it matter?
Model Answer: Intersectional bias occurs when a model is unfair to people at the intersection of multiple identity groups, even when it appears fair for each group individually.
Classic example: A model might perform equally well for men and women overall, and equally well for Black and white people overall. But it could still fail badly for Black women specifically. The Gender Shades study by Buolamwini and Gebru demonstrated exactly this: commercial facial recognition systems had error rates below 1% for light-skinned men but up to 34.7% for dark-skinned women. Looking at gender alone or race alone would have missed this disparity.
Why it matters: Testing fairness on one dimension at a time creates a false sense of security. The most vulnerable populations are often at the intersection of multiple marginalized identities, and standard fairness audits can completely overlook them if they only examine one axis at a time.
How to address it: (1) Disaggregate evaluation metrics by intersectional groups, not just individual protected attributes. (2) Be mindful that intersectional groups can have very small sample sizes, making statistical evaluation harder. Use bootstrap confidence intervals. (3) Engage with affected communities to understand which intersections face the greatest risks. (4) Design data collection strategies that ensure adequate representation of intersectional groups.
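Steps (1) and (2) above can be combined in one pass: disaggregate the metric by intersectional group and attach a bootstrap confidence interval, since intersections are often small. This is a minimal sketch; the function signature and the 95% interval are illustrative choices.

```python
# Sketch: per-intersectional-group accuracy with bootstrap confidence
# intervals, for groups too small to trust a point estimate alone.
import numpy as np

def intersectional_accuracy(y_true, y_pred, attrs, n_boot=1000, seed=0):
    """attrs: list of tuples like (race, gender), one per example.
    Returns {group: (accuracy, ci_lower, ci_upper)}."""
    rng = np.random.default_rng(seed)
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    out = {}
    for g in set(attrs):
        idx = np.array([i for i, a in enumerate(attrs) if a == g])
        correct = (y_true[idx] == y_pred[idx]).astype(float)
        # Resample with replacement to estimate uncertainty in the mean.
        boots = [rng.choice(correct, size=len(correct)).mean()
                 for _ in range(n_boot)]
        out[g] = (correct.mean(),
                  np.percentile(boots, 2.5),
                  np.percentile(boots, 97.5))
    return out
```

Wide intervals for small intersections are a signal in themselves: they say you cannot yet certify fairness for that group and may need more data.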
Q10: How would you explain model fairness to a non-technical executive?
Model Answer: I would use a concrete analogy. Imagine a university admissions committee. There are different ways to define "fair admissions":
Equal acceptance rates (demographic parity): "We accept the same percentage from every high school." This ensures equal representation but ignores that some schools may have stronger academic programs.
Equal accuracy (equalized odds): "Among students who would thrive at our university, we accept the same percentage from every high school. And among students who would struggle, we reject the same percentage." This is fairer in one sense, but requires knowing who would actually thrive, which is hard to measure without bias.
Equal confidence (calibration): "When we say a student has an 80% chance of graduating, that is equally true regardless of which high school they came from." This ensures our predictions are reliable for everyone but does not guarantee equal outcomes.
The key message for executives: You cannot have all three simultaneously. The leadership team needs to decide which definition of fairness aligns with our values and our legal obligations, and that decision should be documented, reviewed, and communicated transparently. It is a business and values decision, not just a technical one.
Q11: What is the difference between individual and group fairness?
Model Answer: These represent two fundamentally different philosophical approaches to fairness.
Group fairness requires that statistical properties of the model's predictions are equal across defined groups (e.g., equal acceptance rates for men and women). It focuses on aggregate outcomes and is easier to measure and enforce. But it allows individual unfairness — two identical candidates from different groups might get different predictions.
Individual fairness requires that similar individuals receive similar predictions, regardless of group membership. The intuition is simple: if Alice and Bob have identical qualifications except for gender, they should get the same prediction. Technically, this is formalized as a Lipschitz condition — individuals who are close in a relevant metric space should receive close predictions.
The challenge: Individual fairness requires defining a meaningful distance metric between individuals — what does "similar" mean? Who decides? Two people with the same GPA but from different socioeconomic backgrounds may not be "similar" in a meaningful way. This metric definition often reintroduces the very value judgments we are trying to formalize.
In practice: Most deployed systems use group fairness metrics because they are measurable and auditable. But the best approaches combine both — checking group-level statistics while also testing individual cases through counterfactual analysis.
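The counterfactual testing mentioned above can be sketched as a probe that flips the protected attribute in each record, re-scores it, and reports any pair whose predictions diverge. `model` here stands for any callable from a feature dict to a score; the attribute name, values, and tolerance are placeholders.

```python
# Sketch: counterfactual individual-fairness probe. Flip the protected
# attribute, re-score, and flag records whose prediction changes by
# more than `tol`.
def counterfactual_flips(model, records, attr="gender",
                         values=("M", "F"), tol=0.05):
    flips = []
    for rec in records:
        a, b = dict(rec), dict(rec)       # copies with the attribute flipped
        a[attr], b[attr] = values
        gap = abs(model(a) - model(b))
        if gap > tol:
            flips.append((rec, gap))
    return flips
```

A non-empty result is direct evidence that the protected attribute (or a proxy the model routes through it) influences predictions for otherwise-identical individuals.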
Q12: You discover your production model has a 15% higher false positive rate for one demographic group. Walk me through what you do.
Model Answer: I would follow a structured incident response approach:
Immediate assessment (hours): Determine the severity and scope. How many users are affected? What are the downstream consequences of a false positive? If the false positive means a person is incorrectly flagged for fraud, denied a loan, or subjected to additional screening, this is urgent. If the consequences are lower-stakes (e.g., an irrelevant recommendation), the timeline can be slightly longer.
Root cause analysis (days): Investigate why. Is the model underperforming because of insufficient training data for that group? Are proxy features leaking demographic information? Has a feedback loop amplified an initial small bias? Has the data distribution shifted since training? Understanding the cause determines the fix.
Short-term mitigation (days to weeks): Adjust the decision threshold for the affected group to equalize false positive rates. This is a post-processing fix — quick to deploy but not a permanent solution. Alternatively, add human review for borderline cases in the affected group. Communicate with affected users if they experienced harm.
Long-term fix (weeks to months): Address the root cause. If data imbalance, collect more representative data. If proxy features, remove or transform them. If feedback loops, introduce exploration. Retrain and re-evaluate. Deploy with monitoring that specifically tracks the previously identified disparity.
Process improvements: Update the fairness testing checklist to catch this earlier. Add automated alerts for disparity metrics exceeding thresholds. Conduct a post-mortem and share learnings across the organization.
Key Takeaways
- Know the six types of bias and be able to give examples of each
- Understand the impossibility theorem: you cannot satisfy all fairness metrics simultaneously when base rates differ
- Debiasing happens at three stages: pre-processing, in-processing, and post-processing
- Feedback loops are one of the most dangerous and underappreciated sources of bias in production systems
- Intersectional bias testing is essential — checking one demographic dimension at a time creates false confidence
- The COMPAS case is the canonical example of fairness trade-offs — know it well
- Always frame bias as a practical engineering problem, not just a philosophical one
Lilly Tech Systems