
Abstract

A recent wave of research has attempted to define fairness quantitatively. In particular, this work has explored what fairness might mean in the context of decisions based on the predictions of statistical and machine learning models. The rapid growth of this new field has led to wildly inconsistent motivations, terminology, and notation, presenting a serious challenge for cataloging and comparing definitions. This article attempts to bring much-needed order. First, we explicate the various choices and assumptions made—often implicitly—to justify the use of prediction-based decision-making. Next, we show how such choices and assumptions can raise fairness concerns and we present a notationally consistent catalog of fairness definitions from the literature. In doing so, we offer a concise reference for thinking through the choices, assumptions, and fairness considerations of prediction-based decision-making.
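To make the idea of "defining fairness quantitatively" concrete, here is a minimal illustrative sketch (not drawn from the article's catalog) of two commonly discussed group-level criteria, demographic parity and equal opportunity, computed from binary decisions, group membership, and observed outcomes. The array names, the synthetic data, and the assumption of exactly two groups are hypothetical.

```python
import numpy as np

def demographic_parity_gap(decision, group):
    """Absolute difference in positive-decision rates between the two groups."""
    rates = [decision[group == g].mean() for g in np.unique(group)]
    return abs(rates[0] - rates[1])

def equal_opportunity_gap(decision, group, outcome):
    """Absolute difference in true-positive rates, P(decision = 1 | outcome = 1),
    between the two groups."""
    tprs = []
    for g in np.unique(group):
        mask = (group == g) & (outcome == 1)
        tprs.append(decision[mask].mean())
    return abs(tprs[0] - tprs[1])

# Hypothetical example data: binary decisions, two groups, binary outcomes.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, size=1000)
outcome = rng.integers(0, 2, size=1000)
decision = rng.integers(0, 2, size=1000)

print("Demographic parity gap:", demographic_parity_gap(decision, group))
print("Equal opportunity gap:", equal_opportunity_gap(decision, group, outcome))
```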
