Abstract

Machine learning algorithms are becoming ubiquitous in modern life. When used to help inform human decision making, they have been criticized by some for insufficient accuracy, an absence of transparency, and unfairness. Many of these concerns can be legitimate, although they are less convincing when compared with the uneven quality of human decisions. There is now a large literature in statistics and computer science offering a range of proposed improvements. In this article, we focus on machine learning algorithms used to forecast risk, such as those employed by judges to anticipate a convicted offender's future dangerousness and by physicians to help formulate a medical prognosis or ration scarce medical care. We review a variety of conceptual, technical, and practical features common to risk algorithms and offer suggestions for how their development and use might be meaningfully advanced. Fairness concerns are emphasized.
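
To make the fairness concerns concrete, the sketch below fits a conventional risk forecasting model and compares two widely used group fairness diagnostics: error-rate balance across protected groups (here, false positive rates) and calibration within groups (here, positive predictive value). This is a minimal illustration only, assuming synthetic data and scikit-learn; the variable names (group, risk) and the 0.5 decision threshold are illustrative choices, not taken from any deployed risk instrument.

    # Minimal sketch: a risk forecaster plus two group fairness diagnostics.
    # All data are synthetic; scikit-learn is assumed to be installed.
    import numpy as np
    from sklearn.ensemble import GradientBoostingClassifier
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)
    n = 5000
    group = rng.integers(0, 2, size=n)            # illustrative protected attribute (0/1)
    x = rng.normal(size=(n, 3)) + group[:, None] * 0.3
    logit = x @ np.array([1.0, -0.5, 0.25])
    y = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

    x_tr, x_te, y_tr, y_te, g_tr, g_te = train_test_split(
        x, y, group, test_size=0.3, random_state=0
    )

    clf = GradientBoostingClassifier(random_state=0).fit(x_tr, y_tr)
    risk = clf.predict_proba(x_te)[:, 1]          # forecasted risk scores in [0, 1]
    pred = (risk >= 0.5).astype(int)              # illustrative decision threshold

    for g in (0, 1):
        mask = g_te == g
        # False positive rate: predicted high risk among true negatives in group g
        fpr = ((pred == 1) & (y_te == 0) & mask).sum() / ((y_te == 0) & mask).sum()
        # Positive predictive value: outcome rate among predicted positives in group g
        ppv = y_te[mask & (pred == 1)].mean()
        print(f"group {g}: FPR = {fpr:.3f}, PPV = {ppv:.3f}")

A well-known result in this literature is that when the groups differ in their base rates, a nontrivial risk instrument cannot equalize error rates and within-group calibration simultaneously, which is one reason the choice among competing fairness definitions is ultimately a policy decision rather than a purely technical one.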

