The use of machine learning (ML) in healthcare raises numerous ethical concerns, especially as models can amplify existing health inequities. Here, we outline ethical considerations for equitable ML in the advancement of healthcare. Specifically, we frame the ethics of ML in healthcare through the lens of social justice. We describe ongoing efforts and outline challenges in a proposed pipeline of ethical ML in health, ranging from problem selection to postdeployment considerations. We close by summarizing recommendations to address these challenges.


Article metrics loading...

Loading full text...

Full text loading...




Article Type: Review Article