Abstract

Artificial intelligence (AI) and other data-driven technologies hold great promise to transform healthcare and confer the predictive power essential to precision medicine. However, the existing biomedical data, which are a vital resource and foundation for developing medical AI models, do not reflect the diversity of the human population. The low representation in biomedical data has become a significant health risk for non-European populations, and the growing application of AI opens a new pathway for this health risk to manifest and amplify. Here we review the current status of biomedical data inequality and present a conceptual framework for understanding its impacts on machine learning. We also discuss the recent advances in algorithmic interventions for mitigating health disparities arising from biomedical data inequality. Finally, we briefly discuss the newly identified disparity in data quality among ethnic groups and its potential impacts on machine learning.

/content/journals/10.1146/annurev-biodatasci-020722-020704
2023-08-10
2024-05-04

  • Article Type: Review Article