Abstract

A bias in health research toward understanding diseases as they present in men can have a grave impact on the health of women. This paper reports on a conceptual review of the literature on machine learning or natural language processing (NLP) techniques to interrogate big data for identifying sex-specific health disparities. We searched Ovid MEDLINE, Embase, and PsycINFO in October 2021 using synonyms and indexing terms for (a) “women,” “men,” or “sex”; (b) “big data,” “artificial intelligence,” or “NLP”; and (c) “disparities” or “differences.” From 902 records, 22 studies met the inclusion criteria and were analyzed. Results demonstrate that the inclusion by sex is inconsistent and often unreported, although the inclusion of men in these studies is disproportionately less than women. Even though artificial intelligence and NLP techniques are widely applied in health research, few studies use them to take advantage of unstructured text to investigate sex-related differences or disparities. Researchers are increasingly aware of sex-based data bias, but the process toward correction is slow. We reflect on best practices for using big data analytics to address sex-specific biases in understanding the etiology, diagnosis, and prognosis of diseases.

/content/journals/10.1146/annurev-biodatasci-122120-025806
2022-08-10
2024-06-13

  • Article Type: Review Article