The big data revolution presents an exciting frontier for public health research, broadening the scope of research and increasing the precision of answers. Amid these advances, scientists must remain vigilant against inadvertently advancing harms toward marginalized communities. In this review, we provide examples in which big data applications have (unintentionally) perpetuated discriminatory practices, while also highlighting opportunities for big data applications to advance equity in public health. Here, big data is framed in the context of the five Vs (volume, velocity, veracity, variety, and value), and we propose a sixth V, virtuosity, which incorporates equity and justice frameworks. Analytic approaches to improving equity are illustrated using social computational big data, fairness in machine learning algorithms, medical claims data, and data augmentation. Throughout, we emphasize the biasing influence of data absenteeism and positionality, and we conclude with recommendations for incorporating an equity lens into big data research.
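The sketch below is a minimal illustration of a group-fairness audit of a machine learning algorithm. It computes the equality-of-opportunity criterion of Hardt, Price & Srebro (2016), which asks that a classifier's true positive rate be the same across groups. This is not presented as the review's own method. The labels, predictions, group names and helper function names are hypothetical. Equality of opportunity is also only one of several competing fairness definitions.

```python
"""Equality-of-opportunity check: compare true positive rates across groups.

Hypothetical sketch; the data and group labels are illustrative only.
"""
from collections import defaultdict


def true_positive_rates(y_true, y_pred, groups):
    """Return {group: TPR}, where TPR = P(y_pred = 1 | y_true = 1, group).

    Groups with no true positives are skipped because their TPR is undefined.
    """
    positives = defaultdict(int)  # count of y_true == 1 per group
    hits = defaultdict(int)       # count of y_true == 1 and y_pred == 1 per group
    for t, p, g in zip(y_true, y_pred, groups):
        if t == 1:
            positives[g] += 1
            if p == 1:
                hits[g] += 1
    return {g: hits[g] / positives[g] for g in positives}


def equal_opportunity_gap(y_true, y_pred, groups):
    """Largest absolute TPR difference between any two groups (0 means parity)."""
    rates = true_positive_rates(y_true, y_pred, groups)
    if not rates:
        raise ValueError("no positive examples in any group")
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Hypothetical outcomes and predictions for two groups, A and B.
    y_true = [1, 1, 1, 1, 0, 1, 1, 1, 1, 0]
    y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 1, 1]
    groups = ["A"] * 5 + ["B"] * 5
    print(true_positive_rates(y_true, y_pred, groups))
    print(equal_opportunity_gap(y_true, y_pred, groups))
```

In this toy example, group A has a true positive rate of 0.75 and group B has a rate of 0.5, so the gap is 0.25. Among people who truly have the outcome, members of group B are flagged less often. Which criterion is appropriate depends on the decision the model supports.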


Article Type: Review Article