Abstract

The All of Us Research Program's Data and Research Center (DRC) was established to help acquire, curate, and provide access to one of the world's largest and most diverse datasets for precision medicine research. Already, over 500,000 participants are enrolled in All of Us, 80% of whom are underrepresented in biomedical research, and data are being analyzed by a community of over 2,300 researchers. The DRC created this thriving data ecosystem by collaborating with engaged participants, innovative program partners, and empowered researchers. In this review, we first describe how the DRC is organized to meet the needs of this broad group of stakeholders. We then outline guiding principles, common challenges, and innovative approaches used to build the data ecosystem. Finally, we share lessons learned to help others navigate important decisions and trade-offs in building a modern biomedical data platform.

/content/journals/10.1146/annurev-biodatasci-122120-104825
2023-08-10
2024-04-22
  • Article Type: Review Article