
Abstract

The increasing amounts of healthcare data stored in health registries, in combination with genomic and other types of data, have the potential to enable better decision making and pave the way for personalized medicine. However, realizing the full value of big, sensitive data for patients requires greater access to data across organizations and institutions in various regions. This overview first introduces cloud computing and takes stock of the challenges to enhancing data availability in the healthcare system. It then discusses four models for ensuring higher data accessibility. Finally, it presents several cases that explore how enhanced access to data would benefit the end user.

/content/journals/10.1146/annurev-biodatasci-012920-013357
2020-07-20
2024-05-06

  • Article Type: Review Article