Supercomputing and Secure Cloud Infrastructures in Biology and Medicine

Cathrine Jespersgaard; Ali Syed; Piotr Chmura; Peter Løngreen

doi:10.1146/annurev-biodatasci-012920-013357

Annual Review of Biomedical Data Science

Volume 3, 2020

Review Article

Cathrine Jespersgaard¹, Ali Syed¹, Piotr Chmura², and Peter Løngreen¹

Affiliations: ¹Danish National Genome Center, DK-2300 Copenhagen S, Denmark; ²Novo Nordisk Foundation Center for Protein Research, University of Copenhagen, DK-2200 Copenhagen N, Denmark
Vol. 3:391–410 (Volume publication date July 2020) https://doi.org/10.1146/annurev-biodatasci-012920-013357
Copyright © 2020 by Annual Reviews. All rights reserved

Abstract

The increasing amounts of healthcare data stored in health registries, in combination with genomic and other types of data, have the potential to enable better decision making and pave the way for personalized medicine. However, realizing the full value of big, sensitive data for patients requires greater access to data across organizations, institutions, and regions. This overview first introduces cloud computing and takes stock of the challenges to enhancing data availability in the healthcare system. It then discusses four models for ensuring higher data accessibility. Finally, it presents several cases that explore how enhanced access to data would benefit the end user.

Keywords: clinical IT infrastructure, cloud bursting, cloud infrastructures, orchestration, personalized medicine
