Abstract

Genomics data are important for advancing biomedical research, improving clinical care, and informing other disciplines such as forensics and genealogy. However, privacy concerns arise when genomic data are shared. In particular, the identifying nature of genetic information, its direct relationship to health status, and the potential financial harm and stigmatization posed to individuals and their blood relatives call for a survey of the privacy issues related to sharing genetic and related data and of potential solutions to these issues. In this work, we provide an overview of the importance of genomic privacy, the information gleaned from genomics data, the potential sources of private information leakage in genomics, and ways to preserve privacy while using genetic information in research. We discuss the relationship between trust in the scientific community and protecting privacy, illuminating a future roadmap for data sharing and study participation.

/content/journals/10.1146/annurev-biodatasci-122120-021311
2022-08-10
2024-04-25
  • Article Type: Review Article