
Abstract

Continued advances in precision medicine rely on the widespread sharing of data that relate human genetic variation to disease. However, data sharing is severely limited by legal, regulatory, and ethical restrictions that safeguard patient privacy. Federated analysis addresses this problem by transferring the code to the data—providing the technical and legal capability to analyze the data within their secure home environment rather than transferring the data to another institution for analysis. This allows researchers to gain new insights from data that cannot be moved, while respecting patient privacy and the data stewards’ legal obligations. Because federated analysis is a technical solution to the legal challenges inherent in data sharing, the technology and policy implications must be evaluated together. Here, we summarize the technical approaches to federated analysis and provide a legal analysis of their policy implications.

/content/journals/10.1146/annurev-genom-110122-084756
2023-08-25
2024-04-12
  • Article Type: Review Article