
Abstract

Summary statistics for genome-wide association studies (GWAS) are increasingly available for downstream analyses. Meanwhile, the popularity of causal inference methods has grown as we look to gather robust evidence for novel medical and public health interventions. This has led to the development of methods that use GWAS summary statistics for causal inference. Here, we describe these methods in order of their escalating complexity, from genetic associations to extensions of Mendelian randomization that consider thousands of phenotypes simultaneously. We also cover the assumptions and limitations of these approaches before considering the challenges faced by researchers performing causal inference using GWAS data. GWAS summary statistics constitute an important data source for causal inference research, offering a counterpoint to nongenetic methods when triangulating evidence. Continued efforts to address the challenges in using GWAS data for causal inference will allow the full impact of these approaches to be realized.
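
The abstract does not name specific estimators. As a minimal illustration of how GWAS summary statistics can be combined in two-sample Mendelian randomization, the sketch below computes a per-variant Wald ratio and the fixed-effect inverse-variance weighted (IVW) estimate. It is not the authors' implementation, and the function names are hypothetical. It assumes the variants are independent (not in linkage disequilibrium), that alleles are already harmonized so exposure and outcome effects refer to the same effect allele, that the two GWAS samples do not overlap, and that uncertainty in the exposure associations can be ignored. It also omits robust estimators that relax the instrumental variable assumptions.

```python
import math
from typing import Sequence


def wald_ratio(bx: float, by: float, se_y: float) -> tuple[float, float]:
    """Single-variant causal estimate (by / bx) and its first-order standard error.

    Hypothetical helper: ignores the standard error of bx.
    """
    if bx == 0:
        raise ValueError("variant has zero association with the exposure")
    return by / bx, se_y / abs(bx)


def ivw_estimate(
    bx: Sequence[float], by: Sequence[float], se_y: Sequence[float]
) -> tuple[float, float]:
    """Fixed-effect IVW estimate from per-variant summary statistics.

    bx   : variant-exposure associations (exposure GWAS)
    by   : variant-outcome associations (outcome GWAS)
    se_y : standard errors of by
    Assumes independent, allele-harmonized variants from non-overlapping samples.
    """
    if len(bx) == 0 or not (len(bx) == len(by) == len(se_y)):
        raise ValueError("inputs must be non-empty and of equal length")
    # Equivalent to a weighted regression of by on bx through the origin
    # with weights 1 / se_y^2, i.e. an inverse-variance weighted average
    # of the per-variant Wald ratios using first-order weights bx^2 / se_y^2.
    num = sum(x * y / s**2 for x, y, s in zip(bx, by, se_y))
    den = sum(x**2 / s**2 for x, s in zip(bx, se_y))
    return num / den, math.sqrt(1.0 / den)
```

With three hypothetical variants whose Wald ratios all equal 0.5, `ivw_estimate([0.1, 0.2, 0.3], [0.05, 0.1, 0.15], [0.01] * 3)` returns an estimate of 0.5 (up to floating-point rounding). The fixed-effect standard error shown here understates uncertainty when variants disagree, which is one reason heterogeneity and pleiotropy-robust methods are used alongside IVW.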

/content/journals/10.1146/annurev-biodatasci-122120-024910
2022-08-10
2024-12-04


  • Article Type: Review Article