Abstract

Since the initial success of genome-wide association studies (GWAS) in 2005, tens of thousands of genetic variants have been identified for hundreds of human diseases and traits. In a GWAS, genotype information at up to millions of genetic markers is collected from up to hundreds of thousands of individuals, together with their phenotype information. Several scientific goals can be accomplished through the analysis of GWAS data, including the identification of variants, genes, and pathways associated with diseases and traits of interest; the inference of the genetic architecture of these traits; and the development of genetic risk prediction models. In this review, we provide an overview of the statistical challenges in achieving these goals and recent progress in statistical methodology to address these challenges.

/content/journals/10.1146/annurev-biodatasci-030320-041026
2020-07-20
2024-04-25


  • Article Type: Review Article