Abstract

Genotype imputation has become a standard tool in genome-wide association studies because it enables researchers to inexpensively approximate whole-genome sequence data from genome-wide single-nucleotide polymorphism array data. Genotype imputation increases statistical power, facilitates fine mapping of causal variants, and plays a key role in meta-analyses of genome-wide association studies. Only variants that were previously observed in a reference panel of sequenced individuals can be imputed. However, the rapid increase in the number of deeply sequenced individuals will soon make it possible to assemble enormous reference panels that greatly increase the number of imputable variants. In this review, we present an overview of genotype imputation and describe the computational techniques that make it possible to impute genotypes from reference panels with millions of individuals.

/content/journals/10.1146/annurev-genom-083117-021602
2018-08-31
2024-12-07
  • Article Type: Review Article