Pangenome Graphs

Jordan M. Eizenga; Adam M. Novak; Jonas A. Sibbesen; Simon Heumos; Ali Ghaffaari; Glenn Hickey; Xian Chang; Josiah D. Seaman; Robin Rounthwaite; Jana Ebler; Mikko Rautiainen; Shilpa Garg; Benedict Paten; Tobias Marschall; Jouni Sirén; Erik Garrison

doi:10.1146/annurev-genom-120219-080406

Annual Review of Genomics and Human Genetics

Volume 21, 2020

Review Article

Jordan M. Eizenga¹, Adam M. Novak¹, Jonas A. Sibbesen¹, Simon Heumos², Ali Ghaffaari³,⁴,⁵, Glenn Hickey¹, Xian Chang¹, Josiah D. Seaman⁶,⁷, Robin Rounthwaite¹, Jana Ebler³,⁴,⁵, Mikko Rautiainen³,⁴,⁵, Shilpa Garg⁸,⁹, Benedict Paten¹, Tobias Marschall³,⁴, Jouni Sirén¹, and Erik Garrison¹

Affiliations:
¹Genomics Institute, University of California, Santa Cruz, California 95064, USA; email: [email protected]
²Quantitative Biology Center, University of Tübingen, 72076 Tübingen, Germany
³Center for Bioinformatics, Saarland University, 66123 Saarbrücken, Germany
⁴Max Planck Institute for Informatics, 66123 Saarbrücken, Germany
⁵Saarbrücken Graduate School for Computer Science, Saarland University, 66123 Saarbrücken, Germany
⁶Royal Botanic Gardens, Kew, Richmond TW9 3AB, United Kingdom
⁷School of Biological and Chemical Sciences, Queen Mary University of London, London E1 4NS, United Kingdom
⁸Departments of Genetics and Biomedical Informatics, Harvard Medical School, Boston, Massachusetts 02215, USA
⁹Department of Data Sciences, Dana-Farber Cancer Institute, Boston, Massachusetts 02215, USA

Vol. 21:139-162 (Volume publication date August 2020) https://doi.org/10.1146/annurev-genom-120219-080406
First published as a Review in Advance on May 26, 2020
Copyright © 2020 by Annual Reviews. All rights reserved

Abstract

Low-cost whole-genome assembly has enabled the collection of haplotype-resolved pangenomes for numerous organisms. In turn, this technological change is encouraging the development of methods that can precisely address the sequence and variation described in large collections of related genomes. These approaches often use graphical models of the pangenome to support algorithms for sequence alignment, visualization, functional genomics, and association studies. The additional information provided to these methods by the pangenome allows them to achieve superior performance on a variety of bioinformatic tasks, including read alignment, variant calling, and genotyping. Pangenome graphs stand to become a ubiquitous tool in genomics. Although it is unclear whether they will replace linear reference genomes, their ability to harmoniously relate multiple sequence and coordinate systems will make them useful irrespective of which pangenomic models become most common in the future.

Keyword(s): genome graph, pangenome, variation graph


