The Theory and Practice of Genome Sequence Assembly

Jared T. Simpson; Mihai Pop

doi:10.1146/annurev-genom-090314-050032

Annual Review of Genomics and Human Genetics

Volume 16, 2015

Review Article

Jared T. Simpson¹ and Mihai Pop²

Affiliations:
¹Ontario Institute for Cancer Research, Toronto, Ontario M5G 0A3, Canada; email: [email protected]
²Center for Bioinformatics and Computational Biology, University of Maryland, College Park, Maryland 20742; email: [email protected]
Vol. 16:153-172 (Volume publication date August 2015) https://doi.org/10.1146/annurev-genom-090314-050032
First published as a Review in Advance on April 22, 2015
© Annual Reviews

Abstract

The current genomic revolution was made possible by joint advances in genome sequencing technologies and computational approaches for analyzing sequence data. The close interaction between biologists and computational scientists is perhaps most apparent in the development of approaches for sequencing entire genomes, a feat that would not be possible without sophisticated computational tools called genome assemblers (short for genome sequence assemblers). Here, we survey the key developments in algorithms for assembling genome sequences since the development of the first DNA sequencing methods more than 35 years ago.

Keywords: algorithm, bioinformatics, genome sequencing, sequence assembly, shotgun sequencing

