Abstract

Large-scale genomics demands computational methods that scale sublinearly with the growth of data. We review several data structures and sketching techniques that have been used in genomic analysis methods. Specifically, we focus on four key ideas that take different approaches to achieve sublinear space usage and processing time: compressed full-text indices, approximate membership query data structures, locality-sensitive hashing, and minimizers schemes. We describe these techniques at a high level and give several representative applications of each.
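As a minimal illustration of the last of these ideas, the sketch below selects minimizers from a DNA string: in every window of w consecutive k-mers, it keeps the smallest k-mer. The function name `minimizers`, the lexicographic ordering on k-mers, and the leftmost tie-breaking rule are assumptions made for this example and are not taken from the review; practical tools often use a randomized or hashed ordering instead.

```python
def minimizers(seq, k, w):
    """Return sorted (position, k-mer) pairs chosen as window minimizers.

    Assumptions for this sketch: k-mers are compared lexicographically,
    and ties inside a window are broken by the leftmost position.
    """
    kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    selected = set()
    # Slide a window over every run of w consecutive k-mers.
    for start in range(len(kmers) - w + 1):
        window = range(start, start + w)
        best = min(window, key=lambda i: (kmers[i], i))
        selected.add((best, kmers[best]))
    return sorted(selected)


# Six 3-mers, four windows, three selected positions.
print(minimizers("ACGTACGT", k=3, w=3))
# prints [(0, 'ACG'), (1, 'CGT'), (4, 'ACG')]
```

Adjacent windows share most of their k-mers and so often select the same minimizer, which is why the sketch keeps fewer positions than there are k-mers while still sampling every window.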

/content/journals/10.1146/annurev-biodatasci-072018-021156
2019-07-20
2024-06-17
  • Article Type: Review Article