Abstract

Alternative splicing is pivotal to the regulation of gene expression and protein diversity in eukaryotic cells. The detection of alternative splicing events requires specific omics technologies. Although short-read RNA sequencing has successfully supported a plethora of investigations on alternative splicing, the emerging technologies of long-read RNA sequencing and top-down mass spectrometry open new opportunities to identify alternative splicing events and protein isoforms with less ambiguity. Here, we summarize improvements in short-read RNA sequencing for alternative splicing analysis, including percent spliced-in (PSI) index estimation and differential analysis. We also review the computational methods used in top-down proteomics for proteoform identification, including the construction of protein isoform databases and statistical analyses of search results. While emerging technologies will bring many improvements in sequencing and computational methods, future work is still needed to increase the effectiveness, integration, and proteome coverage of alternative splicing events.

/content/journals/10.1146/annurev-biodatasci-020722-044021
2023-08-10
2024-04-21
  • Article Type: Review Article