Proteogenomics: Integrating Next-Generation Sequencing and Mass Spectrometry to Characterize Human Proteomic Variation

Gloria M. Sheynkman; Michael R. Shortreed; Anthony J. Cesnik; Lloyd M. Smith

doi:10.1146/annurev-anchem-071015-041722

Annual Review of Analytical Chemistry

Volume 9, 2016

Review Article

Gloria M. Sheynkman¹,²,³, Michael R. Shortreed³, Anthony J. Cesnik³, and Lloyd M. Smith³,⁴

Affiliations:
¹Center for Cancer Systems Biology (CCSB) and Department of Cancer Biology, Dana-Farber Cancer Institute, Boston, Massachusetts 02215
²Department of Genetics, Harvard Medical School, Boston, Massachusetts 02115
³Department of Chemistry, University of Wisconsin, Madison, Wisconsin 53706
⁴Genome Center of Wisconsin, University of Wisconsin, Madison, Wisconsin 53706
Vol. 9:521-545 (Volume publication date June 2016) https://doi.org/10.1146/annurev-anchem-071015-041722
First published as a Review in Advance on March 30, 2016
© Annual Reviews

Abstract

Mass spectrometry–based proteomics has emerged as the leading method for detecting, quantifying, and characterizing proteins. Nearly all proteomic workflows rely on protein sequence databases to identify peptides and proteins, but these databases typically contain a generic set of protein sequences that lack the variations unique to a given sample, so those variant forms cannot be detected. Proteogenomics, broadly defined as the use of nucleotide sequences to generate candidate protein sequences for mass spectrometry database searching, makes it possible to detect such proteomic variations. The field has gained significance through two developments: (a) advances in DNA sequencing technologies that have made complete sequencing of human genomes and transcriptomes routine, and (b) the unveiling of the tremendous complexity of the human proteome as expressed at the levels of genes, cells, tissues, individuals, and populations. We review here the field of human proteogenomics, with an emphasis on its history, current implementations, the types of proteomic variations it reveals, and several important applications.
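The working definition above, using nucleotide sequences to generate candidate protein sequences for database searching, can be made concrete with a small example. The Python sketch below is hypothetical and is not the authors' method or any published tool: it assumes a known coding sequence read in frame 0, the standard genetic code, a single-nucleotide substitution, and idealized trypsin cleavage (after K or R, not before P, no missed cleavages). It translates the reference and variant sequences, digests both, and keeps only the peptides that a generic reference database would lack.

```python
"""Toy sketch: derive variant-specific tryptic peptides from a nucleotide variant.

Hypothetical example; not taken from the review or from any specific tool.
"""

# Standard genetic code in TCAG order ('*' marks stop codons).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(cds: str) -> str:
    """Translate a coding sequence in frame 0, stopping at the first stop codon."""
    protein = []
    for pos in range(0, len(cds) - 2, 3):
        aa = CODON_TABLE[cds[pos:pos + 3].upper()]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def tryptic_digest(protein: str, min_len: int = 1) -> set[str]:
    """Idealized trypsin: cut after K/R unless the next residue is P.

    Assumes no missed cleavages; peptides shorter than min_len are dropped.
    """
    peptides, start = set(), 0
    for i, aa in enumerate(protein):
        next_aa = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and next_aa != "P":
            peptides.add(protein[start:i + 1])
            start = i + 1
    peptides.add(protein[start:])  # C-terminal peptide (empty if protein ends in K/R)
    return {p for p in peptides if len(p) >= min_len}


def variant_peptides(cds: str, pos: int, ref: str, alt: str, min_len: int = 1) -> set[str]:
    """Return tryptic peptides produced by the variant allele but absent from the reference."""
    if cds[pos].upper() != ref.upper():
        raise ValueError(f"reference base mismatch at position {pos}: {cds[pos]} != {ref}")
    variant_cds = cds[:pos] + alt + cds[pos + 1:]
    reference = tryptic_digest(translate(cds), min_len)
    variant = tryptic_digest(translate(variant_cds), min_len)
    return variant - reference


# Hypothetical toy coding sequence: ATG GCT GAA AAA CTG GAA GTT CGT TCT GCA AAA TAA
CDS = "ATGGCTGAAAAACTGGAAGTTCGTTCTGCAAAATAA"

if __name__ == "__main__":
    print(translate(CDS))                               # MAEKLEVRSAK
    print(sorted(variant_peptides(CDS, 16, "A", "T")))  # ['LVVR'] (GAA -> GTA, E -> V)
```

On the toy sequence, the substitution turns codon GAA (Glu) into GTA (Val), and the only new tryptic peptide is LVVR; a peptide like this can only be matched if it is present in the search database. Real sample-specific databases must also handle missed cleavages, insertions and deletions, and novel splice junctions, which this sketch ignores.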

Keywords: alternative splicing, customized protein databases, genetic variation, isoforms, novel splice junction, polymorphism, proteoform, proteomics, sample-specific databases, single amino acid variant

Literature Cited

Yates JR. 1. 2013. The revolution and evolution of shotgun proteomics for large-scale proteome analysis. J. Am. Chem. Soc. 135:1629–40 [Google Scholar]
Mardis ER. 2. 2013. Next-generation sequencing platforms. Annu. Rev. Anal. Chem. 6:287–303 [Google Scholar]
Brent MR. 3. 2008. Steady progress and recent breakthroughs in the accuracy of automated genome annotation. Nat. Rev. Genet. 9:62–73 [Google Scholar]
Harrow J, Frankish A, Gonzalez JM, Tapanari E, Diekhans M. 4. et al. 2012. GENCODE: the reference human genome annotation for The ENCODE Project. Genome Res. 22:1760–74 [Google Scholar]
Kim M-S, Pinto SM, Getnet D, Nirujogi RS, Manda SS. 5. et al. 2014. A draft map of the human proteome. Nature 509:575–81 [Google Scholar]
Wilhelm M, Schlegl J, Hahne H, Gholami AM, Lieberenz M. 6. et al. 2014. Mass-spectrometry-based draft of the human proteome. Nature 509:582–87 [Google Scholar]
Edman P. 7. 1950. Method for determination of the amino acid sequence in peptides. Acta Chem. Scand. 4:283–93 [Google Scholar]
Biemann K. 8. 1992. Mass-spectrometry of peptides and proteins. Annu. Rev. Biochem. 61:977–1010 [Google Scholar]
Fenn JB, Mann M, Meng CK, Wong SF, Whitehouse CM. 9. 1989. Electrospray ionization for mass-spectrometry of large biomolecules. Science 246:64–71 [Google Scholar]
Hillenkamp F, Karas M, Beavis RC, Chait BT. 10. 1991. Matrix-assisted laser desorption ionization mass-spectrometry of biopolymers. Anal. Chem. 63:A1193–202 [Google Scholar]
Mann M, Wilm M. 11. 1994. Error tolerant identification of peptides in sequence databases by peptide sequence tags. Anal. Chem. 66:4390–99 [Google Scholar]
Eng JK, McCormack AL, Yates JR. 12. 1994. An approach to correlate tandem mass-spectral data of peptides with amino-acid-sequences in a protein database. J. Am. Soc. Mass Spectrom. 5:976–89 [Google Scholar]
Boguski MS, Lowe TMJ, Tolstoshev CM. 13. 1993. dbEST—database for “expressed sequence tags.”. Nat. Genet. 4:332–33 [Google Scholar]
Yates JR, Eng JK, McCormack AL. 14. 1995. Mining genomes: correlating tandem mass spectra of modified and unmodified peptides to sequences in nucleotide databases. Anal. Chem. 67:3202–10 [Google Scholar]
Henzel WJ, Billeci TM, Stults JT, Wong SC, Grimley C, Watanabe C. 15. 1993. Identifying proteins from 2-dimensional gels by molecular mass searching of peptide-fragments in protein-sequence databases. PNAS 90:5011–15 [Google Scholar]
James P, Quadroni M, Carafoli E, Gonnet G. 16. 1994. Protein identification in DNA databases by peptide mass fingerprinting. Protein Sci. 3:1347–50 [Google Scholar]
Sanger F, Nicklen S, Coulson AR. 17. 1977. DNA sequencing with chain-terminating inhibitors. PNAS 74:5463–67 [Google Scholar]
Smith LM, Sanders JZ, Kaiser RJ, Hughes P, Dodd C. 18. et al. 1986. Fluorescence detection in automated DNA-sequence analysis. Nature 321:674–79 [Google Scholar]
Pandey A, Mann M. 19. 2000. Proteomics to study genes and genomes. Nature 405:837–46 [Google Scholar]
Neubauer G, King A, Rappsilber J, Calvio C, Watson M. 20. et al. 1998. Mass spectrometry and EST-database searching allows characterization of the multi-protein spliceosome complex. Nat. Genet. 20:46–50 [Google Scholar]
Choudhary JS, Blackstock WP, Creasy DM, Cottrell JS. 21. 2001. Interrogating the human genome using uninterpreted mass spectrometry data. Proteomics 1:651–67 [Google Scholar]
Brunner E, Ahrens CH, Mohanty S, Baetschmann H, Loevenich S. 22. et al. 2007. A high-quality catalog of the Drosophila melanogaster proteome. Nat. Biotechnol. 25:576–83 [Google Scholar]
Baerenfaller K, Grossmann J, Grobei MA, Hull R, Hirsch-Hoffmann M. 23. et al. 2008. Genome-scale proteomics reveals Arabidopsis thaliana gene models and proteome dynamics. Science 320:938–41 [Google Scholar]
Merrihew GE, Davis C, Ewing B, Williams G, Kall L. 24. et al. 2008. Use of shotgun proteomics for the identification, confirmation, and correction of C. elegans gene annotations. Genome Res. 18:1660–69 [Google Scholar]
Wang Z, Gerstein M, Snyder M. 25. 2009. RNA-Seq: a revolutionary tool for transcriptomics. Nat. Rev. Genet. 10:57–63 [Google Scholar]
Altshuler D, Durbin RM, Abecasis GR, Bentley DR, Chakravarti A. 26. et al. 2010. A map of human genome variation from population-scale sequencing. Nature 467:1061–73 [Google Scholar]
Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L. 27. et al. 2008. Alternative isoform regulation in human tissue transcriptomes. Nature 456:470–76 [Google Scholar]
Sultan M, Schulz MH, Richard H, Magen A, Klingenhoff A. 28. et al. 2008. A global view of gene activity and alternative splicing by deep sequencing of the human transcriptome. Science 321:956–60 [Google Scholar]
Wang X, Slebos RJ, Wang D, Halvey PJ, Tabb DL. 29. et al. 2012. Protein identification using customized protein sequence databases derived from RNA-Seq data. J. Proteome Res. 11:1009–17 [Google Scholar]
Bateman A, Martin MJ, O'Donovan C, Magrane M, Apweiler R. 30. et al. 2015. UniProt: a hub for protein information. Nucleic Acids Res. 43:D204–12 [Google Scholar]
Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A. 31. et al. 2014. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 42:D756–63 [Google Scholar]
Ng SB, Turner EH, Robertson PD, Flygare SD, Bigham AW. 32. et al. 2009. Targeted capture and massively parallel sequencing of 12 human exomes. Nature 461:272–76 [Google Scholar]
Ingolia NT, Ghaemmaghami S, Newman JRS, Weissman JS. 33. 2009. Genome-wide analysis in vivo of translation with nucleotide resolution using ribosome profiling. Science 324:218–23 [Google Scholar]
Edwards NJ. 34. 2007. Novel peptide identification from tandem mass spectra using ESTs and sequence database compression. Mol. Syst. Biol. 3:102 [Google Scholar]
Tanner S, Shen ZX, Ng J, Florea L, Guigo R. 35. et al. 2007. Improving gene annotation using peptide mass spectrometry. Genome Res. 17:231–39 [Google Scholar]
Jaffe JD, Berg HC, Church GM. 36. 2004. Proteogenomic mapping as a complementary method to perform genome annotation. Proteomics 4:59–77 [Google Scholar]
Jaffe JD, Stange-Thomann N, Smith C, DeCaprio D, Fisher S. 37. et al. 2004. The complete genome and proteome of Mycoplasma mobile. Genome Res. 14:1447–61 [Google Scholar]
Kuster B, Mortensen P, Andersen JS, Mann M. 38. 2001. Mass spectrometry allows direct identification of proteins in large genomes. Proteomics 1:641–50 [Google Scholar]
Fermin D, Allen BB, Blackwell TW, Menon R, Adamski M. 39. et al. 2006. Novel gene and gene model detection using a whole genome open reading frame analysis in proteomics. Genome Biol. 7:R35 [Google Scholar]
Khatun J, Yu YB, Wrobel JA, Risk BA, Gunawardena HP. 40. et al. 2013. Whole human genome proteogenomic mapping for ENCODE cell line data: identifying protein-coding regions. BMC Genom. 14:141 [Google Scholar]
Sheynkman GM, Johnson JE, Jagtap PD, Shortreed MR, Onsongo G. 41. et al. 2014. Using Galaxy-P to leverage RNA-Seq for the discovery of novel protein variations. BMC Genom. 15:9 [Google Scholar]
Yang L, Duff MO, Graveley BR, Carmichael GG, Chen LL. 42. 2011. Genomewide characterization of non-polyadenylated RNAs. Genome Biol. 12:14 [Google Scholar]
Nesvizhskii AI, Aebersold R. 43. 2005. Interpretation of shotgun proteomic data—the protein inference problem. Mol. Cell. Proteom. 4:1419–40 [Google Scholar]
Sheynkman GM, Shortreed MR, Frey BL, Scalf M, Smith LM. 44. 2014. Large-scale mass spectrometric detection of variant peptides resulting from nonsynonymous nucleotide differences. J. Proteome Res. 13:228–40 [Google Scholar]
Woo S, Cha SW, Merrihew G, He Y, Castellana N. 45. et al. 2013. Proteogenomic database construction driven from large scale RNA-Seq data. J. Proteome Res. 13:21–28 [Google Scholar]
Sheynkman GM, Shortreed MR, Frey BL, Smith LM. 46. 2013. Discovery and mass spectrometric analysis of novel splice-junction peptides using RNA-Seq. Mol. Cell. Proteom. 12:2341–53 [Google Scholar]
Frenkel-Morgenstern M, Lacroix V, Ezkurdia I, Levin Y, Gabashvili A. 47. et al. 2012. Chimeras taking shape: potential functions of proteins encoded by chimeric RNA transcripts. Genome Res. 22:1231–42 [Google Scholar]
Kim H, Park H, Paek E. 48. 2015. NextSearch: a search engine for mass spectrometry data against a compact nucleotide exon graph. J. Proteome Res. 14:2784–91 [Google Scholar]
Woo S, Cha SW, Na S, Guest C, Liu T. 49. et al. 2014. Proteogenomic strategies for identification of aberrant cancer peptides using large-scale next-generation sequencing data. Proteomics 14:2719–30 [Google Scholar]
Zickmann F, Renard BY. 50. 2015. MSProGene: integrative proteogenomics beyond six-frames and single nucleotide polymorphisms. Bioinformatics 31:106–15 [Google Scholar]
Evans VC, Barker G, Heesom KJ, Fan J, Bessant C, Matthews DA. 51. 2012. De novo derivation of proteomes from transcriptomes for transcript and protein identification. Nat. Methods 9:1207–11 [Google Scholar]
Koch A, Gawron D, Steyaert S, Ndah E, Crappe J. 52. et al. 2014. A proteogenomics approach integrating proteomics and ribosome profiling increases the efficiency of protein identification and enables the discovery of alternative translation start sites. Proteomics 14:2688–98 [Google Scholar]
Van Damme P, Gawron D, Van Criekinge W, Menschaert G. 53. 2014. N-terminal proteomics and ribosome profiling provide a comprehensive view of the alternative translation initiation landscape in mice and men. Mol. Cell. Proteom. 13:1245–61 [Google Scholar]
Creasy DM, Cottrell JS. 54. 2002. Error tolerant searching of uninterpreted tandem mass spectrometry data. Proteomics 2:1426–34 [Google Scholar]
Craig R, Beavis RC. 55. 2004. TANDEM: matching proteins with tandem mass spectra. Bioinformatics 20:1466–67 [Google Scholar]
Lin BY, Mo F, Hong X, Gao F, Du L. 56. et al. 2008. A compatible exon-exon junction database for the identification of exon skipping events using tandem mass spectrum data. BMC Bioinform. 9:537 [Google Scholar]
Power KA, McRedmond JP, de Stefani A, Gallagher WM, Gaora PO. 57. 2009. High-throughput proteomics detection of novel splice isoforms in human platelets. PLOS ONE 4:e5001 [Google Scholar]
Zhang F, Drabier R. 58. 2013. SASD: the Synthetic Alternative Splicing Database for identifying novel isoform from proteomics. BMC Bioinform. 14:S13 [Google Scholar]
Roos FF, Jacob R, Grossmann J, Fischer B, Buhmann JM. 59. et al. 2007. PepSplice: cache-efficient search algorithms for comprehensive identification of tandem mass spectra. Bioinformatics 23:3016–23 [Google Scholar]
Zhou A, Zhang F, Chen JY. 60. 2010. PEPPI: a peptidomic database of human protein isoforms for proteomics experiments. BMC Bioinform. 11:Suppl. 6S7 [Google Scholar]
Schandorff S, Olsen JV, Bunkenborg J, Blagoev B, Zhang Y. 61. et al. 2007. A mass spectrometry-friendly database for cSNP identification. Nat. Methods 4:465–66 [Google Scholar]
Li J, Su ZL, Ma ZQ, Slebos RJC, Halvey P. 62. et al. 2011. A bioinformatics workflow for variant peptide detection in shotgun proteomics. Mol. Cell. Proteom. 10:M110.006536 [Google Scholar]
Alves G, Ogurtsov AY, Yu YK. 63. 2008. RAId_DbS: mass-spectrometry based peptide identification web server with knowledge integration. BMC Genom. 9:505 [Google Scholar]
Xi H, Park JS, Ding GH, Lee YH, Li YX. 64. 2009. SysPIMP: the web-based systematical platform for identifying human disease-related mutated sequences from mass spectrometry. Nucleic Acids Res. 37:D913–20 [Google Scholar]
Yip YL, Famiglietti M, Gos A, Duek PD, David FP. 65. et al. 2008. Annotating single amino acid polymorphisms in the UniProt/Swiss-Prot knowledgebase. Hum. Mutat. 29:361–66 [Google Scholar]
Li J, Duncan DT, Zhang B. 66. 2010. CanProVar: a human cancer proteome variation database. Hum. Mutat. 31:219–28 [Google Scholar]
Forbes SA, Bindal N, Bamford S, Cole C, Kok CY. 67. et al. 2011. COSMIC: mining complete cancer genomes in the Catalogue of Somatic Mutations in Cancer. Nucleic Acids Res. 39:D945–50 [Google Scholar]
Menon R, Omenn GS. 68. 2011. Identification of alternatively spliced transcripts using a proteomic informatics approach. Data Mining in Proteomics: From Standards to Applications M Hamacher, M Eisenacher, C Stephan 319–26 New York: Humana [Google Scholar]
Kroll JE, Galante PAF, Ohara DT, Navarro FCP, Ohno-Machado L, de Souza SJ. 69. 2012. A new portal for the analysis of human splicing variants. RNA Biol. 9:1339–43 [Google Scholar]
Tavares R, de Miranda Scherer N, Pauletti BA, Araujo E, Folador EL. 70. et al. 2014. SpliceProt: a protein sequence repository of predicted human splice variants. Proteomics 14:181–85 [Google Scholar]
Frenkel-Morgenstern M, Gorohovski A, Vucenovic D, Maestre L, Valencia A. 71. 2015. ChiTaRS 2.1—an improved database of the chimeric transcripts and RNA-Seq data with novel sense-antisense chimeric RNA transcripts. Nucleic Acids Res. 43:D68–75 [Google Scholar]
Smigielski EM, Sirotkin K, Ward M, Sherry ST. 72. 2000. dbSNP: a database of single nucleotide polymorphisms. Nucleic Acids Res. 28:352–55 [Google Scholar]
Lappalainen I, Lopez J, Skipper L, Hefferon T, Spalding JD. 73. et al. 2013. dbVar and DGVa: public archives for genomic structural variation. Nucleic Acids Res. 41:D936–41 [Google Scholar]
Dasari S, Chambers MC, Slebos RJ, Zimmerman LJ, Ham AJL, Tabb DL. 74. 2010. TagRecon: high-throughput mutation identification through sequence tagging. J. Proteome Res. 9:1716–26 [Google Scholar]
Su Z-D, Sheng Q-H, Li Q-R, Chi H, Jiang X. 75. et al. 2014. De novo identification and quantification of single amino-acid variants in human brain. J. Mol. Cell Biol. 6:421–33 [Google Scholar]
Castellana NE, Pham V, Arnott D, Lill JR, Bafna V. 76. 2010. Template proteogenomics: sequencing whole proteins using an imperfect database. Mol. Cell. Proteom. 9:1260–70 [Google Scholar]
Castellana NE, McCutcheon K, Pham VC, Harden K, Nguyen A. 77. et al. 2011. Resurrection of a clinical antibody: template proteogenomic de novo proteomic sequencing and reverse engineering of an anti-lymphotoxin-α antibody. Proteomics 11:395–405 [Google Scholar]
Castellana N, Bafna V. 78. 2010. Proteogenomics to discover the full coding content of genomes: a computational perspective. J. Proteom. 73:2124–35 [Google Scholar]
Krug K, Carpy A, Behrends G, Matic K, Soares NC, Macek B. 79. 2013. Deep coverage of the Escherichia coli proteome enables the assessment of false discovery rates in simple proteogenomic experiments. Mol. Cell. Proteom. 12:3420–30 [Google Scholar]
Blakeley P, Overton IM, Hubbard SJ. 80. 2012. Addressing statistical biases in nucleotide-derived protein databases for proteogenomic search strategies. J. Proteome Res. 11:5221–34 [Google Scholar]
Bitton DA, Smith DL, Connolly Y, Scutt PJ, Miller CJ. 81. 2010. An integrated mass-spectrometry pipeline identifies novel protein coding-regions in the human genome. PLOS ONE 5:e8949 [Google Scholar]
Jagtap P, Goslinga J, Kooren JA, McGowan T, Wroblewski MS. 82. et al. 2013. A two-step database search method improves sensitivity in peptide sequence matches for metaproteomics and proteogenomics studies. Proteomics 13:1352–57 [Google Scholar]
Ning K, Fermin D, Nesvizhskii AI. 83. 2010. Computational analysis of unassigned high-quality MS/MS spectra in proteomic data sets. Proteomics 10:2712–18 [Google Scholar]
Helmy M, Sugiyama N, Tomita M, Ishihama Y. 84. 2012. Mass spectrum sequential subtraction speeds up searching large peptide MS/MS spectra datasets against large nucleotide databases for proteogenomics. Genes Cells 17:633–44 [Google Scholar]
Sevinsky JR, Cargile BJ, Bunger MK, Meng F, Yates NA. 85. et al. 2008. Whole genome searching with shotgun proteomic data: applications for genome annotation. J. Proteome Res. 7:80–88 [Google Scholar]
Branca RM, Orre LM, Johansson HJ, Granholm V, Huss M. 86. et al. 2014. HiRIEF LC-MS enables deep proteome coverage and unbiased proteogenomics. Nat. Methods 11:59–62 [Google Scholar]
Kim S, Gupta N, Bandeira N, Pevzner PA. 87. 2009. Spectral dictionaries integrating de novo peptide sequencing with database search of tandem mass spectra. Mol. Cell. Proteom. 8:53–69 [Google Scholar]
Ferro M, Tardift M, Reguer E, Cahuzac R, Bndey C. 88. et al. 2008. PepLine: a software pipeline for high-throughput direct mapping of tandem mass spectrometry data on genomic sequences. J. Proteome Res. 7:1873–83 [Google Scholar]
Tanner S, Shu HJ, Frank A, Wang LC, Zandi E. 89. et al. 2005. InsPecT: identification of posttransitionally modified peptides from tandem mass spectra. Anal. Chem. 77:4626–39 [Google Scholar]
Nesvizhskii AI, Roos FF, Grossmann J, Vogelzang M, Eddes JS. 90. et al. 2006. Dynamic spectrum quality assessment and iterative computational analysis of shotgun proteomic data—toward more efficient identification of post-translational modifications, sequence polymorphisms, and novel peptides. Mol. Cell. Proteom. 5:652–70 [Google Scholar]
Frank AM, Bandeira N, Shen Z, Tanner S, Briggs SP. 91. et al. 2008. Clustering millions of tandem mass spectra. J. Proteome Res. 7:113–22 [Google Scholar]
Chernobrovkin AL, Kopylav AT, Zgoda VG, Moysa AA, Pyatnitskiy MA. 92. et al. 2015. Methionine to isothreonine conversion as a source of false discovery identifications of genetically encoded variants in proteogenomics. J. Proteom. 120:169–78 [Google Scholar]
Wuehr M, Freeman RM, Presler M, Horb ME, Peshkin L. 93. et al. 2014. Deep proteomics of the Xenopus laevis egg using an mRNA-derived reference database. Curr. Biol. 24:1467–75 [Google Scholar]
Blakeley P, Siepen JA, Lawless C, Hubbard SJ. 94. 2010. Investigating protein isoforms via proteomics: a feasibility study. Proteomics 10:1127–40 [Google Scholar]
Guo X, Trudgian DC, Lemoff A, Yadavalli S, Mirzaei H. 95. 2014. Confetti: a multiprotease map of the HeLa proteome for comprehensive proteomics. Mol. Cell. Proteom. 13:1573–84 [Google Scholar]
Frank AM, Monroe ME, Shah AR, Carver JJ, Bandeira N. 96. et al. 2011. Spectral archives: extending spectral libraries to analyze both identified and unidentified spectra. Nat. Methods 8:587–591 [Google Scholar]
Risk BA, Spitzer WJ, Giddings MC. 97. 2013. Peppy: proteogenomic search software. J. Proteome Res. 12:3019–25 [Google Scholar]
Wen B, Xu SH, Sheynkman GM, Feng Q, Lin L. 98. et al. 2014. sapFinder: an R/Bioconductor package for detection of variant peptides in shotgun proteomics experiments. Bioinformatics 30:3136–38 [Google Scholar]
Wang X, Zhang B. 99. 2013. customProDB: an R package to generate customized protein databases from RNA-Seq data for proteomics search. Bioinformatics 29:3235–37 [Google Scholar]
Ghali F, Krishna R, Perkins S, Collins A, Xia D. 100. et al. 2014. ProteoAnnotator—open source proteogenomics annotation software supporting PSI standards. Proteomics 14:2731–41 [Google Scholar]
Jagtap PD, Johnson JE, Onsongo G, Sadler FW, Murray K. 101. et al. 2014. Flexible and accessible workflows for improved proteogenomic analysis using the Galaxy framework. J. Proteome Res. 13:5898–908 [Google Scholar]
Boekel J, Chilton JM, Cooke IR, Horvatovich PL, Jagtap PD. 102. et al. 2015. Multi-omic data analysis using Galaxy. Nat. Biotechnol. 33:137–39 [Google Scholar]
Pang CNI, Tay AP, Aya C, Twine NA, Harkness L. 103. et al. 2014. Tools to covisualize and coanalyze proteomic data with genomes and transcriptomes: validation of genes and alternative mRNA splicing. J. Proteome Res. 13:84–98 [Google Scholar]
Nagaraj SH, Waddell N, Madugundu AK, Wood S, Jones A. 104. et al. 2015. PGTools: a software suite for proteogenomic data analysis and visualization. J. Proteome Res. 14:2255–66 [Google Scholar]
Zhu YF, Hultin-Rosenberg L, Forshed J, Branca RM, Orre LM, Lehtiö J. 105. 2014. SpliceVista, a tool for splice variant identification and visualization in shotgun proteomics data. Mol. Cell. Proteom. 13:1552–62 [Google Scholar]
Sanders WS, Wang N, Bridges SM, Malone BM, Dandass YS. 106. et al. 2011. The proteogenomic mapping tool. BMC Bioinform. 12:7 [Google Scholar]
Kuhring M, Renard BY. 107. 2012. iPiG: integrating peptide spectrum matches into genome browser visualizations. PLOS ONE 7:e50246 [Google Scholar]
Ingram VM. 108. 1957. Gene mutations in human haemoglobin: chemical difference between normal and sickle cell haemoglobin. Nature 180:326–28 [Google Scholar]
Cavallo A, Martin ACR. 109. 2005. Mapping SNPs to protein sequence and structure data. Bioinformatics 21:1443–50 [Google Scholar]
Karchin R. 110. 2009. Next generation tools for the annotation of human SNPs. Brief. Bioinform. 10:35–52 [Google Scholar]
Gatlin CL, Eng JK, Cross ST, Detter JC, Yates JR. 111. 2000. Automated identification of amino acid sequence variations in proteins by HPLC/microspray tandem mass spectrometry. Anal. Chem. 72:757–63 [Google Scholar]
Bunger MK, Cargile BJ, Sevinsky JR, Deyanova E, Yates NA. 112. et al. 2007. Detection and validation of non-synonymous coding SNPs from orthogonal analysis of shotgun proteomics data. J. Proteome Res. 6:2331–40 [Google Scholar]
Chen M, Yang B, Ying WT, He FC, Qian XH. 113. 2010. Annotation of non-synonymous single polymorphisms in human liver proteome by mass spectrometry. Protein Pept. Lett. 17:277–86 [Google Scholar]
Song C, Wang F, Cheng K, Wei X, Bian Y. 114. et al. 2013. Large-scale quantification of single amino-acid variations by a variation-associated database search strategy. J. Proteome Res. 13:241–48 [Google Scholar]
Krug K, Popic S, Carpy A, Taumer C, Macek B. 115. 2014. Construction and assessment of individualized proteogenomic databases for large-scale analysis of nonsynonymous single nucleotide variants. Proteomics 14:2699–708 [Google Scholar]
Zhang B, Wang J, Wang X, Zhu J, Liu Q. 116. et al. 2014. Proteogenomic characterization of human colon and rectal cancer. Nature 513:382–87 [Google Scholar]
Yan H, Yuan WS, Velculescu VE, Vogelstein B, Kinzler KW. 117. 2002. Allelic variation in human gene expression. Science 297:1143 [Google Scholar]
MacArthur DG, Balasubramanian S, Frankish A, Huang N, Morris J. 118. et al. 2012. A systematic survey of loss-of-function variants in human protein-coding genes. Science 335:823–28 [Google Scholar]
Rivas MA, Pirinen M, Conrad DF, Lek M, Tsang EK. 119. et al. 2015. Effect of predicted protein-truncating genetic variants on the human transcriptome. Science 348:666–69 [Google Scholar]
Geiger T, Cox J, Mann M. 120. 2010. Proteomic changes resulting from gene copy number variations in cancer cells. PLOS Genet. 6:e1001090 [Google Scholar]
Mertens F, Johansson B, Fioretos T, Mitelman F. 121. 2015. The emerging complexity of gene fusions in cancer. Nat. Rev. Cancer 15:371–81 [Google Scholar]
Sun H, Xing X, Li J, Zhou F, Chen Y. 122. et al. 2013. Identification of gene fusions from human lung cancer mass spectrometry data. BMC Genom. 14:S5 [Google Scholar]
Conlon KP, Basrur V, Rolland D, Wolfe T, Nesvizhskii AI. 123. et al. 2013. Fusion peptides from oncogenic chimeric proteins as putative specific biomarkers of cancer. Mol. Cell. Proteom. 12:2714–23 [Google Scholar]
Kim D, Salzberg SL. 124. 2011. TopHat-Fusion: an algorithm for discovery of novel fusion transcripts. Genome Biol. 12:15 [Google Scholar]
Casado-Vela J, Lacal JC, Elortza F. 125. 2013. Protein chimerism: novel source of protein diversity in humans adds complexity to bottom-up proteomics. Proteomics 13:5–11 [Google Scholar]
Li MY, Wang IX, Li Y, Bruzel A, Richards AL. 126. et al. 2011. Widespread RNA and DNA sequence differences in the human transcriptome. Science 333:53–58 [Google Scholar]
Kroll JE, de Souza SJ, de Souza GA. 127. 2014. Identification of rare alternative splicing events in MS/MS data reveals a significant fraction of alternative translation initiation sites. PeerJ 2:e673 [Google Scholar]
Mele M, Ferreira PG, Reverter F, DeLuca DS, Monlong J. 128. et al. 2015. The human transcriptome across tissues and individuals. Science 348:660–65 [Google Scholar]
Ning K, Nesvizhskii AI. 129. 2010. The utility of mass spectrometry-based proteomic data for validation of novel alternative splice forms reconstructed from RNA-Seq data: a preliminary assessment. BMC Bioinform. 11:Suppl. 11S14 [Google Scholar]
Low TY, van Heesch S, van den Toorn H, Giansanti P, Cristobal A. 130. et al. 2013. Quantitative and qualitative proteome characteristics extracted from in-depth integrated genomics and proteomics analysis. Cell Rep. 5:1469–78 [Google Scholar]
Gawron D, Gevaert K, Van Damme P. 131. 2014. The proteome under translational control. Proteomics 14:2647–59 [Google Scholar]
Gevaert K, Goethals M, Martens L, Van Damme J, Staes A. 132. et al. 2003. Exploring proteomes and analyzing protein processing by mass spectrometric identification of sorted N-terminal peptides. Nat. Biotechnol. 21:566–69 [Google Scholar]
Frith MC, Forrest AR, Nourbakhsh E, Pang KC, Kai C. 133. et al. 2006. The abundance of short proteins in the mammalian proteome. PLOS Genet. 2:515–28 [Google Scholar]
Andrews SJ, Rothnagel JA. 134. 2014. Emerging evidence for functional peptides encoded by short open reading frames. Nat. Rev. Genet. 15:193–204 [Google Scholar]
Oyama M, Itagaki C, Hata H, Suzuki Y, Izumi T. 135. et al. 2004. Analysis of small human proteins reveals the translation of upstream open reading frames of mRNAs. Genome Res. 14:2048–52 [Google Scholar]
Slavoff SA, Mitchell AJ, Schwaid AG, Cabili MN, Ma J. 136. et al. 2013. Peptidomic discovery of short open reading frame-encoded peptides in human cells. Nat. Chem. Biol. 9:59–64 [Google Scholar]
Ma J, Ward CC, Jungreis I, Slavoff SA, Schwaid AG. 137. et al. 2014. Discovery of human sORF-encoded polypeptides (SEPs) in cell lines and tissue. J. Proteome Res. 13:1757–65 [Google Scholar]
Vanderperre B, Lucier J-F, Bissonnette C, Motard J, Tremblay G. 138. et al. 2013. Direct detection of alternative open reading frames translation products in human significantly expands the proteome. PLOS ONE 8:e70698 [Google Scholar]
Vanderperre B, Lucier J-F, Roucou X. 139. 2012. HAltORF: a database of predicted out-of-frame alternative open reading frames in human. Database 2012:bas025 [Google Scholar]
Bianga J, Touat-Hamici Z, Bierla K, Mounicou S, Szpunar J. 140. et al. 2014. Speciation analysis for trace levels of selenoproteins in cultured human cells. J. Proteom. 108:316–24 [Google Scholar]
Belew AT, Meskauskas A, Musalgaonkar S, Advani VM, Sulima SO. 141. et al. 2014. Ribosomal frameshifting in the CCR5 mRNA is regulated by miRNAs and the NMD pathway. Nature 512:265–69 [Google Scholar]
Brosch M, Saunders GI, Frankish A, Collins MO, Yu L. 142. et al. 2011. Shotgun proteomics aids discovery of novel protein-coding genes, alternative splicing, and “resurrected” pseudogenes in the mouse genome. Genome Res. 21:756–67 [Google Scholar]
Ruiz-Orera J, Messeguer X, Subirana JA, Alba MM. 143. 2014. Long non-coding RNAs as a source of new peptides. eLife 3:e03523 [Google Scholar]
Sun H, Chen C, Shi M, Wang D, Liu M. 144. et al. 2014. Integration of mass spectrometry and RNA-Seq data to confirm human ab initio predicted genes and lncRNAs. Proteomics 14:2760–68 [Google Scholar]
Prabakaran S, Hemberg M, Chauhan R, Winter D, Tweedie-Cullen RY. 145. et al. 2014. Quantitative profiling of peptides from RNAs classified as noncoding. Nat. Commun. 5:10 [Google Scholar]
Khan Z, Ford MJ, Cusanovich DA, Mitrano A, Pritchard JK, Gilad Y. 146. 2013. Primate transcript and protein expression levels evolve under compensatory selection pressures. Science 342:1100–4 [Google Scholar]
Horvatovich P, Franke L, Bischoff R. 147. 2014. Proteomic studies related to genetic determinants of variability in protein concentrations. J. Proteome Res. 13:5–14 [Google Scholar]
Wu LF, Candille SI, Choi Y, Xie D, Jiang LH. 148. et al. 2013. Variation and genetic control of protein abundance in humans. Nature 499:79–82 [Google Scholar]
Battle A, Khan Z, Wang SH, Mitrano A, Ford MJ. 149. et al. 2015. Impact of regulatory variation from RNA to protein. Science 347:664–67 [Google Scholar]
Wu J-R, Zeng R. 150. 2012. Molecular basis for population variation: from SNPs to SAPs. FEBS Lett. 586:2841–45 [Google Scholar]
Alfaro JA, Sinha A, Kislinger T, Boutros PC. 151. 2014. Onco-proteogenomics: Cancer proteomics joins forces with genomics. Nat. Methods 11:1107–13 [Google Scholar]
Yang X, Lazar IM. 152. 2014. XMAn: a Homo sapiens mutated-peptide database for the MS analysis of cancerous cell states. J. Proteome Res. 13:5486–95 [Google Scholar]
Huang PJ, Lee CC, Tan BCM, Yeh YM, Chu LJ. 153. et al. 2015. CMPD: cancer mutant proteome database. Nucleic Acids Res. 43:D849–55 [Google Scholar]
154. Ellis MJ, Gillette M, Carr SA, Paulovich AG, Smith RD, et al. 2013. Connecting genomic alterations to cancer biology with proteomics: the NCI Clinical Proteomic Tumor Analysis Consortium. Cancer Discov. 3:1108–12
155. Halvey PJ, Wang XJ, Wang J, Bhat AA, Dhawan P, et al. 2014. Proteogenomic analysis reveals unanticipated adaptations of colorectal tumor cells to deficiencies in DNA mismatch repair. Cancer Res. 74:387–97
156. Nie S, Yin H, Tan Z, Anderson MA, Ruffin MT, et al. 2014. Quantitative analysis of single amino acid variant peptides associated with pancreatic cancer in serum by an isobaric labeling quantitative method. J. Proteome Res. 13:6058–66
157. Karpova MA, Karpov DS, Ivanov MV, Pyatnitskiy MA, Chernobrovkin AL, et al. 2014. Exome-driven characterization of the cancer cell lines at the proteome level: the NCI-60 case study. J. Proteome Res. 13:5551–60
158. Menon R, Zhang Q, Zhang Y, Fermin D, Bardeesy N, et al. 2009. Identification of novel alternative splice isoforms of circulating proteins in a mouse model of human pancreatic cancer. Cancer Res. 69:300–9
159. Menon R, Im H, Zhang E, Wu S-L, Chen R, et al. 2014. Distinct splice variants and pathway enrichment in the cell-line models of aggressive human breast cancer subtypes. J. Proteome Res. 13:212–27
160. Wang Q, Chaerkady R, Wu JA, Hwang HJ, Papadopoulos N, et al. 2011. Mutant proteins as cancer-specific biomarkers. PNAS 108:2444–49
161. Olsen L, Campos B, Winther O, Sgroi DC, Karger BL, Brusic V. 2014. Tumor antigens as proteogenomic biomarkers in invasive ductal carcinomas. BMC Med. Genom. 7:Suppl. 3S2
162. Mathivanan S, Ji H, Tauro BJ, Chen YS, Simpson RJ. 2012. Identifying mutated proteins secreted by colon cancer cell lines using mass spectrometry. J. Proteom. 76:141–49
163. Barnidge DR, Tschumper RC, Theis JD, Snyder MR, Jelinek DF, et al. 2014. Monitoring M-proteins in patients with multiple myeloma using heavy-chain variable region clonotypic peptides and LC-MS/MS. J. Proteome Res. 13:1905–10
164. Dasari S, Theis JD, Vrana JA, Meureta OM, Quint PS, et al. 2015. Proteomic detection of immunoglobulin light chain variable region peptides from amyloidosis patient biopsies. J. Proteome Res. 14:1957–67
165. Lavinder JJ, Horton AP, Georgiou G, Ippolito GC. 2015. Next-generation sequencing and protein mass spectrometry for the comprehensive analysis of human cellular and serum antibody repertoires. Curr. Opin. Chem. Biol. 24:112–20
166. Steijger T, Abril JF, Engstrom PG, Kokocinski F, Hubbard TJ, et al. 2013. Assessment of transcript reconstruction methods for RNA-Seq. Nat. Methods 10:1177–84
167. Sharon D, Tilgner H, Grubert F, Snyder M. 2013. A single-molecule long-read survey of the human transcriptome. Nat. Biotechnol. 31:1009–14
168. Tran JC, Zamdborg L, Ahlf DR, Lee JE, Catherman AD, et al. 2011. Mapping intact protein isoforms in discovery mode using top-down proteomics. Nature 480:254–58
169. Smith LM, Kelleher NL, Consort. Top Down Proteom. 2013. Proteoform: a single term describing protein complexity. Nat. Methods 10:186–87