Multiset Statistics for Gene Set Analysis

Michael A. Newton; Zhishi Wang

doi:10.1146/annurev-statistics-010814-020335

Annual Review of Statistics and Its Application

Volume 2, 2015

Review Article

Free

Michael A. Newton¹,² and Zhishi Wang¹

Affiliations: ¹Department of Statistics and ²Department of Biostatistics and Medical Informatics, University of Wisconsin, Madison, Wisconsin 53706
Vol. 2:95-111 (Volume publication date May 2015) https://doi.org/10.1146/annurev-statistics-010814-020335
First published as a Review in Advance on January 19, 2015
© Annual Reviews

Abstract

An important data analysis task in statistical genomics involves the integration of genome-wide gene-level measurements with preexisting data on the same genes. A wide variety of statistical methodologies and computational tools have been developed for this general task. We emphasize one particular distinction among methodologies, namely whether they process gene sets one at a time (uniset) or simultaneously via some multiset technique. Owing to the complexity of collections of gene sets, the multiset approach offers some advantages, as it naturally accommodates set-size variations and among-set overlaps. However, this approach presents both computational and inferential challenges. After reviewing some statistical issues that arise in uniset analysis, we examine two model-based multiset methods for gene list data.

Keywords: gene set enrichment, role model, statistical genomics
