Global Testing and Large-Scale Multiple Testing for High-Dimensional Covariance Structures

T. Tony Cai

doi:10.1146/annurev-statistics-060116-053754

Annual Review of Statistics and Its Application

Volume 4, 2017

Review Article



Affiliation: Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104
Vol. 4:423-446 (Volume publication date March 2017) https://doi.org/10.1146/annurev-statistics-060116-053754
First published as a Review in Advance on October 06, 2016
© Annual Reviews

Abstract

Driven by a wide range of contemporary applications, statistical inference for covariance structures has been an active area of research in high-dimensional statistics. This review provides a selective survey of some recent developments in hypothesis testing for high-dimensional covariance structures, including global testing for the overall pattern of the covariance structures and simultaneous testing of a large collection of hypotheses on the local covariance structures with false discovery proportion and false discovery rate control. Both one-sample and two-sample settings are considered. The specific testing problems discussed include global testing for the covariance, correlation, and precision matrices, and multiple testing for the correlations, Gaussian graphical models, and differential networks.

Keywords: correlation matrix, covariance matrix, differential network, false discovery proportion, false discovery rate, Gaussian graphical model, global testing, multiple testing, null distribution, precision matrix, sparsity, thresholding


