
Abstract

Regularization is a widely used technique throughout statistics, machine learning, and applied mathematics. Modern applications in science and engineering lead to massive and complex data sets, which motivate the use of more structured types of regularizers. This survey provides an overview of the use of structured regularization in high-dimensional statistics, including regularizers for group-structured and hierarchical sparsity, low-rank matrices, additive and multiplicative matrix decomposition, and high-dimensional nonparametric models. It includes various examples with motivating applications; it also covers key aspects of statistical theory and provides some discussion of efficient algorithms.
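As a concrete anchor for the estimators surveyed here, most of the methods discussed can be cast, in a minimal sketch using illustrative notation (not the article's own), as penalized M-estimation:

\[
\hat{\theta} \in \arg\min_{\theta \in \Omega} \Big\{ \mathcal{L}_n(\theta; Z_1^n) + \lambda_n \, \mathcal{R}(\theta) \Big\},
\]

where \(\mathcal{L}_n\) is a loss function of the parameter \(\theta\) and the \(n\) observations, \(\lambda_n > 0\) is a regularization weight, and \(\mathcal{R}\) is a structured regularizer. Illustrative choices of \(\mathcal{R}\) matching the structures named above include the \(\ell_1\) norm \(\|\theta\|_1\) for elementwise sparsity, the group norm \(\sum_{g \in \mathcal{G}} \|\theta_g\|_2\) over a partition \(\mathcal{G}\) of the coordinates for group-structured sparsity, and the nuclear norm \(\|\Theta\|_* = \sum_j \sigma_j(\Theta)\) (the sum of singular values) for low-rank matrix estimation.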

