Abstract

Penalized (or regularized) regression, as represented by the lasso and its variants, has become a standard technique for analyzing high-dimensional data in which the number of variables substantially exceeds the sample size. The performance of penalized regression relies crucially on the choice of the tuning parameter, which determines the amount of regularization and hence the sparsity level of the fitted model. The optimal choice of the tuning parameter depends on both the structure of the design matrix and the unknown random error distribution (variance, tail behavior, etc.). This article reviews the current literature on tuning parameter selection for high-dimensional regression from both theoretical and practical perspectives. We discuss various strategies that choose the tuning parameter to achieve prediction accuracy or support recovery. We also review several recently proposed methods for tuning-free high-dimensional regression.
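
To fix ideas, a standard lasso formulation (generic notation, not tied to this article's exposition; y is the n-vector of responses, X the n × p design matrix, and the objective is written with the common 1/(2n) scaling) is

\hat{\beta}(\lambda) \;=\; \operatorname*{arg\,min}_{\beta \in \mathbb{R}^{p}} \left\{ \frac{1}{2n}\,\lVert y - X\beta \rVert_{2}^{2} \;+\; \lambda\,\lVert \beta \rVert_{1} \right\}, \qquad \lambda > 0.

Larger values of the tuning parameter λ shrink more coefficients exactly to zero and thus yield sparser fitted models, while λ → 0 approaches the least-squares fit (when p ≤ n); choosing the tuning parameter therefore amounts to choosing a point on this regularization path.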
