
Abstract

Penalized (or regularized) regression, as represented by the lasso and its variants, has become a standard technique for analyzing high-dimensional data when the number of variables substantially exceeds the sample size. The performance of penalized regression relies crucially on the choice of the tuning parameter, which determines the amount of regularization and hence the sparsity level of the fitted model. The optimal choice of tuning parameter depends on both the structure of the design matrix and the unknown random error distribution (variance, tail behavior, etc.). This article reviews the current literature on tuning parameter selection for high-dimensional regression from both theoretical and practical perspectives. We discuss various strategies for choosing the tuning parameter to achieve prediction accuracy or support recovery. We also review several recently proposed methods for tuning-free high-dimensional regression.
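The role of the tuning parameter described above can be seen in a minimal sketch (not from the article; it uses scikit-learn's `Lasso`/`LassoCV` on synthetic data with more variables than observations): a larger tuning parameter imposes more regularization and yields a sparser fit, and cross-validation is one common data-driven way to select it.

```python
# Minimal illustration: the lasso tuning parameter (alpha in scikit-learn)
# controls the sparsity of the fitted model; cross-validation selects it.
import numpy as np
from sklearn.linear_model import Lasso, LassoCV

rng = np.random.default_rng(0)
n, p, s = 50, 200, 5                 # sample size, dimension, true sparsity
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:s] = 2.0                       # only the first 5 coefficients are nonzero
y = X @ beta + 0.5 * rng.standard_normal(n)

# Larger tuning parameter -> more regularization -> fewer nonzero coefficients.
for alpha in (0.01, 0.1, 1.0):
    fit = Lasso(alpha=alpha).fit(X, y)
    print(f"alpha={alpha}: {np.sum(fit.coef_ != 0)} nonzero coefficients")

# Cross-validation chooses alpha from a data-driven grid.
cv_fit = LassoCV(cv=5).fit(X, y)
print("CV-selected alpha:", cv_fit.alpha_)
```

This sketch shows only the cross-validation strategy; the criteria-based and tuning-free approaches surveyed in the article follow different selection rules.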


  • Article Type: Review Article