Variable selection and model selection methods are indispensable statistical tools for almost any statistical modeling question. This review first considers the use of information criteria for model selection. Such criteria rank the candidate models, and the top-ranked model is selected. Different modeling goals may call for different criteria. Second, the effect of including a penalty in the estimation process is discussed. Third, nonparametric estimation is discussed; it involves several aspects of model choice, such as the choice of estimator and the selection of tuning parameters. Fourth, model averaging approaches are reviewed, in which estimators from different models are weighted to produce one final estimator. There are several ways to choose the weights, and most of them result in data-driven, hence random, weights. Finally, challenges for inference after model selection and for inference with model-averaged estimators are discussed.
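To make the first and fourth themes concrete, the sketch below ranks a few nested linear regression models by two familiar information criteria (AIC and BIC) and then converts the AIC values into data-driven averaging weights. It is an illustration under stated assumptions, not the review's own procedure: the simulated data, the Gaussian linear model, and the helper names `gaussian_ic` and `aic_weights` are all hypothetical, and the exponential weighting shown (often called Akaike weights) is only one of several possible weight choices.

```python
import numpy as np

def gaussian_ic(y, X):
    """Return (AIC, BIC) for a least squares fit, assuming Gaussian errors."""
    n, p = X.shape
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    sigma2 = rss / n  # maximum likelihood estimate of the error variance
    loglik = -0.5 * n * (np.log(2 * np.pi * sigma2) + 1)
    k = p + 1  # regression coefficients plus the error variance
    return -2 * loglik + 2 * k, -2 * loglik + np.log(n) * k

def aic_weights(aic):
    """Turn AIC values into weights proportional to exp(-AIC / 2)."""
    aic = np.asarray(aic, dtype=float)
    w = np.exp(-0.5 * (aic - aic.min()))  # subtract the minimum for stability
    return w / w.sum()

# Simulated data (an assumption for illustration): only x1 affects y.
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=(n, 3))
y = 1.0 + 2.0 * x[:, 0] + rng.normal(size=n)

ones = np.ones((n, 1))
designs = {
    "intercept only": ones,
    "x1": np.hstack([ones, x[:, :1]]),
    "x1 + x2": np.hstack([ones, x[:, :2]]),
    "x1 + x2 + x3": np.hstack([ones, x]),
}

scores = {name: gaussian_ic(y, X) for name, X in designs.items()}
best_aic = min(scores, key=lambda m: scores[m][0])
best_bic = min(scores, key=lambda m: scores[m][1])
weights = aic_weights([s[0] for s in scores.values()])

for (name, (aic, bic)), w in zip(scores.items(), weights):
    print(f"{name:15s} AIC={aic:8.2f} BIC={bic:8.2f} weight={w:.3f}")
print("selected by AIC:", best_aic, "| selected by BIC:", best_bic)
```

Because the weights are computed from the same data used to fit the models, they are random quantities, which is one source of the inferential challenges the abstract refers to.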






  • Article Type: Review Article