Most epidemiology textbooks that discuss models are vague on details of model selection. This lack of detail may be understandable since selection should be strongly influenced by features of the particular study, including contextual (prior) information about covariates that may confound, modify, or mediate the effect under study. It is thus important that authors document their modeling goals and strategies and understand the contextual interpretation of model parameters and model selection criteria. To illustrate this point, we review several established strategies for selecting model covariates, describe their shortcomings, and point to refinements, assuming that the main goal is to derive the most accurate effect estimates obtainable from the data and available resources. This goal shifts the focus to prediction of exposure or potential outcomes (or both) to adjust for confounding; it thus differs from the goal of ordinary statistical modeling, which is to passively predict outcomes. Nonetheless, methods and software for passive prediction can be used for causal inference as well, provided that the target parameters are shifted appropriately.




  • Article Type: Review Article