Abstract

Methods for handling missing data in clinical psychology studies are reviewed. Missing data are defined, and a taxonomy of main approaches to analysis is presented, including complete-case and available-case analysis, weighting, maximum likelihood, Bayes, single and multiple imputation, and augmented inverse probability weighting. Missingness mechanisms, which play a key role in the performance of alternative methods, are defined. Approaches to robust inference, and to inference when the mechanism is potentially missing not at random, are discussed.

/content/journals/10.1146/annurev-clinpsy-080822-051727
2024-07-12
2025-04-21
  • Article Type: Review Article