Handling Missing Data in Instrumental Variable Methods for Causal Inference

Edward H. Kennedy; Jacqueline A. Mauro; Michael J. Daniels; Natalie Burns; Dylan S. Small

doi:10.1146/annurev-statistics-031017-100353

Annual Review of Statistics and Its Application

Volume 6, 2019

Review Article

Edward H. Kennedy¹, Jacqueline A. Mauro¹, Michael J. Daniels², Natalie Burns², and Dylan S. Small³

Affiliations: ¹Department of Statistics and Data Science, Carnegie Mellon University, Pittsburgh, Pennsylvania 15213, USA; ²Department of Statistics, University of Florida, Gainesville, Florida 32611, USA; ³Department of Statistics, The Wharton School, University of Pennsylvania, Philadelphia, Pennsylvania 19104, USA; email: [email protected]
Vol. 6:125-148 (Volume publication date March 2019) https://doi.org/10.1146/annurev-statistics-031017-100353
First published as a Review in Advance on November 28, 2018
Copyright © 2019 by Annual Reviews. All rights reserved

Abstract

In instrumental variable studies, missing instrument data are very common. For example, in the Wisconsin Longitudinal Study, one can use genotype data as a Mendelian randomization–style instrument, but this information is often missing when subjects do not contribute saliva samples or when the genotyping platform output is ambiguous. Here we review missing-at-random assumptions one can use to identify instrumental variable causal effects, and discuss various approaches for estimation and inference. We consider likelihood-based methods, regression and weighting estimators, and doubly robust estimators. The likelihood-based methods yield the most precise inference and are optimal under the model assumptions, while the doubly robust estimators can attain the nonparametric efficiency bound while allowing flexible nonparametric estimation of nuisance functions (e.g., instrument propensity scores). The regression and weighting estimators can sometimes be easiest to describe and implement. Our main contribution is an extensive review of this wide array of estimators under varied missing-at-random assumptions, along with discussion of asymptotic properties and inferential tools. We also implement many of the estimators in an analysis of the Wisconsin Longitudinal Study to study effects of impaired cognitive functioning on depression.

Keywords: causal inference, instrumental variable, missing data, observational study, semiparametric efficiency
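
To make the idea of a weighting estimator concrete, the following is a minimal sketch and not the authors' implementation. It assumes a simulated data set (all variable names, the data-generating process, and the missingness mechanism are hypothetical) in which a binary instrument Z is missing with a probability that depends only on the always-observed outcome, so that missingness is at random given observed data. Complete cases are reweighted by the inverse of an estimated probability of observing the instrument, and a weighted Wald ratio is then computed. The true effect is set to 2 by construction.

```python
import random


def simulate(n, seed=0):
    """Simulate (z, a, y, s, r) rows; all quantities here are hypothetical."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0, 1)                      # unmeasured confounder
        z = 1 if rng.random() < 0.5 else 0       # binary instrument
        # Treatment uptake is monotone in z and confounded by u.
        a = 1 if u + rng.gauss(0, 1) > 0.75 - 1.5 * z else 0
        y = 2.0 * a + u + rng.gauss(0, 1)        # outcome; true effect is 2
        s = 1 if y > 1.0 else 0                  # observed stratum driving missingness
        # r = 1 means the instrument is observed; depends on y only (MAR).
        r = 1 if rng.random() < (0.9 if s else 0.3) else 0
        rows.append((z, a, y, s, r))
    return rows


def weighted_wald(rows, weights):
    """Weighted Wald ratio: effect of z on y divided by effect of z on a."""
    def wmean(col, zval):
        num = sum(w * row[col] for row, w in zip(rows, weights) if row[0] == zval)
        den = sum(w for row, w in zip(rows, weights) if row[0] == zval)
        return num / den
    return (wmean(2, 1) - wmean(2, 0)) / (wmean(1, 1) - wmean(1, 0))


def ipw_wald(rows):
    """Inverse-probability-weighted Wald estimator using complete cases only."""
    # Estimate P(R = 1 | S = s) from stratum proportions; this step uses only
    # s and r, which are observed for every unit.
    pi_hat = {}
    for s in (0, 1):
        stratum = [row for row in rows if row[3] == s]
        pi_hat[s] = sum(row[4] for row in stratum) / len(stratum)
    # The instrument value row[0] is used only where r = 1.
    complete = [row for row in rows if row[4] == 1]
    weights = [1.0 / pi_hat[row[3]] for row in complete]
    return weighted_wald(complete, weights)


def complete_case_wald(rows):
    """Unweighted Wald estimator that simply drops units with missing z."""
    complete = [row for row in rows if row[4] == 1]
    return weighted_wald(complete, [1.0] * len(complete))


if __name__ == "__main__":
    data = simulate(200_000, seed=1)
    print("IPW Wald estimate:          ", round(ipw_wald(data), 3))
    print("Complete-case Wald estimate:", round(complete_case_wald(data), 3))
```

In this sketch the weighted estimate should land near the simulated effect of 2 in large samples, because the weights undo the outcome-dependent selection into the complete cases. The unweighted complete-case estimate is included only for comparison and is not expected to be consistent under this missingness mechanism. The stratum-proportion model for the weights is a deliberate simplification; in practice the probability of observing the instrument would be modeled as a function of whatever variables the chosen missing-at-random assumption conditions on.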




Supplemental Material

Supplementary Data

Download Supplemental Appendix 1 (PDF).
