Abstract

In instrumental variable studies, missing instrument data are very common. For example, in the Wisconsin Longitudinal Study, one can use genotype data as a Mendelian randomization–style instrument, but this information is often missing when subjects do not contribute saliva samples or when the genotyping platform output is ambiguous. Here we review missing-at-random assumptions one can use to identify instrumental variable causal effects, and discuss various approaches for estimation and inference. We consider likelihood-based methods, regression and weighting estimators, and doubly robust estimators. The likelihood-based methods yield the most precise inference and are optimal under the model assumptions, while the doubly robust estimators can attain the nonparametric efficiency bound while allowing flexible nonparametric estimation of nuisance functions (e.g., instrument propensity scores). The regression and weighting estimators can sometimes be easiest to describe and implement. Our main contribution is an extensive review of this wide array of estimators under varied missing-at-random assumptions, along with discussion of asymptotic properties and inferential tools. We also implement many of the estimators in an analysis of the Wisconsin Longitudinal Study, to study effects of impaired cognitive functioning on depression.
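As a rough illustration of the weighting idea mentioned in the abstract, the sketch below reweights complete cases by the inverse of an estimated probability that the instrument is observed, then forms a simple Wald ratio. It is not the authors' estimator. It assumes, purely for illustration, a binary instrument that is as good as randomized, a single discrete covariate `x`, and missingness of the instrument that depends only on `x`. The function name `ipw_wald` and the record layout `(x, z, a, y)` are hypothetical.

```python
from collections import defaultdict


def ipw_wald(records):
    """Inverse-probability-weighted Wald ratio (illustrative sketch).

    records: iterable of (x, z, a, y) where
      x = discrete covariate, z = binary instrument (None if missing),
      a = treatment, y = outcome.
    Assumption: P(z observed) depends only on x.
    """
    records = list(records)

    # Step 1: estimate P(z observed | x) by the observed fraction per stratum.
    n_total = defaultdict(int)
    n_observed = defaultdict(int)
    for x, z, a, y in records:
        n_total[x] += 1
        if z is not None:
            n_observed[x] += 1

    # Step 2: weighted sums over complete cases, per instrument arm.
    # Each arm stores [sum of weights, sum of weight*a, sum of weight*y].
    sums = {0: [0.0, 0.0, 0.0], 1: [0.0, 0.0, 0.0]}
    for x, z, a, y in records:
        if z is None:
            continue  # incomplete case: represented by up-weighted complete cases
        w = n_total[x] / n_observed[x]  # inverse of the estimated observation probability
        sums[z][0] += w
        sums[z][1] += w * a
        sums[z][2] += w * y

    # Step 3: Wald ratio of weighted mean differences.
    # Raises ZeroDivisionError if an arm is empty or the instrument
    # has no weighted association with treatment.
    mean_a = {z: sums[z][1] / sums[z][0] for z in (0, 1)}
    mean_y = {z: sums[z][2] / sums[z][0] for z in (0, 1)}
    return (mean_y[1] - mean_y[0]) / (mean_a[1] - mean_a[0])


# Toy data: the instrument is fully observed when x == 0 and half observed when x == 1.
toy = [
    (0, 1, 1, 3.0), (0, 1, 0, 1.0), (0, 0, 0, 1.0), (0, 0, 0, 1.0),
    (1, 1, 1, 5.0), (1, 0, 0, 2.0), (1, None, 1, 5.0), (1, None, 0, 2.0),
]
estimate = ipw_wald(toy)
```

In this toy data, complete cases with `x == 1` each receive weight 2 because only half of that stratum has an observed instrument. Stratum-frequency weights are the simplest possible choice; with richer covariates the observation probability would need to be modeled instead.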

/content/journals/10.1146/annurev-statistics-031017-100353
2019-03-07
2024-10-06


  • Article Type: Review Article