Abstract

Instrumental variables (IVs) are widely used to study the causal effect of an exposure on an outcome in the presence of unmeasured confounding. IVs require an instrument, a variable that (A1) is associated with the exposure, (A2) has no direct effect on the outcome except through the exposure, and (A3) is not related to unmeasured confounders. Unfortunately, finding variables that satisfy conditions (A2) or (A3) can be challenging in practice. This article reviews work on instruments that may not satisfy conditions (A2) or (A3), which we refer to as invalid instruments. We review identification and inference under different violations of (A2) or (A3), specifically under linear models, nonlinear models, and heteroskedastic models. We conclude with an empirical comparison of various methods by reanalyzing the effect of body mass index on systolic blood pressure from the UK Biobank.
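
As a rough illustration of why an invalid instrument matters, the sketch below simulates a single candidate instrument under a linear model and compares the simple ratio (Wald) IV estimate when the instrument has no direct effect on the outcome with the estimate when it does. The data-generating model, the coefficient values, and the function names `simulate` and `wald_iv_estimate` are assumptions made for this example only; they are not taken from the article or from any of the reviewed methods.

```python
import numpy as np


def wald_iv_estimate(z, d, y):
    """Single-instrument IV (Wald) estimate: cov(z, y) / cov(z, d)."""
    return np.cov(z, y)[0, 1] / np.cov(z, d)[0, 1]


def simulate(direct_effect, n=200_000, beta=1.0, seed=0):
    """Draw (z, d, y) from an assumed linear model.

    direct_effect is the coefficient of the instrument z in the outcome
    equation; a nonzero value violates the 'no direct effect' condition.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                   # candidate instrument
    u = rng.normal(size=n)                   # unmeasured confounder
    d = 0.8 * z + u + rng.normal(size=n)     # exposure, associated with z
    y = beta * d + direct_effect * z + u + rng.normal(size=n)  # outcome
    return z, d, y


# Valid instrument: the estimate should be close to the true effect beta = 1.0.
valid = wald_iv_estimate(*simulate(direct_effect=0.0))

# Invalid instrument: the estimate is shifted by roughly
# direct_effect / 0.8 = 0.5, however large the sample is.
invalid = wald_iv_estimate(*simulate(direct_effect=0.4))

print(f"valid instrument:   {valid:.3f}")
print(f"invalid instrument: {invalid:.3f}")
```

In this assumed setup the bias from the direct effect does not shrink with sample size, which is the kind of problem that methods for invalid instruments are meant to address.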

/content/journals/10.1146/annurev-statistics-112723-034721
2025-03-07
2025-04-19
  • Article Type: Review Article