A Practical Guide to Weak Instruments

Michael P. Keane; Timothy Neal

doi:10.1146/annurev-economics-092123-111021

Annual Review of Economics

Review Article

Michael P. Keane¹,²,⁴ and Timothy Neal²,³,⁴

Affiliations: ¹Carey School of Business and Department of Economics, Johns Hopkins University, Baltimore, Maryland, USA; email: [email protected] ²School of Economics, University of New South Wales, Sydney, New South Wales, Australia; email: [email protected] ³Institute for Climate Risk and Response, University of New South Wales, Sydney, New South Wales, Australia ⁴Australian Research Council (ARC) Centre of Excellence in Population Ageing Research (CEPAR), Sydney, New South Wales, Australia
Vol. 16 https://doi.org/10.1146/annurev-economics-092123-111021
Copyright © 2024 by the author(s). All rights reserved.

Abstract

We survey the weak instrumental variables (IV) literature with the aim of giving simple advice to applied researchers. This literature focuses heavily on the problem of size inflation in two-stage least squares (2SLS) two-tailed t-tests that arises if instruments are weak. A common standard for acceptable instrument strength is a first-stage F of 10, which renders this size inflation modest. However, 2SLS suffers from other important problems that exist at much higher levels of instrument strength. In particular, 2SLS standard errors tend to be artificially small in samples where the 2SLS estimate is close to ordinary least squares (OLS). This power asymmetry means the t-test has inflated power to detect false positive effects when the OLS bias is positive. The Anderson-Rubin (AR) test avoids this problem and should be used in lieu of the t-test even with strong instruments. We illustrate the practical importance of this issue in IV papers published in the American Economic Review from 2011 to 2023. Use of the AR test often reverses t-test results. In particular, IV estimates that are close to OLS and significant according to the t-test are often insignificant according to AR. We also show that for first-stage F in the 10–20 range there is a high probability that OLS estimates will be closer to the truth than 2SLS. Hence we advocate a higher standard of instrument strength in applied work.
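The two diagnostics named in the abstract, the first-stage F and the Anderson-Rubin (AR) test, can be illustrated with a minimal sketch. It is not the authors' code. It assumes the simplest setting: one endogenous regressor, one instrument, no other controls, and homoskedastic errors. The function names (slope_f, iv_estimate, anderson_rubin_f) and the simulated data are hypothetical and chosen only for illustration. In this setting the AR statistic for the null hypothesis beta = beta0 is the F statistic on the instrument in a regression of y - beta0 * x on a constant and the instrument.

```python
import numpy as np

def slope_f(v, z):
    """F statistic for a zero slope in the OLS regression of v on a constant and z.

    Assumes homoskedastic errors and a single regressor z.
    """
    n = len(v)
    Z = np.column_stack([np.ones(n), z])
    coef, *_ = np.linalg.lstsq(Z, v, rcond=None)
    resid = v - Z @ coef
    sigma2 = resid @ resid / (n - 2)                  # residual variance
    var_slope = sigma2 * np.linalg.inv(Z.T @ Z)[1, 1]
    return coef[1] ** 2 / var_slope

def iv_estimate(y, x, z):
    """Just-identified IV (equivalently 2SLS) slope: cov(z, y) / cov(z, x)."""
    zc = z - z.mean()
    return (zc @ y) / (zc @ x)

def anderson_rubin_f(y, x, z, beta0=0.0):
    """AR statistic for H0: beta = beta0, computed as the F statistic on z
    in the regression of (y - beta0 * x) on a constant and z."""
    return slope_f(y - beta0 * x, z)

# Simulated data (an assumption for illustration): true beta = 0, and x is
# endogenous because its error e is correlated with the outcome error u.
rng = np.random.default_rng(0)
n = 500
z = rng.normal(size=n)
e = rng.normal(size=n)
u = 0.5 * e + rng.normal(size=n)
x = 1.0 * z + e
y = 0.0 * x + u

first_stage_F = slope_f(x, z)                 # instrument strength
beta_iv = iv_estimate(y, x, z)                # IV point estimate
ar_F = anderson_rubin_f(y, x, z, beta0=0.0)   # AR test of beta = 0

print(f"first-stage F = {first_stage_F:.1f}")
print(f"IV estimate   = {beta_iv:.3f}")
print(f"AR F (beta=0) = {ar_F:.3f}")
```

Under these assumptions the AR statistic is compared with an F(1, n - 2) critical value, which is approximately 3.84 at the 5% level in large samples. Unlike the 2SLS t-test, the AR statistic does not use the 2SLS standard error, which is why the abstract recommends it in place of the t-test.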