Abstract

Social scientists commonly use computational models to estimate proxies of unobserved concepts, then incorporate these proxies into subsequent tests of their theories. The consequences of this practice, which occurs in over two-thirds of recent computational work in political science, are underappreciated. Imperfect proxies can reflect noise and contamination from other concepts, producing biased point estimates and standard errors. We demonstrate how analysts can use causal diagrams to articulate theoretical concepts and their relationships to estimated proxies, then apply straightforward rules to assess which conclusions are rigorously supportable. We formalize and extend common heuristics for “signing the bias”—a technique for reasoning about unobserved confounding—to scenarios with imperfect proxies. Using these tools, we demonstrate how, in often-encountered research settings, proxy-based analyses allow for valid tests for the existence and direction of theorized effects. We conclude with best-practice recommendations for the rapidly growing literature using learned proxies to test causal theories.
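
The claim that noisy proxies bias point estimates, while tests of an effect's direction can remain valid, can be illustrated with a small simulation. This is only a minimal sketch under assumptions that are not taken from the article: a linear outcome model, a proxy that equals the true concept plus independent mean-zero noise (classical measurement error), and no contamination from other concepts. The names `concept`, `proxy`, `outcome`, `simulate`, and `ols_slope`, as well as the effect size and noise levels, are hypothetical.

```python
import random


def ols_slope(x, y):
    """Slope from a bivariate least-squares regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x


def simulate(n=20000, true_effect=2.0, proxy_noise_sd=1.0, seed=0):
    """Compare the slope on the true concept with the slope on a noisy proxy.

    Assumption (illustrative only): the proxy is the concept plus
    independent Gaussian noise, i.e. classical measurement error.
    """
    rng = random.Random(seed)
    # Unobserved theoretical concept, standardized to variance 1.
    concept = [rng.gauss(0, 1) for _ in range(n)]
    # Outcome depends linearly on the concept plus unit-variance noise.
    outcome = [true_effect * c + rng.gauss(0, 1) for c in concept]
    # Estimated proxy: the concept measured with independent noise.
    proxy = [c + rng.gauss(0, proxy_noise_sd) for c in concept]
    return ols_slope(concept, outcome), ols_slope(proxy, outcome)


slope_concept, slope_proxy = simulate()
print("slope using true concept:", round(slope_concept, 3))
print("slope using noisy proxy: ", round(slope_proxy, 3))
```

With the concept and the noise both at variance 1, the proxy-based slope is expected to shrink toward roughly half of the true effect while keeping its sign. That is the sense in which a direction test can survive even when the point estimate does not. Contamination from other concepts, which the abstract also raises, can break this pattern and is deliberately left out of the sketch.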

/content/journals/10.1146/annurev-polisci-051120-111443
2022-05-12
2024-03-29

  • Article Type: Review Article