
Abstract

External validity captures the extent to which inferences drawn from a given study's sample apply to a broader population or other target populations. Social scientists frequently invoke external validity as an ideal, but they rarely attempt to make rigorous, credible external validity inferences. In recent years, methodologically oriented scholars have advanced a flurry of work on various components of external validity, and this article reviews and systematizes many of those insights. We first clarify the core conceptual dimensions of external validity and introduce a simple formalization that demonstrates why external validity matters so critically. We then organize disparate arguments about how to address external validity by advancing three evaluative criteria: model utility, scope plausibility, and specification credibility. We conclude with a practical aspiration that scholars supplement existing reporting standards to include routine discussion of external validity. It is our hope that these evaluation and reporting standards help rebalance scientific inquiry, such that the current obsession with causal inference is complemented with an equal interest in generalized knowledge.

/content/journals/10.1146/annurev-polisci-041719-102556
2021-05-11
2024-04-19

  • Article Type: Review Article