Abstract

Replication—an important, uncommon, and misunderstood practice—is gaining appreciation in psychology. Achieving replicability is important for making research progress. If findings are not replicable, then prediction and theory development are stifled. If findings are replicable, then interrogation of their meaning and validity can advance knowledge. Assessing replicability can be productive for generating and testing hypotheses by actively confronting current understandings to identify weaknesses and spur innovation. For psychology, the 2010s might be characterized as a decade of active confrontation. Systematic and multi-site replication projects assessed current understandings and observed surprising failures to replicate many published findings. Replication efforts highlighted sociocultural challenges such as disincentives to conduct replications and a tendency to frame replication as a personal attack rather than a healthy scientific practice, and they raised awareness that replication contributes to self-correction. Nevertheless, innovation in doing and understanding replication and its cousins, reproducibility and robustness, has positioned psychology to improve research practices and accelerate progress.

