Psychology advances knowledge by testing statistical hypotheses using empirical observations and data. The expectation is that most statistically significant findings can be replicated in new data and in new laboratories, but in practice many findings have replicated less often than expected, leading to claims of a replication crisis. We review recent methodological literature on questionable research practices, meta-analysis, and power analysis to explain the apparently high rates of failure to replicate. Psychologists can refine their research practices to advance knowledge in ways that improve replicability. We recommend that researchers adopt open science conventions of preregistration and full disclosure and that replication efforts be based on multiple studies rather than on a single replication attempt. We call for more sophisticated power analyses, careful consideration of the various influences on effect sizes, and more complete disclosure of nonsignificant as well as statistically significant findings.
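As a minimal illustration of the kind of power planning the abstract calls for, the sketch below computes the approximate power of a two-sample t-test and the per-group sample size needed to reach a target power. It is not taken from the article: the normal approximation, the function names, and the default values are assumptions for illustration only.

```python
from math import sqrt
from statistics import NormalDist

def approx_power(d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-sample t-test using the
    normal approximation: power ~ Phi(d * sqrt(n/2) - z_{1 - alpha/2}).
    d is the standardized mean difference (Cohen's d)."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(d * sqrt(n_per_group / 2) - z_crit)

def n_for_power(d, target=0.80, alpha=0.05):
    """Smallest per-group n whose approximate power reaches the target."""
    n = 2
    while approx_power(d, n, alpha) < target:
        n += 1
    return n

# For a medium effect (d = 0.5), roughly 63 participants per group are
# needed for 80% power; if the true effect is half as large (d = 0.25),
# the required n roughly quadruples.
print(n_for_power(0.5))
print(n_for_power(0.25))
```

The second call illustrates one of the abstract's points: because published effect sizes are often inflated, a power analysis keyed to the published estimate can badly understate the sample size a replication actually needs.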






  • Article Type: Review Article