
Abstract

Recent evidence suggests that research practices in psychology and many other disciplines are far less effective than previously assumed, which has led to what has been called a “crisis of confidence” in psychological research (e.g., Pashler & Wagenmakers 2012). In response to the perceived crisis, standard research practices have come under intense scrutiny, and various changes have been suggested to improve them. The burgeoning field of metascience seeks to use standard quantitative data-gathering and modeling techniques to understand the reasons for inefficiency, to assess the likely effects of suggested changes, and ultimately to tell psychologists how to do better science. We review the pros and cons of suggested changes, highlighting the many complex research trade-offs that must be addressed to identify better methods.
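As a concrete illustration of the kind of quantitative modeling the abstract describes, the following sketch computes the positive predictive value (PPV) of a significant result — the calculation behind arguments such as Ioannidis's (2005) "Why most published research findings are false" (reference 54 below). This is an illustrative example, not code from the article; the parameter values are hypothetical.

```python
def ppv(power: float, alpha: float, prior: float) -> float:
    """Probability that a statistically significant result reflects a true effect,
    given the test's power, its alpha level, and the prior probability that a
    tested hypothesis is true."""
    true_pos = power * prior          # true effects correctly detected
    false_pos = alpha * (1 - prior)   # null effects wrongly flagged as significant
    return true_pos / (true_pos + false_pos)

# Hypothetical scenario: 80% power, alpha = .05, and 1 in 10 tested
# hypotheses true. PPV = .08 / (.08 + .045) = 0.64 — roughly a third
# of significant findings would be false positives.
print(round(ppv(power=0.80, alpha=0.05, prior=0.10), 2))  # 0.64
```

Varying `alpha` and `power` in this formula is one simple way metascientists assess proposals such as "redefine statistical significance" (reference 14): lowering alpha raises PPV, but at a cost in sample size that the trade-off analyses reviewed here try to quantify.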

DOI: 10.1146/annurev-psych-020821-094927
Published online: 4 January 2022

Literature Cited

  1. Aczel B, Palfi B, Szaszi B. 2017. Estimating the evidential value of significant results in psychological science. PLOS ONE 12(8):e0182651
  2. Albers C. 2019. The problem with unadjusted multiple and sequential statistical testing. Nat. Commun. 10:1921
  3. Amrhein V, Greenland S, McShane B. 2019. Retire statistical significance. Nature 567:305–7
  4. Armitage P, McPherson CK, Rowe BC. 1969. Repeated significance tests on accumulating data. J. R. Stat. Soc. A 132(2):235–44
  5. Asendorpf JB, Conner M, De Fruyt F, De Houwer J, Denissen JJA, et al. 2013. Recommendations for increasing replicability in psychology. Eur. J. Pers. 27(2):108–19
  6. Baker DH, Vilidaite G, Lygo FA, Smith AK, Flack TR, et al. 2021. Power contours: optimising sample size and precision in experimental psychology and human neuroscience. Psychol. Methods 26(3):295–314
  7. Baker M. 2016. Is there a reproducibility crisis? Nature 533:452–54
  8. Baker SG, Heidenberger K. 1989. Choosing sample sizes to maximize expected health benefits subject to a constraint on total trial costs. Med. Decis. Mak. 9:114–25
  9. Bakker M, Van Dijk A, Wicherts JM. 2012. The rules of the game called psychological science. Perspect. Psychol. Sci. 7(6):543–54
  10. Barrett LF. 2020. Forward into the past. Observer 33:35–7
  11. Baumeister RF. 2016. Charting the future of social psychology on stormy seas: winners, losers, and recommendations. J. Exp. Soc. Psychol. 66:153–58
  12. Begley CG, Ellis LM. 2012. Drug development: raise standards for preclinical cancer research. Nature 483(7391):531–33
  13. Begley CG, Ioannidis JPA. 2015. Reproducibility in science: improving the standard for basic and preclinical research. Circ. Res. 116(1):116–26
  14. Benjamin DJ, Berger JO, Johannesson M, Nosek BA, Wagenmakers EJ, et al. 2018. Redefine statistical significance. Nat. Hum. Behav. 2:6–10
  15. Bero L. 2018. Meta-research matters: meta-spin cycles, the blindness of bias, and rebuilding trust. PLOS Biol. 16(4):e2005972
  16. Berry DA, Ho CH. 1988. One-sided sequential stopping boundaries for clinical trials: a decision-theoretic approach. Biometrics 44(1):219–27
  17. Białek M. 2018. Replications can cause distorted belief in scientific progress. Behav. Brain Sci. 41:e122
  18. Brown AN, Wood BDK. 2018. Replication studies of development impact evaluations. J. Dev. Stud. 55(5):917–25
  19. Bueno de Mesquita B, Gleditsch NP, James P, King G, Metelits C, et al. 2003. Symposium on replication in international studies research. Int. Stud. Perspect. 4(1):72–107
  20. Button KS, Ioannidis JPA, Mokrysz C, Nosek BA, Flint J, et al. 2013. Power failure: why small sample size undermines the reliability of neuroscience. Nat. Rev. Neurosci. 14(5):365–76
  21. Button KS, Munafò MR. 2017. Powering reproducible research. In Psychological Science Under Scrutiny: Recent Challenges and Proposed Remedies, ed. SO Lilienfeld, ID Waldman, pp. 22–33. New York: Wiley
  22. Carey B. 2015. Many psychology findings not as strong as claimed, study says. New York Times Aug. 27
  23. Chalmers I, Bracken MB, Djulbegovic B, Garattini S, Grant J, et al. 2014. How to increase value and reduce waste when research priorities are set. Lancet 383(9912):156–65
  24. Chalmers I, Glasziou P. 2009. Avoidable waste in the production and reporting of research evidence. Lancet 374(9683):86–89
  25. Chambers CD. 2020. Frontloading selectivity: a third way in scientific publishing? PLOS Biol. 18(3):e3000693
  26. Clark-Carter D. 1997. The account taken of statistical power in research published in the British Journal of Psychology. Br. J. Psychol. 88:71–83
  27. Cohen J. 1962. The statistical power of abnormal-social psychological research: a review. J. Abnorm. Soc. Psychol. 65:145–53
  28. Cohen J. 1988. Statistical Power Analysis for the Behavioral Sciences. Hillsdale, NJ: Lawrence Erlbaum. 2nd ed.
  29. Coles NA, Tiokhin L, Scheel AM, Isager PM, Lakens D. 2018. The costs and benefits of replication studies. Behav. Brain Sci. 41:e124
  30. Colhoun HM, McKeigue PM, Smith GD. 2003. Problems of reporting genetic associations with complex outcomes. Lancet 361(9360):865–72
  31. Colquhoun D. 2014. An investigation of the false discovery rate and the misinterpretation of p-values. R. Soc. Open Sci. 1(3):140216
  32. Cumming G. 2014. The new statistics: why and how. Psychol. Sci. 25(1):7–29
  33. Detsky AS. 1985. Using economic analysis to determine the resource consequences of choices made in planning clinical trials. J. Chronic Dis. 38(9):753–65
  34. Dreber A, Pfeiffer T, Almenberg J, Isaksson S, Wilson B, et al. 2015. Using prediction markets to estimate the reproducibility of scientific research. PNAS 112(50):15343–47
  35. Dunbar KN, Fugelsang JA. 2005. Causal thinking in science: how scientists and students interpret the unexpected. In Scientific and Technological Thinking, ed. ME Gorman, RD Tweney, DC Gooding, AP Kincannon, pp. 57–79. Mahwah, NJ: Lawrence Erlbaum
  36. Edwards MA, Roy S. 2017. Academic research in the 21st century: maintaining scientific integrity in a climate of perverse incentives and hypercompetition. Environ. Eng. Sci. 34(1):51–61
  37. Etz A, Vandekerckhove J. 2016. A Bayesian perspective on the reproducibility project: psychology. PLOS ONE 11(2):e0149794
  38. Fanelli D. 2012. Negative results are disappearing from most disciplines and countries. Scientometrics 90(3):891–904
  39. Fanelli D, Costas R, Larivière V. 2015. Misconduct policies, academic culture and career stage, not gender or pressures to publish, affect scientific integrity. PLOS ONE 10(6):e0127556
  40. Fiedler K, Kutzner F, Krueger JI. 2012. The long way from α-error control to validity proper: problems with a short-sighted false-positive debate. Perspect. Psychol. Sci. 7(6):661–69
  41. Fiedler K, Schott M. 2017. False negatives. In Psychological Science Under Scrutiny: Recent Challenges and Proposed Remedies, ed. SO Lilienfeld, ID Waldman, pp. 53–72. New York: Wiley
  42. Finkel EJ, Eastwick PW, Reis HT. 2015. Best research practices in psychology: illustrating epistemological and pragmatic considerations with the case of relationship science. J. Pers. Soc. Psychol. 108(2):275–97
  43. Finkel EJ, Eastwick PW, Reis HT. 2017. Replicability and other features of a high-quality science: toward a balanced and empirical approach. J. Pers. Soc. Psychol. 113(2):244–53
  44. Fisher RA. 1925. Statistical Methods for Research Workers. Edinburgh, UK: Oliver & Boyd
  45. Francis G. 2013. We don't need replication, but we do need more data. Eur. J. Pers. 27(2):125–26
  46. Freese J, Peterson D. 2017. Replication in social science. Annu. Rev. Sociol. 43:147–65
  47. Gilbert DT, King G, Pettigrew S, Wilson TD. 2016. Comment on "Estimating the reproducibility of psychological science." Science 351(6277):1037
  48. Gillett R. 1994. The average power criterion for sample size estimation. Statistician 43:389–94
  49. Gross C. 2016. Scientific misconduct. Annu. Rev. Psychol. 67:693–711
  50. Hamann S, Canli T. 2004. Individual differences in emotion processing. Curr. Opin. Neurobiol. 14(2):233–38
  51. Hartman TK, Stocks TVA, McKay R, Gibson-Miller J, Levita L, et al. 2021. The authoritarian dynamic during the COVID-19 pandemic: effects on nationalism and anti-immigrant sentiment. Soc. Psychol. Pers. Sci. 12(7):1274–85
  52. Hartshorne JK, Schachner A. 2012. Tracking replicability as a method of post-publication open evaluation. Front. Comput. Neurosci. 6:8
  53. Head ML, Holman L, Lanfear R, Kahn AT, Jennions MD. 2015. The extent and consequences of p-hacking in science. PLOS Biol. 13(3):e1002106
  54. Ioannidis JPA. 2005. Why most published research findings are false. PLOS Med. 2(8):e124
  55. Ioannidis JPA. 2018. Meta-research: why research on research matters. PLOS Biol. 16(3):e2005468
  56. John LK, Loewenstein G, Prelec D. 2012. Measuring the prevalence of questionable research practices with incentives for truth-telling. Psychol. Sci. 23(5):524–32
  57. Johnson VE. 2013. Revised standards for statistical evidence. PNAS 110(48):19313–17
  58. Karandinos MG. 1976. Optimum sample size and comments on some published formulae. Bull. Entomol. Soc. Am. 22(4):417–21
  59. Kuehberger A, Schulte-Mecklenbeck M. 2018. Selecting target papers for replication. Behav. Brain Sci. 41:e139
  60. Kuhn TS. 1962. The Structure of Scientific Revolutions. Chicago: Univ. Chicago Press
  61. Lakens D. 2014. Performing high-powered studies efficiently with sequential analyses. Eur. J. Soc. Psychol. 44(7):701–10
  62. Lakens D, Adolfi FG, Albers CJ, Anvari F, Apps MAJ, et al. 2018. Justify your alpha: a response to "Redefine statistical significance." Nat. Hum. Behav. 2:168–71
  63. Lakens D, Evers ERK. 2014. Sailing from the seas of chaos into the corridor of stability: practical recommendations to increase the informational value of studies. Perspect. Psychol. Sci. 9(3):278–92
  64. LeBel EP, Berger D, Campbell L, Loving TJ. 2017a. Falsifiability is not optional. J. Pers. Soc. Psychol. 113(2):254–61
  65. LeBel EP, Campbell L, Loving TJ. 2017b. Benefits of open and high-powered research outweigh costs. J. Pers. Soc. Psychol. 113(2):230–43
  66. Leek JT, Peng RD. 2015. Statistics: P values are just the tip of the iceberg. Nature 520:612
  67. Lenth RV. 2001. Some practical guidelines for effective sample size determination. Am. Stat. 55(3):187–93
  68. Lewandowsky S, Oberauer K. 2020. Low replicability can support robust and efficient science. Nat. Commun. 11:358
  69. Lilienfeld SO. 2017. Psychology's replication crisis and the grant culture: righting the ship. Perspect. Psychol. Sci. 12(4):660–64
  70. Loftus GR. 1996. Psychology will be a much better science when we change the way we analyze data. Curr. Direct. Psychol. Sci. 5:161–71
  71. Maxwell SE. 2004. The persistence of underpowered studies in psychological research: causes, consequences, and remedies. Psychol. Methods 9:147–63
  72. McElreath R, Smaldino PE. 2015. Replication, communication, and the population dynamics of scientific discovery. PLOS ONE 10(8):e0136088
  73. McGrath JE. 1981. Dilemmatics: the study of research choices and dilemmas. Am. Behav. Sci. 25(2):179–210
  74. McShane BB, Böckenholt U. 2014. You cannot step into the same river twice: when power analyses are optimistic. Perspect. Psychol. Sci. 9(6):612–25
  75. McShane BB, Gal D, Gelman A, Robert C, Tackett JL. 2019. Abandon statistical significance. Am. Stat. 73:235–45
  76. Michaels R. 2017. Confidence in courts: a delicate balance. Science 357(6353):764
  77. Miller JO, Ulrich R. 2016. Optimizing research payoff. Perspect. Psychol. Sci. 11(5):664–91
  78. Miller JO, Ulrich R. 2019. The quest for an optimal alpha. PLOS ONE 14(1):e0208631
  79. Miller JO, Ulrich R. 2021. A simple, general, and efficient method for sequential hypothesis testing: the independent segments procedure. Psychol. Methods 26(4):486–97
  80. Miller MG. 1996. Optimal allocation of resources to clinical trials. PhD Thesis, Sloan Sch. Manag., Mass. Inst. Technol., Cambridge
  81. Mosteller F, Weinstein M. 1985. Toward evaluating the cost-effectiveness of medical and social experiments. In Social Experimentation, ed. JA Hausman, DA Wise, pp. 221–50. Chicago: Univ. Chicago Press
  82. Nosek BA, Alter G, Banks GC, Borsboom D, Bowman SD, et al. 2015. Promoting an open research culture. Science 348(6242):1422–25
  83. Nosek BA, Ebersole CR, DeHaven AC, Mellor DT. 2018. The preregistration revolution. PNAS 115(11):2600–6
  84. Nosek BA, Spies JR, Motyl M. 2012. Scientific utopia II: restructuring incentives and practices to promote truth over publishability. Perspect. Psychol. Sci. 7(6):615–31
  85. Nuzzo R. 2014. Scientific method: statistical errors. Nature 506(7487):150–52
  86. Olsson-Collentine A, Wicherts JM, van Assen MALM. 2020. Heterogeneity in direct replications in psychology and its association with effect size. Psychol. Bull. 146(10):922–40
  87. Open Sci. Collab. 2015. Estimating the reproducibility of psychological science. Science 349(6251):aac4716
  88. Pashler HE, Harris C. 2012. Is the replicability crisis overblown? Three arguments examined. Perspect. Psychol. Sci. 7(6):531–36
  89. Pashler HE, Wagenmakers E. 2012. Editors' introduction to the special section on replicability in psychological science: a crisis of confidence? Perspect. Psychol. Sci. 7(6):528–30
  90. Poldrack RA. 2019. The costs of reproducibility. NeuroView 10:111–14
  91. Popper KR. 2002 (1963). Conjectures and Refutations: The Growth of Scientific Knowledge. London: Taylor & Francis
  92. Roberts RM. 1989. Serendipity: Accidental Discoveries in Science. New York: Wiley
  93. Rosenthal R. 1979. The "file drawer problem" and tolerance for null results. Psychol. Bull. 86:638–41
  94. Rossi JS. 1990. Statistical power of psychological research: What have we gained in 20 years? J. Consult. Clin. Psychol. 58(5):646–56
  95. Saltelli A, Funtowicz S. 2017. What is science's crisis really about? Futures 91:5–11
  96. Schimmack U. 2012. The ironic effect of significant results on the credibility of multiple-study articles. Psychol. Methods 17(4):551–66
  97. Schimmack U. 2020. A meta-psychological perspective on the decade of replication failures in social psychology. Can. Psychol./Psychol. Can. 61(4):364–76
  98. Schnuerch M, Erdfelder E. 2020. Controlling decision errors with minimal costs: the sequential probability ratio t test. Psychol. Methods 25(2):206–26
  99. Schooler J. 2019. Metascience: the science of doing science. Observer 32(9):26–29
  100. Schunn CD, Anderson JR. 1999. The generality/specificity of expertise in scientific reasoning. Cogn. Sci. 23(3):337–70
  101. Sedlmeier P, Gigerenzer G. 1989. Do studies of statistical power have an effect on the power of studies? Psychol. Bull. 105(2):309–16
  102. Sherman RA, Pashler H. 2019. Powerful moderator variables in behavioral science? Don't bet on them (version 3). PsyArXiv, May 24. https://doi.org/10.31234/osf.io/c65wm
  103. Sibley CG, Greaves LM, Satherley N, Wilson MS, Overall NC, et al. 2020. Effects of the COVID-19 pandemic and nationwide lockdown on trust, attitudes towards government, and well-being. Am. Psychol. 75(5):618–30
  104. Simmons JP, Nelson LD, Simonsohn U. 2011. False-positive psychology: undisclosed flexibility in data collection and analysis allows presenting anything as significant. Psychol. Sci. 22(11):1359–66
  105. Simon H. 1947. Administrative Behavior: A Study of Decision-Making Processes in Administrative Organization. New York: Free Press. 2nd ed.
  106. Simonsohn U, Nelson LD, Simmons JP. 2014. P-curve: a key to the file-drawer. J. Exp. Psychol. Gen. 143(2):534–47
  107. Smaldino PE, McElreath R. 2016. The natural selection of bad science. R. Soc. Open Sci. 3(9):160384
  108. Stanley TD, Carter EC, Doucouliagos H. 2018. What meta-analyses reveal about the replicability of psychological research. Psychol. Bull. 144(12):1325–46
  109. Sterling TD. 1959. Publication decisions and their possible effects on inferences drawn from tests of significance—or vice versa. J. Am. Stat. Assoc. 54(285):30–34
  110. Sternberg RJ, Sternberg K. 2010. The Psychologist's Companion: A Guide to Writing Scientific Papers for Students and Researchers. New York: Cambridge Univ. Press
  111. Stroebe W, Postmes T, Spears R. 2012. Scientific misconduct and the myth of self-correction in science. Perspect. Psychol. Sci. 7(6):670–88
  112. Stroebe W, Strack F. 2014. The alleged crisis and the illusion of exact replication. Perspect. Psychol. Sci. 9(1):59–71
  113. Strube MJ. 2006. SNOOP: a program for demonstrating the consequences of premature and repeated null hypothesis testing. Behav. Res. Methods 38(1):24–27
  114. Ulrich R, Miller JO. 2018. Some properties of p-curves, with an application to gradual publication bias. Psychol. Methods 23(3):546–60
  115. Ulrich R, Miller JO. 2020. Meta-research: questionable research practices may have little effect on replicability. eLife 9:e58237
  116. Ulrich R, Miller JO, Erdfelder E. 2018. Effect size estimation from t-statistics in the presence of publication bias: a brief review of existing approaches with some extensions. Z. Psychol. 226(1):56–80
  117. Van Bavel JJ, Mende-Siedlecki P, Brady WJ, Reinero DA. 2016. Contextual sensitivity in scientific reproducibility. PNAS 113(23):6454–59
  118. Wagenmakers EJ, Wetzels R, Borsboom D, Van Der Maas HLJ. 2011. Why psychologists must change the way they analyze their data: the case of psi: comment on Bem (2011). J. Pers. Soc. Psychol. 100(3):426–32
  119. Wald A. 1947. Sequential Analysis. New York: Wiley
  120. Williams B, Myerson J, Hale S. 2008. Individual differences, intelligence, and behavior analysis. J. Exp. Anal. Behav. 90(2):219–31
  121. Wilson BM, Wixted JT. 2018. The prior odds of testing a true effect in cognitive and social psychology. Adv. Methods Pract. Psychol. Sci. 1(2):186–97
  122. Witt JK. 2019. Insights into criteria for statistical significance from signal detection analysis. Meta-Psychology 3. https://doi.org/10.15626/MP.2018.871
  123. Yong E. 2012. Replication studies: bad copy. Nature 485:298–300
  124. Zwaan RA, Etz A, Lucas RE, Donnellan MB. 2018. Making replication mainstream. Behav. Brain Sci. 41:e120
  • Article Type: Review Article