On p-Values and Bayes Factors

Leonhard Held; Manuela Ott

doi:10.1146/annurev-statistics-031017-100307

Annual Review of Statistics and Its Application

Volume 5, 2018

Review Article


Affiliations: Epidemiology, Biostatistics and Prevention Institute, University of Zurich, CH-8001 Zurich, Switzerland; email: [email protected], [email protected]
Vol. 5:393-419 (Volume publication date March 2018) https://doi.org/10.1146/annurev-statistics-031017-100307
First published as a Review in Advance on December 08, 2017
© Annual Reviews

Abstract

The p-value quantifies the discrepancy between the data and a null hypothesis of interest, usually the assumption of no difference or no effect. A Bayesian approach allows the calibration of p-values by transforming them to direct measures of the evidence against the null hypothesis, so-called Bayes factors. We review the available literature in this area and consider two-sided significance tests for a point null hypothesis in more detail. We distinguish simple from local alternative hypotheses and contrast traditional Bayes factors based on the data with Bayes factors based on p-values or test statistics. A well-known finding is that the minimum Bayes factor, the smallest possible Bayes factor within a certain class of alternative hypotheses, provides less evidence against the null hypothesis than the corresponding p-value might suggest. It is less well known that the relationship between p-values and minimum Bayes factors also depends on the sample size and on the dimension of the parameter of interest. We illustrate the transformation of p-values to minimum Bayes factors with two examples from clinical research.
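As a rough illustration of the transformation described in the abstract, the sketch below converts a two-sided p-value into three minimum Bayes factors that are standard in this literature: the bound over simple alternatives, exp(−z²/2); the bound over local normal alternatives, |z| exp(−(z² − 1)/2) for |z| > 1; and the −e p log p calibration for p < 1/e. It assumes a normally distributed test statistic, so that z is recovered from p by the standard normal quantile function. The function names are hypothetical and are not taken from the article, and the sketch does not implement the sample-size or dimension adjustments the review discusses.

```python
import math
from statistics import NormalDist


def z_from_two_sided_p(p: float) -> float:
    """Absolute z-value matching a two-sided p-value (normal test statistic assumed)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    return NormalDist().inv_cdf(1.0 - p / 2.0)


def min_bf_simple(p: float) -> float:
    """Minimum Bayes factor over all simple alternatives: exp(-z^2 / 2)."""
    z = z_from_two_sided_p(p)
    return math.exp(-z * z / 2.0)


def min_bf_local_normal(p: float) -> float:
    """Minimum Bayes factor over normal priors centred at the null value.

    Equals |z| * exp(-(z^2 - 1) / 2) when |z| > 1, and 1 otherwise.
    """
    z = z_from_two_sided_p(p)
    if z <= 1.0:
        return 1.0
    return z * math.exp(-(z * z - 1.0) / 2.0)


def min_bf_e_p_log_p(p: float) -> float:
    """The -e * p * log(p) calibration, defined for p < 1/e (1 otherwise)."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    if p >= 1.0 / math.e:
        return 1.0
    return -math.e * p * math.log(p)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(
            f"p = {p:<6} simple = {min_bf_simple(p):.4f}  "
            f"local normal = {min_bf_local_normal(p):.4f}  "
            f"-e p log p = {min_bf_e_p_log_p(p):.4f}"
        )
```

For p = 0.05 the three bounds come out at roughly 0.15, 0.47 and 0.41. Each is larger than the p-value itself, which is the sense in which a minimum Bayes factor provides less evidence against the null hypothesis than the p-value might suggest.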

Keywords: Bayes factor, evidence, minimum Bayes factor, objective Bayes, p-value, sample size


