Abstract

Statistical distances, divergences, and similar quantities have an extensive history and play an important role in the statistical and related scientific literature. They arise in estimation, where estimators are often defined by minimizing a distance, and they figure prominently in hypothesis testing and model selection. We review the statistical properties of distances commonly used in scientific work and show how they compare to one another. We then discuss an approximation framework for model-based inference using statistical distances. Emphasis is placed on identifying which statistical distances can be interpreted as loss functions, and in what sense, so that they can be used for model assessment. Finally, we review a special class of distances, the class of quadratic distances, connect it with the classical goodness-of-fit paradigm, and demonstrate its use in assessing model fit. These methods can be used in analyzing very large samples.
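The idea of estimation by minimizing a distance can be illustrated with a minimal sketch, not taken from the article: a minimum Hellinger distance estimate of a Poisson mean, obtained by minimizing the squared Hellinger distance between the empirical and model probability mass functions (the function names and the truncation bound `support_max` are choices made here for illustration).

```python
# Minimal sketch of minimum-distance estimation: fit a Poisson mean by
# minimizing the squared Hellinger distance between the empirical pmf of
# count data and the Poisson(lam) pmf, truncated at support_max.
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

def hellinger_sq(data, lam, support_max=50):
    """Squared Hellinger distance between the (truncated, renormalized)
    empirical pmf of integer count data and a Poisson(lam) pmf."""
    k = np.arange(support_max + 1)
    counts = np.bincount(data, minlength=support_max + 1)[: support_max + 1]
    emp = counts / counts.sum()
    mod = poisson.pmf(k, lam)
    return float(np.sum((np.sqrt(emp) - np.sqrt(mod)) ** 2))

def min_hellinger_poisson(data):
    """Minimum Hellinger distance estimate of the Poisson mean."""
    res = minimize_scalar(lambda lam: hellinger_sq(data, lam),
                          bounds=(1e-6, 30.0), method="bounded")
    return res.x

rng = np.random.default_rng(0)
data = rng.poisson(3.0, size=500)
print(min_hellinger_poisson(data))  # close to the true mean 3.0
```

Under the Poisson model this estimator is asymptotically as efficient as maximum likelihood, while down-weighting gross outliers, which is the efficiency-robustness trade-off the review discusses.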

/content/journals/10.1146/annurev-statistics-031219-041228
2021-03-07
2024-06-24

  • Article Type: Review Article