Abstract

Probability distributions are the building blocks of statistical modeling and inference. It is therefore of the utmost importance to know which distribution to use in what circumstances, as wrong choices will inevitably entail a biased analysis. In this article, we focus on circumstances involving complex data and describe the most popular flexible models for these settings. We focus on the following complex data: multivariate skew and heavy-tailed data, circular data, toroidal data, and cylindrical data. We illustrate the strength of flexible models on the basis of concrete examples and discuss major applications and challenges.

/content/journals/10.1146/annurev-statistics-040720-025210
2021-03-07
2024-04-29
  • Article Type: Review Article