
Abstract

Clustering is the task of automatically gathering observations into homogeneous groups, where the number of groups is unknown. Through its basis in a statistical modeling framework, model-based clustering provides a principled and reproducible approach to clustering. In contrast to heuristic approaches, model-based clustering allows for robust approaches to parameter estimation and objective inference on the number of clusters, while providing a clustering solution that accounts for uncertainty in cluster membership. The aim of this article is to provide a review of the theory underpinning model-based clustering, to outline associated inferential approaches, and to highlight recent methodological developments that facilitate the use of model-based clustering for a broad array of data types. Since its emergence six decades ago, the literature on model-based clustering has grown rapidly, and as such, this review provides only a selection of the bibliography in this dynamic and impactful field.
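To make the abstract's claims concrete, the following is a minimal sketch (not taken from the article, which works in R) of Gaussian model-based clustering in Python. It assumes scikit-learn's GaussianMixture and the iris data purely for illustration: mixtures with different numbers of components are fit, BIC is used to infer the number of clusters, and posterior membership probabilities quantify the uncertainty in cluster assignment that the abstract refers to.

```python
# Minimal sketch of Gaussian model-based clustering with BIC model selection.
# Assumptions (not from the article): scikit-learn's GaussianMixture, the iris
# data, and a 1..6 component search range are illustrative choices only.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.mixture import GaussianMixture

X = load_iris().data  # n observations x p variables

# Fit a mixture for each candidate number of components and keep the fit
# with the lowest BIC, treating the number of clusters as unknown a priori.
fits = [
    GaussianMixture(n_components=g, covariance_type="full", random_state=0).fit(X)
    for g in range(1, 7)
]
best = min(fits, key=lambda m: m.bic(X))

labels = best.predict(X)               # hard cluster assignments (MAP rule)
probs = best.predict_proba(X)          # posterior cluster membership probabilities
uncertainty = 1.0 - probs.max(axis=1)  # per-observation clustering uncertainty

print("selected number of clusters:", best.n_components)
print("mean membership uncertainty:", round(uncertainty.mean(), 3))
```

BIC-based selection is one standard criterion for choosing the number of mixture components; it mirrors the "objective inference on the number of clusters" described in the abstract, while the posterior membership probabilities illustrate how a model-based solution carries uncertainty in cluster membership.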

