We survey latent variable models for solving data-analysis problems. A latent variable model is a probabilistic model that encodes hidden patterns in the data. We uncover these patterns from their conditional distribution given the observed data and use them to summarize data and form predictions. Latent variable models are important in many fields, including computational biology, natural language processing, and social network analysis. Our perspective is that models are developed iteratively: We build a model, use it to analyze data, assess how it succeeds and fails, revise it, and repeat. We describe how new research has transformed these essential activities. First, we describe probabilistic graphical models, a language for formulating latent variable models. Second, we describe mean field variational inference, a generic algorithm for approximating conditional distributions. Third, we describe how to use our analyses to solve problems: exploring the data, forming predictions, and pointing us in the direction of improved models.
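To make the abstract's pipeline concrete, here is a minimal sketch (not taken from the article) of a latent variable model and its mean field variational inference: a toy Bayesian mixture of 1-D Gaussians with unit observation variance, fit by coordinate-ascent updates on fully factorized variational distributions. All variable names, hyperparameters, and initializations below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulate data: hidden cluster assignments z, observed points x.
true_means = np.array([-3.0, 0.0, 3.0])
z = rng.integers(0, 3, size=300)
x = rng.normal(true_means[z], 1.0)

K, sigma2 = 3, 10.0  # number of components; prior variance on the means

# Mean field family: q(mu_k) = N(m[k], s2[k]), q(c_i) = Categorical(phi[i]).
m = np.quantile(x, [0.25, 0.5, 0.75])  # spread-out deterministic init
s2 = np.ones(K)
for _ in range(100):
    # Update responsibilities phi (n x K); normalize each row to sum to 1.
    logits = np.outer(x, m) - 0.5 * (s2 + m**2)
    phi = np.exp(logits - logits.max(axis=1, keepdims=True))
    phi /= phi.sum(axis=1, keepdims=True)
    # Update the Gaussian factor for each mixture mean.
    nk = phi.sum(axis=0)
    s2 = 1.0 / (1.0 / sigma2 + nk)
    m = s2 * (phi.T @ x)

print(np.sort(m))  # approximate posterior means, close to the true means
```

Each pass alternates the two closed-form updates until the factors stop changing, which is the generic structure of the mean field algorithm the abstract refers to; the fitted `phi` summarizes the hidden pattern (which cluster generated each point), and the fitted `m` supports prediction on new data.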






  • Article Type: Review Article