
Abstract

Building intelligent systems that are capable of extracting high-level representations from high-dimensional sensory data lies at the core of solving many artificial intelligence–related tasks, including object recognition, speech perception, and language understanding. Theoretical and biological arguments strongly suggest that building such systems requires models with deep architectures that involve many layers of nonlinear processing. In this article, we review several popular deep learning models, including deep belief networks and deep Boltzmann machines. We show that (a) these deep generative models, which contain many layers of latent variables and millions of parameters, can be learned efficiently, and (b) the learned high-level feature representations can be successfully applied in many application domains, including visual object recognition, information retrieval, classification, and regression tasks.
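The building block of both deep belief networks and deep Boltzmann machines is the restricted Boltzmann machine (RBM), typically trained with contrastive divergence. As an illustrative sketch only (not the authors' implementation), the following toy example trains a small binary RBM with one-step contrastive divergence (CD-1); the layer sizes, learning rate, and data are made-up values chosen for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy binary data: 4 training patterns over 6 visible units
data = np.array([[1, 1, 1, 0, 0, 0],
                 [1, 0, 1, 0, 0, 0],
                 [0, 0, 0, 1, 1, 1],
                 [0, 0, 1, 1, 1, 0]], dtype=float)

n_visible, n_hidden = 6, 3
W = 0.01 * rng.standard_normal((n_visible, n_hidden))
b_v = np.zeros(n_visible)   # visible biases
b_h = np.zeros(n_hidden)    # hidden biases
lr = 0.1

for epoch in range(2000):
    v0 = data
    # Positive phase: p(h = 1 | v) given the data
    p_h0 = sigmoid(v0 @ W + b_h)
    h0 = (rng.random(p_h0.shape) < p_h0).astype(float)
    # Negative phase: one Gibbs step back down and up again (CD-1)
    p_v1 = sigmoid(h0 @ W.T + b_v)
    p_h1 = sigmoid(p_v1 @ W + b_h)
    # Update from the difference between data-driven and
    # reconstruction-driven pairwise statistics
    W += lr * (v0.T @ p_h0 - p_v1.T @ p_h1) / len(data)
    b_v += lr * (v0 - p_v1).mean(axis=0)
    b_h += lr * (p_h0 - p_h1).mean(axis=0)

# Mean-field reconstruction error should be small after training
recon = sigmoid(sigmoid(data @ W + b_h) @ W.T + b_v)
err = np.mean((data - recon) ** 2)
print(err)
```

Deep belief networks extend this idea by stacking RBMs greedily, using each trained layer's hidden activities as data for the next layer; deep Boltzmann machines instead train all layers jointly as a single undirected model.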


  • Article Type: Review Article