
Abstract

Learning representations of data is an important problem in statistics and machine learning. While the origins of representation learning can be traced back to factor analysis and multidimensional scaling in statistics, it has become a central theme in deep learning, with important applications in computer vision and computational neuroscience. In this article, we review recent advances in learning representations from a statistical perspective. In particular, we review the following two themes: (a) unsupervised learning of vector representations and (b) learning of both vector and matrix representations.
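As a minimal illustration of the first theme (unsupervised learning of vector representations) in its classical linear form — a sketch added here for concreteness, not taken from the article — principal component analysis learns a low-dimensional vector code for each observation by projecting centered data onto its leading singular directions:

```python
import numpy as np

# Toy data: 200 samples in 5 dimensions with a 2-dimensional latent structure,
# mimicking the factor-analysis setup X = ZW + noise.
rng = np.random.default_rng(0)
Z = rng.normal(size=(200, 2))                # latent factors
W = rng.normal(size=(2, 5))                  # loading matrix
X = Z @ W + 0.1 * rng.normal(size=(200, 5))  # observed data

# PCA via SVD of the centered data: the rows of Vt span the principal
# directions, and projecting onto the first two yields a 2-D vector
# representation (the learned code) for each sample.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
codes = Xc @ Vt[:2].T

print(codes.shape)  # (200, 2)
```

Deep generative models reviewed in the article replace this linear projection with a learned nonlinear mapping, but the goal — a compact vector representation per observation — is the same.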

DOI: 10.1146/annurev-statistics-031219-041131 · Published: 2020-03-07

  • Article Type: Review Article