
Abstract

We review the problem of defining and inferring a state for a control system based on complex, high-dimensional, highly uncertain measurement streams, such as videos. Such a state, or representation, should contain all and only the information needed for control and discount nuisance variability in the data. It should also have finite complexity, ideally modulated depending on available resources. This representation is what we want to store in memory in lieu of the data, as it separates the control task from the measurement process. For the trivial case with no dynamics, a representation can be inferred by minimizing the information bottleneck Lagrangian in a function class realized by deep neural networks. The resulting representation has much higher dimension than the data (already in the millions) but is smaller in the sense of information content, retaining only what is needed for the task. This process also yields representations that are invariant to nuisance factors and have maximally independent components. We extend these ideas to the dynamic case, where the representation is the posterior density of the task variable given the measurements up to the current time, which is in general much simpler than the prediction density maintained by the classical Bayesian filter. Again, this can be finitely parameterized using a deep neural network, and some applications are already beginning to emerge. No explicit assumption of Markovianity is needed; instead, complexity trades off approximation of an optimal representation, including the degree of Markovianity.
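In the static (no-dynamics) case described above, the representation is obtained by minimizing the information bottleneck Lagrangian. As a minimal illustrative sketch (not the deep variational relaxation used in practice), one common convention for discrete variables writes the objective as L = I(X;Z) − β·I(Z;Y), where Z is the representation produced by a stochastic encoder p(z|x); for finite alphabets it can be evaluated in closed form. The function names below are illustrative, not from the reviewed work:

```python
import math

def mutual_info(pxy):
    """I(X;Y) in nats for a discrete joint distribution given as a nested list."""
    px = [sum(row) for row in pxy]
    py = [sum(col) for col in zip(*pxy)]
    return sum(
        pxy[i][j] * math.log(pxy[i][j] / (px[i] * py[j]))
        for i in range(len(px)) for j in range(len(py))
        if pxy[i][j] > 0
    )

def ib_lagrangian(pxy, pz_given_x, beta):
    """Information bottleneck Lagrangian L = I(X;Z) - beta * I(Z;Y)
    for a stochastic encoder pz_given_x[i][k] = p(z=k | x=i)."""
    px = [sum(row) for row in pxy]
    nz = len(pz_given_x[0])
    # joint p(x,z) induced by the encoder
    pxz = [[pz_given_x[i][k] * px[i] for k in range(nz)]
           for i in range(len(px))]
    # joint p(z,y), marginalizing x out
    pzy = [[sum(pz_given_x[i][k] * pxy[i][j] for i in range(len(px)))
            for j in range(len(pxy[0]))] for k in range(nz)]
    return mutual_info(pxz) - beta * mutual_info(pzy)
```

With Y = X uniform binary, a lossless encoder scores (1 − β)·ln 2 while a constant encoder scores 0, so for β > 1 the Lagrangian favors retaining task information over compressing it away — the trade-off the abstract describes between complexity and sufficiency.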

DOI: 10.1146/annurev-control-060117-105140
Published online: 2018-05-28
  • Article Type: Review Article