
Abstract

Estimation of functions from sparse and noisy data is a central theme in machine learning. In the last few years, many algorithms have been developed that exploit Tikhonov regularization theory and reproducing kernel Hilbert spaces. These are the so-called kernel-based methods, which include powerful approaches like regularization networks, support vector machines, and Gaussian regression. Recently, these techniques have also gained popularity in the system identification community. In both linear and nonlinear settings, kernels that incorporate information on dynamic systems, such as the smoothness and stability of the input–output map, can challenge well-established approaches based on parametric model structures. In the classical parametric setting, the complexity of the model (the model order) needs to be chosen, typically from a finite family of alternatives, by trading off bias and variance. This (discrete) model order selection step may be critical, especially when the true model does not belong to the model class. In regularization-based approaches, model complexity is instead controlled by tuning (continuous) regularization parameters, making the model selection step more robust. In this article, we review these new kernel-based system identification approaches and discuss extensions based on nuclear and ℓ1 norms.
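
To make the regularization-based route above concrete, the following minimal Python sketch performs kernel-based linear system identification with a first-order stable spline (tuned/correlated, TC) kernel, in the spirit of References 67 and 69. It is only a sketch under simplifying assumptions (known FIR length, Gaussian noise, fixed hyperparameters); the function names and the values of the kernel decay rate alpha and the regularization parameter gamma are illustrative and not taken from the article.

import numpy as np

def tc_kernel(n, alpha=0.9):
    # First-order stable spline (TC) kernel: K[i, j] = alpha ** max(i, j).
    idx = np.arange(1, n + 1)
    return alpha ** np.maximum.outer(idx, idx)

def identify_fir(u, y, n=50, alpha=0.9, gamma=0.1):
    # Regularized least squares: g_hat = argmin ||y - Phi g||^2 + gamma * g' inv(K) g,
    # computed through the equivalent form K Phi' (Phi K Phi' + gamma I)^{-1} y.
    N = len(y)
    # Regressor matrix built from current and past inputs (zero initial conditions assumed).
    Phi = np.column_stack([np.concatenate([np.zeros(k), u[:N - k]]) for k in range(n)])
    K = tc_kernel(n, alpha)
    return K @ Phi.T @ np.linalg.solve(Phi @ K @ Phi.T + gamma * np.eye(N), y)

# Toy usage: recover a decaying impulse response from noisy input-output data.
rng = np.random.default_rng(0)
g_true = 0.8 ** np.arange(1, 51)
u = rng.standard_normal(200)
y = np.convolve(u, g_true)[:len(u)] + 0.1 * rng.standard_normal(200)
g_hat = identify_fir(u, y, n=50, alpha=0.9, gamma=0.1)

In practice, alpha and gamma would not be fixed but estimated from data, for example by marginal likelihood maximization or cross-validation (see References 81, 82, 85, and 87); this is exactly the continuous complexity tuning that the abstract contrasts with discrete model order selection.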

Literature Cited

1. Akaike H 1974. A new look at the statistical model identification. IEEE Trans. Autom. Control 19:716–23
2. Schwarz G 1978. Estimating the dimension of a model. Ann. Stat. 6:461–64
3. Arlot S, Celisse A 2010. A survey of cross-validation procedures for model selection. Stat. Surv. 4:40–79
4. Hastie TJ, Tibshirani RJ, Friedman J 2001. The Elements of Statistical Learning: Data Mining, Inference and Prediction. New York: Springer
5. Bishop C 1996. Neural Networks for Pattern Recognition. Oxford, UK: Oxford Univ. Press
6. Haykin S 2009. Neural Networks and Learning Machines. Upper Saddle River, NJ: Pearson Educ.
7. LeCun Y, Bengio Y, Hinton G 2015. Deep learning. Nature 521:436–44
8. Tikhonov A, Arsenin V 1977. Solutions of Ill-Posed Problems. Washington, DC: Winston/Wiley
9. Bertero M 1989. Linear inverse and ill-posed problems. Adv. Electron. Electron Phys. 75:1–120
10. Aronszajn N 1950. Theory of reproducing kernels. Trans. Am. Math. Soc. 68:337–404
11. Bergman S 1950. The Kernel Function and Conformal Mapping. Providence, RI: Am. Math. Soc.
12. Bertero M, Poggio T, Torre V 1988. Ill-posed problems in early vision. Proc. IEEE 76:869–89
13. Wahba G 1990. Spline Models for Observational Data. Philadelphia: Soc. Ind. Appl. Math.
14. Poggio T, Girosi F 1990. Networks for approximation and learning. Proc. IEEE 78:1481–97
15. Girosi F 1997. An equivalence between sparse approximation and support vector machines. AI Memo 1606, CBCL Pap. 147, Mass. Inst. Technol., Cambridge, MA
16. Kimeldorf G, Wahba G 1970. A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Stat. 41:495–502
17. Lukic M, Beder J 2001. Stochastic processes with sample paths in reproducing kernel Hilbert spaces. Trans. Am. Math. Soc. 353:3945–69
18. Bell B, Pillonetto G 2004. Estimating parameters and stochastic functions of one variable using nonlinear measurement models. Inverse Probl. 20:627
19. Aravkin A, Bell B, Burke J, Pillonetto G 2015. The connection between Bayesian estimation of a Gaussian random field and RKHS. IEEE Trans. Neural Netw. Learn. Syst. 26:1518–24
20. Evgeniou T, Pontil M, Poggio T 2000. Regularization networks and support vector machines. Adv. Comput. Math. 13:1–50
21. Schölkopf B, Smola AJ 2001. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond. Cambridge, MA: MIT Press
22. Drucker H, Burges C, Kaufman L, Smola A, Vapnik V 1997. Support vector regression machines. Advances in Neural Information Processing Systems 9, ed. MC Mozer, MI Jordan, T Petsche, pp. 155–61. Cambridge, MA: MIT Press
23. Vapnik V 1998. Statistical Learning Theory. New York: Wiley
24. Rasmussen C, Williams C 2006. Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press
25. Collobert R, Bengio S 2001. SVMTorch: support vector machines for large-scale regression problems. J. Mach. Learn. Res. 1:143–60
26. Maritz JS, Lwin T 1989. Empirical Bayes Methods. London: Chapman and Hall. 2nd ed.
27. Aravkin A, Burke J, Chiuso A, Pillonetto G 2012. On the estimation of hyperparameters for empirical Bayes estimators: maximum marginal likelihood versus minimum MSE. IFAC Proc. Vol. 45(16):125–30
28. Aravkin A, Burke J, Chiuso A, Pillonetto G 2014. Convex versus nonconvex estimators for regression and sparse estimation: the mean squared error properties of ARD and GLasso. J. Mach. Learn. Res. 15:217–52
29. Wahba G 1977. Practical approximate solutions to linear operator equations when the data are noisy. SIAM J. Numer. Anal. 14:651–67
30. Smale S, Zhou D 2007. Learning theory estimates via integral operators and their approximations. Constr. Approx. 26:153–72
31. Yuan M, Cai TT 2010. A reproducing kernel Hilbert space approach to functional linear regression. Ann. Stat. 38:3412–44
32. Wu Q, Ying Y, Zhou D 2006. Learning rates of least-square regularized regression. Found. Comput. Math. 6:171–92
33. Mukherjee S, Niyogi P, Poggio T, Rifkin R 2006. Learning theory: stability is sufficient for generalization and necessary and sufficient for consistency of empirical risk minimization. Adv. Comput. Math. 25:161–93
34. Poggio T, Rifkin R, Mukherjee S, Niyogi P 2004. General conditions for predictivity in learning theory. Nature 428:419–22
35. Alon N, Ben-David S, Cesa-Bianchi N, Haussler D 1997. Scale-sensitive dimensions, uniform convergence, and learnability. J. ACM 44:615–31
36. Evgeniou T, Pontil M 1999. On the Vγ dimension for regression in reproducing kernel Hilbert spaces. Algorithmic Learning Theory: 10th International Conference, ALT ’99, Tokyo, Japan, December 1999, ed. O Watanabe, T Yokomori, pp. 106–17. Berlin: Springer
37. Bousquet O, Elisseeff A 2002. Stability and generalization. J. Mach. Learn. Res. 2:499–526
38. Schiller R 1979. A distributed lag estimator derived from smoothness priors. Econ. Lett. 2:219–23
39. Akaike H 1979. Smoothness priors and the distributed lag estimator. Tech. Rep. 40, Dep. Stat., Stanford Univ., Stanford, CA
40. Kitagawa G, Gersch W 1996. Smoothness Priors Analysis of Time Series. New York: Springer
41. Chiuso A 2016. Regularization and Bayesian learning in dynamical systems: past, present and future. Annu. Rev. Control 41:24–38
42. Goodwin G, Gevers M, Ninness B 1992. Quantifying the error in estimated transfer functions with application to model order selection. IEEE Trans. Autom. Control 37:913–28
43. Ljung L, Goodwin G, Agüero JC 2014. Stochastic embedding revisited: a modern interpretation. 53rd IEEE Conference on Decision and Control, pp. 3340–45. New York: IEEE
44. Chandrasekaran V, Recht B, Parrilo P, Willsky A 2012. The convex geometry of linear inverse problems. Found. Comput. Math. 12:805–49
45. Liu Z, Vandenberghe L 2009. Interior-point method for nuclear norm approximation with application to system identification. SIAM J. Matrix Anal. Appl. 31:1235–56
46. Grossmann C, Jones C, Morari M 2009. System identification via nuclear norm regularization for simulated moving bed processes from incomplete data sets. Proceedings of the 48th IEEE Conference on Decision and Control, pp. 4692–97. New York: IEEE
47. Mohan K, Fazel M 2010. Reweighted nuclear norm minimization with application to system identification. Proceedings of the 2010 American Control Conference, pp. 2953–59. New York: IEEE
48. Rojas C, Toth R, Hjalmarsson H 2014. Sparse estimation of polynomial and rational dynamical models. IEEE Trans. Autom. Control 59:2962–77
49. Pillonetto G, Chen T, Chiuso A, De Nicolao G, Ljung L 2016. Regularized linear system identification using atomic, nuclear and kernel-based norms: the role of the stability constraint. Automatica 69:137–49
50. Franz M, Schölkopf B 2006. A unifying view of Wiener and Volterra theory and polynomial kernel regression. Neural Comput. 18:3097–118
51. Lin T, Horne B, Tino P, Giles C 1996. Learning long-term dependencies in NARX recurrent neural networks. IEEE Trans. Neural Netw. 7:1329–38
52. Shun-Feng S, Yang F 2002. On the dynamical modeling with neural fuzzy networks. IEEE Trans. Neural Netw. 13:1548–53
53. Fan J, Gijbels I 1996. Local Polynomial Modelling and Its Applications. London: Chapman and Hall
54. Billings S, Hua-Liang W 2005. A new class of wavelet networks for nonlinear system identification. IEEE Trans. Neural Netw. 16:862–74
55. Leithead WE, Solak E, Leith DJ 2003. Direct identification of nonlinear structure using Gaussian process prior models. 2003 European Control Conference, pp. 2565–70. New York: IEEE
56. Pillonetto G, Chiuso A, Quang MH 2011. A new kernel-based approach for nonlinear system identification. IEEE Trans. Autom. Control 56:2825–40
57. Zhao W, Chen H, Bai E, Li K 2015. Kernel-based local order estimation of nonlinear nonparametric systems. Automatica 51:243–54
58. Roll J, Nazin A, Ljung L 2005. Nonlinear system identification via direct weight optimization. Automatica 41:475–90
59. Bai E, Liu Y 2007. Recursive direct weight optimization in nonlinear system identification: a minimal probability approach. IEEE Trans. Autom. Control 52:1218–31
60. Bai EW 2010. Non-parametric nonlinear system identification: an asymptotic minimum mean squared error estimator. IEEE Trans. Autom. Control 55:1615–26
61. Suykens J, Gestel TV, Brabanter JD, Moor BD, Vandewalle J 2002. Least Squares Support Vector Machines. Singapore: World Sci.
62. Suykens J, Alzate C, Pelckmans K 2010. Primal and dual model representations in kernel-based learning. Stat. Surv. 4:148–83
63. Frigola R, Lindsten F, Schön T, Rasmussen C 2013. Bayesian inference and learning in Gaussian process state-space models with particle MCMC. Advances in Neural Information Processing Systems 26, ed. CJC Burges, L Bottou, M Welling, Z Ghahramani, KQ Weinberger, pp. 3156–64. Red Hook, NY: Curran
64. Frigola R, Rasmussen C 2013. Integrated preprocessing for Bayesian nonlinear system identification with Gaussian processes. 52nd Annual Conference on Decision and Control, pp. 5371–76. New York: IEEE
65. Ljung L 1999. System Identification: Theory for the User. Upper Saddle River, NJ: Prentice Hall. 2nd ed.
66. Pruyt E, Cunningham S, Kwakkel J, de Bruijn J 2014. From data-poor to data-rich: system dynamics in the era of big data. Proceedings of the 2014 International Conference of the System Dynamics Society, pap. 1390. Albany, NY: Syst. Dyn. Soc.
67. Pillonetto G, De Nicolao G 2010. A new kernel-based approach for linear system identification. Automatica 46:81–93
68. Pillonetto G, Chiuso A, De Nicolao G 2011. Prediction error identification of linear systems: a nonparametric Gaussian regression approach. Automatica 47:291–305
69. Chen T, Ohlsson H, Ljung L 2012. On the estimation of transfer functions, regularizations and Gaussian processes—revisited. Automatica 48:1525–35
70. Vishwanathan SVN, Smola AJ, Vidal R 2007. Binet-Cauchy kernels on dynamical systems and its application to the analysis of dynamic scenes. Int. J. Comput. Vis. 73:95–119
71. Carmeli C, Vito ED, Toigo A 2006. Vector valued reproducing kernel Hilbert spaces of integrable functions and Mercer theorem. Anal. Appl. 4:377–408
72. Pillonetto G, Dinuzzo F, Chen T, Nicolao GD, Ljung L 2014. Kernel methods in system identification, machine learning and function estimation: a survey. Automatica 50:657–82
73. Dinuzzo F 2015. Kernels for linear time invariant system identification. SIAM J. Control Optim. 53:3299–317
74. Pillonetto G 2018. System identification using kernel-based regularization: new insights on stability and consistency issues. Automatica 93:321–32
75. Cucker F, Smale S 2001. On the mathematical foundations of learning. Bull. Am. Math. Soc. 39:1–49
76. Argyriou A, Dinuzzo F 2014. A unifying view of representer theorems. Proceedings of the 31st International Conference on Machine Learning, ed. EP Xing, T Jebara, pp. 748–56. Proc. Mach. Learn. Res. 32(2). N.p.: PMLR
77. Pillonetto G, Chiuso A, De Nicolao G 2010. Regularized estimation of sums of exponentials in spaces generated by stable spline kernels. Proceedings of the 2010 American Control Conference, pp. 498–503. New York: IEEE
78. Zorzi M, Chiuso A 2018. The harmonic analysis of kernel functions. Automatica 94:125–37
79. Allen DM 1974. The relationship between variable selection and data augmentation and a method for prediction. Technometrics 16:125–27
80. Wang L, Cluett W 1996. Use of PRESS residuals in dynamic system identification. Automatica 32:781–84
81. Craven P, Wahba G 1979. Smoothing noisy data with spline functions. Numer. Math. 31:377–403
82. Golub G, Heath M, Wahba G 1979. Generalized cross-validation as a method for choosing a good ridge parameter. Technometrics 21:215–23
83. Aravkin A, Bell B, Burke J, Pillonetto G 2015. The connection between Bayesian estimation of a Gaussian random field and RKHS. IEEE Trans. Neural Netw. Learn. Syst. 26:1518–24
84. Cox R 1946. Probability, frequency, and reasonable expectation. Am. J. Phys. 14:1–13
85. MacKay D 1992. Bayesian interpolation. Neural Comput. 4:415–47
86. De Nicolao G, Sparacino G, Cobelli C 1997. Nonparametric input estimation in physiological systems: problems, methods and case studies. Automatica 33:851–70
87. Pillonetto G, Chiuso A 2015. Tuning complexity in regularized kernel-based regression and linear system identification: the robustness of the marginal likelihood estimator. Automatica 58:106–17
88. Gilks W, Richardson S, Spiegelhalter D 1996. Markov Chain Monte Carlo in Practice. London: Chapman and Hall
89. Ninness B, Henriksen S 2010. Bayesian system identification via MCMC techniques. Automatica 46:40–51
90. Andrieu C, Doucet A, Holenstein R 2010. Particle Markov chain Monte Carlo methods. J. R. Stat. Soc. B 72:269–342
91. Bottegal G, Aravkin A, Hjalmarsson H, Pillonetto G 2016. Robust EM kernel-based methods for linear system identification. Automatica 67:114–26
92. Gunter L, Zhu J 2007. Efficient computation and model selection for the support vector regression. Neural Comput. 19:1633–55
93. Dinuzzo F, Neve M, De Nicolao G, Gianazza U 2007. On the representer theorem and equivalent degrees of freedom of SVR. J. Mach. Learn. Res. 8:2467–95
94. Dinuzzo F, De Nicolao G 2009. An algebraic characterization of the optimum of regularized kernel methods. Mach. Learn. 74:315–45
95. Mairal J, Bach F, Ponce J, Sapiro G 2010. Online learning for matrix factorization and sparse coding. J. Mach. Learn. Res. 11:19–60
96. Donoho D 2006. Compressed sensing. IEEE Trans. Inf. Theory 52:1289–306
97. Tibshirani R 1996. Regression shrinkage and selection via the lasso. J. R. Stat. Soc. B 58:267–88
98. Efron B, Hastie T, Johnstone L, Tibshirani R 2004. Least angle regression. Ann. Stat. 32:407–99
99. Yuan M, Lin Y 2006. Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. B 68:49–67
100. Zou H 2006. The adaptive Lasso and its oracle properties. J. Am. Stat. Assoc. 101:1418–29
101. MacKay D 1994. Bayesian non-linear modelling for the prediction competition. ASHRAE Trans. 100:3704–16
102. Wipf D, Nagarajan S 2007. A new view of automatic relevance determination. Advances in Neural Information Processing Systems 20, ed. JC Platt, D Koller, Y Singer, ST Roweis, pp. 1625–32. Red Hook, NY: Curran
103. Tipping M 2001. Sparse Bayesian learning and the relevance vector machine. J. Mach. Learn. Res. 1:211–44
104. Mitchell TJ, Beauchamp JJ 1988. Bayesian variable selection in linear regression. J. Am. Stat. Assoc. 83:1023–32
105. Wipf D, Nagarajan S 2010. Iterative reweighted ℓ1 and ℓ2 methods for finding sparse solutions. IEEE J. Sel. Top. Signal Process. 4:317–29
106. Materassi D, Innocenti G 2010. Topological identification in networks of dynamical systems. IEEE Trans. Autom. Control 55:1860–71
107. Chiuso A, Pillonetto G 2012. A Bayesian approach to sparse dynamic network identification. Automatica 48:1553–65
108. Bottegal G, Chiuso A, van den Hof P 2018. On dynamic network modeling of stationary multivariate processes. IFAC-PapersOnLine 51(15):850–55
109. Granger C 1963. Economic processes involving feedback. Inf. Control 6:28–48
110. Lind I, Ljung L 2008. Regressor and structure selection in NARX models using a structured ANOVA approach. Automatica 44:383–95
111. Hong X, Mitchell RJ, Chen S, Harris CJ, Li K, Irwin GW 2008. Model selection approaches for non-linear system identification: a review. Int. J. Syst. Sci. 39:925–46
112. Fazel M, Kei PT, Sun D, Tseng P 2013. Hankel matrix rank minimization with applications to system identification and realization. SIAM J. Matrix Anal. Appl. 34:946–77
113. Prando G, Chiuso A, Pillonetto G 2017. Maximum entropy vector kernels for MIMO system identification. Automatica 79:326–39
114. Zorzi M, Sepulchre R 2016. AR identification of latent-variable graphical models. IEEE Trans. Autom. Control 61:2327–40
115. Zorzi M, Chiuso A 2017. Sparse plus low rank network identification: a nonparametric approach. Automatica 76:355–66
116. Pillonetto G 2016. A new kernel-based approach to hybrid system identification. Automatica 70:21–31
117. Goethals I, Pelckmans K, Suykens J, De Moor B 2005. Identification of MIMO Hammerstein models using least squares support vector machines. Automatica 41:1263–72
118. Falck T, Pelckmans K, Suykens J, De Moor B 2009. Identification of Wiener-Hammerstein systems using LS-SVMs. IFAC Proc. Vol. 42(10):820–25
119. Goethals I, Pelckmans K, Falck T, Suykens J, De Moor B 2010. NARX identification of Hammerstein systems using least-squares support vector machines. Block-Oriented Nonlinear System Identification, ed. F Giri, EW Bai, pp. 241–58. London: Springer
120. Falck T, Dreesen P, Brabanter KD, Pelckmans K, Moor BD, Suykens J 2012. Least-squares support vector machines for the identification of Wiener-Hammerstein systems. Control Eng. Pract. 20:1165–74
121. Lindsten F, Schön T, Jordan M 2012. A semiparametric Bayesian approach to Wiener system identification. IFAC Proc. Vol. 45(16):1137–42
122. Tether A 1970. Construction of minimal linear state-variable models from finite input-output data. IEEE Trans. Autom. Control 15:427–36
123. Fazel M, Hindi H, Boyd S 2001. A rank minimization heuristic with application to minimum order system approximation. Proceedings of the 2001 American Control Conference, Vol. 6, pp. 4734–39. New York: IEEE
124. Rudin W 1987. Real and Complex Analysis. Singapore: McGraw-Hill