Abstract
Cognition and control processes share many characteristics, and decision-making and learning under the paradigm of multiple models have gained increasing attention in both fields. The controlled finite Markov chain (CFMC) approach makes it possible to deal with a large variety of signals and systems of multivariable, nonlinear, and stochastic nature. In this article, adaptive control based on multiple models is considered. For a set of candidate plant models, CFMC models (and controllers) are constructed off-line. The outcomes of the CFMC models are compared with frequentist information obtained from on-line data, and the best model (and controller) is chosen based on the Kullback–Leibler information. This approach to adaptive control emphasizes the use of physical models as the basis of reliable plant identification. Three series of simulations are conducted: to examine the performance of the developed Matlab tools; to illustrate the approach in the control of a nonlinear non-minimum-phase van der Vusse CSTR plant; and to examine the suggested model selection method for adaptive control.
References
Berthiaux H, Mizonov V. Applications of Markov chains in particulate process engineering: a review. Can J Chem Eng. 2004;82:1143–68.
Bertsekas DP. Dynamic programming and optimal control. Belmont, Massachusetts: Athena Scientific; 2007.
Chen H, Kremling A, Allgöwer F. Nonlinear predictive control of a benchmark CSTR. In: Proceedings of the 3rd European Control Conference, Rome, Italy; 1995. p. 3247–58.
Filev D. Model bank based intelligent control. In: Proceedings of the NAFIPS, New Orleans, USA; 2002. p. 583–6.
Ghahramani Z, Jordan MI. Learning From Incomplete Data. Lab Memo No. 1509, Center for Biological and Computational learning, Department of Brain and Cognitive Sciences, Paper No. 108, MIT Artificial Intelligence Laboratory; 1994.
Gosavi A. Reinforcement learning: a tutorial survey and recent advances. INFORMS J Comput. 2009;21(2):178–92.
Häggström O. Finite Markov chains and algorithmic applications. Cambridge: Cambridge University Press; 2002.
Hsu CS. Cell-to-cell mapping—a method of global analysis for nonlinear systems. New York: Springer-Verlag; 1987.
Hussain A, Abdullah R, Chambers J, Gurney K, Redgrave P. Emergent common functional principles in control theory and the vertebrate brain: a case study with autonomous vehicle control. Artificial Neural Networks—ICANN, Prague, Czech Republic; 2008. p. 949–58.
Hyötyniemi H. Neocybernetics in biology. Helsinki University of Technology, Control Engineering Laboratory, Report 151, Finland, 2006; 267 p.
Ikonen E, Najim K. Process identification and control. New York: Marcel Dekker; 2002. p. 310.
Jacobs R, Jordan M, Nowlan S, Hinton G. Adaptive mixtures of local experts. Neural Comput. 1991;3:79–87.
Johansen TA, Murray-Smith R. The operating regime approach to nonlinear modelling and control. In: Murray-Smith R, Johansen TA (eds) Multiple model approaches to modelling and control. London: Taylor and Francis; 1997, p. 3–72.
Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.
Kárný M, Kracík J, Guy TV. Cooperative decision making without facilitator. In: IFAC ALCOSP, St. Petersburg, Russia; 2007.
Kemeny JG, Snell JL. Finite Markov chains. Princeton: van Nostrand; 1960.
Kulhavy R. Kullback–Leibler distance approach to system identification. In: IFAC symposium on adaptive systems in control and signal processing, Budapest, 1995; p. 55–66.
Lee JM, Lee JH. Approximate dynamic programming strategies and their applicability for process control: a review and future directions. Int J Control Autom Syst. 2004;2(3):263–78.
Li XR, Zhao Z, Li X-B. General model-set design methods for multiple-model approach. IEEE Trans Autom Control. 2005;50(9):1260–76.
Lunze J. On the Markov property of quantised state measurement sequences. Automatica. 1998;34(11):1439–44.
Lunze J, Nixdorf B, Richter H. Process supervision by means of a hybrid model. J Proc Cont. 2001;11:89–104.
Motter M, Principe JC. Predictive multiple model switching control with the self-organizing map. Int J Robust Nonlinear Control. 2002;12(11):1029–51.
Murray-Smith R, Johansen TA. Multiple model approaches to modelling and control. London: Taylor & Francis; 1997.
Narendra KS, Balakrishnan J, Ciliz MK. Adaptation and learning using multiple models, switching and tuning. IEEE Control Syst. 1995;15(3):37–51.
Poznyak AS, Najim K, Gómez-Ramírez E. Self-learning control of finite Markov chains. New York: Marcel Dekker; 2000.
Puterman ML. Markov decision processes—discrete stochastic dynamic programming. New York: Wiley & Sons; 1994.
Riordon JS. An adaptive automaton controller for discrete-time Markov processes. Automatica. 1969;5:721–30.
Sanz R. Cognition and control. A preparatory workshop for EU seventh research framework programme, 9 March 2006, Luxembourg; 2006.
Shah S, Cluett W. Recursive least squares based estimation schemes for self-tuning control. Can J Chem Eng. 1991;69:89–96.
Shorten R, Murray-Smith R, Bjorgan R, Gollee H. On the interpretation of local models in blended multiple model structures. Int J Control. 1999;72(7/8):620–8.
White DJ. Real applications of Markov decision processes. Interfaces. 1985;15(6):73–83.
White DJ. Further real applications of Markov decision processes. Interfaces. 1989;18(5):55–61.
White DJ. A survey of applications of Markov decision processes. J Oper Res Soc. 1993;44(11):1073–96.
Acknowledgements
The authors thank the anonymous reviewers for their helpful comments and suggestions.
Appendix A: Kullback–Leibler Distance
Following Kulhavy [17], suppose that \(X_1, X_2, \ldots, X_{k+1}\) form a controlled Markov chain with a conditional probability mass function \(S(y|z,u) = \Pr\{X_{k+1}=y \mid X_k=z,\, U_k=u\}\). The transition probability distribution is known only partially; it is assumed to belong to a parametric family \(S_\theta\). The task is to estimate θ.
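As a minimal sketch of this setup, the snippet below samples a trajectory of a controlled finite Markov chain from a given transition law \(S(y|z,u)\). The two-state, two-action chain, the control sequence, and all names are illustrative assumptions, not taken from the article.

```python
import numpy as np

# A controlled finite Markov chain is specified by transition probabilities
# S[u][z][y] = Pr{X_{k+1} = y | X_k = z, U_k = u}.  This 2-state, 2-action
# chain is a hypothetical example.
S = np.array([
    [[0.9, 0.1],    # action u=0, from state z=0
     [0.3, 0.7]],   # action u=0, from state z=1
    [[0.5, 0.5],    # action u=1, from state z=0
     [0.1, 0.9]],   # action u=1, from state z=1
])

def step(rng, S, z, u):
    """Sample X_{k+1} from the conditional distribution S(. | z, u)."""
    return rng.choice(S.shape[2], p=S[u, z])

rng = np.random.default_rng(0)
z = 0
states = [z]
for k in range(5):
    u = k % 2              # an arbitrary control sequence for illustration
    z = step(rng, S, z, u)
    states.append(z)
print(states)              # a sampled state trajectory of length 6
```

Each row of `S[u]` sums to one, so `rng.choice` draws the next state directly from the conditional law.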
For a sequence of observations \(\mathbf{x}=\left(x_1,x_2,\ldots,x_{k+1}\right)\) and control actions \(\mathbf{u}=\left(u_1,u_2,\ldots,u_{k+1}\right)\), we have
\(\frac{1}{k}\log S_\theta\left(\mathbf{x}\mid\mathbf{u}\right) = -\overline{H}\left(R_{\mathbf{x,u}}\right) - \overline{D}\left(R_{\mathbf{x,u}}\,\|\,S_\theta\right)\)
where
- \(R_{\mathbf{x,u}}(a,b,c)\) is the empirical distribution \(R_{\mathbf{x,u}}\left(a,b,c\right) = \frac{N_{\mathbf{x,u}}\left(a,b,c\right)}{k}\), where \(N_{\mathbf{x,u}}(a,b,c)\) counts the number of occurrences of the triplet (a,b,c) in the sequence formed by \(\mathbf{x}\) and \(\mathbf{u}\);
- \(\overline{H}(R_{\mathbf{x,u}})\) is the conditional Shannon entropy \(\overline{H}\left(R_{\mathbf{x,u}}\right) = -\sum_{\left(y,z,v\right)} R_{\mathbf{x,u}}\left(y,z,v\right)\log R_{\mathbf{x,u}}\left(y,z,v\right) + \sum_{\left(z,v\right)} R_{\mathbf{x,u}}\left(z,v\right)\log R_{\mathbf{x,u}}\left(z,v\right)\) of a random variable Y given another random variable Z and a control action V, described jointly by the probability distribution \(R_{\mathbf{x,u}}\); and
- \(\overline{D}\left(R_{\mathbf{x,u}}\|S_\theta\right)\) is the conditional Kullback–Leibler distance \(\overline{D}\left(R_{\mathbf{x,u}}\|S_\theta\right) = \sum_{\left(y,z,v\right)} R_{\mathbf{x,u}}\left(y,z,v\right)\log\frac{R_{\mathbf{x,u}}\left(y,z,v\right)}{S_\theta\left(y|z,v\right) R_{\mathbf{x,u}}\left(z,v\right)}\) of the joint probability distribution \(R_{\mathbf{x,u}}\) and the conditional distribution \(S_\theta\).
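The quantities above can be computed directly from observed state and action sequences. The sketch below assumes small finite state and action sets indexed from zero; the function name and example data are illustrative, not from the article.

```python
import numpy as np
from collections import Counter

def empirical_stats(x, u, S_theta):
    """Given states x = (x_1, ..., x_{k+1}), the k applied actions u, and a
    candidate model S_theta with S_theta[v][z][y] = S_theta(y|z,v), return
    the conditional entropy H(R) and conditional KL distance D(R||S_theta)."""
    k = len(x) - 1
    # Empirical distribution R(y,z,v) = N(y,z,v)/k over observed triplets.
    N = Counter((x[i + 1], x[i], u[i]) for i in range(k))
    R3 = {t: n / k for t, n in N.items()}
    # Marginal R(z,v).
    R2 = Counter()
    for (y, z, v), p in R3.items():
        R2[(z, v)] += p
    # H(R) = -sum R(y,z,v) log R(y,z,v) + sum R(z,v) log R(z,v)
    H = (-sum(p * np.log(p) for p in R3.values())
         + sum(p * np.log(p) for p in R2.values()))
    # D(R||S) = sum R(y,z,v) log[ R(y,z,v) / (S(y|z,v) R(z,v)) ]
    D = sum(p * np.log(p / (S_theta[v][z][y] * R2[(z, v)]))
            for (y, z, v), p in R3.items())
    return H, D

# Hypothetical 2-state, 2-action model and a short observed trajectory.
S = np.array([[[0.9, 0.1], [0.3, 0.7]],
              [[0.5, 0.5], [0.1, 0.9]]])
H, D = empirical_stats([0, 0, 1, 0, 1, 1, 0], [0, 0, 1, 0, 1, 0], S)
```

Note that \(\overline{D}\) is non-negative and vanishes only if the empirical conditional distributions match \(S_\theta\) exactly on all observed state–action pairs.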
Further, we can regard any of the conditional distributions \(S_\theta(y|z,v)\) as a set of distributions \(S_\theta^{z,v}(y)\), and the conditional empirical distribution \(R_{\mathbf{x,u}}\left(y|z,v\right)\) as a set of points \(R_{\mathbf{x}}^{z,v}\left(y\right)\), \(z\in\mathcal{X}\), \(v\in\mathcal{U}\). We can then write
\(\overline{D}\left(R_{\mathbf{x,u}}\,\|\,S_\theta\right) = \sum_{\left(z,v\right)} R_{\mathbf{x,u}}\left(z,v\right) D\left(R_{\mathbf{x}}^{z,v}\,\|\,S_\theta^{z,v}\right)\)
where \(D(\cdot\|\cdot)\) is the (unconditional) Kullback–Leibler distance.
The posterior distribution of the unknown parameter \(\theta\) conditional on \(\mathbf{x}\) and \(\mathbf{u}\) is
\(p\left(\theta\mid\mathbf{x},\mathbf{u}\right) \propto p\left(\theta\right)\exp\left(-k\,\overline{D}\left(R_{\mathbf{x,u}}\,\|\,S_\theta\right)\right)\)
since the conditional entropy \(\overline{H}(R_{\mathbf{x,u}})\) does not depend on θ.
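This posterior form suggests a simple model selection rule for a finite set of candidates: with a uniform prior, the posterior weight of each candidate is proportional to \(\exp(-k\,\overline{D})\), so the best model is the one minimizing the conditional Kullback–Leibler distance. The sketch below uses illustrative \(\overline{D}\) values, not results from the article.

```python
import numpy as np

# Posterior-based selection among candidate models: with a uniform prior,
# posterior weight of candidate j is proportional to exp(-k * D_bar[j]).
k = 200                                    # number of observed transitions
D_bar = np.array([0.41, 0.07, 0.23])       # D(R||S_theta) per candidate (hypothetical)
prior = np.ones(3) / 3                     # uniform prior over candidates

log_post = np.log(prior) - k * D_bar       # unnormalized log-posterior
post = np.exp(log_post - log_post.max())   # subtract max for numerical stability
post /= post.sum()                         # normalize to a probability vector

best = int(np.argmin(D_bar))               # equals argmax of the posterior
```

For moderate or large \(k\) the posterior concentrates sharply on the candidate with the smallest \(\overline{D}\), which is why the selection rule reduces to minimizing the Kullback–Leibler distance.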
Ikonen, E., Najim, K. Multiple Model-Based Control Using Finite Controlled Markov Chains. Cogn Comput 1, 234–243 (2009). https://doi.org/10.1007/s12559-009-9020-0