
Multiple Model-Based Control Using Finite Controlled Markov Chains


Abstract

Cognition and control processes share many similar characteristics, and decision-making and learning under the paradigm of multiple models has increasingly gained attention in both fields. The controlled finite Markov chain (CFMC) approach makes it possible to deal with a large variety of signals and systems of multivariable, nonlinear, and stochastic nature. In this article, adaptive control based on multiple models is considered. For a set of candidate plant models, CFMC models (and controllers) are constructed off-line. The outcomes of the CFMC models are compared with frequentist information obtained from on-line data, and the best model (and controller) is chosen based on the Kullback–Leibler information. This approach to adaptive control emphasizes the use of physical models as the basis of reliable plant identification. Three series of simulations are conducted: to examine the performance of the developed Matlab tools; to illustrate the approach in the control of a nonlinear, nonminimum-phase van der Vusse CSTR plant; and to examine the suggested model selection method for adaptive control.


References

  1. Berthiaux H, Mizonov V. Applications of Markov chains in particulate process engineering: a review. Can J Chem Eng. 2004;82:1143–68.

  2. Bertsekas DP. Dynamic programming and optimal control. Belmont, MA: Athena Scientific; 2007.

  3. Chen H, Kremling A, Allgöwer F. Nonlinear predictive control of a benchmark CSTR. In: Proceedings of the 3rd European Control Conference, Rome, Italy; 1995. p. 3247–58.

  4. Filev D. Model bank based intelligent control. In: Proceedings of the NAFIPS, New Orleans, USA; 2002. p. 583–6.

  5. Ghahramani Z, Jordan MI. Learning from incomplete data. Lab Memo No. 1509, Center for Biological and Computational Learning, Department of Brain and Cognitive Sciences, Paper No. 108, MIT Artificial Intelligence Laboratory; 1994.

  6. Gosavi A. Reinforcement learning: a tutorial survey and recent advances. INFORMS J Comput. 2009;21(2):178–92.

  7. Häggström O. Finite Markov chains and algorithmic applications. Cambridge: Cambridge University Press; 2002.

  8. Hsu CS. Cell-to-cell mapping—a method of global analysis for nonlinear systems. New York: Springer-Verlag; 1987.

  9. Hussain A, Abdullah R, Chambers J, Gurney K, Redgrave P. Emergent common functional principles in control theory and the vertebrate brain: a case study with autonomous vehicle control. In: Artificial Neural Networks—ICANN, Prague, Czech Republic; 2008. p. 949–58.

  10. Hyötyniemi H. Neocybernetics in biology. Helsinki University of Technology, Control Engineering Laboratory, Report 151, Finland; 2006. 267 p.

  11. Ikonen E, Najim K. Process identification and control. New York: Marcel Dekker; 2002. 310 p.

  12. Jacobs R, Jordan M, Nowlan S, Hinton G. Adaptive mixtures of local experts. Neural Comput. 1991;3:79–87.

  13. Johansen TA, Murray-Smith R. The operating regime approach to nonlinear modelling and control. In: Murray-Smith R, Johansen TA, editors. Multiple model approaches to modelling and control. London: Taylor & Francis; 1997. p. 3–72.

  14. Kaelbling LP, Littman ML, Moore AW. Reinforcement learning: a survey. J Artif Intell Res. 1996;4:237–85.

  15. Kárný M, Kracík J, Guy TV. Cooperative decision making without facilitator. In: IFAC ALCOSP, St. Petersburg, Russia; 2007.

  16. Kemeny JG, Snell JL. Finite Markov chains. Princeton: Van Nostrand; 1960.

  17. Kulhavy R. Kullback–Leibler distance approach to system identification. In: IFAC Symposium on Adaptive Systems in Control and Signal Processing, Budapest; 1995. p. 55–66.

  18. Lee JM, Lee JH. Approximate dynamic programming strategies and their applicability for process control: a review and future directions. Int J Control Autom Syst. 2004;2(3):263–78.

  19. Li XR, Zhao Z, Li X-B. General model-set design methods for multiple-model approach. IEEE Trans Autom Control. 2005;50(9):1260–76.

  20. Lunze J. On the Markov property of quantised state measurement sequences. Automatica. 1998;34(11):1439–44.

  21. Lunze J, Nixdorf B, Richter H. Process supervision by means of a hybrid model. J Process Control. 2001;11:89–104.

  22. Motter M, Principe JC. Predictive multiple model switching control with the self-organizing map. Int J Robust Nonlinear Control. 2002;12(11):1029–51.

  23. Murray-Smith R, Johansen TA. Multiple model approaches to modelling and control. London: Taylor & Francis; 1997.

  24. Narendra KS, Balakrishnan J, Ciliz MK. Adaptation and learning using multiple models, switching and tuning. IEEE Control Syst. 1995;15(3):37–51.

  25. Poznyak AS, Najim K, Gómez-Ramírez E. Self-learning control of finite Markov chains. New York: Marcel Dekker; 2000.

  26. Puterman ML. Markov decision processes—discrete stochastic dynamic programming. New York: John Wiley & Sons; 1994.

  27. Riordon JS. An adaptive automaton controller for discrete-time Markov processes. Automatica. 1969;5:721–30.

  28. Sanz R. Cognition and control. A preparatory workshop for the EU Seventh Research Framework Programme, 9 March 2006, Luxembourg; 2006.

  29. Shah S, Cluett W. Recursive least squares based estimation schemes for self-tuning control. Can J Chem Eng. 1991;69:89–96.

  30. Shorten R, Murray-Smith R, Bjorgan R, Gollee H. On the interpretation of local models in blended multiple model structures. Int J Control. 1999;72(7/8):620–8.

  31. White DJ. Real applications of Markov decision processes. Interfaces. 1985;15(6):73–83.

  32. White DJ. Further real applications of Markov decision processes. Interfaces. 1988;18(5):55–61.

  33. White DJ. A survey of applications of Markov decision processes. J Oper Res Soc. 1993;44(11):1073–96.


Acknowledgements

The authors thank the anonymous reviewers for their helpful comments and suggestions.

Author information

Correspondence to Enso Ikonen.

Appendix A: Kullback–Leibler Distance

Following Kulhavy [17], suppose that \(X_{1},X_{2},\ldots,X_{k+1}\) form a controlled Markov chain with a conditional probability mass function \(S(y|z,u)=\Pr\{X_{k+1}=y \mid X_{k}=z,\,U_{k}=u\}.\) The transition probability distribution is only partially known; it is assumed to belong to a family \(S_{\theta}\). The task is to estimate \(\theta\).
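As a concrete, hypothetical illustration (not from the paper), such a transition law can be held as a three-dimensional stochastic array and simulated directly; all names and sizes below are assumptions made for the sketch:

```python
import numpy as np

# Hypothetical sketch: a finite controlled Markov chain with n_x states
# and n_u actions, stored so that
# S[y, z, u] = Pr{X_{k+1} = y | X_k = z, U_k = u}.
n_x, n_u = 4, 2
rng = np.random.default_rng(0)

S = rng.random((n_x, n_x, n_u))
S /= S.sum(axis=0, keepdims=True)      # each column S[:, z, u] sums to 1

def step(z, u):
    """Sample the next state given the current state z and action u."""
    return rng.choice(n_x, p=S[:, z, u])

# Simulate a short controlled trajectory under randomly chosen actions.
x = [0]
u_seq = rng.integers(n_u, size=200)
for u in u_seq:
    x.append(step(x[-1], u))
```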

For a sequence of observations \(\mathbf{x}=\left(x_{1},x_{2},\ldots,x_{k+1}\right)\) and control actions \(\mathbf{u}=\left(u_{1},u_{2},\ldots,u_{k+1}\right),\) we have

$$ \begin{aligned} S_{\theta}^{k}\left(\mathbf{x} \mid x_{1},\mathbf{u}\right) &=\prod_{i=1}^{k}S_{\theta}\left(x_{i+1} \mid x_{i},u_{i}\right)\\ &=\exp\left\{\sum_{i=1}^{k}\log S_{\theta}\left(x_{i+1} \mid x_{i},u_{i}\right)\right\}\\ &=\exp\left\{k\sum_{(y,z,v)\in\mathcal{X}^{2}\times\mathcal{U}}R_{\mathbf{x},\mathbf{u}}\left(y,z,v\right)\log S_{\theta}\left(y \mid z,v\right)\right\}\\ &=\exp\left\{-k\left[\overline{H}\left(R_{\mathbf{x},\mathbf{u}}\right)+\overline{D}\left(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta}\right)\right]\right\} \end{aligned} $$

where

  • \(R_{\mathbf{x},\mathbf{u}}(a,b,c)\) is the empirical distribution \(R_{\mathbf{x},\mathbf{u}}(a,b,c)=\frac{N_{\mathbf{x},\mathbf{u}}(a,b,c)}{k},\) where \(N_{\mathbf{x},\mathbf{u}}(a,b,c)\) counts the occurrences of the triplet \((a,b,c)\) among \((x_{i+1},x_{i},u_{i}),\) \(i=1,\ldots,k;\)

  • \(\overline{H}(R_{\mathbf{x},\mathbf{u}})\) is the conditional Shannon entropy

    \(\overline{H}\left(R_{\mathbf{x},\mathbf{u}}\right)=-\sum_{(y,z,v)}R_{\mathbf{x},\mathbf{u}}\left(y,z,v\right)\log R_{\mathbf{x},\mathbf{u}}\left(y,z,v\right)+\sum_{(z,v)}R_{\mathbf{x},\mathbf{u}}\left(z,v\right)\log R_{\mathbf{x},\mathbf{u}}\left(z,v\right)\)

    of a random variable Y given another random variable Z and a control action V, described jointly by the probability distribution \(R_{\mathbf{x},\mathbf{u}};\) and

  • \(\overline{D}\left(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta}\right)\) is the conditional Kullback–Leibler distance \(\overline{D}\left(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta}\right)=\sum_{(y,z,v)}R_{\mathbf{x},\mathbf{u}}\left(y,z,v\right)\log\frac{R_{\mathbf{x},\mathbf{u}}(y,z,v)}{S_{\theta}(y|z,v)\,R_{\mathbf{x},\mathbf{u}}(z,v)}\) between the joint probability distribution \(R_{\mathbf{x},\mathbf{u}}\) and the conditional distribution \(S_{\theta}.\)

Further, we can regard any of the conditional distributions \(S_{\theta}(y|z,v)\) as a set of distributions \(S_{\theta}^{z,v}(y),\) and the conditional empirical distribution \(R_{\mathbf{x},\mathbf{u}}(y|z,v)\) as a set of points \(R_{\mathbf{x},\mathbf{u}}^{z,v}(y),\) \(z\in\mathcal{X},\) \(v\in\mathcal{U}.\) We can then write

$$ \overline{D}\left(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta}\right)=\sum_{(z,v)\in\mathcal{X}\times\mathcal{U}}R_{\mathbf{x},\mathbf{u}}(z,v)\,D\left(R_{\mathbf{x},\mathbf{u}}^{z,v}\,\|\,S_{\theta}^{z,v}\right). $$
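As a numerical illustration, both the empirical distribution and the conditional Kullback–Leibler distance can be computed directly from an observed trajectory through this decomposition. The following is a minimal sketch, assuming integer-coded states and actions as in the earlier snippet; the function names are hypothetical:

```python
import numpy as np

def empirical_triplets(x, u, n_x, n_u):
    """Empirical distribution R(y, z, v) of the triplets
    (x_{i+1}, x_i, u_i), i = 1..k, normalized by k."""
    k = len(u)
    R = np.zeros((n_x, n_x, n_u))
    for i in range(k):
        R[x[i + 1], x[i], u[i]] += 1.0 / k
    return R

def conditional_kl(R, S, eps=1e-12):
    """Conditional KL distance D(R || S) computed through the
    decomposition sum_{z,v} R(z, v) * D(R(.|z, v) || S(.|z, v))."""
    R_zv = R.sum(axis=0)                       # marginal R(z, v)
    D = 0.0
    for z in range(R.shape[1]):
        for v in range(R.shape[2]):
            if R_zv[z, v] <= 0.0:              # skip unvisited (z, v) pairs
                continue
            r = R[:, z, v] / R_zv[z, v]        # empirical R(y | z, v)
            m = r > 0.0
            D += R_zv[z, v] * np.sum(r[m] * np.log(r[m] / (S[m, z, v] + eps)))
    return D
```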

The posterior distribution of the unknown parameter \(\theta,\) conditional on \(\mathbf{x}\) and \(\mathbf{u},\) is

$$ P\left(\theta \mid \mathbf{x},\mathbf{u}\right)\propto P(\theta)\,S_{\theta}^{k}\left(\mathbf{x} \mid x_{1},\mathbf{u}\right)\propto P(\theta)\exp\left\{-k\,\overline{D}\left(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta}\right)\right\} $$

since the conditional entropy \(\overline{H}(R_{\mathbf{x},\mathbf{u}})\) does not depend on \(\theta.\)
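Because \(\overline{H}\) is common to all candidates, comparing posteriors over a finite model bank reduces to comparing \(\log P(\theta_{j})-k\,\overline{D}(R_{\mathbf{x},\mathbf{u}}\,\|\,S_{\theta_{j}})\). A hedged sketch of such a selection step, reusing `conditional_kl` from the snippet above (the model bank and prior are placeholders, not the paper's implementation):

```python
import numpy as np

def select_model(R, k, model_bank, log_prior=None):
    """Return the index of the candidate S_theta maximizing
    log P(theta) - k * D(R || S_theta), i.e. the posterior mode."""
    n = len(model_bank)
    lp = np.zeros(n) if log_prior is None else np.asarray(log_prior)
    scores = [lp[j] - k * conditional_kl(R, model_bank[j]) for j in range(n)]
    return int(np.argmax(scores))

# Usage with the sketches above (uniform prior over two candidate models):
# R = empirical_triplets(x, u_seq, n_x, n_u)
# best = select_model(R, len(u_seq), [S_candidate_1, S_candidate_2])
```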

Cite this article

Ikonen, E., Najim, K. Multiple Model-Based Control Using Finite Controlled Markov Chains. Cogn Comput 1, 234–243 (2009). https://doi.org/10.1007/s12559-009-9020-0