Model-free $H_{\infty}$ Control of Linear Discrete-time Systems using Q-learning and LMI Based on I/O Data

Design of a Model-free $H_{\infty}$ Controller for Linear Discrete-time Systems Using Q-learning and LMI Based on Input/Output Data

  • Published: 2009.07.01

Abstract

In this paper, we consider the design of $H_{\infty}$ controllers for linear discrete-time systems for which no mathematical model is available. The basic approach is Q-learning, a reinforcement learning method based on the actor-critic structure. Rather than relying on a mathematical model of the system, the model-free design uses only measured state and input data. As a result, the derived iterative algorithm is expressed as linear matrix inequalities (LMIs) in the data measured from the system states and inputs. It is shown that, for a sufficiently rich disturbance, this algorithm converges to the standard $H_{\infty}$ control solution obtained using the exact system model. A simple numerical example is given to show the usefulness of our result in practical applications.
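The abstract does not spell out the iteration, so the sketch below illustrates the generic Q-learning scheme for the discrete-time zero-sum game underlying $H_{\infty}$ control, not the paper's exact algorithm: a quadratic Q-function $Q(x,u,w) = z^{\top} H z$ with $z = [x^{\top}\ u^{\top}\ w^{\top}]^{\top}$ is fitted from measured transitions, and the control and worst-case-disturbance gains are read off from the blocks of $H$. Everything concrete here is an assumption for illustration: the toy plant matrices A, B, E (used only to generate data, never by the learner), the weights Qx, R, the attenuation level gamma, the sign conventions u = -Kx and w = -Lx, and an ordinary least-squares fit in place of the paper's LMI step.

```python
import numpy as np

# --- Hypothetical toy setup, for illustration only ---------------------------
# The plant (A, B, E) is used solely to generate data; the learner never sees it.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0], [1.0]])              # control input matrix
E = np.array([[1.0], [0.0]])              # disturbance input matrix
n, m, q = 2, 1, 1                         # state, input, disturbance dimensions
Qx, R, gamma = np.eye(n), np.eye(m), 5.0  # assumed weights and attenuation level

d = n + m + q                             # dimension of z = [x; u; w]

def basis(z):
    """Quadratic regressor: products z_i z_j for i <= j (off-diagonal terms
    doubled, since they appear twice in z^T H z for symmetric H)."""
    zz = np.outer(z, z)
    i, j = np.triu_indices(d)
    return np.where(i == j, 1.0, 2.0) * zz[i, j]

def unvec(theta):
    """Rebuild the symmetric kernel H from its upper-triangular parameters."""
    H = np.zeros((d, d))
    H[np.triu_indices(d)] = theta
    return H + np.triu(H, 1).T

H = np.zeros((d, d))
K = np.zeros((m, n))                      # control policy:     u = -K x
L = np.zeros((q, n))                      # disturbance policy: w = -L x
rng = np.random.default_rng(0)

for it in range(60):                      # Q-learning (value-iteration) loop
    Phi, y = [], []
    for _ in range(300):                  # collect exploratory (x, u, w, x') samples
        x = rng.standard_normal(n)
        u = -K @ x + 0.3 * rng.standard_normal(m)   # exploration noise
        w = -L @ x + 0.3 * rng.standard_normal(q)   # "rich" disturbance
        xn = A @ x + B @ u + E @ w
        z = np.concatenate([x, u, w])
        zn = np.concatenate([xn, -K @ xn, -L @ xn])
        r = x @ Qx @ x + u @ R @ u - gamma**2 * (w @ w)  # stage cost of the game
        Phi.append(basis(z))
        y.append(r + zn @ H @ zn)         # bootstrapped target from current Q-estimate
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(y), rcond=None)
    H = unvec(theta)
    # Partition H on the (x, u, w) blocks and update the saddle-point policies.
    Hxu, Hxw = H[:n, n:n+m], H[:n, n+m:]
    Huu, Huw, Hww = H[n:n+m, n:n+m], H[n:n+m, n+m:], H[n+m:, n+m:]
    K = np.linalg.solve(Huu - Huw @ np.linalg.solve(Hww, Huw.T),
                        Hxu.T - Huw @ np.linalg.solve(Hww, Hxw.T))
    L = np.linalg.solve(Hww - Huw.T @ np.linalg.solve(Huu, Huw),
                        Hxw.T - Huw.T @ np.linalg.solve(Huu, Hxu.T))

print("learned control gain K:\n", K)
print("learned worst-case disturbance gain L:\n", L)
```

Because the learner sees only the tuples $(x_k, u_k, w_k, x_{k+1})$ and the stage cost, the plant model never enters the update; the exploration noise injected into $u$ and $w$ plays the role of the sufficiently rich disturbance that the convergence claim requires.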
