Explorized Policy Iteration for Continuous-Time Linear Systems

  • Jae Young Lee (Department of Electrical and Electronic Engineering, Yonsei University)
  • Tae Yoon Chun (Department of Electrical and Electronic Engineering, Yonsei University)
  • Yoon Ho Choi (School of Electronic Engineering, Kyonggi University)
  • Jin Bae Park (Department of Electrical and Electronic Engineering, Yonsei University)
  • Received : 2011.11.29
  • Accepted : 2012.02.20
  • Published : 2012.03.01

Abstract

Policy iteration (PI) for continuous-time (CT) systems requires exploration of the state space, a requirement known in the adaptive control community as persistency of excitation. This paper addresses this issue by proposing a PI scheme explorized by an additional probing signal. The proposed PI method efficiently finds the CT linear quadratic (LQ) optimal control in an online fashion without knowledge of the system matrix A, and its stability and convergence to the LQ optimal control in the presence of the probing signal are proven. A design method for the probing signal is also presented to balance exploration of the state space against control performance. Finally, several simulation results verify the effectiveness of the proposed explorized PI method.
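
For context, the model-based iteration that the proposed explorized PI reproduces online is Kleinman's algorithm (reference 10 below): repeatedly evaluate the current policy by solving a Lyapunov equation, then improve the gain, converging to the solution of the algebraic Riccati equation. The following sketch is a minimal offline illustration of that iteration, together with a sum-of-sinusoids probing signal of the kind commonly used to enforce persistency of excitation; it is not the paper's online method, and the example matrices, signal parameters, and names are illustrative assumptions.

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    def kleinman_policy_iteration(A, B, Q, R, K0, n_iter=15):
        """Model-based PI (Kleinman, 1968); the paper's explorized PI
        recovers these iterates online without knowing A.
        K0 must stabilize A - B @ K0."""
        K = K0
        for _ in range(n_iter):
            Ac = A - B @ K                             # closed loop under u = -K x
            # Policy evaluation: solve Ac^T P + P Ac = -(Q + K^T R K)
            P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
            # Policy improvement: K <- R^{-1} B^T P
            K = np.linalg.solve(R, B.T @ P)
        return P, K

    def probing_signal(t, amps=(0.5, 0.3), freqs=(1.0, 3.7)):
        """Hypothetical probing signal e(t): a sum of sinusoids, a common
        choice for persistency of excitation (parameters are assumptions)."""
        return sum(a * np.sin(w * t) for a, w in zip(amps, freqs))

    # Toy second-order example (matrices are assumptions, not from the paper)
    A = np.array([[0.0, 1.0], [-1.0, 2.0]])
    B = np.array([[0.0], [1.0]])
    Q, R = np.eye(2), np.eye(1)
    K0 = np.array([[0.0, 5.0]])            # stabilizes A - B @ K0
    P, K = kleinman_policy_iteration(A, B, Q, R, K0)
    print(P, K)                            # P approaches the ARE solution, K the LQ gain

In the paper's online scheme, each policy-evaluation step is instead solved by least squares from trajectory data collected under u = -K x + e(t), which is why the probing signal, and hence persistency of excitation, is needed in place of knowledge of A.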

References

  1. R. A. Howard, Dynamic Programming and Markov Processes, Cambridge, MA: MIT Press, 1960.
  2. R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, Cambridge, MA: MIT Press, 1998.
  3. F. Y. Wang, H. Zhang, and D. Liu, "Adaptive dynamic programming: an introduction," IEEE Computational Intelligence Magazine, vol. 4, no. 2, pp. 39-47, 2009. https://doi.org/10.1109/MCI.2009.932261
  4. J. J. Murray, C. J. Cox, G. G. Lendaris, and R. Saeks, "Adaptive dynamic programming," IEEE Trans. Systems, Man, and Cybernetics, Part C, vol. 32, no. 2, pp. 140-153, 2002. https://doi.org/10.1109/TSMCC.2002.801727
  5. F. L. Lewis and D. Vrabie, "Reinforcement learning and adaptive dynamic programming for feedback control," IEEE Circuits and Systems Magazine, vol. 9, no. 3, pp. 32-50, 2009. https://doi.org/10.1109/MCAS.2009.933854
  6. S. J. Bradtke and B. E. Ydstie, "Adaptive linear quadratic control using policy iteration," Proc. American Control Conference, pp. 3475-3479, 1994.
  7. K. J. Zhang, Y. K. Xu, X. Chen, and X. R. Cao, "Policy iteration based feedback control," Automatica, vol. 44, no. 4, pp. 1055-1061, 2008. https://doi.org/10.1016/j.automatica.2007.08.014
  8. D. Vrabie, O. Pastravanu, M. Abu-Khalaf, and F. L. Lewis, "Adaptive optimal control for continuous-time linear systems based on policy iteration," Automatica, vol. 45, no. 2, pp. 477-484, 2009. https://doi.org/10.1016/j.automatica.2008.08.017
  9. D. Vrabie, O. Pastravanu, and F. L. Lewis, "Policy iteration for continuous-time systems with unknown internal dynamics," Proc. Mediterranean Conf. Control and Automation, Athens, Greece, 2007.
  10. D. L. Kleinman, "On an iterative technique for Riccati equation computations," IEEE Trans. Automatic Control, vol. AC-13, no. 1, pp. 114-115, 1968.
  11. R. Beard, G. N. Saridis, and J. Wen, "Approximate solutions to the time-invariant Hamilton-Jacobi-Bellman equation," Journal of Optimization Theory and Applications, vol. 96, no. 3, pp. 589-626, 1998. https://doi.org/10.1023/A:1022664528457
  12. H. K. Khalil, Nonlinear Systems, Prentice Hall, 2002.
  13. J. C. Willems, P. Rapisarda, I. Markovsky, and B. L. M. De Moor, "A note on persistency of excitation," Systems & Control Letters, vol. 54, no. 4, pp. 325-329, 2005. https://doi.org/10.1016/j.sysconle.2004.09.003
  14. G. Strang, Linear Algebra and Its Applications, Belmont, CA: Thomson Higher Education, 2006.
  15. B. L. Stevens and F. L. Lewis, Aircraft Control and Simulation, 2nd ed., Wiley, 2003.
  16. J. Y. Lee, J. B. Park, and Y. H. Choi, "Policy-iteration-based adaptive optimal control for uncertain continuous-time linear systems with excitation signals," Proc. Int'l Conf. on Control, Automation and Systems (ICCAS), Ilsan, South Korea, Oct. 2010.