Labeling Q-Learning for Maze Problems with Partially Observable States

  • Lee, Hae-Yeon (Dept. of Electrical and Communication Engineering, Graduate School of Engineering, Tohoku Univ.) ;
  • Hiroyuki Kamaya (Dept. of Electrical Engineering, Hachinohe National College of Technology) ;
  • Kenichi Abe (Dept. of Electrical and Communication Engineering, Graduate School of Engineering, Tohoku Univ.)
  • Published : 2000.10.01

Abstract

Recently, Reinforcement Learning (RL) methods have been used for learning problems in Partially Observable Markov Decision Process (POMDP) environments. Conventional RL methods, however, have limited applicability to POMDPs. To overcome the partial observability, several algorithms have been proposed [5], [7]. The aim of this paper is to extend our previous algorithm for POMDPs, called Labeling Q-learning (LQ-learning), which augments the agent's incomplete perception with labels. Namely, in LQ-learning, the agent perceives the current state as a pair of an observation and its label, so that it can distinguish more precisely between states that yield the same observation. Labeling is carried out by a hash-like function, which we call the Labeling Function (LF). Numerous labeling functions can be considered; in this paper, we introduce several labeling functions based on only the two or three most recent observations. We briefly introduce the basic idea of LQ-learning, apply it to maze problems (simple POMDP environments), and demonstrate its effectiveness with empirical results that compare favorably with conventional RL algorithms.
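To make the idea concrete, the following is a minimal sketch (ours, not the authors' implementation) of how such a labeled Q-learner might be structured: the agent's effective state is the pair (observation, label), where the label comes from a hash-like function over the most recent observations. The class name, number of label buckets, and learning parameters are all illustrative assumptions.

```python
import random
from collections import defaultdict, deque

class LQAgent:
    """Illustrative sketch of a Labeling Q-learning agent.

    The effective state is the pair (observation, label); the label is
    produced by a hash-like labeling function (LF) over the two most
    recent observations. Parameter values are assumptions for the sketch.
    """

    def __init__(self, actions, history_len=2, n_labels=16,
                 alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.n_labels = n_labels
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.history = deque(maxlen=history_len)  # immediate past observations
        self.q = defaultdict(float)               # Q[((obs, label), action)]

    def label(self, obs):
        """Hash-like labeling function over the short observation history."""
        self.history.append(obs)
        # Bucket the recent history into one of n_labels labels
        # (stable within a run; Python's hash is salted across runs).
        return hash(tuple(self.history)) % self.n_labels

    def act(self, state):
        """Epsilon-greedy action selection over the labeled state."""
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state, a)])

    def update(self, state, action, reward, next_state):
        """Standard Q-learning update applied to labeled states."""
        best_next = max(self.q[(next_state, a)] for a in self.actions)
        target = reward + self.gamma * best_next
        self.q[(state, action)] += self.alpha * (target - self.q[(state, action)])
```

In use, two maze cells that produce the same observation (e.g., the same local wall pattern) can receive different labels if the agent arrived via different observation histories, so their Q-values are learned separately rather than conflated.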

Keywords