The Development of Janggi Board Game Using Backpropagation Neural Network and Q Learning Algorithm

Journal of the Institute of Electronics Engineers of Korea TE (대한전자공학회논문지TE)

Volume 39 Issue 1 / Pages 83-90 / 2002 / 1229-7380 (pISSN)

The Institute of Electronics and Information Engineers (대한전자공학회)

역전파 신경회로망과 Q학습을 이용한 장기보드게임 개발

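황상문 (Department of Electronics and Information, Jeonju Technical College) ;
박인규 (Department of Computer Science, Division of Information Engineering, Joongbu University) ;
백덕수 (Department of Electronics and Information, Iksan College) ;
진달복 (Department of Electronic Engineering, Division of Electrical and Electronic Engineering, Wonkwang University)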

Published : 2002.03.01


Abstract

This paper proposes a strategy-learning method for janggi, a two-person deterministic board game, that combines a back-propagation neural network with the Q-learning algorithm. Learning is accomplished simply by having the programs play against each other. The system consists of two parts: a move generator and a search kernel. The move generator produces the moves available on the board; the search kernel combines back-propagation and Q-learning with $\alpha\beta$ search in order to learn the evaluation function. Whereas temporal-difference learning learns only the discrepancy between adjacent rewards, Q-learning acquires optimal policies even when there is no prior knowledge of the effects of its moves on the environment, by learning the evaluation function from the augmented rewards. Over many games, as the learned evaluation function became more reliable, the winning percentage was found to be, in general, linearly proportional to the amount of learning.
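The division of labour described above, a move generator feeding a search kernel that scores positions with an evaluation function, can be illustrated with a minimal $\alpha\beta$ sketch. This is not the authors' implementation: the function names `alphabeta`, `toy_moves`, `toy_apply`, and `toy_evaluate` are hypothetical, and a three-branch toy tree stands in for the janggi board, whose representation the abstract does not specify.

```python
import math

def alphabeta(state, depth, alpha, beta, maximizing,
              generate_moves, apply_move, evaluate):
    """Depth-limited minimax with alpha-beta pruning.

    generate_moves, apply_move, and evaluate are supplied by the caller,
    mirroring the split between a move generator and an evaluation function.
    """
    moves = list(generate_moves(state))
    if depth == 0 or not moves:
        return evaluate(state)  # static score at the search horizon
    if maximizing:
        best = -math.inf
        for move in moves:
            child = apply_move(state, move)
            best = max(best, alphabeta(child, depth - 1, alpha, beta, False,
                                       generate_moves, apply_move, evaluate))
            alpha = max(alpha, best)
            if alpha >= beta:  # the minimizing side will avoid this line
                break
        return best
    best = math.inf
    for move in moves:
        child = apply_move(state, move)
        best = min(best, alphabeta(child, depth - 1, alpha, beta, True,
                                   generate_moves, apply_move, evaluate))
        beta = min(beta, best)
        if alpha >= beta:  # the maximizing side will avoid this line
            break
    return best

# Toy stand-in for a game: inner lists are positions, integers are leaf scores.
TOY_TREE = [[3, 5], [2, 9], [0, 1]]
evaluated = []  # records which leaves the search actually scored

def toy_moves(node):
    # Hypothetical move generator: one move per child of an inner node.
    return range(len(node)) if isinstance(node, list) else []

def toy_apply(node, move):
    return node[move]

def toy_evaluate(node):
    # Hypothetical evaluation function; a learned network would go here.
    evaluated.append(node)
    return float(node)

best_value = alphabeta(TOY_TREE, 2, -math.inf, math.inf, True,
                       toy_moves, toy_apply, toy_evaluate)
print(best_value, evaluated)
```

On this toy tree the search scores only four of the six leaves, because the second and third branches are cut off as soon as they fall below the value already secured by the first. In a learning system of the kind the abstract describes, `toy_evaluate` would be replaced by the trained evaluation function.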

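This paper proposes a method, based on a back-propagation neural network and the Q-learning algorithm, for learning strategy from the information of a two-player board game. Learning proceeds simply through games played against an opponent process. The system consists of a part responsible for search and a part that generates the moves of the pieces. The move-generation part is updated according to the state of the board, and the search kernel, built on αβ search, combines a back-propagation neural network with Q-learning to learn a good evaluation function for the game. Unlike temporal-difference learning, which only reduces the differences between adjacent evaluation values along a sequence of piece moves, the Q-learning algorithm used here learns the evaluation function from the updated evaluation values that follow each piece move, and can thereby derive an optimal strategy. In general, once extensive learning ensured the accuracy of the evaluation function, the win rate was found to be proportional to the amount of learning.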
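The contrast drawn here between temporal-difference learning and Q-learning comes down to the training target handed to the network. The following is a minimal sketch under stated assumptions, not the paper's method: the class `TinyNet`, the functions `td_target` and `q_target`, the three-element feature vectors, the reward of 1.0, and the discount factor are all hypothetical, since the abstract gives no network architecture, board encoding, or reward scheme.

```python
import math
import random

class TinyNet:
    """One-hidden-layer network mapping a feature vector to a scalar value."""

    def __init__(self, n_in, n_hidden, seed=0):
        rng = random.Random(seed)
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
        self.b2 = 0.0

    def forward(self, x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(self.w1, self.b1)]
        y = sum(w * hi for w, hi in zip(self.w2, h)) + self.b2
        return y, h

    def value(self, x):
        return self.forward(x)[0]

    def train_step(self, x, target, lr=0.05):
        """One back-propagation step on the loss 0.5 * (y - target)**2."""
        y, h = self.forward(x)
        err = y - target
        for j, hj in enumerate(h):
            # Gradient reaching hidden unit j, taken before w2[j] is updated.
            grad_h = err * self.w2[j] * (1.0 - hj * hj)
            self.w2[j] -= lr * err * hj
            self.b1[j] -= lr * grad_h
            for i, xi in enumerate(x):
                self.w1[j][i] -= lr * grad_h * xi
        self.b2 -= lr * err
        return 0.5 * err * err

def td_target(net, reward, next_features, gamma=0.9):
    """TD(0)-style target: uses the value of the successor actually reached."""
    return reward + gamma * net.value(next_features)

def q_target(net, reward, candidate_features, gamma=0.9):
    """Q-learning target: uses the best value among all candidate successors."""
    if not candidate_features:  # terminal position, nothing to bootstrap from
        return reward
    return reward + gamma * max(net.value(f) for f in candidate_features)

# Hypothetical encodings of the current position and its two successors.
net = TinyNet(n_in=3, n_hidden=4)
current = [1.0, 0.0, 1.0]
successors = [[0.0, 1.0, 0.0], [1.0, 1.0, 0.0]]
reward = 1.0

td = td_target(net, reward, successors[0])
q = q_target(net, reward, successors)

# Fit the network's estimate for the current position to the fixed Q target.
for _ in range(300):
    net.train_step(current, q)
print(td, q, net.value(current))
```

Because the Q-learning target takes a maximum over the candidate successors, it is never smaller than the TD target computed from any single one of them. In a full training loop the target would be recomputed after every move rather than held fixed as it is in this sketch.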
