Range Detection of Wa/Kwa Parallel Noun Phrase using a Probabilistic Model and Modification Information

Range Determination of Wa/Kwa Parallel Phrases Using a Probabilistic Model and Modification Information

  • Published: 2008.02.15

Abstract

Recognizing parallel structures at an early stage of sentence parsing can reduce parsing complexity. In this paper, we propose an unsupervised, language-independent probabilistic model for recognizing parallel noun structures. The proposed model is based on the idea of swapping constituents, which relies on two properties of parallel structures: symmetry (two or more identical constituents are repeated) and reversibility (the order of the constituents is interchangeable). Non-symmetric patterns that the general symmetry rule cannot capture are additionally resolved using modifier information. In particular, this paper shows how the proposed model is applied to recognize Korean parallel noun phrases connected by the particle "wa/kwa". Our model is compared with other models, including supervised ones, and performs better at recognizing parallel noun phrases.

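Analyzing parallel structures as an early step of Korean syntactic analysis can improve parsing efficiency. This paper proposes an unsupervised, language-independent probabilistic model for analyzing parallel structures. The model is grounded in the symmetry and interchangeability of parallel structures. Symmetry means that the same structure is repeated; interchangeability means that the meaning is preserved when the left and right constituents are exchanged. Parallel structures generally follow symmetry, but depending on the nature of a modifier, asymmetric structures in which the modifier applies to only one side also occur. To analyze such asymmetric parallel structures, we additionally use statistical information on modification relations. This paper focuses on showing how the proposed model is used[1] to analyze Korean parallel noun-phrase structures formed with the particle "wa/kwa". Experiments show that it is more effective than other models, including supervised ones.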
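
The swapping idea in the abstract can be illustrated with a small sketch. This is not the authors' model: it assumes a toy add-one-smoothed bigram language model, uses English tokens with "and" standing in for the "wa/kwa" particle, and ignores the modifier information the paper uses for asymmetric cases. All function names (`train_bigram`, `log_prob`, `swap`, `best_range`) are hypothetical. Each candidate range is scored by how plausible the sentence remains after the two conjuncts are exchanged.

```python
import math
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over tokenized sentences (toy model)."""
    unigrams, bigrams = Counter(), Counter()
    for sent in corpus:
        padded = ["<s>"] + sent + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams


def log_prob(sent, unigrams, bigrams):
    """Add-one smoothed bigram log-probability of a token sequence."""
    vocab = len(unigrams)
    padded = ["<s>"] + sent + ["</s>"]
    return sum(
        math.log((bigrams[(a, b)] + 1) / (unigrams[a] + vocab))
        for a, b in zip(padded, padded[1:])
    )


def swap(tokens, left, conj, right):
    """Exchange the conjuncts tokens[left:conj] and tokens[conj+1:right]."""
    return (
        tokens[:left]
        + tokens[conj + 1:right]
        + [tokens[conj]]
        + tokens[left:conj]
        + tokens[right:]
    )


def best_range(tokens, conj, unigrams, bigrams):
    """Return the (left, right) range whose swapped sentence scores highest."""
    candidates = [
        (left, right)
        for left in range(conj)
        for right in range(conj + 2, len(tokens) + 1)
    ]
    return max(
        candidates,
        key=lambda lr: log_prob(swap(tokens, lr[0], conj, lr[1]), unigrams, bigrams),
    )


# Toy data, invented for illustration only.
corpus = [
    ["red", "pears", "and", "apples", "fell"],
    ["red", "apples", "fell"],
]
tokens = ["red", "apples", "and", "pears", "fell"]
unigrams, bigrams = train_bigram(corpus)
chosen = best_range(tokens, 2, unigrams, bigrams)  # conjunction at index 2
```

On this toy data the chosen range is `(1, 4)`, that is, "apples" and "pears" are the conjuncts and "red" modifies the whole coordination. Because every swapped candidate has the same length, their log-probabilities can be compared directly without length normalization.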

References

  1. Kurohashi, Sadao and Makoto Nagao, 1994a. KN Parser: Japanese dependency/case structure analyzer. In Proceedings of Workshop on Sharable Natural Language Resources, pages 48-55
  2. Abney, S., 'Parsing by Chunks,' In R.C. Berwick, S.P. Abney and C. Tenny, editors, Principle-Based Parsing: Computation and Psycholinguistics, Kluwer, pp. 257-278, 1991
  3. 이관규, '국어 대등구성 연구', 서광학술 자료사, 1992
  4. 박준식, '품사 패턴을 이용한 한국어 병렬 구문의 해석', 한국과학기술원 석사학위 논문, 1998
  5. Kurohashi, S. and Nagao, M., 'A Syntactic analysis method of long Japanese sentences based on detection of conjunctive structures,' Computational Linguistics, Vol.20, No.4, pp. 507-534, 1994
  6. Quinlan, J. Ross, 'C4.5: Programs for Machine Learning', Morgan Kaufmann Publishers, 1993
  7. Joachims, Thorsten, Learning to Classify Text Using Support Vector Machines. Dissertation, Kluwer, 2002
  8. Corbett, Edward P. J. Classical Rhetoric for the Modern Student. 3rd ed. NY: Oxford University Press, p. 428. 1990
  9. The KAIST corpus 1996-1997, Korea Advanced Institute of Science and Technology, http://korterm.org/, 1997
  10. Resnik, Philip, 'Semantic similarity in a taxonomy: An information-based measure and its application to problems of ambiguity in natural language,' Journal of Artificial Intelligence Research, Vol.11, pp. 95-130, 1999 https://doi.org/10.1613/jair.514
  11. Jaynes, E.T., 'Information theory and statistical mechanics,' Physical Review, Vol.106, pp. 620-630, 1957 https://doi.org/10.1103/PhysRev.106.620
  12. Eric Sven Ristad. 1998. Maximum entropy modeling toolkit, release 1.6 beta. http://www.mnemonic.com/software/memt
  13. Brown, P. F., S. A. Della Pietra, V. J. Della Pietra, and R. L. Mercer. 'The mathematics of statistical machine translation: Parameter estimation,' Computational Linguistics, Vol.19, pp. 263-312, 1993
  14. Och, Franz Josef, Hermann Ney, 'A Systematic Comparison of Various Statistical Alignment Models,' Computational Linguistics, 29(1):19-51, 2003 https://doi.org/10.1162/089120103321337421
  15. Choi, Yong-Seok, Ji-Ae Shin, Key-Sun Choi (2006), Identification of Boundaries in Parallel Noun Phrases: A Probabilistic Swapping Model, International Journal of Computer Processing of Oriental Languages, 19(2&3), 109-132 https://doi.org/10.1142/S0219427906001451
  16. Choi, Key-Sun, Hee-Sook Bae, Procedures and Problems in Korean-Chinese-Japanese Wordnet with Shared Semantic Hierarchy, WordNet Conference, pp. 320-325, 2004.1, Brno, Czech
  17. Yoon, Juntae, Key-Sun Choi, Mansuk Song, 'Corpus-Based Approach for Nominal Compound Analysis for Korean Based on Linguistic and Statistical Information,' Natural Language Engineering, Vol.7, No.3, pp. 251-270, 2001