Korean Probabilistic Syntactic Model using Head Co-occurrence

Lee, Kong-Joo; Kim, Jae-Hoon

doi:10.3745/KIPSTB.2002.9B.6.809

The KIPS Transactions: Part B (정보처리학회논문지B), Volume 9B, Issue 6, pp. 809-816, 2002, pISSN 1598-284X

Korea Information Processing Society (한국정보처리학회)


Korean title (translated): A Korean Probabilistic Parsing Model Using Co-occurrence Information between Heads

Lee, Kong-Joo (Microsoft Korea, Inc.);
Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime University)

Published: 2002.12.01

https://doi.org/10.3745/KIPSTB.2002.9B.6.809

Abstract

Natural language is inherently structurally ambiguous, so one of the main difficulties of parsing is resolving structural ambiguity. Probabilistic approaches to this disambiguation problem have recently received considerable attention because they offer automatic learning, wide coverage, and robustness. This paper focuses on a probabilistic parsing model for Korean that uses head co-occurrence. Because head co-occurrence is lexical, it is prone to the data sparseness problem, and handling this problem is therefore the most important issue. To alleviate it, we use a restricted and simplified phrase-structure grammar together with a back-off model for smoothing. The proposed model achieves an accuracy of about 84%.

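Korean abstract (translated): One of the biggest problems in parsing is how to resolve the ambiguity of syntactic structure. Probabilistic syntactic rules are one way to resolve it. In this paper we propose a probabilistic model that resolves the structural ambiguity of Korean using co-occurrence information between heads. Because heads are lexical items, they can give rise to a data sparseness problem, so the success of using lexical information can depend on how this problem is handled. We mitigate the problem by simplifying the syntactic rules and applying the back-off method. The proposed model showed an accuracy of about 84% on the experimental data.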
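
The abstract does not give the model's equations, so the following is only a minimal sketch of the general idea of backing off from sparse lexical head co-occurrence counts to coarser counts. It assumes a fixed back-off weight and a fall-back to part-of-speech tag pairs. The class name, the tag labels, the weight 0.4, and the toy romanized words are all hypothetical and are not taken from the paper. The fall-back is a simplified, unnormalized scheme, not Katz's discounted back-off.

```python
from collections import Counter


class BackoffHeadModel:
    """Toy back-off scorer for head co-occurrence (illustrative names only)."""

    def __init__(self, backoff_weight=0.4):
        # Assumed constant penalty for backing off; not a value from the paper.
        self.backoff_weight = backoff_weight
        self.word_pair = Counter()  # (governor word, dependent word) counts
        self.word_gov = Counter()   # governor word counts
        self.tag_pair = Counter()   # (governor tag, dependent tag) counts
        self.tag_gov = Counter()    # governor tag counts

    def observe(self, gov, dep):
        """Record one head pair; gov and dep are (word, tag) tuples."""
        (gw, gt), (dw, dt) = gov, dep
        self.word_pair[(gw, dw)] += 1
        self.word_gov[gw] += 1
        self.tag_pair[(gt, dt)] += 1
        self.tag_gov[gt] += 1

    def score(self, gov, dep):
        """Relative frequency of dep given gov, backing off to tags if unseen."""
        (gw, gt), (dw, dt) = gov, dep
        if self.word_pair[(gw, dw)] > 0:
            # Lexical evidence exists: use the word-level relative frequency.
            return self.word_pair[(gw, dw)] / self.word_gov[gw]
        if self.tag_pair[(gt, dt)] > 0:
            # Word pair unseen: fall back to the tag-level estimate, down-weighted.
            return self.backoff_weight * self.tag_pair[(gt, dt)] / self.tag_gov[gt]
        return 0.0


model = BackoffHeadModel()
model.observe(("meokda", "V"), ("bap", "N"))
model.observe(("meokda", "V"), ("ppang", "N"))
model.observe(("ilkda", "V"), ("chaek", "N"))

seen = model.score(("meokda", "V"), ("bap", "N"))   # word pair was observed
unseen = model.score(("ilkda", "V"), ("bap", "N"))  # backs off to the (V, N) tag pair
```

In this toy data the observed pair is scored from word counts alone, while the unseen pair still receives a nonzero score from the tag-level counts, which is the effect smoothing is meant to have on sparse lexical statistics.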


