Automatically Extracting Unknown Translations Using Phrase Alignment

Kim, Jae-Hoon; Yang, Sung-Il

doi:10.3745/KIPSTB.2007.14-B.3.231

The KIPS Transactions: Part B

Volume 14-B, Issue 3 / Pages 231-240 / 2007 / 1598-284X (pISSN)

Korea Information Processing Society

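Kim, Jae-Hoon (Department of Computer Engineering, Korea Maritime University);
Yang, Sung-Il (Language Processing Research Team, Electronics and Telecommunications Research Institute)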

Published: 2007.06.30

https://doi.org/10.3745/KIPSTB.2007.14-B.3.231

Abstract

This paper proposes a model for automatically extracting unknown (unregistered) translation equivalents and implements an extraction system based on it. The model is a kind of phrase-alignment model that combines three components: a phrase-boundary model, a language model, and a translation model. The system built on it works in three stages: construction of parallel corpora, alignment of Korean and English words, and extraction of unknown translations. To evaluate the system, we built a reference corpus of roughly 2,200 parallel sentences containing about 1,500 unknown translations and ran several experiments on it. The experiments indicate that the proposed model is effective for extracting unknown translations. Future work should build a larger evaluation corpus to allow more objective evaluation, construct higher-quality parallel corpora, and further improve the extraction model.
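The abstract names the three component models but does not say how their scores are combined. As a minimal sketch, assuming a simple sum of log-probabilities over contiguous target spans, the Python below shows how a phrase-boundary score, a language-model score, and a translation score could jointly pick the English phrase aligned to one Korean word. Every name here (`score_span`, `best_target_span`, `FLOOR`) and every probability in the toy tables is hypothetical and chosen only for illustration; none of it is taken from the paper.

```python
import math

FLOOR = 1e-6  # hypothetical smoothing value for events missing from a table


def logp(table, key):
    """Log-probability lookup with a small floor for unseen keys."""
    return math.log(table.get(key, FLOOR))


def score_span(src_word, span, trans, start, end, bigram):
    """Sum the log scores of the three component models for one target span."""
    # Phrase-boundary model: plausibility of the span's first and last token.
    boundary = logp(start, span[0]) + logp(end, span[-1])
    # Language model: bigram fluency inside the span.
    language = sum(logp(bigram, pair) for pair in zip(span, span[1:]))
    # Translation model: word-level association with the source word.
    translation = sum(logp(trans, (src_word, tok)) for tok in span)
    return boundary + language + translation


def best_target_span(src_word, tgt_tokens, trans, start, end, bigram, max_len=3):
    """Return the contiguous target span (as a tuple) with the highest score."""
    spans = [
        tuple(tgt_tokens[i:j])
        for i in range(len(tgt_tokens))
        for j in range(i + 1, min(i + max_len, len(tgt_tokens)) + 1)
    ]
    return max(spans, key=lambda s: score_span(src_word, s, trans, start, end, bigram))


# Toy tables (invented numbers) for one Korean word and one English sentence.
src = "기계번역"
tgt = ["the", "machine", "translation", "system"]
trans = {(src, "the"): 0.01, (src, "machine"): 0.4,
         (src, "translation"): 0.5, (src, "system"): 0.05}
start = {"the": 0.1, "machine": 0.6, "translation": 0.05, "system": 0.1}
end = {"the": 0.01, "machine": 0.05, "translation": 0.6, "system": 0.3}
bigram = {("the", "machine"): 0.3, ("machine", "translation"): 0.5,
          ("translation", "system"): 0.2}

best = best_target_span(src, tgt, trans, start, end, bigram)
print(best)  # ('machine', 'translation')
```

With these toy numbers the two-word span beats either word on its own, because the boundary scores penalize starting at "translation" or ending at "machine". A real system would estimate the tables from the word-aligned parallel corpus and would likely weight or normalize the three scores rather than add them directly.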
