• Title/Summary/Keyword: Translation Disambiguation

Search Result 18, Processing Time 0.022 seconds

An Evaluation of Translation Quality by Homograph Disambiguation in Korean-X Neural Machine Translation Systems (한-X 신경기계번역시스템에서 동형이의어 분별에 따른 변역질 평가)

  • Nguyen, Quang-Phuoc;Shin, Joon-Choul;Ock, Cheol-Young
    • Annual Conference on Human and Language Technology
    • /
    • 2018.10a
    • /
    • pp.504-509
    • /
    • 2018
  • Neural machine translation (NMT) has recently achieved the state-of-the-art performance. However, it is reported failing in the word sense disambiguation (WSD) for several popular language pairs. In this paper, we explore the extent to which NMT systems are able to disambiguate the Korean homographs. Homographs, words with different meanings but the same written form, cause the word choice problems for NMT systems. Consistent with the popular language pairs, we discover that NMT systems fail to translate Korean homographs correctly. We provide a Korean word sense disambiguation tool-UTagger to use for improvement of NMT's translation quality. We conducted translation experiments using Korean-English and Korean-Vietnamese language pairs. The experimental results show that UTagger can significantly improve the translation quality of NMT in terms of the BLEU, TER, and DLRATIO evaluation metrics.

  • PDF

Translation Disambiguation Based on 'Word-to-Sense and Sense-to-Word' Relationship (`단어-의미 의미-단어` 관계에 기반한 번역어 선택)

  • Lee Hyun-Ah
    • The KIPS Transactions:PartB
    • /
    • v.13B no.1 s.104
    • /
    • pp.71-76
    • /
    • 2006
  • To obtain a correctly translated sentence in a machine translation system, we must select target words that not only reflect an appropriate meaning in a source sentence but also make a fluent sentence in a target language. This paper points out that a source language word has various senses and each sense can be mapped into multiple target words, and proposes a new translation disambiguation method based on this 'word-to-sense and sense-to-word' relationship. In my method target words are chosen through disambiguation of a source word sense and selection of a target word. Most of translation disambiguation methods are based on a 'word-to-word' relationship that means they translate a source word directly into a target wort so they require complicate knowledge sources that directly link a source words to target words, which are hard to obtain like bilingual aligned corpora. By combining two sub-problems for each language, knowledge for translation disambiguation can be automatically extracted from knowledge sources for each language that are easy to obtain. In addition, disambiguation results satisfy both fidelity and intelligibility because selected target words have correct meaning and generate naturally composed target sentences.

Noun Sense Identification of Korean Nominal Compounds Based on Sentential Form Recovery

  • Yang, Seong-Il;Seo, Young-Ae;Kim, Young-Kil;Ra, Dong-Yul
    • ETRI Journal
    • /
    • v.32 no.5
    • /
    • pp.740-749
    • /
    • 2010
  • In a machine translation system, word sense disambiguation has an essential role in the proper translation of words when the target word can be translated differently depending on the context. Previous research on sense identification has mostly focused on adjacent words as context information. Therefore, in the case of nominal compounds, sense tagging of unit nouns mainly depended on other nouns surrounding the target word. In this paper, we present a practical method for the sense tagging of Korean unit nouns in a nominal compound. To overcome the weakness of traditional methods regarding the data sparseness problem, the proposed method adopts complement-predicate relation knowledge that was constructed for machine translation systems. Our method is based on a sentential form recovery technique, which recognizes grammatical relationships between unit nouns. This technique makes use of the characteristics of Korean predicative nouns. To show that our method is effective on text in general domains, the experiments were performed on a test set randomly extracted from article titles in various newspaper sections.

Comparison Thai Word Sense Disambiguation Method

  • Modhiran, Teerapong;Kruatrachue, Boontee;Supnithi, Thepchai
    • 제어로봇시스템학회:학술대회논문집
    • /
    • 2004.08a
    • /
    • pp.1307-1312
    • /
    • 2004
  • Word sense disambiguation is one of the most important problems in natural language processing research topics such as information retrieval and machine translation. Many approaches can be employed to resolve word ambiguity with a reasonable degree of accuracy. These strategies are: knowledge-based, corpus-based, and hybrid-based. This paper pays attention to the corpus-based strategy. The purpose of this paper is to compare three famous machine learning techniques, Snow, SVM and Naive Bayes in Word-Sense Disambiguation on Thai language. 10 ambiguous words are selected to test with word and POS features. The results show that SVM algorithm gives the best results in solving of Thai WSD and the accuracy rate is approximately 83-96%.

  • PDF

Corpus-Based Ontology Learning for Semantic Analysis (의미 분석을 위한 말뭉치 기반의 온톨로지 학습)

  • 강신재
    • Journal of Korea Society of Industrial Information Systems
    • /
    • v.9 no.1
    • /
    • pp.17-23
    • /
    • 2004
  • This paper proposes to determine word senses in Korean language processing by corpus-based ontology learning. Our approach is a hybrid method. First, we apply the previously-secured dictionary information to select the correct senses of some ambiguous words with high precision, and then use the ontology to disambiguate the remaining ambiguous words. The mutual information between concepts in the ontology was calculated before using the ontology as knowledge for disambiguating word senses. If mutual information is regarded as a weight between ontology concepts, the ontology can be treated as a graph with weighted edges, and then we locate the least weighted path from one concept to the other concept. In our practical machine translation system, our word sense disambiguation method achieved a 9% improvement over methods which do not use ontology for Korean translation.

  • PDF

A Hybrid Method of Verb disambiguation in Machine Translation (기계번역에서 동사 모호성 해결에 관한 하이브리드 기법)

  • Moon, Yoo-Jin;Martha Palmer
    • The Transactions of the Korea Information Processing Society
    • /
    • v.5 no.3
    • /
    • pp.681-687
    • /
    • 1998
  • The paper presents a hybrid mcthod for disambiguation of the verb meaning in the machine translation. The presented verb translation algorithm is to perform the concept-based method and the statistics-based method simultaneously. It uses a collocation dictionary, WordNct and the statistical information extracted from corpus. In the transfer phase of the machine translation, it tries to find the target word of the source verb. If it fails, it refers to Word Net to try to find it by calculating word similarities between the logical constraints of the source sentence and those in the collocation dictionary. At the same time, it refers to the statistical information extracted from corpus to try to find it by calculating co-occurrence similarity knowledge. The experimental result shows that the algorithm performs more accurate verb translation than the other algorithms and improves accuracy of the verb translation by 24.8% compared to the collocation-based method.

  • PDF

A Survey of Machine Translation and Parts of Speech Tagging for Indian Languages

  • Khedkar, Vijayshri;Shah, Pritesh
    • International Journal of Computer Science & Network Security
    • /
    • v.22 no.4
    • /
    • pp.245-253
    • /
    • 2022
  • Commenced in 1954 by IBM, machine translation has expanded immensely, particularly in this period. Machine translation can be broken into seven main steps namely- token generation, analyzing morphology, lexeme, tagging Part of Speech, chunking, parsing, and disambiguation in words. Morphological analysis plays a major role when translating Indian languages to develop accurate parts of speech taggers and word sense. The paper presents various machine translation methods used by different researchers for Indian languages along with their performance and drawbacks. Further, the paper concentrates on parts of speech (POS) tagging in Marathi dialect using various methods such as rule-based tagging, unigram, bigram, and more. After careful study, it is concluded that for machine translation, parts of speech tagging is a major step. Also, for the Marathi language, the Hidden Markov Model gives the best results for parts of speech tagging with an accuracy of 93% which can be further improved according to the dataset.

Ontology Construction and Its Application to Disambiguate Word Senses (온톨로지 구축 및 단어 의미 중의성 해소에의 활용)

  • Kang, Sin-Jae
    • The KIPS Transactions:PartB
    • /
    • v.11B no.4
    • /
    • pp.491-500
    • /
    • 2004
  • This paper presents an ontology construction method using various computational language resources, and an ontology-based word sense disambiguation method. In order to acquire a reasonably practical ontology the Kadokawa thesaurus is extended by inserting additional semantic relations into its hierarchy, which are classified as case relations and other semantic relations. To apply the ontology to disambiguate word senses, we apply the previously-secured dictionary information to select the correct senses of some ambiguous words with high precision, and then use the ontology to disambiguate the remaining ambiguous words. The mutual information between concepts in the ontology was calculated before using the ontology as knowledge for disambiguating word senses. If mutual information is regarded as a weight between ontology concepts, the ontology can be treated as a graph with weighted edges, and then we locate the weighted path from one concept to the other concept. In our practical machine translation system, our word sense disambiguation method achieved a 9% improvement over methods which do not use ontology for Korean translation.

An Alignment based technique for Text Translation between Traditional Chinese and Simplified Chinese

  • Sue J. Ker;Lin, Chun-Hsien
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.147-156
    • /
    • 2002
  • Aligned parallel corpora have proved very useful in many natural language processing tasks, including statistical machine translation and word sense disambiguation. In this paper, we describe an alignment technique for extracting transfer mapping from the parallel corpus. During building our system and data collection, we observe that there are three types of translation approaches can be used. We especially focuses on Traditional Chinese and Simplified Chinese text lexical translation and a method for extracting transfer mappings for machine translation.

  • PDF

English Syntactic Disambiguation Using Parser's Ambiguity Type Information

  • Lee, Jae-Won;Kim, Sung-Dong;Chae, Jin-Seok;Lee, Jong-Woo;Kim, Do-Hyung
    • ETRI Journal
    • /
    • v.25 no.4
    • /
    • pp.219-230
    • /
    • 2003
  • This paper describes a rule-based approach for syntactic disambiguation used by the English sentence parser in E-TRAN 2001, an English-Korean machine translation system. We propose Parser's Ambiguity Type Information (PATI) to automatically identify the types of ambiguities observed in competing candidate trees produced by the parser and synthesize the types into a formal representation. PATI provides an efficient way of encoding knowledge into grammar rules and calculating rule preference scores from a relatively small training corpus. In the overall scoring scheme for sorting the candidate trees, the rule preference scores are combined with other preference functions that are based on statistical information. We compare the enhanced grammar with the initial one in terms of the amount of ambiguity. The experimental results show that the rule preference scores could significantly increase the accuracy of ambiguity resolution.

  • PDF