• Title, Summary, Keyword: Transliteration

Using Semantic Knowledge in the Uyghur-Chinese Person Name Transliteration

  • Murat, Alim;Osman, Turghun;Yang, Yating;Zhou, Xi;Wang, Lei;Li, Xiao
    • Journal of Information Processing Systems, v.13 no.4, pp.716-730, 2017
  • In this paper, we propose a transliteration approach based on semantic information (i.e., language origin and gender), which is learnt automatically from the person name, to transliterate Uyghur person names into Chinese. The proposed approach integrates semantic scores (i.e., performance on language origin and gender detection) with a general transliteration model to produce a semantic knowledge-based model that generates the best candidate transliterations. In the experiments, we use datasets containing person names of different language origins: Uyghur and Chinese. The results show that the proposed semantic transliteration model substantially outperforms the general transliteration model and greatly improves mean reciprocal rank (MRR) on both datasets, and that it can aid in developing more efficient transliteration for named entities.
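
The mean reciprocal rank (MRR) reported in this abstract can be illustrated with a minimal sketch. The function name `mean_reciprocal_rank` and the sample data are hypothetical and not taken from the paper; the sketch assumes one reference transliteration per source name and a ranked candidate list from the model.

```python
from typing import Sequence


def mean_reciprocal_rank(
    ranked_candidates: Sequence[Sequence[str]],
    references: Sequence[str],
) -> float:
    """Average of 1/rank of the reference in each ranked candidate list.

    A name whose reference does not appear among its candidates
    contributes 0 to the sum.
    """
    if len(ranked_candidates) != len(references):
        raise ValueError("one candidate list is required per reference")
    if not references:
        return 0.0

    total = 0.0
    for candidates, reference in zip(ranked_candidates, references):
        for rank, candidate in enumerate(candidates, start=1):
            if candidate == reference:
                total += 1.0 / rank
                break
    return total / len(references)


# Hypothetical placeholder data: three names with ranked outputs.
outputs = [["a", "b"], ["x", "y", "z"], ["p"]]
gold = ["a", "z", "q"]
score = mean_reciprocal_rank(outputs, gold)
```

Here the references are found at ranks 1 and 3 and not at all, so the score is (1 + 1/3 + 0) / 3.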

Phonics-based Rules for Improving Performance of English-to-Korean Transliteration (영.한 음차 표기 성능 향상을 위한 음철법 기반 규칙 구축)

  • Kim, Min-Jeong;Hong, Gum-Won;Park, So-Young;Rim, Hae-Chang
    • Phonetics and Speech Sciences, v.1 no.4, pp.133-144, 2009
  • This paper presents a method for constructing and using transliteration rules based on Phonics, an instructional method for speaking and writing English letters. Conventional approaches to automatic transliteration have often focused on statistical methods. However, constructing or collecting correct transliteration examples is always the bottleneck of a statistical transliteration model. In practical domains where collecting such data is very difficult, such as education and tourism, it is reasonable to build a system that does not need large amounts of high-quality data. Furthermore, compared with rules derived from the Korean orthography of borrowed foreign words, the proposed rules are much easier to construct and can be more refined. The experimental results show that the proposed approach can improve the performance of a statistics-based transliteration system.

A Probabilistic Context Sensitive Rewriting Method for Effective Transliteration Variants Generation (효과적인 외래어 이형태 생성을 위한 확률 문맥 의존 치환 방법)

  • Lee, Jae-Sung
    • The Journal of the Korea Contents Association, v.7 no.2, pp.73-83, 2007
  • An information retrieval system that uses exact matching needs preprocessing or query expansion to generate transliteration variants so that it can find the variant transliterations of a foreign word in documents. This paper proposes an effective method for generating other transliteration variants from a given transliteration. Because simple rewriting of confusable characters produces too many false variants, the proposed method controls the generation priority by learning confusion patterns from real usage and calculating their probabilities. In particular, the left and right context of a pattern is considered, and local and global rewriting probabilities are calculated so that more probable variants are produced at an earlier stage. The experimental results showed that the method was very effective, achieving more than 80% recall with the top 20 generated variants on a transliteration variant set collected from KT SET 2.0.

Building English-to-Korean Transliteration Dictionary Based on Pronouncing Dictionary (발음 사전에 기반한 영.한 음차 표기 사전의 구축)

  • Lee, Do-Gil
    • Phonetics and Speech Sciences, v.1 no.3, pp.103-108, 2009
  • This paper proposes a method for building a transliteration dictionary based on pronunciation information extracted from two kinds of existing dictionaries. It also proposes a method for transforming the pronunciation information into Korean transliterated words. To express the pronunciation information, we define the Phoman code system. To avoid the phonetic estimation of English words, which is the most significant problem, the proposed method uses the pronunciation information extracted from the existing dictionaries. Therefore, unlike previous approaches, the proposed method does not need an imperfect phonetic estimation process and can produce accurate transliteration results. The proposed method has been fully implemented.

Automatic Extraction of English-Chinese Transliteration Pairs using Dynamic Window and Tokenizer (동적 윈도우와 토크나이저를 이용한 영-중 음차표기 대역쌍 자동 추출)

  • Jin, Cheng-Guo;Na, Seung-Hoon;Kim, Dong-Il;Lee, Jong-Hyeok
    • Journal of KIISE:Computing Practices and Letters, v.13 no.6, pp.417-421, 2007
  • Recently, many studies have focused on extracting transliteration pairs from bilingual texts. Most of these studies are based on the statistical transliteration model. The paper discusses the limitations of previous approaches and proposes novel approaches called dynamic window and tokenizer to overcome these limitations. Experimental results show that the average rates of word and character precision are 99.0% and 99.78%, respectively.

Retrieving English Words with a Spoken Word Transliteration (입말 표기를 이용한 영어 단어 검색)

  • Kim Ji-Seoung;Kim Kwang-Hyun;Lee Joon-Ho
    • Journal of the Korean Society for Library and Information Science, v.39 no.3, pp.93-103, 2005
  • Users searching an online English dictionary sometimes do not know the correct spelling of the word they have in mind and remember only its pronunciation. To help these users, we propose a method for retrieving English words effectively with a spoken word transliteration, that is, a Korean transliteration of an English word's pronunciation. We develop KONIX codes and transform both the spoken word transliteration and the English words into them. We then calculate the phonetic similarity between KONIX codes using edit distance and 2-gram methods. Experimental results show that the proposed method is very effective for retrieving English words with a spoken word transliteration.
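
The two similarity measures named in this abstract, edit distance and 2-gram overlap, can be sketched as follows. The KONIX code system is not specified here, so the sketch operates on plain strings as stand-ins for codes. The function names are hypothetical, and the Dice coefficient is an assumed choice of 2-gram measure, not necessarily the one used in the paper.

```python
from collections import Counter
from typing import List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit-cost insert, delete, and substitute."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def bigrams(s: str) -> List[str]:
    """All overlapping 2-character substrings of s."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def bigram_dice(a: str, b: str) -> float:
    """Dice coefficient over 2-gram multisets (assumed measure)."""
    count_a, count_b = Counter(bigrams(a)), Counter(bigrams(b))
    total = sum(count_a.values()) + sum(count_b.values())
    if total == 0:
        # Both strings are shorter than two characters.
        return 1.0 if a == b else 0.0
    shared = sum((count_a & count_b).values())
    return 2.0 * shared / total


distance = edit_distance("kitten", "sitting")
similarity = bigram_dice("night", "nacht")
```

A lower edit distance and a higher 2-gram score both indicate closer codes; how the two scores are combined for ranking is left open here because the abstract does not say.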

An English-to-Korean Transliteration Model based on Grapheme and Phoneme (자소 및 음소 정보를 이용한 영어-한국어 음차표기 모델)

  • Oh Jong-Hoon;Choi Key-Sun
    • Journal of KIISE:Software and Applications, v.32 no.4, pp.312-326, 2005
  • There has been increasing interest in English-to-Korean transliteration recently. Previous works are related to a direct method, such as (English graphemes $\rightarrow$ Korean graphemes), and a pivot method, such as (English graphemes $\rightarrow$ English phoneme $\rightarrow$ Korean graphemes). Although most previous works focus on the direct method, transliteration is a phonetic process rather than an orthographic one. From this point of view, we present an English-Korean transliteration model that uses both grapheme and phoneme information. Unlike previous works, our method uses phonetic information such as phonemes and their context. Moreover, we also use the graphemes corresponding to the phonemes. Our method shows about 60% word accuracy.

Two Ways of the Romanization of Korean - Transliteration of Hanngul and the Transcription of Korean Sounds - (한글 로마자 번자법(飜字法)과 우리말 로마자 표음법(表音法) - 두 가지 서로 다른 표기방식 대비예시(對比例示)를 곁들여 -)

  • Youe Mahn Gunn
    • MALSORI, no.35_36, pp.63-76, 1998
  • The writer discusses the need for a clear distinction between transliteration and transcription. Romanization in Korea has been muddled for decades because the two have been confused and mixed. For the transliteration of Hanngul, a new system with the utmost simplicity and perfect convertibility is suggested here. For the transcription of Korean sounds, another system is suggested, which can transcribe the chroneme as well as all the phonemes and thus surpasses the current Hanngul orthography. Korean sentences containing many pairs of homographic heteronyms are romanized in the two ways side by side to contrast the two systems.

Some Characteristics of Hanmal and Hangul from the viewpoint of Processing Hangul Information on Computers

  • Kim, Kyong-Sok
    • Proceedings of the KSPS conference, pp.456-463, 1996
  • In this paper, we discuss three cases that illustrate the effects of the characteristics of the Hangul writing system. In applications such as computer Hangul shorthand for ordinary people and pushbuttons engraved with Hangul characters, we found that using Hangul offers considerable advantages. In the case of Hangul transliteration, we discuss some problems related to the characteristics of the Hangul writing system. Shorthand uses 3-set keyboards in England, America, and Korea. We show how ordinary people can do computer Hangul shorthand, whereas only experts can do computer shorthand in other countries. Specifically, the facts that 1) Hangul characters are grouped into syllables (syllabic blocks) and that 2) a 3-set Hangul keyboard for ordinary people already exists allow ordinary people to do computer Hangul shorthand without the special training that English shorthand requires. This study was done by the author under the codename 'Sejong 89'. In contrast, a 2-set Hangul keyboard, like QWERTY or DVORAK, cannot be used for shorthand. In the case of English pushbuttons, one digit is associated with only one character. However, by engraving only syllable-initial characters on the phone pushbuttons, we can associate one Hangul "syllable" with one digit. Therefore, for a given number of digits, we can associate longer or more meaningful words in Hangul than in English. We discuss the problems of the Hangul transliteration system proposed by South Korea and suggest solutions where available. 1) The framework of transcription is incorrectly being used for transliteration. To solve this problem, the author suggests that a) all complex characters be included in the transliteration table, and that b) syllable-initial and syllable-final characters be specified separately in the table. 2) The proposed system cannot represent independent characters and incomplete syllables. 3) The proposed system cannot distinguish between syllable-initial and syllable-final characters.

The Refinement Effect of Foreign Word Transliteration Query on Meta Search (메타 검색에서 외래어 질의 정제 효과)

  • Lee, Jae-Sung
    • The KIPS Transactions:PartB, v.15B no.2, pp.171-178, 2008
  • Foreign word transliterations are not used consistently in documents, which prevents information retrieval systems based on exact term matching from retrieving some important relevant documents. In this paper, a meta search method is proposed that expands an original foreign word transliteration query into variant queries and refines them in order to retrieve more relevant documents. The method first expands a transliteration query into its variants using a statistical method. It then selects the valid variants: it submits each variant to the retrieval systems beforehand and checks the validity of each variant by counting the number of occurrences of the variant in the retrieved documents and calculating the similarity of the variant's context. The experimental results showed that querying with the variants produced in the first step, which is the baseline method of the test, achieved an average F measure of 38%, whereas querying with the refined variants from the second step, which is the proposed method, significantly improved the average F measure to 81%.
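
The F measure reported in this abstract combines precision and recall over retrieved documents. The following is a minimal sketch that assumes the balanced F1 variant (the abstract does not state the weighting) and uses hypothetical document IDs. The function name `f_measure` is illustrative only.

```python
from typing import Hashable, Set


def f_measure(retrieved: Set[Hashable], relevant: Set[Hashable]) -> float:
    """Balanced F1: harmonic mean of precision and recall (assumed variant)."""
    true_positives = len(retrieved & relevant)
    if true_positives == 0:
        # Covers empty inputs and the case of no overlap.
        return 0.0
    precision = true_positives / len(retrieved)
    recall = true_positives / len(relevant)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical document IDs for a single query.
retrieved_docs = {1, 2, 3, 4}
relevant_docs = {2, 3, 5}
score = f_measure(retrieved_docs, relevant_docs)
```

An average F measure, as quoted in the abstract, would be the mean of this per-query score over all test queries.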