• Title/Summary/Keyword: Typo Correction

A Typo Correction System Using Artificial Neural Networks for a Text-based Ornamental Fish Search Engine

  • Hyunhak Song;Sungyoon Cho;Wongi Jeon;Kyungwon Park;Jaedong Shim;Kiwon Kwon
    • KSII Transactions on Internet and Information Systems (TIIS), v.17 no.8, pp.2278-2291, 2023
  • Imported ornamental fish must be quarantined because they can carry dangerous diseases depending on their habitat. Quarantine takes a long time because officers have to collect a wide range of information on each imported fish. An inefficient quarantine process reduces work efficiency and accuracy, and prolonged quarantine causes the death of environmentally sensitive ornamental fish and large financial losses. To improve existing quarantine systems, information on ornamental fish was collected and structured, and a server was set up to develop quarantine support software equipped with a text search engine. However, the names of ornamental fish are generally long, so typing search terms for a target fish tends to produce many typos and becomes a time bottleneck. A technique for correcting typos is therefore needed. Typical typo correction compares the input text with every entry in a dictionary of correction candidates. This approach requires computational power proportional to the number of typos, which results in slow processing and low correction accuracy. To improve correction accuracy, we propose a system that combines simple Artificial Neural Network (ANN) models with character preprocessing methods that speed up the process by minimizing the computation performed by the models. We also propose a typo generation method for producing the data used to train the ANN models. Simulation results show that the proposed typo correction system is about 6 times faster than the conventional method and has 10% higher accuracy.
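
The conventional dictionary comparison and the idea of generating synthetic typos described in this abstract can be illustrated with a minimal sketch. It is not the authors' system: the use of Levenshtein distance as the comparison, the three typo operations, the function names, and the fish names in `FISH_NAMES` are all assumptions chosen for illustration.

```python
import random

def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance via the classic dynamic-programming recurrence."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]

def correct(query: str, dictionary: list[str]) -> str:
    """Exhaustive baseline: compare the query with every dictionary entry."""
    return min(dictionary, key=lambda name: edit_distance(query, name))

def make_typo(word: str, rng: random.Random) -> str:
    """Create one synthetic typo (assumed operations: delete, substitute, transpose)."""
    if len(word) < 2:
        return word
    i = rng.randrange(len(word) - 1)
    op = rng.choice(["delete", "substitute", "transpose"])
    if op == "delete":
        return word[:i] + word[i + 1:]
    if op == "substitute":
        return word[:i] + rng.choice("abcdefghijklmnopqrstuvwxyz") + word[i + 1:]
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

# Hypothetical candidate dictionary, not taken from the paper.
FISH_NAMES = ["neon tetra", "cardinal tetra", "harlequin rasbora", "zebra danio"]

print(correct("neon tetar", FISH_NAMES))
```

Because `correct` scores the query against every entry, its cost grows with the dictionary size, which is the kind of overhead the abstract says the proposed preprocessing and ANN models are meant to reduce.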

A Method for Detection and Correction of Pseudo-Semantic Errors Due to Typographical Errors (철자오류에 기인한 가의미 오류의 검출 및 교정 방법)

  • Kim, Dong-Joo
    • Journal of the Korea Society of Computer and Information, v.18 no.10, pp.173-182, 2013
  • Typographical mistakes made while drafting electronic documents are more common than any other type of error. Most errors caused by mistyping remain simple typographical errors, but a considerable number of them turn into grammatical or semantic errors. Pseudo-semantic errors, that is, semantic errors that originate from typographical errors, clash more noticeably with the senses of the surrounding context words in a sentence than pure semantic errors do. Because of this prominent contextual discrepancy, they can be detected and corrected by a simple algorithm based on co-occurrence frequency. I propose a detection and correction method based on co-occurrence frequency for semantic errors caused by typographical errors. In the proposed method, co-occurrence frequency is counted only for words in an immediate dependency relation, and the cosine similarity measure is used to detect pseudo-semantic errors. The presented experimental results suggest that the proposed method can improve the detection rate of an overall proofreading system by about 2-3%.
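
The combination of dependency-restricted co-occurrence counts and cosine similarity can be sketched as follows. The abstract does not say exactly which vectors are compared, so this sketch assumes one plausible reading: each word gets a profile of the words it has been linked to by a dependency relation, and that profile is compared with the word's dependency neighbours in the sentence being checked. The function names, the threshold, and the toy pairs are hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt

def build_profiles(pairs):
    """Count co-occurrences only for word pairs joined by a direct dependency link."""
    profiles = defaultdict(Counter)
    for head, dependent in pairs:
        profiles[head][dependent] += 1
        profiles[dependent][head] += 1
    return profiles

def cosine(u: Counter, v: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u.keys() & v.keys())
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0

def best_candidate(word, neighbours, candidates, profiles, threshold=0.1):
    """Keep the word if it fits its context; otherwise pick the best-fitting candidate."""
    context = Counter(neighbours)
    if cosine(profiles.get(word, Counter()), context) >= threshold:
        return word
    return max([word, *candidates],
               key=lambda c: cosine(profiles.get(c, Counter()), context))

# Toy dependency pairs (assumed data): "waiter" is a real word, so a spell
# checker accepts it, but it does not fit the neighbours "drink" and "cold".
pairs = [("drink", "water"), ("drink", "water"), ("drink", "coffee"),
         ("water", "cold"), ("waiter", "tip"), ("waiter", "call")]
profiles = build_profiles(pairs)
print(best_candidate("waiter", ["drink", "cold"], ["water"], profiles))
```

In practice the candidate list would come from a spelling-variant generator and the pairs from a dependency parser, neither of which is shown here.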

Interference Typo Correction Method by using Surrounding Word N-gram and Syllable N-gram (좌우 어절 N-gram 및 음절 N-gram을 이용한 간섭 오타 교정 방법)

  • Son, Sung-Hwan;Kang, Seung-Shik
    • Annual Conference on Human and Language Technology, 2019.10a, pp.496-499, 2019
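  • This paper focuses on interference typos, which users make unintentionally because of the narrow spacing between buttons on a smartphone's QWERTY soft keyboard. From the user's point of view, how well typos are corrected matters, but leaving words that are not typos unchanged can be judged even more important, because in practice words without typos far outnumber words with typos. Correction methods therefore need to be studied from this perspective. Accordingly, this paper proposes a probability-based method for correcting Korean interference typos using a large Korean corpus. The proposed method selects the most suitable candidate word from the possible interference-typo corrections, based on N-grams of the words to the left and right of the target word and N-grams of the neighbouring syllables within the word.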
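
The candidate selection described in this abstract can be sketched roughly as below. The sketch simplifies heavily and is not the authors' method: it uses Latin letters instead of Hangul, a hypothetical partial key-adjacency map, raw bigram counts instead of probabilities, and character bigrams as a stand-in for syllable N-grams. The original word is always the first candidate, so ties leave it unchanged.

```python
from collections import Counter

# Hypothetical partial adjacency map for a QWERTY soft keyboard.
ADJACENT = {"a": "qwsz", "s": "awedxz", "e": "wsdr", "r": "edft",
            "t": "rfgy", "i": "ujko", "o": "iklp"}

def candidates(word):
    """The word itself, then every single-key slip onto a neighbouring key."""
    out = [word]
    for i, ch in enumerate(word):
        for near in ADJACENT.get(ch, ""):
            out.append(word[:i] + near + word[i + 1:])
    return out

def score(cand, left, right, word_bigrams, char_bigrams):
    """Rank by fit with the left and right words first, then by internal bigrams."""
    context = word_bigrams[(left, cand)] + word_bigrams[(cand, right)]
    internal = sum(char_bigrams[cand[i:i + 2]] for i in range(len(cand) - 1))
    return (context, internal)

def correct(word, left, right, word_bigrams, char_bigrams):
    """Pick the best-scoring candidate; max() keeps the original word on ties."""
    return max(candidates(word),
               key=lambda c: score(c, left, right, word_bigrams, char_bigrams))

# Toy counts standing in for statistics from a large corpus (assumed data).
word_bigrams = Counter({("the", "cat"): 5, ("cat", "sat"): 3, ("the", "car"): 2})
char_bigrams = Counter({"ca": 4, "at": 6, "ar": 3})

print(correct("car", "the", "sat", word_bigrams, char_bigrams))
```

Ranking the untouched word alongside its alternatives reflects the abstract's point that preserving words without typos matters at least as much as fixing the ones with typos.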

Methodology of Automatic Editing for Academic Writing Using Bidirectional RNN and Academic Dictionary (양방향 RNN과 학술용어사전을 이용한 영문학술문서 교정 방법론)

  • Roh, Younghoon;Chang, Tai-Woo;Won, Jongwun
    • The Journal of Society for e-Business Studies, v.27 no.2, pp.175-192, 2022
  • Artificial intelligence-based natural language processing technology plays an important role in helping users write English-language documents. For academic documents in particular, English proofreading services should reflect academic characteristics such as formal style and technical terms, but they usually do not, because they are based on general English sentences. In addition, since existing studies mainly aim to improve grammatical correctness, the fluency improvement they offer is limited. This study proposes an automatic academic English editing methodology that conveys the meaning of sentences clearly through the use of technical terms. The proposed methodology consists of two phases: misspelling correction and fluency improvement. In the first phase, appropriate corrections are suggested according to the input typo and its context. In the second phase, the fluency of the sentence is improved by an automatic post-editing model based on a bidirectional recurrent neural network, which learns from pairs of original and edited sentences. Experiments were performed with actual English editing data, and the superiority of the proposed methodology was verified.