Search | Korea Science

Lee, Se-Hee;Kim, Hark-Soo
- The KIPS Transactions:PartB
- /
- v.16B no.5
- /
- pp.427-434
- /
- 2009
With the rapid evolution of the Internet and mobile environments, text including spelling errors such as newly-coined words and abbreviated words are widely used. These spelling errors make it difficult to develop NLP (natural language processing) applications because they decrease the readability of texts. To resolve this problem, we propose a spelling error correction model using a spelling error correction dictionary and a newspaper corpus. The proposed model has the advantage that the cost of data construction are not high because it uses a newspaper corpus, which we can easily obtain, as a training corpus. In addition, the proposed model has an advantage that additional external modules such as a morphological analyzer and a word-spacing error correction system are not required because it uses a simple string matching method based on a correction dictionary. In the experiments with a newspaper corpus and a short message corpus collected from real mobile phones, the proposed model has been shown good performances (a miss-correction rate of 7.3%, a F1-measure of 97.3%, and a false positive rate of 1.1%) in the various evaluation measures.
https://doi.org/10.3745/KIPSTB.2009.16B.5.427 인용 PDF KSCI

Noh, Hyung-Jong;Cha, Jeong-Won;Lee, GaryGeun-Bae
- Journal of KIISE:Software and Applications
- /
- v.34 no.2
- /
- pp.131-139
- /
- 2007
In this paper, we present a preprocessor which corrects word spacing errors and spelling correction errors simultaneously. The proposed expands noisy-channel model so that it corrects both errors in colloquial style sentences effectively, while preprocessing algorithms have limitations because they correct each error separately. Using Eojeol transition pattern dictionary and statistical data such as n-gram and Jaso transition probabilities, it minimizes the usage of dictionaries and produces the corrected candidates effectively. In experiments we did not get satisfactory results at current stage, we noticed that the proposed methodology has the utility by analyzing the errors. So we expect that the preprocessor will function as an effective error corrector for general colloquial style sentence by doing more improvements.
PDF KSCI

Shin, Hyun-Kyung
- The KIPS Transactions:PartD
- /
- v.18D no.1
- /
- pp.67-80
- /
- 2011
On an automated business document processing system maintaining financial data, errors on query based retrieval of numbers are critical to overall performance and usability of the system. Automatic spelling correction methods have been emerged and have played important role in development of information retrieval system. However scope of the methods was limited to the symbols, for example alphabetic letter strings, which can be reserved in the form of trainable templates or custom dictionary. On the other hand, numbers, a sequence of digits, are not the objects that can be reserved into a dictionary but a pure markov sequence. In this paper we proposed a new OCR model for spelling correction for numbers using the multiple streams and the context based correction on top of probabilistic information retrieval framework. We implemented the proposed error correction model as a sub-module and integrated into an existing automated invoice document processing system. We also presented the comparative test results that indicated significant enhancement of overall precision of the system by our model.
https://doi.org/10.3745/KIPSTD.2011.18D.1.067 인용 PDF KSCI