Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules
  • Journal title : Journal of KIISE
  • Volume 43, Issue 6,  2016, pp.636-644
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.6.636
 Title & Authors
Automatic Correction of Errors in Annotated Corpus Using Kernel Ripple-Down Rules
Park, Tae-Ho; Cha, Jeong-Won;
 
 Abstract
Annotated corpora are important for understanding natural language with machine learning methods. In this paper, we propose a new method for automatically reducing errors in annotated corpora. We use Ripple-Down Rules (RDR) to reduce errors and a kernel to extend RDR for NLP. We applied our system to Korean Wikipedia and blog corpora to identify the types of errors in annotated corpora. We report experimental results on the Korean Wikipedia and blog data from several perspectives to evaluate the effectiveness and efficiency of the proposed approach. The proposed approach can be used to reduce errors in large corpora.
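For readers unfamiliar with Ripple-Down Rules, the following is a minimal sketch of a generic single-classification RDR tree applied to tag correction. It is not the paper's kernel extension, whose details the abstract does not give. The names `Rule`, `classify`, and `correct`, the case fields, the English-style tags, and the rules themselves are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Hypothetical case format: one token with its current tag and left context.
Case = dict


@dataclass
class Rule:
    """One node of a single-classification ripple-down rule tree."""
    condition: Callable[[Case], bool]
    conclusion: str                     # tag proposed when the condition holds
    if_true: Optional["Rule"] = None    # exception branch: refines this rule
    if_false: Optional["Rule"] = None   # alternative branch: tried when this rule does not fire


def classify(rule: Optional[Rule], case: Case, default: str) -> str:
    """Walk the tree and return the conclusion of the last rule that fired."""
    result = default
    while rule is not None:
        if rule.condition(case):
            result = rule.conclusion
            rule = rule.if_true
        else:
            rule = rule.if_false
    return result


# Toy rules (invented): a base-form verb tag right after a determiner is
# treated as a mis-tagged noun, except for "-ing" forms, which a later
# exception rule re-tags as VBG. A separate alternative rule handles a
# noun tag right after "to".
RULES = Rule(
    condition=lambda c: c["tag"] == "VB" and c["prev_tag"] == "DT",
    conclusion="NN",
    if_true=Rule(
        condition=lambda c: c["word"].endswith("ing"),
        conclusion="VBG",
    ),
    if_false=Rule(
        condition=lambda c: c["tag"] == "NN" and c["prev_word"] == "to",
        conclusion="VB",
    ),
)


def correct(case: Case) -> str:
    """Return the corrected tag, falling back to the existing annotation."""
    return classify(RULES, case, default=case["tag"])
```

The appeal of this structure for corpus correction is that a new exception is attached only at the node where a wrong conclusion was reached, so earlier corrections remain unchanged when rules are added.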
 Keywords
morphological annotation corpora;error correction;kernel RDR;natural language processing;
 Language
Korean