Search | Korea Science

Lee, Jong-Chan;Lee, Won-Don
- Journal of the Korea Institute of Information and Communication Engineering
- /
- v.14 no.1
- /
- pp.53-62
- /
- 2010
This paper introduces a classification algorithm which can be applied to a learning problem with incomplete data sets, missing variable values or a class value. This algorithm uses a data expansion method which utilizes weighted values and probability techniques. It operates by extending a classifier which are considered to be in the optimal projection plane based on Fisher's formula. To do this, some equations are derived from the procedure to be applied to the data expansion. To evaluate the performance of the proposed algorithm, results of different measurements are iteratively compared by choosing one variable in the data set and then modifying the rate of missing and non-missing values in this selected variable. And objective evaluation of data sets can be achieved by comparing, the result of a data set with non-missing variable with that of C4.5 which is a known knowledge acquisition tool in machine learning.
https://doi.org/10.6109/jkiice.2010.14.1.053 인용 PDF KSCI

Lee, Jong Chan
- Journal of the Korea Convergence Society
- /
- v.12 no.8
- /
- pp.39-45
- /
- 2021
This paper discusses how to handle incomplete data including missing values. Optimally processing the missing value means obtaining an estimate that is the closest to the original value from the information contained in the training data, and replacing the missing value with this value. The way to achieve this is to use a decision tree that is completed in the process of classifying information by the classifier. In other words, this decision tree is obtained in the process of learning by inputting only complete information that does not include loss values among all training data into the C4.5 classifier. The nodes of this decision tree have classification variable information, and the higher node closer to the root contains more information, and the leaf node forms a classification region through a path from the root. In addition, the average of classified data events is recorded in each region. Events including the missing value are input to this decision tree, and the region closest to the event is searched through a traversal process according to the information of each node. The average value recorded in this area is regarded as an estimate of the missing value, and the compensation process is completed.
https://doi.org/10.15207/JKCS.2021.12.8.039 인용 PDF KSCI