Comparing Accuracy of Imputation Methods for Incomplete Categorical Data

  • Shin, Hyung-Won (Dept. of Computer Science & Industrial Systems Engineering, Yonsei University)
  • Sohn, So-Young (Dept. of Computer Science & Industrial Systems Engineering, Yonsei University)
  • Published : 2003.05.23

Abstract

Various estimation methods have been developed for imputing missing categorical data, including the modal category method, logistic regression, and association rules. In this study, we propose two imputation methods (neural network fusion and voting fusion) that combine the results of the individual imputation methods. A Monte Carlo simulation is used to compare the performance of these methods. Five factors are varied in simulating the missing data: (1) the true model for the data, (2) data size, (3) noise size, (4) percentage of missing data, and (5) missing pattern. Overall, neural network fusion performed best; voting fusion outperformed the individual imputation methods but was inferior to neural network fusion. The results of an additional real-data analysis confirm the simulation results.
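As an illustration of the voting idea, the following minimal Python sketch combines the categories predicted by several imputers for a single missing cell. The abstract does not specify the voting rule, so this assumes a simple plurality vote with a caller-supplied fallback on ties; the function names `modal_category` and `voting_fusion` and the example values are hypothetical and are not taken from the paper.

```python
from collections import Counter


def modal_category(observed):
    """Return the most frequent category among the observed values."""
    return Counter(observed).most_common(1)[0][0]


def voting_fusion(predictions, fallback):
    """Plurality vote over the categories predicted by individual imputers.

    Assumption (not from the paper): if the top two categories receive the
    same number of votes, return `fallback` instead of picking arbitrarily.
    """
    counts = Counter(predictions).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return fallback
    return counts[0][0]


# Hypothetical example: one missing cell, three individual imputers.
observed = ["A", "B", "A", "C", "A"]
mode_guess = modal_category(observed)           # modal category method
logit_guess = "B"                               # stand-in for a logistic regression prediction
rule_guess = "B"                                # stand-in for an association rule prediction

fused = voting_fusion([mode_guess, logit_guess, rule_guess], fallback=mode_guess)
tied = voting_fusion(["A", "B", "C"], fallback=mode_guess)
```

Here two of the three imputers agree, so the fused value is their shared category; when all three disagree, the sketch falls back to the modal category. Neural network fusion would replace the vote with a trained model that takes the individual predictions as inputs.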

Keywords