A Study on a Statistical Matching Method Using Clustering for Data Enrichment

  • Kim Soon Y.
  • Lee Ki H.
  • Chung Sung S.
  • Published : 2005.08.01

Abstract

Data fusion is the process of combining data and information from different sources so that the useful information they contain can be used more effectively. In this paper, we propose a data fusion algorithm that uses k-means clustering for data enrichment, with the aim of improving data quality in the knowledge discovery in databases (KDD) process. An empirical study comparing the proposed technique with existing data fusion techniques shows that the clustering-based technique yields a low MSE for continuous fusion variables.
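The abstract does not spell out the algorithm, so the following is only a minimal sketch of one way clustering-based fusion could work, not the authors' method. It assumes a donor file that holds both the common variables and a continuous fusion variable, and a recipient file that holds only the common variables. Donors are clustered with k-means on the common variables, each recipient is assigned to its nearest centroid, and the fusion variable is imputed as the donor mean of that cluster. All names (`kmeans`, `fuse`, `mse`, the toy data) are hypothetical.

```python
import random

def sq_dist(a, b):
    # Squared Euclidean distance between two points.
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(point, centroids):
    # Index of the centroid closest to the point.
    return min(range(len(centroids)), key=lambda j: sq_dist(point, centroids[j]))

def kmeans(points, k, iters=50, seed=0):
    # Plain Lloyd's algorithm with k randomly chosen points as initial centroids.
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break
        centroids = new_centroids
    labels = [nearest(p, centroids) for p in points]
    return centroids, labels

def fuse(donor_x, donor_y, recipient_x, k, seed=0):
    # Assumption: impute each recipient with the donor mean of its nearest cluster.
    centroids, labels = kmeans(donor_x, k, seed=seed)
    overall = sum(donor_y) / len(donor_y)
    cluster_mean = []
    for j in range(k):
        ys = [y for y, lab in zip(donor_y, labels) if lab == j]
        cluster_mean.append(sum(ys) / len(ys) if ys else overall)
    return [cluster_mean[nearest(x, centroids)] for x in recipient_x]

def mse(pred, true):
    # Mean squared error between imputed and true values.
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)

# Toy data (made up for illustration): two well-separated groups of donors.
donor_x = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9]]
donor_y = [1.0, 1.2, 9.0, 9.4]
recipient_x = [[0.1, 0.1], [5.1, 5.0]]
recipient_y_true = [1.1, 9.3]  # known only in a validation setting

fused = fuse(donor_x, donor_y, recipient_x, k=2)
error = mse(fused, recipient_y_true)
```

In a validation study of the kind the abstract mentions, the true values of the fusion variable are withheld from the recipient file and then compared with the imputed ones, which is what `mse` does here on the toy data.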
