Influence of Data Preprocessing
Zhu, Changming; Gao, Daqi;
This paper studies the influence of data preprocessing on classification. We conclude that different preprocessing methods lead to different classification performance. Moreover, not every preprocessing method is necessary, and we give a criterion for determining which preprocessing methods are necessary and which are effective. Experiments on real-world data sets confirm that different preprocessing methods produce different effects. Further experiments that pair several algorithms with different preprocessing methods also confirm that preprocessing strongly influences classifier performance.
Keywords: Data preprocessing; Preprocessing criterion; Fisher; Pseudo Inverse
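The central claim of the abstract, that the choice of preprocessing changes classification performance, can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the toy data are invented, the classifier is a plain 1-nearest-neighbor rule, and the preprocessing step is min-max scaling. The first feature separates the two classes but has a small range, while the second feature is uninformative but has a large range, so it dominates Euclidean distance until the features are rescaled.

```python
import math

# Hypothetical toy data: feature 1 is informative (range about 0..1),
# feature 2 is uninformative noise on a much larger scale (about 100..900).
train_x = [(0.10, 100), (0.20, 900), (0.15, 500),
           (0.90, 120), (0.80, 880), (0.85, 520)]
train_y = [0, 0, 0, 1, 1, 1]
test_x = [(0.10, 518), (0.90, 502), (0.20, 105), (0.80, 885)]
test_y = [0, 1, 0, 1]


def min_max_fit(rows):
    """Return the (min, max) of each feature column, taken from the training rows."""
    return [(min(col), max(col)) for col in zip(*rows)]


def min_max_apply(rows, ranges):
    """Rescale each feature to [0, 1] using previously fitted (min, max) pairs."""
    return [
        tuple((v - lo) / (hi - lo) if hi > lo else 0.0
              for v, (lo, hi) in zip(row, ranges))
        for row in rows
    ]


def predict_1nn(xs, ys, point):
    """Label of the training point closest to `point` in Euclidean distance."""
    nearest = min(range(len(xs)), key=lambda i: math.dist(xs[i], point))
    return ys[nearest]


def accuracy(xs, ys, queries, labels):
    """Fraction of query points whose 1-NN prediction matches the true label."""
    hits = sum(predict_1nn(xs, ys, q) == y for q, y in zip(queries, labels))
    return hits / len(labels)


# Without preprocessing, the large-scale noise feature dominates the distance.
acc_raw = accuracy(train_x, train_y, test_x, test_y)

# With min-max scaling (fitted on the training data only), both features
# contribute on a comparable scale.
ranges = min_max_fit(train_x)
acc_scaled = accuracy(min_max_apply(train_x, ranges), train_y,
                      min_max_apply(test_x, ranges), test_y)

print(f"1-NN accuracy, raw features:    {acc_raw:.2f}")
print(f"1-NN accuracy, min-max scaled:  {acc_scaled:.2f}")
```

On this invented data the same classifier scores 0.50 on the raw features and 1.00 after scaling. The example shows only that a preprocessing choice can change a result. It says nothing about the criterion proposed in the paper or about which method is preferable on real data sets.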