A Statistical Perspective of Neural Networks for Imbalanced Data Problems
Oh, Sang-Hoon
Finding a good classifier for imbalanced data remains an interesting challenge, since such data are pervasive but difficult to classify. Classifiers developed under the assumption of well-balanced class distributions perform poorly on imbalanced data. Among the many approaches to imbalanced data problems, the algorithmic-level approach is attractive because it can be combined with other approaches, such as data-level or ensemble approaches. In particular, the error back-propagation algorithm using the target node method, which changes the amount of weight updating according to the target node of each class, performs well on imbalanced data problems. In this paper, we analyze the relationship between the two optimal outputs of a neural network classifier trained with the target node method. This optimal relationship is also compared with those of other error functions, such as the mean-squared error and the n-th order extension of the cross-entropy error. The analyses are verified through simulations on a thyroid data set.
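The idea of class-dependent weight updating can be illustrated with a small sketch. It is not the paper's formulation: it assumes outputs in (-1, 1), targets of +1 or -1, and an n-th order style error signal of the form 2(|t - y|/2)^n carrying the sign of the target. The function names, the `is_minority` flag, and the default orders are hypothetical choices made for illustration only.

```python
def nth_order_signal(target, output, order):
    """Assumed output-node error signal: 2 * (|t - y| / 2)**order, signed like t.

    target is +1 or -1 and output lies in (-1, 1).
    With order=1 this reduces to the plain difference (t - y).
    """
    gap = abs(target - output) / 2.0      # normalized error in [0, 1]
    sign = 1.0 if target > 0 else -1.0    # keep the direction of the update
    return sign * 2.0 * gap ** order


def class_dependent_signal(target, output, is_minority,
                           minority_order=2, majority_order=4):
    """Hypothetical class-dependent rule: a lower order for the minority
    class yields a larger error signal, hence stronger weight updates."""
    order = minority_order if is_minority else majority_order
    return nth_order_signal(target, output, order)


if __name__ == "__main__":
    # Compare update strength for the same output error in both classes.
    for y in (-0.5, 0.0, 0.5):
        minority = class_dependent_signal(1.0, y, is_minority=True)
        majority = class_dependent_signal(1.0, y, is_minority=False)
        print(f"y={y:+.1f}  minority={minority:.4f}  majority={majority:.4f}")
```

Because the normalized error |t - y|/2 lies between 0 and 1, raising it to a higher power shrinks the signal, so in this sketch the class assigned the lower order receives the larger updates for the same output error.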