Deriving a New Divergence Measure from Extended Cross-Entropy Error Function
Oh, Sang-Hoon; Wakuya, Hiroshi; Park, Sun-Gyu; Noh, Hwang-Woo; Yoo, Jae-Soo; Min, Byung-Won; Oh, Yong-Sun
Relative entropy is a divergence measure between two probability density functions of a random variable. When the random variable takes only two values (a binary alphabet), the relative entropy becomes the cross-entropy error function, which can accelerate the training convergence of multi-layer perceptron neural networks. The n-th order extension of the cross-entropy (nCE) error function further improves performance in terms of learning convergence and generalization capability. In this paper, we derive a new divergence measure between two probability density functions from the nCE error function, and we compare it with the relative entropy using three-dimensional plots.
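The link between relative entropy and cross-entropy for a two-valued random variable can be checked numerically. The following is a minimal sketch, not the authors' code: the function names `relative_entropy` and `binary_cross_entropy` are illustrative, natural logarithms are assumed, and the target is assumed to be coded as 0 or 1 with the network output `y` strictly between 0 and 1. It does not implement the nCE error function, whose definition is not given in this record.

```python
import math

def relative_entropy(p, q):
    """Relative entropy D(p || q) in nats for two discrete distributions."""
    total = 0.0
    for pi, qi in zip(p, q):
        if pi > 0.0:  # convention: 0 * log(0 / q) = 0
            total += pi * math.log(pi / qi)
    return total

def binary_cross_entropy(t, y):
    """Cross-entropy error for one output node.

    t is the desired value (assumed 0 or 1), y is the output in (0, 1).
    """
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

# Hypothetical example values: target 1, network output 0.8.
t, y = 1.0, 0.8
kl = relative_entropy([t, 1 - t], [y, 1 - y])
ce = binary_cross_entropy(t, y)
print(kl, ce)  # the two values agree up to rounding
```

With a hard target (0 or 1) the entropy of the target distribution is zero, so the relative entropy and the cross-entropy coincide. For a soft target strictly between 0 and 1 they differ by the entropy of the target, which does not depend on the network output.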
Cross-Entropy; The n-th Order Extension of Cross-Entropy; Divergence Measure; Information Theory; Neural Networks