Deriving a New Divergence Measure from Extended Cross-Entropy Error Function
  • Journal title : International Journal of Contents
  • Volume 11, Issue 2,  2015, pp.57-62
  • Publisher : The Korea Contents Association
  • DOI : 10.5392/IJoC.2015.11.2.057
 Title & Authors
Oh, Sang-Hoon; Wakuya, Hiroshi; Park, Sun-Gyu; Noh, Hwang-Woo; Yoo, Jae-Soo; Min, Byung-Won; Oh, Yong-Sun
 Abstract
Relative entropy is a divergence measure between two probability density functions of a random variable. When the random variable takes only two symbols, the relative entropy reduces to the cross-entropy error function, which can accelerate the training convergence of multilayer perceptron neural networks. Furthermore, the n-th order extension of the cross-entropy (nCE) error function improves both learning convergence and generalization capability. In this paper, we derive a new divergence measure between two probability density functions from the nCE error function, and compare the new measure with the relative entropy through three-dimensional plots.
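The reduction described in the abstract can be sketched in a few lines: for a two-symbol (Bernoulli) variable, the relative entropy between target and output distributions equals the cross-entropy error minus the entropy of the target. The function names below are illustrative, not from the paper; this is a minimal sketch assuming Bernoulli distributions with parameters strictly between 0 and 1.

```python
import math

def relative_entropy(p, q):
    """Relative entropy (KL divergence) D(p || q) between two Bernoulli
    distributions with success probabilities p and q (both in (0, 1))."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def cross_entropy_error(t, y):
    """Cross-entropy error between a binary target probability t and a
    network output y, as used for multilayer perceptron training."""
    return -(t * math.log(y) + (1 - t) * math.log(1 - y))

# Expanding D(t || y) term by term gives
#   D(t || y) = cross_entropy_error(t, y) - H(t),
# where H(t) = -(t*log t + (1-t)*log(1-t)) is the entropy of the target,
# so minimizing the cross-entropy error in y minimizes the divergence.
```

Since H(t) does not depend on the network output y, the two objectives share the same minimizer, which is why cross-entropy training can be read as divergence minimization.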
Keywords: Cross-Entropy; The n-th Order Extension of Cross-Entropy; Divergence Measure; Information Theory; Neural Networks