Effect of Nonlinear Transformations on Entropy of Hidden Nodes
  • Journal title : International Journal of Contents
  • Volume 10, Issue 1, 2014, pp. 18-22
  • Publisher : The Korea Contents Association
  • DOI : 10.5392/IJoC.2014.10.1.018
 Title & Authors
Oh, Sang-Hoon;
Hidden nodes play a key role in the information processing of feed-forward neural networks, in which inputs are transformed through a series of weighted sums and nonlinear activation functions. To understand this role, we must analyze the effect of the nonlinear activation functions on the weighted sums to hidden nodes. In this paper, we examine the effect of the nonlinear functions from the viewpoint of information theory. Under the assumption that the nonlinear activation function can be approximated piece-wise linearly, we prove that the entropy of the weighted sums to hidden nodes decreases after the piece-wise linear transformation. We therefore argue that the nonlinear activation function reduces the uncertainty among hidden nodes. Furthermore, the more saturated the hidden nodes are, the more their entropy decreases. Based on this result, we can say that, after successful training of feed-forward neural networks, hidden nodes tend to lie not in the linear regions but in the saturated regions of the activation function, with the accompanying effect of uncertainty reduction.
Keywords: Entropy; Hidden Nodes; Nonlinear Activation Function; Feed-Forward Neural Networks
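The entropy-reduction claim can be checked numerically. On a linear segment y = a·x + b, differential entropy satisfies h(Y) = h(X) + log|a|, so every segment with slope |a| < 1 (the saturated regions of a sigmoid-like activation) lowers entropy. A minimal sketch, assuming Gaussian weighted sums and a tanh activation (both illustrative choices, not taken from the paper):

```python
import numpy as np

def entropy_hist(samples, bins=100):
    # Histogram-based estimate of differential entropy (in nats):
    # h(X) ~ -sum_i p_i * log(f_i), with p_i the mass and f_i the
    # density of bin i.
    counts, edges = np.histogram(samples, bins=bins, density=True)
    widths = np.diff(edges)
    p = counts * widths          # probability mass per bin
    nz = p > 0                   # skip empty bins
    return -np.sum(p[nz] * np.log(counts[nz]))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 2.0, 200_000)   # weighted sums to a hidden node (assumed Gaussian)
y = np.tanh(x)                      # saturating nonlinear activation

h_x = entropy_hist(x)
h_y = entropy_hist(y)
# h_y comes out well below h_x: the activation output carries less entropy
# than the weighted sum that produced it.
```

With standard deviation 2, most samples land in the saturated regions of tanh, and the histogram estimate of h(Y) falls far below h(X), consistent with the claim that stronger saturation yields a larger entropy decrease.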