k-Nearest Neighbor-Based Approach for the Estimation of Mutual Information

Cha, Woon-Ock; Huh, Moon-Yul

doi:10.5351/CKSS.2008.15.6.977

Communications for Statistical Applications and Methods

Vol. 15, No. 6 / Pages 977-991 / 2008 / 2287-7843 (pISSN) / 2383-4757 (eISSN)

The Korean Statistical Society


Cha, Woon-Ock (Department of Multimedia Engineering, College of Engineering, Hansung University);
Huh, Moon-Yul (Department of Statistics, Sungkyunkwan University)

Published: 2008.11.30

https://doi.org/10.5351/CKSS.2008.15.6.977


Abstract

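This paper studies a k-nearest neighbor-based method that yields an estimator of mutual information (MI) for continuous variables without estimating their joint probability distribution. To resolve the problem that arises in finding k-nearest neighbors when a variable takes identical (tied) values, we propose jittering and bootstrap methods. Monte Carlo simulations and experiments with real data showed that the k-nearest neighbor-based method with a small value such as k=1 gives an efficient MI estimator. The method applies to data with continuous explanatory variables and a categorical or continuous target variable. It can rank the explanatory variables by their influence on the target variable, and because it also extends to multiple dimensions, it can be applied to the feature subset selection problem of finding a set of important variables.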

This study examines the k-nearest neighbor-based approach to estimating mutual information when the target variable is categorical or continuous. Monte Carlo simulations and experiments with real-world data show that k=1 is preferable. In practical applications with real-world data, our study shows that jittering and bootstrapping are needed.
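The abstract does not spell out the estimator, so the following is only a minimal sketch of how a k-nearest neighbor MI estimate with jittering might look for two continuous variables. It assumes the estimator of Kraskov, Stögbauer and Grassberger (2004) with the maximum norm, which may differ from the authors' exact procedure. The function names (`digamma_int`, `jitter`, `knn_mutual_information`) and the jitter scale are hypothetical choices made for illustration, and the bootstrap step is not covered.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def digamma_int(n):
    """Digamma at a positive integer n: psi(n) = -gamma + sum_{j=1}^{n-1} 1/j."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, n))


def jitter(values, scale=1e-6, rng=None):
    """Add small uniform noise to break ties (scale is an assumed, hypothetical default)."""
    rng = rng or random.Random(0)
    return [v + rng.uniform(-scale, scale) for v in values]


def knn_mutual_information(x, y, k=1):
    """Brute-force k-NN MI estimate in nats (assumed KSG-style estimator).

    For each point, eps is the max-norm distance to its k-th nearest
    neighbor in the joint (x, y) space; n_x and n_y count the other points
    strictly closer than eps in each marginal.
    """
    n = len(x)
    if n != len(y) or n <= k:
        raise ValueError("x and y must have equal length greater than k")
    total = 0.0
    for i in range(n):
        # Max-norm distances from point i to every other point, ascending.
        dists = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = dists[k - 1]
        n_x = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        n_y = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        total += digamma_int(n_x + 1) + digamma_int(n_y + 1)
    return digamma_int(k) + digamma_int(n) - total / n


if __name__ == "__main__":
    rng = random.Random(42)
    x = [rng.gauss(0, 1) for _ in range(300)]
    y = [xi + 0.1 * rng.gauss(0, 1) for xi in x]  # strongly dependent on x
    # Jitter both variables before estimating, as one would for tied data.
    print(knn_mutual_information(jitter(x), jitter(y), k=1))
```

The brute-force neighbor search costs O(N^2 log N) and is meant only to make the steps visible. For larger samples a spatial index would be the usual choice.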


