A Development of Wireless Sensor Networks for Collaborative Sensor Fusion Based Speaker Gender Classification

협동 센서 융합 기반 화자 성별 분류를 위한 무선 센서네트워크 개발

  • Received : 2011.01.31
  • Accepted : 2011.04.30
  • Published : 2011.04.30

Abstract

In this paper, we develop a speaker gender classification technique that uses collaborative sensor fusion in a wireless sensor network. The distributed sensor nodes discard unwanted input data using BER (Band Energy Ratio) based voice activity detection, process only the relevant data, and transmit hard-labeled decisions to the fusion center, where a global decision fusion is carried out. This reduces power consumption and conserves network resources. Bayesian sensor fusion and global weighted decision fusion methods are proposed for the gender classification. As the number of sensor nodes varies, the Bayesian sensor fusion achieves the best classification accuracy by using the optimal operating points of the ROC (Receiver Operating Characteristic) curves. For the weights used in the global decision fusion, the BER and the MCL (Mutual Confidence Level) are employed so that the local decisions are combined effectively at the fusion center. Simulation results show that classification accuracy improves as the number of sensor nodes increases, and that the improvement is larger under low SNR (Signal to Noise Ratio) conditions.
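The node-side voice activity detection can be illustrated with a minimal sketch. The abstract does not give the exact BER definition, so the code assumes one common form: the fraction of a frame's spectral energy that falls inside a chosen band. The function names, the band limits (100 to 1000 Hz), and the threshold of 0.5 are hypothetical values chosen for illustration, not parameters from the paper.

```python
import cmath
import math


def band_energy_ratio(frame, fs, f_lo, f_hi):
    """Fraction of the frame's spectral energy inside [f_lo, f_hi] Hz.

    Assumed definition for illustration; the paper's BER may differ.
    """
    n = len(frame)
    total = 0.0
    in_band = 0.0
    # Naive DFT over the non-negative frequency bins 0..n//2.
    for k in range(n // 2 + 1):
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame))
        power = abs(coeff) ** 2
        total += power
        if f_lo <= k * fs / n <= f_hi:
            in_band += power
    # A silent frame has no energy at all; report a ratio of zero.
    return in_band / total if total > 0.0 else 0.0


def is_voice_active(frame, fs, f_lo=100.0, f_hi=1000.0, threshold=0.5):
    """Keep the frame only if enough of its energy lies in the band."""
    return band_energy_ratio(frame, fs, f_lo, f_hi) > threshold


if __name__ == "__main__":
    fs, n = 8000, 64
    low_tone = [math.sin(2 * math.pi * 500 * t / fs) for t in range(n)]
    high_tone = [math.sin(2 * math.pi * 3000 * t / fs) for t in range(n)]
    print(is_voice_active(low_tone, fs), is_voice_active(high_tone, fs))
```

A node would run such a test on each frame and pass only the active frames on to feature extraction and local classification, which is where the savings in processing and transmission come from.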

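This paper proposes speaker gender classification using collaborative sensor fusion in a wireless sensor network. By performing BER (Band Energy Ratio) based voice activity detection, the sensor nodes discard unnecessary input data and process and hard-decide only the highly relevant data. The hard decisions generated at the individual sensor nodes are transmitted to the fusion center, where a global decision fusion is formed, which reduces power consumption and saves network resources. Bayesian sensor fusion and global weighted decision fusion techniques are proposed as the sensor fusion methods for speaker gender classification. In Bayesian sensor fusion, the hard decisions obtained at the individual sensor-node level are processed through the operating points of the ROC (Receiver Operating Characteristic) curves as the number of deployed sensor nodes changes, and the optimal classification fusion is determined. BER and MCL (Mutual Confidence Level) are adopted as the weights for the global decision, so that the individual local hard decisions are combined and fused efficiently. Experiments confirm that classification performance improves as the number of sensor nodes increases, and that the gain is larger in low SNR (Signal to Noise Ratio) environments.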
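The fusion-center step can be sketched in the same hedged way. The abstract does not state the fusion formulas, so the Bayesian rule below is the standard log-likelihood-ratio fusion of hard decisions (often called the Chair-Varshney rule), which uses each node's ROC operating point. The weighted rule is a plain signed weighted vote. The label encoding (1 and 0 for the two gender classes), the function names, and all numeric values are assumptions; in particular, the weights are passed in as given numbers because the paper's BER and MCL weight formulas are not available here.

```python
import math


def bayesian_fusion(decisions, op_points, prior_h1=0.5):
    """Fuse hard decisions with a log-likelihood ratio test.

    decisions: local labels, 1 or 0, one per sensor node.
    op_points: per-node (pd, pf) pairs, where pd = P(label 1 | class 1)
               and pf = P(label 1 | class 0), read off each node's ROC curve.
    """
    llr = math.log(prior_h1 / (1.0 - prior_h1))
    for u, (pd, pf) in zip(decisions, op_points):
        if u == 1:
            llr += math.log(pd / pf)
        else:
            llr += math.log((1.0 - pd) / (1.0 - pf))
    return 1 if llr > 0.0 else 0


def weighted_fusion(decisions, weights):
    """Signed weighted vote: label 1 counts as +w, label 0 as -w."""
    score = sum(w * (1 if u == 1 else -1)
                for u, w in zip(decisions, weights))
    return 1 if score > 0.0 else 0


if __name__ == "__main__":
    decisions = [1, 0, 0]
    # Hypothetical operating points: one reliable node, two weak ones.
    op_points = [(0.95, 0.05), (0.6, 0.4), (0.6, 0.4)]
    print(bayesian_fusion(decisions, op_points))
    print(weighted_fusion(decisions, [0.2, 0.5, 0.5]))
```

In this example the single reliable node outweighs the two weak nodes under the Bayesian rule, while the weighted vote with the hypothetical weights shown follows the two nodes that agree. This illustrates why the choice of operating points and weights matters more than a simple majority count.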
