A Realization of Injurious moving picture filtering system with Gaussian Mixture Model and Frame-level Likelihood Estimation

Gaussian Mixture Model과 프레임 단위 유사도 추정을 이용한 유해동영상 필터링 시스템 구현

  • Kim, Min-Joung (Department of Aviations Information & Communication Engineering, Kyungwoon University)
  • Jeong, Jong-Hyeog (Department of Aviations Information & Communication Engineering, Kyungwoon University)
  • 김민정 (경운대학교 항공정보통신공학과)
  • 정종혁 (경운대학교 항공정보통신공학과)
  • Received : 2012.11.30
  • Accepted : 2013.04.04
  • Published : 2013.04.25

Abstract

In this paper, we propose a system that filters injurious moving pictures, which are distributed without restriction on the Internet and in Internet storage space, by using specific sounds contained in them. The Gaussian Mixture Model (GMM), which represents the characteristics of sound well, is used to model these sounds, and frame-level likelihood estimation is used to calculate the likelihood between the data to be filtered and the sound models. For real-time processing, a pruning method is applied that reduces the number of data items to be compared. For high-precision discrimination, the MWMR method, which has shown good performance in earlier speaker identification work, is applied. The criterion separating general from injurious moving pictures is the frame ratio, defined as the proportion of high-likelihood frames to total frames. In the identification experiments, the identification error rate is 6.06% when the frame-ratio criterion is set to 50% and 3.03% when it is set to 60%. These results confirm that the proposed sound-based system can effectively distinguish between general and injurious moving pictures.
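The frame-ratio decision described in the abstract can be illustrated with a minimal sketch. The abstract does not say how a frame is judged to have "high likelihood", so the sketch assumes a frame counts as high-likelihood when it scores higher under a target sound GMM than under a hypothetical background GMM. The function names, the diagonal-covariance form, and the toy one-dimensional features are all illustrative assumptions rather than the authors' implementation, and the pruning and MWMR steps are left out.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM.

    gmm is a list of (weight, mean, var) tuples (an assumed representation).
    """
    terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
    peak = max(terms)  # log-sum-exp keeps the sum numerically stable
    return peak + math.log(sum(math.exp(t - peak) for t in terms))

def high_likelihood_frame_ratio(frames, target_gmm, background_gmm):
    """Fraction of frames scoring higher under the target model.

    Comparing against a background model is an assumption of this sketch.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    hits = sum(
        1
        for x in frames
        if gmm_frame_loglik(x, target_gmm) > gmm_frame_loglik(x, background_gmm)
    )
    return hits / len(frames)

def is_flagged(frames, target_gmm, background_gmm, threshold=0.5):
    """Flag a clip when the frame ratio reaches the chosen criterion."""
    return high_likelihood_frame_ratio(frames, target_gmm, background_gmm) >= threshold

# Toy data: single-component models over 1-D features (purely illustrative).
target_gmm = [(1.0, [0.0], [1.0])]
background_gmm = [(1.0, [5.0], [1.0])]
frames = [[0.1], [0.3], [4.8], [-0.2]]

ratio = high_likelihood_frame_ratio(frames, target_gmm, background_gmm)
print(ratio)  # 3 of the 4 frames lie closer to the target model: 0.75
print(is_flagged(frames, target_gmm, background_gmm, threshold=0.6))  # True
```

In this sketch, raising `threshold` from 0.5 to 0.6 makes the decision stricter, mirroring the two frame-ratio criteria compared in the abstract. The toy data is not meant to reproduce the reported error rates.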
