Title/Summary/Keyword: Audio tagging

Study on data augmentation methods for deep neural network-based audio tagging

  • Kim, Bum-Jun; Moon, Hyeongi; Park, Sung-Wook; Park, Young-cheol
    • The Journal of the Acoustical Society of Korea, v.37 no.6, pp.475-482, 2018
  • In this paper, we present a study on data augmentation methods for DNN (Deep Neural Network)-based audio tagging. In this system, an audio signal is converted into a mel-spectrogram and used as the input to the DNN for audio tagging. To cope with the problem of having only a small amount of training data, we augment the training samples using time stretching, pitch shifting, dynamic range compression, and block mixing. Through audio tagging simulations, we derive optimal parameters and combinations for these augmentation methods.
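As a rough illustration of the four augmentations named in the abstract, here is a minimal Python sketch built on librosa and NumPy. The parameter ranges, the static tanh curve standing in for a real dynamic range compressor, and the mixing weight `alpha` are illustrative assumptions, not the optimal settings the paper derives.

```python
import numpy as np
import librosa

def augment(y: np.ndarray, sr: int, rng: np.random.Generator) -> np.ndarray:
    # Time stretching: change speed/duration without changing pitch.
    y = librosa.effects.time_stretch(y, rate=rng.uniform(0.9, 1.1))
    # Pitch shifting: shift by up to +/- 2 semitones (assumed range).
    y = librosa.effects.pitch_shift(y, sr=sr, n_steps=rng.uniform(-2.0, 2.0))
    # Dynamic range compression: a simple static tanh curve as a stand-in
    # for a real compressor.
    y = np.tanh(2.0 * y) / np.tanh(2.0)
    return y

def block_mix(y1: np.ndarray, y2: np.ndarray, alpha: float = 0.6) -> np.ndarray:
    # Block mixing: superimpose two training clips of equal length.
    n = min(len(y1), len(y2))
    return alpha * y1[:n] + (1.0 - alpha) * y2[:n]

def to_mel_input(y: np.ndarray, sr: int) -> np.ndarray:
    # The augmented waveform becomes a log mel-spectrogram DNN input.
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=64)
    return librosa.power_to_db(mel, ref=np.max)
```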

Synchronized MP3 Playing System Using XML Extension of MP3 Tag

  • Gwak, Mi-Ra; Jo, Dong-Seop
    • The KIPS Transactions: Part B, v.9B no.1, pp.67-76, 2002
  • The MP3 audio format offers good quality at a high compression rate, so its use keeps growing, and with it the need to store extra information such as the author and lyrics inside MP3 files. Tagging systems designed to meet this need have been suggested; the ID3 v1 tag and the Lyrics3 v2 tag are the two most widely used. However, both tags are appended at the end of the file, so they are the last data to arrive when the file is streamed: users cannot get the tag information until the entire audio file has been downloaded, and information meant to be synchronized with the audio stream loses that property. In this paper, a system that searches and plays audio files based on the tag information in MP3 files is implemented. The system solves the problem that tag information is ignored when an MP3 file is played over the Internet. An audio object is described in an XML document, and timing and synchronization between the elements of that document are provided in HTML+TIME style using XSL.
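The streaming limitation described above follows directly from the ID3 v1 layout: the tag is a fixed 128-byte block appended at the very end of the MP3 file. A minimal Python sketch of reading that block (standard ID3 v1 field layout; not code from the paper itself):

```python
import struct

def read_id3v1(path: str) -> dict | None:
    with open(path, "rb") as f:
        f.seek(-128, 2)          # the tag occupies the last 128 bytes
        block = f.read(128)
    if block[:3] != b"TAG":      # no ID3 v1 tag present
        return None
    # Fixed-width fields: "TAG" + title(30) + artist(30) + album(30)
    # + year(4) + comment(30) + genre(1) = 128 bytes.
    _, title, artist, album, year, comment, genre = struct.unpack(
        "3s30s30s30s4s30sB", block
    )
    decode = lambda b: b.rstrip(b"\x00 ").decode("latin-1", errors="replace")
    return {
        "title": decode(title),
        "artist": decode(artist),
        "album": decode(album),
        "year": decode(year),
        "comment": decode(comment),
        "genre": genre,          # numeric genre index
    }
```

Because the reader must `seek` to the end of the file, none of this metadata is available while the stream is still in flight, which is exactly the gap the paper's XML extension addresses.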

Dilated convolution and gated linear unit based sound event detection and tagging algorithm using weak label

  • Park, Chungho; Kim, Donghyun; Ko, Hanseok
    • The Journal of the Acoustical Society of Korea, v.39 no.5, pp.414-423, 2020
  • In this paper, we propose a Dilated Convolution Gated Linear Unit (DCGLU) to mitigate the lack of sparsity and the small receptive field caused by the segmentation-map extraction process in sound event detection with weak labels. With the advent of deep learning frameworks, segmentation-map extraction approaches have shown improved performance in noisy environments. However, these methods must keep the feature-map size fixed to extract the segmentation map, since the model is constructed without pooling operations; as a result, their performance deteriorates due to a lack of sparsity and a small receptive field. To mitigate these problems, we use GLUs to control the flow of information and Dilated Convolutional Neural Networks (DCNNs) to increase the receptive field without additional learning parameters. For the performance evaluation, we employ the URBAN-SED dataset and a self-organized bird sound dataset. The experiments show that the proposed DCGLU model outperforms the other baselines; in particular, it exhibits robustness against natural sound noise at three Signal-to-Noise Ratio (SNR) levels (20 dB, 10 dB, and 0 dB).
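A minimal PyTorch sketch of the mechanism the abstract describes: a dilated convolution widens the receptive field without adding parameters (relative to an undilated convolution with the same kernel size), and a GLU gates the information flow while the feature-map size, and hence the segmentation-map resolution, is preserved. The channel counts and dilation rates here are illustrative assumptions, not the paper's exact architecture.

```python
import torch
import torch.nn as nn

class DilatedGLUBlock(nn.Module):
    """Dilated convolution followed by a gated linear unit (GLU)."""

    def __init__(self, channels: int, dilation: int):
        super().__init__()
        # Producing 2x channels lets nn.GLU split them into a linear
        # part and a gate: out = a * sigmoid(b).
        self.conv = nn.Conv2d(
            channels, 2 * channels, kernel_size=3,
            padding=dilation, dilation=dilation,  # keeps the map size fixed
        )
        self.glu = nn.GLU(dim=1)  # gate along the channel axis

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.glu(self.conv(x))

# Stacking blocks with growing dilation rates widens the receptive field
# exponentially while the feature-map size (and hence the segmentation-map
# resolution) is preserved -- no pooling involved.
model = nn.Sequential(*[DilatedGLUBlock(32, d) for d in (1, 2, 4, 8)])
x = torch.randn(1, 32, 64, 128)   # (batch, channels, mel bins, frames)
print(model(x).shape)             # torch.Size([1, 32, 64, 128])
```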