• Title/Summary/Keyword: Spectrum Enhancement


A study on speech enhancement using complex-valued spectrum employing Feature map Dependent attention gate (특징 맵 중요도 기반 어텐션을 적용한 복소 스펙트럼 기반 음성 향상에 관한 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.6
    • /
    • pp.544-551
    • /
    • 2023
  • Speech enhancement, which aims to improve the perceptual quality and intelligibility of noisy speech, has moved from methods that use only the magnitude spectrum to methods that use the complex-valued spectrum, which can improve both magnitude and phase. This paper studies how to apply an attention mechanism to complex-valued spectrum-based speech enhancement systems to further improve the intelligibility and quality of noisy speech. The attention is based on additive attention, and the attention weight is calculated in consideration of the complex-valued spectrum. In addition, global average pooling is used to account for the importance of each feature map. Complex-valued spectrum-based speech enhancement was performed with the Deep Complex U-Net (DCUNET) model, and additive attention was applied following the method proposed in the Attention U-Net model. Experiments on noisy speech in a living room environment showed that the proposed method outperforms the baseline model on evaluation metrics such as Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI), and that the improvement is consistent across various background noise environments and low Signal-to-Noise Ratio (SNR) conditions. These results demonstrate the effectiveness of the proposed speech enhancement system in improving the intelligibility and quality of noisy speech.

Simultaneous Spectral Resolution and Sensitivity Enhancement in MR spectrum: Maximum Likelihood Deconvolution Reconstruction

  • Jeong, Gwang-Woo;Jeong, Jenny Eunice;Kang, Heoung-Keun
    • Journal of the Korean Magnetic Resonance Society
    • /
    • v.15 no.2
    • /
    • pp.157-174
    • /
    • 2011
  • Although applying apodization functions when postprocessing a 2D NMR spectrum improves spectral quality, there is usually a trade-off between resolution enhancement and noise suppression due to a classical "uncertainty principle." In this study, therefore, a mathematical deconvolution technique called "Maximum Likelihood Deconvolution (MLD)" was adopted to achieve spectral resolution and sensitivity enhancement simultaneously. The MLD technique greatly facilitates visualization and restoration of the genuine spectral information from complex 2D NMR spectra, which would be problematic with conventional apodization/FT processing. In particular, applying MLD to the 2D-NOE spectrum would be very useful for deriving the important proton connectivities, which are essential for elucidating the 3D molecular structure.

A study on skip-connection with time-frequency self-attention for improving speech enhancement based on complex-valued spectrum (복소 스펙트럼 기반 음성 향상의 성능 향상을 위한 time-frequency self-attention 기반 skip-connection 기법 연구)

  • Jaehee Jung;Wooil Kim
    • The Journal of the Acoustical Society of Korea
    • /
    • v.42 no.2
    • /
    • pp.94-101
    • /
    • 2023
  • A deep neural network composed of encoders and decoders, such as U-Net, when used for speech enhancement, concatenates encoder features to the decoder through skip-connections. A skip-connection helps reconstruct the enhanced spectrum and compensates for lost information. However, the encoder and decoder features joined by the skip-connection are incompatible with each other. In this paper, for complex-valued spectrum-based speech enhancement, a Self-Attention (SA) method is applied to the skip-connection to transform the encoder features so that they are compatible with the decoder features. SA is a technique that, when generating an output sequence in a sequence-to-sequence task, uses a weighted average of the input to put attention on subsets of the input, and it has been shown to eliminate noise effectively when applied to speech enhancement. Three models that use encoder and decoder features to apply SA to the skip-connection are studied. In experiments on the TIMIT database, the proposed methods show improvements in all evaluation metrics compared to the Deep Complex U-Net (DCUNET) with skip-connection only.

Method for Spectral Enhancement by Binary Mask for Speech Recognition Enhancement Under Noise Environment (잡음환경에서 음성인식 성능향상을 위한 바이너리 마스크를 이용한 스펙트럼 향상 방법)

  • Choi, Gab-Keun;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.29 no.7
    • /
    • pp.468-474
    • /
    • 2010
  • The major obstacle to the practical use of speech recognition is distortion caused by ambient and channel noise. In general, ambient noise degrades performance and restricts where speech recognition can be used. Speech recognition based on DSR (Distributed Speech Recognition) has the same problem. Various noise cancelling algorithms have been applied to solve it, but spectral loss and residual noise caused by incorrect noise estimation in low-SNR environments reduce the recognition rate. This paper proposes a method for speech enhancement that uses MMSE-STSA for noise cancelling and an ideal binary mask to compensate for the damaged spectrum. In experiments in noisy environments (SNR from 15 dB down to 0 dB), the proposed method showed better spectral results and recognition performance.
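The ideal binary mask mentioned in the abstract above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not say how the mask is combined with MMSE-STSA, so only the textbook mask definition is shown. The function names, the `lc_db` threshold (the usual "local criterion"), and the list-of-frames layout are assumptions made for illustration. The mask is "ideal" because it needs the clean speech and noise powers separately, which are only available in oracle experiments.

```python
import math


def ideal_binary_mask(speech_power, noise_power, lc_db=0.0, eps=1e-12):
    """Return a 0/1 mask for every time-frequency bin.

    speech_power and noise_power are lists of frames; each frame is a
    list of per-bin powers. A bin is kept (1) when its local SNR in dB
    exceeds the local criterion lc_db (hypothetical default: 0 dB).
    """
    mask = []
    for s_frame, n_frame in zip(speech_power, noise_power):
        row = []
        for s, n in zip(s_frame, n_frame):
            # eps guards against log(0) and division by zero.
            local_snr_db = 10.0 * math.log10((s + eps) / (n + eps))
            row.append(1 if local_snr_db > lc_db else 0)
        mask.append(row)
    return mask


def apply_mask(spectrum, mask):
    """Zero out the bins of a magnitude or power spectrum where the mask is 0."""
    return [[x * m for x, m in zip(x_frame, m_frame)]
            for x_frame, m_frame in zip(spectrum, mask)]
```

Raising `lc_db` keeps fewer bins and removes more noise at the cost of more speech distortion; lowering it does the opposite.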

A study on the fingerprint enhancement using the Fourier transform (퓨리에 변환을 이용한 지문영상의 개선에 관한 연구)

  • 곽윤식
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.21 no.8
    • /
    • pp.1897-1904
    • /
    • 1996
  • This study aims to extract the spectrum characteristics of the fingerprint image efficiently in the Fourier domain and to apply them to image enhancement. To acquire the spectrum characteristics of the fingerprint effectively in the Fourier domain, a 1×64 window was used as the processing unit, and the power spectrum density was estimated for each of several combinations of record and overlap lengths. Each spectrum characteristic obtained was applied to a re-synthesis process of the fingerprint image, and, through comparison and evaluation of the resulting images, an improved gray-scale image could be obtained. The validity of this algorithm was confirmed by comparing and evaluating the binary images obtained with the established method and with the method used in this experiment.
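The phrase "record and overlap lengths" in the abstract above suggests a Welch-style averaged periodogram computed inside each 64-sample window. The sketch below shows that general technique under that assumption only; the abstract does not confirm the estimator, the taper, or the normalization, and the function names are hypothetical. A rectangular (no) taper and an unnormalized periodogram are used for brevity.

```python
import cmath


def periodogram(record):
    """Unnormalized one-sided periodogram |DFT|^2 / N of one record."""
    n = len(record)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * cmath.pi * k * i / n)
                  for i, x in enumerate(record))
        spectrum.append(abs(acc) ** 2 / n)
    return spectrum


def averaged_psd(window, record_len, overlap):
    """Split a window into overlapping records and average their periodograms."""
    step = record_len - overlap
    if step <= 0 or record_len > len(window):
        raise ValueError("need 0 <= overlap < record_len <= len(window)")
    starts = range(0, len(window) - record_len + 1, step)
    psds = [periodogram(window[s:s + record_len]) for s in starts]
    # Average bin by bin across all records.
    return [sum(column) / len(psds) for column in zip(*psds)]
```

Shorter records give more averaging (lower variance) but coarser frequency resolution, which is presumably the trade-off explored by trying several record and overlap combinations.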


Estimation and Analysis of Wave Spectrum Parameter using HeMOSU-2 Observation Data (HeMOSU-2 관측 자료를 이용한 파랑 스펙트럼 매개변수 추정 및 분석)

  • Lee, Uk-Jae;Ko, Dong-Hui;Kim, Ji-Young;Cho, Hong-Yeon
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.33 no.6
    • /
    • pp.217-225
    • /
    • 2021
  • In this study, wave spectrum data were calculated from the water surface elevation data observed at 5 Hz at the HeMOSU-2 meteorological tower installed off the west coast of Korea, and wave parameters were estimated from the wave spectrum data. For all significant wave height ranges, the peak enhancement parameter (γopt) of the JONSWAP spectrum and the scale parameter (α) and shape parameter (β) of the modified BM spectrum were estimated from the observed spectrum, and the distribution of each parameter was examined. The analysis gave a peak enhancement parameter (γopt) of 1.27 for the JONSWAP spectrum, which is very low compared to the previously proposed value of 3.3. Over the full range of significant wave heights, the distribution of the peak enhancement parameter (γopt) appeared as a combination of a probability mass function (PMF) and a probability density function (PDF). In addition, the scale parameter (α) and shape parameter (β) of the modified BM spectrum were estimated to be [0.245, -1.278], which are lower than the existing values [0.300, -1.098], and the linear correlation analysis between the two parameters gave β = -3.86α.
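To show what the peak enhancement parameter γ in the abstract above controls, here is a small sketch of the standard JONSWAP spectral form. It is not the fitting procedure of the paper. The Phillips constant `alpha_phillips` (default 0.0081) and the peak-width values 0.07 and 0.09 are the commonly quoted JONSWAP defaults and are assumptions here; `alpha_phillips` is unrelated to the α and β of the modified BM spectrum reported in the abstract.

```python
import math

G = 9.81  # gravitational acceleration, m/s^2


def jonswap(f, fp, gamma=3.3, alpha_phillips=0.0081):
    """JONSWAP spectral density at frequency f (Hz) for peak frequency fp (Hz)."""
    if f <= 0.0:
        return 0.0
    # Peak width differs on the two sides of the spectral peak.
    sigma = 0.07 if f <= fp else 0.09
    r = math.exp(-((f - fp) ** 2) / (2.0 * sigma ** 2 * fp ** 2))
    # Pierson-Moskowitz-type base shape.
    base = (alpha_phillips * G ** 2 * (2.0 * math.pi) ** -4 * f ** -5
            * math.exp(-1.25 * (fp / f) ** 4))
    # gamma raises the spectrum near the peak and has no effect far from it.
    return base * gamma ** r
```

At `f == fp` the exponent `r` equals 1, so the spectral peak scales directly with γ: a spectrum with γ = 1.27 has a peak about 2.6 times lower than one with γ = 3.3 for the same `alpha_phillips` and `fp`, while the tails are almost unchanged.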

A study on loss combination in time and frequency for effective speech enhancement based on complex-valued spectrum (효과적인 복소 스펙트럼 기반 음성 향상을 위한 시간과 주파수 영역 손실함수 조합에 관한 연구)

  • Jung, Jaehee;Kim, Wooil
    • The Journal of the Acoustical Society of Korea
    • /
    • v.41 no.1
    • /
    • pp.38-44
    • /
    • 2022
  • Speech enhancement is performed to improve the intelligibility and quality of noise-corrupted speech. In this paper, speech enhancement performance was compared using different loss functions in the time and frequency domains. This study proposes a combination of loss functions that exploits the advantages of each domain by considering both the details of the spectrum and the speech waveform. Scale-Invariant Source-to-Noise Ratio (SI-SNR) is used as the time-domain loss function, and Mean Squared Error (MSE), calculated over the complex-valued spectrum and the magnitude spectrum, is used in the frequency domain. The phase loss is obtained using the sine function. Speech enhancement results are evaluated using Source-to-Distortion Ratio (SDR), Perceptual Evaluation of Speech Quality (PESQ), and Short-Time Objective Intelligibility (STOI). To confirm the results, the resulting spectrograms are also compared. Experiments on the TIMIT database show the highest performance when the SI-SNR and magnitude loss functions are combined.
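The SI-SNR measure named in the abstract above has a widely used definition, sketched below for two equal-length waveforms. This is an illustration of that common definition, not the authors' training code; the function name and the `eps` stabilizer are assumptions, and how SI-SNR is weighted against the MSE terms is not stated in the abstract, so no combination is shown.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimate and a target waveform."""
    if len(estimate) != len(target):
        raise ValueError("waveforms must have the same length")
    n = len(target)
    # Remove the mean so a DC offset does not affect the score.
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target to get the "signal" part.
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    s_target = [dot / t_energy * b for b in t]
    # Whatever is left after the projection counts as noise.
    e_noise = [a - s for a, s in zip(e, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(x * x for x in e_noise) + eps
    return 10.0 * math.log10(signal_energy / noise_energy + eps)
```

Because a higher SI-SNR is better, a training loss would typically be the negative of this value. Rescaling the estimate leaves the score essentially unchanged, which is what "scale-invariant" refers to.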

Noise Suppression Using Normalized Time-Frequency Bin Average and Modified Gain Function for Speech Enhancement in Nonstationary Noisy Environments

  • Lee, Soo-Jeong;Kim, Soon-Hyob
    • The Journal of the Acoustical Society of Korea
    • /
    • v.27 no.1E
    • /
    • pp.1-10
    • /
    • 2008
  • A noise suppression algorithm is proposed for nonstationary noisy environments. The proposed algorithm is different from the conventional approaches such as the spectral subtraction algorithm and the minimum statistics noise estimation algorithm in that it classifies speech and noise signals in time-frequency bins. It calculates the ratio of the variance of the noisy power spectrum in time-frequency bins to its normalized time-frequency average. If the ratio is greater than an adaptive threshold, speech is considered to be present. Our adaptive algorithm tracks the threshold and controls the trade-off between residual noise and distortion. The estimated clean speech power spectrum is obtained by a modified gain function and the updated noisy power spectrum of the time-frequency bin. This new algorithm has the advantages of simplicity and light computational load for estimating the noise. This algorithm reduces the residual noise significantly, and is superior to the conventional methods.

The Speech Recognition Using the Diffusion Network (확산망을 이용한 음성인식)

  • 허만택
    • Proceedings of the Acoustical Society of Korea Conference
    • /
    • 1996.10a
    • /
    • pp.70-75
    • /
    • 1996
  • In this paper, a preprocessing method for the recognition of single vowels using the spectrum envelope is presented. We use a new method of extracting the spectrum envelope with a diffusion filter bank. We reduced the total processing time and obtained better discrimination. An average recognition rate of 88.3% for single vowels of real voice in computer simulation confirmed that the method is useful for speech recognition based on spectrum analysis of voice signals that have many frequency components.


A study of speech enhancement through wavelet analysis using auditory mechanism (인간의 청각 메커니즘을 적용한 웨이블렛 분석을 통한 음성 향상에 대한 연구)

  • 이준석;길세기;홍준표;홍승홍
    • Proceedings of the IEEK Conference
    • /
    • 2002.06d
    • /
    • pp.397-400
    • /
    • 2002
  • This paper studies a speech enhancement method for noisy environments. Taking the human auditory mechanism, regarded here as an ideal system, as a model, we applied the wavelet transform. The multi-resolution property of the wavelet transform makes possible a multiband spectrum analysis like that of the human ear. The method was verified to be very effective for enhancing noisy speech.
