• Title/Summary/Keyword: Wasserstein GAN

Search Result 12, Processing Time 0.028 seconds

Proposing Effective Regularization Terms for Improvement of WGAN (WGAN의 성능개선을 위한 효과적인 정칙항 제안)

  • Hahn, Hee Il
    • Journal of Korea Multimedia Society
    • /
    • v.24 no.1
    • /
    • pp.13-20
    • /
    • 2021
  • A Wasserstein GAN(WGAN), optimum in terms of minimizing Wasserstein distance, still suffers from inconsistent convergence or unexpected output due to inherent learning instability. It is widely known some kinds of restriction on the discriminative function should be considered to solve such problems, which implies the importance of Lipschitz continuity. Unfortunately, there are few known methods to satisfactorily maintain the Lipschitz continuity of the discriminative function. In this paper we propose techniques to stably maintain the Lipschitz continuity of the discriminative function by adding effective regularization terms to the objective function, which limit the magnitude of the gradient vectors of the discriminator to one or less. Extensive experiments are conducted to evaluate the performance of the proposed techniques, which shows the single-sided penalty improves convergence compared with the gradient penalty at the early learning process, while the proposed additional penalty increases inception scores by 0.18 after 100,000 number of learning.

Technique Proposal to Stabilize Lipschitz Continuity of WGAN Based on Regularization Terms (정칙화 항에 기반한 WGAN의 립쉬츠 연속 안정화 기법 제안)

  • Hahn, Hee-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.1
    • /
    • pp.239-246
    • /
    • 2020
  • The recently proposed Wasserstein generative adversarial network (WGAN) has improved some of the tricky and unstable training processes that are chronic problems of the generative adversarial network(GAN), but there are still cases where it generates poor samples or fails to converge. In order to solve the problems, this paper proposes algorithms to improve the sampling process so that the discriminator can more accurately estimate the data probability distribution to be modeled and to stably maintain the discriminator should be Lipschitz continuous. Through various experiments, we analyze the characteristics of the proposed techniques and verify their performances.

Experimental Analysis of Equilibrization in Binary Classification for Non-Image Imbalanced Data Using Wasserstein GAN

  • Wang, Zhi-Yong;Kang, Dae-Ki
    • International Journal of Internet, Broadcasting and Communication
    • /
    • v.11 no.4
    • /
    • pp.37-42
    • /
    • 2019
  • In this paper, we explore the details of three classic data augmentation methods and two generative model based oversampling methods. The three classic data augmentation methods are random sampling (RANDOM), Synthetic Minority Over-sampling Technique (SMOTE), and Adaptive Synthetic Sampling (ADASYN). The two generative model based oversampling methods are Conditional Generative Adversarial Network (CGAN) and Wasserstein Generative Adversarial Network (WGAN). In imbalanced data, the whole instances are divided into majority class and minority class, where majority class occupies most of the instances in the training set and minority class only includes a few instances. Generative models have their own advantages when they are used to generate more plausible samples referring to the distribution of the minority class. We also adopt CGAN to compare the data augmentation performance with other methods. The experimental results show that WGAN-based oversampling technique is more stable than other approaches (RANDOM, SMOTE, ADASYN and CGAN) even with the very limited training datasets. However, when the imbalanced ratio is too small, generative model based approaches cannot achieve satisfying performance than the conventional data augmentation techniques. These results suggest us one of future research directions.

Depth Image Restoration Using Generative Adversarial Network (Generative Adversarial Network를 이용한 손실된 깊이 영상 복원)

  • Nah, John Junyeop;Sim, Chang Hun;Park, In Kyu
    • Journal of Broadcast Engineering
    • /
    • v.23 no.5
    • /
    • pp.614-621
    • /
    • 2018
  • This paper proposes a method of restoring corrupted depth image captured by depth camera through unsupervised learning using generative adversarial network (GAN). The proposed method generates restored face depth images using 3D morphable model convolutional neural network (3DMM CNN) with large-scale CelebFaces Attribute (CelebA) and FaceWarehouse dataset for training deep convolutional generative adversarial network (DCGAN). The generator and discriminator equip with Wasserstein distance for loss function by utilizing minimax game. Then the DCGAN restore the loss of captured facial depth images by performing another learning procedure using trained generator and new loss function.

Automaitc Generation of Fashion Image Dataset by Using Progressive Growing GAN (PG-GAN을 이용한 패션이미지 데이터 자동 생성)

  • Kim, Yanghee;Lee, Chanhee;Whang, Taesun;Kim, Gyeongmin;Lim, Heuiseok
    • Journal of Internet of Things and Convergence
    • /
    • v.4 no.2
    • /
    • pp.1-6
    • /
    • 2018
  • Techniques for generating new sample data from higher dimensional data such as images have been utilized variously for speech synthesis, image conversion and image restoration. This paper adopts Progressive Growing of Generative Adversarial Networks(PG-GANs) as an implementation model to generate high-resolution images and to enhance variation of the generated images, and applied it to fashion image data. PG-GANs allows the generator and discriminator to progressively learn at the same time, continuously adding new layers from low-resolution images to result high-resolution images. We also proposed a Mini-batch Discrimination method to increase the diversity of generated data, and proposed a Sliced Wasserstein Distance(SWD) evaluation method instead of the existing MS-SSIM to evaluate the GAN model.

Comparison Analysis on Automatic Coloring System Algorithm Using Machine Learning (머신러닝을 활용한 자동 채색 시스템 알고리즘 비교 분석)

  • Lee, song eun;Lee, Ji Yeon;Kim, Na Heon;Kim, Jin Hwan
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2017.11a
    • /
    • pp.792-794
    • /
    • 2017
  • 현재 머신러닝(Machine Learning) 기술은 기존의 머신러닝과 조합 및 변형 되어 조금 더 발전 된 형태로 연구되어지고 있다. 따라서 수많은 알고리즘이 개발되고 있는 시점이다. 본 연구는 최근 좋은 결과로 관심을 받고있는 GAN(Generative Adversarial Net)을 중심으로 IT기술의 머신러닝과 그림을 조합하여 자동채색을 목적으로 GAN 알고리즘을 비교하고 분석하고자 한다. GAN 알고리즘들 가운데서 'Conditional GAN'과 'Wasserstein GAN'을 사용하여 자동채색을 적용시켰고, 가장 부합한 알고리즘을 찾고 성능을 비교하여 어떠한 알고리즘이 '자동채색' 목적에 더 부합한지 비교하고 판단 한다.

Extending StarGAN-VC to Unseen Speakers Using RawNet3 Speaker Representation (RawNet3 화자 표현을 활용한 임의의 화자 간 음성 변환을 위한 StarGAN의 확장)

  • Bogyung Park;Somin Park;Hyunki Hong
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.7
    • /
    • pp.303-314
    • /
    • 2023
  • Voice conversion, a technology that allows an individual's speech data to be regenerated with the acoustic properties(tone, cadence, gender) of another, has countless applications in education, communication, and entertainment. This paper proposes an approach based on the StarGAN-VC model that generates realistic-sounding speech without requiring parallel utterances. To overcome the constraints of the existing StarGAN-VC model that utilizes one-hot vectors of original and target speaker information, this paper extracts feature vectors of target speakers using a pre-trained version of Rawnet3. This results in a latent space where voice conversion can be performed without direct speaker-to-speaker mappings, enabling an any-to-any structure. In addition to the loss terms used in the original StarGAN-VC model, Wasserstein distance is used as a loss term to ensure that generated voice segments match the acoustic properties of the target voice. Two Time-Scale Update Rule (TTUR) is also used to facilitate stable training. Experimental results show that the proposed method outperforms previous methods, including the StarGAN-VC network on which it was based.

Combining multi-task autoencoder with Wasserstein generative adversarial networks for improving speech recognition performance (음성인식 성능 개선을 위한 다중작업 오토인코더와 와설스타인식 생성적 적대 신경망의 결합)

  • Kao, Chao Yuan;Ko, Hanseok
    • The Journal of the Acoustical Society of Korea
    • /
    • v.38 no.6
    • /
    • pp.670-677
    • /
    • 2019
  • As the presence of background noise in acoustic signal degrades the performance of speech or acoustic event recognition, it is still challenging to extract noise-robust acoustic features from noisy signal. In this paper, we propose a combined structure of Wasserstein Generative Adversarial Network (WGAN) and MultiTask AutoEncoder (MTAE) as deep learning architecture that integrates the strength of MTAE and WGAN respectively such that it estimates not only noise but also speech features from noisy acoustic source. The proposed MTAE-WGAN structure is used to estimate speech signal and the residual noise by employing a gradient penalty and a weight initialization method for Leaky Rectified Linear Unit (LReLU) and Parametric ReLU (PReLU). The proposed MTAE-WGAN structure with the adopted gradient penalty loss function enhances the speech features and subsequently achieve substantial Phoneme Error Rate (PER) improvements over the stand-alone Deep Denoising Autoencoder (DDAE), MTAE, Redundant Convolutional Encoder-Decoder (R-CED) and Recurrent MTAE (RMTAE) models for robust speech recognition.

Comparison of Seismic Data Interpolation Performance using U-Net and cWGAN (U-Net과 cWGAN을 이용한 탄성파 탐사 자료 보간 성능 평가)

  • Yu, Jiyun;Yoon, Daeung
    • Geophysics and Geophysical Exploration
    • /
    • v.25 no.3
    • /
    • pp.140-161
    • /
    • 2022
  • Seismic data with missing traces are often obtained regularly or irregularly due to environmental and economic constraints in their acquisition. Accordingly, seismic data interpolation is an essential step in seismic data processing. Recently, research activity on machine learning-based seismic data interpolation has been flourishing. In particular, convolutional neural network (CNN) and generative adversarial network (GAN), which are widely used algorithms for super-resolution problem solving in the image processing field, are also used for seismic data interpolation. In this study, CNN-based algorithm, U-Net and GAN-based algorithm, and conditional Wasserstein GAN (cWGAN) were used as seismic data interpolation methods. The results and performances of the methods were evaluated thoroughly to find an optimal interpolation method, which reconstructs with high accuracy missing seismic data. The work process for model training and performance evaluation was divided into two cases (i.e., Cases I and II). In Case I, we trained the model using only the regularly sampled data with 50% missing traces. We evaluated the model performance by applying the trained model to a total of six different test datasets, which consisted of a combination of regular, irregular, and sampling ratios. In Case II, six different models were generated using the training datasets sampled in the same way as the six test datasets. The models were applied to the same test datasets used in Case I to compare the results. We found that cWGAN showed better prediction performance than U-Net with higher PSNR and SSIM. However, cWGAN generated additional noise to the prediction results; thus, an ensemble technique was performed to remove the noise and improve the accuracy. The cWGAN ensemble model removed successfully the noise and showed improved PSNR and SSIM compared with existing individual models.

Voice Frequency Synthesis using VAW-GAN based Amplitude Scaling for Emotion Transformation

  • Kwon, Hye-Jeong;Kim, Min-Jeong;Baek, Ji-Won;Chung, Kyungyong
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.2
    • /
    • pp.713-725
    • /
    • 2022
  • Mostly, artificial intelligence does not show any definite change in emotions. For this reason, it is hard to demonstrate empathy in communication with humans. If frequency modification is applied to neutral emotions, or if a different emotional frequency is added to them, it is possible to develop artificial intelligence with emotions. This study proposes the emotion conversion using the Generative Adversarial Network (GAN) based voice frequency synthesis. The proposed method extracts a frequency from speech data of twenty-four actors and actresses. In other words, it extracts voice features of their different emotions, preserves linguistic features, and converts emotions only. After that, it generates a frequency in variational auto-encoding Wasserstein generative adversarial network (VAW-GAN) in order to make prosody and preserve linguistic information. That makes it possible to learn speech features in parallel. Finally, it corrects a frequency by employing Amplitude Scaling. With the use of the spectral conversion of logarithmic scale, it is converted into a frequency in consideration of human hearing features. Accordingly, the proposed technique provides the emotion conversion of speeches in order to express emotions in line with artificially generated voices or speeches.