High Representation based GAN defense for Adversarial Attack

Sutanto, Richard Evan; Lee, Suk Ho

  • Received : 2019.02.08
  • Accepted : 2019.02.19
  • Published : 2019.03.31


These days, many applications use neural networks as parts of their systems. At the same time, adversarial examples have become an important issue concerning the security of neural networks: a classifier can be fooled by adversarial examples into misclassifying its input. Much research counters adversarial examples with denoising methods, some of which use a GAN (Generative Adversarial Network) to remove adversarial noise from input images. By producing an image from the generator network that is close enough to the original clean image, the effect of the adversarial example can be reduced. However, because adversarial noise is unlike ordinary noise, some of it can survive this approximation process. Therefore, we propose a method that utilizes the high-level representation in the classifier by combining a GAN with a trained U-Net network. The approach minimizes a loss function defined on high-representation terms, i.e., the difference between the high-level representation of the clean data and that of the approximated output of the noisy data in the training dataset. Furthermore, the generated output is checked for minimum error against the true label; the U-Net is trained with true labels to ensure that the generated output yields minimum error in the end. As a result, the adversarial noise that remains after the low-level approximation can be removed by the U-Net, owing to the minimization of the high-representation terms.
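The combined objective described above can be sketched as a pixel-level reconstruction term plus a high-representation term measured on classifier features. The following is a minimal NumPy illustration, not the authors' implementation: the stand-in feature extractor (a single linear layer with ReLU), the weighting factor `lam`, and all shapes are assumptions for the sake of a self-contained example.

```python
import numpy as np

def features(x, W):
    # Stand-in for the classifier's high-level representation:
    # a single linear layer followed by ReLU (an assumption for
    # illustration; the paper uses a trained classifier's features).
    return np.maximum(0.0, x.reshape(len(x), -1) @ W)

def defense_loss(denoised, clean, W, lam=1.0):
    # Pixel-level (low-level) term: how close the generator output
    # is to the original clean image.
    pixel = np.mean((denoised - clean) ** 2)
    # High-representation term: distance between the classifier
    # features of the clean image and of the denoised output, which
    # penalizes adversarial noise that survives the pixel-level
    # approximation.
    high = np.mean((features(denoised, W) - features(clean, W)) ** 2)
    return pixel + lam * high

rng = np.random.default_rng(0)
clean = rng.random((4, 28, 28))                    # batch of clean images
denoised = clean + 0.01 * rng.standard_normal(clean.shape)  # good output
noisy = clean + 0.1 * rng.standard_normal(clean.shape)      # noisy input
W = rng.standard_normal((28 * 28, 64))             # toy feature weights

print(defense_loss(denoised, clean, W))  # small: output close to clean
print(defense_loss(noisy, clean, W))     # larger: residual noise remains
```

In training, both terms would be minimized jointly so that the denoiser matches the clean image not only pixel-wise but also in the classifier's high-level feature space.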


Neural networks;Adversarial examples;Generative adversarial network;Adversarial attack;Adversarial defense




Grant : Development of prevention technology against AI dysfunction induced by deception attack

Supported by : National Research Foundation of Korea (NRF), Institute for Information and Communications Technology Promotion (IITP)