Transfer Learning using Multiple ConvNet Layers Activation Features with Principal Component Analysis for Image Classification

전이학습 기반 다중 컨볼류션 신경망 레이어의 활성화 특징과 주성분 분석을 이용한 이미지 분류 방법

  • Byambajav, Batkhuu (Department of Computer Engineering, Inha University) ;
  • Alikhanov, Jumabek (Department of Computer Engineering, Inha University) ;
  • Fang, Yang (Department of Computer Engineering, Inha University) ;
  • Ko, Seunghyun (Department of Computer Engineering, Inha University) ;
  • Jo, Geun Sik (Department of Computer Engineering, Inha University)
  • Received : 2017.12.05
  • Accepted : 2018.02.27
  • Published : 2018.03.31


A Convolutional Neural Network (ConvNet) is a class of powerful deep neural network that can analyze and learn hierarchies of visual features. Originally, the first such neural network, the Neocognitron, was introduced in the 1980s. At that time, neural networks were not widely used in either industry or academia because large-scale datasets were scarce and computational power was low. A few decades later, in 2012, Krizhevsky made a breakthrough in the ILSVRC-12 visual recognition competition using a Convolutional Neural Network, and that breakthrough revived interest in neural networks. The success of Convolutional Neural Networks rests on two main factors. The first is the emergence of advanced hardware (GPUs) capable of sufficient parallel computation. The second is the availability of large-scale datasets, such as the ImageNet (ILSVRC) dataset, for training. Unfortunately, many new domains are bottlenecked by these factors. For most domains, gathering a large-scale dataset to train a ConvNet is difficult and requires a great deal of effort. Moreover, even when a large-scale dataset is available, training a ConvNet from scratch is time-consuming and requires expensive resources. These two obstacles can be addressed with transfer learning, a method for transferring knowledge from a source domain to a new domain. There are two major transfer learning cases: using a ConvNet as a fixed feature extractor, and fine-tuning the ConvNet on a new dataset. In the first case, a pre-trained ConvNet (for example, one trained on ImageNet) computes feed-forward activations for an image, and activation features are extracted from specific layers. In the second case, the ConvNet classifier is replaced and retrained on the new dataset, and the weights of the pre-trained network are then fine-tuned with backpropagation. In this paper, we focus only on using multiple ConvNet layers as a fixed feature extractor.
However, applying high-dimensional features extracted directly from multiple ConvNet layers remains a challenging problem. We observe that features extracted from different ConvNet layers capture different characteristics of an image, which means that a better representation could be obtained by finding the optimal combination of multiple ConvNet layers. Based on that observation, we propose to employ multiple ConvNet layer representations for transfer learning instead of a single ConvNet layer representation. Overall, our primary pipeline has three steps. First, images from the target task are fed forward through a pre-trained AlexNet, and the activation features from its three fully connected layers are extracted. Second, the activation features of the three layers are concatenated to obtain a multiple ConvNet layer representation, because the combined representation carries more information about an image. When the three fully connected layer features are concatenated, the resulting image representation has 9192 (4096+4096+1000) dimensions. However, features extracted from multiple ConvNet layers are redundant and noisy since they are extracted from the same ConvNet. Thus, in the third step, we use Principal Component Analysis (PCA) to select salient features before the training phase. With salient features, the classifier can classify images more accurately, and the performance of transfer learning can be improved. To evaluate the proposed method, experiments are conducted on three standard datasets (Caltech-256, VOC07, and SUN397) to compare multiple ConvNet layer representations against single ConvNet layer representations, using PCA for feature selection and dimension reduction. Our experiments demonstrate the importance of feature selection for multiple ConvNet layer representations.
Moreover, our proposed approach achieved 75.6% accuracy, compared to 73.9% achieved by the FC7 layer, on the Caltech-256 dataset; 73.1%, compared to 69.2% achieved by the FC8 layer, on the VOC07 dataset; and 52.2%, compared to 48.7% achieved by the FC7 layer, on the SUN397 dataset. We also show that our proposed approach achieves superior performance, with accuracy improvements of 2.8%, 2.1%, and 3.1% on the Caltech-256, VOC07, and SUN397 datasets, respectively, compared to existing work.
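
The concatenation and PCA steps of the pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the random arrays are hypothetical stand-ins for AlexNet activations, the helper names (`concat_layer_features`, `pca_fit`, `pca_transform`) and the choice of 16 retained components are assumptions made for illustration, and PCA is computed here with a plain NumPy SVD. Only the layer dimensions (4096, 4096, and 1000) come from the abstract.

```python
import numpy as np

# Dimensions of the three AlexNet fully connected layers named in the abstract.
FC_DIMS = {"fc6": 4096, "fc7": 4096, "fc8": 1000}


def concat_layer_features(layer_feats):
    """Concatenate per-layer activation matrices (n_images x dim) column-wise."""
    return np.concatenate([layer_feats[name] for name in FC_DIMS], axis=1)


def pca_fit(X, n_components):
    """Return the feature mean and the top principal directions of X."""
    mean = X.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:n_components]


def pca_transform(X, mean, components):
    """Project X onto the retained principal directions."""
    return (X - mean) @ components.T


# Hypothetical stand-in data: random "activations" for 32 images.
rng = np.random.default_rng(0)
feats = {name: rng.standard_normal((32, dim)) for name, dim in FC_DIMS.items()}

combined = concat_layer_features(feats)               # shape (32, 9192)
mean, components = pca_fit(combined, n_components=16)
reduced = pca_transform(combined, mean, components)   # shape (32, 16)
```

In practice the mean and components would be fitted on training features only and then reused to transform test features before a classifier is trained on the reduced representation.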


Supported by : National Research Foundation of Korea (NRF)


  1. Abdi, H. and L. J. Williams, "Principal component analysis," Journal of Wiley Interdisciplinary Reviews: Computational Statistics, Vol. 2, No. 4(2010), 433-459.
  2. Azizpour, H., A. Razavian, J. Sullivan, A. Maki and S. Carlsson, "Factors of Transferability for a Generic ConvNet Representation," IEEE, 2014.
  3. Donahue, J., Y. Jia, O. Vinyals, J. Hoffman, N. Zhang, E. Tzeng and T. Darrell, "Decaf: A deep convolutional activation feature for generic visual recognition," arXiv preprint arXiv: 1310.1531, 2013.
  4. Everingham, M., S. A. Eslami, L. Van Gool, C. K. Williams, J. Winn and A. Zisserman, "The pascal visual object classes challenge: A retrospective," International Journal of Computer Vision, Vol. 111, No. 1(2015), 98-136.
  5. Fukushima, K., "Neocognitron: A self-organizing neural network model for a mechanism of pattern recognition unaffected by shift in position," Biological Cybernetics, Vol. 36, No. 4(1980), 193-202.
  6. Girshick, R., J. Donahue, T. Darrell and J. Malik, "Rich feature hierarchies for accurate object detection and semantic segmentation," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
  7. Griffin, G., A. Holub and P. Perona, "Caltech-256 object category dataset," California Institute of Technology, 2007.
  8. Jia, Y., E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama and T. Darrell, "Caffe: Convolutional architecture for fast feature embedding," in Proceedings of the ACM International Conference on Multimedia, 2014.
  9. Jumabek, A., G. Myeong Hyeon, K. Seunghyun and J. Geun-Sik, "Transfer Learning Based on AdaBoost for Feature Selection from Multiple ConvNet Layer Features", Korea Information Processing Society, Vol. 23, No. 1(2016), 633-635.
  10. Krizhevsky, A., I. Sutskever and G. E. Hinton, "Imagenet classification with deep convolutional neural networks," in Advances in Neural Information Processing Systems, 2012.
  11. Krizhevsky, A. and G. Hinton, "Learning multiple layers of features from tiny images," Citeseer, 2009.
  12. Lee, J.-s., and H. Ahn, "A Study on the Prediction Model of Stock Price Index Trend based on GA-MSVM that Simultaneously Optimizes Feature and Instance Selection", Journal of Intelligence and Information Systems, Vol. 23, No. 4 (2017), 147-168.
  13. LeCun, Y., L. Bottou, Y. Bengio and P. Haffner, "Gradient-based learning applied to document recognition," Proceedings of the IEEE, Vol. 86, No. 11(1998), 2278-2324.
  14. Oquab, M., L. Bottou, I. Laptev and J. Sivic, "Learning and transferring mid-level image representations using convolutional neural networks," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2014.
  15. Razavian, A., H. Azizpour, J. Sullivan and S. Carlsson, "CNN features off-the-shelf: an astounding baseline for recognition," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, 2014.
  16. Russakovsky, O., J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A. Khosla, M. Bernstein and others, "Imagenet large scale visual recognition challenge," International Journal of Computer Vision, Vol. 115, No. 3(2015), 211-252.
  17. Schapire, R. E. and Y. Singer, "Improved boosting algorithms using confidence-rated predictions," Machine Learning, Vol. 37, No. 3 (1999), 297-336.
  18. Song, J. H., H. S. Choi, and S. W. Kim, "A Study on Commodity Asset Investment Model Based on Machine Learning Technique", Journal of Intelligence and Information Systems, Vol. 23, No. 4 (2017), 127-146.
  19. Sukjae, C., L. Jungwon, and K. Ohbyung, "Financial Fraud Detection using Text Mining Analysis against Municipal Cybercriminality", Journal of Intelligence and Information Systems, Vol. 23, No. 3 (2017), 119-138.
  20. Szegedy, C., W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke and A. Rabinovich, "Going deeper with convolutions," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), (2015), 1-9.
  21. Xiao, J., K. A. Ehinger, J. Hays, A. Torralba and A. Oliva, "Sun database: Exploring a large collection of scene categories," International Journal of Computer Vision, (2014), 1-20.
  22. Zeiler, M. D. and R. Fergus, "Visualizing and understanding convolutional networks," in Computer Vision--ECCV 2014, 2014.