Effectiveness of Normalization Pre-Processing of Big Data to the Machine Learning Performance

  • Jo, Jun-Mo (Dept. Electronic Engineering, TongMyong University)
  • Received : 2019.05.17
  • Accepted : 2019.06.15
  • Published : 2019.06.30

Abstract

The rapid growth in the volume of data has recently become a major issue in the field of big data. Because big data also serves as the input to machine learning, it should be normalized during preprocessing to obtain high machine learning performance. That performance depends strongly on factors such as the scope of the columns in the data set and the normalization preprocessing method used. In this paper, various normalization preprocessing methods and column scopes are applied to a support vector machine (SVM), used as the machine learning method, in order to identify the most effective normalization preprocessing setup. The machine learning experiments were implemented and analyzed in Python using Jupyter Notebook.
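
The effect that the abstract describes can be illustrated with a minimal sketch. It assumes min-max and z-score scaling as two representative normalization methods; this page does not list the paper's exact methods, data, or column selections, so the toy rows and all function names below are hypothetical. The sketch measures how much of the squared Euclidean distance between two samples comes from a single column, which matters because distance-based kernels such as the RBF kernel commonly used with SVMs depend on that distance.

```python
import statistics


def min_max_scale(column):
    """Linearly rescale a column to the [0, 1] range."""
    lo, hi = min(column), max(column)
    if hi == lo:  # constant column: there is no spread to rescale
        return [0.0] * len(column)
    return [(v - lo) / (hi - lo) for v in column]


def z_score_scale(column):
    """Center a column on zero and divide by its population std. dev."""
    mean = statistics.mean(column)
    std = statistics.pstdev(column)
    if std == 0:  # constant column
        return [0.0] * len(column)
    return [(v - mean) / std for v in column]


def scale_columns(rows, scaler):
    """Apply `scaler` to every column of a row-major table."""
    scaled_columns = [scaler(list(col)) for col in zip(*rows)]
    return [list(row) for row in zip(*scaled_columns)]


def column_share(a, b, index):
    """Fraction of the squared Euclidean distance between samples a and b
    that is contributed by the column at `index`."""
    parts = [(x - y) ** 2 for x, y in zip(a, b)]
    return parts[index] / sum(parts)


# Hypothetical toy data (not from the paper): column 0 is on a scale of
# thousands, column 1 lies in [0, 1].
rows = [
    [1000.0, 0.1],
    [2000.0, 0.9],
    [3000.0, 0.5],
]

minmax_rows = scale_columns(rows, min_max_scale)
zscore_rows = scale_columns(rows, z_score_scale)

# Share of the distance between the first two samples due to column 0.
raw_share = column_share(rows[0], rows[1], 0)
minmax_share = column_share(minmax_rows[0], minmax_rows[1], 0)
zscore_share = column_share(zscore_rows[0], zscore_rows[1], 0)

print(f"raw: {raw_share:.6f}  min-max: {minmax_share:.3f}  z-score: {zscore_share:.3f}")
```

Without normalization, the large-scale column accounts for almost all of the distance, so the small-scale column has little influence on a distance-based model. In scikit-learn, `MinMaxScaler` and `StandardScaler` provide the same two transformations; whether these are the ones compared in the paper is an assumption.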

Keywords

Fig. 1 Illustration of the decomposition. (a) An original layer with complexity $O(dk^2c)$. (b) An approximated layer with complexity reduced to $O(d'k^2c) + O(dd')$ [1]

Fig. 2 Result of the normalization(0, 3, 9, 12)

Fig. 3 Result of the normalization(0, 3, 9, 12)

Fig. 4 Enhancement by the normalization

Table 1. Scikit-learn utilities used in training

Table 2. Std. Dev. and accuracy result of columns

References

  1. X. Zhang, J. Zou, K. He, and J. Sun, "Accelerating very deep convolutional networks for classification and detection," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 38, 2015, pp. 1943-1955. https://doi.org/10.1109/TPAMI.2015.2502579
  2. R. Sathya and A. Abraham, "Comparison of Supervised and Unsupervised Learning Algorithms for Pattern Classification," IJARAI, vol. 2, no. 2, 2013, pp. 34-38.
  3. R. Sathya and A. Abraham, "Unsupervised Control Paradigm for Performance Evaluation," International Journal of Computer Application, vol. 44, no. 20, 2012, pp. 27-31. https://doi.org/10.5120/6380-8850
  4. X. C. Yin, X. Yin, K. Huang, and H. W. Hao, "Robust text detection in natural scene images," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 36, no. 5, 2014, pp. 970-983. https://doi.org/10.1109/TPAMI.2013.182
  5. N. Kim and Y. Bae, "Status Diagnosis of Pump and Motor Applying K-Nearest Neighbors," J. of the Korea Institute of Electronic Communication Science, vol. 13, no. 6, 2018, pp. 1249-1255. https://doi.org/10.13067/JKIECS.2018.13.6.1249
  6. J. M. Keller, M. R. Gray, and J. A. Givens, "A Fuzzy K-Nearest Neighbor Algorithm," IEEE Trans. Systems, Man, and Cybernetics, vol. 15, no. 4, 1985, pp. 581-585.
  7. S. Bang, "Implementation of Image based Fire Detection System Using Convolution Neural Network," J. of the Korea Institute of Electronic Communication Science, vol. 12, no. 2, 2017, pp. 331-336. https://doi.org/10.13067/JKIECS.2017.12.2.331
  8. Y. Kim, S. Park, and D. Kim, "Research on Robust Face Recognition against Lighting Variation using CNN," J. of the Korea Institute of Electronic Communication Science, vol. 12, no. 2, 2017, pp. 325-330. https://doi.org/10.13067/JKIECS.2017.12.2.325
  9. C. Jung, R. Jang, D. Nyang, and K. Lee, "A Study of User Behavior Recognition-Based PIN Entry Using Machine Learning Technique," Korea Information Processing Society review, computer and communication systems, vol. 7, no. 5, 2018, pp. 127-136.
  10. G. Lee, H. Ha, H. Hong, and H. Kim, "Exploratory Research on Automating the Analysis of Scientific Argumentation Using Machine Learning," J. of the Korean Association for Science Education, vol. 38, no. 2, 2018, pp. 219-234. https://doi.org/10.14697/JKASE.2018.38.2.219