Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data

  • Received : 2020.10.30
  • Accepted : 2020.12.07
  • Published : 2020.12.31

Abstract

Purpose: The purpose of this study is to design a framework for generating a one-class classification algorithm based on hyper-rectangles (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, considering both model and data parallelism. We then designed supporting components that facilitate distributed execution. Finally, we validated both the effectiveness and the efficiency of the classifier generated by the proposed framework through a numerical experiment using datasets from the UCI machine learning repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. The framework includes components implementing model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, while elapsed time was reduced on datasets of considerable size owing to distributed processing. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation preserves classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of this work and suggest directions for future research.
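The abstract describes generating a hyper-rectangle-based one-class classifier with data parallelism: partitions of the target-class data are processed on separate nodes, each producing hyper-rectangles, and the merged rectangle set forms the classifier. The following is a minimal sketch of that idea, not the authors' implementation; the interval-based grouping, the parameter names, and the use of a thread pool as a stand-in for physically separated computing nodes are all illustrative assumptions.

```python
# Illustrative sketch of an H-RTGL-style one-class classifier with data
# parallelism. NOT the paper's implementation: the grouping heuristic and
# the thread pool (standing in for distributed nodes) are assumptions.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def build_rectangles(partition, k=3):
    """Split one data partition into k groups and bound each group with a
    hyper-rectangle given by per-feature (min, max) intervals."""
    # Simple interval-based grouping: sort by the first feature, split evenly.
    order = np.argsort(partition[:, 0])
    groups = np.array_split(partition[order], k)
    return [(g.min(axis=0), g.max(axis=0)) for g in groups if len(g) > 0]

def inside(x, rect):
    """True if point x lies within the rectangle's per-feature intervals."""
    lo, hi = rect
    return bool(np.all(x >= lo) and np.all(x <= hi))

def fit_distributed(data, n_workers=2, k=3):
    """Data parallelism: partition the target-class data across workers,
    build rectangles locally on each, then merge the rectangle lists."""
    partitions = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = ex.map(build_rectangles, partitions, [k] * n_workers)
    return [rect for rects in results for rect in rects]

def predict(x, rectangles):
    """One-class decision: accept x as the target class if any hyper-
    rectangle contains it, otherwise reject it as an outlier."""
    return any(inside(x, r) for r in rectangles)
```

In a genuine deployment, each call to `build_rectangles` would run on a separate computing node and only the compact rectangle boundaries would travel over the network, which is what makes the merged model cheap to assemble.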

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1A2B4009841).
