Design of Distributed Processing Framework Based on H-RTGL One-class Classifier for Big Data

  • Received : 2020.10.30
  • Accepted : 2020.12.07
  • Published : 2020.12.31

Abstract

Purpose: The purpose of this study is to design a framework for generating a one-class classification algorithm based on hyper-rectangles (H-RTGL) in a distributed environment connected by a network. Methods: First, we devised an H-RTGL-based one-class classifier that can be executed by distributed computing nodes, considering both model and data parallelism. We then designed supporting components that facilitate distributed execution. Finally, we validated both the effectiveness and the efficiency of the classifier generated by the proposed framework through a numerical experiment using datasets from the UCI machine learning repository. Results: We designed a distributed processing framework capable of H-RTGL-based one-class classification in a distributed environment consisting of physically separated computing nodes. The framework includes components implementing model and data parallelism, which enable distributed generation of the classifier. In the numerical experiment, a statistical test showed no significant change in classification performance, while elapsed time was reduced on datasets of considerable size owing to distributed processing. Conclusion: Based on these results, we conclude that applying distributed processing to classifier generation preserves classification performance while improving the efficiency of the classification algorithm. We also discuss the limitations of this work and suggest directions for future research.
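The abstract describes generating a hyper-rectangle-based one-class classifier with data parallelism: partitions of the target-class data are processed on separate nodes, each producing hyper-rectangles, and the merged rectangle set forms the classifier. The following is a minimal sketch of that idea, not the authors' implementation; the interval-based grouping, the parameter names, and the use of a thread pool as a stand-in for physically separated computing nodes are all illustrative assumptions.

```python
# Illustrative sketch of an H-RTGL-style one-class classifier with data
# parallelism. NOT the paper's implementation: the grouping heuristic and
# the thread pool (standing in for distributed nodes) are assumptions.
from concurrent.futures import ThreadPoolExecutor
import numpy as np

def build_rectangles(partition, k=3):
    """Split one data partition into k groups and bound each group with a
    hyper-rectangle given by per-feature (min, max) intervals."""
    # Simple interval-based grouping: sort by the first feature, split evenly.
    order = np.argsort(partition[:, 0])
    groups = np.array_split(partition[order], k)
    return [(g.min(axis=0), g.max(axis=0)) for g in groups if len(g) > 0]

def inside(x, rect):
    """True if point x lies within the rectangle's per-feature intervals."""
    lo, hi = rect
    return bool(np.all(x >= lo) and np.all(x <= hi))

def fit_distributed(data, n_workers=2, k=3):
    """Data parallelism: partition the target-class data across workers,
    build rectangles locally on each, then merge the rectangle lists."""
    partitions = np.array_split(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as ex:
        results = ex.map(build_rectangles, partitions, [k] * n_workers)
    return [rect for rects in results for rect in rects]

def predict(x, rectangles):
    """One-class decision: accept x as the target class if any hyper-
    rectangle contains it, otherwise reject it as an outlier."""
    return any(inside(x, r) for r in rectangles)
```

In a genuine deployment, each call to `build_rectangles` would run on a separate computing node and only the compact rectangle boundaries would travel over the network, which is what makes the merged model cheap to assemble.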

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2017R1A2B4009841).
