An Efficient Algorithm for Big Data Prediction of Pipelining, Concurrency (PCP) and Parallelism based on TSK Fuzzy Model

An Efficient Big Data PCP Prediction Algorithm Using the TSK Fuzzy Model

Kim, Jang-Young

  • Received : 2015.08.14
  • Accepted : 2015.09.24
  • Published : 2015.10.31


As the information age accelerates, the time has come to handle exabytes of data, and big data transfer technology is essential for processing data at that scale. This paper proposes an algorithm that predicts the optimal combination of Pipelining, Concurrency, and Parallelism (PCP), the major functions of GridFTP, so that big data can be transferred under optimal conditions. In addition, the author introduces a simple design process for the Takagi-Sugeno-Kang (TSK) fuzzy model and designs a model that predicts transfer throughput with the optimal combination of Pipelining, Concurrency, and Parallelism. The author then evaluates the model of the proposed algorithm and the TSK model to demonstrate the superiority of the approach.
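To make the idea of predicting throughput from a PCP combination concrete, the following is a minimal sketch of first-order TSK inference, not the author's actual model. It assumes Gaussian membership functions and a two-rule base; the rule parameters, the candidate levels, and the names `RULES`, `tsk_predict`, and `best_pcp` are all hypothetical and chosen only for illustration.

```python
import itertools
import math

# Hypothetical rule base (all numbers are made up for illustration).
# Each rule has Gaussian membership centers/widths for the inputs
# (pipelining, parallelism, concurrency) and a linear consequent
# y = b0 + b1*pipelining + b2*parallelism + b3*concurrency.
RULES = [
    {"centers": (1.0, 1.0, 1.0), "sigmas": (2.0, 2.0, 2.0),
     "coeffs": (100.0, 5.0, 20.0, 30.0)},
    {"centers": (8.0, 8.0, 8.0), "sigmas": (3.0, 3.0, 3.0),
     "coeffs": (900.0, 1.0, -10.0, -15.0)},
]


def gaussian(x, center, sigma):
    """Gaussian membership degree of x, in (0, 1]."""
    return math.exp(-0.5 * ((x - center) / sigma) ** 2)


def tsk_predict(x, rules=RULES):
    """First-order TSK inference.

    Returns the firing-strength-weighted mean of the linear rule outputs
    for the input x = (pipelining, parallelism, concurrency).
    """
    weighted_sum = 0.0
    total_weight = 0.0
    for rule in rules:
        # Firing strength: product of the per-input membership degrees.
        w = 1.0
        for xi, c, s in zip(x, rule["centers"], rule["sigmas"]):
            w *= gaussian(xi, c, s)
        # Linear consequent of this rule evaluated at x.
        b0, *b = rule["coeffs"]
        y = b0 + sum(bi * xi for bi, xi in zip(b, x))
        weighted_sum += w * y
        total_weight += w
    return weighted_sum / total_weight


def best_pcp(levels=(1, 2, 4, 8), rules=RULES):
    """Exhaustively search candidate levels for the highest predicted throughput."""
    return max(itertools.product(levels, repeat=3),
               key=lambda x: tsk_predict(x, rules))


if __name__ == "__main__":
    combo = best_pcp()
    print("predicted-best (pipelining, parallelism, concurrency):", combo)
    print("predicted throughput:", tsk_predict(combo))
```

In a real setting the rule parameters would be identified from measured transfer samples rather than written by hand, and the exhaustive search could be replaced by whatever selection procedure the prediction algorithm uses.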


Pipelining; Concurrency; Parallelism; Big data; TSK fuzzy model

