Design of a Large-scale Task Dispatching & Processing System based on Hadoop
  • Journal title : Journal of KIISE
  • Volume 43, Issue 6,  2016, pp.613-620
  • Publisher : Korean Institute of Information Scientists and Engineers
  • DOI : 10.5626/JOK.2016.43.6.613
 Title & Authors
Design of a Large-scale Task Dispatching & Processing System based on Hadoop
Kim, Jik-Soo; Cao, Nguyen; Kim, Seoyoung; Hwang, Soonwook
This paper presents MOHA (Many-Task Computing on Hadoop), a framework that aims to apply Many-Task Computing (MTC) technologies, originally developed for high-performance processing of many tasks, to the existing Big Data processing platform Hadoop. We present the basic concepts and motivation of MOHA, preliminary results from a proof of concept (PoC) based on a distributed message queue, and future research directions. MTC applications may have relatively low I/O requirements per task, but they must efficiently process a very large number of tasks, potentially with heavy file-based communication between tasks. MTC applications can therefore exhibit a different pattern of data-intensive workload from existing Hadoop applications, which are typically based on relatively large data block sizes. Through an effective convergence of MTC and Big Data technologies, the MOHA framework can support large-scale scientific applications within the Hadoop ecosystem, which is evolving into a multi-application platform.
Many-Task Computing; Hadoop; Big Data platform; multi-level scheduling; MOHA
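
The queue-based dispatching idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the MOHA implementation: an in-process `queue.Queue` stands in for the distributed message queue, threads stand in for workers running on cluster nodes, and the names `run_many_tasks` and `task_fn` are hypothetical. It only shows the general pattern of many short tasks being pulled by a fixed pool of workers.

```python
import queue
import threading


def run_many_tasks(tasks, task_fn, num_workers=4):
    """Run task_fn on every item of tasks using pull-based workers.

    Assumption: a local queue.Queue replaces the distributed message
    queue, and threads replace the remote workers.
    """
    task_queue = queue.Queue()
    results = {}
    results_lock = threading.Lock()

    # Enqueue every task up front, tagged with its index so that results
    # can be returned in submission order.
    for task_id, payload in enumerate(tasks):
        task_queue.put((task_id, payload))

    def worker():
        # Each worker keeps pulling tasks until the queue is drained.
        while True:
            try:
                task_id, payload = task_queue.get_nowait()
            except queue.Empty:
                return
            value = task_fn(payload)
            with results_lock:
                results[task_id] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()

    return [results[i] for i in range(len(tasks))]


if __name__ == "__main__":
    # Many small, independent tasks: here, squaring integers.
    print(run_many_tasks(list(range(10)), lambda x: x * x))
```

Because workers pull tasks rather than having tasks pushed to them, a faster worker naturally processes more tasks, which is one reason the pull model suits workloads made of very many short tasks.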