High-Performance Korean Morphological Analyzer Using the MapReduce Framework on the GPU

  • Cho, Shi-Won (Division of Electronics and Electrical Engineering, Dongguk University)
  • Lee, Dong-Wook (Division of Electronics and Electrical Engineering, Dongguk University)
  • Received : 2010.09.11
  • Accepted : 2011.02.26
  • Published : 2011.07.01


Data analyses often involve voluminous data, so efficient parallel or concurrent algorithms and frameworks are essential to meet their scalability and performance requirements. We present a high-performance Korean morphological analyzer that employs the MapReduce framework on the graphics processing unit (GPU). MapReduce is a programming framework introduced by Google to aid the development of web search applications on a large number of central processing units (CPUs). GPUs are designed as special-purpose co-processors, and their programming interfaces are typically formulated for graphics applications. Compared to CPUs, GPUs have greater computation power and memory bandwidth; however, they are more difficult to program because of their architecture. The performance of the Korean morphological analyzer using the MapReduce framework on the GPU is evaluated in comparison with a CPU-based model. The proposed Korean morphological analyzer shows promising, scalable performance in distributed computing with the GPU.
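
As a rough illustration of the MapReduce programming model mentioned in the abstract, the following is a minimal CPU-only sketch in Python. It is not the authors' GPU implementation: the function names (`map_phase`, `shuffle`, `reduce_phase`, `map_reduce`) and the toy particle list `PARTICLES` are hypothetical, and the suffix stripping is a naive stand-in for real morphological analysis, which relies on dictionaries and rules. The sketch only shows how independent map tasks emit key-value pairs that a reduce step then aggregates per key.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical toy list of Korean particles, for illustration only.
# A real analyzer would use a dictionary and morphological rules.
PARTICLES = ("은", "는", "이", "가", "을", "를")


def map_phase(line):
    """Map: emit one (stem, 1) pair per whitespace-separated word."""
    pairs = []
    for word in line.split():
        stem = word
        for particle in PARTICLES:
            # Naively strip one trailing particle, keeping at least one syllable.
            if len(word) > 1 and word.endswith(particle):
                stem = word[: -len(particle)]
                break
        pairs.append((stem, 1))
    return pairs


def shuffle(pairs):
    """Group the values emitted by the map phase by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: sum the counts collected for one stem."""
    return key, sum(values)


def map_reduce(lines):
    """Run map, shuffle, and reduce sequentially over the input lines."""
    mapped = chain.from_iterable(map_phase(line) for line in lines)
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


if __name__ == "__main__":
    print(map_reduce(["학교가 크다", "학교를 간다"]))
```

Because each call to `map_phase` depends only on its own input line, the map step could in principle be distributed across many workers or GPU threads; this sketch runs it sequentially for clarity.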

