REFERENCE LINKING PLATFORM OF KOREA S&T JOURNALS
KIPS Transactions on Software and Data Engineering
Journal Basic Information
Korea Information Processing Society
Design of MAHA Supercomputing System for Human Genome Analysis
Kim, Young Woo ; Kim, Hong-Yeon ; Bae, Seungjo ; Kim, Hag-Young ; Woo, Young-Choon ; Park, Soo-Jun ; Choi, Wan ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 81~90
DOI : 10.3745/KTSDE.2013.2.2.081
Over the past decade, many new technologies have been attempted, and continue to be developed, in the computing area. The brick walls of computing, especially the power wall, have changed the computing paradigm, from computing hardware (including processor and system architecture) to programming environments and application usage. The high performance computing (HPC) area in particular has experienced dramatic changes, and it is now considered a key to national competitiveness. In the late 2000s, many leading countries rushed to develop Exascale supercomputing systems, and as a result systems of tens of PetaFLOPS are now prevalent. In Korea, ICT is well developed and Korea is considered one of the leading countries in the world, but not in the supercomputing area. In this paper, we describe the architecture design of the MAHA supercomputing system, which aims at a 300 TeraFLOPS system for bioinformatics applications such as human genome analysis and protein-protein docking. The MAHA supercomputing system consists of four major parts: computing hardware, file system, system software, and bio-applications. It is designed to utilize heterogeneous computing accelerators (co-processors such as GPGPUs and MICs) to obtain better performance/$, performance/area, and performance/power. To provide high-speed data movement and large capacity, the MAHA file system is designed with an asymmetric cluster architecture and consists of a metadata server, a data server, and a client file system on top of SSD and MAID storage servers. The MAHA system software is designed for user-friendliness and ease of use, based on integrated system management components such as Bio Workflow management, Integrated Cluster management, and Heterogeneous Resource management. The MAHA supercomputing system was first installed in December 2011. Its theoretical performance was 50 TeraFLOPS, and its measured performance was 30.3 TeraFLOPS with 32 computing nodes. The MAHA system will be upgraded to 100 TeraFLOPS in January 2013.
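As a quick check of the figures quoted in this abstract, the ratio of measured to theoretical performance can be computed directly. The sketch below is illustrative only: the function name is hypothetical, and only the 50 and 30.3 TeraFLOPS values come from the abstract.

```python
def hpc_efficiency(measured_tflops: float, peak_tflops: float) -> float:
    """Return measured performance as a fraction of theoretical peak."""
    if peak_tflops <= 0:
        raise ValueError("peak performance must be positive")
    return measured_tflops / peak_tflops


# Figures from the abstract: 30.3 TeraFLOPS measured, 50 TeraFLOPS theoretical.
efficiency = hpc_efficiency(30.3, 50.0)
print(f"{efficiency:.1%}")  # prints 60.6%
```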
MAHA-FS : A Distributed File System for High Performance Metadata Processing and Random IO
Kim, Young Chang ; Kim, Dong Oh ; Kim, Hong Yeon ; Kim, Young Kyun ; Choi, Wan ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 91~96
DOI : 10.3745/KTSDE.2013.2.2.091
The application field of supercomputing systems is shifting toward workloads, such as bio-applications, that require both large-volume data processing and high-performance computing at the same time. These applications require a high-performance distributed file system for storage management and for efficient, high-speed processing of the large amounts of data they generate. In this paper, we introduce MAHA-FS, a distributed file system for supercomputing systems that combine large-scale data processing with high-performance computing; it provides excellent metadata operation performance and IO performance. Performance analysis shows that MAHA-FS provides excellent performance in terms of metadata processing and random IO processing.
Workflow-based Bio Data Analysis System for HPC
Ahn, Shinyoung ; Kim, ByoungSeob ; Choi, Hyun-Hwa ; Jeon, Seunghyub ; Bae, Seungjo ; Choi, Wan ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 97~106
DOI : 10.3745/KTSDE.2013.2.2.097
Since the Human Genome Project was completed, the cost of human genome analysis has decreased very rapidly. This has resulted in a sharp increase in the amount of human genome data to be analyzed. As the need for fast analysis of very large bio data such as human genomes increases, non-IT researchers such as biologists should be able to execute many kinds of bio applications, which have a variety of characteristics, quickly and effectively in an HPC environment. To accomplish this, a biologist needs to be able to easily define a sequence of bio applications as a workflow, because bio applications generally must be combined and executed in a certain order. This bio workflow should be executed in a distributed and parallel manner by allocating computing resources efficiently on an HPC cluster system. In this way, we can expect better performance and faster response times for very large bio data analysis. This paper proposes a workflow-based data analysis system specialized for bio applications. Using this system, non-IT scientists and researchers can easily analyze very large bio data in an HPC environment.
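The idea of a workflow as an ordered combination of applications can be sketched with a dependency graph. The example below is a minimal illustration, not the system proposed in the paper: the step names are hypothetical, and it relies on the standard-library `graphlib` module (Python 3.9 or later) to group steps into batches that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each step maps to the set of steps it depends on.
workflow = {
    "align_reads": set(),
    "sort_alignments": {"align_reads"},
    "call_variants": {"sort_alignments"},
    "quality_report": {"align_reads"},
    "summarize": {"call_variants", "quality_report"},
}


def execution_batches(deps):
    """Yield batches of steps whose dependencies have all finished.

    Steps in the same batch are independent of each other, so a
    scheduler could dispatch them to different cluster nodes in parallel.
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        yield ready
        sorter.done(*ready)  # mark the batch finished to unlock successors


for batch in execution_batches(workflow):
    print(batch)
```

In this sketch, `sort_alignments` and `quality_report` land in the same batch because both depend only on `align_reads`.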
Evaluation of Alignment Methods for Genomic Analysis in HPC Environment
Lim, Myungeun ; Jung, Ho-Youl ; Kim, Minho ; Choi, Jae-Hun ; Park, Soojun ; Choi, Wan ; Lee, Kyu-Chul ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 107~112
DOI : 10.3745/KTSDE.2013.2.2.107
With the progress of NGS technologies, the volume of genome data has recently exploded. To analyze such data effectively, the assistance of HPC techniques is necessary. In this paper, we organized a genome analysis pipeline to call SNPs from NGS data. To organize the pipeline efficiently in an HPC environment, we analyzed the CPU utilization pattern of each pipeline step. We found that sequence alignment is compute-intensive and suitable for parallelization. We also analyzed the performance of parallel open-source alignment tools and found that an alignment method utilizing many-core processors can improve the performance of the genome analysis pipeline.
Direct Pass-Through based GPU Virtualization for Biologic Applications
Choi, Dong Hoon ; Jo, Heeseung ; Lee, Myungho ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 113~118
DOI : 10.3745/KTSDE.2013.2.2.113
Current GPU virtualization techniques incur large overheads when executing application programs, mainly due to the fine-grained time-sharing scheduling of the GPU among multiple Virtual Machines (VMs). In addition, current techniques lack portability, because they include the APIs for GPU computations in the VM monitor. In this paper, we propose a low-overhead, high-performance GPU virtualization approach on a heterogeneous HPC system based on the open-source Xen. Our proposed techniques are tailored to bio applications. In our virtualization framework, a VM exclusively occupies a GPU once it is assigned one, instead of relying on time-sharing of the GPU. This improves the performance of the applications and the utilization of the GPUs. Our techniques also allow direct pass-through to the GPU by using the IOMMU virtualization features embedded in the hardware, for high portability. Experimental studies using microbiology genome analysis applications show that our proposed direct pass-through techniques significantly reduce the overheads compared with previous Domain0-based approaches. Furthermore, the applications' performance under our approach closely matches, or even exceeds, their performance on the bare machine.
Approximate Periods of Strings based on Distance Sum for DNA Sequence Analysis
Jeong, Ju Hui ; Kim, Young Ho ; Na, Joong Chae ; Sim, Jeong Seop ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 119~122
DOI : 10.3745/KTSDE.2013.2.2.119
Repetitive strings such as periods have been studied vigorously in fields as diverse as data compression, computer-assisted music analysis, and bioinformatics. In bioinformatics, periods are closely related to repetitive patterns in DNA sequences called tandem repeats. In some cases, quite similar but not identical patterns are repeated, and thus we need approximate string matching algorithms to study tandem repeats in DNA sequences. In this paper, we propose a new definition of approximate periods of strings based on distance sum. Given two strings, of lengths m and n, we propose an algorithm that computes the minimum approximate period distance based on distance sum. Our algorithm runs in [...] time for the weighted edit distance, in O(mn) time for the edit distance, and in O(n) time for the Hamming distance.
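The linear-time Hamming case can be illustrated with a short sketch. It assumes (this is not stated in the abstract) that the text is cut into consecutive blocks of the same length as the candidate period, that a shorter final block is compared against a prefix of the candidate, and that the distance sum is the total number of mismatches over all blocks. The function name and parameter names are hypothetical.

```python
def hamming_period_distance_sum(p: str, x: str) -> int:
    """Sum of Hamming distances between p and consecutive blocks of x.

    Assumption: x is split into blocks of length len(p); a shorter last
    block is compared with the matching prefix of p. Because Hamming
    distance only compares equal-length strings, this is the same as
    counting positions where x differs from p repeated end to end,
    which takes O(len(x)) time.
    """
    if not p:
        raise ValueError("candidate period p must be non-empty")
    m = len(p)
    return sum(1 for i, ch in enumerate(x) if ch != p[i % m])


# "ACG" repeated gives ACGACGACGA; the text differs only at index 5 (T vs G).
print(hamming_period_distance_sum("ACG", "ACGACTACGA"))  # prints 1
```

The edit-distance variants are harder because block boundaries are no longer fixed, which is consistent with the higher bounds quoted for them.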
Genome Analysis Pipeline I/O Workload Analysis
Lim, Kyeongyeol ; Kim, Dongoh ; Kim, Hongyeon ; Park, Geehan ; Choi, Minseok ; Won, Youjip ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 123~130
DOI : 10.3745/KTSDE.2013.2.2.123
As the size of genomic data increases rapidly, the need for high-performance computing systems to process and store genomic data is also increasing. In this paper, we captured the I/O trace of a system that analyzed 500 million sequence reads in a genome analysis pipeline for 86 hours. The workload created 630 files totaling 1031.7 GByte and deleted 535 files totaling 91.4 GByte. Interestingly, 80% of all accesses in this workload go to only two of the 654 files in the system. Read and write requests in the workload were larger than 512 KByte and 1 MByte, respectively. The majority of read operations show random patterns, while the majority of write operations show sequential patterns. The throughput and bandwidth observed differed from one processing phase to another.
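One common way to label accesses in an I/O trace as sequential or random is to check whether each request starts where the previous request to the same file ended. The sketch below illustrates that heuristic only; the record layout and names are hypothetical and do not reflect the trace format or the classification rule used in the paper.

```python
from typing import Iterable, NamedTuple


class IORequest(NamedTuple):
    path: str    # file the request targets
    offset: int  # byte offset where the request starts
    size: int    # request length in bytes


def sequential_fraction(trace: Iterable[IORequest]) -> float:
    """Fraction of requests that continue the previous request to the same file.

    The first request to each file has no predecessor and is counted
    as non-sequential.
    """
    next_offset = {}  # path -> offset just past the last request seen
    sequential = total = 0
    for req in trace:
        if next_offset.get(req.path) == req.offset:
            sequential += 1
        next_offset[req.path] = req.offset + req.size
        total += 1
    return sequential / total if total else 0.0


# Hypothetical trace: two contiguous runs separated by a jump.
trace = [
    IORequest("reads.fastq", 0, 512),
    IORequest("reads.fastq", 512, 512),
    IORequest("reads.fastq", 4096, 512),
    IORequest("reads.fastq", 4608, 512),
]
print(sequential_fraction(trace))  # prints 0.5
```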
An Optimization Tool for Determining Processor Affinity of Networking Processes
Cho, Joong-Yeon ; Jin, Hyun-Wook ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 131~136
DOI : 10.3745/KTSDE.2013.2.2.131
Multi-core processors can improve the parallelism of application processes and thus enhance system throughput. Researchers have recently shown that processor affinity is an important factor in determining network I/O performance because of the architectural characteristics of multi-core processors; thus, many researchers are trying to devise schemes for deciding an optimal processor affinity. Existing schemes that decide the processor affinity dynamically can adapt transparently to system changes, such as application modifications and hardware upgrades, but they have limited access to characteristics of application behavior and to run-time information that can be collected heuristically. Thus, they can provide only sub-optimal processor affinity. In this paper, we define meaningful system variables for determining optimal processor affinity and suggest a tool to gather such information. We show that the implemented tool can overcome the limitations of existing schemes and can improve network bandwidth.
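For readers unfamiliar with processor affinity, the sketch below shows how a process can be pinned to a single core. It is not the tool described in the paper: it only demonstrates the underlying mechanism through Python's `os.sched_setaffinity` and `os.sched_getaffinity`, which are available on Linux only, and the helper name is hypothetical.

```python
import os


def pin_current_process(cpu: int) -> set:
    """Pin the calling process to one CPU and return the previous affinity mask.

    Relies on os.sched_setaffinity, which exists on Linux only.
    """
    previous = os.sched_getaffinity(0)  # pid 0 means "the calling process"
    if cpu not in previous:
        raise ValueError(f"CPU {cpu} is not in the allowed set {sorted(previous)}")
    os.sched_setaffinity(0, {cpu})
    return previous


if hasattr(os, "sched_setaffinity"):
    allowed = os.sched_getaffinity(0)
    old_mask = pin_current_process(min(allowed))
    print(sorted(os.sched_getaffinity(0)))  # a single CPU id
    os.sched_setaffinity(0, old_mask)       # restore the original mask
```

Which core is the right choice for a networking process depends on factors such as where the network interrupts are handled, which is the kind of information the abstract says the tool gathers.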
A Sequential Pattern Mining based on Dynamic Weight in Data Stream
Choi, Pilsun ; Kim, Hwan ; Kim, Daein ; Hwang, Buhyun ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 137~144
DOI : 10.3745/KTSDE.2013.2.2.137
Sequential pattern mining finds frequent patterns in a time-ordered data set. Within this field, dynamic weighted sequential pattern mining applies to computing environments that change over time, and it can be used in a variety of environments by reflecting changes in the dynamic weight. In this paper, we propose a new sequence data mining method that explores stream data by applying a dynamic weight. The method uses the dynamic weight, assigned according to relative time order, to reduce the number of candidate patterns that must be examined, and it uses a hash structure to find frequent sequence patterns quickly as data arrive and leave. The method uses less memory and processing time than existing methods. We show the importance of dynamic weighted mining through a comparison of sequential pattern mining techniques that use different weighting schemes.
Implementation of Mobile Virtual Colored Overlay for People with Scotopic Sensitivity Syndrome
Jang, Young Gun ;
KIPS Transactions on Software and Data Engineering, volume 2, issue 2, 2013, Pages 145~150
DOI : 10.3745/KTSDE.2013.2.2.145
Colored film overlays have been used as assistive devices for people with dyslexia. Recently, several virtual colored overlays that can be used on computers have been developed, but no mobile virtual overlay has been developed yet. In this paper, I implemented a mobile overlay application, based on the Android operating system, that displays a colored overlay on the screen at all times while the user can freely interact with other apps in the normal manner, by using the root window and a service. A method is presented for determining the source color of a virtual overlay by estimating the alpha value of the alpha blending algorithm from measurements of the chromaticity and transmissivity of film overlays, and I implemented all the colors offered by Intuitive Overlays. Test results obtained with the CS-200A chroma meter show that all colors of the developed virtual overlay are almost identical to those of the Intuitive Overlays.
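The alpha blending step mentioned in this abstract follows the standard per-channel formula result = alpha * overlay + (1 - alpha) * background. The sketch below shows that formula only; the color and alpha values are hypothetical, and it does not reproduce the paper's procedure for estimating alpha from chromaticity and transmissivity measurements.

```python
def alpha_blend(overlay_rgb, background_rgb, alpha):
    """Blend an overlay color onto a background color, channel by channel.

    result = alpha * overlay + (1 - alpha) * background
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be in [0, 1]")
    return tuple(
        round(alpha * o + (1.0 - alpha) * b)
        for o, b in zip(overlay_rgb, background_rgb)
    )


# Hypothetical values: a yellow tint at 25% opacity over a white page.
print(alpha_blend((255, 255, 0), (255, 255, 255), 0.25))  # prints (255, 255, 191)
```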