• Title/Summary/Keyword: 병렬컴퓨팅

Search Result 97, Processing Time 0.277 seconds

Research of accelerating method of video quality measurement program using GPGPU (GPGPU를 이용한 영상 품질 측정 프로그램의 가속화 연구)

  • Lee, Seonguk;Byeon, Gibeom;Kim, Kisu;Hong, Jiman
    • Smart Media Journal
    • /
    • v.5 no.4
    • /
    • pp.69-74
    • /
    • 2016
  • Recently, parallel computing using GPGPU(General-Purpose computing on Graphics Processing Units) according to the development of the graphics processing unit is expanding. This can be achieved through the processing speeds faster than traditional computing environments across many fields, including science, medicine, engineering, and analysis. However, in using the GPU technology to implement the a parallel program there are many constraints. In this paper, we port a CPU-based program(Video Quality Measurement Program) to use technology. The program ported to GPU-based show about 1.83 times the execution speed than CPU-based program. We study on the acceleration of the GPU-based program. Also we discuss the technical constraints and problems that occur when you modify the CPU to the GPU-based programs.

Study on LLVM application in Parallel Computing System (병렬 컴퓨팅 시스템에서 LLVM 응용 연구)

  • Cho, Jungseok;Cho, Doosan;Kim, Yongyeon
    • The Journal of the Convergence on Culture Technology
    • /
    • v.5 no.1
    • /
    • pp.395-399
    • /
    • 2019
  • In order to support various parallel computing systems, it is necessary to extend LLVM IR to more efficiently support vector / matrix and to design LLVM IR to machine code as a new algorithm. As shown in the IR example, RISC instruction generation is naturally generated because the RISC instruction is basically composed of the RISC instruction, and the vector instruction is also not supported. There is a need for new IR structures, command generation algorithms and related extensions to support vector / matrix more robustly. To do this, it is important to map each instruction in the LLVM IR to the appropriate instruction in the target architecture (vector / matrix) (instruction selection algorithm). It is necessary to understand the meaning of LLVM IR command, to compare the meaning of each instruction of the target architecture with syntax, and to select the instruction that matches the pattern to make mapping efficient.

Design and Implementation of Initial OpenSHMEM Based on PCI Express (PCI Express 기반 OpenSHMEM 초기 설계 및 구현)

  • Joo, Young-Woong;Choi, Min
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.6 no.3
    • /
    • pp.105-112
    • /
    • 2017
  • PCI Express is a bus technology that connects the processor and the peripheral I/O devices that widely used as an industry standard because it has the characteristics of high-speed, low power. In addition, PCI Express is system interconnect technology such as Ethernet and Infiniband used in high-performance computing and computer cluster. PGAS(partitioned global address space) programming model is often used to implement the one-sided RDMA(remote direct memory access) from multi-host systems, such as computer clusters. In this paper, we design and implement a OpenSHMEM API based on PCI Express maintaining the existing features of OpenSHMEM to implement RDMA based on PCI Express. We perform experiment with implemented OpenSHMEM API through a matrix multiplication example from system which PCs connected with NTB(non-transparent bridge) technology of PCI Express. The PCI Express interconnection network is currently very expensive and is not yet widely available to the general public. Nevertheless, we actually implemented and evaluated a PCI Express based interconnection network on the RDK evaluation board. In addition, we have implemented the OpenSHMEM software stack, which is of great interest recently.

A Study on Performance Improvement of Distributed Computing Framework using GPU (GPU를 활용한 분산 컴퓨팅 프레임워크 성능 개선 연구)

  • Song, Ju-young;Kong, Yong-joon;Shim, Tak-kil;Shin, Eui-seob;Seong, Kee-kin
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • /
    • pp.499-502
    • /
    • 2012
  • 빅 데이터 분석의 시대가 도래하면서 대용량 데이터의 특성과 계산 집약적 연산의 특성을 동시에 가지는 문제 해결에 대한 요구가 늘어나고 있다. 대용량 데이터 처리의 경우 각종 분산 파일 시스템과 분산/병렬 컴퓨팅 기술들이 이미 많이 사용되고 있으며, 계산 집약적 연산 처리의 경우에도 GPGPU 활용 기술의 발달로 보편화되는 추세에 있다. 하지만 대용량 데이터와 계산 집약적 연산 이 두 가지 특성을 모두 가지는 문제를 처리하기 위해서는 많은 제약 사항들을 해결해야 하는데, 본 논문에서는 이에 대한 대안으로 분산 컴퓨팅 프레임워크인 Hadoop MapReduce와 Nvidia의 GPU 병렬 컴퓨팅 아키텍처인 CUDA 흘 연동하는 방안을 제시하고, 이를 밀집행렬(dense matrix) 연산에 적용했을 때 얻을 수 있는 성능 개선 효과에 대해 소개하고자 한다.

  • PDF

Performance Improvement of BLAST using Grid Computing and Implementation of Genome Sequence Analysis System (그리드 컴퓨팅을 이용한 BLAST 성능개선 및 유전체 서열분석 시스템 구현)

  • Kim, Dong-Wook;Choi, Han-Suk
    • The Journal of the Korea Contents Association
    • /
    • v.10 no.7
    • /
    • pp.81-87
    • /
    • 2010
  • This paper proposes a G-BLAST(BLAST using Grid Computing) system, an integrated software package for BLAST searches operated in heterogeneous distributed environment. G-BLAST employed 'database splicing' method to improve the performance of BLAST searches using exists computing resources. G-BLAST is a basic local alignment search tool of DNA Sequence using grid computing in heterogeneous distributed environment. The G-BLAST improved the existing BLAST search performance in gene sequence analysis. Also G-BLAST implemented the pipeline and data management method for users to easily manage and analyze the BLAST search results. The proposed G-BLAST system has been confirmed the speed and efficiency of BLAST search performance in heterogeneous distributed computing.

Profiler Design for Evaluating Performance of WebCL Applications (WebCL 기반 애플리케이션의 성능 평가를 위한 프로파일러 설계 및 구현)

  • Kim, Cheolwon;Cho, Hyeonjoong
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.4 no.8
    • /
    • pp.239-244
    • /
    • 2015
  • WebCL was proposed for high complex computing in Javascript. Since WebCL-based applications are distributed and executed on an unspecified number of general clients, it is important to profile their performances on different clients. Several profilers have been introduced to support various programming languages but WebCL profiler has not been developed yet. In this paper, we present a WebCL profiler to evaluate WebCL-based applications and monitor the status of GPU on which they run. This profiler helps developers know the execution time of applications, memory read/write time, GPU statues such as its power consumption, temperature, and clock speed.

Stale Synchronous Parallel Model in Edge Computing Environment (Edge Computing 환경에서의 Stale Synchronous Parallel Model 연구)

  • Kim, Dong-Hyun;Lee, Byung-Jun;Kim, Kyung-Tae;Youn, Hee-Yong
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • /
    • pp.89-92
    • /
    • 2018
  • 본 논문에서는 Edge computing 환경에서 다수의 노드들로 구성된 네트워크의 디바이스를 효율적으로 관리하기 위한 방법을 제안한다. 기존의 클라이언트-서버 모델은 모든 데이터와 그에 대한 요청을 중심 서버에서 처리하기 때문에, 다수의 노드로부터 생성된 많은 양의 데이터를 처리하는 데 빠른 응답속도를 보장하지 못한다. Edge computing은 분담을 통해 네트워크의 부담을 줄일 수 있는 IoT 네트워크에 적합한 방법으로, 데이터를 전송하고 받는 과정에서 네트워크의 대역폭을 사용하는 대신 서로 연결된 노드들이 협력해서 데이터를 처리하고, 또한 네트워크 말단에서의 데이터 처리가 허용되어 데이터 센터의 부담을 줄일 수 있다. 여러병렬 기계학습 모델 중 본 연구에서는 Stale Synchronous Parallel(SSP) 모델을 이용하여 Edge 노드에서 분산기계 학습에 적용하였다.

  • PDF

Parallel Finite Element Simulation of the Incompressible Navier-stokes Equations (병렬 유한요소 해석기법을 이용한 유동장 해석)

  • Choi H. G.;Kim B. J.;Kang S. W.;Yoo J. Y.
    • 한국전산유체공학회:학술대회논문집
    • /
    • /
    • pp.8-15
    • /
    • 2002
  • For the large scale computation of turbulent flows around an arbitrarily shaped body, a parallel LES (large eddy simulation) code has been recently developed in which domain decomposition method is adopted. METIS and MPI (message Passing interface) libraries are used for domain partitioning and data communication between processors, respectively. For unsteady computation of the incompressible Wavier-Stokes equation, 4-step splitting finite element algorithm [1] is adopted and Smagorinsky or dynamic LES model can be chosen fur the modeling of small eddies in turbulent flows. For the validation and performance-estimation of the parallel code, a three-dimensional laminar flow generated by natural convection inside a cube has been solved. Then, we have solved the turbulent flow around MIRA (Motor Industry Research Association) model at $Re = 2.6\times10^6$, which is based on the model height and inlet free stream velocity, using 32 processors on IBM SMP cluster and compared with the existing experiment.

  • PDF

Parallel Computation of a Flow Field Using FEM and Domain Decomposition Method (영역분할법과 유한요소해석을 이용한 유동장의 병렬계산)

  • Choi Hyounggwon;Kim Beomjun;Kang Sungwoo;Yoo Jung Yul
    • Proceedings of the KSME Conference
    • /
    • /
    • pp.55-58
    • /
    • 2002
  • Parallel finite element code has been recently developed for the analysis of the incompressible Wavier-Stokes equations using domain decomposition method. Metis and MPI libraries are used for the domain partitioning of an unstructured mesh and the data communication between sub-domains, respectively. For unsteady computation of the incompressible Navier-Stokes equations, 4-step splitting method is combined with P1P1 finite element formulation. Smagorinsky and dynamic model are implemented for the simulation of turbulent flows. For the validation performance-estimation of the developed parallel code, three-dimensional Laplace equation has been solved. It has been found that the speed-up of 40 has been obtained from the present parallel code fir the bench mark problem. Lastly, the turbulent flows around the MIRA model and Tiburon model have been solved using 32 processors on IBM SMP cluster and unstructured mesh. The computed drag coefficient agrees better with the existing experiment as the mesh resolution of the region increases, where the variation of pressure is severe.

  • PDF

An Application of MapReduce Technique over Peer-to-Peer Network (P2P 네트워크상에서 MapReduce 기법 활용)

  • Ren, Jian-Ji;Lee, Jae-Kee
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.8
    • /
    • pp.586-590
    • /
    • 2009
  • The objective of this paper describes the design of MapReduce over Peer-to-Peer network for dynamic environments applications. MapReduce is a software framework used for Cloud Computing which processing large data sets in a highly-parallel way. Based on the Peer-to-Peer network character which node failures will happen anytime, we focus on using a DHT routing protocol which named Pastry to handle the problem of node failures. Our results are very promising and indicate that the framework could have a wide application in P2P network systems while maintaining good computational efficiency and scalability. We believe that, P2P networks and parallel computing emerge as very hot research and development topics in industry and academia for many years to come.