• Title/Summary/Keyword: Parallel Computing

Search Result 807, Processing Time 0.027 seconds

VDI Performance Optimization with Hybrid Parallel Processing in Thick Client System under Heterogeneous Multi-Core Environment (Heterogeneous 멀티 코어 환경의 Thick Client에서 VDI 성능 최적화를 위한 혼합 병렬 처리 기법 연구)

  • Kim, Myeong-Seob;Huh, Eui-Nam
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.38B no.3
    • /
    • pp.163-171
    • /
    • 2013
  • Recently, the requirement of processing High Definition (HD) video or 3D application on low, mobile devices has been expanded and content data has been increased as well. It is becoming a major issue in Cloud computing where a Virtual Desktop Infrastructure (VDI) Service needs efficient data processing ability to provide Quality of Experience (QoE) in Cloud computing. In this paper, we propose three kind of Thick-Thin VDI Service which can share and delegate VDI service based on Thick Client using CPU and GPU. Furthermore, we propose and discuss the VDI Service Optimization Method in mixed CPU and GPU Heterogeneous Environment using CPU Parallel Processing OpenMP and GPU Parallel Processing CUDA.

Parallel LDPC Decoder for CMMB on CPU and GPU Using OpenCL (OpenCL을 활용한 CPU와 GPU 에서의 CMMB LDPC 복호기 병렬화)

  • Park, Joo-Yul;Hong, Jung-Hyun;Chung, Ki-Seok
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.11 no.6
    • /
    • pp.325-334
    • /
    • 2016
  • Recently, Open Computing Language (OpenCL) has been proposed to provide a framework that supports heterogeneous computing platforms. By using an OpenCL framework, digital communication systems can support various protocols in a unified computing environment to achieve both high portability and high performance. This article introduces a parallel software decoder of Low Density Parity Check (LDPC) codes for China Multimedia Mobile Broadcasting (CMMB) on a heterogeneous platform. Each step of LDPC decoding has different parallelization characteristics. In this paper, steps suitable for task-level parallelization are executed on the CPU, and steps suitable for data-level parallelization are processed by the GPU. To improve the performance of the proposed OpenCL kernels for LDPC decoding operations, explicit thread scheduling, loop-unrolling, and effective data transfer techniques are applied. The proposed LDPC decoder achieves high performance by using heterogeneous multi-core processors on a unified computing framework.

An Application of MapReduce Technique over Peer-to-Peer Network (P2P 네트워크상에서 MapReduce 기법 활용)

  • Ren, Jian-Ji;Lee, Jae-Kee
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.15 no.8
    • /
    • pp.586-590
    • /
    • 2009
  • The objective of this paper describes the design of MapReduce over Peer-to-Peer network for dynamic environments applications. MapReduce is a software framework used for Cloud Computing which processing large data sets in a highly-parallel way. Based on the Peer-to-Peer network character which node failures will happen anytime, we focus on using a DHT routing protocol which named Pastry to handle the problem of node failures. Our results are very promising and indicate that the framework could have a wide application in P2P network systems while maintaining good computational efficiency and scalability. We believe that, P2P networks and parallel computing emerge as very hot research and development topics in industry and academia for many years to come.

Artificial Intelligence Computing Platform Design for Underwater Localization (수중 위치측정을 위한 인공지능 컴퓨팅 플랫폼 설계)

  • Moon, Ji-Youn;Lee, Young-Pil
    • The Journal of the Korea institute of electronic communication sciences
    • /
    • v.17 no.1
    • /
    • pp.119-124
    • /
    • 2022
  • Successful underwater localization requires a large-scale, parallel computing environment that can be mounted on various underwater robots. Accordingly, we propose a design method for an artificial intelligence computing platform for underwater localization. The proposed platform consists of a total of four hardware modules. Transponder and hydrophone modules transmit and receive sound waves, and the FPGA module rapidly pre-processes the transmitted and received sound wave signals in parallel. Jetson module processes artificial intelligence based algorithms. We performed a sound wave transmission/reception experiment for underwater localization according to distance in an actual underwater environment. As a result, the designed platform was verified.

Preliminary Performance Testing of Geo-spatial Image Parallel Processing in the Mobile Cloud Computing Service (모바일 클라우드 컴퓨팅 서비스를 위한 위성영상 병렬 정보처리 성능 예비실험)

  • Kang, Sang-Goo;Lee, Ki-Won;Kim, Yong-Seung
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.4
    • /
    • pp.467-475
    • /
    • 2012
  • Cloud computing services are known that they have many advantages from the point of view in economic saving, scalability, security, sharing and accessibility. So their applications are extending from simple office systems to the expert system for scientific computing. However, research or computing technology development in the geo-spatial fields including remote sensing applications are the beginning stage. In this work, the previously implemented smartphone app for image processing was first migrated to mobile cloud computing linked to Amazon web services. As well, parallel programming was applied for improving operation performance. Industrial needs and technology development cases in terms of mobile cloud computing services are being increased. Thus, a performance testing on a satellite image processing module was carried out as the main purpose of this study. Types of implementation or services for mobile cloud varies. As the result of this testing study in a given condition, the performance of cloud computing server was higher than that of the single server without cloud service. This work is a preliminary case study for the further linkage approach for mobile cloud and satellite image processing.

Construction and Performance Evaluation of Windows- based Parallel Computing Environment (윈도우즈 기반의 병렬컴퓨팅 환경 구축 및 성능평가)

  • Shin J.-R.;Kim M.-H.;Choi J.-Y.
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2001.10a
    • /
    • pp.58-62
    • /
    • 2001
  • Aparallel computing environment was constructed based on Windows 2000 operating system. This cluster was configured using Fast-Ethernet system to hold up together the clients within a network domain. For the parallel computation, MPI implements for Windows such as MPICH.NT.1.2.2 and MP-MPICHNT.1.2 were used with Compaq Visual Fortran compiler which produce a well optimized executives for x86 systems. The evaluation of this cluster performance was carried out using a preconditioned Navier-Stokes code for the 2D analysis of a compressible and viscous flow around a compressor blade. The parallel performance was examined in comparison with those of Linux clusters studied previously by changing a number of processors, problem size and MPI libraries. The result from the test problems presents that parallel performance of the low cost Fast-Ethernet Windows cluster is superior to that of a Linux cluster of similar configuration and is comparable to that of a Myrinet cluster.

  • PDF

A two-level parallel algorithm for material nonlinearity problems

  • Lee, Jeeho;Kim, Min Seok
    • Structural Engineering and Mechanics
    • /
    • v.38 no.4
    • /
    • pp.405-416
    • /
    • 2011
  • An efficient two-level domain decomposition parallel algorithm is suggested to solve large-DOF structural problems with nonlinear material models generating unsymmetric tangent matrices, such as a group of plastic-damage material models. The parallel version of the stabilized bi-conjugate gradient method is developed to solve unsymmetric coarse problems iteratively. In the present approach the coarse DOF system is solved parallelly on each processor rather than the whole system equation to minimize the data communication between processors, which is appropriate to maintain the computing performance on a non-supercomputer level cluster system. The performance test results show that the suggested algorithm provides scalability on computing performance and an efficient approach to solve large-DOF nonlinear structural problems on a cluster system.

Analysis of Stator-Rotor Interactions by using Parallel Computer (정익-동익 상호작용의 병렬처리해석)

  • Lee J. J.;Choi J. M.;Lee D. H.
    • 한국전산유체공학회:학술대회논문집
    • /
    • 2004.10a
    • /
    • pp.111-114
    • /
    • 2004
  • CFD code that simulates stator-rotor interactions is developed applying parallel computing method. Modified Multi-Block Grid System which enhances perpendicularity in grid and is appropriate in parallel processing is introduced and Patched Algorithm is applied in sliding interface which is caused by movement of rotor. The experimental model in the turbo-machine is composed of 11 stators and 14 rotors. Analyses on two test cases which are one stator - one rotor model and three stators - four rotors model are performed. The results of the two cases have been compared with the experimental test data.

  • PDF

Integer-Pel Motion Estimation for HEVC on Compute Unified Device Architecture (CUDA)

  • Lee, Dongkyu;Sim, Donggyu;Oh, Seoung-Jun
    • IEIE Transactions on Smart Processing and Computing
    • /
    • v.3 no.6
    • /
    • pp.397-403
    • /
    • 2014
  • A new video compression standard called High Efficiency Video Coding (HEVC) has recently been released onto the market. HEVC provides higher coding performance compared to previous standards, but at the cost of a significant increase in encoding complexity, particularly in motion estimation (ME). At the same time, the computing capabilities of Graphics Processing Units (GPUs) have become more powerful. This paper proposes a parallel integer-pel ME (IME) algorithm for HEVC on GPU using the Compute Unified Device Architecture (CUDA). In the proposed IME, concurrent parallel reduction (CPR) is introduced. CPR performs several parallel reduction (PR) operations concurrently to solve two problems in conventional PR; low thread utilization and high thread synchronization latency. The proposed encoder reduces the portion of IME in the encoder to almost zero with a 2.3% increase in bitrate. In terms of IME, the proposed IME is up to 172.6 times faster than the IME in the HEVC reference model.

PCG Algorithms for Development of PC level Parallel Structural Analysis Method (PC level 병렬 구조해석법 개발을 위한 PCG 알고리즘)

  • 박효선;박성무;권윤한
    • Proceedings of the Computational Structural Engineering Institute Conference
    • /
    • 1998.10a
    • /
    • pp.362-369
    • /
    • 1998
  • The computational environment in which engineers perform their designs has been rapidly evolved from coarse serial machines to massively parallel machines. Although the recent development of high-performance computers are available for a number of years, only limited successful applications of the new computational environments in computational structural engineering field has been reported due to its limited availability and large cost associated with high-performance computing. As a new computational model for high-performance engineering computing without cost and availability problems, parallel structural analysis models for large scale structures on a network of personal computers (PCs) are presented in this paper. In structural analysis solving routine for the linear system of equations is the most time consuming part. Thus, the focus is on the development of efficient preconditioned conjugate gradient (PCG) solvers on the proposed computational model. Two parallel PCG solvers, PPCG-I and PPCG-II, are developed and applied to analysis of large scale space truss structures.

  • PDF