• Title, Summary, Keyword: GPGPU

Search Result 169, Processing Time 0.034 seconds

Performance Comparison of Join Operations Parallelization by using GPGPU (GPGPU 기반 조인 연산 병렬화 성능 비교)

  • Lee, Jong-Sub;Lee, Sang-Back;Lee, Kyu-Chul
    • Database Research
    • /
    • v.34 no.3
    • /
    • pp.28-44
    • /
    • 2018
  • In a database system, the most expensive operation among relational operations is a join operation. Generally, CPU-based join operations uses parallel processing with either 1 core or 16 cores at most, which does not significantly improve the function. On the other hand, GPGPU(General-Purpose computing on Graphics Processing Units) allows parallel processing through thousands of processing units, greatly reducing the time required to perform join operations. Parallelization of the operation using GPGPU uses NVIDIA's CUDA SDK. In this paper, we implement parallelization of the join operation using GPGPU and compare the performances. The used join operations are Nested Loop Join (NLJ), Sort Merge Join (SMJ) and Hash Join (HJ), and GPGPU equipment uses TITAN Xp, GTX 1080 Ti and GTX 1080. We measure and compare the performance of join operations based on CPU and GPGPU. We compare this performance with the performance of the previous study on the join operation based on GPGPU. The results of experiment show that the performance based on GPGPU is 6~328 times faster than the one based on CPU.

A Study of The GPGPU Performance (범용 그래픽 처리장치 (GPGPU)의 성능에 대한 연구)

  • Lee, Jongbok
    • The Journal of The Institute of Internet, Broadcasting and Communication
    • /
    • v.18 no.6
    • /
    • pp.201-206
    • /
    • 2018
  • As the artificial intelligence and big data technology has been developed recently, the importance of GPGPU, which is a general purpose graphics processing unit, is emphasized. In addition, by the demand for mining equipment to obtain bit coins, which is a block chain application technology, the price of GPGPU has increased sharply with scarcity. If a GPGPU can be precisely simulated, it is possible to conduct experiments on various GPGPU types and analyze performance without purchasing expensive ones. In this paper, we investigate the configuration of a GPGPU simulator and measure the performance of various benchmark programs using GPGPU-Sim.

Analysis of the GPGPU Performance for Various Combinations of Workloads Executed Concurrently (동시에 실행되는 워크로드 조합에 따른 GPGPU 성능 분석)

  • Kim, Dongwhan;Eom, Hyeonsang
    • KIISE Transactions on Computing Practices
    • /
    • v.23 no.3
    • /
    • pp.165-170
    • /
    • 2017
  • Many studies have utilized GPGPU (General-Purpose Graphic Processing Unit) and its high computing power to compute complex tasks. The characteristics of GPGPU programs necessitate the operations of memory copy between the host and device. A high latency period can affect the performance of the program. Thus, it is required to significantly improve the performance of GPGPU programs by optimizations. By executing multiple GPGPU programs simultaneously, the latency hiding effect of memory copy is achieved by overlapping the memory copy and computing operations in GPGPU. This paper presents the results of analyzing the latency hiding effect for memory copy operations. Furthermore, we propose a performance anticipation model and an algorithm for the limitations of using pinned memory, and show that the use of the proposed algorithm results in a 41% performance increase.

IPC-based Dynamic SM management on GPGPU for Executing AES Algorithm

  • Son, Dong Oh;Choi, Hong Jun;Kim, Cheol Hong
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.2
    • /
    • pp.11-19
    • /
    • 2020
  • Modern GPU can execute general purpose computation on the graphic processing unit, and provide high performance by exploiting many core on GPU. To run AES algorithm efficiently, parallel computational resources are required. However, computational resource of CPU architecture are not enough to cryptographic algorithm such as AES whereas GPU architecture has mass parallel computation resources. Therefore, this paper reduce the time to execute AES by employing parallel computational resource on GPGPU. Unfortunately, AES cannot utilize computational resource on GPGPU since it isn't suitable to GPGPU architecture. In this paper, IPC based dynamic SM management technique are proposed to efficiently execute AES on GPGPU. IPC based dynamic SM management can increase and decrease the number of active SMs by using IPC in run-time. According to simulation results, proposed technique improve the performance by increasing resource utilization compared to baseline GPGPU architecture. The results show that AES improve the performance by 41.2% on average.

Method of extract eye zone using GPGPU (GPGPU를 이용한 눈 영역 검출 기법)

  • Park, Young-Jae;Kim, Gye-Young
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • /
    • pp.269-272
    • /
    • 2011
  • 본 논문에서는 GPGPU를 이용한 눈 영역 검출 기법을 제안한다. 영상 전체의 평균과 분산을 기반으로 하여 각 마스크의 평균과 분산값을 비교는 비교적 간단한 알고리즘을 이용하여 눈 영역을 검출한다. 정확도의 경우 명암값의 대비를 이용한 기존의 방법과 비슷한 수준을 보였다. 하지만 연산속도의 경우 병렬처리 구간을 늘려 GPGPU를 사용한 제안된 방법이 우수한 성능을 보였다.

  • PDF

Implementation and Performance Evaluation of the Faddev-Leverrier Algorithm using GPGPU (GPGPU를 이용한 파데브-레브리어 알고리즘 구현 및 성능 분석)

  • Park, Yong-Hun;Kim, Cheol-Hong;Kim, Jong-Myon
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.8 no.3
    • /
    • pp.171-178
    • /
    • 2013
  • In this paper, we implement the Faddev-Leverier algorithm using GPGPU (General-Purpose Graphics Processing Unit) to accelerate singular value decomposition. In addition, we compare the performance of the algorithm using CPU and CPU plus GPGPU for eleven ${\times}n$ matrix sizes in order to decompose singular values, where =4, 8, 16, 32, 64, 128, 256, 512, 1,024, 2,048, and 4,096. Experimental results indicate that CPU achieves better performance than CPU plus GPGPU for $n{\leq}64$ because of a large number of read and write operations between CPU and GPGPU. However, CPU plus GPGPU outperforms CPU exponentially in the execution time for $n{\geq}64$.

Spatial Computation on Spark Using GPGPU (GPGPU를 활용한 스파크 기반 공간 연산)

  • Son, Chanseung;Kim, Daehee;Park, Neungsoo
    • KIPS Transactions on Computer and Communication Systems
    • /
    • v.5 no.8
    • /
    • pp.181-188
    • /
    • 2016
  • Recently, as the amount of spatial information increases, an interest in the study of spatial information processing has been increased. Spatial database systems extended from the traditional relational database systems are difficult to handle large data sets because of the scalability. SpatialHadoop extended from Hadoop system has a low performance, because spatial computations in SpationHadoop require a lot of write operations of intermediate results to the disk, resulting in the performance degradation. In this paper, Spatial Computation Spark(SC-Spark) is proposed, which is an in-memory based distributed processing framework. SC-Spark is extended from Spark in order to efficiently perform the spatial operation for large-scale data. In addition, SC-Spark based on the GPGPU is developed to improve the performance of the SC-Spark. SC-Spark uses the advantage of the Spark holding intermediate results in the memory. And GPGPU-based SC-Spark can perform spatial operations in parallel using a plurality of processing elements of an GPU. To verify the proposed work, experiments on a single AMD system were performed using SC-Spark and GPGPU-based SC-Spark for Point-in-Polygon and spatial join operation. The experimental results showed that the performance of SC-Spark and GPGPU-based SC-Spark were up-to 8 times faster than SpatialHadoop.

GPGPU Acceleration of SAT Algorithm with Propagation Routine Parallelization (전달 루틴의 병렬화를 통한 SAT 알고리즘의 GPGPU 가속화)

  • Kang, Hyeong-Ju
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.20 no.10
    • /
    • pp.1919-1926
    • /
    • 2016
  • Because of the enormous processing ability, General-Purpose Graphics Processing Unit(GPGPU) has been applied to many fields including electronics design automation. The SAT algorithm is one of the core algorithm in many electronics design automation tools. There has been some efforts to apply GPGPU to the SAT algorithm, but it is difficult to parallelize the SAT algorithm because of its characteristics. In this paper, I applied GPGPU to the SAT algorithm by parallelizing the propagation routine that is relatively suitable to parallel processing. On the basis of the similarity of the propagation routine to the sparse matrix multiplication, the data structure for the SAT problem is constituted, and the parallel propagation routine is described. To prevent data loss between paralllel threads, atomic operations are exploited. The experimental results for some benchmark SAT problems show that the proposed algorithm is superior to the previous GPGPU-based SAT solver.

SimTBS: Simulator For GPGPU Thread Block Scheduling (SimTBS: GPGPU 스레드블록 스케줄링 시뮬레이터)

  • Cho, Kyung-Woon;Bahn, Hyokyung
    • The Journal of The Institute of Internet, Broadcasting and Communication
    • /
    • v.20 no.4
    • /
    • pp.87-92
    • /
    • 2020
  • Although GPGPU (General-Purpose GPU) can maximize performance by parallelizing a task with tens of thousands of threads, those threads are internally grouped into a thread block, which is a base unit for processing and resource allocation. A thread block scheduler is a specialized hardware gadget whose role is to allocate thread blocks to GPGPU processing hardware in a round-robin manner. However, round-robin is a sequential allocation policy and is not optimized for GPGPU resource utilization. In this paper, we propose a thread block scheduler model which can analyze and quantify performances for various thread block scheduling policies. Experiment results from the implemented simulator of our model show that the legacy hardware thread block scheduling does not behave well when workload becomes heavy.

Method of Multi Thread Management based on Shader Instruction for Mobile GPGPU (GPGPU를 위한 쉐이더 명령어기반 멀티 스레드 관리 기법)

  • Lee, Kwang-Yeob;Park, Tae-Ryong
    • Journal of IKEEE
    • /
    • v.16 no.4
    • /
    • pp.310-315
    • /
    • 2012
  • This thesis is intended to design multi thread mobile GPGPU optimized in mobile environment, and to verify an effective thread management method of the multi thread mobile processor. In thread management, there is no management hardware and implement with software instructions. For the verification of the multi thread management method, Lane detection algorithm was implemented to compare nVidia's CUDA Architecture and the designed GPGPU in terms of thread management efficiency. The number of thread is normalized to 48 threads. An implemented Land Detection Algorithm is composed of Gaussian filter algorithm and Sobel Edge Detection algorithm. As a result, the designed GPGPU's thread efficiency is up to 2 times higher than CUDA's thread efficiency.