Search | Korea Science

Performance Optimization of Parallel Algorithms

Hudik, Martin;Hodon, Michal
- Journal of Communications and Networks / v.16 no.4 / pp.436-446 / 2014
Intensive research and modeling in mathematics, physics, biology, and chemistry require new computing resources. Because such tasks are computationally complex, computing time is long and costly. The most effective way to improve efficiency is to adopt parallel principles. The purpose of this paper is to present the issue of parallel computing, with emphasis on the analysis of parallel systems and the impact of communication delays on their efficiency and on overall execution time. The paper focuses on finite algorithms for solving systems of linear equations, namely matrix manipulation (the Gauss elimination method, GEM). Algorithms are designed for architectures with shared memory (open multiprocessing, OpenMP), distributed memory (message passing interface, MPI), and their combination (MPI + OpenMP). The properties of the algorithms were determined analytically and verified experimentally. Conclusions are drawn for theory and practice.
https://doi.org/10.1109/JCN.2014.000074
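
For readers unfamiliar with the method named in the abstract above, the following is a minimal serial sketch of Gauss elimination in Python. It is not the authors' code: the function name `gauss_solve`, the use of partial pivoting, and the singularity tolerance are assumptions made for illustration. The comment in the elimination loop marks the row updates that are independent of each other for a fixed pivot, which is the kind of work a shared-memory or message-passing version would typically divide among workers.

```python
from typing import List


def gauss_solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting (illustrative)."""
    n = len(a)
    # Work on an augmented copy [a | b] so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]

    for k in range(n):
        # Partial pivoting (an assumption of this sketch): choose the row
        # with the largest absolute entry in column k.
        p = max(range(k, n), key=lambda i: abs(m[i][k]))
        if abs(m[p][k]) < 1e-12:
            raise ValueError("matrix is singular or nearly singular")
        m[k], m[p] = m[p], m[k]

        # Eliminate column k below the pivot. For a fixed k, each row i is
        # updated independently of the others, so this loop is the natural
        # candidate for parallel execution.
        for i in range(k + 1, n):
            factor = m[i][k] / m[k][k]
            for j in range(k, n + 1):
                m[i][j] -= factor * m[k][j]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x
```

Calling `gauss_solve([[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]], [8.0, -11.0, -3.0])` returns values close to `[2.0, 3.0, -1.0]`, up to floating-point rounding.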

Fast Circuit Simulation Based on Parallel-Distributed LIM using Cloud Computing System

Inoue, Yuta;Sekine, Tadatoshi;Hasegawa, Takahiro;Asai, Hideki
- JSTS:Journal of Semiconductor Technology and Science / v.10 no.1 / pp.49-54 / 2010
This paper describes a fast circuit simulation technique using the latency insertion method (LIM) with a parallel and distributed leapfrog algorithm. Numerical simulation results on a PC cluster that uses a cloud computing system are shown. The results confirm that the method is useful and practical.
https://doi.org/10.5573/JSTS.2010.10.1.049

AN ASYNCHRONOUS PARALLEL SOLVER FOR SOME MATRIX PROBLEMS

Park, Pil-Seong
- Journal of applied mathematics & informatics / v.7 no.3 / pp.1045-1058 / 2000
In conventional synchronous parallel computing, workload balance is crucial for reducing the idle time of processors that finish their jobs earlier than others. However, balance is difficult to achieve on heterogeneous workstation clusters, where the available computing power of each processor is unpredictable. To overcome this problem, asynchronous methods have emerged and are being increasingly used and studied, but none yet exists for eigenvalue problems. In this paper, we suggest a new asynchronous method for solving some singular matrix problems, which can also be used to find a certain eigenvector of some matrices.

Adaptive and optimized agent placement scheme for parallel agent-based simulation

Jin, Ki-Sung;Lee, Sang-Min;Kim, Young-Chul
- ETRI Journal / v.44 no.2 / pp.313-326 / 2022
This study presents a novel scheme for distributed and parallel simulations with optimized agent placement for simulation instances. Traditional parallel simulation is limited in that it does not provide sufficient performance even when using multiple resources. The main reason for this discrepancy is that supporting parallelism inevitably incurs costs in addition to the base simulation cost. We present a comprehensive study of parallel simulation architectures, execution flows, and characteristics. Then, we identify critical challenges for optimizing large simulations for parallel instances. Based on our cost-benefit analysis, we propose a novel approach to overcome the performance constraints of agent-based parallel simulations. We also propose a solution for eliminating the synchronization cost among local instances. Our method ensures balanced performance through optimal deployment of agents to local instances and an adaptive agent placement scheme that follows the simulation load. Additionally, our empirical evaluation reveals that the proposed model achieves better performance than conventional methods under several conditions.
https://doi.org/10.4218/etrij.2020-0399

An Efficient Multidimensional Index Structure for Parallel Environments

Bok Koung-Soo;Song Seok-Il;Yoo Jae-Soo
- International Journal of Contents / v.1 no.1 / pp.50-58 / 2005
Generally, multidimensional data such as image and spatial data require a large amount of storage space. A single workstation can store and manage only a limited amount of such data. Managing the data in a parallel computing environment, which is being actively researched, can greatly improve performance. In this paper, we propose a parallel multidimensional index structure that exploits the parallelism of the parallel computing environment. The proposed index structure is an nP(processor)-nxmD(disk) architecture, a hybrid of the nP-nD and 1P-nD types. Its node structure increases fan-out and reduces the height of the index. In addition, a range search algorithm that maximizes I/O parallelism is devised and applied to k-nearest neighbor queries. Various experiments show that the proposed method outperforms other parallel index structures.

A Parallel Computation of Finite Element Analysis on a Transputer System

Kim, Keun-Hwan;Choi, Kyung;Jung, Hyun-Kyo;Lee, Ki-Sik;Hahn, Song-Yop
- The Transactions of the Korean Institute of Electrical Engineers / v.41 no.7 / pp.735-741 / 1992
This paper presents a parallel algorithm for finite element analysis using a relatively inexpensive transputer parallel system. The substructure method, which is highly parallel in nature, is used to improve parallel computing efficiency by splitting the whole structure into substructures. The proposed algorithm is applied to a simple two-dimensional magnetostatic problem. The total computation time decreases as the number of transputers increases, and the computational efficiency improves as the number of internal boundary nodes decreases.

Parallel String Matching and Optimization Using OpenCL on FPGA

Yoon, Jin Myung;Choi, Kang-Il;Kim, Hyun Jin
- The Transactions of The Korean Institute of Electrical Engineers / v.66 no.1 / pp.100-106 / 2017
In this paper, we propose a parallel optimization method for the Aho-Corasick (AC) algorithm and the Parallel Failureless Aho-Corasick (PFAC) algorithm using Open Computing Language (OpenCL) on a Field Programmable Gate Array (FPGA). The low throughput of a string matching engine degrades the performance of network processing. Recently, many researchers have studied string matching engines that use parallel computing, and FPGA vendors offer parallel computing platforms based on OpenCL. We apply the AC and PFAC algorithms on a DE1-SoC board with a Cyclone V FPGA, with optimizations that take the FPGA architecture into account. Experiments with the PFAC algorithm consider global id, local id, local memory, and loop unrolling optimizations. The performance improvement with loop unrolling is 129 times greater than that of the AC algorithm without loop unrolling. The performance improvements with loop unrolling are 1.1, 0.2, and 1.5 times greater than those with the global id, local id, and local memory optimizations, respectively.
https://doi.org/10.5370/KIEE.2017.66.1.100
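
The failureless idea behind PFAC, as named in the abstract above, can be illustrated with a short Python sketch. It is an assumption-laden illustration, not the paper's OpenCL kernel: the names `build_trie` and `pfac_match` are hypothetical, and the outer loop over start positions stands in for the independent threads or work-items a parallel implementation would launch. Each start position walks a trie that has only forward transitions and stops at the first missing transition, so no failure links are needed.

```python
from typing import Dict, List, Tuple


def build_trie(patterns: List[str]) -> Tuple[List[Dict[str, int]], Dict[int, int]]:
    """Build a trie with forward (goto) transitions only, no failure links."""
    goto: List[Dict[str, int]] = [{}]  # goto[state][char] -> next state
    final: Dict[int, int] = {}         # state -> index of the pattern ending there
    for idx, pattern in enumerate(patterns):
        state = 0
        for ch in pattern:
            if ch not in goto[state]:
                goto.append({})
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        final[state] = idx
    return goto, final


def pfac_match(text: str, patterns: List[str]) -> List[Tuple[int, str]]:
    """Return (start position, pattern) pairs for every match in text."""
    goto, final = build_trie(patterns)
    hits: List[Tuple[int, str]] = []
    # Each start position is independent of the others; in a parallel
    # version, one thread or work-item would handle one start position.
    for start in range(len(text)):
        state = 0
        for pos in range(start, len(text)):
            nxt = goto[state].get(text[pos])
            if nxt is None:
                break  # no failure transition: this walk simply ends
            state = nxt
            if state in final:
                hits.append((start, patterns[final[state]]))
    return hits
```

For example, `pfac_match("ushers", ["he", "she", "his", "hers"])` returns `[(1, "she"), (2, "he"), (2, "hers")]`. The trade-off is that each walk does redundant work compared with a single AC pass, in exchange for having no dependency between start positions.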

COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

Kim, Jong Woon;Hong, Ser Gi;Lee, Young-Ouk
- Nuclear Engineering and Technology / v.46 no.2 / pp.263-272 / 2014
Scattering source calculations using conventional spherical harmonic expansion may require considerable computation time to treat full-coupled three-dimensional photon-electron transport in a highly anisotropic scattering medium, where the scattering cross sections must be expanded with very high-order (e.g., $P_7$ or higher) Legendre expansions. In this paper, we introduce a modified scattering kernel approach that avoids the unnecessarily repeated calculations involved in the scattering source calculation, and we use it with parallel computing to reduce the computation time. Its computational efficiency was tested on three-dimensional full-coupled photon-electron transport problems using our computer program, which solves the multi-group discrete ordinates transport equation using the discontinuous finite element method with unstructured tetrahedral meshes for problems with complicated geometry. The numerical tests show that the modified scattering kernel speeds up the elapsed time per iteration by a factor of 17 to 42, both in single-CPU calculations and in parallel computing with several CPUs.
https://doi.org/10.5516/NET.01.2013.033

Parallel processing in structural reliability

Pellissetti, M.F.
- Structural Engineering and Mechanics / v.32 no.1 / pp.95-126 / 2009
The present contribution addresses the parallelization of advanced simulation methods for structural reliability analysis, which have recently been developed for large-scale structures with a high number of uncertain parameters. In particular, the Line Sampling method and the Subset Simulation method are considered. The proposed parallel algorithms exploit the parallelism that arises from the possibility of performing independent FE analyses simultaneously. For the Line Sampling method, a parallelization scheme is proposed both for the actual sampling process and for the statistical gradient estimation method used to identify the so-called important direction of the Line Sampling scheme. Two parallelization strategies are investigated for the Subset Simulation method. The first consists of the embarrassingly parallel advancement of distinct Markov chains; in this case the speedup is bounded by the number of chains advanced simultaneously. The second parallel Subset Simulation algorithm uses the concept of speculative computing. Speedup measurements for the FE model of a multistory building (24,000 DOFs) show that the wall-clock time is reduced to a practical level (<10 minutes for Line Sampling and $\approx$ 1 hour for Subset Simulation). The measurements, conducted on clusters of multi-core nodes, also indicate a strong sensitivity of the parallel performance to the load level of the nodes, in terms of the number of simultaneously used cores. This performance degradation is related to memory bottlenecks in the modal analysis required by each FE analysis.
https://doi.org/10.12989/sem.2009.32.1.095

Parallel Computation Algorithm of Gauss Elimination in Power System Analysis

서의석;오태규
- The Transactions of the Korean Institute of Electrical Engineers / v.43 no.2 / pp.189-196 / 1994
This paper describes a parallel computing algorithm for Gauss elimination of the Jacobian matrix of a large-scale power system. The structure of the Jacobian matrix depends on the method used to order the buses. In sequential computation, buses are ordered to minimize the number of fill-ins in the triangulation of the Jacobian matrix. The proposed method develops parallelism in the Gauss elimination by using nested dissection (ND) ordering. In this procedure, the level structure of the power system network is transformed to be long and narrow by using end buses, which balances the computing load among processes and maximizes parallel computation. Each processor uses the sequential computation method to preserve the sparsity of the matrix.

Search Result 282, Processing Time 0.028 seconds
