• Title/Summary/Keyword: Parallel Computing

Search Result 807, Processing Time 0.026 seconds

Direct Numerical Simulation of Active Fiber Composite (능동 섬유 복합재의 직접적 수치 모사)

  • 백승훈;김승조
    • Proceedings of the Korean Society For Composite Materials Conference
    • /
    • 2003.04a
    • /
    • pp.5-9
    • /
    • 2003
  • Stress and deflection of Active Fiber Composite(AFC) embedded and/or attached composite structures are numerically investigated at the constituent level by the Direct Numerical Simulation(DNS). The DNS approach which models and simulates the fiber and matrix directly using 3D finite elements need to be solved by efficient way. To handle this large scale problem, parallel program for solving piezoelectric behavior was developed and run on the parallel computing environment. Also, the stress result from DNS approach is compared with that from uniform field model.

  • PDF

Deep Web and MapReduce

  • Tao, Yufei
    • Journal of Computing Science and Engineering
    • /
    • v.7 no.3
    • /
    • pp.147-158
    • /
    • 2013
  • This invited paper introduces results on Web science and technology obtained during work with the Korea Advanced Institute of Science and Technology. In the first part, we discuss algorithms for exploring the deep Web, which refers to the collection of Web pages that cannot be reached by conventional Web crawlers. In the second part, we discuss sorting algorithms on the MapReduce system, which has become a dominant paradigm for massive parallel computing.

A Comparative Performance Study for Compute Node Sharing

  • Park, Jeho;Lam, Shui F.
    • Journal of Computing Science and Engineering
    • /
    • v.6 no.4
    • /
    • pp.287-293
    • /
    • 2012
  • We introduce a methodology for the study of the application-level performance of time-sharing parallel jobs on a set of compute nodes in high performance clusters and report our findings. We assume that parallel jobs arriving at a cluster need to share a set of nodes with the jobs of other users, in that they must compete for processor time in a time-sharing manner and other limited resources such as memory and I/O in a space-sharing manner. Under the assumption, we developed a methodology to simulate job arrivals to a set of compute nodes, and gather and process performance data to calculate the percentage slowdown of parallel jobs. Our goal through this study is to identify a better combination of jobs that minimize performance degradations due to resource sharing and contention. Through our experiments, we found a couple of interesting behaviors for overlapped parallel jobs, which may be used to suggest alternative job allocation schemes aiming to reduce slowdowns that will inevitably result due to resource sharing on a high performance computing cluster. We suggest three job allocation strategies based on our empirical results and propose further studies of the results using a supercomputing facility at the San Diego Supercomputing Center.

An Efficient Solution Method to MDO Problems in Sequential and Parallel Computing Environments (순차 및 병렬처리 환경에서 효율적인 다분야통합최적설계 문제해결 방법)

  • Lee, Se-Jung
    • Korean Journal of Computational Design and Engineering
    • /
    • v.16 no.3
    • /
    • pp.236-245
    • /
    • 2011
  • Many researchers have recently studied multi-level formulation strategies to solve the MDO problems and they basically distributed the coupling compatibilities across all disciplines, while single-level formulations concentrate all the controls at the system-level. In addition, approximation techniques became remedies for computationally expensive analyses and simulations. This paper studies comparisons of the MDO methods with respect to computing performance considering both conventional sequential and modem distributed/parallel processing environments. The comparisons show Individual Disciplinary Feasible (IDF) formulation is the most efficient for sequential processing and IDF with approximation (IDFa) is the most efficient for parallel processing. Results incorporating to popular design examples show this finding. The author suggests design engineers should firstly choose IDF formulation to solve MDO problems because of its simplicity of implementation and not-bad performance. A single drawback of IDF is requiring more memory for local design variables and coupling variables. Adding cheap memories can save engineers valuable time and effort for complicated multi-level formulations and let them free out of no solution headache of Multi-Disciplinary Analysis (MDA) of the Multi-Disciplinary Feasible (MDF) formulation.

COMPUTATIONAL EFFICIENCY OF A MODIFIED SCATTERING KERNEL FOR FULL-COUPLED PHOTON-ELECTRON TRANSPORT PARALLEL COMPUTING WITH UNSTRUCTURED TETRAHEDRAL MESHES

  • Kim, Jong Woon;Hong, Ser Gi;Lee, Young-Ouk
    • Nuclear Engineering and Technology
    • /
    • v.46 no.2
    • /
    • pp.263-272
    • /
    • 2014
  • Scattering source calculations using conventional spherical harmonic expansion may require lots of computation time to treat full-coupled three-dimensional photon-electron transport in a highly anisotropic scattering medium where their scattering cross sections should be expanded with very high order (e.g., $P_7$ or higher) Legendre expansions. In this paper, we introduce a modified scattering kernel approach to avoid the unnecessarily repeated calculations involved with the scattering source calculation, and used it with parallel computing to effectively reduce the computation time. Its computational efficiency was tested for three-dimensional full-coupled photon-electron transport problems using our computer program which solves the multi-group discrete ordinates transport equation by using the discontinuous finite element method with unstructured tetrahedral meshes for complicated geometrical problems. The numerical tests show that we can improve speed up to 17~42 times for the elapsed time per iteration using the modified scattering kernel, not only in the single CPU calculation but also in the parallel computing with several CPUs.

MPI-GWAS: a supercomputing-aided permutation approach for genome-wide association studies

  • Paik, Hyojung;Cho, Yongseong;Cho, Seong Beom;Kwon, Oh-Kyoung
    • Genomics & Informatics
    • /
    • v.20 no.1
    • /
    • pp.14.1-14.4
    • /
    • 2022
  • Permutation testing is a robust and popular approach for significance testing in genomic research that has the advantage of reducing inflated type 1 error rates; however, its computational cost is notorious in genome-wide association studies (GWAS). Here, we developed a supercomputing-aided approach to accelerate the permutation testing for GWAS, based on the message-passing interface (MPI) on parallel computing architecture. Our application, called MPI-GWAS, conducts MPI-based permutation testing using a parallel computing approach with our supercomputing system, Nurion (8,305 compute nodes, and 563,740 central processing units [CPUs]). For 107 permutations of one locus in MPI-GWAS, it was calculated in 600 s using 2,720 CPU cores. For 107 permutations of ~30,000-50,000 loci in over 7,000 subjects, the total elapsed time was ~4 days in the Nurion supercomputer. Thus, MPI-GWAS enables us to feasibly compute the permutation-based GWAS within a reason-able time by harnessing the power of parallel computing resources.

A Parallel Computation of Finite Element Analysis on a Transputer System (트랜스퓨터를 이용한 유안영속해석의 병렬계산)

  • Kim, Keun-Hwan;Choi, Kyung;Jung, Hyun-Kyo;Lee, Ki-Sik;Hahn, Song-Yop
    • The Transactions of the Korean Institute of Electrical Engineers
    • /
    • v.41 no.7
    • /
    • pp.735-741
    • /
    • 1992
  • This paper presents a parallel algorithm for the finite element analysis using relatively inexpensive transputer parallel system. The substructure method, which is highly parallel in nature, is used to improve the parallel computing efficiency by splitting up the whole structure into substructures. The proposed algorithm is applied to a simple two-dimensional magnetostatic problem. It is found that the more the number of transputer is increased, the more the total computation time is reduced. And the computational efficiency becomes better as the number of internal boundary nodes becomes smaller.

  • PDF

Analysis of Programming Techniques for Creating Optimized CUDA Software (최적화된 CUDA 소프트웨어 제작을 위한 프로그래밍 기법 분석)

  • Kim, Sung-Soo;Kim, Dong-Heon;Woo, Sang-Kyu;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters
    • /
    • v.16 no.7
    • /
    • pp.775-787
    • /
    • 2010
  • Unlike general-purpose CPUs, the GPUs have been specialized as many-core streaming processors, and are frequently replacing the CPUs in an increasing range of computations thanks to their outstanding parallel computing capacity. In order to respond to such trend, NVIDIA has recently issued a new parallel computing architecture called CUDA(Compute Unified Device Architecture), offering a flexible GPU programming environment for GPGPU(General Purpose GPU) computing. In general, when programmers use the CUDA API, they should clearly understand many aspects of GPU's computing architecture to produce efficient parallel software. In this article, we explain several optimization techniques for CUDA programming that we have verified through a lot of experiment and trial and error, and review how those techniques affect the performance of code execution. In particular, we use a specific problem as an example to analyze several elements that affect performances, such as effective accesses to hierarchical memory system, processor occupancy, and latency hiding. In conclusion, we present several directions that may be utilized effectively in CUDA-based parallel programming.

Design of Web-based Parallel Computing Environment Using Aglet (Aglet을 이용한 웹 기반 병렬컴퓨팅 환경설계)

  • 김윤호
    • Journal of the Korea Computer Industry Society
    • /
    • v.3 no.2
    • /
    • pp.209-216
    • /
    • 2002
  • World Wide Web has potential possibility of infrastructure for parallel computing environment connecting massive computing resources, not just platform to provide and share information via browser. The approach of Web-based parallel computing has many advantages of the ease of accessibility, scalability, cost-effectiveness, and utilization of existing networks. Applet has the possibility of decomposing the independent/parallel task, moving over network, and executing in computers connected in Web, but it lacks in the flexibility due to strict security semantic model. Therefore, in this paper, Web-based parallel computing environment using mobile agent, Aglet (Agile applet) was designed and possible implementation technologies and architecture were analyzed. And simple simulation and analysis was done compared with applet-based approach.

  • PDF

Efficient Parallel TLD on CPU-GPU Platform for Real-Time Tracking

  • Chen, Zhaoyun;Huang, Dafei;Luo, Lei;Wen, Mei;Zhang, Chunyuan
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.14 no.1
    • /
    • pp.201-220
    • /
    • 2020
  • Trackers, especially long-term (LT) trackers, now have a more complex structure and more intensive computation for nowadays' endless pursuit of high accuracy and robustness. However, computing efficiency of LT trackers cannot meet the real-time requirement in various real application scenarios. Considering heterogeneous CPU-GPU platforms have been more popular than ever, it is a challenge to exploit the computing capacity of heterogeneous platform to improve the efficiency of LT trackers for real-time requirement. This paper focuses on TLD, which is the first LT tracking framework, and proposes an efficient parallel implementation based on OpenCL. In this paper, we firstly make an analysis of the TLD tracker and then optimize the computing intensive kernels, including Fern Feature Extraction, Fern Classification, NCC Calculation, Overlaps Calculation, Positive and Negative Samples Extraction. Experimental results demonstrate that our efficient parallel TLD tracker outperforms the original TLD, achieving the 3.92 speedup on CPU and GPU. Moreover, the parallel TLD tracker can run 52.9 frames per second and meet the real-time requirement.