• Title, Summary, Keyword: Parallel Computing

Numerical Study of SPGD-based Phase Control of Coherent Beam Combining under Various Turbulent Atmospheric Conditions (대기외란에 따른 SPGD 기반 결맞음 빔결합 시스템 위상제어 동작성능 분석)

  • Kim, Hansol;Na, Jeongkyun;Jeong, Yoonchan
    • Korean Journal of Optics and Photonics / v.31 no.6 / pp.247-258 / 2020
  • In this paper, we study the phase control of a coherent-beam-combining system under turbulent atmospheric conditions, based on a stochastic parallel gradient descent (SPGD) algorithm. Using the statistical theory of atmospheric turbulence, we analyze the phase and wavefront distortion of a laser beam propagating through a turbulent atmospheric medium. We also conduct numerical simulations of a coherent-beam-combining system with 7- and 19-channel laser beams distorted by atmospheric turbulence. Through these simulations, we characterize the phase-control behavior and efficiency of the coherent-beam-combining system under various degrees of atmospheric turbulence. It is verified that the SPGD algorithm is capable of realizing 7-channel coherent beam combining with a beam-combining efficiency of more than 90%, even under turbulent atmospheric conditions up to $C_n^2$ of $10^{-13}$ m$^{-2/3}$. In the case of 19-channel coherent beam combining, it is shown that the same turbulent atmospheric conditions result in a drastic reduction of the beam-combining efficiency down to 60%, due to the elevated impact of the corresponding refractive-index inhomogeneity. In addition, by putting together the number of iterations of the SPGD algorithm required for phase locking under atmospheric turbulence and the time intervals of atmospheric phenomena, which typically are of the order of μs, it is estimated that hundreds of MHz to a few GHz of computing bandwidth of SPGD-based phase control may be required for a coherent-beam-combining system to cope with such turbulent atmospheric conditions. We expect the results of this paper to be useful for quantitatively analyzing and predicting the effects of atmospheric turbulence on the SPGD-based phase-control performance of a coherent-beam-combining system.
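
The SPGD update described in the abstract above can be illustrated with a minimal sketch. It is not the authors' simulation: it assumes static piston phase errors only (no turbulence model, no wavefront distortion), a two-sided perturbation, and illustrative values for `delta`, `gain`, and the iteration count. All function and parameter names are hypothetical.

```python
import cmath
import math
import random


def combining_efficiency(phases):
    """Normalized combined intensity |sum_k exp(i*phi_k)|^2 / N^2."""
    n = len(phases)
    field = sum(cmath.exp(1j * p) for p in phases)
    return abs(field) ** 2 / n ** 2


def spgd_phase_lock(n_channels=7, iterations=2000, delta=0.1, gain=15.0, seed=0):
    """Toy SPGD loop; returns (initial efficiency, final efficiency)."""
    rng = random.Random(seed)
    # Assumption: fixed, unknown piston errors stand in for the phase
    # distortion that each channel would pick up along its path.
    errors = [rng.uniform(-math.pi, math.pi) for _ in range(n_channels)]
    control = [0.0] * n_channels

    def metric(ctrl):
        return combining_efficiency([e + c for e, c in zip(errors, ctrl)])

    initial = metric(control)
    for _ in range(iterations):
        # Perturb all channels in parallel with random +/- delta.
        d = [delta * rng.choice((-1.0, 1.0)) for _ in range(n_channels)]
        j_plus = metric([c + di for c, di in zip(control, d)])
        j_minus = metric([c - di for c, di in zip(control, d)])
        # Move every control phase along its own perturbation, scaled by
        # the measured change in the metric (stochastic gradient ascent).
        dj = j_plus - j_minus
        control = [c + gain * dj * di for c, di in zip(control, d)]
    return initial, metric(control)


if __name__ == "__main__":
    before, after = spgd_phase_lock()
    print(f"efficiency before: {before:.3f}, after: {after:.3f}")
```

Each iteration needs two metric evaluations, which is why the number of iterations to lock, set against the time scale of the disturbance, determines the control bandwidth the abstract estimates.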

The Analysis of Fire-Driven Flow and Temperature in The Railway Tunnel with Ventilation (환기를 동반한 철도터널 화재 연기유속 및 온도장 해석)

  • Jang, Yong-Jun;Lee, Chang-Hyun;Kim, Hag-Beom;Lee, Woo-Dong
    • Proceedings of the KSR Conference / pp.1794-1801 / 2008
  • Fire-driven flow and temperature distribution in a ventilated tunnel were analyzed by Large Eddy Simulation using the FDS code. The simulated tunnel is 182 m long, 5.4 m wide and 2.4 m high. A pool fire with an area of $0.89\,\mathrm{m}^2$ was located 112 m from the tunnel entrance and was taken as the heat source. The heat is assumed to be released uniformly throughout the whole simulated time. The fire strength was 2.76 MW and the fuel burnt was octane. A parallel computational method was employed to reduce the computing time and to manage the large number of grid points, which cannot be handled on a single CPU. A total of $2.4{\times}10^6$ grid points were used, and 7 CPUs were used to solve the momentum and energy equations. The simulated results agreed well with the experiments.

An Efficient Disk Sharing Technique supporting Single Disk I/O Space in Linux Cluster Systems (리눅스 클러스터 시스템에서 단일 디스크 입출력 공간을 지원하는 효율적 디스크 공유 기법)

  • 김태호;이종우;이재원;김성동;채진석
    • Journal of KIISE:Computing Practices and Letters / v.9 no.6 / pp.635-645 / 2003
  • One very important feature that clustered parallel computer systems must support is a single I/O system image, in which users can access both local and remote I/O resources transparently. In this paper, we propose an efficient disk sharing technique supporting a single disk I/O system image architecture. The design separates the I/O subsystem of a cluster into the file system and a set of virtual hard disk drivers. The virtual hard disk driver treats a hard disk in a remote node as a local hard disk. All services provided by it are performed at the device driver level without any modification of file systems. Users can, therefore, access all the disks in the cluster regardless of their locations. Our virtual hard disk driver is implemented under Linux and tested in a Linux cluster system. Our experiments show that it successfully supports a single disk I/O space and, at the same time, performs better than NFS. We expect that this paper can serve as a guideline for easily constructing a single I/O space for other devices.

A Multimedia Presentation Authoring System based on Conceptual Temporal Relations (개념적 시간관계 기반의 멀티미디어 프레젠테이션 저작 시스템)

  • 노승진;장진희;성미영
    • Journal of KIISE:Computing Practices and Letters / v.9 no.3 / pp.266-277 / 2003
  • Every conceptual temporal relationship can be described using one of seven relations (before, meets, overlaps, during, starts, finishes, and equals). The conceptual representation provides an efficient means for our multimedia authoring system to automatically fill in the necessary timing details. We developed a multimedia presentation authoring system that supports a mechanism for conceptually representing the temporal relations of different media. Among the many editors that make up our system, the temporal relation editor provides users with an intuitive mechanism for representing the conceptual flow of a presentation by simple and direct graphical manipulations. Our system is based on SMIL (Synchronized Multimedia Integration Language). The conceptual temporal relation editor and other editors of our system exchange their information in real time and automatically generate SMIL code through the SMIL Object Manager. Our system uses a TRN (Temporal Relation Network) as its internal representation of a multimedia presentation. The TRN corresponds exactly to the structure seen in the graphical representation of the presentation. A parallel relationship found in a TRN can be collapsed into a single synchronization block. This facilitates the determination of the playing time of each component and can be the basic unit for reuse of already prepared blocks of presentation code.
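
The seven relations named in the abstract above can be made concrete with a small classifier over time intervals. This is only an illustrative sketch: the interval representation as `(start, end)` pairs and the function names are assumptions, and it does not reproduce the authoring system's TRN or its SMIL generation.

```python
def _forward_relation(a, b):
    """Return the relation of interval a to interval b, or None if the
    relation only holds with the roles of a and b exchanged."""
    (a0, a1), (b0, b1) = a, b
    if (a0, a1) == (b0, b1):
        return "equals"
    if a1 < b0:
        return "before"      # a ends, then a gap, then b starts
    if a1 == b0:
        return "meets"       # b starts exactly when a ends
    if a0 == b0 and a1 < b1:
        return "starts"      # same start, a ends first
    if a1 == b1 and a0 > b0:
        return "finishes"    # same end, a starts later
    if a0 > b0 and a1 < b1:
        return "during"      # a lies strictly inside b
    if a0 < b0 < a1 < b1:
        return "overlaps"    # a starts first and they partly overlap
    return None


def temporal_relation(a, b):
    """Classify two intervals (start, end) with start < end.

    Returns (relation, swapped): `relation` is one of the seven names and
    `swapped` is True when the relation holds from b to a instead.
    """
    for start, end in (a, b):
        if start >= end:
            raise ValueError("each interval needs start < end")
    relation = _forward_relation(a, b)
    if relation is not None:
        return relation, False
    return _forward_relation(b, a), True


if __name__ == "__main__":
    video = (0, 10)
    caption = (2, 6)
    print(temporal_relation(caption, video))
```

Because each relation or its inverse covers every possible arrangement of two intervals, an editor only has to store the relation name and can derive concrete start and end times later.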

Comparison of the wall clock time for extracting remote sensing data in Hierarchical Data Format using Geospatial Data Abstraction Library by operating system and compiler (운영 체제와 컴파일러에 따른 Geospatial Data Abstraction Library의 Hierarchical Data Format 형식 원격 탐사 자료 추출 속도 비교)

  • Yoo, Byoung Hyun;Kim, Kwang Soo;Lee, Jihye
    • Korean Journal of Agricultural and Forest Meteorology / v.21 no.1 / pp.65-73 / 2019
  • The MODIS (Moderate Resolution Imaging Spectroradiometer) data in Hierarchical Data Format (HDF) have been processed using the Geospatial Data Abstraction Library (GDAL). Because of the relatively large data size, it would be preferable to build and install the data analysis tool for greater computing performance, which would differ by operating system and form of distribution, e.g., source code or binary package. The objective of this study was to examine the performance of GDAL for processing HDF files, which would guide the construction of a computer system for remote sensing data analysis. Execution times were compared between the environments under which GDAL was installed. The wall clock time was measured after extracting data for each variable in the MODIS data file using a tool built by linking against GDAL under a combination of operating systems (Ubuntu and openSUSE), compilers (GNU and Intel), and distribution forms. The MOD07 product, which contains atmosphere data, was processed for eight 2-D variables and two 3-D variables. GDAL compiled with the Intel compiler under Ubuntu had the shortest computation time. For openSUSE, GDAL compiled using the GNU and Intel compilers had greater performance for 2-D and 3-D variables, respectively. The wall clock time was considerably longer for GDAL compiled with the "--with-hdf4=no" configuration option or installed with the RPM package manager under openSUSE. These results indicate that the choice of the environment under which GDAL is installed, e.g., operating system or compiler, would have a considerable impact on the performance of a system for processing remote sensing data. Application of parallel computing approaches would improve the performance of data processing for HDF files, which merits further evaluation of these computational methods.

Development and evaluation of a 2-dimensional land surface flood analysis model using uniform square grid (정형 사각 격자 기반의 2차원 지표면 침수해석 모형 개발 및 평가)

  • Choi, Yun-Seok;Kim, Joo-Hun;Choi, Cheon-Kyu;Kim, Kyung-Tak
    • Journal of Korea Water Resources Association / v.52 no.5 / pp.361-372 / 2019
  • The purpose of this study is to develop a two-dimensional land surface flood analysis model based on a uniform square grid, using the governing equations without the convective acceleration term in the momentum equation. The finite volume method and an implicit method were applied to spatial and temporal discretization, respectively. To reduce the execution time of the model, parallel computation techniques using the CPU were applied. To verify the developed model, it was compared with the analytical solution, and its behavior was evaluated through numerical experiments in a virtual domain. In addition, inundation analyses were performed at different spatial resolutions for the Janghowon area in Korea and the Sebou river area in Morocco, and the results were compared with those of the CAESAR-LISFLOOD (CLF) model. In model verification, the simulation results matched the analytical solution well, and the flow analyses in the virtual domain were also evaluated to be reasonable. The inundation simulation results of this study and of the CLF model for the Janghowon and Sebou river areas were similar to each other, and for the Janghowon area the simulation result was also similar to the flooding area of the flood hazard map. The parts where the simulation results of this study and the CLF model differed were compared and evaluated for each case. The results suggest that the proposed model can simulate flooding in the floodplain well. However, when using the model for flood analysis, its characteristics and limitations arising from the domain composition method, governing equations, and numerical method should be fully considered.

Design of a Bit-Serial Divider in GF($2^m$) for Elliptic Curve Cryptosystem (타원곡선 암호시스템을 위한 GF($2^m$)상의 비트-시리얼 나눗셈기 설계)

  • 김창훈;홍춘표;김남식;권순학
    • The Journal of Korean Institute of Communications and Information Sciences / v.27 no.12C / pp.1288-1298 / 2002
  • To implement an elliptic curve cryptosystem in GF($2^m$) at high speed, a fast divider is required. Although a bit-parallel architecture is well suited for high-speed division operations, an elliptic curve cryptosystem requires a large $m$ (at least 163) to provide sufficient security. In other words, since the bit-parallel architecture has an area complexity of $O(m^2)$, it is not suited for this application. In this paper, we propose a new serial-in serial-out systolic array for computing division operations in GF($2^m$) using the standard basis representation. Based on a modified version of the binary extended greatest common divisor algorithm, we obtain a new data dependence graph and design an efficient bit-serial systolic divider. The proposed divider has $O(m)$ time complexity and $O(m)$ area complexity. If input data come in continuously, the proposed divider can produce division results at a rate of one per $m$ clock cycles, after an initial delay of $5m-2$ cycles. Analysis shows that the proposed divider provides a significant reduction in both chip area and computational delay time compared to previously proposed systolic dividers with the same I/O format. Since the proposed divider can perform division operations at high speed with reduced chip area, it is well suited as the division circuit of an elliptic curve cryptosystem. Furthermore, since the proposed architecture does not restrict the choice of irreducible polynomial and has a unidirectional data flow and regularity, it provides high flexibility and scalability with respect to the field size $m$.
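
As a software illustration of what division in GF($2^m$) with the standard (polynomial) basis computes, the following sketch multiplies by the inverse obtained from the classical extended Euclidean algorithm for polynomials over GF(2). It is not the systolic architecture or the modified binary extended GCD algorithm of the paper. The field GF($2^4$) with the irreducible polynomial $x^4 + x + 1$ and all function names are assumptions chosen to keep the example small.

```python
def gf_mul(a, b, poly, m):
    """Multiply a and b in GF(2^m).

    Elements are ints whose bit i is the coefficient of x^i; `poly` is the
    irreducible polynomial including its x^m term; a must be below 2^m.
    """
    result = 0
    while b:
        if b & 1:
            result ^= a          # add (XOR) the current multiple of a
        b >>= 1
        a <<= 1                  # multiply a by x
        if (a >> m) & 1:
            a ^= poly            # reduce modulo the field polynomial
    return result


def gf_inv(a, poly):
    """Invert a nonzero element with the extended Euclidean algorithm."""
    if a == 0:
        raise ZeroDivisionError("0 has no inverse in GF(2^m)")
    # Invariants: a*g1 = u (mod poly) and a*g2 = v (mod poly).
    u, v = a, poly
    g1, g2 = 1, 0
    while u != 1:
        shift = u.bit_length() - v.bit_length()
        if shift < 0:
            u, v = v, u
            g1, g2 = g2, g1
            shift = -shift
        u ^= v << shift          # cancel the leading term of u
        g1 ^= g2 << shift
    return g1


def gf_div(a, b, poly, m):
    """Compute a / b in GF(2^m) as a * b^(-1)."""
    return gf_mul(a, gf_inv(b, poly), poly, m)


if __name__ == "__main__":
    POLY, M = 0b10011, 4         # x^4 + x + 1 over GF(2), illustrative only
    print(bin(gf_div(0b0110, 0b0011, POLY, M)))
```

A hardware divider processes the same polynomial coefficients bit by bit; the sketch only shows the arithmetic that such a circuit has to reproduce.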

Benchmark Results of a Monte Carlo Treatment Planning system (몬데카를로 기반 치료계획시스템의 성능평가)

  • Cho, Byung-Chul
    • Progress in Medical Physics / v.13 no.3 / pp.149-155 / 2002
  • Recent advances in radiation transport algorithms, computer hardware performance, and parallel computing make the clinical use of Monte Carlo based dose calculations possible. To compare the speed and accuracy of dose calculations between the different codes developed, benchmark tests were proposed at the XIIth ICCR (International Conference on the use of Computers in Radiation Therapy, Heidelberg, Germany 2000). A Monte Carlo treatment planning system comprising 28 Intel Pentium CPUs of various types was implemented for routine clinical use. The purpose of this study was to evaluate the performance of our system using the above benchmark tests. The benchmark procedures comprise three parts: a) speed of photon beam dose calculation inside a given phantom of 30.5 cm $\times$ 39.5 cm $\times$ 30 cm deep, filled with 5 mm$^3$ voxels, within 2% statistical uncertainty; b) speed of electron beam dose calculation inside the same phantom as that of the photon beams; c) accuracy of photon and electron beam calculation inside a heterogeneous slab phantom compared with the reference results of EGS4/PRESTA calculation. In the speed benchmark tests, it took 5.5 minutes to achieve less than 2% statistical uncertainty for 18 MV photon beams. Though the net calculation for electron beams was an order of magnitude faster than for the photon beams, the overall calculation time was similar to that of the photon beam case because of the overhead time needed to maintain parallel processing. Since our Monte Carlo code is EGSnrc, which is an improved version of EGS4, the accuracy tests of our system showed, as expected, very good agreement with the reference data. In conclusion, our Monte Carlo treatment planning system shows clinically meaningful results. Though other, more efficient codes such as MCDOSE and VMC++ have been developed, BEAMnrc based on the EGSnrc code system may be used for routine clinical Monte Carlo treatment planning in conjunction with a clustering technique.

Hardware Approach to Fuzzy Inference―ASIC and RISC―

  • Watanabe, Hiroyuki
    • Proceedings of the Korean Institute of Intelligent Systems Conference / pp.975-976 / 1993
  • This talk presents an overview of the author's research and development activities on fuzzy inference hardware. We have pursued two distinct approaches. The first approach is to use application-specific integrated circuit (ASIC) technology: the fuzzy inference method is implemented directly in silicon. The second approach, which is in its preliminary stage, is to use a more conventional microprocessor architecture. Here, we use a quantitative technique used by designers of reduced instruction set computers (RISC) to modify the architecture of a microprocessor. In the ASIC approach, we implemented the most widely used fuzzy inference mechanism directly on silicon. The mechanism is based on a max-min compositional rule of inference and Mamdani's method of fuzzy implication. Two VLSI fuzzy inference chips were designed, fabricated, and fully tested. Both used a full-custom CMOS technology. The second and more elaborate chip was designed at the University of North Carolina (UNC) in cooperation with MCNC. Both VLSI chips had multiple datapaths for rule evaluation, and they executed multiple fuzzy if-then rules in parallel. The AT&T chip is the first digital fuzzy inference chip in the world. It ran with a 20 MHz clock and achieved approximately 80,000 fuzzy logical inferences per second (FLIPS). It stored and executed 16 fuzzy if-then rules. Since it was designed as a proof-of-concept prototype chip, it had a minimal amount of peripheral logic for system integration. The UNC/MCNC chip consists of 688,131 transistors, of which 476,160 are used for RAM. It ran with a 10 MHz clock. The chip has a 3-stage pipeline and initiates the computation of a new inference every 64 cycles. This chip achieved approximately 160,000 FLIPS. The new architecture has the following important improvements over the AT&T chip: programmable rule set memory (RAM); on-chip fuzzification by a table lookup method; on-chip defuzzification by a centroid method; a reconfigurable architecture for processing two rule formats; and RAM/datapath redundancy for higher yield.
It can store and execute 51 if-then rules of the following format: IF A and B and C and D Then Do E, and Then Do F. With this format, the chip takes four inputs and produces two outputs. By software reconfiguration, it can store and execute 102 if-then rules of the following simpler format using the same datapath: IF A and B Then Do E. With this format, the chip takes two inputs and produces one output. We have built two VME-bus board systems based on this chip for Oak Ridge National Laboratory (ORNL). The board is now installed in a robot at ORNL. Researchers use this board for experiments in autonomous robot navigation. The fuzzy logic system board places the fuzzy chip into a VMEbus environment. High-level C language functions hide the operational details of the board from the applications programmer. The programmer treats rule memories and fuzzification function memories as local structures passed as parameters to the C functions. ASIC fuzzy inference hardware is extremely fast, but it is limited in generality. Many aspects of the design are limited or fixed. We have proposed designing a fuzzy information processor as an application-specific processor using a quantitative approach. The quantitative approach was developed by RISC designers. In effect, we are interested in evaluating the effectiveness of a specialized RISC processor for fuzzy information processing. As the first step, we measured the possible speed-up of a fuzzy inference program based on if-then rules from the introduction of specialized instructions, i.e., min and max instructions. The minimum and maximum operations are heavily used in fuzzy logic applications as fuzzy intersection and union.
We performed measurements using a MIPS R3000 as a base microprocessor. The initial result is encouraging. We could achieve as much as a 2.5-fold increase in inference speed if the R3000 had min and max instructions. These instructions are also useful for speeding up other fuzzy operations such as bounded product and bounded sum. The embedded processor's main task is to control some device or process. It usually runs a single or a embedded processor to create an embedded processor for fuzzy control is very effective. Table I shows the measured speed of inference by a MIPS R3000 microprocessor, a fictitious MIPS R3000 microprocessor with min and max instructions, and the UNC/MCNC ASIC fuzzy inference chip. The software used on the microprocessors is a simulator of the ASIC chip. The first row is the computation time in seconds of 6000 inferences using 51 rules, where each fuzzy set is represented by an array of 64 elements. The second row is the time required to perform a single inference. The last row is the fuzzy logical inferences per second (FLIPS) measured for each device. There is a large gap in run time between the ASIC and software approaches even if we resort to a specialized fuzzy microprocessor. As for design time and cost, these two approaches represent two extremes. An ASIC approach is extremely expensive. It is, therefore, an important research topic to design a specialized computing architecture for fuzzy applications that falls between these two extremes both in run time and in design time/cost.

    Table I. Inference time by 51 rules

    Time              MIPS R3000 (regular)   MIPS R3000 (with min/max)   ASIC
    6000 inferences   125 s                  49 s                        0.0038 s
    1 inference       20.8 ms                8.2 ms                      6.4 μs
    FLIPS             48                     122                         156,250
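
The max-min inference with table-lookup fuzzification and centroid defuzzification described in the abstract above can be sketched in a few lines. This is a hedged software illustration, not the chip's datapath or the author's simulator: the rule layout, the tiny three-point universe, and all names are assumptions. The last function reproduces the throughput arithmetic implied by the abstract (one new inference every 64 cycles at 10 MHz).

```python
def infer(inputs, rules):
    """Max-min (Mamdani-style) inference over a discrete output universe.

    inputs: list of crisp input indices.
    rules:  list of (antecedent_tables, consequent), where each antecedent
            table maps an input index to a membership degree (table lookup)
            and `consequent` is the output fuzzy set as a list of degrees.
    """
    size = len(rules[0][1])
    aggregated = [0.0] * size
    for antecedent_tables, consequent in rules:
        # Rule strength: fuzzy AND of all antecedents, i.e. the minimum.
        strength = min(table[x] for table, x in zip(antecedent_tables, inputs))
        for i in range(size):
            clipped = min(strength, consequent[i])        # implication by min
            aggregated[i] = max(aggregated[i], clipped)   # aggregation by max
    return aggregated


def centroid(fuzzy_set):
    """Defuzzify by the centroid of the aggregated output set."""
    total = sum(fuzzy_set)
    if total == 0:
        raise ValueError("no rule fired")
    return sum(i * degree for i, degree in enumerate(fuzzy_set)) / total


def flips(clock_hz, cycles_per_inference):
    """Inferences per second when one inference starts every N cycles."""
    return clock_hz / cycles_per_inference


if __name__ == "__main__":
    low, high = [1.0, 0.5, 0.0], [0.0, 0.5, 1.0]          # illustrative sets
    rules = [([low], [1.0, 0.0, 0.0]), ([high], [0.0, 0.0, 1.0])]
    print(centroid(infer([1], rules)))
    print(flips(10_000_000, 64))
```

The inner loop consists almost entirely of `min` and `max` operations, which is consistent with the abstract's observation that dedicated min and max instructions speed up a software implementation.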

Real-time Color Recognition Based on Graphic Hardware Acceleration (그래픽 하드웨어 가속을 이용한 실시간 색상 인식)

  • Kim, Ku-Jin;Yoon, Ji-Young;Choi, Yoo-Joo
    • Journal of KIISE:Computing Practices and Letters / v.14 no.1 / pp.1-12 / 2008
  • In this paper, we present a real-time algorithm for recognizing the vehicle color from indoor and outdoor vehicle images based on GPU (Graphics Processing Unit) acceleration. In the preprocessing step, we construct feature vectors from sample vehicle images with different colors. Then, we combine the feature vectors for each color and store them as a reference texture to be used in the GPU. Given an input vehicle image, the CPU constructs its feature vector, and then the GPU compares it with the sample feature vectors in the reference texture. The similarities between the input feature vector and the sample feature vectors for each color are measured, and then the result is transferred to the CPU to recognize the vehicle color. The output colors are categorized into seven colors: three achromatic colors (black, silver, and white) and four chromatic colors (red, yellow, blue, and green). We construct feature vectors by using histograms that consist of hue-saturation pairs and hue-intensity pairs. A weight factor is applied to the saturation values. Our algorithm achieves a successful color recognition rate of 94.67% by using a large number of sample images captured in various environments, by generating feature vectors that distinguish different colors, and by utilizing an appropriate likelihood function. We also accelerate color recognition by utilizing the parallel computation functionality of the GPU. In the experiments, we constructed a reference texture from 7,168 sample images, where 1,024 images were used for each color. The average time for generating a feature vector is 0.509 ms for a $150{\times}113$ resolution image. After the feature vector is constructed, the execution time for GPU-based color recognition is 2.316 ms on average, which is 5.47 times faster than when the algorithm is executed on the CPU. Our experiments were limited to vehicle images, but our algorithm can be extended to input images of general objects.
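
The feature-vector comparison described in the abstract above can be sketched on the CPU as follows. Several parts are assumptions rather than the authors' method: only the hue-saturation histogram is built (no hue-intensity histogram and no saturation weighting), histogram intersection stands in for the unspecified likelihood function, and the bin counts and function names are illustrative. No GPU is used.

```python
import colorsys

H_BINS, S_BINS = 12, 4   # assumed bin counts, not taken from the paper


def hs_histogram(pixels):
    """Normalized hue-saturation histogram of (r, g, b) pixels in 0..255."""
    hist = [0.0] * (H_BINS * S_BINS)
    for r, g, b in pixels:
        h, s, _v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
        h_bin = min(int(h * H_BINS), H_BINS - 1)
        s_bin = min(int(s * S_BINS), S_BINS - 1)
        hist[h_bin * S_BINS + s_bin] += 1.0
    total = sum(hist) or 1.0
    return [count / total for count in hist]


def intersection(p, q):
    """Histogram intersection: 1.0 for identical normalized histograms."""
    return sum(min(a, b) for a, b in zip(p, q))


def recognize(pixels, references):
    """Return the reference color whose histogram best matches the image."""
    feature = hs_histogram(pixels)
    return max(references, key=lambda name: intersection(feature, references[name]))


if __name__ == "__main__":
    references = {
        "red": hs_histogram([(255, 0, 0)] * 10),
        "green": hs_histogram([(0, 255, 0)] * 10),
        "blue": hs_histogram([(0, 0, 255)] * 10),
    }
    image = [(250, 10, 10)] * 5 + [(0, 0, 255)]
    print(recognize(image, references))
```

Each comparison between the input histogram and a reference histogram is independent of the others, which is the property that makes this step suitable for parallel execution on a GPU.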