• Title/Summary/Keyword: Parallel Computing

Efficient Hardware Transactional Memory Scheme for Processing Transactions in Multi-core In-Memory Environment (멀티코어 인메모리 환경에서 트랜잭션을 처리하기 위한 효율적인 HTM 기법)

  • Jang, Yeonwoo;Kang, Moonhwan;Yoon, Min;Chang, Jaewoo
    • KIISE Transactions on Computing Practices / v.23 no.8 / pp.466-472 / 2017
  • Hardware Transactional Memory (HTM) has greatly changed the parallel programming paradigm for transaction processing. Since Intel introduced Transactional Synchronization Extensions (TSX), a number of HTM-based studies have been conducted. However, the existing studies support conflict prediction for only a single cause in transaction processing and provide the same standardized TSX environment for all workloads. To solve these problems, we propose an efficient hardware transactional memory scheme for processing transactions in a multi-core in-memory environment. First, the proposed scheme uses a prediction matrix, which collects information on previously executed transactions, to determine whether to use Software Transactional Memory (STM) or serial execution as the fallback path of HTM. Second, the proposed scheme processes transactions efficiently according to the characteristics of a given workload by providing a retry policy based on machine learning algorithms. Finally, in an experimental performance evaluation using the Stanford Transactional Applications for Multi-Processing (STAMP) benchmark, the proposed scheme shows 10-20% better performance than the existing schemes.
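The fallback decision described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the layout of the prediction matrix or the learned retry policy, so everything here is an assumption for illustration: the names `FallbackPredictor` and `run_transaction`, the fixed `max_retries`, and the callbacks `try_htm`, `run_stm`, and `run_serial`, which stand in for real HTM, STM, and global-lock execution.

```python
from collections import defaultdict

class FallbackPredictor:
    """Hypothetical prediction table: for each (transaction id, abort cause),
    count how often each fallback path ended up committing."""

    def __init__(self):
        self.wins = defaultdict(lambda: {"stm": 0, "serial": 0})

    def record(self, txn_id, cause, path):
        self.wins[(txn_id, cause)][path] += 1

    def choose(self, txn_id, cause):
        counts = self.wins[(txn_id, cause)]
        # Ties (including "no history yet") go to the always-safe serial path.
        return "stm" if counts["stm"] > counts["serial"] else "serial"

def run_transaction(txn_id, try_htm, run_stm, run_serial, predictor, max_retries=3):
    """Try the hardware path a few times, then take the predicted fallback.

    try_htm() returns (committed, abort_cause); run_stm() returns True on commit;
    run_serial() always commits. All three are stand-ins supplied by the caller.
    """
    cause = None
    for _ in range(max_retries):
        committed, cause = try_htm()
        if committed:
            return "htm"
    if predictor.choose(txn_id, cause) == "stm" and run_stm():
        predictor.record(txn_id, cause, "stm")
        return "stm"
    run_serial()  # last resort: serial execution cannot abort
    predictor.record(txn_id, cause, "serial")
    return "serial"
```

A fixed retry count is used only to keep the sketch short; the abstract states that the retry policy is derived with machine learning, which is not reproduced here.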

Unmanned Aircraft Platform Based Real-time LiDAR Data Processing Architecture for Real-time Detection Information (실시간 탐지정보 제공을 위한 무인기 플랫폼 기반 실시간 LiDAR 데이터 처리구조)

  • Eum, Junho;Berhanu, Eyassu;Oh, Sangyoon
    • KIISE Transactions on Computing Practices / v.21 no.12 / pp.745-750 / 2015
  • LiDAR (Light Detection and Ranging) technology provides realistic three-dimensional image information and has been widely utilized in various fields. However, its use in the military domain, which requires prompt responses to a dynamically changing tactical environment, is limited because extensive amounts of LiDAR data require complex processing before they can be utilized. In this paper, we introduce an Unmanned Aircraft Platform Based Real-time LiDAR Data Processing Architecture that can provide real-time detection information through parallel processing and off-loading between the UAV processing area and the high-performance data processing area. We also conducted experiments to verify the feasibility of the proposed architecture. Processing with an ARM cluster similar to the UAV platform processing area yields similar or better performance compared to the current method. We conclude that the proposed architecture can be utilized in the military domain for tactical and combat purposes such as unmanned monitoring systems.

Load Balancing of Heterogeneous Workstation Cluster based on Relative Load Index (상대적 부하 색인을 기반으로 한 이기종 워크스테이션 클러스터의 부하 균형)

  • Ji, Byoung-Jun;Lee, Kwang-Mo
    • Journal of KIISE: Computing Practices and Letters / v.8 no.2 / pp.183-194 / 2002
  • A clustering environment with heterogeneous workstations provides cost effectiveness and usability for executing applications in parallel. Load balancing is considered a necessary feature for a cluster of heterogeneous workstations to minimize the turnaround time. Previously proposed approaches include static load balancing, which assigns a predetermined weight for the processing capability of each workstation, and dynamic approaches, which execute a benchmark program to obtain the relative processing capability of each workstation. Executing the benchmark program, which has nothing to do with the application being executed, consumes computation time and delays the overall turnaround time. In this paper, we present efficient methods for task distribution and task migration based on the relative load index. We designed and implemented a load balancing system for a clustering environment with heterogeneous workstations. We compared the turnaround times of our methods with those of the round-robin approach and the load balancing method using a benchmark program. The experimental results show that our methods outperform all the other methods that we compared.

Dynamic Directory Table: On-Demand Allocation of Directory Entries for Active Shared Cache Blocks (동적 디렉터리 테이블 : 공유 캐시 블록의 디렉터리 엔트리 동적 할당)

  • Bae, Han Jun;Choi, Lynn
    • Journal of KIISE / v.44 no.12 / pp.1245-1251 / 2017
  • In this study, we present a novel directory architecture that dynamically allocates a directory entry for a cache block on demand at runtime, only when the block is shared by more than one core. Thus, we do not maintain coherence for private blocks, substantially reducing the number of directory entries. Even for shared blocks, we allocate a directory entry dynamically only when the block is actively shared, further reducing the number of directory entries at runtime. For this, we propose a new directory architecture called the dynamic directory table (DDT), which is implemented as a cache of active directory entries. Through detailed simulation on the PARSEC benchmarks, we show that DDT can outperform the expensive full-map directory by a slight margin with only 17.84% of the directory area across a variety of workloads. This is achieved by faster access and high hit rates in the small directory. In addition, we demonstrate that even smaller DDTs can give comparable or higher performance than recent directory optimization schemes such as SPACE and DGD with considerably less area.
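The on-demand allocation idea can be sketched in a few lines. This is only an illustration under stated assumptions, not the paper's hardware design: the class name `DynamicDirectory`, the `private_owner` map (a stand-in for however the real design tracks private blocks), and the LRU eviction policy are all hypothetical. A real design would also have to invalidate or write back cached copies when an entry is evicted, which is omitted here.

```python
from collections import OrderedDict

class DynamicDirectory:
    """Toy model: a directory entry exists only while a block is shared."""

    def __init__(self, capacity):
        self.capacity = capacity          # maximum number of live directory entries
        self.private_owner = {}           # block -> sole accessing core (no entry used)
        self.entries = OrderedDict()      # block -> set of sharer cores, in LRU order

    def access(self, block, core):
        if block in self.entries:         # already tracked as shared
            self.entries[block].add(core)
            self.entries.move_to_end(block)
            return "shared"
        owner = self.private_owner.get(block)
        if owner is None or owner == core:
            self.private_owner[block] = core
            return "private"              # private block: no directory entry needed
        # A second core touches the block: allocate an entry on demand.
        del self.private_owner[block]
        if len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)   # evict the least recently used entry
        self.entries[block] = {owner, core}
        return "shared"
```

In this model, blocks touched by a single core never consume a directory entry, which is the source of the area saving the abstract describes.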

3-D Hetero-Integration Technologies for Multifunctional Convergence Systems

  • Lee, Kang-Wook
    • Journal of the Microelectronics and Packaging Society / v.22 no.2 / pp.11-19 / 2015
  • Since CMOS device scaling has stalled, three-dimensional (3-D) integration allows Moore's law to be extended to ever higher density, higher functionality, and higher performance, and allows more diverse materials and devices to be integrated at lower cost. 3-D integration has many benefits, such as increased multi-functionality, increased performance, increased data bandwidth, reduced power, small form factor, and reduced packaging volume, because it vertically stacks multiple materials, technologies, and functional components such as processor, memory, sensors, logic, analog, and power ICs into one stacked chip. Anticipated applications start with memory, handheld devices, and high-performance computers and extend in particular to multifunctional convergence systems such as cloud networking for the internet of things, exascale computing for big data servers, electric vehicle systems for future automobiles, radioactivity safety systems, energy harvesting systems, and wireless implantable medical systems, enabled by flexible heterogeneous integration involving CMOS, MEMS, sensors, and photonic circuits. However, heterogeneous integration of different functional devices poses many technical challenges because the devices are fabricated by different technologies and therefore vary in size, thickness, and substrate. This paper describes new 3-D heterogeneous integration technologies developed by Tohoku University for multifunctional convergence systems: chip self-assembly stacking, 3-D heterogeneous opto-electronics integration, and backside TSV fabrication. The paper also introduces a high-speed sensing, highly parallel processing image sensor system comprising a 3-D stacked image sensor with extremely fast signal sensing and processing speed, and a 3-D stacked microprocessor with a self-test and self-repair function for autonomous driving assistance, both fabricated by 3-D heterogeneous integration technologies.

A Study on Horizontal Shuffle Scheduling for High Speed LDPC decoding in DVB-S2 (DVB-S2 기반 고속 LDPC 복호를 위한 Horizontal Shuffle Scheduling 방식에 관한 연구)

  • Lim, Byeong-Su;Kim, Min-Hyuk;Jung, Ji-Won
    • Journal of the Korea Institute of Information and Communication Engineering / v.16 no.10 / pp.2143-2149 / 2012
  • DVB-S2 employs LDPC codes, which approach the Shannon limit because they have good distance properties and do not exhibit an error floor. Furthermore, fully parallel processing is possible. However, high-speed decoding is very difficult because of the large block size and the large number of iterations. This paper presents the horizontal shuffle scheduling (HSS) algorithm, which reduces the number of iterations without performance degradation. In the flooding scheme, the decoder waits until all the check-to-variable messages are updated at all parity check nodes before computing the variable metric and updating the variable-to-check messages. The HSS algorithm instead updates the variable metric on a check-by-check basis, in the same way as one code draws benefit from the other. As a result, LDPC decoding speed based on the HSS algorithm improved by 30-50% compared to the conventional scheme without performance degradation.
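The difference between the two schedules can be sketched with a small min-sum decoder. This is an illustrative sketch under assumptions, not the paper's implementation: it uses the min-sum approximation, a toy (7,4) Hamming parity-check matrix rather than a DVB-S2 code, and the hypothetical function names `decode_flooding` and `decode_layered`. It only contrasts the order of message updates and makes no claim about the speedup reported in the abstract. Positive LLR values mean "bit is 0".

```python
def check_update(v2c):
    """Min-sum check-node rule: each outgoing message uses all *other* inputs."""
    out = {}
    for j in v2c:
        others = [v for k, v in v2c.items() if k != j]
        sign = -1.0 if sum(v < 0 for v in others) % 2 else 1.0
        out[j] = sign * min(abs(v) for v in others)
    return out

def hard_decision(post, rows):
    """Return the decided bits and whether every parity check is satisfied."""
    bits = [1 if p < 0 else 0 for p in post]
    ok = all(sum(bits[j] for j in row) % 2 == 0 for row in rows)
    return bits, ok

def decode_flooding(H, llr, max_iter=20):
    rows = [[j for j, h in enumerate(r) if h] for r in H]
    c2v = [{j: 0.0 for j in row} for row in rows]
    post = list(llr)
    bits = hard_decision(post, rows)[0]
    for it in range(1, max_iter + 1):
        # All checks are updated from the same snapshot of variable metrics...
        c2v = [check_update({j: post[j] - m[j] for j in m}) for m in c2v]
        # ...and only then are the variable metrics recomputed.
        post = list(llr)
        for m in c2v:
            for j, v in m.items():
                post[j] += v
        bits, ok = hard_decision(post, rows)
        if ok:
            return bits, it
    return bits, max_iter

def decode_layered(H, llr, max_iter=20):
    rows = [[j for j, h in enumerate(r) if h] for r in H]
    c2v = [{j: 0.0 for j in row} for row in rows]
    post = list(llr)
    bits = hard_decision(post, rows)[0]
    for it in range(1, max_iter + 1):
        for i, row in enumerate(rows):
            # Check-by-check: the metric is refreshed right after each check,
            # so the next check already sees the improved values.
            v2c = {j: post[j] - c2v[i][j] for j in row}
            c2v[i] = check_update(v2c)
            for j in row:
                post[j] = v2c[j] + c2v[i][j]
        bits, ok = hard_decision(post, rows)
        if ok:
            return bits, it
    return bits, max_iter

# Toy parity-check matrix (every row must have at least two ones).
H = [[1, 1, 0, 1, 1, 0, 0],
     [1, 0, 1, 1, 0, 1, 0],
     [0, 1, 1, 1, 0, 0, 1]]
```

Both functions return the decoded bits together with the number of iterations used, so the two schedules can be compared on the same received vector.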

A study on the Application of Effects-based Operation in Cyberspace (사이버공간에서의 효과중심작전 적용방안 연구)

  • Jang, Won-gu;Lee, Kyun-ho
    • Journal of Internet Computing and Services / v.21 no.1 / pp.221-230 / 2020
  • Effects-based operations, which aim to reduce the unnecessary effort and meaningless sacrifice incurred during a war while reaching the will of the enemy leadership through strategic attacks, were discarded because they were difficult to apply to military power other than airpower. However, cyberspace, which can be thoroughly logical and calculated, can be suitable for conducting effects-based operations. This study examined a way to carry out effects-based operations in cyberspace. It laid the foundation for overcoming the limitations of effects-based operations revealed in previous battle cases and for executing the operations in a cyber battlespace where the boundary between physical space and cyberspace has gradually disappeared. Furthermore, through an analysis of previous cyber-attack cases, it demonstrated that effects-based operations could be carried out in cyberspace by establishing a military strategy for conducting such operations.

Performance evaluation and analysis of TILE-Gx36 many-core processor with PARSEC benchmark (PARSEC을 이용한 TILE-Gx36 다중코어 프로세서의 성능 평가 및 분석)

  • Lee, Boseon;Kim, Han-Yee;Yu, Heonchang;Suh, Taeweon
    • The Journal of Korean Association of Computer Education / v.17 no.1 / pp.107-115 / 2014
  • This paper evaluates and analyzes the performance of the TILE-Gx36 (Gx36), a many-core processor. The PARSEC parallel benchmark suite was used to measure performance, and the Core i7 (i7) and Atom were used for comparison. In experiments with the maximum number of threads that can execute concurrently on each machine, the Gx36 showed 2.73× lower performance than the Core i7 and 1.93× higher performance than the Atom. The Gx36 has the largest Last Level Cache (LLC) among the compared processors. Nevertheless, it reported the largest number of LLC misses, which we strongly believe is the major culprit for its lower-than-expected performance. Our study suggests that the DDC employed in the Gx36 is not a favorable cache structure for general-purpose high-performance computing. Actual measurement on off-the-shelf machines provides unbiased data for refining future many-core architectures.
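Assuming both ratios refer to the same performance metric over the same benchmark runs (the abstract does not state this explicitly), the two figures together imply the gap between the Core i7 and the Atom. Here $P$ is a hypothetical symbol for measured performance:

$$\frac{P_{\text{i7}}}{P_{\text{Atom}}} = \frac{P_{\text{i7}}}{P_{\text{Gx36}}} \cdot \frac{P_{\text{Gx36}}}{P_{\text{Atom}}} = 2.73 \times 1.93 \approx 5.27$$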

Efficient Method to Support Mobile Virtualization-based Cloud Resource Management (모바일 가상화기반 클라우드 자원관리를 지원하는 효율적 방법)

  • Kang, Yongho;Jang, Changbok;Lee, Wanjik;Heo, Seokyeol;Kim, Jooman
    • Journal of Digital Convergence / v.12 no.2 / pp.277-283 / 2014
  • Recently, various cloud services have been provided on mobile devices as well as on desktop PCs and server computers. The number of smartphone users is also increasing very rapidly, and they use their devices for various services (cloud services, games, banking, mobile office, etc.). Accordingly, research on utilizing the resources of mobile devices has been conducted. In this paper, we propose an efficient method of cloud resource management that uses information on the physical resources (CPU, memory, storage, etc.) available across mobile devices together with information on the physical resources within each mobile device. The proposed technology can guarantee real-time processing and manage resources efficiently.

Design and Implementation of the Performance Driven UI-Mashup Architecture (성능 주도의 UI-Mashup 아키텍처의 설계 및 구현)

  • Cho, Dong-Il
    • Journal of Internet Computing and Services / v.15 no.1 / pp.45-53 / 2014
  • UI-Mashup, which composes various distributed contents on the internet, is widely used as a service method to add value and has become one of the latest trends in web application development. Previous UI-Mashup studies have focused primarily on dynamic service composition and have not been able to adapt to rapidly changing Web standards; thus, end users perceive UI-Mashups as slow, incompatible, and insecure services. In this study, we propose an architecture for improving the performance of UI-Mashup. To provide fast services and enhanced security, the proposed architecture collects UI fragments on the server in parallel and sends the layouts and contents of the Mashup UI to the client through a special delivery channel that supports fast reaction and response times. We implemented the proposed architecture and conducted performance tests to verify it experimentally. In the performance tests, the proposed architecture showed two to three times faster response time and more than four times the throughput of the previous UI-Mashup technology.
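The server-side parallel collection step can be sketched with a thread pool. This is a minimal sketch under assumptions, not the paper's architecture: the function names `collect_fragments` and `render_page`, the `fetch` callback (a stand-in for whatever retrieves one fragment from its source), and the plain string concatenation used as a "layout" are all hypothetical. The delivery channel to the client is not modeled.

```python
from concurrent.futures import ThreadPoolExecutor

def collect_fragments(sources, fetch, max_workers=8):
    """Fetch every UI fragment concurrently.

    Results come back in the same order as `sources`, regardless of which
    fetch finishes first, so the page layout stays deterministic.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, sources))

def render_page(sources, fetch):
    """Assemble the collected fragments into one response body."""
    return "\n".join(collect_fragments(sources, fetch))
```

With I/O-bound fetches, the total collection time in this model approaches that of the slowest fragment instead of the sum of all fragments.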