• Title, Summary, Keyword: Task Granularity

Search Result 7, Processing Time 0.07 seconds

Batch Resizing Policies and Techniques for Fine-Grain Grid Tasks: The Nuts and Bolts

  • Muthuvelu, Nithiapidary;Chai, Ian;Chikkannan, Eswaran;Buyya, Rajkumar
    • Journal of Information Processing Systems
    • /
    • v.7 no.2
    • /
    • pp.299-320
    • /
    • 2011
  • The overhead of processing fine-grain tasks on a grid induces the need for batch processing or task group deployment in order to minimise overall application turnaround time. When deciding the granularity of a batch, the processing requirements of each task should be considered as well as the utilisation constraints of the interconnecting network and the designated resources. However, the dynamic nature of a grid requires the batch size to be adaptable to the latest grid status. In this paper, we describe the policies and the specific techniques involved in the batch resizing process. We explain the nuts and bolts of these techniques in order to maximise the resulting benefits of batch processing. We conduct experiments to determine the nature of the policies and techniques in response to a real grid environment. The techniques are further investigated to highlight the important parameters for obtaining the appropriate task granularity for a grid resource.

Refined fixed granularity algorithm on Networks of Workstations (NOW 환경에서 개선된 고정 분할 단위 알고리즘)

  • Gu, Bon-Geun
    • The KIPS Transactions:PartA
    • /
    • v.8A no.2
    • /
    • pp.117-124
    • /
    • 2001
  • At NOW (Networks Of Workstations), the load sharing is very important role for improving the performance. The known load sharing strategy is fixed-granularity, variable-granularity and adaptive-granularity. The variable-granularity algorithm is sensitive to the various parameters. But Send algorithm, which implements the fixed-granularity strategy, is robust to task granularity. And the performance difference between Send and variable-granularity algorithm is not substantial. But, in Send algorithm, the computing time and the communication time are not overlapped. Therefore, long latency time at the network has influence on the execution time of the parallel program. In this paper, we propose the preSend algorithm. In the preSend algorithm, the master node can send the data to the slave nodes in advance without the waiting for partial results from the slaves. As the master node sent the next data to the slaves in advance, the slave nodes can process the data without the idle time. As stated above, the preSend algorithm can overlap the computing time and the communication time. Therefore we reduce the influence of the long latency time at the network and the execution time of the parallel program on the NOW. To compare the execution time of two algorithms, we use the $320{\times}320$ matrix multiplication. The comparison results of execution times show that the preSend algorithm has the shorter execution time than the Send algorithm.

  • PDF

Services Identification based on Use Case Recomposition (유스케이스 재구성을 통한 서비스 식별)

  • Kim, Yu-Kyong
    • The Journal of Society for e-Business Studies
    • /
    • v.12 no.4
    • /
    • pp.145-163
    • /
    • 2007
  • Service-Oriented Architecture is a style of information systems that enables the creation of applications that are built by combining loosely coupled and interoperable services. A service is an implementation of business functionality with proper granularity and invoked with well-defined interface. In service modeling, when the granularity of a service is finer, the reusability and flexibility of the service is lower. For solving this problem concerns with the service granularity, it is critical to identify and define coarse-grained services from the domain analysis model. In this paper, we define the process to identify services from the Use Case model elicited from domain analysis. A task tree is derived from Use Cases and their descriptions, and Use Cases are reconstructed by the composition and decomposition of the task tree. Reconstructed Use Cases are defined and specified as services. Because our method is based on the widely used UML Use Case models, it can be helpful to minimize time and cost for developing services in various platforms and domains.

  • PDF

Task-Level Dynamic Voltage Scaling for Embedded System Design: Recent Theoretical Results

  • Kim, Tae-Whan
    • Journal of Computing Science and Engineering
    • /
    • v.4 no.3
    • /
    • pp.189-206
    • /
    • 2010
  • It is generally accepted that dynamic voltage scaling (DVS) is one of the most effective techniques of energy minimization for real-time applications in embedded system design. The effectiveness comes from the fact that the amount of energy consumption is quadractically proportional to the voltage applied to the processor. The penalty is the execution delay, which is linearly and inversely proportional to the voltage. According to the granularity of tasks to which voltage scaling is applied, the DVS problem is divided into two subproblems: inter-task DVS problem, in which the determination of the voltage is carried out on a task-by-task basis and the voltage assigned to the task is unchanged during the whole execution of the task, and intra-task DVS problem, in which the operating voltage of a task is dynamically adjusted according to the execution behavior to reflect the changes of the required number of cycles to finish the task before the deadline. Frequent voltage transitions may cause an adverse effect on energy minimization due to the increase of the overhead of transition time and energy. In addition, DVS needs to be carefully applied so that the dynamically varying chip temperature should not exceed a certain threshold because a drastic increase of chip temperature is highly likely to cause system function failure. This paper reviews representative works on the theoretical solutions to DVS problems regarding inter-task DVS, intra-task DVS, voltage transition, and thermal-aware DVS.

A GPU scheduling framework for applications based on dataflow specification (데이터 플로우 기반 응용들을 위한 GPU 스케줄링 프레임워크)

  • Lee, Yongbin;Kim, Sungchan
    • Journal of Korea Multimedia Society
    • /
    • v.17 no.10
    • /
    • pp.1189-1197
    • /
    • 2014
  • Recently, general purpose graphic processing units(GPUs) are being widely used in mobile embedded systems such as smart phone and tablet PCs. Because of architectural limitations of mobile GPGPUs, only a single program is allowed to occupy a GPU at a time in a non-preemptive way. As a result, it is difficult to meet performance requirements of applications such as frame rate or response time if applications running on a GPU are not scheduled properly. To tackle this difficulty, we propose to specify applications using synchronous data flow model of computation such that applications are formed with edges and nodes. Then nodes of applications are scheduled onto a GPU unlike conventional scheduling an application as a whole. This approach allows applications to share a GPU at a finer granularity, node (or task)-level, providing several benefits such as eliminating need for manually partitioning applications and better GPU utilization. Furthermore, any scheduling policy can be applied in response to the characteristics of applications.

Adaptive Dynamic Load Balancing Strategies for Network-based Cluster Systems (네트워크 기반 클러스터 시스템을 위한 적응형 동적 부하균등 방법)

  • Jeong, Hun-Jin;Jeong, Jin-Ha;Choe, Sang-Bang
    • Journal of KIISE:Computer Systems and Theory
    • /
    • v.28 no.11
    • /
    • pp.549-560
    • /
    • 2001
  • Cluster system provides attractive scalability in terms of compution power and memory size. With the advances in high speed computer network technology, cluster systems are becoming increasingly competitive compared to expensive MPPs (massively parallel processors). Load balancing is very important issue since an inappropriate scheduling of tasks cannot exploit the true potential of the system and can offset the gain from parallelization. In parallel processing program, it is difficult to predict the load of each task before running the program. Furthermore, tasks are interdependent each other in many ways. The dynamic load balancing algorithm, which evaluates each processor's load in runtime, partitions each task into the appropriate granularity and assigns them to processors in proportion to their performance in cluster systems. However, if the communication cost between processing nodes is expensive, it is not efficient for all nodes to attend load balancing process. In this paper, we restrict a processor that attend load balancing by the communication cost and the deviation of its load from the average. We simulate various models of the cluster system with parameters such as communication cost, node number, and range of workload value to compare existing load balancing methods with the proposed dynamic algorithms.

  • PDF

A Ray-Tracing Algorithm Based On Processor Farm Model (프로세서 farm 모델을 이용한 광추적 알고리듬)

  • Lee, Hyo Jong
    • Journal of The Korea Computer Graphics Society
    • /
    • v.2 no.1
    • /
    • pp.24-30
    • /
    • 1996
  • The ray tracing method, which is one of many photorealistic rendering techniques, requires heavy computational processing to synthesize images. Parallel processing can be used to reduce the computational processing time. A parallel algorithm for the ray tracing has been implemented and executed for various images on transputer systems. In order to develop a scalable parallel algorithm, a processor farming technique has been exploited. Since each image is divided and distributed to each farming processor, the scalability of the parallel system and load balancing are achieved naturally in the proposed algorithm. Efficiency of the parallel algorithm is obtained up to 95% for nine processors. However, the best size of a distributed task is much higher in simple images due to less computational requirement for every pixel. Efficiency degradation is observed for large granularity tasks because of load unbalancing caused by the large task. Overall, transputer systems behave as good scalable parallel processing system with respect to the cost-performance ratio.

  • PDF