Peak Power Minimization for Clustered VLIW Architectures

A Method for Minimizing Peak Power in Clustered VLIW Architectures

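  • 서재원 (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)
  • 김태환 (Department of Electrical Engineering and Computer Science, Korea Advanced Institute of Science and Technology)
  • 정기석 (Department of Computer Engineering, Hongik University)
  • Published: 2003.06.01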

Abstract

The VLIW architecture has emerged as one of the most effective architectures for multimedia applications. Such applications offer ample potential for executing multiple operations in parallel, because they typically involve data-intensive processing with few data and/or control dependencies. As the degree of instruction-level parallelism increases, non-clustered VLIW architectures scale poorly because of the heavy pressure on register file ports. A clustered VLIW architecture is therefore preferred over a non-clustered one when a higher degree of parallelism is available, as is the case in multimedia processing. However, having multiple clusters in an architecture implies a large amount of hardware, so power consumption becomes a crucial issue. In this paper, we propose an algorithm that minimizes peak power consumption while incurring little or no delay penalty. The effectiveness of the algorithm has been verified by several sets of experiments, which show up to a 30.7% reduction in peak power consumption compared with results optimized to minimize resources only.

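The VLIW architecture is well suited to multimedia applications that process large amounts of data, and it enables a high degree of parallel processing for this class of applications. When a system is scaled up to increase this parallelism further, a clustered VLIW architecture has a significant advantage over a non-clustered one. However, including multiple clusters in a single architecture inevitably requires a considerable amount of hardware, and as a result the power consumed by the overall system becomes an important issue. This paper presents an effective algorithm for clustered VLIW architectures that reduces peak power consumption while satisfying the performance constraints of the overall system. A series of experiments confirms that the proposed algorithm can reduce peak power consumption by up to 30.7%.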
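To make the notion of peak power under a fixed delay bound concrete, the following is a minimal sketch and not the algorithm proposed in the paper, which the abstract does not describe. It assumes, purely for illustration, that each operation is independent, draws a fixed amount of power in the single cycle in which it executes, and may be placed in any cycle within a given window that already respects the latency bound. The operation names, power values, windows, and the helper functions `power_profile`, `peak_power`, and `flatten_peak` are all hypothetical.

```python
def power_profile(schedule, power, num_cycles):
    """Return the total power drawn in each cycle of a schedule."""
    profile = [0] * num_cycles
    for op, cycle in schedule.items():
        profile[cycle] += power[op]
    return profile


def peak_power(schedule, power, num_cycles):
    """Peak power is the maximum per-cycle power over the schedule."""
    return max(power_profile(schedule, power, num_cycles))


def flatten_peak(windows, power, num_cycles):
    """Greedy heuristic (an illustrative assumption, not the paper's method):
    place the most power-hungry operations first, each in the least loaded
    cycle of its allowed window, so the schedule length never grows."""
    profile = [0] * num_cycles
    schedule = {}
    for op in sorted(windows, key=lambda o: (-power[o], o)):
        lo, hi = windows[op]
        best = min(range(lo, hi + 1), key=lambda c: (profile[c], c))
        schedule[op] = best
        profile[best] += power[op]
    return schedule


# Hypothetical data: power per operation and (earliest, latest) cycle windows.
power = {"mul1": 4, "mul2": 4, "add1": 1, "add2": 1}
windows = {"mul1": (0, 1), "mul2": (0, 1), "add1": (0, 2), "add2": (0, 2)}
num_cycles = 3  # latency bound, identical for both schedules

# Baseline: every operation issued as early as its window allows.
asap = {op: lo for op, (lo, hi) in windows.items()}
flattened = flatten_peak(windows, power, num_cycles)

print("ASAP peak:", peak_power(asap, power, num_cycles))
print("Flattened peak:", peak_power(flattened, power, num_cycles))
```

In this toy instance the as-early-as-possible schedule issues all four operations in cycle 0, while the greedy placement spreads them across the three available cycles, lowering the peak without extending the schedule. A real clustered VLIW scheduler would also have to account for data dependencies, cluster assignment, and inter-cluster transfers, none of which are modeled here.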
