Simulation of YUV-Aware Instructions for High-Performance, Low-Power Embedded Video Processors

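  • Cheol Hong Kim (School of Electronics and Computer Engineering, Chonnam National University)
  • Jong-Myon Kim (School of Computer Engineering and Information Technology, University of Ulsan)
  • Published: 2007.10.31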

Abstract

With the rapid development of multimedia applications and wireless communication networks, consumer demand for video-over-wireless capability on mobile computing systems is growing rapidly. This paper introduces YUV-aware instructions that improve the performance and efficiency of color image and video processing. Traditional multimedia extensions (e.g., MMX, SSE, VIS, and AltiVec) rely solely on generic subword parallelism, whereas the proposed YUV-aware instructions operate in parallel on two packed 16-bit YUV values (6-bit Y, 5-bit U, 5-bit V) in a 32-bit datapath architecture, providing greater concurrency and efficiency for color image and video processing. Moreover, the smaller data format reduces system cost. Experimental results on a representative dynamically scheduled embedded superscalar processor show that the YUV-aware instructions achieve an average speedup of 3.9x over the baseline superscalar processor, whereas MMX (Intel's representative multimedia extension) achieves a speedup of only 2.1x over the same baseline. The YUV-aware instructions also outperform MMX in energy reduction: 75.8% with the YUV-aware instructions versus 54.8% with MMX, both relative to the baseline.
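The packed data format described in the abstract can be illustrated with a small software model. The following Python sketch is not the paper's instruction set. It assumes a hypothetical bit layout (Y in bits 15..10, U in bits 9..5, V in bits 4..0, first pixel in the low half of the 32-bit word) and uses a saturating per-field add as a stand-in for one possible YUV-aware operation. All function names are made up for illustration.

```python
# Software model of two packed 16-bit YUV pixels in one 32-bit word.
# ASSUMPTIONS (not from the paper): field order Y|U|V from high to low bits,
# pixel 0 in the low 16 bits, and saturating addition as the example operation.

Y_MAX, U_MAX, V_MAX = 63, 31, 31  # 6-bit Y, 5-bit U, 5-bit V


def pack_yuv655(y, u, v):
    """Pack one pixel into 16 bits; inputs must already fit their field widths."""
    if not (0 <= y <= Y_MAX and 0 <= u <= U_MAX and 0 <= v <= V_MAX):
        raise ValueError("component out of range for the 6-5-5 format")
    return (y << 10) | (u << 5) | v


def unpack_yuv655(pixel):
    """Split a 16-bit pixel back into its (Y, U, V) fields."""
    return (pixel >> 10) & 0x3F, (pixel >> 5) & 0x1F, pixel & 0x1F


def pack_pair(p0, p1):
    """Place two 16-bit pixels in one 32-bit word, p0 in the low half."""
    return ((p1 & 0xFFFF) << 16) | (p0 & 0xFFFF)


def unpack_pair(word):
    """Return the (low, high) 16-bit pixels of a 32-bit word."""
    return word & 0xFFFF, (word >> 16) & 0xFFFF


def add_sat_pair(a, b):
    """Add two 32-bit words field by field, clamping each field at its maximum.

    A hardware instruction would do this in one operation; the loop here
    only emulates the result.
    """
    out = []
    for pa, pb in zip(unpack_pair(a), unpack_pair(b)):
        ya, ua, va = unpack_yuv655(pa)
        yb, ub, vb = unpack_yuv655(pb)
        out.append(pack_yuv655(min(ya + yb, Y_MAX),
                               min(ua + ub, U_MAX),
                               min(va + vb, V_MAX)))
    return pack_pair(out[0], out[1])


if __name__ == "__main__":
    w1 = pack_pair(pack_yuv655(10, 3, 4), pack_yuv655(60, 30, 1))
    w2 = pack_pair(pack_yuv655(5, 2, 1), pack_yuv655(10, 5, 2))
    p0, p1 = unpack_pair(add_sat_pair(w1, w2))
    print(unpack_yuv655(p0), unpack_yuv655(p1))
```

The field-aware add keeps each component inside its own bit range. A generic 16-bit subword add on the same words would let a carry out of V spill into U, and out of U into Y, which is one reason a format-aware operation differs from plain subword parallelism.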
