Performance Evaluation and Verification of MMX-type Instructions on an Embedded Parallel Processor

  • Jung, Yong-Bum (School of Electrical Engineering, University of Ulsan) ;
  • Kim, Yong-Min (School of Electrical Engineering, University of Ulsan) ;
  • Kim, Cheol-Hong (Electronics and Computer Engineering, Chonnam National University) ;
  • Kim, Jong-Myon (School of Electrical Engineering, University of Ulsan)
  • Received : 2011.03.20
  • Accepted : 2011.04.13
  • Published : 2011.10.31

Abstract

This paper introduces a SIMD (Single Instruction, Multiple Data) based parallel processor that efficiently processes the massive amounts of data inherent in multimedia applications. It also implements MMX (MultiMedia eXtension)-type instructions, Intel's representative multimedia-specific instructions, on this data-parallel processor and evaluates and analyzes their performance. The reference data-parallel processor consists of 16 processors, each with a 32-bit datapath. Simulation results for a JPEG compression application on a 1280x1024 pixel image indicate that the MMX-type instructions achieve approximately a 50% performance improvement over the baseline instructions on the same data-parallel architecture. The MMX-type instructions also achieve 100% and 51% improvements over the baseline instructions in energy efficiency and area efficiency, respectively. These results suggest that multimedia-specific instructions, including MMX-type instructions, have potential for the widely used many-core GPUs (Graphics Processing Units) and for other types of parallel processors.
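
To make the idea of an MMX-type instruction on a 32-bit datapath concrete, the following minimal sketch emulates one packed operation in software: an unsigned saturating add over four 8-bit subwords held in a single 32-bit word, in the spirit of MMX's PADDUSB. The abstract does not list the instructions the authors actually implemented, so the function names (`unpack_u8`, `pack_u8`, `paddusb32`) and the choice of operation are assumptions made purely for illustration.

```python
# Illustrative emulation of an MMX-type packed operation on a 32-bit word.
# Assumption: four unsigned 8-bit lanes per word; names are hypothetical
# and not taken from the paper.

def unpack_u8(word):
    """Split a 32-bit word into four unsigned 8-bit lanes, lowest byte first."""
    return [(word >> (8 * i)) & 0xFF for i in range(4)]

def pack_u8(lanes):
    """Combine four 8-bit lanes (lowest byte first) into one 32-bit word."""
    word = 0
    for i, lane in enumerate(lanes):
        word |= (lane & 0xFF) << (8 * i)
    return word

def paddusb32(a, b):
    """Packed unsigned saturating add: each lane is clamped to 255 on overflow."""
    sums = [min(x + y, 0xFF) for x, y in zip(unpack_u8(a), unpack_u8(b))]
    return pack_u8(sums)

if __name__ == "__main__":
    # Four pixel values are added in one packed operation; two lanes saturate.
    print(hex(paddusb32(0x10F0FF01, 0x20200102)))
```

A baseline instruction would handle one such pixel value per 32-bit operation, whereas the packed form handles four at once and needs no separate clamping step. How much of this per-operation advantage carries over to a full JPEG pipeline depends on the application and the hardware, which is what the paper's evaluation measures.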
