Multimedia Extension Instructions and Optimal Many-core Processor Architecture Exploration for Portable Ultrasonic Image Processing

  • Kang, Sung-Mo (School of Electrical Engineering, University of Ulsan) ;
  • Kim, Jong-Myon (School of Electrical Engineering, University of Ulsan)
  • Received : 2012.05.10
  • Accepted : 2012.07.05
  • Published : 2012.08.31

Abstract

This paper proposes a design space exploration methodology for many-core processors with multimedia-specific instructions, aimed at high-performance, low-power ultrasound imaging on portable devices. To evaluate the impact of the multimedia instructions, we compare programs that use them against baseline programs on the same many-core processor in terms of execution time, energy efficiency, and area efficiency. Experimental results for a $256{\times}256$ ultrasound image indicate that the programs using multimedia instructions improve execution time by a factor of 3.16, energy efficiency by a factor of 8.13, and area efficiency by a factor of 3.16 over the baseline programs. They likewise outperform the baseline programs for a $240{\times}320$ image (2.16 times in execution time, 4.04 times in energy efficiency, and 2.16 times in area efficiency) and for a $240{\times}400$ image (2.25 times in execution time, 4.34 times in energy efficiency, and 2.25 times in area efficiency). In addition, we explore the optimal processing element (PE) architecture of many-core processors with multimedia instructions by varying the number of PEs and the memory size.

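This paper proposes a design space exploration methodology for many-core processors with built-in multimedia-specific instructions, targeting high-performance, low-power processing of portable ultrasound images. To this end, we compare programs that apply subword parallelism through multimedia extension instructions with programs that do not, and measure their energy efficiency and area efficiency. In simulation, programs using MMX-type instructions improved on the baseline programs at $256{\times}256$ resolution by an average of 3.16 times in execution time, 8.13 times in energy efficiency, and 3.16 times in area efficiency. At $240{\times}320$ and $240{\times}400$ resolutions, execution time improved by an average of 2.16 and 2.25 times, energy efficiency by 4.04 and 4.34 times, and area efficiency by 2.16 and 2.25 times, respectively. In addition, by varying the number of processing elements (PEs) and the memory size of a many-core processor with these MMX-type instructions, we searched for the PE architecture that gives the best system area efficiency and energy efficiency at each ultrasound image resolution.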
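The comparison and the PE sweep described in the abstract can be illustrated with a minimal Python sketch. The abstract does not define its metrics, so the definitions here are assumptions: area efficiency is taken as throughput per unit area, 1/(time × area), and energy efficiency as work per joule, 1/energy. All names (`Measurement`, `improvement`, `best_config`), all numbers, and the memory-size unit are hypothetical placeholders, not data from the paper. Under these assumed definitions, running both programs on the same processor (same area) makes the area-efficiency ratio equal to the execution-time ratio, which would be consistent with the identical figures reported for those two metrics.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Measurement:
    """One simulated run. All values used below are made up for illustration."""
    exec_time_s: float  # execution time in seconds
    energy_j: float     # energy consumed in joules
    area_mm2: float     # processor area in square millimetres


def area_efficiency(m: Measurement) -> float:
    # Assumed definition: throughput per unit area, in 1 / (s * mm^2).
    return 1.0 / (m.exec_time_s * m.area_mm2)


def energy_efficiency(m: Measurement) -> float:
    # Assumed definition: work per joule, in 1 / J.
    return 1.0 / m.energy_j


def improvement(baseline: Measurement, extended: Measurement) -> dict:
    """Ratios of the extended-instruction run over the baseline run."""
    return {
        "execution_time": baseline.exec_time_s / extended.exec_time_s,
        "energy_efficiency": energy_efficiency(extended) / energy_efficiency(baseline),
        "area_efficiency": area_efficiency(extended) / area_efficiency(baseline),
    }


def best_config(results: dict, metric) -> tuple:
    """Return the configuration whose measurement maximizes the given metric."""
    return max(results, key=lambda cfg: metric(results[cfg]))


# Hypothetical baseline vs. multimedia-instruction run on the same processor.
baseline = Measurement(exec_time_s=4.0e-3, energy_j=2.0e-3, area_mm2=50.0)
extended = Measurement(exec_time_s=2.0e-3, energy_j=0.5e-3, area_mm2=50.0)
ratios = improvement(baseline, extended)

# Hypothetical sweep keyed by (number of PEs, local memory words per PE).
sweep = {
    (64, 256): Measurement(exec_time_s=8.0e-3, energy_j=0.6e-3, area_mm2=20.0),
    (256, 64): Measurement(exec_time_s=2.0e-3, energy_j=0.8e-3, area_mm2=40.0),
    (1024, 16): Measurement(exec_time_s=1.0e-3, energy_j=1.2e-3, area_mm2=120.0),
}
best_by_area = best_config(sweep, area_efficiency)
best_by_energy = best_config(sweep, energy_efficiency)

if __name__ == "__main__":
    print(ratios)
    print("best area efficiency:", best_by_area)
    print("best energy efficiency:", best_by_energy)
```

In this made-up sweep the two metrics favor different configurations, which is why an exploration of this kind would typically report the best configuration per metric rather than a single winner.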

Acknowledgement

Supported by: National Research Foundation of Korea
