Efficient Use of On-chip Memory through Profile-Driven Array Reorganization

Cho, Doosan; Youn, Jonghee

doi:10.14372/IEMEK.2011.6.6.2

IEMEK Journal of Embedded Systems and Applications (대한임베디드공학회논문지)

Volume 6, Issue 6, pp. 345-359, 2011 (pISSN 1975-5066)

Institute of Embedded Engineering of Korea (대한임베디드공학회)



Cho, Doosan (Sunchon National University);
Youn, Jonghee (Gangneung-Wonju National University)

Received : 2011.05.04
Accepted : 2011.07.07
Published : 2011.12.31


Abstract

In high-performance embedded systems, multiple on-chip memories are an essential architectural feature for exploiting the inherent parallelism of multimedia applications, because they allow multiple data accesses to execute in parallel. However, exploiting multiple on-chip memories effectively remains difficult: the benefit of this architecture depends strongly on how well memory parallelism in the target application can be detected and exploited. In this paper, we propose a technique that detects and exploits memory parallelism using a linear array access descriptor [1] generated from profiled data. The technique addresses an array reorganization problem, with the goal of maximizing memory parallelism in multimedia applications. We present preliminary experiments that apply the technique to a representative coarse-grained reconfigurable array (CGRA) processor running multimedia kernel codes. The results show that the technique optimizes data placement by putting independent data in separate memories, and that it achieves on average 9.8% higher performance than the existing method.
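The abstract states the goal (place independent data in separate on-chip memories) but not the algorithm. The sketch below is a rough illustration only and is not the authors' method: instead of linear array access descriptors, it assumes a simplified, hypothetical profile that records which arrays are accessed in the same loop iteration. It counts how often each pair of arrays is co-accessed and then greedily assigns arrays to banks so that frequently co-accessed arrays land in different banks. The names `profile`, `assign_banks`, `remaining_conflicts`, and `num_banks` are all hypothetical.

```python
from itertools import combinations


def assign_banks(profile, num_banks):
    """Greedily map arrays to on-chip memory banks (illustrative heuristic).

    profile: hypothetical list of sets; each set names the arrays accessed
             in one profiled loop iteration, e.g. [{"A", "B"}, {"A", "C"}].
    Returns a dict mapping each array name to a bank index in [0, num_banks).
    """
    if num_banks < 1:
        raise ValueError("num_banks must be at least 1")

    # weight[(a, b)] counts profiled iterations that access both a and b,
    # i.e. how often placing them in the same bank would serialize accesses.
    weight = {}
    arrays = set()
    for accessed in profile:
        arrays.update(accessed)
        for a, b in combinations(sorted(accessed), 2):
            weight[(a, b)] = weight.get((a, b), 0) + 1

    def w(a, b):
        return weight.get((min(a, b), max(a, b)), 0)

    # Place the most heavily co-accessed arrays first (ties broken by name).
    order = sorted(arrays, key=lambda x: (-sum(w(x, y) for y in arrays if y != x), x))

    bank_of = {}
    for x in order:
        # Cost of a bank = co-access weight with arrays already placed there.
        cost = [0] * num_banks
        for y, bank in bank_of.items():
            cost[bank] += w(x, y)
        bank_of[x] = min(range(num_banks), key=lambda b: (cost[b], b))
    return bank_of


def remaining_conflicts(profile, bank_of):
    """Count profiled iterations in which two accessed arrays share a bank."""
    return sum(
        1 for accessed in profile
        if len({bank_of[a] for a in accessed}) < len(accessed)
    )


profile = [{"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B"}, {"D"}]
banks = assign_banks(profile, num_banks=2)
print(banks, remaining_conflicts(profile, banks))
# {'A': 0, 'B': 1, 'C': 0, 'D': 0} 1
```

With two banks, the three mutually co-accessed arrays A, B, and C cannot all be separated, so the greedy pass leaves the least frequent pair (A, C) sharing a bank; with three banks, no profiled iteration has a bank conflict. A greedy pass is used here only for brevity: finding an optimal assignment is a weighted graph-coloring (max-k-cut) problem, which is NP-hard in general.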


References

[1] Yunheung Paek, Jay Hoeflinger, and David Padua, "Simplification of array access patterns for compiler optimizations", In Proceedings of PLDI'98, pp. 60-71, 1998.
[2] Jean-Francois Collard and Daniel Lavery, "Optimizations to prevent cache penalties for the Intel Itanium 2 processor", In Proceedings of CGO'03, pp. 105-114, 2003.
[3] P. Grun, N. Dutt, and A. Nicolau, "Access pattern based local memory customization for low power embedded systems", In Proceedings of DATE, pp. 778-784.
[4] M. Gupta and P. Banerjee, "Demonstration of automatic data partitioning techniques for parallelizing compilers on multicomputers", IEEE Transactions on Parallel and Distributed Systems, 3(2):179-193, 1992. https://doi.org/10.1109/71.127259
[5] Hartej Singh, Guangming Lu, Eliseu Filho, Rafael Maestre, Ming-Hau Lee, Fadi Kurdahi, and Nader Bagherzadeh, "MorphoSys: case study of a reconfigurable computing system targeting multimedia applications", In Proceedings of DAC, pp. 573-578, 2000.
[6] M. Wolfe, "More iteration space tiling", In Proceedings of the ACM/IEEE Conference on Supercomputing (Supercomputing'89), pp. 655-664, 1989.
[7] Nainesh Agarwal and Nikitas Dimopoulos, "DSPstone benchmark of CoDeL's automated clock gating platform", In Proceedings of the IEEE VLSI, pp. 508-509, 2007.
[8] M. R. Guthaus, J. S. Ringenberg, D. Ernst, T. M. Austin, T. Mudge, and R. B. Brown, "MiBench: A free, commercially representative embedded benchmark suite", In Proceedings of WWC-4, 2001.
[9] ICD-C compiler framework, University of Dortmund, http://www.icd.de/es/icd-c/
[10] Yoonjin Kim, Mary Kiemb, Chulsoo Park, Jinyong Jung, and Kiyoung Choi, "Resource sharing and pipelining in coarse-grained reconfigurable architecture for domain-specific optimization", In Proceedings of DATE'05, pp. 12-17, 2005.
[11] A. Hatanaka and N. Bagherzadeh, "A modulo scheduling algorithm for a coarse-grain reconfigurable array template", In Proceedings of IPDPS'07, pp. 1-8, 2007.
[12] Hyunchul Park, Kevin Fan, Manjunath Kudlur, and Scott Mahlke, "Modulo graph embedding: mapping applications onto coarse-grained reconfigurable architectures", In Proceedings of CASES'06, pp. 136-146, 2006.
[13] Kathryn McKinley and Steve Carr, "Improving data locality with loop transformations", ACM Transactions on Programming Languages and Systems, 18:424-453, 1996. https://doi.org/10.1145/233561.233564
[14] B. Mei, S. Vernalde, D. Verkest, H. De Man, and R. Lauwereins, "ADRES: An architecture with tightly coupled VLIW processor and coarse-grained reconfigurable matrix", In Proceedings of FPL'03, pp. 61-70, 2003.
[15] Michael Joseph Wolfe, "High Performance Compilers for Parallel Computing", Addison-Wesley Longman Publishing Co., USA, 1995.
[16] Wei Li, "Compiling for NUMA parallel machines", PhD thesis, Ithaca, NY, USA, 1993.
[17] Michael E. Wolf and Monica S. Lam, "A data locality optimizing algorithm", In Proceedings of ACM SIGPLAN 1991, pp. 30-44.
[18] Michael E. Wolf, Dror E. Maydan, and Ding-Kai Chen, "Combining loop transformations considering caches and scheduling", In Proceedings of MICRO-29, pp. 274-286, 1996.
[19] Daniel Edward Lenoski, "The design and analysis of DASH: a scalable directory-based multiprocessor", PhD thesis, Stanford, CA, USA, 1992.
[20] Kai Li, "Shared virtual memory on loosely coupled multiprocessors", PhD thesis, 1986.
[21] S. Lumetta, L. Murphy, X. Li, D. Culler, and I. Khalil, "Decentralized optimal power pricing: The development of a parallel program", In IEEE Parallel and Distributed Technology, pp. 240-249, 1993.
[22] V. Balasundaram and K. Kennedy, "A technique for summarizing data access and its use in parallelism enhancing transformations", In Proceedings of ACM SIGPLAN 1989, pp. 41-53.
[23] Chau-Wen Tseng, "Compiler optimizations for eliminating barrier synchronization", ACM SIGPLAN, pp. 144-155, 1995.
