Scalable Application Mapping for SIMD Reconfigurable Architecture
Kim, Yongjoo; Lee, Jongeun; Lee, Jinyong; Paek, Yunheung
Coarse-Grained Reconfigurable Architecture (CGRA) is a promising platform that offers fast turnaround time as well as very high energy efficiency for multimedia applications. One problem with CGRAs, however, is application mapping, which currently does not scale well as the number of cores grows geometrically. To mitigate this scalability problem, this paper discusses how to apply the SIMD (Single Instruction Multiple Data) paradigm to CGRAs. While the idea of SIMD is not new, it complicates the mapping problem by adding iteration mapping as a further dimension to the already complex, interdependent problems of operation mapping and data mapping, and it can therefore significantly affect performance through memory bank conflicts. Based on a new architecture called SIMD reconfigurable architecture, which allows SIMD execution at multiple levels of granularity, we show how to minimize bank conflicts by considering all three related sub-problems together, for various RA organizations. We also present data tiling and evaluate a conflict-free scheduling algorithm as a way to eliminate bank conflicts for a certain class of mapping problems.
Coarse-grained reconfigurable architecture; application mapping; memory bank conflict; SIMD
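The interaction between iteration mapping, data mapping, and bank conflicts described in the abstract can be illustrated with a toy model. The sketch below is not the paper's algorithm: it assumes a hypothetical machine with 4 SIMD lanes, 4 single-ported memory banks with low-order interleaving, and a loop body that reads `a[i]` once per iteration. All function names and the `tiled_layout` remapping are illustrative assumptions.

```python
from collections import Counter

NUM_LANES, NUM_BANKS, N = 4, 4, 64   # assumed toy configuration
CHUNK = N // NUM_LANES               # iterations per lane under blocked mapping


def bank_of(address, num_banks=NUM_BANKS):
    """Assumed low-order interleaving: consecutive words sit in consecutive banks."""
    return address % num_banks


def stall_cycles(addresses):
    """Extra cycles in one step when several lanes hit the same single-ported bank."""
    load = Counter(bank_of(a) for a in addresses)
    return max(load.values()) - 1


def total_stalls(lane_to_iteration, address_of):
    """Sum the stalls over all steps of a loop that reads a[i] once per iteration."""
    return sum(
        stall_cycles(
            [address_of(lane_to_iteration(lane, step)) for lane in range(NUM_LANES)]
        )
        for step in range(CHUNK)
    )


# Iteration mappings (which iteration a lane executes at a given step).
def interleaved(lane, step):
    return step * NUM_LANES + lane   # lane l runs l, l+4, l+8, ...


def blocked(lane, step):
    return lane * CHUNK + step       # lane l runs one contiguous chunk


# Data mappings (where element i of the array is stored).
def linear_layout(i):
    return i                         # a[i] stored at address i


def tiled_layout(i):
    # Hypothetical remapping: spread each lane's chunk across the banks.
    return (i % CHUNK) * NUM_LANES + i // CHUNK


print(total_stalls(interleaved, linear_layout))  # 0
print(total_stalls(blocked, linear_layout))      # 48
print(total_stalls(blocked, tiled_layout))       # 0
```

In this model the same loop is conflict-free or stalls three cycles on every step depending only on how iterations are assigned to lanes, and a change of data layout removes the stalls again. This is one simplified way to see why the three mapping sub-problems have to be considered together.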