Design of a Parallel Rendering Processor Architecture with Effective Memory System

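  • 박우찬 (Dept. of Computer Engineering, Sejong University) ;
  • 윤덕기 (Dept. of Computer Engineering, Sejong University) ;
  • 김경수 (Dept. of Computer Engineering, Sejong University)
  • Published: 2006.08.01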

Abstract

Current rendering processors are organized mainly to process a single triangle as fast as possible; recently, parallel 3D rendering processors, which process multiple triangles in parallel with multiple rasterizers, have begun to appear. For high triangle-processing performance, it is desirable for each rasterizer to have its own local pixel cache. However, a consistency problem may occur when more than one rasterizer accesses data at the same address simultaneously. In this paper, we propose a parallel rendering processor architecture that resolves this consistency problem effectively. The proposed architecture also significantly reduces the latency caused by pixel cache misses. To achieve these two goals, we present effective memory organizations, including a new pixel cache architecture. Experimental results show that the proposed architecture achieves an almost linear speedup in the best case, even with sixteen rasterizers.
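The consistency problem can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the paper's architecture: each hypothetical `Rasterizer` owns a private write-back pixel cache holding `(depth, color)` pairs, performs a less-than Z-test against its cached copy, and writes back on `flush` with no coherence protocol. When two rasterizers both cache the same pixel address, the later write-back silently overwrites the nearer fragment. For contrast, `render_owned` routes every fragment to the rasterizer that owns its address (`addr % n`), a common address-interleaving remedy swapped in here for illustration; it is not claimed to be the scheme proposed in the paper.

```python
from dataclasses import dataclass, field

FAR = float("inf")  # depth value of a cleared pixel


@dataclass
class Rasterizer:
    """Hypothetical rasterizer with a private, write-back pixel cache."""
    cache: dict = field(default_factory=dict)  # address -> (depth, color)

    def shade(self, memory, addr, depth, color):
        # On a miss, copy the pixel from frame memory into the private cache.
        if addr not in self.cache:
            self.cache[addr] = memory[addr]
        cached_depth, _ = self.cache[addr]
        # Z-test against the cached copy, which may be stale.
        if depth < cached_depth:
            self.cache[addr] = (depth, color)

    def flush(self, memory):
        # Write every cached pixel back; no coherence protocol is modelled.
        memory.update(self.cache)
        self.cache.clear()


def new_frame(size):
    """Cleared frame memory: every address starts far away and uncolored."""
    return {addr: (FAR, "background") for addr in range(size)}


def render_naive(fragments, n, size=4):
    """Fragment i goes to rasterizer i % n, regardless of its pixel address."""
    memory = new_frame(size)
    units = [Rasterizer() for _ in range(n)]
    for i, (addr, depth, color) in enumerate(fragments):
        units[i % n].shade(memory, addr, depth, color)
    for unit in units:
        unit.flush(memory)
    return memory


def render_owned(fragments, n, size=4):
    """Fragment goes to rasterizer addr % n, so each address lives in one cache."""
    memory = new_frame(size)
    units = [Rasterizer() for _ in range(n)]
    for addr, depth, color in fragments:
        units[addr % n].shade(memory, addr, depth, color)
    for unit in units:
        unit.flush(memory)
    return memory


# Two triangles cover the same pixel (address 0); red is nearer than blue.
frags = [(0, 0.2, "red"), (0, 0.5, "blue")]
print(render_naive(frags, 2)[0])  # (0.5, 'blue'): the nearer red fragment is lost
print(render_owned(frags, 2)[0])  # (0.2, 'red'): matches a sequential Z-buffer
```

Address ownership removes the conflict because no address is ever held in two caches, but it ties each pixel to one rasterizer and can therefore unbalance the load when triangles cluster on screen; a design that lets any rasterizer cache any pixel must instead keep the caches consistent.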

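Almost all current 3D graphics processors are structured to process a single triangle quickly, and processors that can process multiple triangles in parallel are expected to appear. To process triangles with high performance, each rasterizer should have its own pixel cache. However, when triangles are processed in parallel, a consistency problem can arise between the individual processors and the frame memory. In this paper, we propose a parallel rendering processor that lets each graphics accelerator use a pixel cache, increasing performance while solving the consistency problem. The proposed architecture also reduces the latency caused by pixel cache misses. To achieve these two results, an effective memory structure is combined with a new pixel cache architecture. Experimental results show that the proposed architecture achieves a nearly linear speedup with 16 or more rasterizers.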
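For reference, "linear speedup" is usually read through the standard definitions below, where $T(n)$ is the time to render a workload with $n$ rasterizers. These are textbook definitions, not formulas or measurements taken from the paper's evaluation:

```latex
S(n) = \frac{T(1)}{T(n)}, \qquad E(n) = \frac{S(n)}{n}, \qquad
\text{linear speedup: } S(n) \approx n \iff E(n) \approx 1
```

Under these definitions, an almost linear speedup with sixteen rasterizers means $S(16)$ close to 16, i.e. parallel efficiency $E(16)$ close to 1.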
