Low Power TLB Supporting Multiple Page Sizes without Operating System

운영체제 도움 없이 멀티 페이지를 지원하는 저전력 TLB 구조

  • Jung, Bo-Sung (ERI, Dept. of Control & Instrumentation, Gyeongsang National University) ;
  • Lee, Jung-Hoon (ERI, Dept. of Control & Instrumentation, Gyeongsang National University)
  • 정보성 (국립경상대학교 제어계측공학과) ;
  • 이정훈 (국립경상대학교 제어계측공학과)
  • Received : 2013.06.04
  • Accepted : 2013.11.11
  • Published : 2013.12.31

Abstract

Although TLBs that support multiple page sizes are effective in improving performance, the conventional approach relies on OS support and cannot make multiple page sizes available to user applications. We therefore propose a new multiple-TLB structure that supports multiple page sizes for high performance and low power consumption without any operating system support. The proposed TLB is organized into two parts: an S-TLB (Small TLB) with a small page size and an L-TLB (Large TLB) with a large page size. Both are designed as fully associative bank structures. The S-TLB stores small pages evicted from the L-TLB, and the L-TLB stores large pages that include a small page requested by the CPU. Only one bank of the S-TLB and one bank of the L-TLB is accessed per lookup, selected by one and two particular bits, respectively, of the virtual address generated by the CPU. Energy savings are achieved by reducing the number of entries accessed at a time. This paper also proposes a simple 1-bit LRU policy to improve performance. The policy marks recently referenced blocks using one additional bit in each TLB entry, which makes it simple to select a least recently used page from the L-TLB. According to the simulation results, the proposed TLB reduces the energy-delay product (Energy * Delay) by about 76%, 57%, and 6% compared with a fully associative TLB, an ARM TLB, and a Dual TLB, respectively.

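Although a multi-page TLB is effective in improving performance, the existing approach that relies on operating system support has the critical drawback that user applications cannot use multiple page sizes. This paper therefore proposes a new multi-TLB structure that achieves high performance and low power by using multiple page sizes without operating system support. The proposed TLB consists of a TLB for small pages and a TLB for large pages, both of which have a fully associative bank structure. The S-TLB (Small TLB), which supports small pages, stores small pages extracted from the L-TLB (Large TLB), which supports large pages, and the L-TLB stores the large virtual page address that contains the small page requested by the CPU. Using one particular bit and two particular bits of the virtual address requested by the CPU, only one bank each of the S-TLB and the L-TLB is accessed, and energy consumption is reduced because fewer entries are accessed simultaneously. In addition, this paper proposes a simple 1-bit LRU policy for effective performance improvement. The proposed LRU policy uses one additional bit in each TLB entry to indicate recently referenced blocks. With this method, the most recently referenced page can be selected simply from the L-TLB. According to the simulation results, the proposed structure reduces energy * delay by 76%, 57%, and 6% compared with a fully associative TLB, a Dual TLB, and an ARM TLB.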
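
The bank selection and 1-bit replacement described in the abstract can be illustrated with a rough Python sketch. This is not the authors' implementation. The abstract does not state the page sizes, which address bits select a bank, the number of entries per bank, or the exact rule for updating the extra bit, so all of the following are assumptions made only for illustration: 4 KB and 64 KB pages, bank selection by the lowest bits of the virtual page number, and a rule that clears every other bit in a full bank once all bits are set. The names `BankedTLB`, `Entry`, `bank_index`, `lookup`, and `insert` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional

SMALL_PAGE_SHIFT = 12  # assumption: 4 KB small pages
LARGE_PAGE_SHIFT = 16  # assumption: 64 KB large pages


@dataclass
class Entry:
    vpn: int              # virtual page number
    ppn: int              # physical page number
    recent: bool = True   # the single "recently referenced" bit


@dataclass
class BankedTLB:
    page_shift: int
    bank_bits: int         # 1 for the S-TLB, 2 for the L-TLB
    entries_per_bank: int  # assumption: not given in the abstract
    banks: List[List[Entry]] = field(init=False)

    def __post_init__(self) -> None:
        self.banks = [[] for _ in range(1 << self.bank_bits)]

    def bank_index(self, vaddr: int) -> int:
        # Assumption: the low bits of the virtual page number pick the bank.
        return (vaddr >> self.page_shift) & ((1 << self.bank_bits) - 1)

    def _touch(self, bank: List[Entry], entry: Entry) -> None:
        entry.recent = True
        # Once every entry in a full bank is marked, keep only the newest mark
        # so that a victim candidate always exists.
        if len(bank) == self.entries_per_bank and all(e.recent for e in bank):
            for e in bank:
                e.recent = e is entry

    def lookup(self, vaddr: int) -> Optional[int]:
        # Only the selected bank is searched, so fewer entries are compared.
        bank = self.banks[self.bank_index(vaddr)]
        vpn = vaddr >> self.page_shift
        for e in bank:
            if e.vpn == vpn:
                self._touch(bank, e)
                return e.ppn
        return None

    def insert(self, vaddr: int, ppn: int) -> Optional[Entry]:
        # Returns the evicted entry, if any, so a caller can reuse it.
        bank = self.banks[self.bank_index(vaddr)]
        victim = None
        if len(bank) == self.entries_per_bank:
            # Evict the first entry whose bit is clear (fallback: first entry).
            victim = next((e for e in bank if not e.recent), bank[0])
            bank.remove(victim)
        entry = Entry(vaddr >> self.page_shift, ppn)
        bank.append(entry)
        self._touch(bank, entry)
        return victim


s_tlb = BankedTLB(SMALL_PAGE_SHIFT, bank_bits=1, entries_per_bank=2)
l_tlb = BankedTLB(LARGE_PAGE_SHIFT, bank_bits=2, entries_per_bank=2)
```

In this sketch, `insert` returns the evicted entry. A fuller model could use the entry evicted from `l_tlb` to fill `s_tlb`, mirroring the abstract's statement that the S-TLB stores small pages evicted from the L-TLB. That transfer logic is left out because the abstract does not describe it in detail.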
