An Improved Dynamic Branch Predictor by Selective Access of a Specific Element in 4-Way Cache
Hwang, In-Sung; Hwang, Sun-Young
This paper proposes an improved branch predictor that reduces the number of execution cycles of applications by selectively accessing a specific element in a 4-way set-associative cache. When a branch instruction is fetched, the proposed predictor refers to an MRU buffer to select one element of the cache and acquires the branch target address from that element. Compared with a previous branch predictor that accesses all elements, the proposed predictor can hold more BTAC entries under a restricted power budget, which considerably improves both the branch prediction rate and application execution speed. The effectiveness of the proposed dynamic branch predictor is verified by executing benchmark applications on a core simulator. Experimental results show that the number of execution cycles decreases by an average of 10.1%, while power consumption increases by an average of 7.4%, compared with a core without a dynamic branch predictor. Compared with a core that employs the previous dynamic branch predictor, execution cycles are reduced by 4.1%.
Keywords: Embedded System; Branch Prediction; MRU Buffer; BTAC; MDL
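The abstract does not specify the predictor's exact organization, so the following is only a minimal Python sketch of the general idea under stated assumptions: the MRU buffer holds one way index per set, a lookup reads only the way that index points to, the set index and tag come from the PC with 4-byte instructions assumed, and the replacement choice is a simple placeholder rather than the authors' policy. The class and method names (`MRUSelectiveBTAC`, `predict`, `update`) are hypothetical. The sketch illustrates the tradeoff: each lookup touches one way instead of four, but a target stored in a non-MRU way is missed.

```python
from dataclasses import dataclass
from typing import Optional

NUM_WAYS = 4  # 4-way set-associative, as in the abstract


@dataclass
class BTACEntry:
    valid: bool = False
    tag: int = 0
    target: int = 0


class MRUSelectiveBTAC:
    """Hypothetical 4-way BTAC that reads only the MRU way on each lookup."""

    def __init__(self, num_sets: int):
        self.num_sets = num_sets
        self.ways = [[BTACEntry() for _ in range(NUM_WAYS)] for _ in range(num_sets)]
        self.mru = [0] * num_sets  # assumed MRU buffer: one way index per set
        self.way_reads = 0  # ways read so far; a conventional BTAC would read NUM_WAYS per lookup

    def _index_tag(self, pc: int) -> tuple:
        word = pc >> 2  # assumption: 4-byte aligned instructions
        return word % self.num_sets, word // self.num_sets

    def predict(self, pc: int) -> Optional[int]:
        idx, tag = self._index_tag(pc)
        entry = self.ways[idx][self.mru[idx]]  # read only the way the MRU buffer selects
        self.way_reads += 1
        if entry.valid and entry.tag == tag:
            return entry.target
        return None  # no prediction; the other three ways are not checked

    def update(self, pc: int, target: int) -> None:
        idx, tag = self._index_tag(pc)
        ways = self.ways[idx]
        for w, e in enumerate(ways):
            if e.valid and e.tag == tag:
                break  # reuse the way that already holds this branch
        else:
            # Placeholder replacement: first invalid way, else the way after the MRU way.
            w = next((i for i, e in enumerate(ways) if not e.valid),
                     (self.mru[idx] + 1) % NUM_WAYS)
        ways[w] = BTACEntry(True, tag, target)
        self.mru[idx] = w  # the resolved branch's way becomes the MRU way


if __name__ == "__main__":
    btac = MRUSelectiveBTAC(num_sets=64)
    btac.update(0x1000, 0x2000)
    print(hex(btac.predict(0x1000)))  # 0x2000
    btac.update(0x1100, 0x3000)       # same set (index 0), different tag
    print(btac.predict(0x1000))       # None: its entry is no longer in the MRU way
    print(btac.way_reads)             # 2
```

In this sketch, 0x1000 and 0x1100 map to the same set, so after the second update the MRU buffer points away from the first branch and its lookup misses even though the entry is still stored. Whether the actual design re-probes the remaining ways on such a miss is not stated in the abstract.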