Trace cache: a low latency approach to high bandwidth instruction fetching

Authors:

Eric Rotenberg,

James E. Smith

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

Pages 24 - 35

Published: 02 December 1996

Abstract

As the issue width of superscalar processors is increased, instruction fetch bandwidth requirements will also increase. It will become necessary to fetch multiple basic blocks per cycle. Conventional instruction caches hinder this effort because long instruction sequences are not always in contiguous cache locations. We propose supplementing the conventional instruction cache with a trace cache. This structure caches traces of the dynamic instruction stream, so instructions that are otherwise noncontiguous appear contiguous. For the Instruction Benchmark Suite (IBS) and SPEC92 integer benchmarks, a 4 kilobyte trace cache improves performance on average by 28% over conventional sequential fetching. Further, it is shown that the trace cache's efficient, low latency approach enables it to outperform more complex mechanisms that work solely out of the instruction cache.
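
The idea in the abstract can be illustrated with a toy model. The sketch below is not the authors' design: the abstract does not say how traces are indexed or bounded, so the lookup key (start address plus branch outcomes), the limits of 16 instructions and 3 branches, and all names (`ToyTraceCache`, `sequential_fetch`, `stream`) are assumptions made for illustration only.

```python
from typing import Dict, List, Optional, Tuple

# One dynamic instruction: (address, outcome). The outcome is None for a
# non-branch, True for a taken branch, False for a not-taken branch.
DynInstr = Tuple[int, Optional[bool]]


def sequential_fetch(stream: List[DynInstr], width: int) -> List[int]:
    """Conventional fetch sketch: deliver up to `width` instructions, but
    stop after the first taken branch, because the instructions that follow
    it are not contiguous in the instruction cache."""
    fetched = []
    for pc, outcome in stream[:width]:
        fetched.append(pc)
        if outcome is True:
            break
    return fetched


class ToyTraceCache:
    """Toy trace cache (assumed organization, for illustration only)."""

    def __init__(self, max_len: int = 16, max_branches: int = 3) -> None:
        self.max_len = max_len            # assumed trace length limit
        self.max_branches = max_branches  # assumed branch limit per trace
        self.lines: Dict[Tuple[int, Tuple[bool, ...]], List[int]] = {}

    def fill(self, stream: List[DynInstr]) -> None:
        """Store one trace taken from the head of the dynamic stream."""
        pcs: List[int] = []
        outcomes: List[bool] = []
        for pc, outcome in stream[: self.max_len]:
            pcs.append(pc)
            if outcome is not None:
                outcomes.append(outcome)
                if len(outcomes) == self.max_branches:
                    break
        if pcs:
            self.lines[(pcs[0], tuple(outcomes))] = pcs

    def lookup(self, fetch_pc: int, predictions: List[bool]) -> Optional[List[int]]:
        """Return a stored trace that starts at fetch_pc and whose recorded
        branch outcomes match a prefix of the predictions, else None."""
        for n in range(len(predictions), -1, -1):
            trace = self.lines.get((fetch_pc, tuple(predictions[:n])))
            if trace is not None:
                return trace
        return None


# Hypothetical dynamic stream: the taken branch at 0x104 jumps to 0x200.
stream = [(0x100, None), (0x104, True), (0x200, None), (0x204, False), (0x208, None)]
cache = ToyTraceCache()
cache.fill(stream)
print([hex(pc) for pc in sequential_fetch(stream, 16)])
print([hex(pc) for pc in cache.lookup(0x100, [True, False])])
```

In this model, sequential fetch stops at the taken branch after two instructions, while a trace cache hit returns all five instructions across two basic-block boundaries in a single access. If the predicted outcomes do not match a stored trace, `lookup` returns `None`, and a real design would fall back to the conventional instruction cache.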

Index Terms

Trace cache: a low latency approach to high bandwidth instruction fetching
1. Computer systems organization
  1. Architectures
    1. Parallel architectures
      1. Very long instruction word
    2. Serial architectures
      1. Complex instruction set computing
      2. Reduced instruction set computing
2. Hardware
  1. Integrated circuits
    1. Semiconductor memory
      1. Dynamic memory

Published In

MICRO 29: Proceedings of the 29th annual ACM/IEEE international symposium on Microarchitecture

December 1996

359 pages

ISBN:0818676418

Chairmen: Stephen Melvin (Zytek Communications Corp.), Steve Beaty (Hewlett-Packard Corp.)

Copyright © 1996 Institute of Electrical and Electronics Engineers, Inc. All rights reserved.

Sponsors

SIGMICRO: ACM Special Interest Group on Microarchitectural Research and Processing
IEEE-CS\TCMM: TC on Microprocessors & Microcomputers

Publisher

IEEE Computer Society

United States

Qualifiers

Article

Conference

MICRO96

MICRO96: 29th Annual International Symposium on Microarchitecture

December 2 - 4, 1996

Paris, France

Acceptance Rates

Overall Acceptance Rate 484 of 2,242 submissions, 22%
