9/14/15
Memory: Chapter 2 & Appendix B, Part 3
Gregory D. Peterson
gdp@utk.edu
Introduction
Memory Hierarchy Basics
• Q1: Where can a block be placed in the upper level? (Block placement)
• Q2: How is a block found if it is in the upper level? (Block identification)
• Q3: Which block should be replaced on a miss? (Block replacement)
• Q4: What happens on a write? (Write strategy)

Memory Hierarchy Basics
• Six basic cache optimizations:
  – Larger block size
    • Reduces compulsory misses
    • Increases capacity and conflict misses, increases miss penalty
  – Larger total cache capacity to reduce miss rate
    • Increases hit time, increases power consumption
  – Higher associativity
    • Reduces conflict misses
    • Increases hit time, increases power consumption
  – Higher number of cache levels
    • Reduces overall memory access time
  – Giving priority to read misses over writes
    • Reduces miss penalty
  – Avoiding address translation in cache indexing
    • Reduces hit time
Copyright © 2012, Elsevier Inc. All rights reserved.
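Q1 and Q2 have a concrete answer for a direct-mapped cache: the block address is split into tag, index, and offset fields. A minimal sketch, using illustrative sizes (2-word blocks, 8 sets) that are assumptions rather than values from the slides:

```python
BLOCK_SIZE = 2    # words per block (assumption)
NUM_SETS = 8      # direct-mapped: one block per set (assumption)

def decompose(word_addr):
    """Return (tag, index, offset) for a word address."""
    offset = word_addr % BLOCK_SIZE       # word within the block
    block_addr = word_addr // BLOCK_SIZE  # which memory block
    index = block_addr % NUM_SETS         # Q1: the one set it can occupy
    tag = block_addr // NUM_SETS          # Q2: stored tag identifies the block
    return tag, index, offset
```

For example, word addresses 4 and 36 produce different tags but the same index, so in a direct-mapped cache they conflict for the same set.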
Advanced Optimizations

Ten Advanced Optimizations

L1 Size and Associativity
• Small and simple first level caches
  – Critical timing path:
    • addressing tag memory, then
    • comparing tags, then
    • selecting correct set
  – Direct-mapped caches can overlap tag compare and transmission of data
  – Lower associativity reduces power because fewer cache lines are accessed
Figure: access time vs. size and associativity
L1 Size and Associativity
Figure: energy per read vs. size and associativity

Cache Example
• Assume a direct-mapped cache with 16 words and a block size of 2 words.
• Which of these accesses hit and which miss, and what are the final contents, after the following word addresses?
• 4, 36, 4, 13, 7, 12, 15, 11, 8, 56, 27, 21, 12
• What if the cache is 2-way set associative?
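One way to check the exercise is to simulate it. The sketch below assumes word addressing and LRU replacement; `ways=1` gives the direct-mapped case, `ways=2` the 2-way case:

```python
def simulate(trace, total_words=16, block_words=2, ways=1):
    """Count (hits, misses) for a word-address trace on a small cache."""
    num_sets = total_words // block_words // ways
    sets = [[] for _ in range(num_sets)]   # each set: list of tags, MRU last
    hits = 0
    for addr in trace:
        block = addr // block_words
        idx, tag = block % num_sets, block // num_sets
        s = sets[idx]
        if tag in s:
            hits += 1
            s.remove(tag)       # refresh LRU position
        elif len(s) == ways:
            s.pop(0)            # evict the least recently used tag
        s.append(tag)
    return hits, len(trace) - hits

trace = [4, 36, 4, 13, 7, 12, 15, 11, 8, 56, 27, 21, 12]
```

Under these assumptions the direct-mapped cache hits only on the two accesses to address 12 (2 hits, 11 misses), while 2-way associativity also turns the second access to address 4 into a hit (3 hits, 10 misses), since 4 and 36 no longer evict each other.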
Way Prediction
• To improve hit time, predict the way to pre-set the mux
  – Mis-prediction gives longer hit time
  – Prediction accuracy
    • > 90% for two-way
    • > 80% for four-way
    • I-cache has better accuracy than D-cache
  – First used on MIPS R10000 in mid-90s
  – Used on ARM Cortex-A8
• Extend to predict block as well
  – “Way selection”
  – Increases mis-prediction penalty

Pipelining Cache
• Pipeline cache access to improve bandwidth
  – Examples:
    • Pentium: 1 cycle
    • Pentium Pro – Pentium III: 2 cycles
    • Pentium 4 – Core i7: 4 cycles
• Increases branch mis-prediction penalty
• Makes it easier to increase associativity
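A minimal sketch of the idea, assuming a simple most-recently-used predictor per set (real predictors typically hash address or PC bits; this is an illustration, not any specific machine's design):

```python
class WayPredictor:
    """Predict which way of a set-associative cache will hit next."""

    def __init__(self, num_sets):
        # Assumption: predict the most recently used way of each set.
        self.last_way = [0] * num_sets

    def predict(self, set_index):
        return self.last_way[set_index]

    def update(self, set_index, actual_way):
        self.last_way[set_index] = actual_way
```

On a correct prediction only the predicted way's tag and data arrays are read, saving power and the way-select mux delay; a mis-prediction costs extra time to probe the remaining ways, which is why the prediction accuracies above matter.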
Nonblocking Caches
• Allow hits before previous misses complete
  – “Hit under miss”
  – “Hit under multiple miss”
• L2 must support this
• In general, processors can hide L1 miss penalty but not L2 miss penalty

Multibanked Caches
• Organize cache as independent banks to support simultaneous access
  – ARM Cortex-A8 supports 1-4 banks for L2
  – Intel i7 supports 4 banks for L1 and 8 banks for L2
• Interleave banks according to block address
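Block-address interleaving can be sketched in one line: consecutive blocks are striped round-robin across banks, so a sequential stream keeps all banks busy at once. The bank count and block size below are illustrative assumptions:

```python
NUM_BANKS = 4      # assumption: 4 banks, as in the i7 L1 above
BLOCK_BYTES = 64   # assumption: 64-byte cache blocks

def bank_of(byte_addr):
    """Map a byte address to its cache bank by block address."""
    block_addr = byte_addr // BLOCK_BYTES
    return block_addr % NUM_BANKS
```

Consecutive blocks at byte addresses 0, 64, 128, 192 land in banks 0, 1, 2, 3, so four sequential block accesses can proceed simultaneously.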
Critical Word First, Early Restart
• Critical word first
  – Request missed word from memory first
  – Send it to the processor as soon as it arrives
• Early restart
  – Request words in normal order
  – Send missed word to the processor as soon as it arrives
• Effectiveness of these strategies depends on block size and likelihood of another access to the portion of the block that has not yet been fetched

Merging Write Buffer
• When storing to a block that is already pending in the write buffer, update the write buffer
• Reduces stalls due to a full write buffer
• Do not apply to I/O addresses
Figure: write buffer contents without and with write merging
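The merging rule can be sketched as a map from block address to pending words; a store to an already-pending block merges into the existing entry instead of consuming a new one. Word-granularity writes, 4-word blocks, and an unbounded buffer are simplifying assumptions here (real buffers have a handful of entries):

```python
BLOCK_WORDS = 4   # assumption: 4-word write buffer entries

class MergingWriteBuffer:
    def __init__(self):
        self.entries = {}   # block address -> {word offset: value}

    def write(self, word_addr, value):
        block = word_addr // BLOCK_WORDS
        offset = word_addr % BLOCK_WORDS
        # A store to a block already pending merges into that entry
        # instead of allocating a new one.
        self.entries.setdefault(block, {})[offset] = value

    def occupancy(self):
        return len(self.entries)
```

Four stores to consecutive addresses in one block occupy a single entry rather than four, which is exactly why the buffer fills more slowly and stalls less.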
Compiler Optimizations
• Loop Interchange
  – Swap nested loops to access memory in sequential order
• Blocking
  – Instead of accessing entire rows or columns, subdivide matrices into blocks
  – Requires more memory accesses but improves locality of accesses

Hardware Prefetching
• Fetch two blocks on a miss (include the next sequential block)
Figure: Pentium 4 pre-fetching
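Both compiler transformations can be sketched directly. This is illustrative Python (the cache benefit really shows up in compiled code over row-major arrays, but the access patterns are the same):

```python
# Loop interchange: make the innermost loop walk along rows (stride-1
# in row-major storage) instead of down columns.
def scale_rowwise(x, n):
    for i in range(n):          # was: for j ... for i ... x[i][j]
        for j in range(n):
            x[i][j] *= 2

# Blocking: multiply in bs-by-bs tiles so each tile of A, B, and C
# stays cache-resident while it is reused.
def blocked_matmul(A, B, n, bs=2):
    C = [[0] * n for _ in range(n)]
    for ii in range(0, n, bs):
        for jj in range(0, n, bs):
            for kk in range(0, n, bs):
                for i in range(ii, min(ii + bs, n)):
                    for j in range(jj, min(jj + bs, n)):
                        for k in range(kk, min(kk + bs, n)):
                            C[i][j] += A[i][k] * B[k][j]
    return C
```

The blocked version performs the same arithmetic as a naive triple loop, only in a different order, so the result is identical while each element is reused while still cached.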
Compiler Prefetching
• Insert prefetch instructions before data is needed
• Non-faulting: prefetch doesn’t cause exceptions
• Register prefetch
  – Loads data into register
• Cache prefetch
  – Loads data into cache
• Combine with loop unrolling and software pipelining
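The cache-prefetch payoff can be illustrated with a toy direct-mapped cache and a streaming loop that issues a non-faulting prefetch one block ahead of each demand access. All parameters here are illustrative assumptions; in C this would use a compiler intrinsic such as GCC's `__builtin_prefetch`:

```python
BLOCK_WORDS = 4     # assumed block size, in words
NUM_SETS = 64       # assumed cache size (large enough to avoid conflicts)

class ToyCache:
    def __init__(self):
        self.tags = [None] * NUM_SETS
        self.demand_misses = 0

    def _lookup(self, word_addr, is_prefetch):
        block = word_addr // BLOCK_WORDS
        idx, tag = block % NUM_SETS, block // NUM_SETS
        if self.tags[idx] != tag:
            if not is_prefetch:
                self.demand_misses += 1   # a prefetch miss does not stall
            self.tags[idx] = tag          # fill the block either way

    def access(self, addr):
        self._lookup(addr, is_prefetch=False)

    def prefetch(self, addr):
        self._lookup(addr, is_prefetch=True)

def stream_sum(n, use_prefetch):
    """Sequentially touch words 0..n-1; return demand misses."""
    cache = ToyCache()
    for a in range(n):
        if use_prefetch:
            cache.prefetch(a + BLOCK_WORDS)   # one block ahead
        cache.access(a)
    return cache.demand_misses
```

Over a 64-word stream the plain loop takes a demand miss on every block (16 misses), while the prefetching loop misses only on the very first block, since every later block is fetched before it is needed.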
Summary