Implementation of a Set-Associative Cache
[Figure: 4-way set-associative cache built from four comparators and one 4-to-1 multiplexor. The 32-bit address is split into a 22-bit tag (bits 31-10) and an 8-bit index (bits 9-2); the index selects one of 256 sets (0-255), each holding four (V, Tag, Data) entries whose tags are compared in parallel, and the multiplexor steers out the matching way's data along with the Hit signal. Size of cache is 1K blocks = 256 sets * 4-block set size.]
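As a rough illustration, here is a minimal C sketch of the lookup path for this organization (cache_t, line_t, and cache_lookup are illustrative names, not from the figure):

#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256                      /* 8-bit index */
#define NUM_WAYS 4                        /* 4-way set-associative */

typedef struct {
    bool     valid;                       /* V bit */
    uint32_t tag;                         /* 22-bit tag */
    uint32_t data;                        /* one data word per block */
} line_t;

typedef struct {
    line_t sets[NUM_SETS][NUM_WAYS];
} cache_t;

/* Index selects the set; all four tags are compared (in hardware,
 * in parallel), and a 4-to-1 multiplexor steers out the hit data. */
bool cache_lookup(const cache_t *c, uint32_t addr, uint32_t *data_out)
{
    uint32_t index = (addr >> 2) & (NUM_SETS - 1);   /* bits 9..2   */
    uint32_t tag   = addr >> 10;                     /* bits 31..10 */

    for (int way = 0; way < NUM_WAYS; way++) {
        const line_t *l = &c->sets[index][way];
        if (l->valid && l->tag == tag) {
            *data_out = l->data;                     /* Hit */
            return true;
        }
    }
    return false;                                    /* Miss */
}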
Way-Predicting Caches
• Use processor address to index into way prediction table
• Look in predicted way at given index, then (sketched in C below):
– HIT: return copy of data from cache
– MISS: look in the other way:
– SLOW HIT: return the data and change the entry in the prediction table
– MISS: read block of data from next level of cache
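A compact C sketch of this flow, assuming a 2-way cache with one predicted way per set (all identifiers are illustrative):

#include <stdint.h>
#include <stdbool.h>

#define NUM_SETS 256
#define NUM_WAYS 2

typedef struct { bool valid; uint32_t tag; uint32_t data; } line_t;

static line_t  cache[NUM_SETS][NUM_WAYS];
static uint8_t pred_way[NUM_SETS];          /* way-prediction table */

/* Stub standing in for the miss path to the next cache level. */
static uint32_t next_level_read(uint32_t addr) { (void)addr; return 0; }

static bool probe(uint32_t set, uint32_t way, uint32_t tag, uint32_t *out)
{
    line_t *l = &cache[set][way];
    if (l->valid && l->tag == tag) { *out = l->data; return true; }
    return false;
}

uint32_t cache_access(uint32_t addr)
{
    uint32_t set   = (addr >> 2) & (NUM_SETS - 1);
    uint32_t tag   = addr >> 10;
    uint32_t guess = pred_way[set];
    uint32_t data;

    if (probe(set, guess, tag, &data))        /* HIT in predicted way */
        return data;
    if (probe(set, guess ^ 1, tag, &data)) {  /* SLOW HIT in other way */
        pred_way[set] = guess ^ 1;            /* change prediction-table entry */
        return data;
    }
    return next_level_read(addr);             /* MISS: next level of cache */
}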
Reducing Miss Rates by Compiler Optimizations
Merging Arrays
Before (two independent arrays):

int val[SIZE];
int key[SIZE];

for (i=0; i<SIZE; i++){
  key[i] = newkey;
  val[i]++;
}

After (merged into one array of structs):

struct record{
  int val;
  int key;
};
struct record records[SIZE];

for (i=0; i<SIZE; i++){
  records[i].key = newkey;
  records[i].val++;
}
• Reduces conflicts between val & key and improves spatial
locality
Loop Fusion
Before (split loops):

for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    a[i][j] = 1/b[i][j] * c[i][j];

for (i = 0; i < N; i++)
  for (j = 0; j < N; j++)
    d[i][j] = a[i][j] + c[i][j];

After (fused loop; the reference to a[i][j] in the second statement can come directly from a register):

for (i = 0; i < N; i++)
  for (j = 0; j < N; j++){
    a[i][j] = 1/b[i][j] * c[i][j];
    d[i][j] = a[i][j] + c[i][j];
  }
Split loops: every access to a and c misses. Fused loops: only the first access misses. Improves temporal locality
Summary of Compiler Optimizations
to Reduce Cache Misses
[Figure: Performance improvement (1x to 3x) from merged arrays, loop interchange, loop fusion, and blocking on vpenta (nasa7), gmty (nasa7), tomcatv, btrix (nasa7), mxm (nasa7), spice, cholesky (nasa7), and compress.]
Reducing Misses by Hardware Prefetching of Instructions and Data
Hardware Instruction Prefetching
• Instruction prefetch in Alpha AXP 21064
– Fetch two blocks on a miss; the requested block (i) and the
next consecutive block (i+1)
– Requested block placed in cache, and next block in
instruction stream buffer
– If miss in cache but hit in stream buffer, move stream buffer
block into cache and prefetch next block (i+2)
[Figure: The CPU (with register file) requests instruction blocks from the L1 instruction cache; on a miss, the requested block is fetched from the unified L2 cache into L1, while the prefetched next block is placed in the stream buffer.]
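A simplified C model of this single-block stream-buffer scheme (the helper names are assumptions; real hardware holds more state):

#include <stdint.h>
#include <stdbool.h>

typedef struct { bool valid; uint32_t block; } stream_buf_t;

static stream_buf_t sbuf;                    /* one-block stream buffer */

/* Stubs standing in for the real I-cache and unified L2 interfaces. */
static bool icache_lookup(uint32_t block) { (void)block; return false; }
static void icache_fill(uint32_t block)   { (void)block; }
static void l2_fetch(uint32_t block)      { (void)block; }

void instruction_fetch(uint32_t block)
{
    if (icache_lookup(block))
        return;                              /* I-cache hit */

    if (sbuf.valid && sbuf.block == block) {
        icache_fill(block);                  /* move stream-buffer block into cache */
    } else {
        l2_fetch(block);                     /* demand-fetch requested block i */
        icache_fill(block);
    }

    l2_fetch(block + 1);                     /* prefetch next consecutive block */
    sbuf.block = block + 1;
    sbuf.valid = true;
}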
Hardware Data Prefetching
• Prefetch-on-miss:
– Prefetch b + 1 upon miss on b
• One Block Lookahead (OBL) scheme
– Initiate prefetch for block b + 1 when block b is accessed
– Can extend to N block lookahead
• Strided prefetch
– If the observed sequence of block accesses is b, b+N, b+2N, then prefetch b+3N, and so on (see the sketch below)
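A small C sketch of this strided scheme: confirm a constant stride N from accesses b, b+N, b+2N, then prefetch ahead (the entry layout and names are illustrative):

#include <stdint.h>

typedef struct {
    uint32_t last_block;    /* last block address seen  */
    int32_t  stride;        /* last observed stride     */
    int      confirmed;     /* same stride seen twice?  */
} stride_entry_t;

/* Stub standing in for issuing a prefetch to the memory system. */
static void prefetch(uint32_t block) { (void)block; }

void on_block_access(stride_entry_t *e, uint32_t block)
{
    int32_t stride = (int32_t)(block - e->last_block);

    /* b, b+N, b+2N observed => the stride is confirmed */
    e->confirmed = (stride != 0 && stride == e->stride);

    if (e->confirmed)
        prefetch(block + stride);   /* prefetch b+3N, then b+4N, ... */

    e->stride     = stride;
    e->last_block = block;
}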
Performance Impact of Prefetching
[Figure: Performance improvement from hardware prefetching across SPECint2000 and SPECfp2000 benchmarks; speedups range from 1.16 to 1.97.]
Translation Lookaside Buffer
[Figure: translation lookaside buffer organization with 256 entries; the diagram's labels include 28-bit fields and 8 KB / 4 MB sizes.]
Inclusive Cache
Consider a CPU with two levels of cache memory, and suppose a block X is requested. If the block is found in the L1 cache, the data is read from L1 and consumed by the CPU core. If the block is not in the L1 cache but is present in L2, it is fetched from L2 and placed in L1. If the L1 cache is full, a block is evicted from L1 to make room for the newer block, while the L2 cache is unchanged. If the block is found in neither L1 nor L2, it is fetched from memory and placed in both cache levels. In that case, if the L2 cache is full and a block is evicted to make room for the new data, the L2 cache sends an invalidation request to the L1 cache so that the evicted block is removed from there as well, preserving inclusion. Due to this invalidation procedure, an inclusive cache is slightly slower than a non-inclusive or exclusive cache.
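The policy above can be sketched in C as follows (the two-level helpers such as l2_evict and l1_invalidate are assumptions for illustration, and eviction is shown unconditionally for brevity):

#include <stdint.h>
#include <stdbool.h>

/* Stubs standing in for the two cache levels and memory. */
static bool     l1_lookup(uint32_t blk)     { (void)blk; return false; }
static bool     l2_lookup(uint32_t blk)     { (void)blk; return false; }
static uint32_t l1_evict(void)              { return 0; }  /* returns victim */
static uint32_t l2_evict(void)              { return 0; }  /* returns victim */
static void     l1_fill(uint32_t blk)       { (void)blk; }
static void     l2_fill(uint32_t blk)       { (void)blk; }
static void     l1_invalidate(uint32_t blk) { (void)blk; }
static void     mem_fetch(uint32_t blk)     { (void)blk; }

void inclusive_access(uint32_t blk)
{
    if (l1_lookup(blk))
        return;                          /* L1 hit */

    if (!l2_lookup(blk)) {               /* miss in both levels */
        mem_fetch(blk);
        uint32_t victim = l2_evict();    /* make room in L2 */
        l1_invalidate(victim);           /* back-invalidate: keep L1 a subset of L2 */
        l2_fill(blk);
    }
    l1_evict();                          /* make room in L1; L2 unchanged */
    l1_fill(blk);                        /* block now lives in both levels */
}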
Exclusive Cache
Now consider the same example with a non-inclusive or exclusive cache. Suppose the CPU core sends a request for block X. If block X is found in L1, it is read and consumed by the core from that location. If block X is not found in L1 but is present in L2, it is moved from L2 to L1. If there is no room in L1, one block is evicted from L1 and stored in L2; this is the only way the L2 cache is populated, so it acts as a victim cache. If block X is found in neither L1 nor L2, it is fetched from memory and placed in L1 only.
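And the exclusive counterpart, reusing the same assumed helpers plus an l2_remove for the move (not copy) from L2 to L1:

/* Assumed helper: drop a block from L2 when it moves up to L1. */
static void l2_remove(uint32_t blk) { (void)blk; }

void exclusive_access(uint32_t blk)
{
    if (l1_lookup(blk))
        return;                          /* L1 hit */

    if (l2_lookup(blk))
        l2_remove(blk);                  /* move, not copy: block leaves L2 */
    else
        mem_fetch(blk);                  /* miss in both: fill L1 only */

    uint32_t victim = l1_evict();        /* make room in L1...                  */
    l2_fill(victim);                     /* ...victim goes to L2 (victim cache) */
    l1_fill(blk);
}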
Pseudo-Associative / Column-Associative Cache