
Chapter 5

Memory Hierarchy
Part 1
§5.1 Introduction
Principle of Locality
• Programs access a small proportion of their address space at any time
• Temporal locality
  – Items accessed recently are likely to be accessed again soon
  – e.g., instructions in a loop, induction variables
• Spatial locality
  – Items near those accessed recently are likely to be accessed soon
  – e.g., sequential instruction access, array data
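
A short C sketch makes both kinds of locality concrete (the array a and its size N are illustrative names, not from the slides):

```c
#include <stdio.h>

#define N 1024

int main(void) {
    static int a[N];   /* zero-initialized array */

    /* Temporal locality: sum and i are reused on every iteration.
       Spatial locality: a[0], a[1], ... are touched sequentially,
       so neighboring elements share the same cache block. */
    int sum = 0;
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %d\n", sum);
    return 0;
}
```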
Memory hierarchy
Memory hierarchy: a structure that uses multiple levels of memories; as the distance from the processor increases, the size of the memories and the access time both increase.
Memory Hierarchy Levels
• Block (aka line): unit of copying
  – May be multiple words
• If accessed data is present in upper level
  – Hit: access satisfied by upper level
  – Hit ratio: hits/accesses
• If accessed data is absent
  – Miss: block copied from lower level
  – Time taken: miss penalty
  – Miss ratio: misses/accesses = 1 – hit ratio
  – Then accessed data supplied from upper level
Figure 5.2
§5.2 Memory Technologies
Memory Technology
• Static RAM (SRAM)
  – 0.5ns – 2.5ns, $2000 – $5000 per GB (2012)
• Dynamic RAM (DRAM)
  – 50ns – 70ns, $20 – $75 per GB (2012)
• Magnetic disk
  – 5ms – 20ms, $0.20 – $2 per GB (2012)
• Flash semiconductor memory
  – 5,000ns – 50,000ns, $0.75 – $1.00 per GB (2012)
• Ideal memory
  – Access time of SRAM
  – Capacity and cost/GB of disk
SRAM Technology
DRAM Technology
• Data stored as a charge in a capacitor
• Single transistor used to access the charge
• Must periodically be refreshed
  – Read contents and write back
  – Performed on a DRAM “row”
FIGURE 5.4
Advanced DRAM Organization
• Bits in a DRAM are organized as a rectangular array
  – DRAM accesses an entire row
  – Burst mode: supply successive words from a row with reduced latency
• Double data rate (DDR) DRAM
  – Transfer on rising and falling clock edges
• Quad data rate (QDR) DRAM
  – Separate DDR inputs and outputs
DRAM Performance Factors
• Row buffer
  – Allows several words to be read and refreshed in parallel
• Synchronous DRAM
  – Allows for consecutive accesses in bursts without needing to send each address
  – Improves bandwidth
• DRAM banking
  – Allows simultaneous access to multiple DRAMs
  – Improves bandwidth
• Dual inline memory modules (DIMMs)
§6.4 Flash Storage
Flash Storage
• Nonvolatile semiconductor storage
  – 100× – 1000× faster than disk
  – Smaller, lower power, more robust
  – But more $/GB (between disk and DRAM)
• Flash is a type of EEPROM (electrically erasable programmable read-only memory)
§6.3 Disk Storage
Disk Storage

• Nonvolatile, rotating magnetic storage
Disk Sectors and Access
• Each sector records
  – Sector ID
  – Data (512 bytes; 4096 bytes proposed)
  – Error correcting code (ECC)
    • Used to hide defects and recording errors
  – Synchronization fields and gaps
• Access to a sector involves
  – Queuing delay if other accesses are pending
  – Seek: move the heads
  – Rotational latency
  – Data transfer
  – Controller overhead

Rotational latency (also called rotational delay): the time required for the desired sector of a disk to rotate under the read/write head; usually assumed to be half the rotation time.

The average rotational latency at 5400 RPM is

0.5 rotation / (5400/60 rotations per second) = 0.5/90 s ≈ 0.0056 s = 5.6 ms

Transfer time: the time to move the data, i.e., sector size / transfer rate.
Disk Access Example
Given: 512 B sector, 15,000 rpm, 4 ms average seek time, 100 MB/s transfer rate, 0.2 ms controller overhead, idle disk.

Average read time
= 4 ms (average seek)
+ 0.5 / (15,000/60) = 2 ms (rotational latency)
+ 512 B / (100 MB/s) ≈ 0.005 ms (transfer time)
+ 0.2 ms (controller delay)
= 6.2 ms

If the actual average seek time is 1 ms, the average read time is 3.2 ms.
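
The same arithmetic can be checked with a minimal C sketch (the variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    double seek_ms       = 4.0;                          /* average seek time      */
    double rpm           = 15000.0;
    double rotation_ms   = 0.5 / (rpm / 60.0) * 1000.0;  /* half a rotation: 2 ms  */
    double transfer_ms   = 512.0 / 100e6 * 1000.0;       /* 512 B at 100 MB/s      */
    double controller_ms = 0.2;

    double total = seek_ms + rotation_ms + transfer_ms + controller_ms;
    printf("average read time = %.3f ms\n", total);      /* prints 6.205 ms */
    return 0;
}
```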
§5.3 The Basics of Caches

Cache Memory
• Cache memory
  – The level of the memory hierarchy closest to the CPU
• Given accesses X1, …, Xn–1, Xn
  – How do we know if the data is present?
  – Where do we look?
FIGURE 5.7 The cache just before and just after a reference to a word Xn that is not initially in the cache
Direct Mapped Cache
• Location determined by address
• Direct mapped: only one choice
  – (Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2
• Use low-order address bits
FIGURE 5.8 A direct-mapped cache with eight entries showing the addresses of memory words between 0 and 31 that map to the same cache locations.
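
In C, the index and tag computations can be written directly; a minimal sketch, assuming an eight-block cache with one-word blocks (NUM_BLOCKS and the function names are illustrative):

```c
#include <stdint.h>

#define NUM_BLOCKS 8u   /* must be a power of 2 */

/* Cache index: (block address) modulo (#blocks). With a power-of-2
   block count this is just the low-order bits of the block address. */
static unsigned cache_index(uint32_t block_addr) {
    return block_addr & (NUM_BLOCKS - 1);   /* same as block_addr % NUM_BLOCKS */
}

/* Tag: the remaining upper bits of the block address. */
static uint32_t cache_tag(uint32_t block_addr) {
    return block_addr / NUM_BLOCKS;
}
```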
Cache Example
Below is a sequence of nine memory references to an empty eight-block cache, including the action for each reference. Since there are eight blocks in the cache, the low-order 3 bits of an address give the block number:
Cache Example
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N

Figure 5.9a: The initial state of the cache after power-on.

The cache is initially empty, with all valid bits (the V entry in the cache) turned off (N). The tag field will contain only the upper portion of the address.

Figures 5.9b through 5.9f show the cache contents after each miss in the sequence has been handled.
Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110

Index V Tag Data


000 N
001 N
010 N
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

b. After handling a miss of address 10110 (binary)


Cache Example
Word addr Binary addr Hit/miss Cache block
26 11 010 Miss 010

Index V Tag Data


000 N
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

c. After handling a miss of address 11010 (binary)


Cache Example
Word addr Binary addr Hit/miss Cache block
16 10 000 Miss 000

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 N
100 N
101 N
110 Y 10 Mem[10110]
111 N

d. After handling a miss of address 10000 (binary)


Cache Example
Word addr Binary addr Hit/miss Cache block
3 00 011 Miss 011

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 11 Mem[11010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

e. After handling a miss of address 00011 (binary)


Cache Example
Word addr Binary addr Hit/miss Cache block
18 10 010 Miss 010

Index V Tag Data


000 Y 10 Mem[10000]
001 N
010 Y 10 Mem[10010]
011 Y 00 Mem[00011]
100 N
101 N
110 Y 10 Mem[10110]
111 N

f. After handling a miss of address 10010 (binary)
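
A minimal C simulation replaying the five miss addresses from the tables above reproduces this behavior (a sketch; the names are illustrative, and only the misses shown in the figures are replayed):

```c
#include <stdio.h>

#define NUM_BLOCKS 8

int main(void) {
    int      valid[NUM_BLOCKS] = {0};
    unsigned tag[NUM_BLOCKS]   = {0};

    /* The five miss addresses from the tables above (word addresses).
       Note that address 18 evicts address 26: both map to block 010. */
    unsigned refs[] = {22, 26, 16, 3, 18};

    for (int i = 0; i < (int)(sizeof refs / sizeof refs[0]); i++) {
        unsigned addr  = refs[i];
        unsigned index = addr % NUM_BLOCKS;   /* low-order 3 bits   */
        unsigned t     = addr / NUM_BLOCKS;   /* upper address bits */

        if (valid[index] && tag[index] == t) {
            printf("addr %2u: hit  in block %u\n", addr, index);
        } else {
            printf("addr %2u: miss, block %u gets tag %u\n", addr, index, t);
            valid[index] = 1;   /* fetch the block and record its tag */
            tag[index]   = t;
        }
    }
    return 0;
}
```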


Address Subdivision

FIGURE 5.10 This cache holds 1024 words or 4 KiB.


EXAMPLE Bits in a Cache

How many total bits are required for a direct-mapped cache with 16 KiB of data and four-word blocks, assuming a 64-bit address?

16 KiB = 4096 (2^12) words. With a block size of four words (2^2), there are 1024 (2^10) blocks. Each block has 4 × 32 = 128 bits of data, plus a tag of 64 − 10 − 2 − 2 bits, plus a valid bit. Thus, the complete cache size is

2^10 × (4 × 32 + (64 − 10 − 2 − 2) + 1) = 2^10 × 179 = 179 Kibits = 22.375 KiB

For this cache, the total number of bits in the cache is about 1.4 times as many as needed just for the storage of the data.
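
The same count in C (a sketch of the arithmetic only; the variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    long addr_bits   = 64;
    long index_bits  = 10;       /* 2^10 = 1024 blocks                */
    long offset_bits = 2 + 2;    /* word-in-block + byte-offset bits  */
    long data_bits   = 4 * 32;   /* four 32-bit words per block       */
    long tag_bits    = addr_bits - index_bits - offset_bits;   /* 50  */
    long valid_bits  = 1;

    long total = 1024 * (data_bits + tag_bits + valid_bits);
    printf("%ld Kibits = %.3f KiB\n", total / 1024, total / 1024.0 / 8.0);
    /* prints: 179 Kibits = 22.375 KiB */
    return 0;
}
```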
EXAMPLE Mapping an Address to a Multiword Cache Block

Consider a cache with 64 blocks and a block size of 16 bytes. To what block number does byte address 1200 map?

The block number is given by

(Block address) modulo (Number of blocks in the cache)

where

Block address = ⌊Byte address / Bytes per block⌋

Notice that this block address is the block containing all addresses between

⌊Byte address / Bytes per block⌋ × Bytes per block

and

⌊Byte address / Bytes per block⌋ × Bytes per block + (Bytes per block − 1)

Thus, with 16 bytes per block, byte address 1200 is block address ⌊1200/16⌋ = 75, which maps to cache block number (75 modulo 64) = 11. In fact, this block contains all addresses between 1200 and 1215.
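
In C (a sketch; the variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    unsigned byte_addr       = 1200;
    unsigned bytes_per_block = 16;
    unsigned num_blocks      = 64;

    unsigned block_addr = byte_addr / bytes_per_block;       /* 1200/16 = 75   */
    unsigned block_num  = block_addr % num_blocks;           /* 75 mod 64 = 11 */
    unsigned first_byte = block_addr * bytes_per_block;      /* 1200           */
    unsigned last_byte  = first_byte + bytes_per_block - 1;  /* 1215           */

    printf("byte %u -> block address %u -> cache block %u (bytes %u..%u)\n",
           byte_addr, block_addr, block_num, first_byte, last_byte);
    return 0;
}
```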
Block Size Considerations
• Larger blocks should reduce miss rate
  – Due to spatial locality
• But in a fixed-sized cache
  – Larger blocks → fewer of them
    • More competition → increased miss rate
  – Larger blocks → pollution
• Larger miss penalty
  – Can override benefit of reduced miss rate
  – Early restart and critical-word-first can help
Cache Misses
• On cache hit, CPU proceeds normally
• On cache miss
  – Stall the CPU pipeline
  – Fetch block from next level of hierarchy
  – Instruction cache miss: restart instruction fetch
  – Data cache miss: complete data access
Write-Through
• On data-write hit, could just update the block in cache
  – But then cache and memory would be inconsistent
• Write through: also update memory
• But this makes writes take longer
  – e.g., if base CPI = 1, 10% of instructions are stores, and a write to memory takes 100 cycles
    • Effective CPI = 1 + 0.1 × 100 = 11
• Solution: write buffer
  – Holds data waiting to be written to memory
  – CPU continues immediately
    • Only stalls on write if write buffer is already full
Write-Back
• Alternative: on data-write hit, just update the block in cache
  – Keep track of whether each block is dirty
• When a dirty block is replaced
  – Write it back to memory
  – Can use a write buffer to allow the replacing block to be read first
Split cache: a scheme in which a level of the memory hierarchy is composed of two independent caches that operate in parallel with each other, with one handling instructions and one handling data.
§5.4 Measuring and Improving Cache Performance
Measuring Cache Performance
• Components of CPU time
  – Program execution cycles
    • Includes cache hit time
  – Memory stall cycles
    • Mainly from cache misses
• With simplifying assumptions:

Memory stall cycles = (Memory accesses / Program) × Miss rate × Miss penalty
                    = (Instructions / Program) × (Misses / Instruction) × Miss penalty
Example: Cache Performance
Assume the miss rate of an instruction cache (I-cache) is 2% and the miss rate of the data cache (D-cache) is 4%. If a processor has a CPI of 2 without any memory stalls, and the miss penalty is 100 cycles for all misses, determine how much faster the processor would run with a perfect cache that never missed. Assume the frequency of all loads and stores is 36%.

Miss cycles per instruction (for I instructions):
  Instruction miss cycles = I × 0.02 × 100 = 2 × I
  Data miss cycles = I × 0.36 × 0.04 × 100 = 1.44 × I
Total memory-stall cycles = 2 × I + 1.44 × I = 3.44 × I
Actual CPI = 2 + 2 + 1.44 = 5.44

CPU time with stalls / CPU time with perfect cache
  = (I × CPIstall × Clock cycle) / (I × CPIperfect × Clock cycle)
  = CPIstall / CPIperfect
  = 5.44 / 2 = 2.72 times faster
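
The calculation as a minimal C sketch (the variable names are illustrative):

```c
#include <stdio.h>

int main(void) {
    double base_cpi     = 2.0;
    double i_miss_rate  = 0.02;
    double d_miss_rate  = 0.04;
    double mem_ops      = 0.36;    /* loads and stores per instruction */
    double miss_penalty = 100.0;

    double stall_cpi = i_miss_rate * miss_penalty             /* 2.00 */
                     + mem_ops * d_miss_rate * miss_penalty;  /* 1.44 */
    double cpi_stall = base_cpi + stall_cpi;                  /* 5.44 */

    printf("CPI with stalls = %.2f, speedup with perfect cache = %.2fx\n",
           cpi_stall, cpi_stall / base_cpi);                  /* 2.72x */
    return 0;
}
```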
Average Access Time
• Hit time is also important for performance
• Average memory access time (AMAT)
  – AMAT = Hit time + Miss rate × Miss penalty

Example: Find the AMAT for a processor with a 1 ns clock cycle time, a miss penalty of 20 clock cycles, a miss rate of 0.05 misses per instruction, and a cache access time (including hit detection) of 1 clock cycle. Assume that the read and write miss penalties are the same and ignore other write stalls.

AMAT = 1 + 0.05 × 20 = 2 clock cycles, or 2 ns
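
And the same formula instantiated in C (a sketch; the names are illustrative):

```c
#include <stdio.h>

int main(void) {
    double hit_time     = 1.0;    /* cycles, includes hit detection */
    double miss_rate    = 0.05;
    double miss_penalty = 20.0;   /* cycles */
    double cycle_ns     = 1.0;

    double amat = hit_time + miss_rate * miss_penalty;   /* 2 cycles */
    printf("AMAT = %.1f cycles = %.1f ns\n", amat, amat * cycle_ns);
    return 0;
}
```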


Performance Summary
• When CPU performance increases
  – Miss penalty becomes more significant
• Decreasing base CPI
  – Greater proportion of time spent on memory stalls
• Increasing clock rate
  – Memory stalls account for more CPU cycles
• Can’t neglect cache behavior when evaluating system performance