5th
Edition
The Hardware/Software Interface
Chapter 5
Large and Fast:
Exploiting Memory
Hierarchy
§5.1 Introduction
Principle of Locality
Programs access a small proportion of
their address space at any time
Temporal locality
Items accessed recently are likely to be
accessed again soon
e.g., instructions in a loop, induction variables
Spatial locality
Items near those accessed recently are likely
to be accessed soon
e.g., sequential instruction access, array data
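Both kinds show up in a simple C loop (a generic illustration, not from the slides):

#include <stdio.h>

int main(void) {
    int a[1024];
    int sum = 0;
    /* Spatial locality: a[0], a[1], ... are adjacent in memory, so each
       cache block fetched for one element also brings in its neighbors. */
    for (int i = 0; i < 1024; i++) {
        /* Temporal locality: sum, i, and the loop's own instructions
           are reused on every iteration. */
        a[i] = i;
        sum += a[i];
    }
    printf("sum = %d\n", sum);
    return 0;
}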
Taking Advantage of Locality
Memory hierarchy
Store everything on disk
Copy recently accessed (and nearby)
items from disk to smaller DRAM memory
Main memory
Copy more recently accessed (and nearby)
items from DRAM to smaller SRAM
memory
Cache memory attached to CPU
DRAM generations (chart: $ per GB vs. year, 1980-2007, falling steeply with each generation)
Year | Capacity | $/GB
1989 | 4 Mbit   | $50,000
1992 | 16 Mbit  | $15,000
1996 | 64 Mbit  | $10,000
1998 | 128 Mbit | $4,000
2000 | 256 Mbit | $1,000
2004 | 512 Mbit | $250
2007 | 1 Gbit   | $50
How do we know if the data is present? Where do we look?
Block address = memory address / block size
#Blocks is a power of 2, so use the low-order address bits as the cache index
Address layout (32-bit address, 64 blocks of 16 bytes):
bits 31-10: Tag (22 bits) | bits 9-4: Index (6 bits) | bits 3-0: Offset (4 bits)
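For the 22/6/4 split above, the lookup hardware effectively applies shifts and masks; a minimal C sketch of the same arithmetic (the function names are mine):

#include <stdint.h>
#include <stdio.h>

#define OFFSET_BITS 4   /* 16-byte blocks         */
#define INDEX_BITS  6   /* 64 blocks in the cache */

static uint32_t block_offset(uint32_t addr) {
    return addr & ((1u << OFFSET_BITS) - 1);                 /* bits 3-0   */
}
static uint32_t cache_index(uint32_t addr) {
    return (addr >> OFFSET_BITS) & ((1u << INDEX_BITS) - 1); /* bits 9-4   */
}
static uint32_t cache_tag(uint32_t addr) {
    return addr >> (OFFSET_BITS + INDEX_BITS);               /* bits 31-10 */
}

int main(void) {
    uint32_t addr = 0x12345678u;
    printf("tag=0x%x index=%u offset=%u\n",
           cache_tag(addr), cache_index(addr), block_offset(addr));
    return 0;
}

The cache compares cache_tag(addr) against the tag stored at slot cache_index(addr); a match with the valid bit set is a hit.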
Read misses
Stall the CPU, fetch the block from memory, deliver it to the cache, then restart the access
Write hits
Update the data in both the cache and memory (write-through)
Write misses
Read the block into the cache, then write the word (write-allocate)
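A toy C model of this write-through, write-allocate policy, using one-word blocks and a 4-entry direct-mapped cache; the sizes and names are mine, for illustration only:

#include <stdint.h>
#include <stdio.h>

#define NBLOCKS 4
#define MEMSIZE 16

static uint32_t mem[MEMSIZE];
static uint32_t cache_data[NBLOCKS];
static uint32_t cache_tag[NBLOCKS];
static int      cache_valid[NBLOCKS];

void write_word(uint32_t addr, uint32_t data) {
    uint32_t idx = addr % NBLOCKS, tag = addr / NBLOCKS;
    if (!cache_valid[idx] || cache_tag[idx] != tag) {
        cache_data[idx]  = mem[addr];  /* write miss: read block into cache */
        cache_tag[idx]   = tag;
        cache_valid[idx] = 1;
    }
    cache_data[idx] = data;            /* update the cache...               */
    mem[addr]       = data;            /* ...and memory (write-through)     */
}

int main(void) {
    write_word(5, 42);
    printf("mem[5]=%u cache slot %u=%u\n",
           mem[5], 5u % NBLOCKS, cache_data[5 % NBLOCKS]);
    return 0;
}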
Cache Misses
On cache read hit, CPU proceeds normally
On cache read miss
Stall the CPU pipeline
Fetch block from next level of hierarchy
Instruction cache miss
Restart instruction fetch
Data cache miss
Complete data access
Direct mapped
Block address | Cache index  | Hit/miss | Cache content after access (index 0, 1, 2, 3)
0             | 0 mod 4 = 0  | miss     | Mem[0]
8             | 8 mod 4 = 0  | miss     | Mem[8]
0             | 0 mod 4 = 0  | miss     | Mem[0]
6             | 6 mod 4 = 2  | miss     | Mem[0], Mem[6]
8             | 8 mod 4 = 0  | miss     | Mem[8], Mem[6]
Fully associative
Block address | Hit/miss | Cache content after access
0             | miss     | Mem[0]
8             | miss     | Mem[0], Mem[8]
0             | hit      | Mem[0], Mem[8]
6             | miss     | Mem[0], Mem[8], Mem[6]
8             | hit      | Mem[0], Mem[8], Mem[6]
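A small C sketch (names and structure mine) that replays the block-address trace 0, 8, 0, 6, 8 through the 4-block direct-mapped cache and reports each access, reproducing the all-miss column of the first table:

#include <stdio.h>

#define NBLOCKS 4   /* 4-block direct-mapped cache */

int main(void) {
    int tag[NBLOCKS] = {0}, valid[NBLOCKS] = {0};
    int trace[] = {0, 8, 0, 6, 8};  /* block addresses from the tables */

    for (int i = 0; i < 5; i++) {
        int addr = trace[i];
        int idx  = addr % NBLOCKS;  /* index = block address mod 4 */
        int t    = addr / NBLOCKS;  /* remaining bits form the tag */
        int hit  = valid[idx] && tag[idx] == t;
        printf("block %d -> index %d: %s\n", addr, idx, hit ? "hit" : "miss");
        valid[idx] = 1;             /* on a miss, the new block replaces */
        tag[idx]   = t;             /* whatever was in that slot         */
    }
    return 0;
}

Every access misses, matching the first table; a fully associative lookup (comparing all stored tags instead of indexing one slot) would hit on the second 0 and the final 8, matching the second table.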
TLB vs. page tables
The TLB is an optimization for memory access: one hardware structure shared by all processes
There is one page table per process
Is it possible to get a "TLB hit but page table miss (page fault)"? No: the TLB only caches valid page table entries, and the OS invalidates the matching TLB entry whenever it pages a page out, so a TLB hit implies the page is in memory
TLB Misses
If page is in memory
Load the PTE from memory and retry
Could be handled in hardware
Can get complex for more complicated page table
structures
Or in software
Raise a special exception, with optimized handler
If page is not in memory (page fault)
OS handles fetching the page and updating
the page table
Then restart the faulting instruction
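The two cases above, as a hedged C sketch with a toy page table; every name here (struct pte, handle_tlb_miss, the fake frame numbers) is illustrative, not a real OS or hardware interface:

#include <stdbool.h>
#include <stdio.h>

/* Toy PTE and handler illustrating the two TLB-miss cases. */
struct pte { bool valid; unsigned frame; };

static struct pte page_table[16];  /* one (tiny) page table for this process */

unsigned handle_tlb_miss(unsigned vpn) {
    struct pte e = page_table[vpn];     /* load the PTE from memory */
    if (e.valid) {
        /* Page is in memory: install the translation in the TLB and
           retry (done in hardware, or in a software exception handler). */
        return e.frame;
    }
    /* Page fault: the OS fetches the page from disk, updates the page
       table, then restarts the faulting instruction. Simulated here. */
    printf("page fault on vpn %u\n", vpn);
    page_table[vpn] = (struct pte){ true, vpn };  /* pretend frame number */
    return page_table[vpn].frame;
}

int main(void) {
    printf("frame = %u\n", handle_tlb_miss(3));  /* page fault path  */
    printf("frame = %u\n", handle_tlb_miss(3));  /* simple PTE refill */
    return 0;
}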
Besides, since all processes use the same virtual address range, the OS typically needs to flush the cache on a context switch if virtual address tags are used
Memory Protection
Different tasks can share parts of their
virtual address spaces
But need to protect against errant access
Requires OS assistance
Hardware support for OS protection
Privileged supervisor mode (aka kernel mode)
Privileged instructions
Page tables and other state information only
accessible in supervisor mode
System call exception (e.g., syscall in MIPS)
§5.5 A Common Framework for Memory Hierarchies
The Memory Hierarchy
The BIG Picture
Hardware caches
Reduce comparisons to reduce cost
Virtual memory
Full table lookup makes full associativity feasible
Benefit in reduced miss rate
Example: timer virtualization
With a VMM, on a timer interrupt: the VMM suspends the current VM, handles the interrupt, then selects and resumes the next VM
If a VM requires timer interrupts: the VMM emulates a virtual timer and raises the interrupt for the VM when the physical timer interrupt occurs
Address layout (32-bit address, direct-mapped cache with 1024 blocks of 16 bytes):
bits 31-14: Tag (18 bits) | bits 13-4: Index (10 bits) | bits 3-0: Offset (4 bits)
Cache controller interface signals (CPU ↔ Cache ↔ Memory):
Both interfaces carry Read/Write, Valid, a 32-bit Address, Write Data, Read Data, and Ready
CPU ↔ cache data is 32 bits wide; cache ↔ memory data is 128 bits wide (one 4-word block)
Memory takes multiple cycles per access
Could partition the controller's work into separate states to reduce the clock cycle time
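The controller is naturally written as a finite-state machine. Below is a minimal C sketch of a four-state version; the state names (Idle, Compare Tag, Write-Back, Allocate) follow the book's cache-controller example, while the C encoding and the simplified one-bit inputs are mine:

#include <stdbool.h>
#include <stdio.h>

/* One transition per call; inputs are simplified flags, not real signals. */
enum state { IDLE, COMPARE_TAG, WRITE_BACK, ALLOCATE };

enum state step(enum state s, bool valid_req, bool hit, bool dirty, bool mem_ready) {
    switch (s) {
    case IDLE:        /* wait for a valid CPU request */
        return valid_req ? COMPARE_TAG : IDLE;
    case COMPARE_TAG: /* hit: done; miss: write back if dirty, else allocate */
        if (hit) return IDLE;
        return dirty ? WRITE_BACK : ALLOCATE;
    case WRITE_BACK:  /* wait for memory to accept the old block */
        return mem_ready ? ALLOCATE : WRITE_BACK;
    case ALLOCATE:    /* wait for the new block from memory */
        return mem_ready ? COMPARE_TAG : ALLOCATE;
    }
    return IDLE;
}

int main(void) {
    /* Miss on a dirty block: IDLE -> COMPARE_TAG -> WRITE_BACK -> ALLOCATE */
    enum state s = IDLE;
    s = step(s, true, false, true, false);
    s = step(s, true, false, true, false);
    s = step(s, true, false, true, true);
    printf("state = %d\n", s);   /* 3 == ALLOCATE */
    return 0;
}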
Coherence example fragment:
Time step | Event                | CPU A's cache | CPU B's cache | Memory
3         | CPU A writes 1 to X  | 1             | 0             | 1
(With write-through, A's cache and memory now hold 1, but B's cached copy of X is stale: the caches are incoherent.)