Memory Hierarchy
Part 1
§5.1 Introduction
Principle of Locality
• Programs access a small proportion of their address space at any time
• Temporal locality
– Items accessed recently are likely to be accessed again soon
– E.g., instructions in a loop, induction variables
• Spatial locality
– Items near those accessed recently are likely to be accessed soon
– E.g., sequential instruction access, array data
Memory hierarchy
Memory hierarchy: A structure that uses multiple levels of memories;
as the distance from the processor increases, the size of the
memories and the access time both increase.
Memory Hierarchy Levels
• Block (aka line): unit of copying
– May be multiple words
• If accessed data is present in upper level
– Hit: access satisfied by upper level
• Hit ratio: hits/accesses
• If accessed data is absent
– Miss: block copied from lower level
• Time taken: miss penalty
• Miss ratio: misses/accesses = 1 – hit ratio
– Then accessed data supplied from upper level
Figure 5.2
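The hit/miss definitions above can be checked with a few lines of arithmetic. A minimal sketch; the access and hit counts here are hypothetical numbers chosen only to illustrate the formulas:

```python
# Hypothetical counts, used only to illustrate the definitions above.
accesses = 1000
hits = 940
misses = accesses - hits

hit_ratio = hits / accesses      # hits / accesses
miss_ratio = misses / accesses   # misses / accesses = 1 - hit ratio

print(hit_ratio, miss_ratio)     # 0.94 0.06
```

Note that the two ratios always sum to 1, since every access is either a hit or a miss.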
§5.2 Memory Technologies
Memory Technology
• Static RAM (SRAM)
0.5ns – 2.5ns, $2000 – $5000 per GB (2012)
• Dynamic RAM (DRAM)
50ns – 70ns, $20 – $75 per GB (2012)
• Magnetic disk
5ms – 20ms, $0.20 – $2 per GB (2012)
• Flash semiconductor memory
5,000–50,000 ns, $0.75–$1.00 per GB (2012)
• Ideal memory
– Access time of SRAM
– Capacity and cost/GB of disk
SRAM Technology
• Integrated circuits that are memory arrays with (usually) a single access port
• Access time very close to the cycle time; no refresh needed
• Typically uses six to eight transistors per bit
DRAM Technology
• Data stored as a charge in a capacitor
– Single transistor used to access the charge
– Must periodically be refreshed
• Read contents and write back
• Performed on a DRAM “row”
FIGURE 5.4
Advanced DRAM Organization
• Bits in a DRAM are organized as a
rectangular array
– DRAM accesses an entire row
– Burst mode: supply successive words from a
row with reduced latency
• Double data rate (DDR) DRAM
– Transfer on rising and falling clock edges
• Quad data rate (QDR) DRAM
– Separate DDR inputs and outputs
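Because DDR DRAM transfers data on both clock edges, its peak transfer rate is twice the bus clock times the data-path width. A sketch of that calculation, assuming (as an illustrative example) a DDR3-1600 DIMM with an 800 MHz bus clock and the standard 64-bit DIMM data path:

```python
# Assumed DDR3-1600 parameters (illustration only):
clock_mhz = 800            # bus clock
transfers_per_cycle = 2    # data moves on both rising and falling edges
bus_width_bytes = 8        # 64-bit DIMM data path

mt_per_s = clock_mhz * transfers_per_cycle   # "1600" in DDR3-1600 = MT/s
peak_mb_s = mt_per_s * bus_width_bytes       # peak bandwidth in MB/s

print(mt_per_s, peak_mb_s)  # 1600 12800
```

This is why DDR3-1600 modules are also labeled PC3-12800: the second number is the peak bandwidth in MB/s.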
DRAM Performance Factors
• Row buffer
– Allows several words to be read and refreshed in parallel
• Synchronous DRAM
– Allows for consecutive accesses in bursts without
needing to send each address
– Improves bandwidth
• DRAM banking
– Allows simultaneous access to multiple DRAMs
– Improves bandwidth
• Dual inline memory modules (DIMMs)
– Contain multiple DRAM chips accessed in parallel for a wider data path
§6.4 Flash Storage
Flash Storage
• Nonvolatile semiconductor storage
– 100× – 1000× faster than disk
– Smaller, lower power, more robust
– But more $/GB (between disk and DRAM)
– Flash is a type of electrically erasable programmable read-only memory (EEPROM)
§6.3 Disk Storage
Disk Storage
• Nonvolatile, rotating magnetic storage
Disk Sectors and Access
• Each sector records
– Sector ID
– Data (512 bytes, 4096 bytes proposed)
– Error correcting code (ECC)
• Used to hide defects and recording errors
– Synchronization fields and gaps
• Access to a sector involves
– Queuing delay if other accesses are pending
– Seek: move the heads
– Rotational latency
– Data transfer
– Controller overhead
Rotational latency also called rotational delay:
The time required for the desired sector of a disk to rotate
under the read/write head; usually assumed to be half the
rotation time.
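The access-time components above can be added up for a concrete (hypothetical) drive. A sketch assuming a 15,000 RPM disk, 4 ms average seek, 100 MB/s transfer rate, 0.2 ms controller overhead, and a 512-byte sector, with no queuing delay:

```python
# Hypothetical drive parameters (illustration only):
rpm = 15_000
seek_ms = 4.0
transfer_mb_s = 100.0
controller_ms = 0.2
sector_bytes = 512

# Rotational latency: usually assumed to be half the rotation time.
rotational_ms = 0.5 * (60_000 / rpm)                      # 2.0 ms
transfer_ms = sector_bytes / (transfer_mb_s * 1e6) * 1e3  # ~0.005 ms

total_ms = seek_ms + rotational_ms + transfer_ms + controller_ms
print(round(total_ms, 3))  # 6.205
```

Note that seek and rotational latency dominate; the actual data transfer is a tiny fraction of the total, which is why disks reward large sequential accesses.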
• How do we know if the data is present?
• Where do we look?
FIGURE 5.7 The cache just before and just after a reference to a word Xn that is not initially in the cache
Direct Mapped Cache
• Location determined by address
• Direct mapped: only one choice
– (Block address) modulo (#Blocks in cache)
• #Blocks is a power of 2, so the low-order address bits give the cache index
FIGURE 5.8 A direct-mapped cache with eight entries showing the addresses of memory words
between 0 and 31 that map to the same cache locations.
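The modulo mapping in Figure 5.8 is easy to reproduce. A minimal sketch for the eight-entry cache: because 8 is a power of 2, taking a block address modulo 8 is the same as keeping its low-order 3 bits, so addresses 1, 9, 17, and 25 all land in cache block 001:

```python
NUM_BLOCKS = 8  # must be a power of 2

def cache_index(block_address):
    """Direct-mapped placement: (block address) modulo (#blocks)."""
    return block_address % NUM_BLOCKS

# Addresses 1, 9, 17, and 25 share the low-order bits 001:
print([cache_index(a) for a in (1, 9, 17, 25)])  # [1, 1, 1, 1]

# Modulo by a power of 2 == keeping the low-order address bits:
assert all(cache_index(a) == (a & 0b111) for a in range(32))
```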
Cache Example
Below is a sequence of nine memory references to an empty eight-block cache, including the action for each reference. Since there are eight blocks in the cache, the low-order 3 bits of an address give the block number:
Cache Example
Index V Tag Data
000 N
001 N
010 N
011 N
100 N
101 N
110 N
111 N
The cache is initially empty, with all valid bits (the V entry in the cache) turned off (N). The tag field will contain only the upper portion of the address. Figure 5.9(b) through (e) show the cache contents after each miss in the sequence has been handled.
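The valid-bit and tag behavior described above can be sketched as a tiny direct-mapped cache model. The nine-reference sequence below (22, 26, 22, 26, 16, 3, 16, 18, 16) is the one used in the textbook's Figure 5.9 walkthrough; note how reference 18 evicts 26, since both map to block 010:

```python
# Minimal direct-mapped cache model: one valid bit and one tag per block.
NUM_BLOCKS = 8  # index = low-order 3 bits of the block address

valid = [False] * NUM_BLOCKS
tag = [0] * NUM_BLOCKS

def access(block_addr):
    idx = block_addr % NUM_BLOCKS    # low-order 3 bits select the block
    t = block_addr // NUM_BLOCKS     # upper bits form the tag
    if valid[idx] and tag[idx] == t:
        return "hit"
    valid[idx], tag[idx] = True, t   # miss: fetch block, record its tag
    return "miss"

refs = [22, 26, 22, 26, 16, 3, 16, 18, 16]
print([access(a) for a in refs])
# ['miss', 'miss', 'hit', 'hit', 'miss', 'miss', 'hit', 'miss', 'hit']
```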
Cache Example
Word addr Binary addr Hit/miss Cache block
22 10 110 Miss 110
With 16 KiB of data, four-word blocks, and 64-bit addresses, the cache has 2^10 blocks; each entry holds 4 × 32 = 128 bits of data, a (64 − 10 − 2 − 2) = 50-bit tag, and a valid bit, so the total size is
2^10 × (4 × 32 + (64 − 10 − 2 − 2) + 1) = 2^10 × 179 = 179 Kibits = 22.375 KiB
For this cache, the total number of bits in the cache is about
1.4 times as many as needed just for the storage of the data.
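The total-bits calculation can be sketched in code. This assumes the parameters used above (16 KiB of data, four-word blocks, 64-bit byte addresses, 32-bit words); changing any of them changes the tag width and the total:

```python
# Assumed cache parameters (from the example above):
address_bits = 64
word_bits = 32
words_per_block = 4
data_kib = 16

blocks = data_kib * 1024 * 8 // (word_bits * words_per_block)  # 1024
index_bits = blocks.bit_length() - 1                           # 10
block_offset_bits = 2   # selects a word within a 4-word block
byte_offset_bits = 2    # selects a byte within a 32-bit word
tag_bits = address_bits - index_bits - block_offset_bits - byte_offset_bits

bits_per_entry = words_per_block * word_bits + tag_bits + 1  # data + tag + valid
total_kibits = blocks * bits_per_entry // 1024

print(tag_bits, bits_per_entry, total_kibits)  # 50 179 179
print(total_kibits / (words_per_block * word_bits))  # overhead ratio ~1.4
```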
EXAMPLE Mapping an Address to a Multiword Cache Block
Consider a cache with 64 blocks and a block size of 16 bytes. To
what block number does byte address 1200 map?
The block is given by
⌊Byte address / Bytes per block⌋
and that block contains all byte addresses between
⌊Byte address / Bytes per block⌋ × Bytes per block
and
⌊Byte address / Bytes per block⌋ × Bytes per block + (Bytes per block − 1)
Thus, with 16 bytes per block, byte address 1200 is block address
⌊1200 / 16⌋ = 75
which maps to cache block number (75 modulo 64) = 11. This block contains all byte addresses between 1200 and 1215.
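The same mapping can be verified in a few lines, using the example's parameters (64 blocks, 16-byte blocks, byte address 1200):

```python
# Parameters from the example above:
bytes_per_block = 16
num_blocks = 64
byte_address = 1200

block_address = byte_address // bytes_per_block  # floor division: 75
cache_block = block_address % num_blocks         # 75 mod 64 = 11

# Range of byte addresses covered by this block:
first_byte = block_address * bytes_per_block                 # 1200
last_byte = first_byte + bytes_per_block - 1                 # 1215

print(block_address, cache_block, first_byte, last_byte)  # 75 11 1200 1215
```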