Chapter 5 summary
Temporal locality: data accessed recently is more likely to be accessed again
Spatial locality: data located close to the recently accessed data is also likely to
be accessed soon
Memory hierarchy
. Registers. In the processor.
. Caches (L1, L2, L3). Close to the processor. Built from SRAM
(static random access memory) technology. SRAM: 1 bit -> 6 transistors.
. Main memory (RAM). Built from DRAM (dynamic random access memory)
technology. DRAM: 1 bit -> 1 transistor + 1 capacitor.
. Magnetic disk or flash memory. Slowest.
GB = 10^9 bytes
GiB = 2^30 bytes
All data is ultimately stored at the lowest level, i.e. magnetic disk or flash
memory; anything found in a cache or in main memory is a copy taken from there.
As we move away from the processor, the levels take progressively longer to
access.
Blocks are the minimum unit of information that can be exchanged between two
adjacent levels.
If the data requested by the processor is present in the upper level of the
memory hierarchy, a hit occurs. If the data is not found, a miss occurs, and
the lower level is accessed to get the requested block of data.
Direct Mapped Cache
Each block in the memory has only one place in the cache.
The position of a block in the cache is called the index.
Index = (block address) modulo (number of blocks in the cache)
Block @ = [tag][index]
2^n blocks per cache (n: index bits)
Byte@ = [block@][word offset (m bits)][byte offset]. The byte offset is either
2 or 3 bits: 2 bits if we are working with 32-bit addressing, and 3 bits if we
are working with 64-bit addressing.
word@ is [block@][word offset] (concatenation, not arithmetic addition)
2^n blocks per cache
2^m words per block
2^2 bytes per word if 32 bit addressing, 1 word = 4 bytes = 2^2 bytes = 32 bits
2^3 bytes per word if 64 bit addressing, 1 word = 8 bytes = 2^3 bytes = 64 bits
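The field breakdown above can be sketched in code. This is a minimal sketch of my own, assuming 32-bit addressing (2-bit byte offset) and a direct-mapped cache with 2^n blocks and 2^m words per block; the function name and return order are illustrative, not from the notes:

```python
# Split a 32-bit byte address into [tag][index][word offset][byte offset].
# n = index bits (2^n blocks per cache), m = word-offset bits (2^m words/block).
def split_address(byte_addr, n, m):
    byte_offset = byte_addr & 0b11                   # lowest 2 bits (4-byte words)
    word_offset = (byte_addr >> 2) & ((1 << m) - 1)  # next m bits
    index = (byte_addr >> (2 + m)) & ((1 << n) - 1)  # next n bits
    tag = byte_addr >> (2 + m + n)                   # everything above
    return tag, index, word_offset, byte_offset
```

For example, with n = 6 and m = 2, byte address 1200 splits into tag 1, index 11, word offset 0, byte offset 0 (this matches Example 2 further below).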
If it is mentioned that we have 8 blocks per cache, this means that n = 3.
Remember 2^n = 8, therefore n = 3.
Each block is 2 words. The number of words per block gives m. So 2^m = 2,
therefore m = 1.
32 bit addressing => 1 word is 2^2 bytes
For a 32-bit address, the cache size (bits) equals the number of blocks per
cache * the contents of each block (data + tag + valid bit)
= 2^n * [2^m * 2^2 * 2^3 + 32 - (n+m+2) + 1], which simplifies to
2^n * (2^(m+5) + 31 - n - m)
Tag bits for one block = 32 - (n+m+2)
Total tag bits = 2^n *(30-n-m)
For 64 bit address the cache size(bits) is 2^n * (2^m * 2^3 * 2^3 + 64-
(n+m+3)+1), this can be simplified to 2^n * (2^(m+6) + 62 - n - m)
Total tag bits in cache = 2^n * (61 - n - m)
Total valid bits 2^n * 1
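The size formulas for both address widths can be checked with a short helper. A sketch of my own (the function name is mine), computing data + tag + valid bits directly rather than from the simplified forms:

```python
# Total size in bits of a direct-mapped cache.
# n = index bits (2^n blocks), m = word-offset bits (2^m words per block),
# addr_bits = 32 or 64 (byte offset of 2 or 3 bits respectively).
def cache_size_bits(n, m, addr_bits=32):
    byte_offset = 2 if addr_bits == 32 else 3
    tag_bits = addr_bits - (n + m + byte_offset)
    per_block = (2 ** m) * addr_bits + tag_bits + 1  # data + tag + valid
    return (2 ** n) * per_block
```

For n = 10, m = 2, 32-bit addressing this gives 1024 * (128 + 18 + 1) = 147 kbits, matching the worked example below; it also agrees with the simplified 2^n * (2^(m+6) + 62 - n - m) form for 64-bit addresses.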
Finding block address if byte address was given
Block@ = byte address / 2^(m+2) for 32-bit addressing, or byte address /
2^(m+3) for 64-bit addressing
Block@ = word address / 2^m
Finding word address if byte address was given
Word @ = byte@ / 2^2 for 32-bit addressing, or byte@ / 2^3 for 64-bit
addressing
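These conversions are just integer divisions. A minimal sketch assuming 32-bit addressing (4-byte words); for 64-bit addressing, replace the 2 in the exponents with 3. The function names are mine:

```python
# byte @ -> block @: divide by bytes per block = (bytes per word) * (words per block)
def block_address(byte_addr, m):
    return byte_addr // 2 ** (m + 2)

# byte @ -> word @: divide by bytes per word
def word_address(byte_addr):
    return byte_addr // 2 ** 2
```

Note that block@ can equivalently be obtained from word@ by dividing by 2^m, as the notes state.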
Example
16kbyte of data
4-words per block
32 bit addressing
Remember 2^n is for blocks per cache, and 2^m is for words per block
We have 4 words per block meaning 2^m = 4, therefore m = 2
Blocks per cache (2^n) = data size / block size = 16 kbyte / (4 words per block)
Since it is 32-bit addressing, each word is 4 bytes, so one block is
4 words * 4 bytes = 16 bytes.
k here is 2^10, so
2^n = (16 * 2^10) / (4 * 4) = 1024, which is 2^10.
So in this case, n = 10, meaning there are 2^10 or 1024 blocks per cache.
Now to get the cache size (bits). Cache size = 2^n * (2^(m+5) + 31 - n - m) =
1024 * (2^(2+5) + 31 - 10 - 2) = 1024 * 147 = 147 kbits
Example 2
Cache with 64 blocks
Block size = 16 bytes
Byte @ = 1200
Assuming we are working with 32 bit addressing(meaning one word is 4 bytes)
Calculate the index
Solution:
64 blocks per cache. 2^n = 64 => n = 6
Words per block (2^m): the block size is 16 bytes. Converting bytes to words,
16 / 4 = 4 words per block, so 2^m = 4 and m = 2.
Index = block address modulo blocks per cache
Block @ = byte @ / 2^(m+2)
Block @ = 1200 / 2^(2+2)
Block @ = 75
Index = block address modulo blocks per cache
Index = 75 modulo 64 = 11
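The solution above can be written as a single helper using the same division-and-modulo route (a sketch of mine, assuming 32-bit addressing; the name is illustrative):

```python
# byte @ -> block @ -> index, for a direct-mapped cache with
# 2^n blocks per cache and 2^m words per block (4-byte words).
def index_of(byte_addr, n, m):
    block_addr = byte_addr // 2 ** (m + 2)  # bytes per block = 2^m * 4
    return block_addr % 2 ** n              # index = block @ mod blocks per cache
```

With byte address 1200, n = 6, m = 2 this reproduces the result: block @ 75, index 11.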
When block size is increased (for a fixed cache size):
The miss rate decreases, but the miss penalty increases; contention between
blocks also increases, and the number of blocks per cache decreases.
Handling cache writes
To keep memory and cache consistent, we write the data into both the memory
and the cache. This scheme is called write-through. The write-through method
comes with performance challenges, requiring something called a write buffer.
Write buffer is a small, fast temporary storage area that holds data temporarily
before it is written to the main memory. Writing data to the memory can be
relatively slow compared to writing to cache. The write buffer allows the
processor to continue executing instructions while the write operations are
being handled in the background.
In a write-back scheme, when a write occurs, the new value is written only to
the block in the cache; the modified block is written to the lower level of the
hierarchy when it is replaced.
Write-back, while exhibiting better performance than write-through, is also
more complex to implement and offers less consistent data compared to
write-through.
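The contrast between the two policies can be sketched for a single cache block. This is my own illustration, not from the notes: write-through updates memory on every store (the write buffer hides the latency), while write-back only marks the block dirty and defers the memory write until the block is evicted.

```python
class Block:
    def __init__(self):
        self.data = None
        self.dirty = False   # only meaningful under write-back

def write_through(block, memory, addr, value):
    block.data = value
    memory[addr] = value     # memory updated immediately on every write

def write_back(block, memory, addr, value):
    block.data = value
    block.dirty = True       # memory is now stale until eviction

def evict(block, memory, addr):
    if block.dirty:          # write the modified block back on replacement
        memory[addr] = block.data
        block.dirty = False
```

The dirty bit is what makes write-back "less consistent": between the write and the eviction, memory holds an old value.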