hw6 Circuits

Brock Wilcox
CS470
Homework 6
2012.04.12
1.Exercises 5.2.1 through 5.2.4, parts a and b
"In this exercise we look at memory locality properties of matrix computation. The following code is written in C,
where elements within the same row are stored contiguously."
Assume that the elements of both the arrays A and B contain single-precision floating-point values, declared with
the codes below:
a.
float A[4000][4000], B[8][8];

for (int I=0; I<4000; I++)
for (int J=0; J<8; J++)
A[I][J] = B[J][0] + A[I][J];
b.
float A[4000][4000], B[8][8];

for (int J=0; J<8; J++)
for (int I=0; I<4000; I++)
A[I][J] = B[J][0] + A[I][J];
Hint: Note that these codes differ slightly from those provided in the textbook exercise.
5.2.1 How many single-precision floating-point values can be stored in a 16-byte cache line? Show your work!
Single floats are 32-bits each, which is 4 bytes (8 bits each). So each float takes 4 bytes,
meaning we can fit 4 floats into the cacheline (4 * 4).
5.2.1.EXTRA.A How many cachelines total would the array A need, if it could be stored completely in cache? Show
your work!
A is 4000 * 4000 = 16,000,000 floats. We divide by 4 floats per cacheline, 16,000,000/4 =

4,000,000 cachelines.
5.2.1.EXTRA.B Since not all of the elements of A are used in the codes above, how many cachelines would the
required elements of the array A actually need, if they could be stored completely in cache? Show your work!
For (a) and (b) we use up to A[4000][8] (well, 1 less counting from zero). So that is 4000 * 8
= 32000 floats, which is 32000 / 4 = 8000 cachelines.
5.2.1.EXTRA.C How many cachelines would the array B require, if it could be stored completely in cache? Show
your work!
B is 8 * 8 = 64 floats. Since we have 4 floats per cacheline, that is 64 / 4 = 16 cachelines.

5.2.1.EXTRA.D Since not all of the elements of B are used in the codes above, how many cachelines would the
required elements of the array B actually need, if they could be stored completely in cache? Show your work!
In both (a) and (b) we use J[0..7][0], which means we are using 8 * 1 = 8 floats. 8 floats / 4
per cacheline = 2 cachelines needed.
5.2.2 References to which variables exhibit temporal locality? Why?
In (a) and (b) the indexes i and j use temporal locality, since they are accessed over and
over again. (these would most likely be stored in a register).
In (b), B[J][0] has temporal locality, since it is accessed over and over (4000 times in a row)
5.2.3 References to which variables exhibit spatial locality? Why?
In (a), A[I][J] has spacial locality. As J is incremented, nearby values of the array are
accessed.
5.2.4 (Using the code provided for exercises 5.2.1 through 5.2.3 above)
How many 16-byte cachelines are needed to store all 32-bit matrix elements being referenced -- as the code
executes? Show your work!
For this question only, assume that only one cacheline from the array A and only one cacheline from the
array B can be in the cache at the same time.
Hint: There will be a difference between the two sets of codes, as one of the codes will use all four floats in a
cacheline from one of the two arrays consecutively and the other code will not.
Note that only the first 8 elements of each row in A are used and only the first element of each of the eight
rows of B is used.
For (a), we need 2 cachelines to hold a whole row of A, which is 4 floats per cacheline.
For (b) we need a cacheline for each separate row of B, so 8 cachelines.
5.2.4.EXTRA.E
Since fetching values from memory (to put into the cache) is far more expensive than fetching values directly from
the cache, which of the two codes above is more efficient? Why?
The first is significantly better. It is able to fetch consecutive locations, whereas (b), as an
even multiple of 2, will keep having cache misses.
2.Exercise 5.3.1, using the list of references printed in red below:
"Caches are important to providing a high performance memory hierarchy to processors. Below is a list of 32-bit
memory address references, given as word addresses.
5, 215, 174, 7, 82, 215, 66, 175, 65, 104, 87, 214, 5
5.3.1 "For each of these references,
a) identify the binary address,
b) identify the tag, and
c) identify the index given a direct-mapped cache with 16 one-word blocks.
d) Also list if each reference is a hit or a miss, assuming the cache is initially empty."
Word addresses mean a block size of 4-bytes to hold the word. Since we have 16 one-word
blocks, we need 4 bits for the index (though since we are word-aligned we could in theory
ignore the 2 low bits). So the tag needs to be 32 4 = 28 bits, from the high end of the
address.
Reference
Address
Tag
Index
hit/miss
101
101 miss
215
11000000
1101
111 miss
174
10000000
1010
1110 miss
111
111 miss
82
1010010
101
10 miss
215
11000000
1101
66
1000010
100
10 miss
175
10000000
1010
1111 miss
65
1000001
100
1 miss
104
1101000
110
1000 miss
87
1010111
101
111 miss
214
11000000
1101
110 miss
101
101 miss
111 hit
3.Exercise 5.3.4
"There are many different design parameters that are important to a cache's overall performance. The table below
lists parameters for different ONE direct-mapped cache designs."
Cache data size Cache block size Cache access time

64 KB
2 words
2 cycle
5.3.4 "Calculate the total number of bits required for the cache listed in the table, assuming a 32-bit address. Given
that total size, find the total size of the closest direct-mapped cache with 16-word blocks of equal size or greater.
Explain why the second cache, despite its larger data size, might provide slower performance than the first cache."
Hint: Compute n and m first. See Section 5.2.
2^6 = 64, so n = 6. 2 words = 2^m = 2^1, so m=1.

So we get 2^n*(2^m*32+(32-n-m-2)+1)=5632 bits.
4.Exercises 5.4.1 and 5.4.2, parts a and b
"For a direct-mapped cache design with 32-bit address, the following bits of the address are used to access the
cache."
Tag Index Offset

a. 31-10 9-4
3-0
b. 31-12 11-5
4-0
5.4.1 "What is the cache line size (in words)?"
(a) Tag is 22 bits, index is 6 bits (n), offset is 4 bits. So we have 22 = 32 (n + m + 2) = 32

(8 +m), and so m = 32 22 8 = 2. So the cache line size is 2^2 = 4 words.
(b) Tag is 20 bits, index is 7 bits (n), offset is 5 bits. So we have 20 = 32 (n + m + 2) = 32
(7 + m + 2), so 32 20 9 = m = 3. So the cache line size is 2^3 = 8 words.
5.4.2 "How many entries does the cache have?"
This can be determined from the index. In (a) the index is 6 bits, so the cache is 2^6 = 64
blocks. In (b) the index is 7 bits, so the cache is 2^7 = 128 blocks.
5.Exercises 5.4.4 and 5.4.5

"Starting from power on, the following byte-addressed cache references are recorded." (12 references)
Address 0 4 16 132 232 160 1024 30 140 3100 180 2180
Hint: You may assume that m=2 and n = 6.
5.4.4 "How many blocks are replaced?"
Hint: First fill out the table below:
tag is 24 bits (bits 31-8), index is 6 bits (bits 7-2).
Address
Line ID
Hit/Miss
Replace?
1024
16
132 232 160
30
4
10000
1000 1000 1110 1010
1111
100
00000
0
0100 1000 0000
0
0
0
0 2
16
29 20
0 3
M M
M
M
M
M
M
M
N N
N
N
N
N
Y
N
0
3100
140
180
2180
11000
1000
10110 100010
00111
1100
100
000100
00
17
3
22
16
M
M
M
M
N
Y
N
Y
5.4.5 "What is the hit ratio?"
The hit ratio is 0/12 = 0

6.Exercise 5.5.2, using values from table below:
"Recall that we have two write policies and write allocation policies, their combinations can be implemented at either
in L1 or L2 cache.
L1
L2
Write-back, write allocate Write-through, non-write allocate
5.5.2 "Describe the procedure of handling an L1 write miss, considering the component involved and the possibility
of replacing a dirty block."
First we'd check to see if the block is dirty. If it is, then we write the dirty block to memory. Next
we retrieve the target block from memory (overwriting the block that is in our way). Finally we
write to our L1 block. Later, when something else must read in a different memory location
overwriting our block, the block is written.

hw6 Circuits

Uploaded by

Document Informationclick to expand document information

Copyright:

Available Formats

hw6 Circuits

Uploaded by

Document Information

Original Description:

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

hw6 Circuits

Uploaded by

Copyright:

Available Formats

Brock Wilcox

float A[4000][4000], B[8][8];

float A[4000][4000], B[8][8];

A is 4000 * 4000 = 16,000,000 floats. We divide by 4 floats per cacheline, 16,000,000/4 =

B is 8 * 8 = 64 floats. Since we have 4 floats per cacheline, that is 64 / 4 = 16 cachelines.

5.2.2 References to which variables exhibit temporal locality? Why?

Cache data size Cache block size Cache access time

2^6 = 64, so n = 6. 2 words = 2^m = 2^1, so m=1.

Tag Index Offset

(a) Tag is 22 bits, index is 6 bits (n), offset is 4 bits. So we have 22 = 32 (n + m + 2) = 32

5.Exercises 5.4.4 and 5.4.5

5.4.5 "What is the hit ratio?"

The hit ratio is 0/12 = 0

You might also like