Special Problem Set Ver 3

The document outlines various special problems related to computer architecture and operations, including MIPS code translation, multiplication and division processes, cache systems, and performance metrics for pipelined and non-pipelined operations. It also addresses modifications for new instructions and hardware requirements, along with calculations for system availability and reliability. Each section specifies tasks and calculations to be performed, focusing on understanding and optimizing processor performance.

Uploaded by

Arun Chikkaraju

ECE 485 / 585 Special Problems

SP1: For the pseudo code segment shown, write down the corresponding MIPS code.
The variable A is located in register $s2.
Variable C is located at memory location 0x9000.
The resulting array B is to be stored starting at the memory location specified in register $t5.
(Do not modify or "correct" or simplify the original pseudo code)

A = 100
Loop: While A > 10
B=A+C
A=A-5
SP2: (see tab sp2)
SP3: A representative multiplier hardware is shown in the figure.
Show the step-by-step contents of the registers for the 4-bit multiplication 0 1 1 0 x 1 0 1 0. Show all work and each step.
NOTE: 4 bits instead of 32, 8 bits instead of 64.

Iteration   Step             Multiplicand   Product
0           Initial Values
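The register trace asked for in SP3 can be checked with a small simulation. The sketch below is a hedged illustration, not the course's reference solution: it assumes the figure shows the refined shift-add multiplier (4-bit Multiplicand register, 8-bit Product register with the multiplier loaded into its right half), and the names `shift_add_multiply` and `print_trace` are hypothetical.

```python
def shift_add_multiply(multiplicand, multiplier, n_bits=4):
    """Trace an unsigned n-bit shift-add multiply.

    Returns (product, trace); each trace row is
    (iteration, step, multiplicand, product_register).
    Assumes the multiplier starts in the right half of the product register.
    """
    mask = (1 << n_bits) - 1
    multiplicand &= mask
    product = multiplier & mask
    trace = [(0, "initial values", multiplicand, product)]
    for i in range(1, n_bits + 1):
        if product & 1:
            # Product LSB is 1: add the multiplicand into the left half.
            # A Python int keeps the carry-out that hardware would hold
            # in an extra bit until the shift.
            product += multiplicand << n_bits
            trace.append((i, "1 -> add multiplicand to left half", multiplicand, product))
        else:
            trace.append((i, "0 -> no operation", multiplicand, product))
        product >>= 1
        trace.append((i, "shift product right", multiplicand, product))
    return product, trace


def print_trace(trace, n_bits=4):
    """Print one row per step with registers shown in binary."""
    for iteration, step, mcand, prod in trace:
        print(f"{iteration}  {step:<36} {mcand:0{n_bits}b}  {prod:0{2 * n_bits}b}")


if __name__ == "__main__":
    product, trace = shift_add_multiply(0b0110, 0b1010)
    print_trace(trace)
```

If the figure instead shows the first-version hardware (multiplicand shifted left in a double-width register), the register contents per step differ even though the final product is the same.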
SP6: We want to add the instruction shift left logical (sll) to the multicycle implementation.
sll rd, rt, shamt
State its meaning.
Show any modifications to existing HW or new HW needed, and also all the control signals (current and new ones
if needed). Provide explanatory comments.
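As a quick reference for the meaning of sll, the following minimal sketch models the register-level effect only, assuming 32-bit registers and a 5-bit shift amount field as in standard MIPS. The function name `sll` and the range check are illustrative; the sketch says nothing about the datapath or control changes the problem asks for.

```python
MASK32 = 0xFFFFFFFF


def sll(rt_value, shamt):
    """Model rd = rt << shamt for 32-bit registers.

    Zeros are shifted in on the right; bits shifted past bit 31 are lost.
    """
    if not 0 <= shamt <= 31:
        raise ValueError("shamt is assumed to be a 5-bit field (0..31)")
    return (rt_value << shamt) & MASK32
```

For example, `sll(1, 4)` evaluates to 16, and `sll(0x80000001, 1)` evaluates to 2 because the top bit is discarded.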

SP8 and SP9: (See sp8 sp9 tab)


SP10: The following latencies are defined for a processor using multicycle operations.
IF   ID   EX   MEM  WB
300  250  200  350  150  (in ns)
Use the 6-line fragment of code specified in SP8. The probability of the branch being taken is 80%.

For a non-pipelined sequential operation, determine


(a) the average thruput
(b) the average latency
For pipelined operation without assists (problem 3), determine
(c) the average thruput
(d) the average latency
SP11: See Sp11 tab
SP12: See Sp12 tab
SP13: See Sp13 tab
SP14: See Sp14 tab
SP15: The cycle time for each step in SP14 is 200 ns. The probability of the branch being taken is 90%.
The following calculations need to be made for the 6-line code segment shown.
For a non-pipelined sequential operation, determine
(a) the average thruput
(b) the average latency

For pipelined operation, determine


(c) the average thruput in case branch taken assumption is made
(d) the average thruput in case delayed branch assumption is made
(e) the average latency in case branch not taken assumption is made
(f) the average latency in case delayed branch assumption is made
SP 16: A split I-cache and D-cache system operates in the write back and write allocate mode.
The instruction distribution is as follows: Loads: 15%, stores: 10%, others: 75%. The cache is a one-cycle
access cache. There are 4 words per block. The bus bandwidth between the cache and memory is
1 cycle/word and the memory access time per word for reads is 3 cycles/access and for writes is
4 cycles/access. The hit rate for the I-cache is 90% whereas for the D-cache it is 85%. The cache blocks are dirty
15% of the time. Determine the overall CPI.
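One way to organize the SP 16 calculation is sketched below. This is a hedged model, not the course's official method: it assumes a base CPI of 1, that every word of a block pays the bus time plus the memory access time with no overlap, that both load and store misses fetch a block (write allocate), and that a dirty victim is written back before the fetch. The function names are hypothetical.

```python
def block_transfer_cycles(words_per_block, bus_cycles_per_word, mem_cycles_per_word):
    """Cycles to move one block, assuming no overlap between words."""
    return words_per_block * (bus_cycles_per_word + mem_cycles_per_word)


def write_back_cpi(base_cpi, load_frac, store_frac, i_hit, d_hit,
                   dirty_frac, read_block_cycles, write_block_cycles):
    """CPI for split caches under write back + write allocate (assumed model)."""
    # Instruction fetch misses: the I-cache is never dirty, so only a block read.
    i_stall = (1 - i_hit) * read_block_cycles
    # Data misses: fetch the block, plus a write-back when the victim is dirty.
    d_miss_penalty = read_block_cycles + dirty_frac * write_block_cycles
    d_stall = (load_frac + store_frac) * (1 - d_hit) * d_miss_penalty
    return base_cpi + i_stall + d_stall


if __name__ == "__main__":
    read_block = block_transfer_cycles(4, 1, 3)   # SP 16 read parameters
    write_block = block_transfer_cycles(4, 1, 4)  # SP 16 write parameters
    print(write_back_cpi(1, 0.15, 0.10, 0.90, 0.85, 0.15, read_block, write_block))
```

If the course treats the memory access time as a one-time cost per block rather than per word, only `block_transfer_cycles` needs to change.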
SP 17 (a) : The repair time for a system is 2 hours. The FITs are 100,000. Determine the Down Time for
the system.
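The relation between FITs, repair time, and down time can be sketched as follows. The sketch assumes the usual definitions: FIT is failures per 10^9 device-hours, so MTTF = 10^9 / FIT hours; the repair time is taken as MTTR; availability is MTTF / (MTTF + MTTR); and a year is taken as 8760 hours. The function names are hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def mttf_hours_from_fit(fit):
    """Mean time to failure in hours, with FIT = failures per 1e9 hours."""
    return 1e9 / fit


def downtime_hours_per_year(fit, mttr_hours):
    """Expected down time per year from the unavailability MTTR / (MTTF + MTTR)."""
    mttf = mttf_hours_from_fit(fit)
    unavailability = mttr_hours / (mttf + mttr_hours)
    return unavailability * HOURS_PER_YEAR


if __name__ == "__main__":
    # SP 17 (a) inputs: 100,000 FITs and a 2-hour repair time.
    print(downtime_hours_per_year(100_000, 2))
```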
SP 17 (b): A computer system contains an I/O controller controlling three I/O units. The availability for
each of the sub-systems is indicated in the diagram. Determine the availability for at most 400 users.
[Diagram: availability of each sub-system]
Bus: 0.97
CPU: 0.99, cache: 0.98, memory: 0.91, disk: 0.95, I/O control: 0.91
I/O unit: 0.92 (200 users), I/O unit: 0.93 (300 users)

SP 18: A split I-cache and D-cache system operates in the write through and no write allocate mode. The
I-cache hit rate is 85% and the D-cache hit rate is 90%. It takes 1 cycle for the CPU to access the cache. There
are 4 words/block and the bandwidth of the bus interconnecting the cache with memory is 2 cycles/word.
The memory port access (read or write) is 3 cycles/access (word or block) and the memory read/write time
is 0.5 cycles/word. A word buffer is placed between the cache and the bus for which the access time is
1 cycle/word. The effectiveness of the word buffer is 80%. The distribution of the instructions
is: Loads: 10%; stores: 15%; others: 75%. Determine the CPI.
SP19: A split I-cache and D-cache system operates in the write back and write no-allocate mode. The
I-cache hit rate is 95% and the D-cache hit rate is 80%. It takes 1 cycle for the CPU to access the cache. There
are 4 words/block and the bandwidth of the bus interconnecting the cache with memory is 3 cycles/word.
The total memory access is 2 cycles/access per word (read or write).
80% of the blocks in the D-cache can be dirty at any time. The distribution of instructions is
as follows: Data reads (loads): 10%; Data writes (stores): 15%; other instructions: 75%.
Determine the overall CPI.
SP20: A split I-cache and D-cache system operates in the write-through and no-write-allocate mode.
There is a Write Buffer (WB) of one word size introduced between the CPU and the D-cache in order to reduce memory stalls. The WB is
85% effective, i.e., 85% of the time, the CPU does not wait for the write to complete. The instruction
distribution is as follows: Loads: 15%, Stores: 10%, others: 75%. The cache is a one-cycle access cache and
it takes 0.5 cycles to access the WB. There are 3 words/block. The bus bandwidth between the cache and
memory is 5 words/cycle and the memory access is 3 cycles/access. The hit rate is 85% for both I-cache
and D-cache and 70% of the blocks in cache are dirty. Determine the overall CPI.
SP21: The FITs for a system are 20000. The Down Time is specified to be 2 hours/year.
Determine the Mean Time To Repair (MTTR)
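SP21 runs the availability relation in reverse: given the failure rate and an allowed down time, solve for the repair time. The sketch below assumes FIT means failures per 10^9 hours, an 8760-hour year, and unavailability = MTTR / (MTTF + MTTR); solving that for MTTR gives MTTR = MTTF x U / (1 - U). The name `mttr_hours_from_downtime` is hypothetical.

```python
HOURS_PER_YEAR = 8760  # assumption: 365-day year


def mttr_hours_from_downtime(fit, downtime_hours_per_year):
    """Solve U = MTTR / (MTTF + MTTR) for MTTR, with U = downtime / year."""
    mttf = 1e9 / fit  # FIT = failures per 1e9 hours
    unavailability = downtime_hours_per_year / HOURS_PER_YEAR
    return mttf * unavailability / (1 - unavailability)


if __name__ == "__main__":
    # SP21 inputs: 20000 FITs and 2 hours/year of down time.
    print(mttr_hours_from_downtime(20_000, 2))
```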
SP22: A computer system contains an I/O controller controlling three I/O units. The availability for
each of the sub-systems is indicated in the diagram.
Determine the availability for at most 500 users.
[Diagram: availability of each sub-system]
Bus: 0.93
CPU: 0.99, cache: 0.98, memory: 0.92, disk: 0.95, I/O control: 0.91
I/O unit: 0.96 (200 users), I/O unit: 0.95 (300 users)


SP 23: A representative multiplier hardware is shown in the figure. Show the step by step contents of the
registers for 4-bit multiplication 0 1 1 0 x 1 0 1 1. Show all work and each step
Iteration Step Multiplicand Product
0 Initial Values

SP 24: A 3 bit divider is being designed using the architecture shown in Fig 3.12 Page 240 of the text.
(4th Ed.) Show all the contents of the registers for division of unsigned numbers 1 0 0 1 0 0 by 1 1 0
(Note: same as Fig. 3.13 P 187 of 3rd Ed.)

[Figure labels: 3 bits, 3 bit, 6 bits]

Iteration Step Divisor Register "Remainder" Register


0 Initial Values

SP 25: The following latencies are defined for a processor using multicycle operations.
IF   ID   EX   MEM  WB
250  100  150  300  150  (in ns)
The distribution of instructions is as follows: R-Type: 55%; Branching: 10%; LW: 20%; SW: 15%
All LW instructions are followed by an R-type instruction which uses the result of the LW instruction, resulting
in a stall of 600 ns in the pipelined situation.

For a non-pipelined sequential operation, determine


(a) the average latency
(b) the average thruput

For pipelined operation, determine


(c) the average latency
(d) the average thruput
SP 26: We want to create a new instruction which calculates the absolute value of the contents of a source register.
Example ABS R10, R5
(a) Given the numbers are in 2's complement, write down a short pseudo code for the ALU operation
(b) Show any modifications to existing HW or new HW needed, and also all the control signals (current and new ones
if needed). Provide explanatory comments.
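For part (a) of SP 26, the usual 2's complement rule is: if the sign bit is set, invert all bits and add 1; otherwise pass the value through. The sketch below models that rule on 32-bit register values. The 32-bit width and the name `alu_abs` are assumptions, and the hardware and control changes of part (b) are not covered.

```python
MASK32 = 0xFFFFFFFF
SIGN_BIT = 0x80000000


def alu_abs(rs_value):
    """Absolute value of a 32-bit 2's complement bit pattern."""
    rs_value &= MASK32
    if rs_value & SIGN_BIT:
        # Negative: invert all bits and add 1, then keep 32 bits.
        return (~rs_value + 1) & MASK32
    return rs_value
```

Note that the most negative value, 0x80000000, maps to itself because +2^31 is not representable in 32 bits; a real design would need to decide whether to flag that as overflow.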

SP 27: The following latencies are defined for a processor using multicycle operations.
IF ID EX MEM WB
200 150 200 350 150 (in ns)
The distribution of instructions is as follows: R-Type: 60%; LW: 25%; SW 15%

For a non-pipelined sequential operation, determine


(a) the average thruput
(b) the average latency

For pipelined operation, determine


(c) the average thruput
(d) the average latency
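The four quantities in SP 27 can be organized as in the sketch below. It is a hedged model, not the course's prescribed method: it assumes a non-pipelined multicycle instruction pays only for the stages it uses (R-type skips MEM, SW skips WB, LW uses all five), that the pipelined clock equals the slowest stage, and that the pipeline stays full with no hazards. All names are hypothetical.

```python
STAGES = ("IF", "ID", "EX", "MEM", "WB")

# Assumption: stages each instruction class occupies in a multicycle design.
STAGES_USED = {
    "R": ("IF", "ID", "EX", "WB"),
    "LW": STAGES,
    "SW": ("IF", "ID", "EX", "MEM"),
}


def nonpipelined_avg_latency(latency_ns, mix):
    """Mix-weighted average of the per-class stage-latency sums."""
    return sum(frac * sum(latency_ns[s] for s in STAGES_USED[kind])
               for kind, frac in mix.items())


def pipelined_cycle_time(latency_ns):
    """The clock period is set by the slowest stage."""
    return max(latency_ns[s] for s in STAGES)


def pipelined_latency(latency_ns):
    """Every instruction passes through all stages at the pipeline clock."""
    return len(STAGES) * pipelined_cycle_time(latency_ns)


def throughput_mips(ns_per_instruction):
    """Instructions per second in millions, given ns per instruction."""
    return 1e3 / ns_per_instruction


if __name__ == "__main__":
    lat = {"IF": 200, "ID": 150, "EX": 200, "MEM": 350, "WB": 150}
    mix = {"R": 0.60, "LW": 0.25, "SW": 0.15}
    seq = nonpipelined_avg_latency(lat, mix)
    print(seq, throughput_mips(seq))
    print(pipelined_latency(lat), throughput_mips(pipelined_cycle_time(lat)))
```

In this model the pipelined throughput is one instruction per clock once the pipeline is full, while the latency of a single instruction grows because every stage is stretched to the slowest one.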
Fall 2024 ver 3

[Diagram fragment: I/O control; I/O unit, I/O unit; 0.94; 300 users, 400 users]
