
INDIAN STATISTICAL INSTITUTE

Periodical Examination

M. Tech (CS) - I Year (Semester - II)

Computer Architecture

Date: 26.02.2009 Maximum Marks: 40 Duration: 2.5 Hours

Note: Be precise in your answers. This is a three page question paper.

Q 1(i): A computer system contains a special-purpose processor for doing floating-point operations.
You as the designer have determined that 70% of your computations can use the floating-point
processor. When a program uses the floating-point processor, it runs 45% faster than when it
does not. Find the overall speedup from using the floating-point processor.

Ans 1(i): Here, fraction enhanced fen = 0.7 and speedup enhanced sen = 1.45. So, overall
speedup = 1 / ((1 − 0.7) + 0.7/1.45) = 1.2775.

Q 1(ii): In order to improve the speedup, you are considering two options:

Option 1: The compiler is modified so that 80% of the computations can use the
floating-point processor. The cost of this option is Rs. 25 lakhs.
Option 2: The floating-point processor is modified so that it is 100% faster than when it
is not used. Assume in this case that 60% of the computations can use the
floating-point processor. The cost of this option is Rs. 30 lakhs.

Which option would you recommend? Justify your answer quantitatively. [2+4=6]

Ans 1(ii): The relative figures for the two options are:

Option 1: Here fen = 0.8 and sen = 1.45. So, overall speedup = 1 / ((1 − 0.8) + 0.8/1.45) = 1.33.
The cost-to-speedup ratio is 25 lakhs / 1.33 = 18.7969 lakhs.

Option 2: Here fen = 0.6 and sen = 2.0. So, overall speedup = 1 / ((1 − 0.6) + 0.6/2.0) = 1.4286.
The cost-to-speedup ratio is 30 lakhs / 1.4286 = 20.9995 lakhs.

As the cost-to-speedup ratio is smaller for Option 1, we would choose Option 1.

Q 2: Suppose we make an enhancement to a computer that improves a mode of execution by a factor
of 15. The enhanced mode is used 55% of the time, measured as a percentage of the execution
time when the enhanced mode is in use. Find out (a) what percentage of the original execution
time has been converted to fast mode, and (b) what is the speedup we have obtained from fast
mode? [2+4=6]

Ans 2: Let the old execution time be t and the new execution time be t'. Let t1 be the time span in
t for which the enhancement was used. Thus, fen = t1/t. Because of the use of the enhancement,
let the t1 portion of the old execution now finish in time t1' in the new execution. Thus, sen = t1/t1'.
In this problem, what we have been given instead is a new fraction f' = t1'/t' for the enhanced
mode. We have to deduce the speedup in terms of f' and sen. The new execution time t' is
composed of the enhanced time t1' and the unenhanced time (t − t1):

t' = t1' + (t − t1)
   = f' t' + (t − t1)
t'(1 − f') = t − t1 = t − sen t1' = t − sen f' t'
t'(1 − f' + sen f') = t
t/t' = 1 + f'(sen − 1)

So, speedup = t/t' = 1 + f'(sen − 1). Here, sen = 15 and f' = 0.55. So, speedup is equal to
1 + 0.55 × (15 − 1) = 8.7. We know, speedup = 1 / ((1 − fen) + fen/sen). Plugging in the values
of speedup (8.7) and sen = 15, we have fen = 0.948; that is, about 94.8% of the original
execution time has been converted to fast mode.

Q 3 (i): Describe in short the concept of memory hierarchy explaining the role of each level of
memory.

Ans 3(i): This was discussed in class.

Q 3 (ii): The clock cycles per instruction (CPI) of a computer system is 3.0 when all memory
accesses hit in the cache. The only data accesses are loads and stores, and these total 53%
of the instructions. If the miss penalty is 31 clock cycles and the miss rate is 3%, how much
faster would the machine be if all instructions were cache hits? [4+2=6]

Ans 3(ii): CPU execution time for the machine that always hits
= (CPU clock cycles + Memory stall cycles) × clock cycle (CC) = (IC × CPI + 0) × CC
= IC × 3.0 × CC.
Memory stall cycles for the machine with the real cache
= IC × memory references per instruction × miss rate × miss penalty
= IC × (1 + 0.53) × 0.03 × 31   (1 instruction access and 0.53 data accesses per instruction)
= IC × 1.4229.
CPU execution time for the machine with the real cache
= (IC × 3.0 + IC × 1.4229) × CC
= 4.4229 × IC × CC.
Performance ratio (inverse ratio of the execution times)
= CPU execution time with cache / CPU execution time without cache
= (4.4229 × IC × CC) / (3.0 × IC × CC)
= 1.4743.

Q 4: Consider the following piece of 'C' code.

for (i = 0; i <= 100; i++) {
    X[i] = Y[i] + Z;
}

Assume that X and Y are arrays of 32-bit integers and Z and i are 32-bit integers. Assume that
all data values and their addresses are kept in memory at addresses 0, 5000, 1500 and 2000 for
X, Y, Z and i respectively except when they are operated on. Assume that values in registers
are lost between iterations of the loop.
(a) Write the code for DLX. (b) How many memory-data references will be executed? (c)
What is the code size in bytes? [8+2+1=11]

Ans 4: Note the following two sentences in the question. Assume that all data values and their
addresses are kept in memory at addresses 0, 5000, 1500 and 2000 for X, Y, Z and i respectively
except when they are operated on. Assume that values in registers are lost between iterations
of the loop.
Because of this assumption, registers are not used to hold updated or intermediate values.
Values are stored to memory and reloaded when needed. Also, as all addresses fit inside 16
bits, we use immediate instructions.

ADD R1, R0, R0 ;R1 will store ’i’; initialize it to zero
SW 2000(R0), R1 ;store ’i’
loop: LW R1, 2000(R0) ;get value of ’i’
SLL R2, R1, #2 ;making the offset for Y
ADDI R3, R2, #5000 ;add base address and the offset for Y
LW R4, 0(R3) ;load Y[i]
LW R5, 1500(R0) ;load Z
ADD R6, R4, R5 ;Y[i]+Z
LW R1, 2000(R0) ;again get value of ’i’ as
;it cannot be stored in register
SLL R2, R1, #2 ;making the offset for X
ADDI R7, R2, #0 ;add base address and the offset for X
SW 0(R7), R6 ;X[i]=Y[i]+Z
LW R1, 2000(R0) ;get value of ’i’
ADDI R1, R1, #1 ;increment ’i’
SW 2000(R0), R1 ;store ’i’
LW R1, 2000(R0) ;get value of ’i’
ADDI R8, R1, #-101 ;is counter at 101?
BNEZ R8, loop ;loop instruction

The number of instructions executed (though not asked for) is the number of initialization
instructions plus the number of instructions in the loop times the number of iterations, which
is 2 + (16 × 101) = 1618.
The number of memory-data references executed is 1 + (8 × 101) = 809.
Each DLX instruction is 4 bytes wide, so the code size is 18 × 4 = 72 bytes.

Q 5: Show how the code sequence A × B − (A + C × B) will appear on the following architectures:
(a) stack, (b) accumulator, (c) register-memory, and (d) load-store. [1.5+1.5+1.5+1.5=6]

Ans 5: The code sequence for A × B − (A + C × B) in each of these architectures is shown below:

Q 6(i): Explain the effect of instruction pipelining on the bandwidth of the memory systems.

Ans 6(i): This was discussed in the class.

Stack            Accumulator      Register-Memory      Load-Store
push A;          load B;          load R1, C;          load R1, A;
push B;          mul C;           mul R1, B;           load R2, B;
mul;             add A;           add R1, A;           load R3, C;
push A;          store D;         store R1, D;         mul R4, R3, R2;
push C;          load A;          load R2, A;          add R5, R1, R4;
push B;          mul B;           mul R2, B;           mul R6, R1, R2;
mul;             sub D;           sub R2, D;           sub R7, R6, R5;
add;
sub;
Q 6(ii): Consider an unpipelined machine with five stages (Instruction Fetch, Instruction Decode/Register
Fetch, Execute/Address Calculation, Memory Access and Write Back). Assume that it has an
11-ns clock cycle. The machine uses four cycles for ALU operations and branches and five
cycles for memory operations. Assume that the relative frequencies of these operations are
45%, 15% and 40% respectively. Pipelining the machine adds 1 ns of overhead to the clock.
Find out how much speedup we will gain in the instruction execution rate. You can ignore any
latency impact. [3+2=5]

Ans 6(ii): The average instruction execution time on the unpipelined machine is

Average instruction execution time = Clock Cycle (CC) × Average CPI
= 11 ns × ((0.45 + 0.15) × 4 + 0.4 × 5)
= 11 ns × 4.4
= 48.4 ns

In the pipelined version, the clock will run at 11 + 1 = 12 ns. So, the average instruction
execution time is 12 ns. So, the speedup from pipelining is

(avg. instruction time, unpipelined) / (avg. instruction time, pipelined) = 48.4 ns / 12 ns = 4.033
