Dpco Unit 4
Ans.: Degradation of performance in an instruction pipeline may be due to address dependency, where an
operand address cannot be calculated without information supplied by the addressing mode. For example, an
instruction with register indirect mode cannot proceed to fetch the operand if the previous instruction is
loading that address into the register. Operand access is therefore delayed, degrading the performance of the
pipeline.
6. What is data hazard in pipelining? What are the solutions ? AU: Dec.-07, May-12, 13
Ans.: When either the source or the destination operands of an instruction are not available at the time
expected in the pipeline, and the pipeline is stalled as a result, we say such a situation is a data hazard.
1. The easiest way to handle data hazards is to stall the pipeline.
2. The second simple hardware technique which can handle data hazards is called forwarding or register
bypassing.
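As a sketch, the cost of the two remedies can be modelled for a hypothetical 5-stage pipeline (IF, ID, EX, MEM, WB) in which the register file is written in the first half of WB and read in the second half; the stall counts below follow from that assumption and are illustrative, not from any specific processor.

```python
# Toy model of a RAW data hazard between an ALU producer and a
# consumer issued `gap` instructions later (gap=1 means back-to-back).

def stall_cycles(gap, forwarding=False):
    """Stall cycles the consumer suffers before it can read its operand.

    Without forwarding, the consumer's decode must wait for the
    producer's write-back; with a split-cycle register file that
    costs max(0, 3 - gap) stalls. An EX->EX forwarding path removes
    the stall entirely for ALU-to-ALU dependences.
    """
    if forwarding:
        return 0
    return max(0, 3 - gap)

for gap in (1, 2, 3):
    print(gap, stall_cycles(gap), stall_cycles(gap, forwarding=True))
```

A load-use dependence would still need one stall even with forwarding, since the loaded value only exists after MEM; that case is omitted here.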
7. What are the classifications of data hazards ? AU: May-17
Ans.: The data hazard can be classified as,
1. RAW (read after write) hazard
2. WAW (write after write) hazard
3. WAR (write after read) hazard
8. What is meant by speculative execution ? AU: May-12
Ans.: Speculative execution means that instructions are executed before the processor is certain that they are
in the correct execution sequence. Hence, care must be taken that no processor registers or memory
locations are updated until it is confirmed that these instructions should indeed be executed. If the branch
decision indicates otherwise, the instructions and all their associated data in the execution units must be
purged, and the correct instructions fetched and executed.
9. What is called static and dynamic branch prediction ?
OR Differentiate between the static and dynamic techniques. AU: May-13
Ans.: If the branch prediction decision is always the same every time a given instruction is executed, the
approach is called static branch prediction. Another approach, in which the prediction decision may change
depending on execution history, is called dynamic branch prediction.
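The difference can be illustrated with a toy simulation (the outcome trace and the predictor's initial state are invented for illustration): a static rule predicts the same way every time, while a 1-bit dynamic predictor adapts to the recent history.

```python
# True = branch taken. The trace models a branch that is taken for a
# while and then stops being taken.

def static_accuracy(trace, prediction=True):
    """Accuracy of a fixed 'always taken' (or 'never taken') rule."""
    return sum(outcome == prediction for outcome in trace) / len(trace)

def dynamic_1bit_accuracy(trace):
    """1-bit predictor: predict whatever the branch did last time."""
    pred, hits = True, 0
    for taken in trace:
        hits += (pred == taken)
        pred = taken              # update the history bit
    return hits / len(trace)

trace = [True] * 8 + [False] * 8
print(static_accuracy(trace))        # 0.5
print(dynamic_1bit_accuracy(trace))  # 0.9375 (one miss when behaviour flips)
```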
10. What is the role of cache in pipelining? AU: Dec.-11.
Ans.: Each pipeline stage is expected to complete in one clock cycle. The clock period should be long enough
for the slowest pipeline stage to complete; faster stages have to wait for the slowest one. Since main memory
is very slow compared to execution, the pipeline would be almost useless if each instruction had to be fetched
from main memory. The cache memory reduces the memory access time and makes pipelining useful.
1. Explain about building a data path. (13)
2. Explain about pipelining. (13)
3. What are pipeline hazards? Outline the types of pipeline hazards. (13)
Hazards
There are situations in pipelining when the next instruction cannot execute in the following
clock cycle. These events are called hazards, and there are three different types.
1) Structural Hazard
2) Data Hazard
3) Control Hazard
For example, in cycle 4 of a pipelined sequence, instructions I1 and I4 may both try to access the same
resource (memory), which introduces a resource conflict.
● To avoid this problem, we have to keep the instruction waiting until the required resource (memory in our
case) becomes available. This wait introduces stalls in the pipeline.
When dependent instructions are executed in a pipelined processor, a data dependency condition can occur:
for example, I2 tries to read data before I1 writes it, so I2 incorrectly gets the old value instead of the
result of I1.
● To minimize data dependency stalls in the pipeline, operand forwarding is used.
Operand Forwarding : In operand forwarding, the interface registers present between the stages are used to
hold intermediate output, so that a dependent instruction can access the new value directly from the
interface register.
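A minimal sketch of the idea, with invented register names and values: the dependent instruction's EX stage reads the fresh result from the EX/MEM interface register instead of the not-yet-updated register file.

```python
regfile = {"R1": 0, "R2": 10, "R3": 20}   # R1 still holds a stale 0
ex_mem = {"dest": "R1", "value": regfile["R2"] + regfile["R3"]}
# I1: ADD R1, R2, R3 has just left EX; its result (30) sits in EX/MEM.

def read_operand(reg):
    """Forwarding-unit check: bypass from EX/MEM when it holds `reg`."""
    if ex_mem.get("dest") == reg:
        return ex_mem["value"]            # forwarded, fresh value
    return regfile[reg]                   # normal register-file read

# I2: SUB R4, R1, R2 executes in the very next cycle.
result = read_operand("R1") - read_operand("R2")
print(result)   # 20 -- uses the forwarded 30, not the stale 0 in R1
```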
The output sequence we get is I1 -> I2 -> I3 -> BI1, which is not equal to the expected sequence; the
instructions fetched after the branch should not have executed, so the branch is not handled correctly by
the pipeline.
1. Using Stalls - To correct this problem, we need to stop the instruction fetch until we get the target
address of the branch instruction. This can be implemented by introducing delay slots (bubbles) until the
target address is available.
2. Branch Prediction
a. Predicting at run time whether a branch will be taken, based on its recent behaviour, is called dynamic
branch prediction.
b. A branch prediction buffer or branch history table is a small memory indexed by the lower portion of the
address of the branch instruction. The memory contains a bit that says whether the branch was recently
taken or not.
3. Delayed Branching
1) The slot directly after a delayed branch instruction is called the branch delay slot; in the MIPS
architecture it is filled with an instruction that does not affect the branch.
2) The instruction placed in the branch delay slot always executes, whether or not the branch is taken.
4. (i) Outline a control unit with a diagram and state the functions performed by a control unit.(8)
(ii) Outline the difference between hardwired and microprogrammed control. (5)
Hardwired Control Unit
● It is a method of generating control signals with the help of finite state machines
(FSMs). It is built as a sequential logic circuit by physically connecting components
such as flip-flops, gates, and decoders, resulting in the finished circuit. As a result,
it is known as a hardwired controller.
● The control unit also receives information about interrupts, which will affect the execution of an
instruction.
Microprogrammed Control Unit
● A control unit whose binary control values are saved as words in memory is called a
microprogrammed control unit.
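The organisation can be sketched as follows; the control signal names and the three-word fetch routine are invented for illustration, not taken from any particular machine.

```python
# Each control word is a tuple of signal bits stored in a control
# memory; a micro-program counter steps through them in sequence.

CONTROL_MEMORY = [
    # (PCout, MARin, Read, IRin)
    (1, 1, 1, 0),   # gate PC onto the bus, load MAR, start memory read
    (0, 0, 1, 0),   # keep the read asserted while memory responds
    (0, 0, 0, 1),   # latch the fetched word into IR
]

def run_microprogram(memory):
    upc = 0                         # micro program counter
    while upc < len(memory):
        yield memory[upc]           # these bits drive the datapath
        upc += 1                    # simple sequencer: fall through

for step, word in enumerate(run_microprogram(CONTROL_MEMORY)):
    print(step, word)
```

Changing the machine's behaviour then means rewriting words in the control memory rather than rewiring logic, which is the main contrast with a hardwired unit.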
5. (i) Explain about Instruction Execution. (6)
I. Instruction Execution
Steps in detail:
❖ All instructions start by using the program counter to supply the instruction address to
the instruction memory.
❖ After the instruction is fetched, the register operands used by an instruction are
specified by fields of that instruction.
❖ Once the register operands have been fetched, all the instruction classes, except jump,
use the ALU after reading the registers.
➢ Memory reference instructions (load or store) use the ALU for an address
calculation.
➢ Arithmetic Logical instructions use the ALU for the operation execution.
❖ The second input to the ALU can come from a register or the immediate field of the
instruction.
❖ After using the ALU, the actions required to complete the various instruction classes are
not the same.
➢ Branches require the use of the ALU output to determine the next instruction
address, which comes either from the ALU (where the PC and branch offset
are summed) or from an adder that increments the current PC by 4.
Main 5 steps
1. Fetch an instruction and increment the program counter.
2. Decode the instruction and read registers from the register file.
3. Perform an ALU operation.
4. Read or write memory data if the instruction involves a memory operand.
5. Write the result into the destination register, if needed.
Load Instruction
Steps as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read the base register.
3. Compute the effective address.
4. Read the memory operand.
5. Load the data into the destination register.
❖ Depending on how the hardware is organized, some of these actions can be performed
at the same time.
Arithmetic and Logic Instruction
● There are either two source registers, or a source register and an immediate source
operand.
● No access to memory operands is required.
Eg. Add R3, R4, R5
Steps as follows
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R4 and R5.
3. Compute the sum [R4] + [R5].
4. No action.
5. Load the result into the destination register, R3.
Store Instruction
Steps as follows:
1. Fetch the instruction and increment the program counter.
2. Decode the instruction and read registers R6 and R8.
3. Compute the effective address X + [R8].
4. Store the contents of register R6 into memory location X + [R8].
5. No action.
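The Add and Store walkthroughs above can be traced on a toy register/memory model; the register values and the offset X = 20 are chosen arbitrarily.

```python
regs = {"R3": 0, "R4": 7, "R5": 8, "R6": 42, "R8": 100}
mem = {}
pc = 0

# Add R3, R4, R5
pc += 4                              # 1. fetch, increment PC
a, b = regs["R4"], regs["R5"]        # 2. decode, read R4 and R5
total = a + b                        # 3. compute [R4] + [R5]
                                     # 4. no memory action
regs["R3"] = total                   # 5. write result into R3

# Store R6, X(R8) with X = 20
pc += 4                              # 1. fetch, increment PC
src, base = regs["R6"], regs["R8"]   # 2. decode, read R6 and R8
addr = 20 + base                     # 3. effective address X + [R8]
mem[addr] = src                      # 4. store R6 at X + [R8]
                                     # 5. no register write-back

print(regs["R3"], mem[120])          # 15 42
```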
5. (ii) An instruction pipeline has five stages, where each stage takes 2 nanoseconds and all instructions
use all five stages. Branch instructions are not overlapped, i.e., the instruction after a branch is not
fetched until the branch instruction is completed. Under ideal conditions: (7)
(a) Calculate the average instruction execution time assuming that 20% of all instructions executed are
branch instructions. Ignore the fact that some branch instructions may be conditional.
(b) Repeat assuming that 80% of the branch instructions are conditional and 50% of the conditional branches
result in the branch being taken.
Each stage takes 2 ns, so after 5 time units of 2 ns each, the first instruction finishes (i.e., after 10 ns);
after that, a new instruction finishes every 2 ns, assuming no branch instructions. Once the pipeline is full,
the initial fill time does not affect our calculations, and the average execution time for each instruction
is 2 ns, assuming no branch instructions.
A. Now, we are given that 20% of instructions are branches (like JMP) and, when a branch instruction is
executed, no further instruction enters the pipeline. So we can assume every 5th instruction is a branch
instruction. With this assumption, the total time to finish 5 instructions will be 5 ∗ 2 + 8 = 18 ns
(when a branch instruction enters the pipeline and before it finishes, 4 pipeline stages will be empty,
totaling 4 ∗ 2 = 8 ns, since the question states that the next instruction fetch starts only when the
branch instruction completes). This is the same for every set of 5 instructions, hence the average
instruction execution time = 18/5 = 3.6 ns.
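The arithmetic in part (A) can be checked directly:

```python
stage_time = 2                               # ns per pipeline stage
stages = 5
branch_penalty = (stages - 1) * stage_time   # 4 empty stages = 8 ns
# one branch per 5 instructions: 5 issue slots plus one penalty
avg = (5 * stage_time + branch_penalty) / 5
print(avg)   # 3.6
```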
B. Here we need to identify the percentage of branch instructions that actually cause a branch to be taken,
since the others have no effect on the pipeline flow.
20% of instructions are branch instructions, and 80% of branch instructions are conditional.
That means 0.2 ∗ 0.8 = 16% of instructions are conditional branch instructions, and it is given that 50% of
those result in a branch being taken.
So, 8% of instructions are conditional branches that are taken, and we also have 20% of 20% = 4%
unconditional branch instructions, which are always taken.
So, the percentage of instructions where a branch is taken is 8 + 4 = 12%, instead of 20% as in part (A).
So, in 100 instructions there will be 12 taken branches. We can do the calculation over 100 instructions
here, unlike in (A). Each taken branch causes a pipeline delay of 4 ∗ 2 = 8 ns, so 12 taken branches cause a
delay of 12 ∗ 8 = 96 ns. Without any delay, 100 instructions need 100 ∗ 2 = 200 ns; with the delay we require
200 + 96 = 296 ns for 100 instructions, so the average instruction execution time = 296/100 = 2.96 ns.
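The arithmetic in part (B) can be checked directly:

```python
branch_frac = 0.20
cond_taken = branch_frac * 0.80 * 0.50   # taken conditional branches: 8%
uncond = branch_frac * 0.20              # unconditional (always taken): 4%
taken = cond_taken + uncond              # 12% of all instructions
penalty_ns = 4 * 2                       # 4 empty stages of 2 ns each
total_ns = 100 * 2 + 100 * taken * penalty_ns
print(round(taken, 2), round(total_ns), round(total_ns / 100, 2))
# 0.12 296 2.96
```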