Computer Abstractions and Technology
Computer Abstractions and Technology
Computer Abstractions
and Technology
§1.1 Introduction
The Computer Revolution
■ Progress in computer technology
■ Underpinned by Moore’s Law
■ Makes novel applications feasible
■ Computers in automobiles
■ Cell phones
■ Human genome project
■ World Wide Web
■ Search Engines
■ Computers are pervasive
Output
device
Network
cable
Input Input
device device
Clock (cycles)
Data transfer
and computation
Update state
■ Performance improved by
■ Reducing number of clock cycles
■ Increasing clock rate
■ Hardware designer must often trade off clock
rate against cycle count
A is faster…
Relative frequency
■ Sequence 1: IC = 5 ■ Sequence 2: IC = 6
■ Clock Cycles ■ Clock Cycles
= 2×1 + 1×2 + 2×3 = 4×1 + 1×2 + 1×3
= 10 =9
■ Avg. CPI = 10/5 = 2.0 ■ Avg. CPI = 9/6 = 1.5
Chapter 1 — Computer Abstractions and Technology — 30
Performance Summary
The BIG Picture
■ Performance depends on
■ Algorithm: affects IC, possibly CPI
■ Programming language: affects IC, CPI
■ Compiler: affects IC, CPI
■ Instruction set architecture: affects IC, CPI, Tc
■ In CMOS IC technology
×30 5V → 1V ×1000
◼ Range: 0 to +2n – 1
◼ Example
◼ 0000 0000 0000 0000 0000 0000 0000 10112
= 0 + … + 1×23 + 0×22 +1×21 +1×20
= 0 + … + 8 + 0 + 2 + 1 = 1110
◼ Using 32 bits
◼ 0 to +4,294,967,295
x + x = 1111...1112 = −1
x + 1 = −x
◼ Example: negate +2
◼ +2 = 0000 0000 … 00102
◼ –2 = 1111 1111 … 11012 + 1
= 1111 1111 … 11102
◼ Instruction fields
◼ op: operation code (opcode)
◼ rs: first source register number
◼ rt: second source register number
◼ rd: destination register number
◼ shamt: shift amount (00000 for now)
◼ funct: function code (extends opcode)
0 17 18 8 0 32
000000100011001001000000001000002 = 0232402016
◼ i in $s0
lhi $s0, 61 0000 0000 0111 1101 0000 0000 0000 0000
ori $s0, $s0, 2304 0000 0000 0111 1101 0000 1001 0000 0000
op rs rt constant or address
6 bits 5 bits 5 bits 16 bits
◼ PC-relative addressing
◼ Target address = PC + offset × 4
◼ PC already incremented by 4 by this time
Chapter 2 — Instructions: Language of the Computer — 44
Jump Addressing
◼ Jump (j and jal) targets could be
anywhere in text segment
◼ Encode full address in instruction
op address
6 bits 26 bits
Static linking
Length of product is
the sum of operand
lengths
Initially 0
◼ Can be pipelined
◼ Several multiplication performed in parallel
Chapter 3 — Arithmetic for Computers — 10
MIPS Multiplication
◼ Two 32-bit registers for product
◼ HI: most-significant 32 bits
◼ LO: least-significant 32-bits
◼ Instructions
◼ mult rs, rt / multu rs, rt
◼ 64-bit product in HI/LO
◼ mfhi rd / mflo rd
◼ Move from HI/LO to rd
◼ Can test HI value to see if product overflows 32 bits
◼ mul rd, rs, rt
◼ Least-significant 32 bits of product –> rd
Initially dividend
Step 1
Step 2
Step 3
Step 4
space
◼ Compiled MIPS code:
f2c: lwc1 $f16, const5($gp)
lwc2 $f18, const9($gp)
div.s $f16, $f16, $f18
lwc1 $f18, const32($gp)
sub.s $f18, $f12, $f18
mul.s $f0, $f16, $f18
jr $ra
Chapter 3 — Arithmetic for Computers — 36
FP Example: Array Multiplication
◼ X=X+Y×Z
◼ All 32 × 32 matrices, 64-bit double-precision elements
◼ C code:
void mm (double x[][],
double y[][], double z[][]) {
int i, j, k;
for (i = 0; i! = 32; i = i + 1)
for (j = 0; j! = 32; j = j + 1)
for (k = 0; k! = 32; k = k + 1)
x[i][j] = x[i][j]
+ y[i][k] * z[k][j];
}
◼ Addresses of x, y, z in $a0, $a1, $a2, and
i, j, k in $s0, $s1, $s2
Chapter 3 — Arithmetic for Computers — 37
FP Example: Array Multiplication
◼ MIPS code:
li $t1, 32 # $t1 = 32 (row size/loop end)
li $s0, 0 # i = 0; initialize 1st for loop
L1: li $s1, 0 # j = 0; restart 2nd for loop
L2: li $s2, 0 # k = 0; restart 3rd for loop
sll $t2, $s0, 5 # $t2 = i * 32 (size of row of x)
addu $t2, $t2, $s1 # $t2 = i * size(row) + j
sll $t2, $t2, 3 # $t2 = byte offset of [i][j]
addu $t2, $a0, $t2 # $t2 = byte address of x[i][j]
l.d $f4, 0($t2) # $f4 = 8 bytes of x[i][j]
L3: sll $t0, $s2, 5 # $t0 = k * 32 (size of row of z)
addu $t0, $t0, $s1 # $t0 = k * size(row) + j
sll $t0, $t0, 3 # $t0 = byte offset of [k][j]
addu $t0, $a2, $t0 # $t0 = byte address of z[k][j]
l.d $f16, 0($t0) # $f16 = 8 bytes of z[k][j]
…
◼ Optional variations
◼ I: integer operand
◼ P: pop operand from stack
◼ R: reverse operand order
◼ But not all combinations allowed
A
Y
B
◼ Arithmetic/Logic Unit
◼ Multiplexer ◼ Y = F(A, B)
◼ Y = S ? I1 : I0
A
I0 M
u Y ALU Y
I1 x
B
S F
Clk
D Q
D
Clk
Q
Clk
D Q Write
Write D
Clk
Q
Load/
35 or 43 rs rt address
Store
31:26 25:21 20:16 15:0
Branch 4 rs rt address
31:26 25:21 20:16 15:0
◼ Four loads:
◼ Speedup
= 8/3.5 = 2.3
◼ Non-stop:
◼ Speedup
= 2n/0.5n + 1.5 ≈ 4
= number of stages