Introduction to CMOS VLSI Design
Chapter 4
Delay
Delay Definitions
tpdr: rising propagation delay
– From input to rising output crossing VDD/2
tpdf: falling propagation delay
– From input to falling output crossing VDD/2
tpd: average propagation delay
– tpd = (tpdr + tpdf)/2
tr: rise time
– From output crossing 0.2 VDD to 0.8 VDD
tf: fall time
– From output crossing 0.8 VDD to 0.2 VDD
Chapter 4 CMOS VLSI Design 2
Delay Definitions
tcdr: rising contamination delay
– Minimum time from input to rising output crossing VDD/2
tcdf: falling contamination delay
– Minimum time from input to falling output crossing VDD/2
tcd: contamination delay
– tcd = (tcdr + tcdf)/2 ??
– tcd = min(tcdr , tcdf)
Simulated Inverter Delay
Solving differential equations by hand is too hard
SPICE simulator solves the equations numerically
– Uses more accurate I-V models too!
But simulations take time to write, may hide insight
[Figure: simulated inverter waveforms Vin and Vout (V) versus t (s) from 0 to 1 ns, with tpdf = 66 ps and tpdr = 83 ps]
Delay Estimation
We would like to be able to easily estimate delay
– Not as accurate as simulation
– But easier to ask “What if?”
The step response usually looks like a 1st order RC
response with a decaying exponential.
Use RC delay models to estimate delay
– C = total capacitance on output node
– Use effective resistance R
– So that tpd = RC
Characterize transistors by finding their effective R
– Depends on average current as gate switches
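As a rough illustration of the first-order RC picture, the sketch below evaluates an ideal falling exponential and the time at which it crosses VDD/2. The component values are illustrative assumptions, not figures from the slides. For an ideal RC the crossing occurs at RC ln 2; the slides instead fold such factors into the effective resistance R so that the estimate tpd = RC can be used directly.

```python
import math

def rc_falling_output(t, vdd, r, c):
    """Voltage of an ideal RC node discharging from vdd toward 0 at time t."""
    return vdd * math.exp(-t / (r * c))

def half_swing_time(r, c):
    """Time for an ideal RC discharge to reach VDD/2 (equals RC * ln 2)."""
    return r * c * math.log(2)

# Illustrative values (assumptions): 10 kOhm driving 10 fF from a 1 V supply.
R, C, VDD = 10e3, 10e-15, 1.0
t50 = half_swing_time(R, C)
print(f"RC = {R * C * 1e12:.1f} ps, 50% crossing at {t50 * 1e12:.1f} ps")
```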
Effective Resistance
Shockley models have limited value
– Not accurate enough for modern transistors
– Too complicated for much hand analysis
Simplification: treat transistor as resistor
– Replace Ids(Vds, Vgs) with effective resistance R
• Ids = Vds/R
– R averaged across switching of digital gate
Too inaccurate to predict current at any given time
– But good enough to predict RC delay
RC Delay Model
Use equivalent circuits for MOS transistors
– Ideal switch + capacitance and ON resistance
– Unit nMOS has resistance R, capacitance C
– Unit pMOS has resistance 2R (lower carrier
mobility), capacitance C
Capacitance proportional to width (k)
Resistance inversely proportional to width
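[Figure: equivalent circuits. An nMOS of width k is a switch with resistance R/k and capacitance kC on its gate, source, and drain; a pMOS of width k is a switch with resistance 2R/k and capacitance kC on its gate, source, and drain]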
RC Values
Capacitance
– C = Cg = Cs = Cd = 2 fF/µm of gate width in 0.6 µm
– Gradually declines to 1 fF/µm in nanometer technologies
(1 fF = 1.0E-15 F)
Resistance
– R ≈ 6 kΩ·µm in 0.6 µm process
– Improves with shorter channel lengths
Unit transistors
– May refer to minimum contacted device (4/2 λ)
– Or maybe 1 µm wide device
– Doesn’t matter as long as you are consistent
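A minimal sketch of the scaling rules, reading the slide's values as 6 kΩ·µm and 2 fF/µm for the 0.6 µm process. The function name and the chosen widths are assumptions for illustration. It shows why the choice of unit transistor does not matter: the RC product is independent of width.

```python
def transistor_rc(width_um, r_ohm_um=6e3, c_f_per_um=2e-15):
    """Return (R, C) of an nMOS of the given width under linear scaling."""
    r = r_ohm_um / width_um      # resistance is inversely proportional to width
    c = c_f_per_um * width_um    # capacitance is proportional to width
    return r, c

for w in (1.0, 2.0, 4.0):
    r, c = transistor_rc(w)
    print(f"W = {w} um: R = {r:.0f} ohm, C = {c * 1e15:.1f} fF, "
          f"RC = {r * c * 1e12:.1f} ps")
```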
Equivalent RC circuits
Inverter Delay Estimate
Estimate the delay of a fanout-of-1 inverter
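With A = 1, the nMOS (resistance R) discharges the output node
[Figure: unit inverter (pMOS width 2, nMOS width 1) driving an identical inverter; the output node carries 2C + C of diffusion capacitance from the driver and 2C + C of gate capacitance from the load]
d = R(6C) = 6RC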
Delay Model Comparison
(Example 4.1, p.145)
Example: 3-input NAND
Sketch a 3-input NAND with transistor widths chosen to
achieve effective rise and fall resistances equal to a unit
inverter (R).
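[Figure: 3-input NAND with three parallel pMOS transistors of width 2 and three series nMOS transistors of width 3]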
3-input NAND Caps
Annotate the 3-input NAND gate with gate and diffusion
capacitance.
[Figure: each pMOS (width 2) is annotated with 2C on its gate, source, and drain; each nMOS (width 3) is annotated with 3C on its gate, source, and drain]
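[Figure: lumped capacitances. Each input sees 5C of gate capacitance (2C + 3C); the output node carries 9C; each internal node of the nMOS stack carries 3C]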
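The lumped values can be reproduced by counting widths. The sketch below assumes the sizing used in this example (pMOS width 2, nMOS width equal to the number of inputs) and counts only the pMOS drains and the top nMOS drain on the output node; the function name is hypothetical.

```python
def nand_caps(n, pmos_width=2):
    """Gate and output capacitance (in units of C) of an n-input NAND
    sized for unit rise and fall resistance."""
    nmos_width = n                             # n series nMOS, each n wide
    gate_per_input = pmos_width + nmos_width   # one pMOS gate plus one nMOS gate
    output = n * pmos_width + nmos_width       # n pMOS drains plus top nMOS drain
    return gate_per_input, output

print(nand_caps(3))  # (5, 9)
```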
Elmore Delay
ON transistors look like resistors (output capacitor)
Pullup or pulldown network modeled as RC ladder
Elmore delay of RC ladder
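$t_{pd} \approx \sum_{\text{nodes } i} R_{i\text{-to-source}} C_i = R_1 C_1 + (R_1 + R_2) C_2 + \cdots + (R_1 + R_2 + \cdots + R_N) C_N$
[Figure: RC ladder with series resistors R1, R2, R3, ..., RN and node capacitors C1, C2, C3, ..., CN]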
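The ladder sum can be sketched in a few lines. The function below is an illustrative helper (its name and the example values are assumptions); it accumulates the resistance between the source and each node and multiplies by that node's capacitance.

```python
def elmore_delay(resistances, capacitances):
    """Elmore delay of an RC ladder; index 0 is the element nearest the source."""
    if len(resistances) != len(capacitances):
        raise ValueError("need one capacitance per ladder resistor")
    delay = 0.0
    r_to_source = 0.0
    for r, c in zip(resistances, capacitances):
        r_to_source += r          # total resistance from the source to this node
        delay += r_to_source * c  # this node's contribution
    return delay

# Three-stage ladder with R = 1, 2, 3 and C = 1 at every node:
# 1*1 + (1+2)*1 + (1+2+3)*1 = 10
print(elmore_delay([1.0, 2.0, 3.0], [1.0, 1.0, 1.0]))
```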
Example: 3-input NAND
Estimate worst-case rising and falling delay of 3-input NAND
driving h identical gates.
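[Figure: 3-input NAND (pMOS width 2, nMOS width 3 on inputs A, B, C) with 9C on output Y, 3C on each internal node n1 and n2, and a load of 5hC from h copies; input states shown are A=1, B=1, C=0 and A=1, B=1, C=1]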
Rising output (charging the capacitor):
$t_{pdr} = (9 + 5h)RC$
Falling output (discharging the capacitor):
$t_{pdf} = (3C)(R/3) + (3C)(R/3 + R/3) + [(9 + 5h)C](R/3 + R/3 + R/3) = (12 + 5h)RC$
Delay Components
Delay has two parts
– Parasitic delay
• 9 or 12 RC
• Independent of load
– Effort delay
• 5h RC
• Proportional to load capacitance
Contamination Delay
Best-case (contamination) delay (minimum delay) can be
substantially less than propagation delay.
Ex: If all three inputs fall simultaneously (rising output)
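[Figure: the 3-input NAND with 9C on output Y, 3C on internal nodes n1 and n2, and a 5hC load; inputs A=0, B=0, C=0]
Rising output (charging the capacitor):
$t_{cdr} = [(9 + 5h)C](R/3) = (3 + \frac{5}{3}h)RC$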
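To compare the best-case and worst-case rising delays, the sketch below treats each width-2 pMOS as a resistance R and the output node as (9 + 5h)C, as in the example. The function name and the choice h = 4 are assumptions for illustration.

```python
from fractions import Fraction

def nand3_rising_delay(h, pullups_on):
    """Rising delay of the NAND3, in units of RC, with the given number of
    parallel pMOS transistors conducting."""
    r_pullup = Fraction(1, pullups_on)  # parallel resistances of R each
    c_out = 9 + 5 * h                   # 9C parasitic plus 5hC load
    return r_pullup * c_out

h = 4
print(nand3_rising_delay(h, 1))  # one pMOS on:   9 + 5h   = 29
print(nand3_rising_delay(h, 3))  # three pMOS on: 3 + 5h/3 = 29/3
```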
Diffusion Capacitance
We assumed contacted diffusion on every s / d.
Good layout minimizes diffusion area
Ex: NAND3 layout shares one diffusion contact
– Reduces output capacitance by 2C
– Merged uncontacted diffusion might help too
[Figure: NAND3 layout marking isolated contacted diffusion, shared contacted diffusion, and merged uncontacted diffusion; with the shared contact the output node carries 7C instead of 9C]
Layout Comparison
Which layout is better?
[Figure: two candidate layouts of a gate with inputs A and B and output Y, drawn between VDD and GND rails]
Logical Effort Review
Logical Effort
Delay in a Logic Gate
Multistage Logic Networks
Choosing the Best Number of Stages
Example
Summary
Introduction
Chip designers face a bewildering array of choices
– What is the best circuit topology for a function?
– How many stages of logic give least delay?
– How wide should the transistors be?
Logical effort is a method to make these decisions
– Uses a simple model of delay
– Allows back-of-the-envelope calculations
– Helps make rapid comparisons between alternatives
– Emphasizes remarkable symmetries
Example
A memory designer for an embedded automotive processor needs help designing the decoder for a register file.
[Figure: 4:16 decoder driven by A[3:0] and its complement, selecting one of 16 words in a register file that is 32 bits wide]
Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
The designer needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
Delay in a Logic Gate
Express delays in process-independent unit: $d = d_{abs} / \tau$
– $\tau = 3RC$: about 3 ps in 65 nm process, 60 ps in 0.6 µm process
Delay has two components: d = f + p
f: effort delay = gh (a.k.a. stage effort)
– Again has two components
g: logical effort
– Measures relative ability of gate to deliver current
– g ≡ 1 for inverter
h: electrical effort (or fanout) = Cout / Cin
– Ratio of output to input capacitance
– Sometimes called fanout
p: parasitic delay (normally ~1)
– Represents delay of gate driving no load
– Set by internal parasitic capacitance
Delay Plots
d = f + p = gh + p
[Figure: normalized delay d versus electrical effort h = Cout / Cin, for h from 0 to 5; the slope of each line gives the effort delay f = gh and the intercept gives the parasitic delay p]
– Inverter: g = 1, p = 1, d = h + 1
– 2-input NAND: g = 4/3, p = 2, d = (4/3)h + 2
What about NOR2?
Computing Logical Effort
DEF: Logical effort (g) is the ratio of the input
capacitance of a gate to the input capacitance of an
inverter delivering the same output current.
Measure from delay vs. fanout plots
Or estimate by counting transistor widths
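[Figure: inverter, 2-input NAND, and 2-input NOR sized for unit drive]
– Inverter (pMOS 2, nMOS 1): Cin = 3, g = 3/3
– NAND2 (pMOS 2, nMOS 2): Cin = 4, g = 4/3
– NOR2 (pMOS 4, nMOS 1): Cin = 5, g = 5/3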
Catalog of Gates
Logical effort of common gates
Gate type         Number of inputs
                  1     2       3           4               n
Inverter          1
NAND                    4/3     5/3         6/3             (n+2)/3
NOR                     5/3     7/3         9/3             (2n+1)/3
Tristate / mux    2     2       2           2               2
XOR, XNOR               4, 4    6, 12, 6    8, 16, 16, 8
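The NAND and NOR rows follow from counting transistor widths on one input and dividing by the inverter's input capacitance of 3. The sketch below assumes the unit-drive sizing used in this chapter (NAND: pMOS 2, nMOS n; NOR: pMOS 2n, nMOS 1); the function name and gate labels are hypothetical.

```python
from fractions import Fraction

def logical_effort(gate, n=1):
    """Logical effort of an n-input gate by counting widths on one input."""
    if gate == "inv":
        cin = 2 + 1
    elif gate == "nand":
        cin = 2 + n        # parallel pMOS of width 2, series nMOS of width n
    elif gate == "nor":
        cin = 2 * n + 1    # series pMOS of width 2n, parallel nMOS of width 1
    else:
        raise ValueError(f"unknown gate type: {gate}")
    return Fraction(cin, 3)  # relative to the unit inverter, Cin = 3

for n in (2, 3, 4):
    print(n, logical_effort("nand", n), logical_effort("nor", n))
```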
Catalog of Gates
Parasitic delay of common gates
– In multiples of pinv (1)
Gate type         Number of inputs
                  1     2     3     4     n
Inverter          1
NAND                    2     3     4     n
NOR                     2     3     4     n
Tristate / mux    2     4     6     8     2n
XOR, XNOR               4     6     8
Example: Ring Oscillator
Estimate the frequency of an N-stage ring oscillator
Logical Effort: g = 1
Electrical Effort: h = 1
Parasitic Delay: p = 1
Stage Delay: d = 2
Frequency: fosc = 1/(2*N*d) = 1/(4N)
A 31-stage ring oscillator in a 0.6 µm process has a frequency of ~200 MHz
Example: FO4 Inverter
Estimate the delay of a fanout-of-4 (FO4) inverter
Logical Effort: g = 1
Electrical Effort: h = 4
Parasitic Delay: p = 1
Stage Delay: d = 5
The FO4 delay is about 300 ps in a 0.6 µm process and 15 ps in a 65 nm process
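The same arithmetic as a short sketch: the normalized stage delay is d = gh + p, and multiplying by τ gives an absolute delay. The τ values of 60 ps and 3 ps are the ones quoted on the "Delay in a Logic Gate" slide; the function names are assumptions.

```python
def stage_delay(g, h, p):
    """Normalized stage delay d = gh + p."""
    return g * h + p

def absolute_delay_ps(d, tau_ps):
    """Convert a normalized delay to picoseconds for a given tau."""
    return d * tau_ps

d_fo4 = stage_delay(g=1, h=4, p=1)
for process, tau_ps in (("0.6 um", 60), ("65 nm", 3)):
    print(f"{process}: FO4 = {d_fo4} tau = {absolute_delay_ps(d_fo4, tau_ps)} ps")
```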
Multistage Logic Networks
Logical effort generalizes to multistage networks
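Path Logical Effort $G = \prod g_i$
Path Electrical Effort $H = C_{out\text{-}path} / C_{in\text{-}path}$
Path Effort $F = \prod f_i = \prod g_i h_i$
[Figure: four-stage path with input capacitance 10, internal node capacitances x, y, z, and load capacitance 20]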
g1 = 1 g2 = 5/3 g3 = 4/3 g4 = 1
h1 = x/10 h2 = y/x h3 = z/y h4 = 20/z
Can we write F = GH?
Paths that Branch
No! Consider paths that branch:
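[Figure: an inverter with input capacitance 5 drives two inverters of input capacitance 15, each of which drives a load of 90]
G = 1
H = 90 / 5 = 18
GH = 18
h1 = (15 + 15) / 5 = 6
h2 = 90 / 15 = 6
F = g1g2h1h2 = 36 = 2GH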
Branching Effort
Introduce branching effort
– Accounts for branching between stages in path
$b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$
$B = \prod b_i$
Note: $\prod h_i = BH$
Now we compute the path effort
– F = GBH
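Applied to the branching path from the previous slide (input capacitance 5, two branches of 15, loads of 90), the sketch below shows that including B recovers the path effort of 36. Variable and function names are illustrative assumptions.

```python
from math import prod

def branching_effort(c_on_path, c_off_path):
    """b = (C_on-path + C_off-path) / C_on-path for one stage."""
    return (c_on_path + c_off_path) / c_on_path

g = [1, 1]                          # two inverters on the path
b = [branching_effort(15, 15),      # first stage drives two 15-unit gates
     branching_effort(90, 0)]       # second stage drives only its own load
H = 90 / 5                          # path electrical effort

G, B = prod(g), prod(b)
F = G * B * H
print(G * H, F)  # 18.0 36.0
```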
Multistage Delays
Path Effort Delay $D_F = \sum f_i$
Path Parasitic Delay $P = \sum p_i$
Path Delay $D = \sum d_i = D_F + P$
Designing Fast Circuits
$D = \sum d_i = D_F + P$
Delay is smallest when each stage bears same effort
$\hat{f} = g_i h_i = F^{1/N}$
Thus minimum delay of N stage path is
$D = N F^{1/N} + P$
This is a key result of logical effort
– Find fastest possible delay
– Doesn’t require calculating gate sizes
Gate Sizes
How wide should the gates be for least delay?
$\hat{f} = gh = g \, C_{out} / C_{in}$
$\Rightarrow C_{in_i} = g_i C_{out_i} / \hat{f}$
Working backward, apply capacitance
transformation to find input capacitance of each gate
given load it drives.
Check work by verifying input cap spec is met.
Example: 3-stage path
Select gate sizes x and y for least delay from A to B
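[Figure: three-stage path from A (input capacitance 8) through gates of input capacitance x and y to B, with branches at each stage and loads of 45 on the outputs]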
Logical Effort G = (4/3)*(5/3)*(5/3) = 100/27
Electrical Effort H = 45/8
Branching Effort B=3*2=6
Path Effort F = GBH = 125
Best Stage Effort $\hat{f} = \sqrt[3]{F} = 5$
Parasitic Delay P=2+3+2=7
Delay D = 3*5 + 7 = 22 = 4.4 FO4
Example: 3-stage path
Work backward for sizes
y = 45 * (5/3) / 5 = 15
x = (15*2) * (5/3) / 5 = 10
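[Figure: the path annotated with transistor widths. First gate (input capacitance 8): P: 4, N: 4. Gates of size x = 10: P: 4, N: 6. Gates of size y = 15: P: 12, N: 3. Each output drives a load of 45]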
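Working backward can be sketched as a loop over the stages from the load toward the input, applying the capacitance transformation at each step. The list layout and the function name are assumptions; the logical efforts (4/3, 5/3, 5/3), branching (3, then 2), load of 45, and stage effort of 5 come from the example. The last check confirms that the computed input capacitance meets the specification of 8.

```python
from fractions import Fraction

def size_path(g, b, c_load, f_hat):
    """Return the input capacitance of each stage, first stage first.
    g[i] is the logical effort of stage i; b[i] is the number of identical
    copies of the following stage (or load) that stage i drives."""
    sizes = []
    c_next = c_load
    for gi, bi in zip(reversed(g), reversed(b)):
        c_in = gi * (c_next * bi) / f_hat   # C_in = g * C_out / f_hat
        sizes.append(c_in)
        c_next = c_in
    return list(reversed(sizes))

g = [Fraction(4, 3), Fraction(5, 3), Fraction(5, 3)]
b = [3, 2, 1]
c_in_A, x, y = size_path(g, b, c_load=45, f_hat=5)
print(c_in_A, x, y)  # 8 10 15
```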
Best Number of Stages
How many stages should a path use?
– Minimizing number of stages is not always fastest
Example: drive 64-bit datapath with unit inverter
$D = N F^{1/N} + P = N(64)^{1/N} + N$
[Figure: inverter chains from a unit initial driver to a datapath load of 64]
N:                1     2       3           4
Inverter sizes:   1     1, 8    1, 4, 16    1, 2.8, 8, 23
f:                64    8       4           2.8
D:                65    18      15          15.3
Fastest: N = 3
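The table can be regenerated with a short sweep over the number of stages. This sketch assumes a chain of inverters only (g = 1, p = 1 per stage), as in the example; the function name is hypothetical.

```python
def chain_delay(F, N, p_inv=1.0):
    """Delay of an N-inverter chain with path effort F: N * F**(1/N) + N * p_inv."""
    return N * F ** (1.0 / N) + N * p_inv

F = 64
for N in range(1, 5):
    f = F ** (1.0 / N)
    print(f"N = {N}: f = {f:.1f}, D = {chain_delay(F, N):.1f}")

best_N = min(range(1, 5), key=lambda N: chain_delay(F, N))
print("fastest:", best_N)
```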
Derivation
Consider adding inverters to end of path
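– How many give least delay?
[Figure: logic block of $n_1$ stages with path effort F, followed by $N - n_1$ extra inverters]
$D = N F^{1/N} + \sum_{i=1}^{n_1} p_i + (N - n_1) p_{inv}$
$\frac{\partial D}{\partial N} = -F^{1/N} \ln F^{1/N} + F^{1/N} + p_{inv} = 0$
Define best stage effort $\rho = F^{1/N}$
$p_{inv} + \rho (1 - \ln \rho) = 0$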
Best Stage Effort
$p_{inv} + \rho (1 - \ln \rho) = 0$ has no closed-form solution
Neglecting parasitics ($p_{inv} = 0$), we find $\rho = 2.718$ (e)
For $p_{inv} = 1$, solve numerically for $\rho = 3.59$
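One way to carry out the numerical solution is bisection, sketched below. The bracketing interval [e, 10] and the tolerance are assumptions chosen for this illustration; the left-hand side equals p_inv at ρ = e and decreases for ρ > 1, so the root lies in the bracket for the parasitic delays considered here.

```python
import math

def best_stage_effort(p_inv, lo=math.e, hi=10.0, tol=1e-9):
    """Solve p_inv + rho * (1 - ln rho) = 0 for rho by bisection."""
    def lhs(rho):
        return p_inv + rho * (1.0 - math.log(rho))

    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if lhs(mid) > 0:
            lo = mid    # root lies above mid
        else:
            hi = mid    # root lies at or below mid
    return 0.5 * (lo + hi)

print(f"p_inv = 0: rho = {best_stage_effort(0.0):.3f}")
print(f"p_inv = 1: rho = {best_stage_effort(1.0):.2f}")
```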
Sensitivity Analysis
How sensitive is delay to using exactly the best number of stages?
[Figure: $D(N)/D(\hat{N})$ versus $N/\hat{N}$ over 0.5 to 2.0, with labeled values 1.15, 1.26, and 1.51 and markers at ρ = 6 and ρ = 2.4]
2.4 < ρ < 6 gives delay within 15% of optimal
– We can be sloppy!
– Harris uses ρ = 4, exact is process dependent
Optimal Power?
Switching vs. Crowbar
Example, Revisited
A memory designer for an embedded automotive processor needs help designing the decoder for a register file.
[Figure: 4:16 decoder driven by A[3:0] and its complement, selecting one of 16 words in a register file that is 32 bits wide]
Decoder specifications:
– 16 word register file
– Each word is 32 bits wide
– Each bit presents load of 3 unit-sized transistors
– True and complementary address inputs A[3:0]
– Each input may drive 10 unit-sized transistors
The designer needs to decide:
– How many stages to use?
– How large should each gate be?
– How fast can decoder operate?
Number of Stages
Decoder effort is mainly electrical and branching
Electrical Effort: H = (32*3) / 10 = 9.6
Branching Effort: B=8
If we neglect logical effort (assume G = 1)
Path Effort: F = GBH = 76.8
Number of Stages: $N = \log_4 F = 3.1$
Try a 3-stage design (but in reality G ≠ 1 and ρ ≠ 4)
Gate Sizes & Delay
Logical Effort: G = 1 * 6/3 * 1 = 2
Path Effort: F = GBH = 154
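Stage Effort: $\hat{f} = F^{1/3} = 5.36$
Path Delay: $D = 3\hat{f} + 1 + 4 + 1 = 22.1$
Gate sizes: z = 96*1/5.36 = 18, y = 18*2/5.36 = 6.7
[Figure: 3-stage decoder. True and complementary address inputs A[3:0], each presenting 10 units of capacitance, drive first-stage inverters, then NAND4 gates of input capacitance y, then output inverters of input capacitance z, which drive word[0] through word[15], each with 96 units of wordline capacitance]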
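The decoder numbers can be rechecked step by step. This sketch assumes the INV-NAND4-INV design from this slide with g = 1, 6/3, 1 and p = 1, 4, 1; variable names are illustrative. It carries the unrounded stage effort through, so z comes out slightly below the slide's rounded 18.

```python
G = 1 * 2 * 1            # inverter, NAND4 (g = 6/3), inverter
B = 8                    # branching effort
H = (32 * 3) / 10        # 96 units of wordline load over 10 units of input
F = G * B * H            # path effort, about 154

N = 3
f_hat = F ** (1.0 / N)   # best stage effort
P = 1 + 4 + 1            # parasitic delays of the three stages
D = N * f_hat + P        # path delay

z = 96 * 1 / f_hat       # output inverter input capacitance
y = z * 2 / f_hat        # NAND4 input capacitance

print(f"F = {F:.1f}, f_hat = {f_hat:.2f}, D = {D:.1f}, z = {z:.1f}, y = {y:.1f}")
```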
Comparison
Compare many alternatives with a spreadsheet
$D = N(76.8\,G)^{1/N} + P$
Design                         N    G       P    D
NOR4                           1    3       4    234
NAND4-INV                      2    2       5    29.8
NAND2-NOR2                     2    20/9    4    30.1
INV-NAND4-INV                  3    2       6    22.1
NAND4-INV-INV-INV              4    2       7    21.1
NAND2-NOR2-INV-INV             4    20/9    6    20.5
NAND2-INV-NAND2-INV            4    16/9    6    19.7
INV-NAND2-INV-NAND2-INV        5    16/9    7    20.4
NAND2-INV-NAND2-INV-INV-INV    6    16/9    8    21.6
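In place of a spreadsheet, the comparison can be sketched as a small script. The design list is copied from the table; the function and variable names are assumptions for illustration.

```python
from fractions import Fraction

def path_delay(N, G, P, BH=76.8):
    """D = N * (BH * G)**(1/N) + P, with BH = 8 * 9.6 for the decoder."""
    return N * (BH * float(G)) ** (1.0 / N) + P

designs = [
    ("NOR4",                        1, Fraction(3),      4),
    ("NAND4-INV",                   2, Fraction(2),      5),
    ("NAND2-NOR2",                  2, Fraction(20, 9),  4),
    ("INV-NAND4-INV",               3, Fraction(2),      6),
    ("NAND4-INV-INV-INV",           4, Fraction(2),      7),
    ("NAND2-NOR2-INV-INV",          4, Fraction(20, 9),  6),
    ("NAND2-INV-NAND2-INV",         4, Fraction(16, 9),  6),
    ("INV-NAND2-INV-NAND2-INV",     5, Fraction(16, 9),  7),
    ("NAND2-INV-NAND2-INV-INV-INV", 6, Fraction(16, 9),  8),
]

for name, N, G, P in designs:
    print(f"{name:30s} D = {path_delay(N, G, P):.1f}")

fastest = min(designs, key=lambda d: path_delay(d[1], d[2], d[3]))
print("fastest:", fastest[0])
```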
Review of Definitions
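Term                Stage                                                              Path
number of stages    1                                                                  N
logical effort      g                                                                  $G = \prod g_i$
electrical effort   $h = C_{out} / C_{in}$                                             $H = C_{out\text{-}path} / C_{in\text{-}path}$
branching effort    $b = (C_{on\text{-}path} + C_{off\text{-}path}) / C_{on\text{-}path}$   $B = \prod b_i$
effort              f = gh                                                             F = GBH
effort delay        f                                                                  $D_F = \sum f_i$
parasitic delay     p                                                                  $P = \sum p_i$
delay               d = f + p                                                          $D = \sum d_i = D_F + P$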
Method of Logical Effort
1) Compute path effort: F = GBH
2) Estimate best number of stages: $N = \log_4 F$
3) Sketch path with N stages
4) Estimate least delay: $D = N F^{1/N} + P$
5) Determine best stage effort: $\hat{f} = F^{1/N}$
6) Find gate sizes: $C_{in_i} = g_i C_{out_i} / \hat{f}$
Limits of Logical Effort
Chicken and egg problem
– Need path to compute G
– But don’t know number of stages without G
Simplistic delay model
– Neglects input rise time effects
Interconnect
– Iteration required in designs with wire
Maximum speed only
– Not minimum area/power for constrained delay
Summary
Logical effort is useful for thinking of delay in circuits
– Numeric logical effort characterizes gates
– NANDs are faster than NORs in CMOS
– Paths are fastest when effort delays are ~4
– Path delay is weakly sensitive to stages, sizes
– But using fewer stages doesn’t mean faster paths
– Delay of path is about $\log_4 F$ FO4 inverter delays
– Inverters and NAND2 best for driving large caps
Provides language for discussing fast circuits
– But requires practice to master