Statistical Timing Analysis using Bounds and Selective
Enumeration
Aseem Agarwal, David Blaauw, *Vladimir Zolotov, -Sarma Vrudhula
University of Michigan, Ann Arbor, MI
*Motorola, Inc., Austin, TX
-University of Arizona, Tucson, AZ
Abstract
The growing impact of within-die process variation has created
the need for statistical timing analysis, where gate delays are modeled as random variables. Statistical timing analysis has traditionally
suffered from exponential run time complexity with circuit size, due
to the dependencies created by reconverging paths in the circuit. In
this paper, we propose a new approach to statistical timing analysis
which uses statistical bounds and selective enumeration to refine
these bounds. First, we provide a formal definition of the statistical
delay of a circuit and derive a statistical timing analysis method from
this definition. Since this method for finding the exact statistical
delay has exponential run time complexity with circuit size, we also
propose a new method for computing statistical bounds which has
linear run time complexity. We prove the correctness of the proposed
bounds. Since we provide both a lower and upper bound on the true
statistical delay, we can determine the quality of the bounds. If the
computed bounds are not sufficiently close to each other, we propose
the use of a heuristic to iteratively improve the bounds using selective enumeration of the sample space with additional run time. The
proposed methods were implemented and tested on benchmark circuits. The results demonstrate that the proposed bounds have only a
small error, which could be further reduced using selective enumeration with modest additional run time.
Categories and Subject Descriptors
B.8.2 [Performance and Reliability]: Performance analysis
General Terms
Algorithms, performance, reliability
1 Introduction
Static timing analysis has become an indispensable part of performance verification. Traditionally, the variation in the underlying process parameters were modeled in static timing analysis (STA) using
so-called case analysis. In this methodology, best-case, nominal and
worst-case SPICE parameters sets are constructed and the timing
analysis is performed several times, each time using one case file.
Each execution of the static timing analysis is therefore deterministic, meaning that the analysis uses deterministic delays for the gates
and any statistical variation in the underlying silicon is hidden.
While this approach has been successfully used in the past to model
die-to-die variations in device and interconnect delay, it is not able to
accurately model variations within a single die. With the continual
scaling of feature sizes, the ability to control critical device parameters on a single die has become increasingly difficult.
Permission to make digital or hard copies of all or part of this work for
personal or classroom use is granted without fee provided that copies are not
made or distributed for profit or commercial advantage and that copies bear
this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. TAU 02, December 2 3, 2002, Monterey, California,
USA. Copyright 2002 ACM 1-58113-526-2/02/0012&$5.00.
Using a worst-case analysis for these so-called within-die variations therefore leads to very pessimistic analysis results since it
assumes that all devices on the die have worst-case characteristics,
ignoring their inherent statistical variation. The emerging dominance
of within-die variations therefore poses a major obstacle for deterministic STA, giving rise to the need for new statistical timing analysis approaches.
Variations in the delays of a circuit can be broadly classified into
two categories: environmental variations and process variations.
Environmental variation are caused by uncertainty in the environmental conditions during the operation of a chip, such as power supply and temperature variations. Process variations are due to
uncertainty in the device and interconnect characteristics, such as
effective gate length, doping concentrations, and oxide thickness.
These variations can be divided into between-die variations (or interdie variation) and within-die variations (or intra-die variations).
Within-die variations can have a deterministic component due to
topologically dependencies of device processing, such as CMP
effects and topologically correlated lithographic distortions. In some
cases, such topological dependencies can be directly accounted for
in the analysis, thereby reducing the statistical variation [11],
whereas in other cases, such variations are treated as random.
In this paper, we propose a formal model and efficient analysis
method for statistical STA in the presence of random within-die process variations. Since between-die variations can be adequately captured using case analysis, we focus on within-die variations. We also
treat all variations as random variations, meaning that topological
dependencies are either removed prior to the analysis or are treated
as random variations. We initially also do not address environmental
variations, although the proposed model and analysis methods can
be extended to such variations.
The extensive use of deterministic STA in recent years is in large
part due to its linear run time complexity with circuit size. In contrast, statistical STA has an underlying worst-case complexity that is
exponential with circuit size, which poses a fundamental obstacle to
its practical application. This high run time complexity is the result
of reconverging paths in the circuit which causes correlations
between their path delays due to shared sections of such paths. Previous statistical STA approaches [4-9] have therefore suffered from
very high runtimes, which has made them applicable only to small
circuits, from the use of approximate methods with unclear accuracy
impact, or from unrealistic assumptions, such as an assumed Gaussian distribution of gate delays.
In this paper, we propose a new method for statistical STA. Since
the formulation of statistical STA has varied in subtle but important
ways in the literature, we first provide a formal model of statistical
STA. We then derive the proposed procedure for statistical STA in a
strict manner from this problem formulation. Since the computational complexity of exact statistical STA is exponential with the circuit size, we present a new method for computing bounds on the
exact statistical delay of the circuit and proof the correctness of these
bounds. By computing only bounds of the true statistical behavior of
the circuit, we are able to preserve the important characteristic of
page 1
deterministic STA that it has a linear run time complexity with circuit size. Since we provide both a lower and upper bound on the true
statistical delay, we can determine the quality or error of the computed bounds. If the bounds are not sufficiently close to each other,
we use a heuristic method to iteratively improve the computed
bounds using selective enumeration of the sample space with additional run time. Combined, the proposed methods provide a statical
STA approach with linear run time, that is guaranteed conservative,
has a bounded error, and can be iteratively refined at the expense of
additional computational effort. The proposed methods were implemented and tested on large benchmark circuits. The difference
between the expected values of the upper and lower bound was
shown to be small, ranging from 4 - 12%, and this difference could
be reduced by 54% on average, using the proposed selective enumeration method, with modest additional run time.
The remainder of this paper is organized as follows. In Section 2
we presents a formal model of statistical STA. In Section 3, we
present a number of probabilistic timing graph transformations. In
Section 4, we derive our method for exact statistical timing analysis.
In Section 5, we present the computation of the lower and upper statistical bounds on the true statistical behavior. In Section 6, we presents the methods for exact graph reduction and for refinement of the
computed bounds using selective enumeration. In Section 7 we
present our results and in Section 8 our concluding remarks.
A path P of a timing graph G is a sequence of nodes, such that
each pair of adjacent nodes ng and nh has an edge egh=(ng,nh). The
path delay dP of path P is the sum of all the delays D(eij) of edge eij
on path P. Among all paths terminating at a node n, we define the
path with the maximum delay as the critical path of n. The delay of
the critical path of node n is referred to as its arrival time, ta(n). Note
that the arrival time for the source node ns is a deterministic value
equal to 0. The critical path of the sink node nf of a timing graph is
referred to as the critical path of the timing graph, and the arrival
time of nf, is referred to as its graph delay.
After fabrication, a deterministic timing graph GD can be conceptually formulated for each die. However, during the design of a chip,
the gate delays are unknown and must be modeled as random variables. Each gate delay is therefore specified either with a probability
distribution function (PDF) or probability density function (pdf) and
we define a probabilistic timing graph GP as follows:
Definition 2. A probabilistic timing graph GP is a timing graph
whose edges are assigned random variables of delay values.
Figure 1 shows an example of a delay probability distribution funcP(t)
1
P(0)=0
P(∞) = 1
2 Statistical Timing Analysis Formulation
In this Section, we present a formal model of statistical static timing analysis. Our goal is to model the impact of gate delays variation
due to within-die process variations on the circuit delay. Although at
design time, the delay of each gate is unknown, after a chip has been
manufactured, the gate delays are fixed and have a deterministic
value for each particular die. The randomness or variability of the
circuit delay is therefore over the fabricated die, and it is the probability distribution of the circuit delay that statistical timing analysis
aims to obtain.
At this point, we do not account for temporal variations of the
gate delays due to environmental factors, such as power supply fluctuations, temperature dependence and noise, which must be modeled
using case analysis. However, our general analysis approach can be
extended to these types of variations as well. Also, for simplicity of
formulation, we ignore the presence of false paths since these are
orthogonal to the issues discussed in this paper. We now give the following definition of a timing graph:
Definition 1. A timing graph G is a directed graph having exactly
one source and one sink node: G={N,E,ns,nf}, where
N={n1,n2,...,nk} is a set of nodes, E={e1,e2,...,el} is a set of edges,
n s ∈ N is the source node, and n f ∈ N is the sink node and each
edge e ∈ E is simply an ordered pair of nodes e=(ni,nj).
The nodes in the timing graph correspond to nets in the circuit, and
the edges in the graph correspond to connections from gate inputs to
gate outputs. Although circuits generally have multiple inputs and
outputs, we can trivially transform them to graphs with a single
source and sink by adding a virtual source and sink node.
In our formulation, a deterministic timing graph GD represents a
particular manufactured die, where each gate has a fixed delay value.
Each edge e in GD is assigned a delay D(e), which represents the
deterministic signal propagation delay from a gate’s input to its output. Similar to other statistical STA methods, we ignore the dependence of gate delay on the transition time of the gate input signals.
0
P( D ≤ t )= F (t ) =
∫0 f ( τ ) dτ
t
f(t)
∞
∫0 f ( t ) dt
= 1
dp=f(t)dt
0
Figure 1. Delay probability density and distribution functions.
tion and its corresponding probability density function. Since these
functions represent the variation of gate delays, they have the following obvious but important properties:
Property 1. A delay pdf is non-zero only on a finite interval.
Property 2. A delay PDF equals 0 for all delay values less than
its minimum value dmin and equals 1 for all values greater than its
maximum value dmax.
These properties follow from the fact that the delay of a real gate
cannot be less that some finite minimum delay value dmin or more
than some finite maximum delay value dmax. Similar to previous statistical STA methods [4][5], we assume statistical independence of
all edge delays. In practice, edge delays may be spatially correlated,
which complicates the analysis by creating additional correlations
between path delays. The contribution of this paper is therefore that
it provides an efficient solution to the problem of path delay correlation due to path reconvergence. The methods presented in this paper
can also be extended to timing graphs with correlated edge delays.
Note also that our method does not restrict the shape of the PDF of
edge delays, which in general is not Gaussian.
To simplify the implementation of statistical STA it is often more
convenient to approximate continuous probability density and distribution functions with discrete functions. A discrete pdf, corresponding to continuous pdf f(t), can be represented by a sequence of pairs
(di,pi), where d i = i∆ and p i =
di + ∆ ⁄ 2
∫d – ∆ ⁄ 2 f ( τ ) dτ. For computational
i
efficiency, we use discreet pdfs and PDFs in the final implementation
of our proposed statistical timing analysis approaches. However, for
page 2
generality, we will formulate the statistical timing analysis task
using continuous functions.
We now consider the sample space S of a probabilistic timing
graph GP, consisting of all deterministic timing graphs GD with edge
delays corresponding to the non-zero values of their probability distribution functions. The probability that a timing graph GD in S has
an edge i with delay Di between ti and Ti is
P ( t 1 ≤ τ 1 ≤ T 1, t 2 ≤ τ 2 ≤ T 2, … ) =
(EQ 1)
T1 T2
… p 1 ( τ 1 ) p 2 ( τ 2 )…dτ 1 dτ 2 …
t1 t2
∫ ∫
deterministic timing graph GD in S, we can compute its delay
D(GD), using any of the currently available means, such as traditional static timing analysis. The delay D(GP) is therefore defined on
the sample space S and is a random variable which completely
defines its timing behavior. The PDF of D(GP) is defined as follows:
Definition 3. The probability distribution function of the delay of
a probabilistic timing graph is expressed as:
∫
D ( GD ) ≤ t
p 1 ( t 1 ) p 2 ( t 2 )…dt 1 dt 2 … ,
(EQ 2)
where pi(ti) is the probability density function of the delay of edge
i and the integration is performed over the volume of sample
space where delay D(GD) of timing graph GD is less than t.
The probability density function can be computed by simple differentiation. If we use discreet edge delay pdfs, we can compute the
graph delay pdfs by enumerating the entire sample space consisting
of all permutations of the non-zero delay probabilities of all edges.
Of course, this method is exponential in its run time complexity with
circuit size and is not useful as a practical solution. However, its formulation is useful as a formal definition and for understanding the
underlying problem that needs to be solved.
3 Probabilistic Timing Graph Transforms
Before we discuss efficient methods for computing the graph
delay PDF, we discuss three basic transformations for probabilistic
timing graphs using the formulation in Section 2.
∞
∫0 p ( t – τ ) ⋅ q ( τ ) ⋅ dτ
(EQ 4)
2. Parallel Reduction.
Figure 2(b) shows a timing graph GP consisting of two parallel
edges with delays described by pdfs p(t) and q(t). For each of the
deterministic graphs GD in sample space S(GP), the graph delay
D(GD) is the maximum of its two edges delays. Applying EQ2, we
can express the probability distribution function DGp(t) as:
P ( D Gp ≤ t ) =
where p i ( τ i ) is the probability density function of edge i. Given a
P ( D ( GP ) ≤ t ) =
p Gp ( t ) =
∫
max ( t 1, t 2 ) ≤ t
p ( t 1 ) ⋅ q ( t 2 )dt 1 dt 2
(EQ 5)
The random variable DGp(t) is the statistical maximum of the two
independent random variables p and q. From standard probability
theory [3] it is well know that the probability distribution function
PG(t) of this maximum is:
PG ( t ) = P ( t ) ⋅ Q ( t )
(EQ 6)
By differentiation, we obtain the probability density function pG(t):
pG ( t ) = P ( t ) ⋅ q ( t ) + p ( t ) ⋅ Q ( t ) ,
(EQ 7)
Therefore, the edges in Figure 2(b) can be replaced with a single
edge have the pdf pG(t).
3. Subgraph substitution.
Let graph GP have subgraph GP1, as shown in Figure 3,such that:
1. GP1 and complementary graph GP1=GP-GP1 have only 2 common nodes a and b with edges belonging to both GP1 and GP1.
2. Node a (b) has only fanin (fanout) edges belonging to subgraph
GP1 and fanout (fanin) edges belonging to subgraph GP1.
Let the subgraph GP1 have a graph delay pdf d(GP1). We can substitute subgraph GP1 in graph GP with a single edge (a,b) having the
same delay pdf d(GP1). This results in a simpler timing graph with
the same graph delay. This can be proven by rearranging the integration in EQ2 for the initial graph delay PDF and separating the integration over the random variables corresponding to the delays of
subgraph GP1.
4 Statistical Timing Analysis
1. Series Reduction.
Figure 2(a) shows a probabilistic timing graph consisting of two
series connected edges with delays described by p(t) and q(t). The
total delay of the timing graph is the sum of its edge delays and by
applying EQ2, the PDF of the graph delay DGp(t) is:
P ( D Gp ≤ t ) =
∫
t1 + t2 ≤ t
p ( t 1 ) ⋅ q ( t 2 )dt 1 dt 2
(EQ 3)
The sum of the two independent edge delays is the convolution of its
edge delays, as is well known from standard probability theory [3]
and the two edges can be replaced with a single edge having the following probability density function:
The initial formulation presented in the Section 2 relies on the
enumeration of all possible edge delays with non-zero probability
and is difficult to use for an efficient solution to the problem. Deterministic timing analysis has traditionally used an approach where
arrival times are propagated through the circuit in topological order.
We therefore derive such a propagation based approach for computing the graph delay PDF DGp, in a manner that is consistent with the
definition of DGp in Section 2. We first define the probability distribution of the latest arrival time, An(t) at node n as follows:
p(t)
n1 p(t)
n2
q(t)
n3
n1
(a)
n2
q(t)
(b)
Figure 2. Series and parallel reduction
Figure 3. Subgraph substitution with a single edge
page 3
Definition 4. The latest arrival time PDF, An(t) at node n of GP is
the probability that a deterministic timing graph GD in the sample
space S(GP) has an arrival time t a ( n ) ≤ t .
In the subsequent discussion, we will refer to the latest arrival time
as simply the arrival time, noting that a similar derivation can be performed for the earliest arrival time. We also make the following useful definition.
Definition 5. A fanin subgraph GS,n of timing graph GP at node n
is a timing graph consisting of all edges and nodes of GP that lie
on a path from the source node ns of GP to node n, and where
node n is set as the sink node nf of GS,n.
From Definition 4 and 5, it follows that the arrival time An at node n
is equivalent to the graph delay of subgraph GS,n. This is clear from
the fact that we can remove all edges from GP that are not present in
GS,n without affecting the arrival time An, define n as the new sink
node, at which point the arrival time at n becomes equal to the graph
delay of GS,n. Hence, computing arrival time distributions and graph
delay distributions are equivalent problems. The objective of statistical timing analysis is to compute the arrival time PDF of node n
based on the arrival time PDFs of its predecessor nodes np. We can
then use such a method to propagate arrival times through the circuit
in topological fashion. To compute the arrival time of n, we must
consider if the arrival times of its predecessor nodes np are independent random variables. We now state the following theorem:
Theorem 1. Two arrival times An,i and An,j at nodes ni and nj are
independent if the fanin subgraphs GS,i and GS,j at nodes ni and nj
are disjoint (meaning they have no common edges) or if any common edges have a deterministic delay.
Proof: The arrival times An,i and An,j are equal to the graph delays
DS,i and DS,j of subgraphs GS,i and GS,j. From EQ2 it is clear that
DS,i = Fi(di1, di2, ... dik ...) and DS,j = Fj(dj1, dj2, ... djk ...), where dik
and djk are variables denoting the edge delays corresponding to
edges in fanin subgraphs GS,i and GS,j. Since any shared variables
are deterministic, the functions DS,i and DS,j do not share any random variables. Therefore, they are statistically independent because
initially we assumed that edge delays are independent.
We first consider the case where the arrival times of predecessor
nodes np of node n are independent.
Independent arrival time propagation.
Without loss of generality, we consider a node n with two predecessor nodes np,1 and np,2. We then consider fanin subgraph GS,n at
node n, as shown in Figure 4(a). Since the arrival times A1 at node
np,1 and A2 at node np,2 are assumed to be independent, their fanin
subgraphs GS,1 and GS,2 are disjoint, and using the subgraph substitution from Section 3, each can be replaced with a single edge ep,1
and ep,2, as shown in Figure 4(b). If GS,1 and GS,2 share edges with
deterministic delay we can first split these edges without impacting
the circuit delay and obtain disjoint fanin subgraphs. The edge delay
PDF of ep,1 and ep,2 are set equal to the graph delay PDFs of GS,1
and GS,2, which is equal to the arrival time PDFs A1 and A2. The
resulting graph, as shown in Figure 4(b), can be reduced to a single
edge by performing series and parallel reduction. The obtained edge
delay PDF of the final edge between ns and n is equal to the graph
delay PDF of subgraph GS,n, which, in turn, is equal to the arrival
time PDF An at node n.
Given independent arrival time PDFs Ap at predecessor nodes np,
we therefore compute the arrival time PDF An at node n as follows:
1. Convolve the arrival time pdf Ap,i at each node np,i with the edge
delay pdf of the edge connecting np,i with n.
2. Compute the arrival time PDF An at node n by performing the
statistical maximum function using EQ6 on the convolved PDFs.
Using the above procedure, the computation of the arrival time
pdfs in statistical timing analysis is therefore very similar to arrival
time propagation in deterministic timing analysis, where propagation of deterministic arrival times is replaced with convolution and
selection of the latest arrival time is replaced with parallel reduction.
The run time complexity is linear with the size of the circuit. Unfortunately, this procedure is only valid if the arrival times are independent for each node in the circuit. For this to be true for all nodes, the
graph will have the form of a tree-like structure, with the root of the
tree as the sink node of the graph and all leaf nodes of the tree connected to the source node ns with single edges. In practice, such timing graphs are rare and we therefore now discuss how to compute
arrival times in the presence of dependent predecessor arrival times.
Dependent arrival time propagation.
If the fanin subgraphs GS,p of predecessor nodes np of node n
share one or more edges with random delays, the arrival times Ap
will be dependent random variables and the statistical maximum
function cannot be applied. An example of such a graph is shown in
Figure 5(a). To determine for which portions the subgraphs GS,p do
not share edges, we use the following definition of a dependence set.
Definition 6. Consider the set of k predecessor nodes np of node
n, with subgraphs GS,p at nodes np, and the intersection graph
GI={NI,EI} consisting of edges and nodes shared by two or more
subgraphs, E I =
∪
i, j ≤ k
( E S, i ∩ E S, j ) , N I = ∪ ( N S, i ∩ N S, j ) ,
i, j ≤ k
the dependence set of n is the set of nodes n d ∈ N I , n d ≠ n s with
one or more fanout edges e i ∉ G I .
In Figure 5(a), the set of dependence nodes for node nf is {b, d}.
Note that a is not in the dependence set of nf since both its fanout
edges belong to GI. However, the dependence set of d is {a}. Also
nodes h and e have empty dependence sets. The concept of dependence nodes is similar to that used in probabilistic simulation [12].
np,1
e1
e2
ns
(a)
np,2
ep,1
ns
(b)
n
ep,2
np,1
e1
e2
n
np,2
Figure 4. Arrival time computation for node n. Dotted edges in (a)
belong to graph Gp only and solid edges also belong to GS.
page 4
We refer to a node in GP as a convergence node nc if it has a nonempty dependence set. We also define the global set of dependence
nodes nD as the union of the dependence sets nd,i and refer to a node
as a dependence node, if it is an element in this list.
In order to compute the graph delay of a graph Gp with one or
more dependence nodes, we sort the list of global dependence nodes
in topological order. We then consider the first node nD,1 in the
ordered set nD. In Figure 5(a), nD = {a,b,d}, and nD,1 = a. By selecting the first node nD,1 in the list, we ensure that fanin subgraph GS,1
at node nD,1 does not contain any dependence nodes, and it follows
that we can replace GS,1 with a single edge e1 connecting source
node ns and nD,1, where the edge delay PDF D1 of e1 is equal to the
arrival time PDF A1 at nD,1, as shown in Figure 5(b). Similarly, it is
clear that the arrival time PDF A1 at nD,1 can be computed using
independent arrival time propagation.
For simplicity, we assume that the edge delay pdf D1 is discrete
and is specified by a set of delay, probability pairs (di,pi). According
to our construction, random variable D1 does not depend on the edge
delays of other edges in the transformed graph Gp. Then, using conditional probabilities [3] the arrival time pdf Ax at any node x, can be
computed as A x =
∑
k
p
i=0 i
⋅ A x, i , where Ax,i is the arrival time pdf
at node x, when the delay of e1 is equal to di. We therefore compute
the arrival time pdf Ax by performing k arrival time computations,
each weighted by the conditional probability P(D1 = di). Since during the computation of Ax,i, edge e1 has a deterministic delay it is no
longer a random variable and does not create dependence between
arrival times. Node nD,1 is therefore no longer a dependence node
and we can propagate arrival times using independent arrival time
propagation until we encounter the next global dependence node,
nD,2. Here, we repeat the same process, enumerating the arrival time
pdf at nD,2 using conditional probabilities and eliminating it as a
dependence node.
Below is the procedure for dependent arrival time propagation:
1. Identify all dependence nodes in the circuit.
3. Enumerate the pairs (ti, pi) of arrival time pdf Ad at nd and for
each pair propagate ti with conditional probability pi.
e
(a)
a
f
d
ns
nf
c
h
g
e
b
(b)
ns
e1
a
nf
c
e 11
ns
(c)
e 21
a1
h
e
b
f
d
a2
nf
c
g
Since we recursively enumerate the arrival time pdfs of all dependence nodes, the complexity of this approach grows exponentially
with the number of total dependence nodes in a circuit. Note, however, that the number of dependence nodes will, in practice, be significantly less then the number of edges in GP. Dependent arrival
time propagation therefore has a lower complexity than enumeration
of the entire sample space. It can be shown that the set of nodes at
which arrival times are enumerated is the sufficient and necessary set
for exact arrival time computation. It is therefore not possible to enumerate fewer nodes without creating arrival time dependencies in the
circuit. Nevertheless, dependent arrival time propagation is useful
only for very small timing graphs or timing graphs with mostly treelike structures. In Section 5, we show how to efficiently compute
bounds on the arrival time PDFs and in Section 6, we discuss how
these bounds are improved by selective enumeration.
5 Statistical bounds
We now propose an efficient methods for computing lower and
upper bounds of the exact arrival time PDF of GP. We are interested
in both upper and lower bounds since this allows us to determine the
quality of the bounds by comparing their difference. Also, for digital
circuits, analysis of both slow and fast paths are important for correct circuit operation and therefore a lower bound on the earliest
arrival time PDF could be useful. We define the upper and low
bounds of a PDF as follows:
Definition 7. The arrival time PDF P(t) is an upper bound of the
arrival time PDF Q(t) if and only if for all t, P ( t ) ≤ Q ( t ) .
A similar definition and property can be formulated for lower
bounds. Figure 6 shows two arrival time PDFs P(t) and Q(t), where
h
P(t)
Figure 6. Lower and upper bounds of a PDF
P(t) is an upper bound on Q(t). By using PDF P(t) instead of Q(t),
we will overestimate the delay corresponding to a particular probability, resulting in a conservative analysis. Note also that not only the
expected value µ p of P(t), but also other characteristics, such as the
95% confidence point, are bounded by P(t) on Q(t).
Upper bound computation.
To efficiently compute an upper bound on the exact graph delay
PDF of Gp, we propose the following theorem for random variables:
f
d
g
5. Compute the final arrival time pdf at node x by summing the conditional arrival time pdfs weighted by the product of their conditional probabilities.
Q(t)
2. Propagate arrival time pdfs in the circuit until the first dependence node nd is encountered.
b
4. Propagate ti, using independent arrival time propagation until the
next dependence node is encountered and repeat step 3.
Theorem 2. Let x, y and z be random variables that satisfy Property 1 and Property 2. Let x1, x2 be random variables with probability distributions that are identical to x. Then, the PDF of
random variable max(x1+y, x2+z) is an upper bound on PDF of
the random variable max(x+y, x+z).
Proof. The probability distribution function of random variables
max(x+y, x+z) and max(x1+y, x2+z) are:
Figure 5. Dependent arrival time computation for node n.
page 5
P(t ) =
∫x + max ( y, z ) ≤ t p ( x )q ( y )r ( z ) dx dy dz
∫
Q(t ) =
max ( x 2 + y, x 2 + z ) ≤ t
p ( x 1 ) p ( x 2 )q ( y )r ( z ) dx 1 dx 2 dy dz
(EQ 8)
∫
p ( x ) p ( v )q ( y )r ( z ) dx dv dy dz
(EQ 10)
max ( x + y, x + z ) ≤ t, – ∞ ≤ v ≤ ∞
Integrals from formulae EQ9 and EQ10 for probability distributions
Q(t) and P(t) have the same integration functions
f(x1,x2,y,z)=p(x1)p(x2)q(y)r(z) and f(x,v,y,z)=p(x)p(v)q(y)r(z) and differ only in the names of the variables. We now split the 4D domain
of both functions into two subdomains: y ≤ z and y > z . Probability
distributions Q(t) and P(t) can be represented as the sum of two
terms corresponding to the contribution of each subdomain.
Q(t ) = Qy ≤ z(t ) + Qy > z(t )
P(t ) = Py ≤ z(t ) + Py > z(t )
a
n1
(EQ 11)
For subdomain y ≤ z we define a one to one mapping (bijection) so
that (x1,x2,y,z) corresponds to (v,x,y,z) i.e. x1=v and x2=x. In this subdomain, inequality max ( x + y, x + z ) ≤ t follows from inequality
max ( x 1 + y, x 2 + z ) ≤ t . Therefore, the region of integration for
computing P y ≤ z ( t ) in this subdomain includes the integration
region for computing Q y ≤ z ( t ) and hence Q y ≤ z ( t ) ≤ P y ≤ z ( t )
because in both cases we integrate the same function.
For subdomain y > z we define a one to one mapping (bijection)
so that (x1,x2,y,z) corresponds to (x,v,y,z) i.e. x1=x and x2=v. Similar
to the above consideration, max ( x 1 + y, x 2 + z ) ≤ t follows from
max ( x + y, x + z ) ≤ t in this subdomain and the region of integration for computing P y > z ( t ) includes the region of integration for
computing Q y > z ( t ) . Therefore, Q y > z ( t ) ≤ P y > z ( t ) .
Combining inequalities for Q(t) and P(t) from each subdomain,
we obtain the inequality Q ( t ) ≤ P ( t ) for the whole sample space
which proves the theorem.
We now consider the simple graph Gp1 shown in Figure 7(a) with
delay equal to max((da+db+dd), (da+dc+de)), where di is the delay
of edge i. Figure 7(b) shows the timing graph Gp2 where edge a is
split into edges a1 and a2 with the same delay PDFs as a. From Theorem 2, it follows that Gp2 has a delay PDF that is an upper bound on
delay PDF of the graph Gp1. In fact, it is clear that the PDF of arrival
times at all nodes in Gp2 are upper bounds of the PDF of arrival
times of corresponding nodes in Gp1 and hence we refer graph Gp2 is
an upper bound on graph Gp1. In general, Gp1 can have a more complex structure with additional fanin and fanout edges at node n2, etc.
It can be shown that for a general timing graph Gp1, splitting an edge
into multiple edges as illustrated in Figure 7 results in a graph Gp2
that is an upper bound on Gp1. Based on this graph representation of
Theorem 2, we therefore pose the following Corollary:
Corrolary 1. If for graph GP1 with one or more convergence
nodes, arrival times are computed for all nodes using the procedure of independent arrival time propagation, the computed
b
c
(EQ 9)
Multiplying equation EQ8 by the integral of probability density
function p(x) from minus infinity to infinity, rearranging some of the
terms and renaming integration variables, gives us:
P(t ) =
ns
ns
a1
a2
n11
n21
b
c
n2
n3
n2
n3
d
ns
e
d
ns
e
Figure 7. Bounded graph transformation through node splitting.
arrival time PDFs will be an upper bound on the true arrival time
PDFs at those nodes.
The validity Corrolary 1 can be seen by considering the timing
graph Gp1 with dependence nodes nD, as illustrated in Figure 5(a).
Following the procedure for dependent arrival time propagation, we
replace subgraph GS,1 with a single edge e1, as shown in Figure 5(b),
where the edge delay PDF D1 of e1 is equal to the arrival time PDF
A1 at nD,1. We now create a graph Gp2, which bounds Gp1 by splitting edge e1 as shown in Figure 5(c), such that nD,1 is no longer a
dependence node in Gp2. By repeating this process for all dependence nodes, we obtain a timing graph Gp,k that bounds the original
timing graph Gp1 and which has no convergence nodes. We can
compute the exact arrival time PDFs of Gp,k by performing independent arrival time propagation. Finally, it is easy to observe that we
need not explicitly replace subgraph GS,1 with edge e1 and split it,
and that we will compute identical arrival times to those of Gp,k by
simply performing independent arrival time propagation on graph
Gp1, as stated in Corrolary 1.
This leads to the useful observation that an upper bound on the
arrival times of a timing graph Gp is obtained by ignoring the dependencies of arrival times and simply applying the procedure for independent arrival time computation, which has a linear run time
complexity with circuit size.
Lower bound computation.
We now discuss the computation of a low bound on the exact
arrival time PDFs. Given two random variables, x and y, it is obvious
that x and y are lower bounds of max(y, z). Therefore, given a timing
graph Gp1={N1,E1}, and the graph Gp2= {N2,E2}, where
N 2 ⊆ N 1, E 2 ⊆ E 1 , graph Gp2 is a lower bound of Gp1. For a graph
Gp1 with one or more convergence nodes nc, we therefore simply
remove all but one of the fanin edges to predecessor nodes with
dependent arrival times. By doing so for all convergence nodes, we
create a graph Gpk which is a lower bound on Gp1 and for which the
exact arrival times can be computed using the independent arrival
time propagation. Any one of the fanin edges of nc can be preserved
to obtain a lower bound on Gp1. However, in order to obtain a tight
lower bound, it is advantageous to preserve a fanin edge whose
arrival time has the largest expected value among all fanin edges.
The lower bound computation is therefore identical to independent
arrival time propagation, except that at convergence nodes, the statistical maximum function is replace with a simple selection of the
arrival time with the largest expected value. Note that for nodes with
empty dependence lists, such as nodes e and h in Figure 5(a), regular
parallel reduction is performed, since their arrival times are independent.
page 6
6 Graph Reduction and Selective Enumeration
Our overall approach to statistical timing analysis first performs
graph reduction to decrease the problem size, followed by upper and
low bound computation and enumeration of a select set of dependence nodes to improve the computed bounds.
Exact graph reduction.
In order to reduce the size of the timing graph, we prune arrival
times by considering their relative alignment. Given two arrival time
PDFs, A1 and A2 that converge at a node, if the minimum arrival time
with a non-zero probability t1,min of A1, is greater than the maximum
arrival time of with a non-zero probability, t2,max of A2, it is clear
that in the entire sample space S, A1 will be greater A2. We can therefor prune arrival time A2 from the timing graph without changing its
timing behavior. Note that, using this approach, entire subgraphs can
at times be removed from the timing graph.
We also utilize the series and parallel graph transforms discussed
in Section 3 to reduce the size of the timing graph. The pruning
method and the series / parallel reduction are executed in a loop,
until no further improvement is made. Note that while the series
transform and pruning method reduce the graph size, the parallel
transform also resolves arrival time dependencies by removing local
reconvergences and improves the quality of the computed bounds.
Selective enumeration.
In order to improve the quality of the bounds described in Section
5, we can combine the bound computation with enumeration of a
small set of dependence nodes. Enumeration of dependence nodes
improves the computed bounds in two ways. First, the enumeration
partitions the sample space S and thereby reduces the dependencies
of arrival time PDFs. Second, when all dependence nodes of a particular convergence nodes are enumerated, the arrival times at this
node are independent and the low bound can be computed using statistical maximum, instead of selection of the arrival time with the latest expected value. Selective enumeration therefore has the desirable
property that as the number of enumerated dependence nodes
increases, the quality of the bounds increases monotonically. Also,
when all dependence nodes are enumerated, the lower and upper
bounds will become equal to the exact arrival time PDF.
Our objective is to obtain an improvement in the bounds through
enumeration of a minimum number of dependence nodes. We therefore need to select those dependence nodes that most strongly
impact the quality of the bounds at the sink node. For simplicity, we
measure the difference between the upper and lower bounds as the
difference of their expected values and refer to this measure at the
bound difference. However, other measures could also be applied.
We now consider two correlated arrival time PDFs A1 and A2 that
converge at node n. Figure 8, shows the expected value of the upper
11
10
mean of the output pdf
9
8
( + ) upper bound
7
( o ) exact
( * ) lower bound
6
5
−6
−4
−2
0
2
4
6
Separation between means of input1 and input2
Figure 8. Expected values with shift between arrival time PDFs.
and lower bound PDFs and the exact arrival time PDF at node n, as a
function of the alignment of A1 and A2. As the shift between A1 and
A2 increases, the two bounds rapidly converge to the exact arrival
time. This is caused by the fact that if one arrival time PDF lies significantly after the other, the later arrival time PDF is said to dominate and the dependence between the two arrival time PDFs has little
impact. Therefore, two dependent arrival times that combine at a
convergence node nc will generate a significant bound difference
only if their arrival time PDFs are aligned, making enumeration of
the dependence nodes of nc necessary only in this case. Also, it is
possible that two dependent arrival times give rise to a large bound
difference at convergence node nc, but this bound difference does not
propagate to the sink node because along the propagation path, the
arrival time bounds combine with the arrival time bounds which
dominate. Therefore, enumeration of the dependence nodes of nc is
only necessary if the bound difference at nc propagates to nf.
In order to determine which arrival time dependencies are responsible for the bound difference at the sink node, we use an iterative
search procedure. In each iteration, we identify a node nc where the
convergence of dependent arrival times is expected to strongly contribute to the bound difference at the sink node. Then, from the
dependence list of nc, one node is selected and is added to a global
list of enumeration nodes. If the dependence list of nc contains multiple nodes, we select among these nodes the one with the widest PDF,
since it is most likely to contribute significantly to the bound difference at nf. New lower and upper bounds are then computed for the
timing graph using enumeration of the selected nodes in the global
enumeration list. If the bound difference at the sink node does not
improve significantly with the addition of a particular dependence
node, it is removed from the enumeration list and is marked to prevent future selection. Also, if the PDFs of a selected dependence
node contains many time points, we will partition it into sections,
and enumerate these sections, instead of its individual time points.
This results in much improved run time efficiency, while still substantially improving the bound difference.
To determine where convergence of dependent arrival times is
responsible for a large bound difference at the sink node, we perform
a backward traversal from the sink node. At each visited convergence node nc, we estimate, based on the alignment of the arrival
time PDFs, if the bound difference at nc results from convergence of
similarly aligned dependent arrival times at nc, or from propagation
of a bound difference from a dominant arrival time. Accordingly,
either a node from the dependence list of nc is selected, or the traversal continues along the fanin edge of the dominant arrival time.
7 Results
The proposed statistical timing analysis methods, including the
proposed exact graph reduction, computation of upper and lower
arrival time bounds and the method for selective enumeration were
implemented. Also, the exact statistical timing analysis methods
through edge enumeration described in Section 2 and dependent
arrival time propagation described in Section 4 were implemented.
In the cases where it was possible to compute the exact graph delay
PDF with these methods, it was used to confirm the correctness of
the computed bounds. For larger circuits, Monte Carlo simulation
with 100,000 samples was used. To properly stress the proposed
methods, fairly large standard deviations were used for the edge
delay PDFs in the timing graphs, ranging between 15% and 35% of
their mean.
In Table 1, we show the circuit characteristics and results for the
ISCAS [10] benchmark circuits. The table shows the number of convergence nodes (# conv) and the average and maximum number of
page 7
Table 1. Circuit statistics and exact reduction improvement
Circuit
name
c17
c499
c432
c880
c1355
c1908
c2670
c3540
c5315
c6288
c7552
#node #edges
13
19
245
481
198
379
445
815
589
1137
915
1556
1428 2449
1721 3011
2487 4687
2450 4864
3721 6459
#conv
3
38
39
85
240
103
259
480
152
1626
391
# dependence
exact reduction
avg
0.8
6.3
7.9
6.4
2.5
5.9
5.5
23.5
4.2
93.7
6.1
edges
12
318
134
271
695
598
1002
1581
424
3605
1116
max
3
56
32
48
26
85
51
275
15
1142
98
% imp
37%
34%
65%
67%
39%
62%
59%
48%
91%
26%
83%
dependence nodes per convergence node (# dependence). Some circuits have extensive reconvergence, indicated by their high number
of dependence nodes, making them difficult test cases for statistical
timing analysis. The last two columns of Table 1 show the effectiveness of the exact reduction techniques. On average, the number of
edges in the graph is reduced by 56% with a maximum reduction of
91%. The runtime was less than 6 seconds for all test cases.
Table 2 shows the results for the bound computation and refinement through selective enumeration. The expected value of the lower
and upper bound PDFs (bound low/upper) and their relative difference (bound diff) is shown. Although we only report the expected
value in Table 2, the computed bounds are pdfs and allow the computation of other useful values, such as confidence points. We also
show deterministic bounds (det. bound), obtained by selecting for
each edge the minimum or maximum edge delay with non-zero
probability and computing the graph delay with deterministic STA.
The statistical upper and lower bounds have a relatively small difference in their mean value of 4% to 12%, showing their effectiveness.
For all circuits, the run time of the bound computation did not
exceed 5 seconds. Also, the Monte Carlo results fall between the
computed bounds, as expected.
Table 2 also shows the bounds after selective enumeration (selective enum bound) and the percentage improvement of their difference compared to the original bounds. For the other circuits, the
improvement ranged between 31 - 92% with an average of 64%. The
number of dependence nodes selected for enumeration was small,
ranging from 3 to 7 nodes, showing the somewhat surprising result
that enumerating only a few carefully chosen nodes can significantly
improve the bound computation. The run time of the selective enumeration method (time) is modest, not exceeding 8 minutes for all
test cases. Finally, Figure 9 shows the PDFs for the proposed lower
C7552
1
0.9
(
(
(
(
0.8
+)
o)
*)
A)
Lower Bound (Initial)
Upper Bound (Initial)
Improved Bounds
Monte Carlo
0.7
prob
0.6
0.5
0.4
0.3
0.2
0.1
0
18
20
22
24
26
28
30
32
ns
Figure 9. Comparison of PDF bounds and Monte-Carlo PDF
8 Conclusions
In this paper, we have proposed an efficient method for computing
bounds on the statistical behavior of the circuit delay due to withinchip process variations. We first presented a general method for statistical timing analysis as well as a method that substantially reduce
the problem size. Since the exact statistical timing analysis method
has exponential run time complexity with circuit size, we show how
statistical bounds on the graph delay can be computed with linear
run time complexity. We prove the correctness of the upper and
lower bounds and demonstrated that the obtained bounds are close.
In order to further reduce the difference between the bounds, we proposed an iterative refinement technique which selectively enumerates dependence nodes in the circuit. Using this technique, the
difference in the expected value of the bounds could be reduced by
54% on average, with a modest run time increase.
Table 2. Results of bound computation and selective enumeration
det bound
circuit
c17
c499
c432
c880
c1355
c1908
c2670
c3540
c5315
c6288
c7552
low/
upper
1.6/2.4
6/9.6
8.8/15.0
11.6/18.4
11.6/19.0
17.4/28.8
15.2/24.6
20.2/33.6
21.4/34.0
54.4/89.8
19.6/30.8
monte
carlo
2.24
8.48
12.5
14.9
16.7
23.4
20.7
27.4
27.9
75.7
25.5
bound
low/
upper
2.18/2.28
7.64/8.7
11.9/13.1
14.5/15.3
15.2/17.1
22.7/23.9
19.9/21.4
26.5/28.1
27.6/28.6
72.5/79.0
24.7/26.0
selective enum bound
low/
time
% impr
upper
(sec)
5%
2.24/2.24 100%
4
12% 8.19/8.58
62%
150
9% 12.44/12.54 92%
240
6% 14.90/14.98 90%
150
11% 15.80/16.95 39%
290
5% 23.30/23.70 67%
300
6% 20.45/20.90 70%
170
6% 27.15/27.78 63%
180
4% 27.70/28.10 60%
340
8% 73.80/78.30 31%
480
5%
25.2/25.6
69%
250
%diff
Acknowledgements
This research was supported by SRC contract 2001-HJ-959 and
NSF grant CCR-0205227.
References
[1] R.B.Hitchcock “Timing verification and the Timing Analysis
program”, Proc., Design Automation Conf., 1982, pp.594-604
[2] N.P.Jouppi “Timing analysis for nMOS VLSI”, IEEE/ACM
Design Automation Conf., 1983, pp. 411-418
[3] W.P.Feller “An Introduction to Probability Theory and its Applications”, Vol. 1,2 John Wiley & Sons, New York, 1970.
[4] J.J Liou, K.T. Cheng, S. Kundu, A. Krstic, “Fast Statistical Timing Analysis By Probabilistic Even Propagation”, DAC 2001
[5] S.Devadas; H.F.Jyu; K.Keutzer; S.Malik “Statistical timing analysis of combinational circuits “, ICCD 1992 pp. 38 -43
[6] M. Berkelaar, “Statistical Delay Calculation, a Linear Time
Method,” Proc. of TAU, 1997.
[7] R.B. Brawhear, N. Menezes, C. Oh, L. Pillage, R. Mercer, “Predicting circuit performance using circuit-level statistical timing
analysis”, European Design and Test Conference, 1994.
[8] R.-B. Lin; M.-C. Wu, “A new statistical approach to timing analysis of VLSI circuits”, Proc. Int. Conf. on VLSI Design, 1998
[9] M. Orshansky, K. Keutzer, “A general probabilistic framework
for worst-case timing analysis”, Proc. DAC 2002.
[10]F. Brglez, H.Fujiwara, “A Neutral Netlist of 10 Combinatorial
Benchmark Circuits”, Proc. IEEE ISCAS, 1985, pp.695-698
[11]M. Orshansky, L. Milor, P. Chen, K. Keutzer, C. Hu, “Impact of
systematic spatial intra-chip gate length variability on performance of high-speed digital circuits”, ICCAD 2000, pp. 62 -67.
[12] F. Najm, R. Burch, P. Yang, I. Hajj, “Probabilistic simulation
for reliability analysis of CMOS VLSI circuits” IEEE Trans. on
CAD, Volume: 9 Issue: 4, April 1990
and upper bound with and without selective refinement as well as the
PDF obtained through Monte-Carlo simulation for circuit c7552.
page 8