[go: up one dir, main page]

Academia.eduAcademia.edu
Statistical Timing Analysis using Bounds and Selective Enumeration Aseem Agarwal, David Blaauw, *Vladimir Zolotov, -Sarma Vrudhula University of Michigan, Ann Arbor, MI *Motorola, Inc., Austin, TX -University of Arizona, Tucson, AZ Abstract The growing impact of within-die process variation has created the need for statistical timing analysis, where gate delays are modeled as random variables. Statistical timing analysis has traditionally suffered from exponential run time complexity with circuit size, due to the dependencies created by reconverging paths in the circuit. In this paper, we propose a new approach to statistical timing analysis which uses statistical bounds and selective enumeration to refine these bounds. First, we provide a formal definition of the statistical delay of a circuit and derive a statistical timing analysis method from this definition. Since this method for finding the exact statistical delay has exponential run time complexity with circuit size, we also propose a new method for computing statistical bounds which has linear run time complexity. We prove the correctness of the proposed bounds. Since we provide both a lower and upper bound on the true statistical delay, we can determine the quality of the bounds. If the computed bounds are not sufficiently close to each other, we propose the use of a heuristic to iteratively improve the bounds using selective enumeration of the sample space with additional run time. The proposed methods were implemented and tested on benchmark circuits. The results demonstrate that the proposed bounds have only a small error, which could be further reduced using selective enumeration with modest additional run time. Categories and Subject Descriptors B.8.2 [Performance and Reliability]: Performance analysis General Terms Algorithms, performance, reliability 1 Introduction Static timing analysis has become an indispensable part of performance verification. Traditionally, the variation in the underlying process parameters were modeled in static timing analysis (STA) using so-called case analysis. In this methodology, best-case, nominal and worst-case SPICE parameters sets are constructed and the timing analysis is performed several times, each time using one case file. Each execution of the static timing analysis is therefore deterministic, meaning that the analysis uses deterministic delays for the gates and any statistical variation in the underlying silicon is hidden. While this approach has been successfully used in the past to model die-to-die variations in device and interconnect delay, it is not able to accurately model variations within a single die. With the continual scaling of feature sizes, the ability to control critical device parameters on a single die has become increasingly difficult. Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. TAU 02, December 2 3, 2002, Monterey, California, USA. Copyright 2002 ACM 1-58113-526-2/02/0012&$5.00. Using a worst-case analysis for these so-called within-die variations therefore leads to very pessimistic analysis results since it assumes that all devices on the die have worst-case characteristics, ignoring their inherent statistical variation. The emerging dominance of within-die variations therefore poses a major obstacle for deterministic STA, giving rise to the need for new statistical timing analysis approaches. Variations in the delays of a circuit can be broadly classified into two categories: environmental variations and process variations. Environmental variation are caused by uncertainty in the environmental conditions during the operation of a chip, such as power supply and temperature variations. Process variations are due to uncertainty in the device and interconnect characteristics, such as effective gate length, doping concentrations, and oxide thickness. These variations can be divided into between-die variations (or interdie variation) and within-die variations (or intra-die variations). Within-die variations can have a deterministic component due to topologically dependencies of device processing, such as CMP effects and topologically correlated lithographic distortions. In some cases, such topological dependencies can be directly accounted for in the analysis, thereby reducing the statistical variation [11], whereas in other cases, such variations are treated as random. In this paper, we propose a formal model and efficient analysis method for statistical STA in the presence of random within-die process variations. Since between-die variations can be adequately captured using case analysis, we focus on within-die variations. We also treat all variations as random variations, meaning that topological dependencies are either removed prior to the analysis or are treated as random variations. We initially also do not address environmental variations, although the proposed model and analysis methods can be extended to such variations. The extensive use of deterministic STA in recent years is in large part due to its linear run time complexity with circuit size. In contrast, statistical STA has an underlying worst-case complexity that is exponential with circuit size, which poses a fundamental obstacle to its practical application. This high run time complexity is the result of reconverging paths in the circuit which causes correlations between their path delays due to shared sections of such paths. Previous statistical STA approaches [4-9] have therefore suffered from very high runtimes, which has made them applicable only to small circuits, from the use of approximate methods with unclear accuracy impact, or from unrealistic assumptions, such as an assumed Gaussian distribution of gate delays. In this paper, we propose a new method for statistical STA. Since the formulation of statistical STA has varied in subtle but important ways in the literature, we first provide a formal model of statistical STA. We then derive the proposed procedure for statistical STA in a strict manner from this problem formulation. Since the computational complexity of exact statistical STA is exponential with the circuit size, we present a new method for computing bounds on the exact statistical delay of the circuit and proof the correctness of these bounds. By computing only bounds of the true statistical behavior of the circuit, we are able to preserve the important characteristic of page 1 deterministic STA that it has a linear run time complexity with circuit size. Since we provide both a lower and upper bound on the true statistical delay, we can determine the quality or error of the computed bounds. If the bounds are not sufficiently close to each other, we use a heuristic method to iteratively improve the computed bounds using selective enumeration of the sample space with additional run time. Combined, the proposed methods provide a statical STA approach with linear run time, that is guaranteed conservative, has a bounded error, and can be iteratively refined at the expense of additional computational effort. The proposed methods were implemented and tested on large benchmark circuits. The difference between the expected values of the upper and lower bound was shown to be small, ranging from 4 - 12%, and this difference could be reduced by 54% on average, using the proposed selective enumeration method, with modest additional run time. The remainder of this paper is organized as follows. In Section 2 we presents a formal model of statistical STA. In Section 3, we present a number of probabilistic timing graph transformations. In Section 4, we derive our method for exact statistical timing analysis. In Section 5, we present the computation of the lower and upper statistical bounds on the true statistical behavior. In Section 6, we presents the methods for exact graph reduction and for refinement of the computed bounds using selective enumeration. In Section 7 we present our results and in Section 8 our concluding remarks. A path P of a timing graph G is a sequence of nodes, such that each pair of adjacent nodes ng and nh has an edge egh=(ng,nh). The path delay dP of path P is the sum of all the delays D(eij) of edge eij on path P. Among all paths terminating at a node n, we define the path with the maximum delay as the critical path of n. The delay of the critical path of node n is referred to as its arrival time, ta(n). Note that the arrival time for the source node ns is a deterministic value equal to 0. The critical path of the sink node nf of a timing graph is referred to as the critical path of the timing graph, and the arrival time of nf, is referred to as its graph delay. After fabrication, a deterministic timing graph GD can be conceptually formulated for each die. However, during the design of a chip, the gate delays are unknown and must be modeled as random variables. Each gate delay is therefore specified either with a probability distribution function (PDF) or probability density function (pdf) and we define a probabilistic timing graph GP as follows: Definition 2. A probabilistic timing graph GP is a timing graph whose edges are assigned random variables of delay values. Figure 1 shows an example of a delay probability distribution funcP(t) 1 P(0)=0 P(∞) = 1 2 Statistical Timing Analysis Formulation In this Section, we present a formal model of statistical static timing analysis. Our goal is to model the impact of gate delays variation due to within-die process variations on the circuit delay. Although at design time, the delay of each gate is unknown, after a chip has been manufactured, the gate delays are fixed and have a deterministic value for each particular die. The randomness or variability of the circuit delay is therefore over the fabricated die, and it is the probability distribution of the circuit delay that statistical timing analysis aims to obtain. At this point, we do not account for temporal variations of the gate delays due to environmental factors, such as power supply fluctuations, temperature dependence and noise, which must be modeled using case analysis. However, our general analysis approach can be extended to these types of variations as well. Also, for simplicity of formulation, we ignore the presence of false paths since these are orthogonal to the issues discussed in this paper. We now give the following definition of a timing graph: Definition 1. A timing graph G is a directed graph having exactly one source and one sink node: G={N,E,ns,nf}, where N={n1,n2,...,nk} is a set of nodes, E={e1,e2,...,el} is a set of edges, n s ∈ N is the source node, and n f ∈ N is the sink node and each edge e ∈ E is simply an ordered pair of nodes e=(ni,nj). The nodes in the timing graph correspond to nets in the circuit, and the edges in the graph correspond to connections from gate inputs to gate outputs. Although circuits generally have multiple inputs and outputs, we can trivially transform them to graphs with a single source and sink by adding a virtual source and sink node. In our formulation, a deterministic timing graph GD represents a particular manufactured die, where each gate has a fixed delay value. Each edge e in GD is assigned a delay D(e), which represents the deterministic signal propagation delay from a gate’s input to its output. Similar to other statistical STA methods, we ignore the dependence of gate delay on the transition time of the gate input signals. 0 P( D ≤ t )= F (t ) = ∫0 f ( τ ) dτ t f(t) ∞ ∫0 f ( t ) dt = 1 dp=f(t)dt 0 Figure 1. Delay probability density and distribution functions. tion and its corresponding probability density function. Since these functions represent the variation of gate delays, they have the following obvious but important properties: Property 1. A delay pdf is non-zero only on a finite interval. Property 2. A delay PDF equals 0 for all delay values less than its minimum value dmin and equals 1 for all values greater than its maximum value dmax. These properties follow from the fact that the delay of a real gate cannot be less that some finite minimum delay value dmin or more than some finite maximum delay value dmax. Similar to previous statistical STA methods [4][5], we assume statistical independence of all edge delays. In practice, edge delays may be spatially correlated, which complicates the analysis by creating additional correlations between path delays. The contribution of this paper is therefore that it provides an efficient solution to the problem of path delay correlation due to path reconvergence. The methods presented in this paper can also be extended to timing graphs with correlated edge delays. Note also that our method does not restrict the shape of the PDF of edge delays, which in general is not Gaussian. To simplify the implementation of statistical STA it is often more convenient to approximate continuous probability density and distribution functions with discrete functions. A discrete pdf, corresponding to continuous pdf f(t), can be represented by a sequence of pairs (di,pi), where d i = i∆ and p i = di + ∆ ⁄ 2 ∫d – ∆ ⁄ 2 f ( τ ) dτ. For computational i efficiency, we use discreet pdfs and PDFs in the final implementation of our proposed statistical timing analysis approaches. However, for page 2 generality, we will formulate the statistical timing analysis task using continuous functions. We now consider the sample space S of a probabilistic timing graph GP, consisting of all deterministic timing graphs GD with edge delays corresponding to the non-zero values of their probability distribution functions. The probability that a timing graph GD in S has an edge i with delay Di between ti and Ti is P ( t 1 ≤ τ 1 ≤ T 1, t 2 ≤ τ 2 ≤ T 2, … ) = (EQ 1) T1 T2 … p 1 ( τ 1 ) p 2 ( τ 2 )…dτ 1 dτ 2 … t1 t2 ∫ ∫ deterministic timing graph GD in S, we can compute its delay D(GD), using any of the currently available means, such as traditional static timing analysis. The delay D(GP) is therefore defined on the sample space S and is a random variable which completely defines its timing behavior. The PDF of D(GP) is defined as follows: Definition 3. The probability distribution function of the delay of a probabilistic timing graph is expressed as: ∫ D ( GD ) ≤ t p 1 ( t 1 ) p 2 ( t 2 )…dt 1 dt 2 … , (EQ 2) where pi(ti) is the probability density function of the delay of edge i and the integration is performed over the volume of sample space where delay D(GD) of timing graph GD is less than t. The probability density function can be computed by simple differentiation. If we use discreet edge delay pdfs, we can compute the graph delay pdfs by enumerating the entire sample space consisting of all permutations of the non-zero delay probabilities of all edges. Of course, this method is exponential in its run time complexity with circuit size and is not useful as a practical solution. However, its formulation is useful as a formal definition and for understanding the underlying problem that needs to be solved. 3 Probabilistic Timing Graph Transforms Before we discuss efficient methods for computing the graph delay PDF, we discuss three basic transformations for probabilistic timing graphs using the formulation in Section 2. ∞ ∫0 p ( t – τ ) ⋅ q ( τ ) ⋅ dτ (EQ 4) 2. Parallel Reduction. Figure 2(b) shows a timing graph GP consisting of two parallel edges with delays described by pdfs p(t) and q(t). For each of the deterministic graphs GD in sample space S(GP), the graph delay D(GD) is the maximum of its two edges delays. Applying EQ2, we can express the probability distribution function DGp(t) as: P ( D Gp ≤ t ) = where p i ( τ i ) is the probability density function of edge i. Given a P ( D ( GP ) ≤ t ) = p Gp ( t ) = ∫ max ( t 1, t 2 ) ≤ t p ( t 1 ) ⋅ q ( t 2 )dt 1 dt 2 (EQ 5) The random variable DGp(t) is the statistical maximum of the two independent random variables p and q. From standard probability theory [3] it is well know that the probability distribution function PG(t) of this maximum is: PG ( t ) = P ( t ) ⋅ Q ( t ) (EQ 6) By differentiation, we obtain the probability density function pG(t): pG ( t ) = P ( t ) ⋅ q ( t ) + p ( t ) ⋅ Q ( t ) , (EQ 7) Therefore, the edges in Figure 2(b) can be replaced with a single edge have the pdf pG(t). 3. Subgraph substitution. Let graph GP have subgraph GP1, as shown in Figure 3,such that: 1. GP1 and complementary graph GP1=GP-GP1 have only 2 common nodes a and b with edges belonging to both GP1 and GP1. 2. Node a (b) has only fanin (fanout) edges belonging to subgraph GP1 and fanout (fanin) edges belonging to subgraph GP1. Let the subgraph GP1 have a graph delay pdf d(GP1). We can substitute subgraph GP1 in graph GP with a single edge (a,b) having the same delay pdf d(GP1). This results in a simpler timing graph with the same graph delay. This can be proven by rearranging the integration in EQ2 for the initial graph delay PDF and separating the integration over the random variables corresponding to the delays of subgraph GP1. 4 Statistical Timing Analysis 1. Series Reduction. Figure 2(a) shows a probabilistic timing graph consisting of two series connected edges with delays described by p(t) and q(t). The total delay of the timing graph is the sum of its edge delays and by applying EQ2, the PDF of the graph delay DGp(t) is: P ( D Gp ≤ t ) = ∫ t1 + t2 ≤ t p ( t 1 ) ⋅ q ( t 2 )dt 1 dt 2 (EQ 3) The sum of the two independent edge delays is the convolution of its edge delays, as is well known from standard probability theory [3] and the two edges can be replaced with a single edge having the following probability density function: The initial formulation presented in the Section 2 relies on the enumeration of all possible edge delays with non-zero probability and is difficult to use for an efficient solution to the problem. Deterministic timing analysis has traditionally used an approach where arrival times are propagated through the circuit in topological order. We therefore derive such a propagation based approach for computing the graph delay PDF DGp, in a manner that is consistent with the definition of DGp in Section 2. We first define the probability distribution of the latest arrival time, An(t) at node n as follows: p(t) n1 p(t) n2 q(t) n3 n1 (a) n2 q(t) (b) Figure 2. Series and parallel reduction Figure 3. Subgraph substitution with a single edge page 3 Definition 4. The latest arrival time PDF, An(t) at node n of GP is the probability that a deterministic timing graph GD in the sample space S(GP) has an arrival time t a ( n ) ≤ t . In the subsequent discussion, we will refer to the latest arrival time as simply the arrival time, noting that a similar derivation can be performed for the earliest arrival time. We also make the following useful definition. Definition 5. A fanin subgraph GS,n of timing graph GP at node n is a timing graph consisting of all edges and nodes of GP that lie on a path from the source node ns of GP to node n, and where node n is set as the sink node nf of GS,n. From Definition 4 and 5, it follows that the arrival time An at node n is equivalent to the graph delay of subgraph GS,n. This is clear from the fact that we can remove all edges from GP that are not present in GS,n without affecting the arrival time An, define n as the new sink node, at which point the arrival time at n becomes equal to the graph delay of GS,n. Hence, computing arrival time distributions and graph delay distributions are equivalent problems. The objective of statistical timing analysis is to compute the arrival time PDF of node n based on the arrival time PDFs of its predecessor nodes np. We can then use such a method to propagate arrival times through the circuit in topological fashion. To compute the arrival time of n, we must consider if the arrival times of its predecessor nodes np are independent random variables. We now state the following theorem: Theorem 1. Two arrival times An,i and An,j at nodes ni and nj are independent if the fanin subgraphs GS,i and GS,j at nodes ni and nj are disjoint (meaning they have no common edges) or if any common edges have a deterministic delay. Proof: The arrival times An,i and An,j are equal to the graph delays DS,i and DS,j of subgraphs GS,i and GS,j. From EQ2 it is clear that DS,i = Fi(di1, di2, ... dik ...) and DS,j = Fj(dj1, dj2, ... djk ...), where dik and djk are variables denoting the edge delays corresponding to edges in fanin subgraphs GS,i and GS,j. Since any shared variables are deterministic, the functions DS,i and DS,j do not share any random variables. Therefore, they are statistically independent because initially we assumed that edge delays are independent. We first consider the case where the arrival times of predecessor nodes np of node n are independent. Independent arrival time propagation. Without loss of generality, we consider a node n with two predecessor nodes np,1 and np,2. We then consider fanin subgraph GS,n at node n, as shown in Figure 4(a). Since the arrival times A1 at node np,1 and A2 at node np,2 are assumed to be independent, their fanin subgraphs GS,1 and GS,2 are disjoint, and using the subgraph substitution from Section 3, each can be replaced with a single edge ep,1 and ep,2, as shown in Figure 4(b). If GS,1 and GS,2 share edges with deterministic delay we can first split these edges without impacting the circuit delay and obtain disjoint fanin subgraphs. The edge delay PDF of ep,1 and ep,2 are set equal to the graph delay PDFs of GS,1 and GS,2, which is equal to the arrival time PDFs A1 and A2. The resulting graph, as shown in Figure 4(b), can be reduced to a single edge by performing series and parallel reduction. The obtained edge delay PDF of the final edge between ns and n is equal to the graph delay PDF of subgraph GS,n, which, in turn, is equal to the arrival time PDF An at node n. Given independent arrival time PDFs Ap at predecessor nodes np, we therefore compute the arrival time PDF An at node n as follows: 1. Convolve the arrival time pdf Ap,i at each node np,i with the edge delay pdf of the edge connecting np,i with n. 2. Compute the arrival time PDF An at node n by performing the statistical maximum function using EQ6 on the convolved PDFs. Using the above procedure, the computation of the arrival time pdfs in statistical timing analysis is therefore very similar to arrival time propagation in deterministic timing analysis, where propagation of deterministic arrival times is replaced with convolution and selection of the latest arrival time is replaced with parallel reduction. The run time complexity is linear with the size of the circuit. Unfortunately, this procedure is only valid if the arrival times are independent for each node in the circuit. For this to be true for all nodes, the graph will have the form of a tree-like structure, with the root of the tree as the sink node of the graph and all leaf nodes of the tree connected to the source node ns with single edges. In practice, such timing graphs are rare and we therefore now discuss how to compute arrival times in the presence of dependent predecessor arrival times. Dependent arrival time propagation. If the fanin subgraphs GS,p of predecessor nodes np of node n share one or more edges with random delays, the arrival times Ap will be dependent random variables and the statistical maximum function cannot be applied. An example of such a graph is shown in Figure 5(a). To determine for which portions the subgraphs GS,p do not share edges, we use the following definition of a dependence set. Definition 6. Consider the set of k predecessor nodes np of node n, with subgraphs GS,p at nodes np, and the intersection graph GI={NI,EI} consisting of edges and nodes shared by two or more subgraphs, E I = ∪ i, j ≤ k ( E S, i ∩ E S, j ) , N I = ∪ ( N S, i ∩ N S, j ) , i, j ≤ k the dependence set of n is the set of nodes n d ∈ N I , n d ≠ n s with one or more fanout edges e i ∉ G I . In Figure 5(a), the set of dependence nodes for node nf is {b, d}. Note that a is not in the dependence set of nf since both its fanout edges belong to GI. However, the dependence set of d is {a}. Also nodes h and e have empty dependence sets. The concept of dependence nodes is similar to that used in probabilistic simulation [12]. np,1 e1 e2 ns (a) np,2 ep,1 ns (b) n ep,2 np,1 e1 e2 n np,2 Figure 4. Arrival time computation for node n. Dotted edges in (a) belong to graph Gp only and solid edges also belong to GS. page 4 We refer to a node in GP as a convergence node nc if it has a nonempty dependence set. We also define the global set of dependence nodes nD as the union of the dependence sets nd,i and refer to a node as a dependence node, if it is an element in this list. In order to compute the graph delay of a graph Gp with one or more dependence nodes, we sort the list of global dependence nodes in topological order. We then consider the first node nD,1 in the ordered set nD. In Figure 5(a), nD = {a,b,d}, and nD,1 = a. By selecting the first node nD,1 in the list, we ensure that fanin subgraph GS,1 at node nD,1 does not contain any dependence nodes, and it follows that we can replace GS,1 with a single edge e1 connecting source node ns and nD,1, where the edge delay PDF D1 of e1 is equal to the arrival time PDF A1 at nD,1, as shown in Figure 5(b). Similarly, it is clear that the arrival time PDF A1 at nD,1 can be computed using independent arrival time propagation. For simplicity, we assume that the edge delay pdf D1 is discrete and is specified by a set of delay, probability pairs (di,pi). According to our construction, random variable D1 does not depend on the edge delays of other edges in the transformed graph Gp. Then, using conditional probabilities [3] the arrival time pdf Ax at any node x, can be computed as A x = ∑ k p i=0 i ⋅ A x, i , where Ax,i is the arrival time pdf at node x, when the delay of e1 is equal to di. We therefore compute the arrival time pdf Ax by performing k arrival time computations, each weighted by the conditional probability P(D1 = di). Since during the computation of Ax,i, edge e1 has a deterministic delay it is no longer a random variable and does not create dependence between arrival times. Node nD,1 is therefore no longer a dependence node and we can propagate arrival times using independent arrival time propagation until we encounter the next global dependence node, nD,2. Here, we repeat the same process, enumerating the arrival time pdf at nD,2 using conditional probabilities and eliminating it as a dependence node. Below is the procedure for dependent arrival time propagation: 1. Identify all dependence nodes in the circuit. 3. Enumerate the pairs (ti, pi) of arrival time pdf Ad at nd and for each pair propagate ti with conditional probability pi. e (a) a f d ns nf c h g e b (b) ns e1 a nf c e 11 ns (c) e 21 a1 h e b f d a2 nf c g Since we recursively enumerate the arrival time pdfs of all dependence nodes, the complexity of this approach grows exponentially with the number of total dependence nodes in a circuit. Note, however, that the number of dependence nodes will, in practice, be significantly less then the number of edges in GP. Dependent arrival time propagation therefore has a lower complexity than enumeration of the entire sample space. It can be shown that the set of nodes at which arrival times are enumerated is the sufficient and necessary set for exact arrival time computation. It is therefore not possible to enumerate fewer nodes without creating arrival time dependencies in the circuit. Nevertheless, dependent arrival time propagation is useful only for very small timing graphs or timing graphs with mostly treelike structures. In Section 5, we show how to efficiently compute bounds on the arrival time PDFs and in Section 6, we discuss how these bounds are improved by selective enumeration. 5 Statistical bounds We now propose an efficient methods for computing lower and upper bounds of the exact arrival time PDF of GP. We are interested in both upper and lower bounds since this allows us to determine the quality of the bounds by comparing their difference. Also, for digital circuits, analysis of both slow and fast paths are important for correct circuit operation and therefore a lower bound on the earliest arrival time PDF could be useful. We define the upper and low bounds of a PDF as follows: Definition 7. The arrival time PDF P(t) is an upper bound of the arrival time PDF Q(t) if and only if for all t, P ( t ) ≤ Q ( t ) . A similar definition and property can be formulated for lower bounds. Figure 6 shows two arrival time PDFs P(t) and Q(t), where h P(t) Figure 6. Lower and upper bounds of a PDF P(t) is an upper bound on Q(t). By using PDF P(t) instead of Q(t), we will overestimate the delay corresponding to a particular probability, resulting in a conservative analysis. Note also that not only the expected value µ p of P(t), but also other characteristics, such as the 95% confidence point, are bounded by P(t) on Q(t). Upper bound computation. To efficiently compute an upper bound on the exact graph delay PDF of Gp, we propose the following theorem for random variables: f d g 5. Compute the final arrival time pdf at node x by summing the conditional arrival time pdfs weighted by the product of their conditional probabilities. Q(t) 2. Propagate arrival time pdfs in the circuit until the first dependence node nd is encountered. b 4. Propagate ti, using independent arrival time propagation until the next dependence node is encountered and repeat step 3. Theorem 2. Let x, y and z be random variables that satisfy Property 1 and Property 2. Let x1, x2 be random variables with probability distributions that are identical to x. Then, the PDF of random variable max(x1+y, x2+z) is an upper bound on PDF of the random variable max(x+y, x+z). Proof. The probability distribution function of random variables max(x+y, x+z) and max(x1+y, x2+z) are: Figure 5. Dependent arrival time computation for node n. page 5 P(t ) = ∫x + max ( y, z ) ≤ t p ( x )q ( y )r ( z ) dx dy dz ∫ Q(t ) = max ( x 2 + y, x 2 + z ) ≤ t p ( x 1 ) p ( x 2 )q ( y )r ( z ) dx 1 dx 2 dy dz (EQ 8) ∫ p ( x ) p ( v )q ( y )r ( z ) dx dv dy dz (EQ 10) max ( x + y, x + z ) ≤ t, – ∞ ≤ v ≤ ∞ Integrals from formulae EQ9 and EQ10 for probability distributions Q(t) and P(t) have the same integration functions f(x1,x2,y,z)=p(x1)p(x2)q(y)r(z) and f(x,v,y,z)=p(x)p(v)q(y)r(z) and differ only in the names of the variables. We now split the 4D domain of both functions into two subdomains: y ≤ z and y > z . Probability distributions Q(t) and P(t) can be represented as the sum of two terms corresponding to the contribution of each subdomain. Q(t ) = Qy ≤ z(t ) + Qy > z(t ) P(t ) = Py ≤ z(t ) + Py > z(t ) a n1 (EQ 11) For subdomain y ≤ z we define a one to one mapping (bijection) so that (x1,x2,y,z) corresponds to (v,x,y,z) i.e. x1=v and x2=x. In this subdomain, inequality max ( x + y, x + z ) ≤ t follows from inequality max ( x 1 + y, x 2 + z ) ≤ t . Therefore, the region of integration for computing P y ≤ z ( t ) in this subdomain includes the integration region for computing Q y ≤ z ( t ) and hence Q y ≤ z ( t ) ≤ P y ≤ z ( t ) because in both cases we integrate the same function. For subdomain y > z we define a one to one mapping (bijection) so that (x1,x2,y,z) corresponds to (x,v,y,z) i.e. x1=x and x2=v. Similar to the above consideration, max ( x 1 + y, x 2 + z ) ≤ t follows from max ( x + y, x + z ) ≤ t in this subdomain and the region of integration for computing P y > z ( t ) includes the region of integration for computing Q y > z ( t ) . Therefore, Q y > z ( t ) ≤ P y > z ( t ) . Combining inequalities for Q(t) and P(t) from each subdomain, we obtain the inequality Q ( t ) ≤ P ( t ) for the whole sample space which proves the theorem. We now consider the simple graph Gp1 shown in Figure 7(a) with delay equal to max((da+db+dd), (da+dc+de)), where di is the delay of edge i. Figure 7(b) shows the timing graph Gp2 where edge a is split into edges a1 and a2 with the same delay PDFs as a. From Theorem 2, it follows that Gp2 has a delay PDF that is an upper bound on delay PDF of the graph Gp1. In fact, it is clear that the PDF of arrival times at all nodes in Gp2 are upper bounds of the PDF of arrival times of corresponding nodes in Gp1 and hence we refer graph Gp2 is an upper bound on graph Gp1. In general, Gp1 can have a more complex structure with additional fanin and fanout edges at node n2, etc. It can be shown that for a general timing graph Gp1, splitting an edge into multiple edges as illustrated in Figure 7 results in a graph Gp2 that is an upper bound on Gp1. Based on this graph representation of Theorem 2, we therefore pose the following Corollary: Corrolary 1. If for graph GP1 with one or more convergence nodes, arrival times are computed for all nodes using the procedure of independent arrival time propagation, the computed b c (EQ 9) Multiplying equation EQ8 by the integral of probability density function p(x) from minus infinity to infinity, rearranging some of the terms and renaming integration variables, gives us: P(t ) = ns ns a1 a2 n11 n21 b c n2 n3 n2 n3 d ns e d ns e Figure 7. Bounded graph transformation through node splitting. arrival time PDFs will be an upper bound on the true arrival time PDFs at those nodes. The validity Corrolary 1 can be seen by considering the timing graph Gp1 with dependence nodes nD, as illustrated in Figure 5(a). Following the procedure for dependent arrival time propagation, we replace subgraph GS,1 with a single edge e1, as shown in Figure 5(b), where the edge delay PDF D1 of e1 is equal to the arrival time PDF A1 at nD,1. We now create a graph Gp2, which bounds Gp1 by splitting edge e1 as shown in Figure 5(c), such that nD,1 is no longer a dependence node in Gp2. By repeating this process for all dependence nodes, we obtain a timing graph Gp,k that bounds the original timing graph Gp1 and which has no convergence nodes. We can compute the exact arrival time PDFs of Gp,k by performing independent arrival time propagation. Finally, it is easy to observe that we need not explicitly replace subgraph GS,1 with edge e1 and split it, and that we will compute identical arrival times to those of Gp,k by simply performing independent arrival time propagation on graph Gp1, as stated in Corrolary 1. This leads to the useful observation that an upper bound on the arrival times of a timing graph Gp is obtained by ignoring the dependencies of arrival times and simply applying the procedure for independent arrival time computation, which has a linear run time complexity with circuit size. Lower bound computation. We now discuss the computation of a low bound on the exact arrival time PDFs. Given two random variables, x and y, it is obvious that x and y are lower bounds of max(y, z). Therefore, given a timing graph Gp1={N1,E1}, and the graph Gp2= {N2,E2}, where N 2 ⊆ N 1, E 2 ⊆ E 1 , graph Gp2 is a lower bound of Gp1. For a graph Gp1 with one or more convergence nodes nc, we therefore simply remove all but one of the fanin edges to predecessor nodes with dependent arrival times. By doing so for all convergence nodes, we create a graph Gpk which is a lower bound on Gp1 and for which the exact arrival times can be computed using the independent arrival time propagation. Any one of the fanin edges of nc can be preserved to obtain a lower bound on Gp1. However, in order to obtain a tight lower bound, it is advantageous to preserve a fanin edge whose arrival time has the largest expected value among all fanin edges. The lower bound computation is therefore identical to independent arrival time propagation, except that at convergence nodes, the statistical maximum function is replace with a simple selection of the arrival time with the largest expected value. Note that for nodes with empty dependence lists, such as nodes e and h in Figure 5(a), regular parallel reduction is performed, since their arrival times are independent. page 6 6 Graph Reduction and Selective Enumeration Our overall approach to statistical timing analysis first performs graph reduction to decrease the problem size, followed by upper and low bound computation and enumeration of a select set of dependence nodes to improve the computed bounds. Exact graph reduction. In order to reduce the size of the timing graph, we prune arrival times by considering their relative alignment. Given two arrival time PDFs, A1 and A2 that converge at a node, if the minimum arrival time with a non-zero probability t1,min of A1, is greater than the maximum arrival time of with a non-zero probability, t2,max of A2, it is clear that in the entire sample space S, A1 will be greater A2. We can therefor prune arrival time A2 from the timing graph without changing its timing behavior. Note that, using this approach, entire subgraphs can at times be removed from the timing graph. We also utilize the series and parallel graph transforms discussed in Section 3 to reduce the size of the timing graph. The pruning method and the series / parallel reduction are executed in a loop, until no further improvement is made. Note that while the series transform and pruning method reduce the graph size, the parallel transform also resolves arrival time dependencies by removing local reconvergences and improves the quality of the computed bounds. Selective enumeration. In order to improve the quality of the bounds described in Section 5, we can combine the bound computation with enumeration of a small set of dependence nodes. Enumeration of dependence nodes improves the computed bounds in two ways. First, the enumeration partitions the sample space S and thereby reduces the dependencies of arrival time PDFs. Second, when all dependence nodes of a particular convergence nodes are enumerated, the arrival times at this node are independent and the low bound can be computed using statistical maximum, instead of selection of the arrival time with the latest expected value. Selective enumeration therefore has the desirable property that as the number of enumerated dependence nodes increases, the quality of the bounds increases monotonically. Also, when all dependence nodes are enumerated, the lower and upper bounds will become equal to the exact arrival time PDF. Our objective is to obtain an improvement in the bounds through enumeration of a minimum number of dependence nodes. We therefore need to select those dependence nodes that most strongly impact the quality of the bounds at the sink node. For simplicity, we measure the difference between the upper and lower bounds as the difference of their expected values and refer to this measure at the bound difference. However, other measures could also be applied. We now consider two correlated arrival time PDFs A1 and A2 that converge at node n. Figure 8, shows the expected value of the upper 11 10 mean of the output pdf 9 8 ( + ) upper bound 7 ( o ) exact ( * ) lower bound 6 5 −6 −4 −2 0 2 4 6 Separation between means of input1 and input2 Figure 8. Expected values with shift between arrival time PDFs. and lower bound PDFs and the exact arrival time PDF at node n, as a function of the alignment of A1 and A2. As the shift between A1 and A2 increases, the two bounds rapidly converge to the exact arrival time. This is caused by the fact that if one arrival time PDF lies significantly after the other, the later arrival time PDF is said to dominate and the dependence between the two arrival time PDFs has little impact. Therefore, two dependent arrival times that combine at a convergence node nc will generate a significant bound difference only if their arrival time PDFs are aligned, making enumeration of the dependence nodes of nc necessary only in this case. Also, it is possible that two dependent arrival times give rise to a large bound difference at convergence node nc, but this bound difference does not propagate to the sink node because along the propagation path, the arrival time bounds combine with the arrival time bounds which dominate. Therefore, enumeration of the dependence nodes of nc is only necessary if the bound difference at nc propagates to nf. In order to determine which arrival time dependencies are responsible for the bound difference at the sink node, we use an iterative search procedure. In each iteration, we identify a node nc where the convergence of dependent arrival times is expected to strongly contribute to the bound difference at the sink node. Then, from the dependence list of nc, one node is selected and is added to a global list of enumeration nodes. If the dependence list of nc contains multiple nodes, we select among these nodes the one with the widest PDF, since it is most likely to contribute significantly to the bound difference at nf. New lower and upper bounds are then computed for the timing graph using enumeration of the selected nodes in the global enumeration list. If the bound difference at the sink node does not improve significantly with the addition of a particular dependence node, it is removed from the enumeration list and is marked to prevent future selection. Also, if the PDFs of a selected dependence node contains many time points, we will partition it into sections, and enumerate these sections, instead of its individual time points. This results in much improved run time efficiency, while still substantially improving the bound difference. To determine where convergence of dependent arrival times is responsible for a large bound difference at the sink node, we perform a backward traversal from the sink node. At each visited convergence node nc, we estimate, based on the alignment of the arrival time PDFs, if the bound difference at nc results from convergence of similarly aligned dependent arrival times at nc, or from propagation of a bound difference from a dominant arrival time. Accordingly, either a node from the dependence list of nc is selected, or the traversal continues along the fanin edge of the dominant arrival time. 7 Results The proposed statistical timing analysis methods, including the proposed exact graph reduction, computation of upper and lower arrival time bounds and the method for selective enumeration were implemented. Also, the exact statistical timing analysis methods through edge enumeration described in Section 2 and dependent arrival time propagation described in Section 4 were implemented. In the cases where it was possible to compute the exact graph delay PDF with these methods, it was used to confirm the correctness of the computed bounds. For larger circuits, Monte Carlo simulation with 100,000 samples was used. To properly stress the proposed methods, fairly large standard deviations were used for the edge delay PDFs in the timing graphs, ranging between 15% and 35% of their mean. In Table 1, we show the circuit characteristics and results for the ISCAS [10] benchmark circuits. The table shows the number of convergence nodes (# conv) and the average and maximum number of page 7 Table 1. Circuit statistics and exact reduction improvement Circuit name c17 c499 c432 c880 c1355 c1908 c2670 c3540 c5315 c6288 c7552 #node #edges 13 19 245 481 198 379 445 815 589 1137 915 1556 1428 2449 1721 3011 2487 4687 2450 4864 3721 6459 #conv 3 38 39 85 240 103 259 480 152 1626 391 # dependence exact reduction avg 0.8 6.3 7.9 6.4 2.5 5.9 5.5 23.5 4.2 93.7 6.1 edges 12 318 134 271 695 598 1002 1581 424 3605 1116 max 3 56 32 48 26 85 51 275 15 1142 98 % imp 37% 34% 65% 67% 39% 62% 59% 48% 91% 26% 83% dependence nodes per convergence node (# dependence). Some circuits have extensive reconvergence, indicated by their high number of dependence nodes, making them difficult test cases for statistical timing analysis. The last two columns of Table 1 show the effectiveness of the exact reduction techniques. On average, the number of edges in the graph is reduced by 56% with a maximum reduction of 91%. The runtime was less than 6 seconds for all test cases. Table 2 shows the results for the bound computation and refinement through selective enumeration. The expected value of the lower and upper bound PDFs (bound low/upper) and their relative difference (bound diff) is shown. Although we only report the expected value in Table 2, the computed bounds are pdfs and allow the computation of other useful values, such as confidence points. We also show deterministic bounds (det. bound), obtained by selecting for each edge the minimum or maximum edge delay with non-zero probability and computing the graph delay with deterministic STA. The statistical upper and lower bounds have a relatively small difference in their mean value of 4% to 12%, showing their effectiveness. For all circuits, the run time of the bound computation did not exceed 5 seconds. Also, the Monte Carlo results fall between the computed bounds, as expected. Table 2 also shows the bounds after selective enumeration (selective enum bound) and the percentage improvement of their difference compared to the original bounds. For the other circuits, the improvement ranged between 31 - 92% with an average of 64%. The number of dependence nodes selected for enumeration was small, ranging from 3 to 7 nodes, showing the somewhat surprising result that enumerating only a few carefully chosen nodes can significantly improve the bound computation. The run time of the selective enumeration method (time) is modest, not exceeding 8 minutes for all test cases. Finally, Figure 9 shows the PDFs for the proposed lower C7552 1 0.9 ( ( ( ( 0.8 +) o) *) A) Lower Bound (Initial) Upper Bound (Initial) Improved Bounds Monte Carlo 0.7 prob 0.6 0.5 0.4 0.3 0.2 0.1 0 18 20 22 24 26 28 30 32 ns Figure 9. Comparison of PDF bounds and Monte-Carlo PDF 8 Conclusions In this paper, we have proposed an efficient method for computing bounds on the statistical behavior of the circuit delay due to withinchip process variations. We first presented a general method for statistical timing analysis as well as a method that substantially reduce the problem size. Since the exact statistical timing analysis method has exponential run time complexity with circuit size, we show how statistical bounds on the graph delay can be computed with linear run time complexity. We prove the correctness of the upper and lower bounds and demonstrated that the obtained bounds are close. In order to further reduce the difference between the bounds, we proposed an iterative refinement technique which selectively enumerates dependence nodes in the circuit. Using this technique, the difference in the expected value of the bounds could be reduced by 54% on average, with a modest run time increase. Table 2. Results of bound computation and selective enumeration det bound circuit c17 c499 c432 c880 c1355 c1908 c2670 c3540 c5315 c6288 c7552 low/ upper 1.6/2.4 6/9.6 8.8/15.0 11.6/18.4 11.6/19.0 17.4/28.8 15.2/24.6 20.2/33.6 21.4/34.0 54.4/89.8 19.6/30.8 monte carlo 2.24 8.48 12.5 14.9 16.7 23.4 20.7 27.4 27.9 75.7 25.5 bound low/ upper 2.18/2.28 7.64/8.7 11.9/13.1 14.5/15.3 15.2/17.1 22.7/23.9 19.9/21.4 26.5/28.1 27.6/28.6 72.5/79.0 24.7/26.0 selective enum bound low/ time % impr upper (sec) 5% 2.24/2.24 100% 4 12% 8.19/8.58 62% 150 9% 12.44/12.54 92% 240 6% 14.90/14.98 90% 150 11% 15.80/16.95 39% 290 5% 23.30/23.70 67% 300 6% 20.45/20.90 70% 170 6% 27.15/27.78 63% 180 4% 27.70/28.10 60% 340 8% 73.80/78.30 31% 480 5% 25.2/25.6 69% 250 %diff Acknowledgements This research was supported by SRC contract 2001-HJ-959 and NSF grant CCR-0205227. References [1] R.B.Hitchcock “Timing verification and the Timing Analysis program”, Proc., Design Automation Conf., 1982, pp.594-604 [2] N.P.Jouppi “Timing analysis for nMOS VLSI”, IEEE/ACM Design Automation Conf., 1983, pp. 411-418 [3] W.P.Feller “An Introduction to Probability Theory and its Applications”, Vol. 1,2 John Wiley & Sons, New York, 1970. [4] J.J Liou, K.T. Cheng, S. Kundu, A. Krstic, “Fast Statistical Timing Analysis By Probabilistic Even Propagation”, DAC 2001 [5] S.Devadas; H.F.Jyu; K.Keutzer; S.Malik “Statistical timing analysis of combinational circuits “, ICCD 1992 pp. 38 -43 [6] M. Berkelaar, “Statistical Delay Calculation, a Linear Time Method,” Proc. of TAU, 1997. [7] R.B. Brawhear, N. Menezes, C. Oh, L. Pillage, R. Mercer, “Predicting circuit performance using circuit-level statistical timing analysis”, European Design and Test Conference, 1994. [8] R.-B. Lin; M.-C. Wu, “A new statistical approach to timing analysis of VLSI circuits”, Proc. Int. Conf. on VLSI Design, 1998 [9] M. Orshansky, K. Keutzer, “A general probabilistic framework for worst-case timing analysis”, Proc. DAC 2002. [10]F. Brglez, H.Fujiwara, “A Neutral Netlist of 10 Combinatorial Benchmark Circuits”, Proc. IEEE ISCAS, 1985, pp.695-698 [11]M. Orshansky, L. Milor, P. Chen, K. Keutzer, C. Hu, “Impact of systematic spatial intra-chip gate length variability on performance of high-speed digital circuits”, ICCAD 2000, pp. 62 -67. [12] F. Najm, R. Burch, P. Yang, I. Hajj, “Probabilistic simulation for reliability analysis of CMOS VLSI circuits” IEEE Trans. on CAD, Volume: 9 Issue: 4, April 1990 and upper bound with and without selective refinement as well as the PDF obtained through Monte-Carlo simulation for circuit c7552. page 8