Characterization of Feasible Retimings

2001

Philip Chong and Robert K. Brayton
EECS, U.C. Berkeley
{pchong,brayton}@eecs.berkeley.edu

Abstract

We present a theorem which characterizes all feasible retimings for a strongly-connected graph. For such graphs, we give necessary and sufficient conditions for the achievability of a chosen target retiming. We describe an application of this characterization which combines floorplanning and retiming. Experimental results show that our techniques yield superior clock frequencies with a minor increase in wirelength.

1 Introduction

Retiming [1] is a widely accepted technique for the resynthesis of digital circuits. Generally, retiming refers to the creation and removal of sequential elements (registers) in a circuit in a manner which preserves the functionality of the circuit. Retiming provides flexibility in resynthesis, in that modifying the circuit in this way may allow for optimization of some particular metric, such as area or delay.

Retiming has the well-known property that, for any retiming, the number of registers in any loop (cycle) in the circuit must remain constant. The converse is not generally true. That is, given an original circuit and a target circuit which preserves the number of registers in every loop, it is not always possible to generate the target circuit from the original in a manner which preserves functionality. However, the converse does hold for a specific class of circuits, namely circuits whose underlying graphs are strongly-connected.

In this paper, we present a proof of this converse property. We show in Section 3 a practical application of this theorem in a non-iterative approach to combining retiming and die-level floorplanning for deep-submicron designs. Experimental results in Section 4 show notable improvement in the clock cycle times achievable using our methodology.
This work was supported by the Gigascale Silicon Research Center at U.C. Berkeley.

2 Theoretical Work

In this section, we present our characterization of retimings in strongly-connected graphs. We use the term loop to refer to cycles in the graph G, to avoid confusion with the term cycle, which we use to describe clock cycles in the domain of sequential logic.

We model a circuit as a directed multigraph G, where the nodes of the graph represent circuit elements (standard cells or macroblocks), and the edges represent the interconnections between the elements. Each edge e is labeled with a non-negative integer S(e), which represents an initial assignment of registers to the interconnects such that the circuit provides the correct behaviour. We are free to relabel the edges, as long as the result is consistent with the original. Here, the concept of consistency is the same as in [1]; circuit behaviour is preserved when a number of registers are removed from (or added to) all the incoming edges of any node and the same number of registers are added to (likewise removed from) the outgoing edges of the same node. We call this procedure of moving registers across a single node a retiming operation. We say two edge labelings of G are consistent iff there is a finite sequence of successive retiming operations which transforms one labeling to the other.

We also define compatible labelings: if S and T are two edge labelings of G, S and T are compatible with respect to a set L of loops in G iff, for every loop C in L, the sum of the labels in S for C is the same as the sum of the labels for C in T. We can show:

Theorem 1 If S is compatible with T with respect to all simple (non-self-intersecting) loops in G, then S is compatible with T with respect to all loops in G.

Theorem 1 indicates that we need only consider simple loops when determining compatibility. Essentially, this result is obtained from the decomposition of self-intersecting loops into non-self-intersecting loops. Although not critical to our main result, this is an important consideration for the practical application of our work described in Section 3, as it allows us to reduce the number of loops we need to consider to a manageable quantity.

Note that the retiming operation as described above preserves the number of registers around each loop; if S is consistent with T, then S must be compatible with T. However, we can also show the converse, for strongly-connected graphs:

Theorem 2 If G is strongly-connected, then S is compatible with T iff S is consistent with T.

For proofs of both theorems, see Appendix A. Theorem 2 indicates that, if a target retiming preserves the number of registers for every loop in G, then there exists a valid finite sequence of retiming operations which can generate this target. Moreover, as the proof of this theorem is constructive in nature, we can readily obtain the desired sequence of retiming operations. Thus, the constraint that the number of registers remain constant for all loops in G is both necessary and sufficient for any valid retiming of G.

One might argue that the condition that G be strongly-connected is too restrictive in practice. However, in a design with no redundancies, edges which do not participate in a loop must lie on some path from a primary input to a primary output. Hence, the registers on such edges only contribute to the overall latency of the circuit. By ignoring these edges during retiming, we allow the latency to grow unbounded, but do not affect the functionality of the circuit beyond this. Alternatively, if latency is important, then one may introduce a dummy "host node" in the graph, representing the environment external to the circuit; G then becomes strongly-connected through the host node, assuming a circuit with no redundancies.
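As a rough illustration of the retiming operation and the loop invariant described in this section, the following Python sketch moves registers across a single node and compares loop sums before and after. The edge-list representation, the names `retime_node` and `loop_sum`, and the sign convention (a positive x adds registers to incoming edges and removes them from outgoing edges, as in the retiming definition of Appendix A) are assumptions made for this example; it is not code from the paper.

```python
def retime_node(edges, labels, node, x):
    """Return new labels after moving x registers across `node`.

    edges  : list of (src, dst) pairs; parallel edges are allowed
    labels : list of register counts, one per edge
    x      : registers added to each incoming edge and removed from
             each outgoing edge (negative x moves them the other way)
    """
    new_labels = list(labels)
    for i, (src, dst) in enumerate(edges):
        if src == node and dst == node:
            continue  # self-loop: the +x and -x cancel
        if dst == node:
            new_labels[i] += x
        elif src == node:
            new_labels[i] -= x
    return new_labels


def loop_sum(labels, loop):
    """Total registers around a loop given as a list of edge indices."""
    return sum(labels[i] for i in loop)


# Hypothetical example: ring a -> b -> c -> a plus a chord a -> c.
edges = [("a", "b"), ("b", "c"), ("c", "a"), ("a", "c")]
S = [2, 0, 1, 1]
loops = [[0, 1, 2], [3, 2]]

# Move one register forward across b (from its input to its output).
S_new = retime_node(edges, S, "b", -1)
```

In this example every loop keeps its register count, which is the "compatible" condition; Theorem 2 states that for strongly-connected graphs this condition is also sufficient.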
With such a host node, we can control the latency of off-chip communication by limiting the number of registers introduced on the edges incident to the host node.

2.1 Related Work

We initially developed our theory believing it to be a novel result in the area of retiming. However, it was subsequently found that the result presented in [2] derives the same conditions for feasible retimings. The proof technique in [2] differs from our work in two ways. First, [2] shows that the temporality (i.e. sequential behaviour) of a circuit is maintained provided the number of registers in every loop is fixed. However, it was not shown that a valid sequence of retiming operations exists to transform a circuit into a temporal equivalent. Our work shows this is indeed the case. Second, our proof is constructive, so the desired sequence of retiming operations is explicitly determined.

3 Application to Floorplanning

In this section, we consider an application of our retiming theorem to the generation of die-level floorplans under deep-submicron physical conditions. Under current technology trends, wire delays are predicted to grow such that multiple clock cycles will be required for die-level interconnects [3]. Thus, the task of generating a feasible floorplan is tightly coupled with retiming, as the flexibility of placement of modules is directly related to the assignment of registers to the interconnects. Interconnects with more registers may be made longer than interconnects with few registers.

We model the design as a directed multigraph G, where the nodes of the graph represent large macroblocks, possibly millions of gates each. Here, the blocks represent sequential logic, though we ignore the registers internal to the blocks, as we are only concerned with retiming the registers on the block-to-block interconnects. Each macroblock n has an associated layout area $A_n$, though the aspect ratio (width/height) of each block may be flexible. We assume rectangular blocks only, for simplicity. The edges represent the interconnect between blocks; each edge may be a bus consisting of many wires, though for our purposes we abstract this detail away. Again, S(e) represents an initial assignment of registers to the interconnects; this represents an initial design which satisfies the functionality required by the system.

Our problem is thus: find a location $(x_n, y_n)$ and width and height $(w_n, h_n)$ for each macroblock n, and a retiming T(e) consistent with S(e), such that

• $w_n h_n \ge A_n$
• No macroblocks overlap
• For edge e = (m, n), $T(e) \ge \lfloor d(m, n)/T_c \rfloor$, where d(m, n) is the interconnect delay from block m to block n, and $T_c$ is the clock period

We compute d(m, n) as follows. First, we assume interconnect delay is linearly related to distance, given optimally buffered lines as in [4]. Optimal buffering is a reasonable assumption, given the long chip-level interconnect we are dealing with here. Second, we use the Manhattan distance D(m, n) between the centers of the macroblocks as an estimate for the length of the interconnect. Third, we assume the macroblocks are all Moore machines, with registered outputs. This assumption is made to ensure that each interconnect can be treated independently; in the presence of combinational paths which span multiple chip-level interconnects, the inequality constraint for T(e) given above does not necessarily hold. Given the large size of the macroblocks, it is reasonable to assume they will be designed in some regular fashion, so the Moore machine assumption seems to be justified. For simplicity, we assume our registers have no propagation delays associated with them, although we can easily model such effects by subtracting such a delay from $T_c$. Finally, we assume there is a combinational delay t(m, n) associated with the inputs to block n coming from block m; this models the fact that, for a general Moore machine, there may be combinational logic between an input and a register. Based on this we have:

$$d(m, n) = \kappa D(m, n) + t(m, n)$$

for some constant κ relating distance to time; κ depends on the physical technology used for the chip itself.

3.1 Our Procedure

Given that loop constraints (the number of registers around each loop in G) are sufficient to describe all valid retimings of G, one might hope to translate these into constraints which a floorplanning/placement tool can use directly. If such constraints existed, then any placement obeying them would be guaranteed to have a valid retiming. Unfortunately, this task does not seem to be straightforward. In addition, there may not exist any floorplan that meets the constraints. We have therefore settled on a heuristic method to try to satisfy the loop constraints; as it is a heuristic, the final result may not satisfy the constraints. However, note that we may always reduce the clock frequency in order to satisfy our loop constraints; increasing the clock period allows violated constraints to become met. Thus, for our methodology, the maximum clock frequency (or minimum clock period) at which the loop constraints are satisfied becomes a metric for the quality of the floorplan.

Our floorplanning procedure generates a slicing structure derived using recursive mincut partitioning [5, 6]. The implementation of our algorithm uses HMetis [7] as the core mincut partitioner. We build the slicing tree recursively, using mincut partitioning to determine the structure of the tree at each level. Using mincut partitioning in this manner acts effectively as a heuristic for minimization of wirelength; as a minimum number of edges are cut at each stage, we expect that relatively few wires will have long lengths.
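The per-edge timing constraint in the problem statement of Section 3 can be illustrated with a small Python sketch. It evaluates d(m, n) = κD(m, n) + t(m, n) from block centers and returns the lower bound ⌊d(m, n)/Tc⌋ on T(e). The function names, the units (µm and ps) and the sample coordinates are assumptions for illustration only; κ = 1/16 ps/µm and t = 100 ps are borrowed from Section 4.2, and the 500 ps clock period is simply a 2 GHz example.

```python
import math


def interconnect_delay(center_m, center_n, kappa, t_in):
    """d(m, n) = kappa * D(m, n) + t(m, n), with D the Manhattan distance."""
    distance = abs(center_m[0] - center_n[0]) + abs(center_m[1] - center_n[1])
    return kappa * distance + t_in


def min_registers(center_m, center_n, kappa, t_in, clock_period):
    """Lower bound floor(d(m, n) / Tc) on the registers T(e) for edge (m, n)."""
    delay = interconnect_delay(center_m, center_n, kappa, t_in)
    return math.floor(delay / clock_period)


def meets_timing(edges, centers, labels, kappa, t_in, clock_period):
    """True if every edge carries at least its required number of registers."""
    return all(
        labels[i] >= min_registers(centers[m], centers[n], kappa, t_in, clock_period)
        for i, (m, n) in enumerate(edges)
    )


# Hypothetical placement: block centers in micrometres.
centers = {"a": (0, 0), "b": (12000, 4000)}
edges = [("a", "b")]
KAPPA = 1 / 16   # ps per micrometre (16 um/ps, Section 4.2)
T_IN = 100       # ps, input-to-register delay (Section 4.2)
T_CLOCK = 500    # ps, assumed 2 GHz clock
```

With these numbers the Manhattan distance is 16000 µm, so the delay is 1100 ps and the edge needs at least two registers. Raising the clock period lowers every bound, which is why the minimum feasible clock period can serve as a quality metric for a floorplan.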
This technique mimics the wirelength minimization goals of current state-of-the-art floorplanners.

To apply our loop constraints to floorplanning, we adjust the weights on each of the edges in G in order to promote clustering of loops within a single partition, rather than spreading loops across partitions. Moreover, intuitively some loops are more "critical" than others, in the sense that cutting such loops during partitioning may lead to problems when generating the floorplan. We thus wish to avoid cutting edges which:

• Participate in many loops
• Participate in loops which have few registers
• Participate in loops which have few modules

Empirical observation shows that the following formula gives good results in practice when used for the weighting of edge e:

$$w(e) = \sum_{\ell \in L(e)} M(\ell)/N(\ell)$$

where L(e) is the set of all loops in which edge e participates, M(ℓ) is the number of modules in loop ℓ, and N(ℓ) is the number of registers available in loop ℓ.

In our experiments, we generate two different floorplans. First we use the wirelength-driven floorplanning technique (i.e. using unweighted edges) to obtain a floorplan. Then we use the edge-weighted method to obtain a second floorplan. Comparing the two results gives an indication of the improvement possible when using our loop-constraint-driven technique, as a measure of the merit of our method.

3.2 Related Work

Recently, Cong and Lim [8] have demonstrated a method for simultaneous partitioning, floorplanning and retiming. Their algorithm GEO interleaves timing analysis and retiming in a recursive top-down partitioning approach. While our technique shares a common goal, we differ primarily in that we strive to fully decouple retiming from the task of floorplanning. In [8], it is noted that retiming and timing analysis can be expensive, and so its application is limited.
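Returning to the edge weighting of Section 3.1, the formula w(e) = Σ M(ℓ)/N(ℓ) over the loops ℓ containing e can be sketched in Python as follows. This is only an illustration under stated assumptions: the simple loops are taken to be enumerated already and supplied as lists of edge indices, M(ℓ) is taken as the number of edges in a simple loop (equal to the number of modules it visits), N(ℓ) is taken as the sum of the register labels around the loop, and a loop with no registers is given infinite weight. The name `edge_weights` and the zero-register handling do not come from the paper.

```python
import math


def edge_weights(num_edges, loops, registers):
    """Compute w(e) = sum over loops containing e of M(loop) / N(loop).

    num_edges : number of edges in the graph
    loops     : list of simple loops, each a list of edge indices
    registers : list of register counts S(e), one per edge
    """
    weights = [0.0] * num_edges
    for loop in loops:
        modules = len(loop)                        # M(loop), assumed = edge count
        available = sum(registers[e] for e in loop)  # N(loop)
        # Assumption: a loop with no registers is treated as infinitely critical.
        contribution = modules / available if available > 0 else math.inf
        for e in loop:
            weights[e] += contribution
    return weights


# Hypothetical example: ring a -> b -> c -> a (edges 0, 1, 2) plus chord a -> c (edge 3).
loops = [[0, 1, 2], [3, 2]]
registers = [1, 0, 1, 2]
w = edge_weights(4, loops, registers)
```

Here edge 2 lies on both loops and therefore receives the largest weight, so a weighted mincut partitioner would be least inclined to cut it.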
Our work instead attempts to provide a viable floorplan independent of an explicit retiming, thereby avoiding costly retiming and timing analysis within floorplanning.

4 Experiment

4.1 Synthetic Benchmarks

Unfortunately, there are no existing benchmark designs which reflect the large-scale system-on-a-chip designs which we are targeting. To remedy this, we have developed a synthetic benchmark generation technique suitable for such designs. There have been several attempts at generation of realistic synthetic benchmark circuits [9, 10, 11], but all of these methods use particular features which reflect gate-level designs, rather than the macroblock examples we would like. We have thus taken the method of Hutton et al. [10] and modified it to reflect our goals. As in the original method, distributions of various quantitative metrics are extracted from a "seed" design, and these parameters are used to randomly generate new designs which have similar characteristics. Our method differs, though, in that the characteristic distributions we considered to be important were the number of output pins per block, degree of fanout per output, block size, and the size of loops in the design. All but the last characteristic are generated directly from the seed design; in order to generate designs with the correct distribution of loops, we use a ripup and retry technique similar to [10]. We fit a binomial distribution curve to the block size statistics from the seed design so that the block sizes for our generated circuits have a "smoother" distribution over the full range of the possible sizes.

Note that the process for generating synthetic designs does not account for S, the initial retiming labeling. We generate S for the synthetic circuit by generating a slicing structure layout using mincut partitioning [5, 6], then assigning registers to edges in proportion to their corresponding edge lengths in the layout.
This was done to approximate how such a system might be designed in real life; communication overhead between tightly-coupled blocks would tend to be minimized for performance reasons.

4.2 Technology Assumptions

A top-level description of the Alpha 21264 processor was used as the seed design for our experiments in this paper. Although we do not possess an actual Alpha design, high-level architectural descriptions provided information about the basic blocks (functional units, caches, etc.) and their interconnectivity, and a chip micrograph provided information about the relative areas of the blocks. We believe that the interconnectivity between blocks at this level may be similar to the high-level designs we will face in the next decade.

Eight benchmarks were synthesized from the Alpha design, ranging from 24 to 32 macroblocks in the generated designs; the original design had 24 blocks. Areas for the blocks were scaled so that the final designs would fit on a square die approximately 24mm per side. Note that the relationship between distance and delay is dependent on the physical characteristics of the die; here we need to make some rough estimates for future technology. We assume that the linear propagation constant for long optimally-buffered interconnect is 16µm/ps; note that this yields a delay from corner-to-corner on the die of 3000ps (6 clock cycles at 2GHz). We also assume t(m, n) = 100ps as the input-to-register delay, uniformly for all macroblock inputs.

4.3 Results

Table 1 shows the results obtained from the eight benchmark circuits generated as described above. The columns labeled Norm indicate the layout results using a "normal" layout technique which generates a slicing structure using mincut partitioning. The mincut heuristic used here emulates the wirelength minimization goal typical of current state-of-the-art floorplanning tools. The T column indicates the minimum clock period obtained using the normal technique. The wlen column indicates the total length of the macroblock interconnects, estimated using the half-perimeter bounding box model for the nets. The columns labeled Loop indicate the same results using the loop-constrained method described in Section 3.1. Finally, Imp gives the percentage improvement (reduction in clock period or wirelength) from the normal layout scheme to the loop-constrained method.

Note that, for all but Circuit 7, using the loop-constraint heuristics improves the clock period obtained for the design. For this circuit, we found that the critical loop (i.e. the loop which constrains the clock period) is broken early in the slicing tree, and hence overly lengthened, using our heuristics. As well, this critical loop has no extra registers available for retiming, which further constrains the clock period. Currently we are working on additional techniques which can identify potential critical loops and cluster these together during placement to avoid such a situation.

| ckt | Norm T (ps) | Norm wlen (mm) | Loop T (ps) | Loop wlen (mm) | Imp T (%) | Imp wlen (%) |
|-----|-------------|----------------|-------------|----------------|-----------|--------------|
| 1   | 913         | 388            | 609         | 402            | 33.3      | -3.6         |
| 2   | 589         | 544            | 530         | 582            | 10.0      | -7.0         |
| 3   | 584         | 602            | 568         | 633            | 2.7       | -5.1         |
| 4   | 814         | 543            | 675         | 597            | 17.1      | -9.9         |
| 5   | 611         | 398            | 562         | 432            | 8.0       | -8.5         |
| 6   | 657         | 568            | 625         | 623            | 4.9       | -9.7         |
| 7   | 579         | 608            | 768         | 685            | -32.6     | -12.7        |
| 8   | 766         | 807            | 688         | 935            | 10.2      | -15.9        |
| avg | 689         | 557            | 628         | 611            | 8.9       | -9.7         |

Table 1: Experimental Results

5 Discussion

We have presented a theorem which characterizes all feasible retimings of a circuit for strongly-connected graphs. We demonstrated an application of this theorem in the development of heuristics for combining retiming with die-level floorplanning for deep-submicron designs. Our results show an average reduction of 8.9% in the achievable clock period when our loop-constraint heuristics are added to our floorplanner, at the cost of an average 9.7% increase in total wirelength (Table 1).

6 Future Work

Currently, our loop-constraint heuristics are somewhat limited, in that mincut partitioning only indirectly influences the physical distances (i.e. interconnect delay) around loops.
Ideally, we would like to work towards minimization of a metric which more closely corresponds to real physical distances. To this end, we are currently working towards using LANCELOT, a non-linear programming package [12], to derive our final floorplans. Non-linear programming lets us specify more complex constraints which more closely correspond to the constraints we would like to have. However, a general lack of "smoothness" (i.e. differentiability) of these constraints leads to difficulty with this method. We are currently attempting to address these smoothness concerns.

[2] suggests using the set of fundamental loops of the circuit, instead of the set of all loops, as a means of reducing computational requirements when dealing with loop-based algorithms. We cannot use this technique in our existing loop-constraint heuristic, since in this method we need to count the number of loops in which each edge participates. However, this approach should be useful in our non-linear programming formulation, where we may reduce the number of constraints in the program by only considering fundamental loops.

An interesting extension to this work would be to examine the effect of introducing asynchronous interconnects. One example of an asynchronous interconnect in real circuits would be the communication from a memory cache; as data may not be immediately available from a cache, the ability to stall the data transfer is necessary. Asynchronous interconnects, by definition, are able to accept arbitrarily many registers. However, additional registers introduced on such edges may lead to performance degradation. The challenge in utilizing asynchronous interconnects is twofold: first, the tradeoff between performance and register insertion must be quantified for such edges. Second, a suitable floorplanning algorithm which is able to make performance/register count tradeoffs must be developed. We expect to tackle these problems in the future.
References

[1] C. Leiserson and J. Saxe, "Retiming synchronous circuitry," Algorithmica, vol. 6, no. 1, pp. 5–35, 1991.
[2] N. V. Shenoy, K. J. Singh, R. K. Brayton, and A. L. Sangiovanni-Vincentelli, "On the temporal equivalence of sequential circuits," in 29th ACM/IEEE Design Automation Conference, pp. 405–409, 1992.
[3] Semiconductor Industry Association, "National technology roadmap for semiconductors," 1997.
[4] R. H. Otten and R. K. Brayton, "Planning for performance," in 35th ACM/IEEE Design Automation Conference, pp. 122–127, 1998.
[5] M. A. Breuer, "Min-cut placement," Journal of Design Automation and Fault-Tolerant Computing, vol. 1, pp. 343–362, Oct. 1977.
[6] R. Otten, "Layout structures," in IEEE International Large Scale Systems Symposium, pp. 349–353, 1982.
[7] G. Karypis and V. Kumar, "Multilevel k-way hypergraph partitioning," in Design Automation Conference, pp. 343–348, 1999.
[8] J. Cong and S. K. Lim, "Physical planning with retiming," in International Conference on Computer-Aided Design, pp. 2–7, 2000.
[9] J. Darnauer and W.-M. Dai, "A method for generating random circuits and its application to routability measurement," in ACM International Symposium on Field-Programmable Gate Arrays, pp. 66–72, 1996.
[10] M. Hutton, J. Rose, J. Grossman, and D. Corneil, "Characterization and parameterized generation of synthetic combinational benchmark circuits," IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, vol. 17, pp. 985–996, Oct. 1998.
[11] D. Stroobandt, J. Depreitere, and J. V. Campenhout, "Generating new benchmark designs using a multiterminal net model," Integration, The VLSI Journal, vol. 27, pp. 113–129, July 1999.
[12] A. R. Conn, N. Gould, and P. Toint, LANCELOT: A Fortran Package for Large-Scale Nonlinear Optimization (Release A). Springer-Verlag, 1992.

Appendix A

Let G(V, E) be a finite directed multigraph.

Definition (Path) A path in G of length n is an ordered list of edges $(e_1, e_2, \ldots, e_n)$, $e_i \in E$, such that $e_i = (v_i, v_{i+1})$, $1 \le i \le n$, where $v_j \in V$, $1 \le j \le n + 1$.

Definition (Cycle) A cycle in G is a path which begins and ends at the same vertex; that is, $e_n = (v_n, v_1)$ using the notation above.

We assume every edge in E participates in at least one cycle of G. Equivalently, we assume G is strongly-connected.

Definition (Self-Intersecting Cycle) A cycle which contains some edge more than once is said to be self-intersecting.

Let $T : E \to \mathbb{Z}$ be a labeling of the edges of G with integers. We call T the target labeling for reasons which will be apparent below. Let $S : E \to \mathbb{Z}$ be another labeling of the edges of G with integers.

Definition (Edge Satisfaction) We say $e \in E$ is satisfied by S iff $S(e) = T(e)$.

Definition (Path Satisfaction) We say a path P is satisfied by S iff

$$\sum_{e \in P} S(e) = \sum_{e \in P} T(e)$$

Definition (Cycle Satisfaction) We say a cycle C is satisfied by S iff

$$\sum_{e \in C} S(e) = \sum_{e \in C} T(e)$$

Let $S$ and $S'$ be two labelings of the edges of G.

Definition (Compatibility) A cycle C in G is compatible with respect to $S$ and $S'$ iff

$$\sum_{e \in C} S(e) = \sum_{e \in C} S'(e)$$

$S$ and $S'$ are said to be compatible labelings iff all cycles in G are compatible with respect to $S$ and $S'$. Note that cycle satisfaction is a special case of cycle compatibility with $S' = T$.

Theorem 1 Labelings $S$ and $S'$ are compatible iff all non-self-intersecting cycles C in G are compatible with respect to $S$ and $S'$.

Proof One direction ("only if") is trivial. For the other direction ("if"), use induction on the length of a given cycle. Any cycle C of length 2 cannot be self-intersecting, so C must be compatible with respect to $S$ and $S'$. Now consider a cycle C of length n > 2, and suppose all cycles of length less than n are compatible with respect to $S$ and $S'$. If C is non-self-intersecting, again C must be compatible with respect to $S$ and $S'$. If C is self-intersecting, there is an edge $e_i$ which is visited at least twice, so we can write

$$C = (e_1, \ldots, e_{i-1}, e_i, e_{i+1}, \ldots, e_{j-1}, e_j = e_i, e_{j+1}, \ldots, e_n)$$

So we can decompose C into two cycles

$$C_1 = (e_1, \ldots, e_{i-1}, e_i, e_{j+1}, \ldots, e_n)$$
$$C_2 = (e_{i+1}, \ldots, e_{j-1}, e_j = e_i)$$

so that

$$\sum_{e \in C_1} S(e) + \sum_{e \in C_2} S(e) = \sum_{e \in C} S(e)$$
$$\sum_{e \in C_1} S'(e) + \sum_{e \in C_2} S'(e) = \sum_{e \in C} S'(e)$$

But $C_1$ and $C_2$ are cycles of length less than n, so by induction

$$\sum_{e \in C_1} S(e) = \sum_{e \in C_1} S'(e)$$
$$\sum_{e \in C_2} S(e) = \sum_{e \in C_2} S'(e)$$

and so C must be compatible with respect to $S$ and $S'$. □

For any subset $V' \subseteq V$, define

$$\alpha(V') = \{e = (v_1, v_2) \in E : v_1 \notin V', v_2 \in V'\}$$
$$\beta(V') = \{e = (v_1, v_2) \in E : v_1 \in V', v_2 \notin V'\}$$

Definition (Retiming) Take $F(V', x) : S \to S'$ to be the retiming operation, $V' \subseteq V$, $x \in \mathbb{Z}$, and $S$, $S'$ integer labelings of the edges of G, such that

$$S'(e) = \begin{cases} S(e) + x & \text{if } e \in \alpha(V') \\ S(e) - x & \text{if } e \in \beta(V') \\ S(e) & \text{otherwise} \end{cases}$$

Lemma 1 $F(V', x, S)$ is a labeling compatible with S.

Proof Consider any cycle C. $|C \cap \alpha(V')| = |C \cap \beta(V')|$, so

$$\sum_{e \in C} S(e) = \sum_{e \in C} F(V', x, S)(e)$$

□

Note that compatibility of labelings is reflexive and transitive, so if S is compatible with T, then $F(V', x, S)$ is compatible with T as well.

Definition (Sat-Path) Let S be a labeling of the edges of G by integers. A sat-path from $v_1 \in V$ to $v_n \in V$ with respect to S is an ordered list of vertices $(v_1, \ldots, v_n)$ such that, for all $1 \le i < n$, either

• $e_i = (v_i, v_{i+1}) \in E$ and $e_i$ is satisfied by S, or
• $e_i = (v_{i+1}, v_i) \in E$ and $e_i$ is satisfied by S

Definition (Sat-Closure) For any $v \in V$ and integer labeling of the edges S, define $S_c(S, v)$ to be the set $\{v' \in V : \exists$ a sat-path from $v$ to $v'$ with respect to $S\}$.

Let S and T be compatible integer labelings of the edges of G such that S leaves some edge $e = (v_1, v_2)$ unsatisfied. Let $V_e = S_c(S, v_2)$.

Lemma 2 $v_1 \notin V_e$.

Proof Suppose $v_1 \in V_e$. Then there exists a sat-path $P = (v_2, v_3, v_4, \ldots, v_n = v_0, v_1)$. Now consider the construction of a (regular) path $P'$ from $v_2$ to $v_1$ from P by examining every pair of vertices $(v_i, v_{i+1})$ in the order they appear along P. Either:

• $e' = (v_i, v_{i+1}) \in E$ and $e'$ is satisfied by S. In this case we add the edge $e'$ to $P'$. Or:
• $e' = (v_{i+1}, v_i) \in E$ and $e'$ is satisfied by S. Since every edge participates in at least one cycle, there is a cycle C containing $e'$. Since S and T are compatible, C is satisfied by S, and since $e'$ is satisfied as well, then the path $C \setminus \{e'\}$ is also satisfied. Instead of adding $e'$ to $P'$, we add $C \setminus \{e'\}$.

Since $P'$ is constructed by adjoining satisfied paths, $P'$ itself is satisfied. Now adjoining e to $P'$ gives a cycle $C'$, and $C'$ must be satisfied due to the compatibility of S and T. But then $C' \setminus P' = \{e\}$ must be satisfied as well, a contradiction. Thus $v_1 \notin V_e$. □

Lemma 3 If edge $e'$ is satisfied by S, then $e'$ is also satisfied by $F(V_e, x, S)$.

Proof Let $e'$ be satisfied by S. Suppose $e' = (v_1', v_2') \in \alpha(V_e)$. Then $v_1' \notin V_e$. Now $v_2' \in V_e$, and so $v_1' \in V_e$, since $e'$ is satisfied by S. But this is a contradiction, so $e' \notin \alpha(V_e)$. Likewise, $e' \notin \beta(V_e)$. Thus $F(V_e, x, S)(e') = S(e') = T(e')$. □

Theorem 2 Let S and T be compatible integer labelings of the edges of G. Then there exists a finite sequence of successive retiming operations which satisfies all edges (i.e. transforms S to T).

Proof Suppose there exists some unsatisfied edge in G. Then there must be an edge $e = (v_1, v_2)$ such that $S(e) < T(e)$. If this were not the case, then for all $e' \in E$, $S(e') \ge T(e')$, so there must be some cycle $C'$ where

$$\sum_{e' \in C'} S(e') > \sum_{e' \in C'} T(e')$$

which is a contradiction, since S and T are compatible. Let $x = T(e) - S(e)$. From Lemma 2, $v_1 \notin V_e$, so $e \in \alpha(V_e)$. Applying $F(V_e, x, S)$ then gives a labeling $S'$ where e is satisfied, and edges which were satisfied by S are still satisfied by $F(V_e, x, S)$, from Lemma 3. Thus the number of unsatisfied edges strictly decreases with this operation, and we can repeat this process a finite number of times until all edges are satisfied. □
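Because the proof of Theorem 2 is constructive, it can be turned into a short procedure. The Python sketch below is one possible reading of it, under stated assumptions: the graph is given as a list of (source, destination) edge pairs, labelings are lists indexed by edge, and the names `sat_closure` and `retime_to_target` are invented for this example. It repeatedly picks an edge with S(e) < T(e), computes the sat-closure of its head vertex, and applies the retiming operation F to that vertex set. As in the appendix, labelings are arbitrary integers, so intermediate labels may be negative. If the inputs are not compatible, the sketch raises an error instead of looping.

```python
from collections import deque


def sat_closure(edges, S, T, start):
    """Vertices reachable from `start` along satisfied edges, in either direction."""
    neighbours = {}
    for i, (u, v) in enumerate(edges):
        if S[i] == T[i]:
            neighbours.setdefault(u, []).append(v)
            neighbours.setdefault(v, []).append(u)
    closure = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in closure:
                closure.add(nxt)
                queue.append(nxt)
    return closure


def retime_to_target(edges, S, T):
    """Return (operations, final_labels); each operation is (vertex_set, x)."""
    S, T = list(S), list(T)
    operations = []
    while S != T:
        # Pick an unsatisfied edge e = (v1, v2) with S(e) < T(e).
        idx = next((i for i in range(len(edges)) if S[i] < T[i]), None)
        if idx is None:
            raise ValueError("labelings are not compatible")
        v1, v2 = edges[idx]
        Ve = sat_closure(edges, S, T, v2)
        if v1 in Ve:  # would contradict Lemma 2
            raise ValueError("labelings are not compatible")
        x = T[idx] - S[idx]
        # Apply F(Ve, x): +x on edges entering Ve, -x on edges leaving Ve.
        for i, (a, b) in enumerate(edges):
            if a not in Ve and b in Ve:
                S[i] += x
            elif a in Ve and b not in Ve:
                S[i] -= x
        operations.append((frozenset(Ve), x))
    return operations, S


# Hypothetical example: a three-node ring a -> b -> c -> a.
ring = [("a", "b"), ("b", "c"), ("c", "a")]
ops, final = retime_to_target(ring, [3, 0, 0], [1, 1, 1])
```

Each pass satisfies at least one more edge without disturbing the satisfied ones, so the loop runs at most |E| times. An operation on a vertex set is equivalent to applying the single-node retiming operation of Section 2 with the same x to every vertex in the set, since the changes on edges internal to the set cancel.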