
Chapter 3

Architecting Power

This chapter discusses the third of three primary physical characteristics of a
digital design: power. Here we also discuss methods for architectural power
optimization in an FPGA.

Relative to ASICs (application-specific integrated circuits) with comparable
functionality, FPGAs are power-hungry beasts and are typically not well suited to
ultralow-power design techniques. A number of FPGA vendors do offer low-power
CPLDs (complex programmable logic devices), but these are very limited in size
and capability and thus will not always fit an application that requires any
respectable amount of computing power. This chapter discusses techniques to
maximize the power efficiency of both low-power CPLDs and general FPGA designs.
In CMOS technology, dynamic power consumption is related to charging and
discharging parasitic capacitances on gates and metal traces. The general equation
for current dissipation in a capacitor is
$I = V C f$
where I is total current, V is voltage, C is capacitance, and f is frequency.
Thus, to reduce the current drawn, we must reduce one of the three key par-
ameters. In FPGA design, the voltage is usually fixed. This leaves the parameters
C and f to manipulate the current. The capacitance C is directly related to the
number of gates that are toggling at any given time and the lengths of the routes
connecting the gates. The frequency f is directly related to the clock frequency.
All of the power-reduction techniques ultimately aim at reducing one of these two
components.
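
As a rough worked instance of the relation above, the sketch below uses purely illustrative numbers that are assumptions and not taken from any device data sheet: a total toggling capacitance of 2 nF, a 1.2 V supply, and a 100 MHz clock. The second line shows the effect of stopping the clock to half of the toggling logic, and the third uses the standard relation between current and power at a fixed supply voltage.

```latex
\begin{align*}
I  &= V C f = (1.2\,\mathrm{V})(2\,\mathrm{nF})(100\,\mathrm{MHz}) = 0.24\,\mathrm{A} \\
I' &= V \cdot \tfrac{C}{2} \cdot f = (1.2\,\mathrm{V})(1\,\mathrm{nF})(100\,\mathrm{MHz}) = 0.12\,\mathrm{A} \\
P  &= V I = C V^{2} f = (1.2\,\mathrm{V})(0.24\,\mathrm{A}) = 0.288\,\mathrm{W}
\end{align*}
```

With V fixed, the current scales linearly with whichever of C or f is reduced.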
During the course of this chapter, we will discuss the following topics:
. The impact of clock control on dynamic power consumption
. Problems with clock gating
. Managing clock skew on gated clocks
. Input control for power minimization
. Impact of the core voltage supply
. Guidelines for dual-edge triggered flip-flops
. Reducing static power dissipation in terminations

Advanced FPGA Design. By Steve Kilts
Copyright © 2007 John Wiley & Sons, Inc.

Reducing dynamic power dissipation by minimizing the route lengths of high
toggle rate nets requires a background discussion of placement and routing, and is
therefore discussed in Chapter 15 Floorplanning.

3.1 CLOCK CONTROL

The most effective and widely used technique for lowering the dynamic power
dissipation in synchronous digital circuits is to dynamically disable the clock in
specific regions that do not need to be active at particular stages in the data flow.
Since most of the dynamic power consumption in an FPGA is directly related to
the toggling of the system clock, temporarily stopping the clock in inactive
regions of the design is the most straightforward method of minimizing this type
of power consumption. The recommended way to accomplish this is to use either
the clock enable pin on the flip-flop or a global clock mux (in Xilinx
devices this is the BUFGMUX element). If these clock control elements are not
available in a particular technology, designers will sometimes resort to direct
gating of the system clock. Note that this is not recommended for FPGA
designs, and this section describes the issues involved with direct gating of the
system clock.
Clock control resources such as the clock enable flip-flop input or a global clock
mux should be used in place of direct clock gating.

Note that this section assumes the reader is already familiar with general
FPGA clocking guidelines. In general, FPGAs are synchronous devices, and a
number of difficulties arise when multiple domains are introduced through gating
or asynchronous interfaces. For a more in-depth discussion regarding clock
domains, see Chapter 6.
Figure 3.1 illustrates the poor design practice of simple clock gating. With this
clock topology, all flip-flops and corresponding combinatorial logic are active
(toggling) whenever the Main Clock is active. The logic within the dotted box,
however, is only active when Clock Enable = 1. Here, we refer to the Clock
Enable signal as the gating or enable signal. By gating portions of circuitry as
shown in the figure, the designer is attempting to reduce the dynamic power
dissipation in proportion to the amount of logic (capacitance C) and the average
toggle frequency of the corresponding gates (frequency f).
Clock gating is a direct means for reducing dynamic power dissipation but
creates difficulties in implementation and timing analysis.

Figure 3.1 Simple clock gating: poor design practice.

Before we proceed to the implementation details, it is worth noting how
important careful clock planning is in FPGA design. The system clock is central
to all synchronous digital circuits. EDA (electronic design automation) tools are
driven by the system clock to optimize and validate synthesis, layout, static
timing analysis, and so forth. Thus, the system clock or clocks are sacred and
must be characterized up front to drive the implementation process. Clocks are
even more sacred in FPGAs than they are in ASICs, and thus there is less
flexibility for creative clock structures.
When a clock is gated even in the most trivial sense, the new net that drives
the clock pins is considered a new clock domain. This new clock net will require
a low-skew path to all flip-flops in its domain, similar to the system clock from
which it was derived. For the ASIC designer, these low-skew lines can be built in
the custom clock tree, but for the FPGA designer this presents a problem due to
the limited number and fixed layout of the low-skew lines.
A gated clock introduces a new clock domain and will create difficulties for the
FPGA designer.
The following sections address the issues introduced by gated clocks.

3.1.1 Clock Skew

Before directly addressing the issues related to gated clocks, we must first briefly
review the topic of clock skew. The concept of clock skew is a very important
one in sequential logic design.
Figure 3.2 Clock skew.

In Figure 3.2, the propagation delay of the clock signal between the first
flip-flop and the second flip-flop is assumed to be zero. If there is positive delay
through the cloud of combinatorial logic, then timing compliance will be
determined by the clock period relative to the combinatorial delay + logic routing
delay + flip-flop setup time. A signal can only propagate between a single set of
flip-flops for every clock edge. The situation between the second and third
flip-flop stages, however, is different. Because of the delay on the clock line between
the second and third flip-flops, the active clock edge will not occur simultaneously
at both elements. Instead, the active clock edge on the third flip-flop will be
delayed by an amount dC.
If the delay through the logic (defined as dL) is less than the delay on the
clock line (dC), then a situation may occur where a signal that is propagated
through the second flip-flop will arrive at the third stage before the active edge of
the clock. When the active edge of the clock arrives, the same signal could be
propagated through stage 3. Thus, a signal could propagate through both stage 2
and stage 3 on the same clock edge! This scenario will cause a catastrophic
failure of the circuit, and thus clock skew must be taken into account when per-
forming timing analysis. It is also important to note that clock skew is indepen-
dent of clock speed. The “fly-through” issue described above will occur exactly
the same way regardless of the clock frequency.
Mishandling clock skew can cause catastrophic failures in the FPGA.
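
The two timing conditions described in this section can be summarized as inequalities. This is a simplified sketch: the symbols T (clock period), d_comb, d_route, and t_su are names introduced here for illustration, flip-flop clock-to-output delay and hold time are neglected, and dL and dC are as defined in the text above.

```latex
\begin{align*}
\text{setup, no skew (stage 1 to stage 2):} \quad & T \ge d_{\mathrm{comb}} + d_{\mathrm{route}} + t_{\mathrm{su}} \\
\text{no fly-through (stage 2 to stage 3):} \quad & d_{L} > d_{C}
\end{align*}
```

The second condition contains no T, which is why slowing the clock cannot repair a skew-induced failure.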

3.1.2 Managing Skew

Low-skew resources provided on FPGAs ensure that the clock signal will be
matched on all clock inputs as tightly as possible (within picoseconds). Take, for
instance, the scenario where a gate is introduced to the clock network as shown in
Figure 3.3.
The clock line must be removed from the low-skew global resource and
routed to the gating logic, in this case an AND gate. The fundamental problem of
adding skew to the clock line is now the same as it was in the problem described
previously. It is conceivable that the delay through the gate (dG) plus the routing
delays will be greater than the delay through the logic (dL). To handle this poten-
tial problem, the implementation and analysis tools must be given a set of con-
straints such that any timing problems associated with skew through the gating
item are eliminated and then analyzed properly in post-implementation analysis.

Figure 3.3 Clock skew introduced with clock gating: Poor design practice.

As an example, consider the following module that uses clock gating:

// Poor design practice
module clockgating(
  output dataout,
  input  clk, datain,
  input  clockgate1);

  reg  ff0, ff1, ff2;
  wire clk1;

  // clocks are disabled when gate is low
  assign clk1    = clk & clockgate1;
  assign dataout = ff2;

  // ff0 and ff1 run on the ungated system clock
  always @(posedge clk)
    ff0 <= datain;

  always @(posedge clk)
    ff1 <= ff0;

  // ff2 runs on the gated clock (a new clock domain)
  always @(posedge clk1)
    ff2 <= ff1;
endmodule

In the above example, there is no logic between the flip-flops on the data path,
but there is logic in the clock path as shown in Figure 3.4.

Figure 3.4 Clock skew as the dominant delay.



Different tools handle this situation differently. Some tools such as Synplify
will remove the clock gating by default to create a purely synchronous design.
Other tools ignore skew problems if the clocks remain unconstrained but will add
artificial delays once the clocks have been constrained properly.
Unlike in ASIC designs, hold violations in FPGA designs are rare due to the
built-in delays of the logic blocks and routing resources. One thing that can cause a
hold violation, however, is excessive delay on the clock line as shown above. Because
the data propagates in less than 1 ns and the clock in almost 2 ns, the data
will arrive almost 1 ns before the clock and lead to a serious timing violation.
Depending on the synthesis tool, this can sometimes be fixed by adding a clock
constraint. A subsequent analysis may or may not show (depending on the technology)
that artificial routing delay was added to the data path to eliminate the hold violation.
Clock gating can cause hold violations that may or may not be corrected by the
implementation tools.

It is worth reiterating that most vendors have advanced clock buffer
technology that provides enable capability to certain branches of the clock tree. This
type of control is always recommended over clock gating with logic elements.
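
As a minimal sketch of the flip-flop clock enable approach, the clockgating module shown earlier can be rewritten so that the gate signal drives a clock enable rather than the clock net. The module name clockenable is a placeholder introduced here; the ports are kept from the original example. Whether a synthesis tool maps the conditional onto a dedicated clock enable pin depends on the target technology, so this should be read as an illustration and not as a guaranteed implementation.

```verilog
// Sketch: same ports as clockgating, but clockgate1 is used as a
// synchronous clock enable instead of being ANDed with clk.
module clockenable(
  output dataout,
  input  clk, datain,
  input  clockgate1);

  reg ff0, ff1, ff2;

  assign dataout = ff2;

  always @(posedge clk)
    ff0 <= datain;

  always @(posedge clk)
    ff1 <= ff0;

  // ff2 holds its value while clockgate1 is low; all three
  // flip-flops remain on the same low-skew clock net.
  always @(posedge clk)
    if (clockgate1)
      ff2 <= ff1;
endmodule
```

Because every flip-flop stays in one clock domain, there is no added delay in the clock path to create skew. The clock net itself still toggles in this version; the saving comes from ff2 and any logic it drives not switching while the enable is low.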

3.2 INPUT CONTROL

An often overlooked power-reduction technique is the control of input slew rates.
CMOS input buffers can create excessive current draw under conditions where both
the high-side and low-side transistors are conducting at the same time. To
conceptualize this, consider a basic first-order model of a CMOS transistor that
describes Ids in terms of Vds as illustrated in Figure 3.5, where the regions are
defined by:
Cutoff: $V_{gs} < V_{th}$
Linear (resistive): $0 < V_{ds} < V_{gs} - V_{th}$
Saturation: $0 < V_{gs} - V_{th} < V_{ds}$
where Vgs is the gate-to-source voltage, Vth is the device threshold voltage, and
Vds is the drain-to-source voltage.
An ideal switching scheme would be one where the transistor on one side of a gate
switched from cutoff to the linear region instantaneously, and the complementary
transistor switched in the opposite direction at the same instant. If one of the
two complements is always in cutoff, current never flows through both sides of the
logic gate at the same time, and so there is never a resistive path between power
and ground. For an inverter, this would mean that the input would transition from
0 to VDD (positive power rail), taking the NMOS (N-channel MOSFET) device from
cutoff to the linear region instantly, and the PMOS (P-channel MOSFET) would
transition from the linear region to cutoff at the same instant. In the opposite
transition, when the input (the NMOS Vgs) goes from VDD to 0, the NMOS would
move from the linear region to cutoff instantly, and the PMOS would move from the
cutoff region to the linear region at the same instant.

Figure 3.5 Simple I/V curve for a CMOS transistor.

In a real system, however, we must take into consideration the transition
times and the behavior of the transistors during those transitions. For instance,
consider a CMOS inverter that has an input of 0 V and an output of VDD. As
the input transitions from 0 to VDD (a 0 to 1 transition), the NMOS transistor
leaves the cutoff region as soon as the input passes the threshold Vth and enters
the saturation region. The PMOS device is still in the linear region during
the early part of this transition, and so current begins to flow between VDD
and ground. As the input rises, the output falls. When the drain voltage of the
NMOS falls more than a threshold voltage below the gate voltage
(Vds < Vgs - Vth), the NMOS transitions into the linear region, and the PMOS
transitions to saturation and then to cutoff. To minimize the power dissipation,
it is desirable to minimize the time spent in the saturation region; that is,
minimize the time during which the gate inputs are transitioning.
To minimize the power dissipation of input devices, minimize the rise and fall
times of the signals that drive the input.

Another important conclusion can be drawn from the region definitions above. If the
driving signal is not within a threshold voltage of 0 or VDD in steady state (i.e.,
when the gate is not switching), the transistor that should be in cutoff will
instead enter the saturation region and conduct a small but continuous current.
This can be a problem in systems where smaller signal swings are used to drive
inputs that are powered by a higher voltage.
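
To make the underdriven case concrete, consider an input buffer powered from VDD whose input is driven to a logic-high level V_in that is below VDD. The PMOS source sits at VDD, so its source-to-gate voltage is VDD - V_in, and the device stays in cutoff only while that value is below the magnitude of its threshold. The numbers below (a 3.3 V buffer, a 2.5 V driving signal, and a 0.7 V threshold) are assumptions chosen for illustration only.

```latex
\begin{align*}
V_{sg,\mathrm{PMOS}} &= V_{DD} - V_{in} = 3.3\,\mathrm{V} - 2.5\,\mathrm{V} = 0.8\,\mathrm{V} \\
V_{sg,\mathrm{PMOS}} &> |V_{th}| = 0.7\,\mathrm{V}
  \quad\Rightarrow\quad \text{the PMOS is not in cutoff} \\
V_{gs,\mathrm{NMOS}} &= V_{in} = 2.5\,\mathrm{V} > V_{th}
  \quad\Rightarrow\quad \text{the NMOS is conducting}
\end{align*}
```

Under these assumed values both transistors conduct in steady state, so the buffer draws current even when nothing is switching.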
In harmony with the principle described above, a floating input may be an
even worse problem than an underdriven input. A floating input is by definition
an underdriven input, but because it is floating there is no way to know how
underdriven it is. It may be that the input has settled at a metastable point where
both transistors are in the saturation region. This would have disastrous impli-
cations relative to power dissipation. Worse yet, this would not be a repeatable
problem. Because most FPGA devices have resistive terminations available for
unused inputs, it is good design practice to define a logic state for these and avoid
the unpredictable effects of floating inputs.
Always terminate unused input buffers. Never let an FPGA input buffer float.

3.3 REDUCING THE VOLTAGE SUPPLY

Although reducing the supply voltage is usually not a desirable option, it is worth
mentioning due to the dramatic effect it can have on power consumption. Power
dissipation in a simple resistor will drop off with the square of the voltage. Thus,
significant power savings can be achieved by lowering the power supply voltage
of the FPGA near the minimum required voltage. It is important to note, however,
that lowering the voltage will also decrease the performance of the system. If this
method is used, ensure that the timing analysis takes into consideration the lowest
possible voltage on the supply rail for worst-case maximum timing.
Dynamic power dissipation drops off with the square of the core voltage, but
reducing voltage will have a negative impact on performance.
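
The square-law dependence follows from multiplying the current relation given at the start of the chapter by the supply voltage. As a small worked instance, the sketch below assumes the core is run 5% below its nominal voltage with C and f unchanged; the 5% figure is an illustrative assumption and not a recommendation for any particular device.

```latex
\begin{align*}
P &= V I = C V^{2} f \\
\frac{P_{\mathrm{reduced}}}{P_{\mathrm{nominal}}}
  &= \left(\frac{0.95\,V_{\mathrm{nom}}}{V_{\mathrm{nom}}}\right)^{2} = 0.9025
\end{align*}
```

Under this assumption, a 5% reduction in core voltage gives a reduction of just under 10% in dynamic power.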
Because the core voltage on an FPGA will typically be rated only to within 5% to 10% of
the specified value, great care must be given to this from a system perspective.
Typically, power issues can be addressed with other techniques while keeping the
core voltage well within the specified range.

3.4 DUAL-EDGE TRIGGERED FLIP-FLOPS

Because power dissipation is proportional to the frequency at which a
signal toggles, it is desirable to maximize the amount of functionality for each
toggle of a high fan-out net. Most likely, the highest fan-out net is the system
clock, and thus any techniques to reduce the frequency of this clock would have a
dramatic impact on dynamic power consumption. Dual-edge triggered flip-flops
provide a mechanism to propagate data on both edges of the clock instead
of just one. This allows the designer to run a clock at half the frequency that
would otherwise be required to achieve a certain level of functionality and
performance.
Coding a dual-edge triggered flip-flop is very straightforward. The following
example illustrates this with a simple shift register. Note that the input signal is
captured on the rising edge of the clock and is then passed to dual-edge flip-flops.

module dualedge(
  output reg dataout,
  input      clk, datain);

  reg ff0, ff1;

  // the input is captured on the rising edge only
  always @(posedge clk)
    ff0 <= datain;

  // the remaining stages shift on both clock edges
  always @(posedge clk or negedge clk) begin
    ff1     <= ff0;
    dataout <= ff1;
  end
endmodule

Note that if dual-edge flip-flops are not available, redundant flip-flops and
gating will be added to emulate the appropriate functionality. This could comple-
tely defeat the purpose of using the dual-edge strategy and should be analyzed
appropriately after implementation. A good synthesis tool will at least flag a
warning if no dual-edge devices are available.
Dual-edge triggered flip-flops should only be used if they are provided as primi-
tive elements.
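
For a sense of what such an emulation can look like, the sketch below builds a single dual-edge register stage from a rising-edge flip-flop, a falling-edge flip-flop, and an output mux selected by the clock. This is one commonly described construction, offered here as an assumption; it is not necessarily what any particular synthesis tool generates. The module name dualedge_emulated and its signal names are hypothetical.

```verilog
// Sketch: one dual-edge register stage emulated with two
// single-edge flip-flops and a clock-selected output mux.
module dualedge_emulated(
  output wire q,
  input  wire clk,
  input  wire d);

  reg q_pos, q_neg;

  // captures d on the rising edge
  always @(posedge clk)
    q_pos <= d;

  // captures d on the falling edge
  always @(negedge clk)
    q_neg <= d;

  // while clk is high the most recent capture is q_pos;
  // while clk is low the most recent capture is q_neg
  assign q = clk ? q_pos : q_neg;
endmodule
```

Each emulated stage costs two flip-flops plus a mux and uses the clock as a data-path select, which is the kind of overhead that can cancel the power benefit of halving the clock frequency.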
The Xilinx CoolRunner-II family includes a feature named CoolClock, which
divides the incoming clock by 2 and then switches the flip-flops to dual-edge
devices as described above. From an external perspective, the device behaves the
same as a single-edge triggered system but with half of the dynamic power dissi-
pation on the global clock lines.

3.5 MODIFYING TERMINATIONS

Resistive loads connected to output pins are common in systems with bus
signals, open-drain outputs, or transmission lines requiring termination. In all of
these cases, one of the CMOS transistors on the output driver of the FPGA
will need to source or sink current through one of these resistive loads. For
outputs requiring pull-up resistors, calculate the minimum acceptable rise-time
to size the resistor as large as possible. If there are high side drivers as well as
low side drivers, ensure there is never a condition where bus contention occurs
as this will draw excessive currents even if for only a few nanoseconds at
a time. For transmission lines with shunt termination at the load, a series
termination may be used as an alternative depending on the requirements of the
system. As can be seen in Figure 3.6, there is no steady-state current dissipation
with a series termination.
There is no steady-state current dissipation with a series termination.

The disadvantages are
. An initial reflection from the load to the terminating resistor
. A small amount of attenuation through the series resistor during a transition

Figure 3.6 Termination types.



If these performance characteristics are acceptable for a given system, the
series termination approach will eliminate static power dissipation through the
termination resistor.
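
Two quick calculations illustrate the trade-offs in this section. Both use assumed values that do not come from the text: a 3.3 V output swing, a 50 ohm shunt termination to ground, a 50 pF bus capacitance, and a 100 ns rise-time budget. Driver output resistance is neglected, and the pull-up is treated as a first-order RC circuit, for which the 10% to 90% rise time is approximately 2.2RC.

```latex
\begin{align*}
\text{shunt termination, output held high:} \quad
  & P = \frac{V^{2}}{R} = \frac{(3.3\,\mathrm{V})^{2}}{50\,\Omega} \approx 218\,\mathrm{mW} \\
\text{largest pull-up meeting the rise time:} \quad
  & R_{\max} = \frac{t_{r}}{2.2\,C} = \frac{100\,\mathrm{ns}}{2.2 \times 50\,\mathrm{pF}} \approx 909\,\Omega \\
\text{pull-up, output held low:} \quad
  & P = \frac{V^{2}}{R_{\max}} = \frac{(3.3\,\mathrm{V})^{2}}{909\,\Omega} \approx 12\,\mathrm{mW}
\end{align*}
```

Under these assumptions, each shunt-terminated line dissipates a few hundred milliwatts whenever it is held high, whereas a series termination dissipates nothing in steady state.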

3.6 SUMMARY OF KEY POINTS

. Clock control resources such as the clock enable flip-flop input or a global
clock mux should be used in place of direct clock gating when they are
available.
. Clock gating is a direct means for reducing dynamic power dissipation but
creates difficulties in implementation and timing analysis.
. Mishandling clock skew can cause catastrophic failures in the FPGA.
. Clock gating can cause hold violations that may or may not be corrected by
the implementation tools.
. To minimize the power dissipation of input devices, minimize the rise and
fall times of the signals that drive the input.
. Always terminate unused input buffers. Never let an FPGA input buffer float.
. Dynamic power dissipation drops off with the square of the core voltage,
but reducing voltage will have a negative impact on performance.
. Dual-edge triggered flip-flops should only be used if they are provided as
primitive elements.
. There is no steady-state current dissipation with a series termination.
