Verilog Chapter1 Eng
This chapter introduces naming conventions and synchronous design issues that
should be kept in mind during the design process, as well as considerations and
cautions relating to asynchronous design, clocks, and hierarchical design.
1.1. Naming Conventions
recommend 2
[2] Only alphanumeric characters and the underscore ’_’ should be used, and the first character should be a letter of the alphabet (mandatory)
Basic Design Constraints
[5] Do not distinguish names by using upper or lower case English letters (Abc, abc) (mandatory)
[6] Do not use an ’_’ (underscore) at the end of the primary port name or module name, and do not use ’_’ consecutively (recommend 1)
[7] Add an identifying symbol at the end of the name so the polarity of negative logic signals is clearly identified (”_X”, ”_N”, for example) (recommend 2)
[8] Instance names should basically be the module names. Instance names that are used more than once should be ”<module name>_<quantity>” (recommend 3)
[9] At the top level, module names and port names should consist of 16 or fewer characters and should not be distinguished by upper or lower case alphabet letters (recommend 1)
[10] Do not use the same instance name or cell name as the ASIC library being used (mandatory)
Explanation
Chapter 1
in, inertial, is, label, library, linkage, literal,
loop, map, mod, new, next, null, of, on,
open, others, out, package, port, postponed, procedure,
process, pure, range, record, register, reject, rem,
report, return, rol, ror, select, severity, signal,
Identifiers such as module names, instance names, signal names and port names must make the function of the HDL description easy to understand. Choose names that make debugging more efficient: if designers follow different naming conventions, circuits that were divided among multiple designers become difficult to understand once they are integrated. This is why consistent naming conventions are important. Further, if design tools are used to automatically analyze or modify the circuit structure, a uniform naming convention makes the scripts easier to write.
In principle, a single file should contain a single module. For a large design, however, this produces so many files that they become difficult to handle. In that case, multiple modules can be included in one file, provided that they form a tree hierarchical structure; modules which have no relation to one another should not be included in the same file. The top module name and the file name should be the same.
Since the keywords defined in the Verilog-HDL language specification are in lower case, designer-defined identifiers such as I/O signal names are easy to distinguish within the code if they are written in upper case letters. Because Verilog-HDL is case-sensitive, it is possible to define an upper case identifier with the same spelling as a keyword (INPUT, TASK for example), but this is not recommended because it could lead to confusion. In addition to Verilog-HDL keywords, software and VHDL keywords that are used at later stages should be avoided as well. [3]
The use of EDIF, SDF and Windows keywords will not cause problems. Nevertheless, it is
a better practice not to use these keywords. If need be, refer to the following section to
help avoid any possible confusion.
CON, AUX, COM1, COM2, COM3, LPT1, LPT2, PRN, NUL
In VHDL there is a convention that the final character must not be ’_’. This character is sometimes used in gate level verification by VITAL, so please refrain from using ’_’ at the end of top level module names and port names. In addition, using ’_’ consecutively is also forbidden. [6]
Add an identifying symbol (”_X”, ”_N”, for example) as a suffix at the end of negative logic signals. [7] If the identifier is added as a prefix at the top of a signal name, like ”X_”, it becomes difficult to distinguish it from hierarchy identifiers or function-delimiting identifiers. Therefore, to identify negative logic signals, delimit with ”_” at the end followed by an identifying character such as ’X’ or ’N’. If a clock system identifier is added at the end, the negative logic identifier can be placed just before it. Describe the identifiers used in the document. As a rule, the same name should be used for the module name and the file. This makes it easy to generate scripts when using a simulator or a logic synthesis tool.
Instance names should be based on the module name, and should be ”<module name>_<quantity>” if multiple instances exist. [8] If an instance name that does not conform to this naming convention (including all user-defined rules) is used, explicitly describe it in the document.
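As a minimal sketch of rules [7] and [8], the following uses a hypothetical module named SYNCFF (the module, port and net names are illustrative assumptions, not taken from this guide). The active-low reset carries the ”_X” suffix, and the two instances are named after the module with a number appended.

```verilog
// Hypothetical leaf module; all names are illustrative only.
module SYNCFF ( CLK, RST_X, D, Q );
    input  CLK;
    input  RST_X;   // negative logic reset, marked by the "_X" suffix
    input  D;
    output Q;
    reg    Q;

    always @( posedge CLK or negedge RST_X )
        if ( !RST_X )
            Q <= 1'b0;
        else
            Q <= D;
endmodule

// The same module used twice: instance name = <module name>_<quantity>
module SYNC2 ( CLK, RST_X, DIN, DOUT );
    input  CLK, RST_X, DIN;
    output DOUT;
    wire   MID;

    SYNCFF SYNCFF_1 ( .CLK(CLK), .RST_X(RST_X), .D(DIN), .Q(MID)  );
    SYNCFF SYNCFF_2 ( .CLK(CLK), .RST_X(RST_X), .D(MID), .Q(DOUT) );
endmodule
```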
In order to be used as an IP core, module names in the top level must use only alphabetical letters or numbers and be 16 or fewer characters in length. Port names should be 16 or fewer characters in length, and must use alphabetical letters, numbers, or ’_’. Names beginning or ending with ’_’ and names that use ’_’ consecutively are prohibited. To support systems that are not case sensitive, names should use only upper case or only lower case; cases should not be mixed. [9]
Refer to the following sections for rules on file naming, signal naming, pin naming, module naming, etc.
"1.1.2. Naming conventions of circuit and port names should be considered by the hierarchy"
"1.1.3. Give meaningful names for signals"
"3.1.3. Standardize description order of module I/O ports"
"3.5.2. File suffix names"
Naming conventions are explained in RMM:5.2.1. There is no limitation for the number of
characters in RMM and negative logic has _n at the end.
1.1.2. Naming conventions of circuit and port names should be considered by the hierarchy
[4] The first string of an output port name for each block should be “<hierarchy identification character>” + “_” (recommend 3)
[5] Naming conventions for input port names and output port names for each block should be different from those for internal signal names (reference)
[6] Output port names and the connected net names should be the same (recommend 2)
- Upper level net names and the input port names should be the same (recommend 2)
Example Code
For each design unit (IP), define a block name (max. 16 characters)
Output port names of each block should start with “<block name’s hierarchy id>” + “_” [4]
Explanation
Module names and instance names should be between 2 and 32 characters in length. Some ASIC vendors have a limitation of up to 32 characters, and logic synthesis tools will change module or instance names if they exceed 32 characters. Instance names should not be long, because readability decreases when checking a signal value in the third, fourth or lower layers of the hierarchy with a simulator, or when checking a timing analysis report. Instance names of 16 or fewer characters are recommended.
long. Therefore, it is recommended that an instance name including the module hierarchy be 128 or fewer characters.
A description can be made more readable and debugging efficiency improved by naming
The hierarchy identification character is the first letter of the top (root) level instance name. [2] Also, the hierarchy identification character of the top level is added to the beginning of the instance names in the level under it. [3] Example 1-2 illustrates the use of a unique letter (g, l, c, v, r) as the hierarchy identification character for the first character of instance names placed at the top level. There are five instances in this case. Each sub-block name under the geomet block begins with an identification string (gt, gm, gr, gf, gn, gc) that includes the geomet level identification character (g). By adding a hierarchy identification character to the beginning of each hierarchy name in this manner, it becomes possible to prevent the exact same name from appearing. The number of identification characters would become too large if the levels under each sub-block kept inheriting all the hierarchy identification characters of their upper levels.
(Example : geomet(g) -> gtex(gt) -> gt1port(gt1) -> gt1count(gt1c))
It is not essential that all upper level hierarchy identification characters be included in
the hierarchy identification characters of a sub-block, but it is necessary to use at least
the upper hierarchy identification characters at the beginning as well as be unique within
designs.
Using the hierarchy identification characters for each level of hierarchy in the output port names facilitates debugging. [4] The initial character of the block should be added at the beginning. This makes it easier to understand which block a signal is output from when debugging. Defining module names, signal names, and instance names that conform to the naming conventions makes it easier to examine a circuit during debugging and improves readability of the code.
Be sure to follow these naming conventions during the entire design process.
Figure 1-1 Naming of module names and signal names in a hierarchical structure (blocks TIM, TREG and TRADD; signal TRA_SIG, or T_SIG for some cases). Add one character that expresses the block at the beginning of each signal.
Figure 1-1 illustrates the naming conventions for module names and signal names in
circuits that have a hierarchical structure. The initial character in the string of the sub-
blocks under the upper level must be a unique character within the same level. Module
names under the sub-blocks add to their beginning one character that indicates a module
name in a top level, making it easier to understand the hierarchical structure by the
module name alone.
Use the signal name of the output port for signal names throughout the hierarchical structure. This lets you know from which module the signal is generated. A general rule is illustrated in Figure 1-1: when an output port of a lower level becomes an output of an upper level, the lower level output port name is used as is.
It would be useful to have a different naming convention for the input/output signal names of a block. [5] One example is to use only upper case characters for the block’s I/O signal names and lower case for the internal signal names. This simplifies distinguishing between external and internal signal names. However, these I/O signal names are often converted to all upper case or all lower case after synthesis and layout using an ASIC vendor’s tools. Therefore, some suggest that either all upper case or all lower case be used.
However, it is not easy to read names if all upper or lower case characters are used. As
long as you follow the rule described in item 5 of “1.1.1. Basic naming conventions”: “Do
not distinguish names by using upper or lower case English letters”, it will help improve
readability when using both upper case and lower case characters.
It is best that the net name of the upper level to which an output signal name is connected
and the input signal name are the same. [6] The output signal name is used for an input
signal name even when the output signal is input to multiple blocks. This is because
output signals are more important for debugging purposes.
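The port and net naming rules [4] and [6] can be sketched as follows. The blocks geomet (hierarchy identification character ”g”) and light (”l”) and all signal names are hypothetical, chosen only for illustration.

```verilog
// Hypothetical block "geomet", hierarchy identification character "g".
module geomet ( CLK, RST_X, g_valid );
    input  CLK, RST_X;
    output g_valid;          // output port: "<hierarchy id>" + "_" + name
    reg    g_valid;

    always @( posedge CLK or negedge RST_X )
        if ( !RST_X )
            g_valid <= 1'b0;
        else
            g_valid <= ~g_valid;
endmodule

// Hypothetical block "light", hierarchy identification character "l".
module light ( CLK, RST_X, g_valid, l_done );
    input  CLK, RST_X;
    input  g_valid;          // input port reuses the name of the driving output
    output l_done;
    reg    l_done;

    always @( posedge CLK or negedge RST_X )
        if ( !RST_X )
            l_done <= 1'b0;
        else
            l_done <= g_valid;
endmodule

module top ( CLK, RST_X, l_done );
    input  CLK, RST_X;
    output l_done;           // lower level output name used as is
    wire   g_valid;          // net name = output port name of geomet

    geomet geomet ( .CLK(CLK), .RST_X(RST_X), .g_valid(g_valid) );
    light  light  ( .CLK(CLK), .RST_X(RST_X), .g_valid(g_valid), .l_done(l_done) );
endmodule
```

Because the net g_valid has the same name at the driver, in the upper level and at the receiver, a waveform or timing report entry can be traced back to the geomet block from the name alone.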
During the initial stage of a collaborative design by a large number of designers, the output port names of the blocks are sometimes unknown. In such a case, an input signal name cannot follow the rules. At the very least, the output port name and net name should be the same. Top level output ports may lead to confusion if a hierarchy name is included, so hierarchy identification characters should not be added. This rule does not apply to modules used more than once, such as a design library, or to small sub-blocks.
For a large scale design or block (200K to 1,000K gates equivalent), avoid using multiple
instances of a single module (10K to 20K gate equivalent). The reason for this is when
logic synthesis is applied, the contents of each instance will be optimized differently and
module names will also be altered. It is confusing to have a module name which is differ-
If a signal name is used more than once, the upper net name should basically consist of
”<hierarchy identification character>”+ ”<number added to instance name>”+”_”+”<output
signal name (without hierarchy identification character)>”.
Example 1-3 Upper level net name when using multiple instances
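The original example is not reproduced here. As a hedged sketch of the rule above, assume a hypothetical block gtex (hierarchy identification characters ”gt”) with an output port gt_data that is instantiated twice inside geomet; the upper net names then become gt1_data and gt2_data. All names are assumptions for illustration.

```verilog
// Hypothetical block "gtex" (hierarchy id "gt") with output port gt_data.
module gtex ( CLK, DIN, gt_data );
    input  CLK, DIN;
    output gt_data;
    reg    gt_data;

    always @( posedge CLK )
        gt_data <= DIN;
endmodule

module geomet ( CLK, DIN_A, DIN_B, g_or );
    input  CLK, DIN_A, DIN_B;
    output g_or;

    // upper net name =
    //   "<hierarchy id>" + "<number added to instance name>" + "_"
    //   + "<output signal name without hierarchy id>"
    wire   gt1_data;
    wire   gt2_data;

    gtex gtex_1 ( .CLK(CLK), .DIN(DIN_A), .gt_data(gt1_data) );
    gtex gtex_2 ( .CLK(CLK), .DIN(DIN_B), .gt_data(gt2_data) );

    assign g_or = gt1_data | gt2_data;
endmodule
```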
With regard to bi-directional bus signals, the upper net name and the lower input/output port name should be the same, and hierarchy identification characters should not be added because multiple outputs exist.
RTqualify checks all the items of 1.1.2. All naming conventions can be defined by regular
expression. However, checks are based on limited specifications for 1122, 1123 and 1124.
to which it is connected.
1126b(W2) Net name "<net_name>" of higher net does not match input port name
"<port_name>".
1126 will not be checked if the same module is used two or more times.
With RTqualify, all the numeric values and naming_style can be modified in the setting file.
Naming conventions are explained in RMM:5.2.1. RMM recommends using lower case characters for all names.
[1] Naming conventions for internal signal names of blocks should be different from those for input and output ports (reference)
[2] Give meaningful and comprehensive names for internal signal names of hierarchy (reference)
[3] Signal names, port names, parameter names, `define names and
Example Code
_X, _N should be added at the end of signal names when using negative logic
if ( !RESET )                  if ( !RESET_X )
    Q <= 1'b0;                     Q <= 1'b0;
if ( PLUS_or_MINUS )           if ( PLUS )
    Q <= Q + 1'b1;                 Q <= Q + 1'b1;
else                           else
    Q <= Q - 1'b1;                 Q <= Q - 1'b1;
Internal signals
DataMemWrite - Memory write signal
DataMemAdr[31:0] - Memory address
pos_GT_cos - Comparison result
Counter_Load_Val - Load signal to register
Hsync_conter_clear - Reset signal of counter
Parameter names
parameter P_Slot_length = 4'd10;
parameter P_Timegrad_length = 3'd3;
parameter P_StvRiseStartPoint= 0;
parameter P_StvFallStartPoint= 8'd111;
If “1.1.1.[5] Do not distinguish names by using upper or lower case English letters” is followed strictly, only upper case or only lower case letters may be used for all signal names, port names and module names. However, readability decreases if all of those names are entirely upper case or entirely lower case. Item 1.1.1.[5] can currently be checked by the RTL check tool. It is preferable to mix upper and lower case letters and rely on this kind of tool for the check. A meaningful name is preferred for increased readability over a short
Naming conventions should be specified for port names, internal signal names and parameter names so that they can be distinguished from each other. [1] For example, all port names are in upper case letters and only internal signal names are based on lower case letters. Other examples are adding a hierarchy identification character to port names, or writing only parameter names in upper case letters. In any case, choose naming conventions that are easy to understand and unified.
The number of characters in signal names, port names, parameter names, define names and function names should be between 2 and 40. [3] A tool used at a later stage might convert a signal name which is too long. Although a long signal name is more understandable than a short one, an overly long name becomes unreadable. Therefore, as a basis, signal names should be up to 24 characters in length.
RTqualify cannot judge whether a name is meaningful or not. 1.1.3.[3] checks the number of characters for each signal name, port name, parameter name, define name and function name separately.
Naming conventions are explained in RMM:5.2.1. RMM recommends using lower case characters for all names.
1.1.4. Naming conventions of include file, parameter and `define (different from VHDL)
[1] Use either ”.h”, ”.vh” or ”.inc” for RTL description and ”.h”, ”.inc”, ”.ht” or ”.tsk” for test benches as the include file (Verilog only) (recommend 2)
[2] Parameter names should have a different naming convention (recommend 3)
[3] Do not use parameters with the same name for different modules (recommend 3)
[6] Fixed values should not be connected directly to output ports (recommend 1)
- Fixed values should not be connected to input ports (reference)
[7] Parameterize the bit width of ports required for circuits that will be reused (recommend 3)
[8] Clarify <value> ’b, ’h, ’d, ’o specification for parameters (Verilog only) (recommend 1)
[9] Specify the bit width if it is greater than 32 bits (Verilog only) (mandatory)
Example Code
parameter P_Geomet_Datalength = 8;
parameter P_Geomet_PxDefalut = 10'd323;
Explanation
Whenever possible, put data to be used as parameters into include files, thus making it easy to change parameter values. Distinguish parameters used for the overall design from parameters used only under particular hierarchies, [3] and place each kind into a separate include file. For Verilog-HDL descriptions, use a relative path name for include files. Even if there is an include file in the same directory, refer to it through the next higher level, as in `include "../RTL/compara.v". This should be done to prevent trouble from occurring when executing an EDA tool in another directory.
You should add the special identification character ”P_” in front of a parameter name to distinguish it from other signal names. [2] There are two methods to declare a parameter: using a parameter statement and using `define. `define is regarded as a compiler directive and therefore becomes available inside modules other than the one in which the `define is written.
In RTL descriptions, a `define at global position and a `define made in another module should not be used. When generating a logic circuit with a logic synthesis tool, each module may be generated separately. In the case of using `define as described above, it becomes impossible to generate a logic circuit. Of course, it is acceptable to call a file that contains the `define with an include statement. There are cases when `define may be used outside the module inadvertently, so beware when using `define. It is advised that these be reviewed with an RTL check tool.
It is recommended that parameters which are used in an overall design be defined with a parameter statement in the include files which are called by each module. To avoid unnecessary confusion, it is recommended not to use `define as a parameter in the overall design.
A fixed value should not be connected directly to input/output ports. [6] After applying synthesis optimization from the upper level hierarchy, ports that are directly connected to a fixed value may become unconnected ports. This situation may cause problems during logic equivalency checks. In cases where the upper level hierarchy's ports are connected to fixed values, there should be no problem. However, after applying synthesis optimization, some redundant logic might remain. This will increase gate count and should be considered carefully.
In a parameter, describe ’b, ’h, ’d and ’o clearly when defining any numeric value greater
than 8. In particular, when a value greater than 10 is specified, there is a possibility that
a designer may mistake it for a hexadecimal number. For example, 12 is not ’h12.
Specify the bit width in a parameter as much as possible. However, since one parameter value may be assigned to multiple signals with different bit widths, it is not necessary to indicate it for all. [8] Please note that parameters with no bit width specified have a bit width of 32. Specify the bit width when declaring parameters greater than 32 bits. [9]
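The parameter rules above can be sketched in one small module. The module name PCLIP, its ports and the parameter names other than P_Geomet_Datalength are hypothetical; the sketch only shows the ”P_” prefix, an explicit base, an explicit width above 32 bits, and a parameterized port width.

```verilog
// Hypothetical module; names are illustrative assumptions.
module PCLIP ( CLK, DIN, DOUT );
    // "P_" prefix distinguishes parameters from signal names
    parameter P_Geomet_Datalength = 8;                 // port width, reused below
    parameter P_Geomet_Limit      = 8'd12;             // decimal 12
    parameter P_Geomet_LimitHex   = 8'h12;             // hexadecimal 12 = decimal 18 (shown for contrast, not used)
    parameter P_Geomet_Mask       = 40'hFF_0000_0000;  // wider than 32 bits, so the width is stated (not used)

    input                            CLK;
    input  [P_Geomet_Datalength-1:0] DIN;
    output [P_Geomet_Datalength-1:0] DOUT;
    reg    [P_Geomet_Datalength-1:0] DOUT;

    // Clip the input at the limit value
    always @( posedge CLK )
        if ( DIN < P_Geomet_Limit )
            DOUT <= DIN;
        else
            DOUT <= P_Geomet_Limit;
endmodule
```

Writing 8'd12 and 8'h12 side by side makes the difference between decimal 12 and hexadecimal 12 (decimal 18) visible at the point of definition.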
When using constants, use a parameter as much as possible so that checking and modification may be done easily. However, readability will decrease if all constants are parameterized, such as 0, 1, a numeric value for which all the bits are 1, and the clauses in case statements. In particular, parameterizing all the clauses in a case statement loses the bit image (except for a state machine description) and therefore quality may be decreased. For this type of clause, a constant value should be described as is, except when describing complete parameterization.
1143 will be supported in the next version.
1144a(E) Different values are defined by `define in multiple places.
1144b(W1) Character string defined by `define used in another module.
1145 (N) Layer ID character not used in parameter name.
[1] Give register output signal names that suggest the clock system or register (recommend 3)
[2] Basically, use “CLK” or “CK” for clock signal names, “RST_X” or “RESET_X” for reset signal names and “EN” for enable signal names. Add identifiers to the end of these basic names. (recommend 3)
Example Code
Explanation
In order to improve the readability of a description, the output signals of a register inference description can be given a name based on the clock system, or a name which explicitly identifies the signal as a register output.
First, decide the basic signal name for the clock signal, reset signal and enable signal. Then add an identifier to the end of the basic signal name when more than one signal of the same kind exists.
It is recommended to use the basic signal names “CLK” or “CK” for a clock signal, “RST_X” or “RESET_X” for a reset signal and “EN” for an enable signal.
For example, if multiple clocks exist, add one to three characters to the end of “CLK” or “CK”, like “CLK1”, “CLKM” or “CLK_CPU”.
Names which suggest a clock system can be given by adding the name of the clock signal source to the end of the signal name (ex. ”_CK5”).
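A minimal sketch of these names in use, assuming a hypothetical 4-bit counter clocked by a CPU clock (the module name CNT4 and the signal names are illustrative assumptions):

```verilog
// Hypothetical counter; names are illustrative only.
module CNT4 ( CLK_CPU, RST_X, EN_CNT, COUNT_CPU );
    input        CLK_CPU;    // basic name "CLK" + identifier "_CPU"
    input        RST_X;      // basic reset name, negative logic
    input        EN_CNT;     // basic name "EN" + identifier "_CNT"
    output [3:0] COUNT_CPU;  // suffix names the clock source of this register
    reg    [3:0] COUNT_CPU;

    always @( posedge CLK_CPU or negedge RST_X )
        if ( !RST_X )
            COUNT_CPU <= 4'd0;
        else if ( EN_CNT )
            COUNT_CPU <= COUNT_CPU + 4'd1;
endmodule
```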
It would be overly verbose to add clock identification to signals for the entire design. However, knowing which clock each signal is dependent upon is important in systems that
To clearly distinguish between the signal names of registers (FF, D latch) and combinational logic, one option is to add ”_REG” (”_reg”, if the signal name is in lower case letters) at the end of a signal name intended to be a register.
However, in the logic synthesis tool Design Compiler, the instance name of a register is
General naming conventions are explained in RMM: 5.2.1. RMM describes that "_r" should be
used for "_REG", "_a" for asynchronous signal and "_z" for tri-state signal.
1.2. Synchronous design
[1] Designs should use a single clock/single edge as much as possible (recommend 1)
[2] Do not create a RS latch or FF using primitive cells such as AND, OR (mandatory)
Explanation
Use the synchronous design method in HDL and logic synthesis tools. Using asynchronous clocks makes it difficult to add precise design constraints for logic synthesis. Utilize a single clock with a single edge in your design whenever possible.
Figure 1-2 Asynchronous circuit and feedback of combinational circuits (avoid these examples)
As designs grow larger, the circuit operating speed is analyzed using static timing analysis tools (Design Compiler, PrimeTime, BuildGates, etc.) instead of logic simulation. In such situations, analysis becomes difficult if the clock system is complex. [1] In reality, there are few systems that operate with a single clock and a single edge.
If using multiple clocks, try to minimize the number of clocks.
FFs or latches can be created by using primitive cells, but this could be treated by the timing analysis tool as feedback to a combinational circuit. [2] If combinational circuit feedback cannot be avoided, use the set_disable_timing setting to avoid the effect of a feedback loop during timing analysis. [3]
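To make rule [2] concrete, the sketch below contrasts an RS latch built from cross-coupled primitive gates with a clocked alternative. Both modules and their names are hypothetical, and the clocked version is a synchronous substitute rather than an exact functional equivalent of the latch.

```verilog
// Avoid: RS latch built from cross-coupled primitive gates.
// A timing analysis tool sees Q -> Q_X -> Q as combinational feedback.
module RSLATCH ( S_X, R_X, Q, Q_X );
    input  S_X, R_X;
    output Q, Q_X;

    nand U1 ( Q,   S_X, Q_X );
    nand U2 ( Q_X, R_X, Q   );
endmodule

// Preferred: the set/clear state is held in a FF on a single clock edge.
module RSFF ( CLK, RST_X, SET, CLR, Q );
    input  CLK, RST_X, SET, CLR;
    output Q;
    reg    Q;

    always @( posedge CLK or negedge RST_X )
        if ( !RST_X )
            Q <= 1'b0;      // initial reset only
        else if ( CLR )
            Q <= 1'b0;
        else if ( SET )
            Q <= 1'b1;
endmodule
```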
Circuit designs such as the example in Figure 1-2 above should be avoided, but if your
design requires internally generated clocks, specify create_clock to the output of the FF
that generates the clock.
Although a looped path that spans over FFs is not a problem, an asynchronous loop that spans over latches is prohibited. Please refer to "2.4.1. The latch description is clearly distinguished from the combinational circuit". Moreover, because the loop that spans over asynchronous reset becomes an asynchronous loop, FF shown in the figure below in Fig-
Verilint Warning
W408 : Combinational circuit loop is detected
W506 : Description, which may become combinational circuit, is detected
1.3. Initial reset
a circuit that cannot be reset properly. (reference)
[3] Do not use asynchronous set/reset pins for anything other than initial reset (recommend 1)
[4] When using synchronous reset circuits, establish a new hierarchy for the register with synchronous reset (reference)
[5] Do not use synchronous reset directives for a particular logic synthesis tool (recommend 3)
[7] Do not use a FF with both asynchronous set and asynchronous reset (recommend 1)
Example Code
Error example of synchronous reset
always @( posedge CLK )
    if(RST_X == 1'b0)
        Q <= 1'b0;
    else if(Q == 1'd1)
        Q <= IN1;
    else
        Q <= IN2;
becomes…
Figure 1-3 Circuits (a) and (b) generated from the description above (inputs RST_X, IN1, IN2; output Q)
- With circuit (a), a value is determined at initial state (Q is unknown)
- When circuit (b) is generated, a value is not determined at initial state
Explanation
Several different circuits can be generated with the synchronous reset FF inference illustrated in Figure 1-3. [1] For example, (a) is a circuit that resets after selecting the input signal and (b) is a circuit that resets the input signal and then selects the signal.
In logic simulation, the initial FF state is ’X’ (unknown), but because circuit (a) connects the reset signal to the gate directly before the FF data input, the data input is defined, since an AND operation is performed with the reset signal even if the FF’s output is ’X’. In circuit (b), however, the output signal from the FF that is input to the selector is ’X’, and since the EXOR gate output becomes ’X’ regardless of the input signal value, the FF data input is always ’X’. As a result, the value for the FF is not defined by the synchronous reset signal.
This will not cause any problems as long as a circuit (a) is always synthesized from the
synchronous reset description, but there is no guarantee that such a circuit is always
Example 1-7 shows an example in which a synchronous reset FF description has been
changed into an asynchronous reset FF description. The always construct is activated by
the rising edge clock and the active low reset signal.
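The original Example 1-7 is not reproduced here. The following is a hedged sketch of the kind of change the paragraph describes, applied to the synchronous reset description of Figure 1-3; the module name SELFF is an assumption.

```verilog
// Sketch: the Figure 1-3 description rewritten with asynchronous reset.
module SELFF ( CLK, RST_X, IN1, IN2, Q );
    input  CLK, RST_X, IN1, IN2;
    output Q;
    reg    Q;

    // Activated by the rising clock edge and by the falling edge of
    // the active low reset signal
    always @( posedge CLK or negedge RST_X )
        if ( RST_X == 1'b0 )
            Q <= 1'b0;
        else if ( Q == 1'b1 )
            Q <= IN1;
        else
            Q <= IN2;
endmodule
```

Because the reset acts on the FF directly rather than through the data path, Q takes a known value as soon as RST_X is asserted, even when the FF starts in the unknown state.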
In addition to the above mentioned reason, asynchronous reset is realized during the layout process (refer to section "1.4.2. Use clock tree synthesis for clock balancing"). Since some systems inherently only accept asynchronous reset, it is more realistic to specify asynchronous reset (refer to section "1.3.3. Be careful about external noise on an initial reset signal").
The initial reset should be input to the asynchronous reset, but other signals must not be input to the asynchronous set and reset pins [3] because it is difficult to analyze the paths which pass through the asynchronous set and reset during timing analysis. In other words, when using logic synthesis tools or static timing tools to perform an analysis, the timing path is cut off without taking into account the timing from the register B reset input to the Q output of register B, as shown in Figure 1-4.
Timing problems like this may occur if sets/resets other than initial reset are used, so it is
recommended that asynchronous set and reset not be used for purposes other than initial
reset.
Figure 1-4 Registers A, B and C, with the output of A driving the asynchronous reset of B (timing analysis is cut off)
For a circuit structure like the one in Figure 1-4, a logic synthesis tool (e.g. Design Compiler) will not analyze the timing of the path arriving at the asynchronous reset of B from the output of A, unless it is manually specified. Therefore, it is possible that no error will be reported even if the path has a delay of 1000 ns, and this will not be evident until after the layout process.
Any resets other than initial resets should be synchronous resets, and should be distinguished from asynchronous resets. In the example in Figure 1-3, the value of the FF becomes the unknown state X. In this situation, even if the synchronous reset signal is TRUE, there are cases in which the logic synthesis tool will generate a circuit that may not assure initialization. If a reset signal is fed by more than two lines, a known reset state must be guaranteed. This can be achieved by creating a separate module containing a FF with an AND gate, as seen in Figure 1-5. With this extra hierarchical module, the RTL description in Figure 1-3 will not generate circuit (b) of the same example.
However, logic synthesis tools have a command that flattens hierarchy (ungroup). If the hierarchy containing the example in Figure 1-5 is flattened by this command, the resulting circuit's output value may not be fixed. Take care not to specify ungroup for this type of block.
Figure 1-5 FF with synchronous reset in a separate module (diagram labels: RST_X, IN1, Q; the FF and the AND gate are grouped together)
reg Q;
assign Q_X = ~Q;
always @( posedge CLK )
    if( RST_X == 1'd0 )
        Q <= 1'b0;
..
DFFsynrst U1(.D(ASIG),.RST_X(RST_X),
             .CLK(CLK),.Q(Q));
always @( Q or IN1 or IN2 )
    if(Q==1'b1)
        ASIG = IN1;
    else
        ASIG = IN2;
..
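The fragments above elide parts of the description. As a hedged, complete sketch of the same idea: the port names follow the DFFsynrst instance above, while the else branch of the FF, the Q_X port list entry and the enclosing module name SELSYNC are assumptions.

```verilog
// Separate hierarchy holding only the FF with synchronous reset.
module DFFsynrst ( D, RST_X, CLK, Q, Q_X );
    input  D, RST_X, CLK;
    output Q, Q_X;
    reg    Q;

    assign Q_X = ~Q;

    always @( posedge CLK )
        if ( RST_X == 1'b0 )
            Q <= 1'b0;
        else
            Q <= D;          // assumed: this branch is elided in the source
endmodule

// Selection logic stays outside, so the reset remains next to the FF.
module SELSYNC ( CLK, RST_X, IN1, IN2, Q );
    input  CLK, RST_X, IN1, IN2;
    output Q;
    reg    ASIG;

    DFFsynrst U1 ( .D(ASIG), .RST_X(RST_X), .CLK(CLK), .Q(Q), .Q_X() );

    always @( Q or IN1 or IN2 )
        if ( Q == 1'b1 )
            ASIG = IN1;
        else
            ASIG = IN2;
endmodule
```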
If you add the Design Compiler specific directive ”//synopsys sync_set_reset”, you could assure a value with synchronous reset without using a hierarchical FF. Therefore, if you are not using an additional hierarchy, this directive should be added. However, as this is only effective with Design Compiler, it will not be possible to use this RTL description if other logic synthesis tools are used in the future. [5] Also, because this Synopsys directive is written in a comment line, its syntax cannot be checked, and it is treated as a simple comment even if a single character is wrong. This method cannot guarantee that a tool always generates a circuit as illustrated above. To ensure that synchronous reset defines the value in gate simulation, there is no other way but the hierarchy method described above.
If one reset line is used for both synchronous reset and asynchronous reset, synthesis may not be performed properly. [6] The asynchronous reset line sometimes forms a tree structure during the layout process (refer to "1.4.2. Use clock tree synthesis for clock balancing"). To avoid accidentally inserting a buffer or logic gate during logic synthesis with Design Compiler, set_ideal_net (the specified net is excluded from the limitations of logic synthesis, timing analysis, and the design rules) may be put on this reset line. Then, if this reset line is also input to a FF's synchronous input, that part will not be synthesized and synthesis will fail as a result. In addition to this, having both asynchronous reset and synchronous reset may cause other problems during logic synthesis and layout, so they should not be mixed.
If you do not use any asynchronous reset other than an initial reset, you will only need
either asynchronous reset FF or asynchronous set FF.
Do not use FF with both asynchronous set and reset. [7]
    else if(!RST_X)
        QOUT <= #DLY 1'b0;
    else
        QOUT <= #DLY DIN;
In this description, when RST_X is 0 and SET_X is 0, the set signal has priority, so QOUT
becomes 1. If the asynchronous signal SET_X then changes to 1, the always construct is not
activated until the next clock edge; therefore, QOUT remains 1. However, in the generated
gate-level circuit, QOUT becomes 0. As a result, the RTL and gate-level simulations do not
match. As this example shows, when both RST_X and SET_X are active, or when a FF has a
prioritized set/reset, the RTL and gate-level simulation results will not match.
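The mismatch described above can be reproduced with a small sketch. The sensitivity list and the set branch are an assumed reconstruction of the description being discussed, and the module names setrst_ff and tb_setrst_ff are hypothetical.

```verilog
`timescale 1ns/1ps
// Assumed reconstruction of a FF with both asynchronous set and reset.
module setrst_ff (CLK, RST_X, SET_X, DIN, QOUT);
  parameter DLY = 1;
  input  CLK, RST_X, SET_X, DIN;
  output QOUT;
  reg    QOUT;

  always @(posedge CLK or negedge RST_X or negedge SET_X)
    if (!SET_X)
      QOUT <= #DLY 1'b1;      // set has priority
    else if (!RST_X)
      QOUT <= #DLY 1'b0;
    else
      QOUT <= #DLY DIN;
endmodule

module tb_setrst_ff;
  reg  CLK, RST_X, SET_X, DIN;
  wire QOUT;

  setrst_ff dut (.CLK(CLK), .RST_X(RST_X), .SET_X(SET_X),
                 .DIN(DIN), .QOUT(QOUT));

  initial begin
    CLK = 1'b0; DIN = 1'b1; RST_X = 1'b1; SET_X = 1'b1;
    #10 RST_X = 1'b0;   // negedge RST_X: QOUT becomes 0
    #10 SET_X = 1'b0;   // negedge SET_X: set wins, QOUT becomes 1
    #10 SET_X = 1'b1;   // posedge SET_X: the always construct is not triggered
    #10 $display("RTL QOUT = %b while RST_X = %b", QOUT, RST_X);
    $finish;
  end
endmodule
```

In RTL simulation QOUT stays at 1 after SET_X is released even though RST_X is still 0, whereas a gate-level FF with a real asynchronous reset pin would drive QOUT to 0.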
RTqualify checks items except for 1.3.1.[4]. Tools cannot recognize 1.3.1.[4].
1312(W3) Neither an asynchronous set nor a reset for a FF in description.
1.3.1.[3] Checked by 1321 and 1322.
1315(W1) //synopsys sync_set_reset used in a synchronous reset.
1316(W2) An asynchronous reset or an asynchronous set is connected to a FF data
input path.
1317(W1) A FF is used that has both an asynchronous reset and an asynchronous set.
Verilint Warning
W396 : No asynchronous reset for flip-flop.
W392 : The polarity of asynchronous reset is wrong.
W395 : More than one asynchronous reset is detected.
- Logic order may be replaced by synthesis
- Hazards cannot be prevented in the RTL description
[2] Do not insert signals other than initial reset to FF asynchronous reset pins recommend 1
Example Code
[Figure: combinational logic on count drives ctl_X, producing a signal with a hazard]
When a combinational circuit generates an asynchronous reset signal, there are situations in
which the enable signal would be separated from the FF as the result of optimization, and
signals with a hazard may drive the reset input. [3] As illustrated in the figure above, this
occurs even if enable logic is inserted before the FF reset signal in an RTL description to avoid
hazards. In this case, depending on the timing of the combinational circuit input signal, a
hazard could develop in the reset circuit and the FF could be reset with unexpected timing.
It would be difficult to discover a problem such as this because hazards are prevented in the
RTL description. As a rule, it is therefore forbidden to directly input signals other than the
initial reset signal to the asynchronous reset pins. [2] In cases in which this type of circuit is
created, use an additional level of hierarchy to group this logic. This method is similar to the
handling of synchronous resets. Refer to "1.3.1. Use asynchronous reset for initial reset" for
further information.
As explained in “1.3.3. Be careful about external noise on an initial reset signal”, an
asynchronous reset signal may be supplied after synchronization. Also, logic circuits may
be inserted so that a system reset can be selected. When logic is needed on a reset line,
combine that reset line logic in the top level (illustrated in “1.4.1. Creating modules
for clock generation circuits”) as much as possible, and directly input the same signal to
all FFs. [1]
[1] There is danger of malfunction unless attention is paid to reset lines reference
on the circuit board
[2] An initial reset may have to be synchronized or else a noise elimination
recommend 3
circuit may be needed
[3] In some systems, an initial reset signal is asserted before the clock reference
Explanation
As explained in “1.3.1 Use asynchronous reset for initial reset”, an asynchronous reset is
preferred for the initial reset. But operation tends to become unstable when slowly sloped
waveforms or waveforms with a lot of noise are directly input from outside the LSI. Also,
if the asynchronous mode is used for the reset, setup and hold times may be violated,
because the rising edge of the clock and the rise or fall of the asynchronous reset signal
may occur at the same time. In this situation, it is recommended to synchronize the reset
before distributing it, as illustrated below.
If there is a significant concern about the influence of noise, three or five FF stages should
be used, and reset should be only executed when all FF outputs are in the RESET state.
[Figure: RST_X enters through a Schmitt trigger I/O and is latched by two FF stages]
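The multi-stage approach described above can be sketched as follows. This is only an illustration under stated assumptions: the module name rst_filter, the port names, and the choice of three stages are hypothetical, and the circuit needs a running clock to work.

```verilog
// Hypothetical reset noise filter: three FF stages on the external reset.
module rst_filter (CLK, RST_X_IN, RST_X_OUT);
  input  CLK;
  input  RST_X_IN;    // active-low reset from the I/O cell (assumed name)
  output RST_X_OUT;   // filtered active-low reset (assumed name)
  reg [2:0] sft;
  reg       RST_X_OUT;

  always @(posedge CLK) begin
    sft <= {sft[1:0], RST_X_IN};
    // Reset is asserted (0) only when all three stages hold 0.
    // The result is registered so that no decode hazard reaches the FFs.
    RST_X_OUT <= |sft;
  end
endmodule
```

A low pulse on RST_X_IN shorter than three clock cycles never fills all stages with 0, so it does not reach RST_X_OUT.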
In some systems, the initial reset signal is asserted before the clock. Some portable con-
sumer electronics automatically turn on the initial reset at the time the power is turned
on. By loading the resistor and capacitor on the reset pin, the voltage rise of the reset line,
as shown in Figure 1-8, slows down in comparison to the voltage rise of power.
However, a clock signal generated by a PLL also will be slower in comparison to the volt-
age rise of power. In most cases the rise of the clock signal becomes even slower than the
reset input. Thus, if the reset signal is synchronized by a FF, as shown in Figure 1-7, the
system would have no reset.
Synchronization is not possible in this type of system, so the reset signal from outside
the LSI is input directly to the asynchronous reset inputs of the FFs.
In this case, asynchronous reset signals are preferred, but it would be better to add some
noise elimination circuitry.
[Figure 1-8: voltage versus time after application of power, showing the rise of Power, Reset, and Clock]
As countermeasures, the use of a Schmitt trigger I/O, a VDD and GND for the I/O pin next
to the reset pin to reduce noise (Figure 1-9), or the addition of a DLY element to prevent
hazards as shown in Figure 1-10, may be applied.
[Figure: I/O pads in the order GND, RST_X, VDD]
Figure 1-9 Use power pins beside reset pin to eliminate noise
In Figure 1-10, when the value at point A becomes '0', after some delay, the value at point
B becomes '0'. If A then becomes '1' during the time span of the DLY cell (delay element),
the OR output will remain as '1'. Therefore, hazards can be prevented to a certain extent.
Yet, in ASIC design, a DLY cell (or BUFFER) and OR gates are assigned in distant posi-
tions and may end up with values different from the assumed delay value. Take extra care
when inserting such a circuit.
[Figure 1-10: RST_X drives point A; A passes through a DLY cell to point B; A and B feed an OR gate]
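The behavior of the DLY cell and OR gate described above can be checked in simulation with a small behavioral sketch. The module name rst_dly_or, the output name RST_X_OUT, and the delay value are assumptions; a real design would instantiate library cells, and the actual delay depends on placement.

```verilog
`timescale 1ns/1ps
// Behavioral sketch of the DLY cell and OR gate (simulation only).
module rst_dly_or (A, RST_X_OUT);
  parameter DLY = 5;  // assumed delay of the DLY cell in ns
  input  A;           // point A: active-low reset after the input buffer
  output RST_X_OUT;
  wire   B;           // point B: A delayed by the DLY cell

  assign #DLY B = A;
  // The output goes to '0' only when both A and its delayed copy are '0',
  // so a low pulse on A shorter than DLY does not reach the FFs.
  assign RST_X_OUT = A | B;
endmodule
```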
In addition, one countermeasure against asynchronous reset noise and malfunction due to
setup/hold violations is to first create a stable clock-stopped state and then input the
asynchronous reset while the clock is stopped.
However, no countermeasure is perfect, so please give careful consideration to how the
reset line should be laid out on the board. [1]
1.4. Clocks
1.4.1. Creating modules for clock generation circuits
Explanation
Combine circuits such as gated clocks and multiplication clocks that supply clocks to the
internal sub-circuits together in the same level as much as possible.
Creating a clock generation module and supplying clocks from a single clock generation
module offer the following advantages:
The clock line is created with the CTS tool as explained in "1.4.2. Use clock tree synthesis
for clock balancing". Buffers should not be inserted into the clock line during the logic
synthesis process. Therefore, the clock line should be excluded from logic synthesis. [2]
Similarly, reset lines should be made into separate hierarchical structures as in clock
lines. By creating modules for reset generation, it will become easier to apply synthesis
constraints.
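A clock generation module of the kind recommended in this section might look like the sketch below. All names (clkgen, CLK_IN, GATE_EN, CLK_SYS, CLK_GATED) are hypothetical, and a real design would normally use the library's clock gating cell and CTS dummy buffers instead of this behavioral gate.

```verilog
// Hypothetical clock generation module; all names are assumptions.
module clkgen (CLK_IN, GATE_EN, CLK_SYS, CLK_GATED);
  input  CLK_IN;     // source clock, for example from a PLL
  input  GATE_EN;    // enable for the gated clock
  output CLK_SYS;    // free-running system clock
  output CLK_GATED;  // gated clock for a sub-circuit
  reg    en_lat;

  assign CLK_SYS = CLK_IN;

  // Latch the enable while the clock is low so that the gate
  // cannot change while CLK_IN is high (avoids glitches on CLK_GATED).
  always @(CLK_IN or GATE_EN)
    if (!CLK_IN)
      en_lat <= GATE_EN;

  assign CLK_GATED = CLK_IN & en_lat;
endmodule
```

Keeping every derived clock in one module of this kind means that synthesis constraints for clocks only have to be applied in one place.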
[1] For clock lines, do not use primitive cells other than dummy buffers reference
on a clock tree
[2] Clocks should be balanced in accordance with the number of cells to be reference
connected to each clock tree
Explanation
[Figure 1-12: overview of CTS; annotation: adjust skew by shifting position]
In LSI design, the Clock Tree Synthesis (CTS) tool is used to synthesize the clock after the
placement of each FF in the layout. Figure 1-12 provides an overview of the CTS. The
clock lines extend from the center in an H-shaped tree structure. Wires from each termi-
nal buffer are also arranged in H-shaped structures, supplying the clock to the local FF
from the final terminal buffer.
In these clock lines, the arrival time will vary depending on the number of FFs ultimately
supplied, and the length of the interconnect lines. The CTS tool adjusts at this point by
shifting the position of any given buffer forward to delay the arrival time.
To use the CTS, add a CTS dummy buffer to the circuit. The CTS dummy buffer is in-
serted either into the clock generator module during the RTL description stage, or after
the gate-level net list has been generated. Any buffers or delay cells in the clock lines
aside from CTS dummy buffers are not recommended. In particular, do not use buffers,
etc., in signal lines after the CTS dummy buffer.
The CTS tool generates the clock tree after the CTS dummy buffer, and adjusts the clock
signal arrival time. However, it does not adjust the arrival time between different clocks.
To adjust the arrival time between different clocks, adjust by inserting delay cells (such as
buffers) in the stage prior to CTS. If there are too many clock lines, this adjustment
process becomes laborious.
CTS tool capabilities have been enhanced recently, making it possible to adjust delay val-
ues even when gated clocks are present in the clock lines. Even if gated clocks are present,
the placement of a CTS dummy buffer in the stage prior to the gated clock will make clock
adjustments easier during layout. However, when this approach is used, a single clock
line will be encumbered by a large capacitance, reducing the degree to which power con-
sumption in the clock line can be reduced. (See "3.4.Low Power-Consumption Design".)
Although gated clocks are used to reduce power consumption, they are unable to produce
as much of an effect as anticipated. Because of this, there is also the approach of fabricat-
During design of the LSI, consideration should be given to the number of clock lines.
Additionally, the designer should be aware of the approximate number of FFs connected to
each clock line for load balancing these clock lines. [2]
It may be necessary to fine-tune the timing of input signals to the LSI from the outside or
output signals by the LSI to the outside. These interfaces include: interfaces with the
CPU bus, interfaces with the PCI bus, and the interface with external memory. For LSI
external interface parts that require fine-tuning of the timing, it is recommended that the
clock lines be divided in advance.
As is introduced in "3.3.Design for Test(DFT)", scan registers are inserted into the LSI
using the scan register insertion tool. These scan registers are structured into scan chains
for each clock line. As circuits become larger and individual clock lines are connected to
more FFs, scan paths may become too long, and too many test vectors are output from the
automatic test pattern generator (ATPG) tool. Because of this, the designer may have to
divide the clock lines. When it comes to how the clock lines should be divided, the answer
is dependent on how the circuit is divided into blocks in the layout, and thus the clock
lines often cannot be divided in the RTL design stage. However, it is recommended that
the problem of power consumption (discussed above) and scan line insertion be considered
to some degree when working on the clock lines.
Additionally, it is not possible to insert scan registers for each clock line if the number of
clock lines is too large. In this situation, it may be necessary to switch the clock lines into
a single clock line during the test mode. Moreover, because there is also the issue of
detecting interferences between different clock lines, ("3.3.7. Handling of Different Clocks")
the connecting logic of signal lines between the separate clock lines should also be consid-
ered.
Currently, not only clock lines but also resets and the scanning selections are created with
CTS tools. Therefore, dummy buffers for CTS should also be added in reset and scanning
select lines.
[1] Avoid inverting logic on the same clock line. Also avoid using gated clocks recommend 2
and using FFs with different edges.
[3] Using gated clocks is an effective method for achieving low power consumption reference
[4] Do not supply clock signals to pins other than FF clock input pins recommend 1
(such as D input)
[5] Clock signals should not be connected to black boxes, bi-directional pins
recommend 3
or reset lines
Explanation
As explained in “1.4.2. Use clock tree synthesis for clock balancing”, the recent CTS tools
(Clock Tree Synthesis) are now able to take clock tree balancing into consideration even
when there are gated clocks or inverted clocks. However, as described above, great care
must be taken when designing the clock lines. [1] As a result, gated clocks or inverted
clocks should be gathered in the clock generator module in the top level, and clocks should
not be generated at the local levels. If you wish to consider using gated clocks at a very
detailed level in order to reduce power consumption that little bit more, use the “genera-
tion of gated clock circuits using EDA tools” explained in “3.4.1. Low Power-Consumption
Design Using Gated Clock”. [3]
The connection of the output of one FF to the clock pin of another FF should also be limited
to the clock generator module located in the topmost level. [2] When the output of an FF
becomes another clock line, the CTS tool cannot take the clock line balancing into consid-
eration.
Do not connect clock signals to anything except for the clock pins of FFs. [4] If the clock
line passes through a logical gate to arrive at the D input of the FF, the logic synthesis
tool cannot perform optimization for that part. Additionally, such a path is extremely
dangerous because the timing cannot be analyzed correctly. Use caution when creating
a circuit wherein an external clock signal is latched by a clock signal within the LSI
(used, for example, to finely divide clock signals).
Clocks input from outside of the LSI may be connected to clock generator circuits (PLLs
and DLLs). If there is no library provided for these clock generator circuits (PLLs or
DLLs), "black box" errors will result when performing simulations and logic synthesis.
Make sure these libraries are present.
Depending on the ASIC library used, there may be two types of FFs: those that work on a
positive clock edge and those that work on a negative clock edge. When the two types of
FFs are mixed in a circuit, scan register insertion becomes problematic. It is best not to
use FFs that work on an inverted clock. [6] However, it is not a problem if latches that
work on inverted logic are used only in a single stage (See “1.5.3.Guaranteeing the setup/
hold and margin for synchronous RAM”, “2.4.Latch inference” and “3.3.7.Handling of Dif-
ferent Clocks”).
[1] Whenever possible, create separate hierarchical blocks for each clock line. reference
[2] When inputting multiple clocks in the same block, provide an integral
reference
multiple period as a clock constraint
Explanation
Define clock generation circuits in different levels when clocks consist of two or more
systems. See “1.4.1. Creating modules for clock generation circuits” for the benefits of
placing clocks in different levels.
When creating sub-blocks, create a sub-block for each clock system as often as possible [1]
to avoid a racing problem (see “2.3.1.Unify the description style of FF inferences”) during
simulation and to facilitate clock synthesis during layout. Even if creating a sub-block for
each clock is difficult, avoid inputting multiple clocks in the same block whenever possible
(Figure 1-13).
For external data retrieval such as CPU interfaces, consider creating synchronization
circuits. If external data is stable and its operating frequency is lower than the internal
clock, all signals should be synchronized by the internal clock.
Figure 1-14 shows an example in which a typical CPU interface is created by full
synchronization. In this case, the rise of the WR_X signal is detected and the values for
ADR and DATA at that time are retrieved. This type of synchronization circuit can be applied
only when the CPU interface speed is one half or less of that of the internal clock. As the
falling edge is used to write instead of the rising edge of the WR_X signal, it is valid only
when the CPU interface speed does not change.
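A fully synchronized write interface of this kind could be sketched as below. The module name cpuif_sync, the 8-bit bus widths, the outputs ADR_REG, DATA_REG, and WR_EN, and the three-stage shift register are all assumptions. The sketch also assumes the CPU keeps ADR and DATA stable until the detected edge has been used, which must be confirmed against the actual bus timing.

```verilog
// Hypothetical fully synchronous CPU write interface.
module cpuif_sync (CLK, RST_X, CS_X, WR_X, ADR, DATA,
                   ADR_REG, DATA_REG, WR_EN);
  input        CLK, RST_X;
  input        CS_X, WR_X;      // asynchronous, active-low
  input  [7:0] ADR, DATA;       // bus widths are assumptions
  output [7:0] ADR_REG, DATA_REG;
  output       WR_EN;
  reg    [7:0] ADR_REG, DATA_REG;
  reg    [2:0] wr_sft;
  reg          WR_EN;

  // Rise of the write strobe, detected on already synchronized stages
  wire wr_rise = wr_sft[1] & ~wr_sft[2];

  always @(posedge CLK or negedge RST_X)
    if (!RST_X) begin
      wr_sft   <= 3'b111;
      ADR_REG  <= 8'h00;
      DATA_REG <= 8'h00;
      WR_EN    <= 1'b0;
    end else begin
      // Stage 0 may go metastable and is therefore not used in logic
      wr_sft <= {wr_sft[1:0], (WR_X | CS_X)};
      WR_EN  <= wr_rise;
      if (wr_rise) begin
        ADR_REG  <= ADR;
        DATA_REG <= DATA;
      end
    end
endmodule
```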
The CPU interface may take the WR_X signal as a clock signal and retrieve ADR and
DATA as shown in the figure below. In this case, pay attention to metastable measures.
In such CPU interfaces, it is rare for circuits to operate with the WR_X signal as a clock.
Further, if FFs operating with an internal clock system are contained in separate blocks,
the circuit becomes difficult to understand. It is not recommended that blocks be divided
for each clock in a circuit such as this.
[Figure: asynchronous CPU interface inside the ASIC. Labels: DATA, ADR, CS_X, WR_X, ADR register, DATA register, metastable countermeasure, internal clock. Note: when the CPU bus is directly supplied to internal registers, timing management becomes critical]
When synthesizing circuits that input multiple clocks to the same block, give the clock
periods an integral-multiple relationship as much as possible. [2] In the case of clocks
that have a 12ns and a 6ns cycle, the logic synthesis tool optimizes with the basic cycle
set to 12ns, which is the least common multiple. The basic cycle is 60ns for clocks with
cycles of 12ns and 15ns, and is 84ns for clocks with cycles of 12ns and 14ns. As the least
common multiple increases, optimization and timing analysis take more time to complete.
(See “5.4.4 Multiple clock optimization” for more information.)
In cases when the actual clock cycles yield a rather large LCM (least common multiple)
cycle, it can be adjusted by modifying circuit partitioning and hierarchy. Or, consider
adjusting the clock cycle to yield a smaller LCM cycle during logic synthesis runs.
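The basic cycles quoted above can be checked with a short simulation-only sketch. The module and function names are hypothetical, and the periods are taken in whole nanoseconds.

```verilog
// Simulation-only sketch: basic cycle (LCM) of two clock periods in ns.
module lcm_demo;
  // Greatest common divisor by the Euclidean algorithm
  function integer gcd;
    input [31:0] a, b;
    integer x, y, t;
    begin
      x = a;
      y = b;
      while (y != 0) begin
        t = x % y;
        x = y;
        y = t;
      end
      gcd = x;
    end
  endfunction

  // Least common multiple derived from the GCD
  function integer lcm;
    input [31:0] a, b;
    begin
      lcm = (a / gcd(a, b)) * b;
    end
  endfunction

  initial begin
    $display("12ns and  6ns -> %0d ns", lcm(12, 6));
    $display("12ns and 15ns -> %0d ns", lcm(12, 15));
    $display("12ns and 14ns -> %0d ns", lcm(12, 14));
  end
endmodule
```

The three calls evaluate to 12, 60, and 84, matching the values in the text.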
1.5. Asynchronous circuits
[3] Do not have a feedback loop at the first-stage FF after transfers mandatory
between asynchronous clocks
[5] To avoid erroneous input data, latch the clock signals and use them reference
as enable signals
Explanation
There is a difficult problem termed “metastable” when transmitting data between asyn-
chronous clocks. In order to solve metastable problems, one must first understand the
operating principles of flip-flops (FFs).
Figure 1-16 shows the circuit structure of a MOS LSI FF. (1) and (3) in the butterfly
shape are MOS switches. With the MOS switch as its entrance, the structure then
has an inverter loop in its next stage. This loop, in its steady state, either has a ’1’ on
the left side and a ’0’ on the right side, or conversely, is stable with a ’0’ on the left
side and a ’1’ on the right side.
Let us assume that the left side in the left-hand loop (2) is stabilized at ’0’ and the
right side is stabilized at ’1’. Given this, let us assume that the switch at (1) is
opened. In actuality, over the interval where the CLK signal is low, the left-hand
switch is open. The inverter in the loop is only capable of driving a very low current.
When a ’1’ signal arrives at the D input, the value that is looped by the inverter is
simply inverted.
Next let us consider what happens when the CLK signal transitions from low to high
Conversely, next let us consider what happens when the CLK signal goes from high to
low (i.e. at the falling edge). At this time, the right-hand switch (3) closes and contin-
ues to maintain its previous value. Consequently, the output from the FF does not
change. The left-hand switch (1) opens, and, in order to prepare for the next rising
edge of the CLK signal, the D input value is continuously copied into the left-hand
loop (2).
In an actual LSI, the FFs function as described above. Cells termed “latches”(or “D
latches”) do not have the right-hand loop of the FF structure. Consequently, latches
require less area than FFs.
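The two-loop structure described above corresponds to a master latch followed by a slave latch. The following behavioral sketch is only a model of that behavior, not of the transistor-level circuit; the names ms_dff, master, and slave are assumptions.

```verilog
// Behavioral sketch of a FF built from two latches (simulation model).
module ms_dff (CLK, D, Q);
  input  CLK, D;
  output Q;
  reg    master;  // left-hand loop (2) in the text
  reg    slave;   // right-hand loop

  // Left-hand loop: follows D while CLK is low, holds while CLK is high
  always @(CLK or D)
    if (!CLK)
      master = D;

  // Right-hand loop: follows the master while CLK is high,
  // holds its previous value while CLK is low
  always @(CLK or master)
    if (CLK)
      slave = master;

  assign Q = slave;
endmodule
```

Removing the second always block leaves a single latch, which is why a latch needs less area than a FF.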
[Figure: FF and latch structures with Q and CLK pins]
[Figure: CLK, D, and Q waveforms]
This oscillating state is propagated to the logic circuits, and thus the logic circuit
may not function properly. This is the metastable problem.
In the design of large logic circuits, it is not possible to perform adequate investiga-
tions for all of the circuits. When the metastable problem is found in circuits that
have already been fabricated, it is difficult to trace back to the causes, presenting a
major impediment to circuit debugging. There is a need for a simple, easily under-
standable approach to the metastable problem.
There is no precise data on the time required for the metastable state to reach conver-
gence. Because it is difficult for LSI manufacturers to measure this time period, the
times are not publicly disclosed. As a result, no specific times can be provided; how-
ever, these times are estimated to be about 10ns for a 0.18um design rule, and 12ns
for a 0.25um design rule. If the operating frequency at the 0.18um rule is more than
100 MHz, then the delay time that is allowable between FFs is less than 10ns and,
from the perspective of safety, another stage of FFs may be required.
In logic circuits wherein the operating speed is high, the later-stage FF accepts the
data before the oscillation from the previous-stage FF has converged.
While one may think that if this is repeated, then the oscillation will not converge
regardless of the number of stages of FF that are used, this is not the case. The
metastable state is a probabilistic phenomenon. When the oscillation from a previous
stage arrives at the FF of the later stage, even if we assume that the oscillation does
not converge, the probability that the lower-stage FF will oscillate is low. Even if
there is no convergence, the amount of oscillation will decline. Additionally, it does
not always take the same amount of time for an oscillation to converge. While most
oscillations do converge, when there is no convergence, the probability that the oscil-
lation will propagate is extremely low. Consequently, if several stages of FFs are
added, the probability of oscillation in the output of the FFs becomes asymptotically
close to ’0’, so there is no problem in actual application. Regardless of how high the
operating frequency, if a sufficient number of FF stages is used, then the oscillation is
practically guaranteed to converge.
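A chain of FF stages as described above can be written as a simple shift register in the receiving clock domain. The module name sync_stages, the port names, and the default of three stages are assumptions for illustration.

```verilog
// Hypothetical multi-stage synchronizer from the CLK1 domain to CLK2.
module sync_stages (CLK2, RST_X, D_CLK1, Q_CLK2);
  parameter STAGES = 3;       // assumed number of FF stages (2 or more)
  input  CLK2, RST_X;
  input  D_CLK1;              // FF output from the CLK1 domain
  output Q_CLK2;
  reg [STAGES-1:0] sft;

  // No logic between stages, and no feedback into the first stage
  always @(posedge CLK2 or negedge RST_X)
    if (!RST_X)
      sft <= {STAGES{1'b0}};
    else
      sft <= {sft[STAGES-2:0], D_CLK1};

  assign Q_CLK2 = sft[STAGES-1];
endmodule
```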
* Do Not Have a Feedback Loop in the First-Stage FF After Asynchronous Transfer [3]
If, conversely, the CLK1 and CLK2 operating frequencies are not high, then there is
no problem if there is some degree of logic before the next-stage FF, such as shown in
Figure 1-19 (b). However, even when this is the case, there will be problems when
there is feedback to the FF in order to latch data. Depending on the internal feedback
circuitry of a FF, the metastable state being placed on input might destroy the latched
value. As a result, the circuit will malfunction. Because of this, even if the place-
ment of some amount of logic is unavoidable, be absolutely sure not to have feedback.
It becomes difficult to confirm whether or not a circuit is safe when a logic circuit is
inserted after asynchronous transfer. Even if the designer has assembled the circuit
while paying careful attention to the metastable problem, if the circuit is difficult for
other designers to understand, the result will be much extra work and expense.
The metastable problem is extremely difficult, and the checks are difficult as well. In
order to eliminate this problem, it is best to make a firm habit of
not placing logic prior to the next-stage FF, even when the operating frequency is low.
and recently circuit boards operating at about 1.9V have begun to appear. If the
power supply voltage is low, then the lower the voltage, the more sensitive the device
is to noise. Moreover, the finer feature the design rule for the LSI, the faster the I/O
operating speed will be, making the LSI extremely sensitive to noise.
is required in cases such as when bus values are transmitted. In Figure 1-20, the
data is latched at CLK1, and this signal is passed to CLK2 using asynchronous trans-
mission. In such a case, if the rising edges of CLK1 and CLK2 are close together and
if the rising edge of CLK1 is applied during the setup time for CLK2, then the problem
will not be one of a metastable state, but rather one in which the value will not be
read correctly.
Because the data signal uses a bus (8 bits wide), the individual signals will arrive at
minutely different times. When these signals are latched, some bits might accept the
value after the value has changed, while some bits might accept the value before the
value has changed. The result is the danger that, if, for example, the value ”36”
should be received, a totally different value may be read in.
[Figure 1-20: timing chart of DATA (36, FE, 36, 7F), CLK1 (16.23MHz), and CLK2 (34.6MHz). Data cannot be read properly if the CLK2 rise and CLK1 rise are close to each other.]
One method by which to safely read data from a bus of some width is the method
where CLK1 is latched by CLK2. As is shown in Figure 1-21, the select signal is the
one in which CLK1 is latched by CLK2. When this is done, if the CLK2 clock period
is less than one third of the CLK1 clock period, then it is possible to receive the data
signal in a safe place without being constrained by setup/hold time violations. How-
ever, at the point in time in which CLK1 is latched by CLK2, there is the potential for
the metastable problem to occur, and thus it is necessary to insert an additional FF
(Figure 1-22, CLK1EN2) and shift the latched CLK1 by one CLK2 cycle. This method
does not substantially increase the size of the circuit, and the circuit can be created
easily; it is really quite simple. See “1.5.2. Use memory in transfers between asyn-
chronous same-period clocks”.
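One possible reading of the method above is sketched below. The signal names DATA_CLK1, CLK1EN, CLK1EN2, and Q follow the figure labels; the module name bus_xfer, the third stage CLK1EN3, and the edge-detect enable are assumptions added for illustration. With this arrangement the capture happens within three CLK2 cycles of the CLK1 rising edge, which is why the CLK2 period must be less than one third of the CLK1 period.

```verilog
// Hypothetical bus transfer from CLK1 to CLK2 using latched CLK1 as enable.
module bus_xfer (CLK1, CLK2, RST_X, DATA, Q);
  input        CLK1, CLK2, RST_X;
  input  [7:0] DATA;
  output [7:0] Q;
  reg    [7:0] DATA_CLK1, Q;
  reg          CLK1EN, CLK1EN2, CLK1EN3;  // CLK1EN3 is an assumed extra stage

  // Source side: data changes on the rising edge of CLK1
  always @(posedge CLK1 or negedge RST_X)
    if (!RST_X) DATA_CLK1 <= 8'h00;
    else        DATA_CLK1 <= DATA;

  // Destination side: CLK1 is latched by CLK2 and shifted
  always @(posedge CLK2 or negedge RST_X)
    if (!RST_X) begin
      CLK1EN  <= 1'b0;
      CLK1EN2 <= 1'b0;
      CLK1EN3 <= 1'b0;
      Q       <= 8'h00;
    end else begin
      CLK1EN  <= CLK1;       // may go metastable, not used in logic
      CLK1EN2 <= CLK1EN;
      CLK1EN3 <= CLK1EN2;
      // Capture once after each CLK1 rise, while DATA_CLK1 is stable
      if (CLK1EN2 & ~CLK1EN3)
        Q <= DATA_CLK1;
    end
endmodule
```

Sampling a clock as data is an exception to the general clock rules in section 1.4, so a circuit like this is best confined to a clearly marked block.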
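[Figures 1-21 and 1-22: circuit in which DATA is registered as DATA_CLK1 by CLK1, CLK1 is latched by CLK2 to produce CLK1EN and CLK1EN2, and Q is the output (annotation: do not place logic); timing chart of DATA_CLK1 (36, FF, 36, 7F), CLK1 (16.23MHz), CLK2 (34.6MHz), CLK1EN, CLK1EN2, and Q (36, FF, 36, 7F)]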
[2] Use frame memory for transfers in asynchronous clock or clocks reference
with different period.
Explanation
Use FIFOs for data transfers between asynchronous clocks with the same clock period.
Data are input to the FIFO according to the address of the input side free run counter (free
running counter). At the output side of the FIFO, use a different free run counter which
indicates a different address value than the address value at the free run counter at the
input side to read out data. [1]
If there are six FIFO stages, shifting the 3-bit address counter value will prevent the same
address from being simultaneously accessed. In this case, neither the input side clock nor
the output side clock will function erroneously even if a shift of two cycles or less occurs.
However, a reference enable signal is required for data exchanges between asynchronous
clocks that use this FIFO.
Even if this enable signal is not present, if there is a pause in the data being input,
this pause interval can be used to forcibly derive an enable signal. If no
enable signal exists and there is no pause in the data, there is no way to safely transfer the
correct data. Using an initial reset as the start signal for a free run counter in a design
can be hazardous.
[Figure: DATA is written at the address of a free run counter clocked by CLK1 and reset by EN1=1, and read at the address of a second free run counter clocked by CLK2 and reset by EN2=1]
Figure 1-23 Transfer between asynchronous clocks with the same clock period
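The six-stage FIFO with two free run counters can be sketched as follows. The module name fifo_xfer, the 8-bit data width, and the read offset of three addresses are assumptions; EN1 and EN2 are taken to mark the same reference event as seen in each clock domain.

```verilog
// Hypothetical six-stage FIFO between two clocks of the same period.
module fifo_xfer (CLK1, CLK2, EN1, EN2, DIN, DOUT);
  input        CLK1, CLK2;
  input        EN1, EN2;     // reference enables, one per clock domain
  input  [7:0] DIN;
  output [7:0] DOUT;
  reg    [7:0] DOUT;
  reg    [7:0] mem [0:5];
  reg    [2:0] wadr, radr;   // 3-bit free run counters, 0 to 5

  // Input side: counter restarts at 0 when EN1 = 1
  always @(posedge CLK1) begin
    if (EN1)               wadr <= 3'd0;
    else if (wadr == 3'd5) wadr <= 3'd0;
    else                   wadr <= wadr + 3'd1;
    mem[wadr] <= DIN;
  end

  // Output side: counter restarts three addresses away when EN2 = 1,
  // so a shift of two cycles or less never hits the write address
  always @(posedge CLK2) begin
    if (EN2)               radr <= 3'd3;
    else if (radr == 3'd5) radr <= 3'd0;
    else                   radr <= radr + 3'd1;
    DOUT <= mem[radr];
  end
endmodule
```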
Large frame memory (dual port RAMs) is required for transfers between clocks that are
asynchronous but which do not have the same clock period. Similar to data transfer dur-
ing an asynchronous period using FIFO, the reference enable signal determines the ad-
dress values of the input side and output side. If data are output from the input side
consecutively and the duration of each frame is long, then there must be sufficient memory
for that frame. Therefore, it is necessary to carefully examine the architecture to make
sure that it does not grow too large. [2]
[2] Synchronous RAM has a long hold time, so some measures are necessary reference
[5] There is also a method for avoiding modification of I/O for each reference
ASIC vendor by creating a general-purpose RAM module and fixing the I/O
Explanation
FFs operating on clock edges have a setup time and a hold time. (Refer to “1.5.1 Consider
metastable issues in signals between asynchronous clocks”.) Cells other than FFs that
operate on clock edges also have a setup time and a hold time. Designers should keep this
in mind when coding RTL to generate logic circuits using logic synthesis tools.
The RAM that is used inside LSIs and FPGAs is mostly synchronous RAM. Because syn-
chronous RAM operates on the rising edge of the clock just as other circuits do, it can be
used following the same approaches as for other logic circuits. However, synchronous
RAM has an extremely long hold time, and thus one must consider ways to ensure the hold
time. [2]
Hold time violations that occur in synchronous RAM are problems that can be solved by
the execution of commands to ensure the hold times using logic synthesis tools. However,
in some cases the hold time for synchronous RAM may exceed 1ns, and in some cases a
large number of buffers will be inserted in order to ensure the hold time (See Figure 1-24).
At 0.18um, 0.13um, or the like, the delay for one stage worth of buffer is only about 10 or
20ps. If we assume that the RAM hold assurance time is 1ns, then 50 or more buffers
would need to be added. If a large circuit is acceptable, it is a simple matter to use the
logic synthesis tools to ensure the hold time, though it means many buffers may be in-
serted.
[Figure: ADR, DIN, WR_X, and CS_X inputs of the synchronous RAM, with buffers inserted on the input paths]
Figure 1-24 Large amount of buffers inserted to guarantee hold time of synchronous RAM
nous RAM is less than half of the clock period, this will solve any problems with hold time.
Latches that use the inverted clock have open gates during the interval over which the
clock is low. Even if we assume that there are a large number of logic gates prior to the
latch, the signal passes straight through the latch, and so the effective delay on this path
becomes one cycle time subtracted by a latch delay time. In other words, by inserting the
latch, it is possible to solve the hold time assurance problem without being particularly
aware of the delay time before the signal arrives at the synchronous RAM.
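[Figure: latches with gate input G placed before the ADR, DIN, WR_X, and CS_X inputs of the synchronous RAM. Timing chart: ADR at the previous stage (h8256, h8257) and ADR after the latch (??, h8256, ??); data goes through while G is low, and data does not change otherwise]
Figure 1-25 Inserting a latch with an inverted clock to guarantee RAM hold time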
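The latch in front of the RAM input can be described as in the sketch below. The module name ram_hold_latch, the port names, and the 16-bit width are assumptions; the same latch would be repeated for DIN, WR_X, and CS_X.

```verilog
// Hypothetical hold-time latch placed before a synchronous RAM input.
module ram_hold_latch (CLK, ADR_IN, ADR_RAM);
  input         CLK;
  input  [15:0] ADR_IN;   // address from the previous-stage FF
  output [15:0] ADR_RAM;  // address supplied to the RAM
  reg    [15:0] ADR_RAM;

  // Transparent while CLK is low; holds the value while CLK is high,
  // so the RAM input cannot change right after the rising edge.
  always @(CLK or ADR_IN)
    if (!CLK)
      ADR_RAM <= ADR_IN;
endmodule
```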
In Figure 1-26, a BIST circuit is inserted into the RAM for automatic testing. Additionally,
when laying out the LSI, the positioning of the RAM should be decided first. If the
RAM is positioned in the topmost level instead of in a deep position in the local levels,
these operations are simplified. [4] However, during design stages that use RTL
description, it may require a large amount of work to move the RAM to the topmost level. For
example, for an IP that uses RAM internally, from a design management perspective it
would be unwise to move the RAM to the topmost level. Thus this should be interpreted as
“place the RAM in as high a level as possible”.
When RAM is used within the IP, RAM cell names, I/O pin names, and specifications will
vary depending on the ASIC library used. If possible, it is convenient to prepare a stan-
dard RAM interface and reuse the RAM by adopting a method where the RAM is called up
from the ASIC library. [5]
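One way to picture such a standard RAM interface is a wrapper module like the one below.
It is a sketch, not a definitive implementation: the module name std_ram, the macro
USE_ASIC_RAM, the library cell LIB_RAM_256X16 and its pin names are all hypothetical.
The pin polarities follow the WR_X and CS_X names used in the figures above.

```verilog
// Hypothetical standard RAM interface. Designs instantiate std_ram only;
// the library-specific cell is hidden inside this wrapper.
module std_ram #(
    parameter AW = 8,                  // address width (assumed)
    parameter DW = 16                  // data width (assumed)
) (
    input  wire          CLK,
    input  wire          CS_X,         // chip select, negative logic
    input  wire          WR_X,         // write enable, negative logic
    input  wire [AW-1:0] ADR,
    input  wire [DW-1:0] DIN,
    output wire [DW-1:0] DOUT
);
`ifdef USE_ASIC_RAM
    // Library cell and pin names are invented for illustration; replace
    // them with those of the ASIC library actually used.
    LIB_RAM_256X16 ram_cell (
        .CK (CLK), .CEN (CS_X), .WEN (WR_X),
        .A  (ADR), .D   (DIN),  .Q   (DOUT)
    );
`else
    // Behavioral model used when no library cell is selected.
    reg [DW-1:0] mem [0:(1<<AW)-1];
    reg [DW-1:0] dout_r;

    always @(posedge CLK) begin
        if (!CS_X) begin
            if (!WR_X)
                mem[ADR] <= DIN;       // write cycle
            else
                dout_r <= mem[ADR];    // read cycle
        end
    end

    assign DOUT = dout_r;
`endif
endmodule
```

With this style, only the wrapper has to be edited when the ASIC library changes; the
IP that uses the RAM keeps the same port names.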
[Figure 1-26: a BIST circuit is automatically inserted around the RAM; pins shown are OE, RWE, IH and CLK]
1.6. Hierarchical design
[1] Limit the gate size of a single level to 10,000 gates or fewer to ensure safety (recommend 2)
    - As the operating frequency becomes higher, keep the size of a single level
      as small as possible.
    - Limit the gate size of a level to 20,000 gates or fewer even when
      the operating frequency is low (mandatory)
[3] The size of basic blocks (on a 2,000 to 10,000 gate scale) at lower hierarchy
    is optional (reference)
[4] The topmost hierarchy should only contain the following types of blocks (mandatory)
    - Clock generation module, reset generation module, RAM, ROM, I/O cells,
      RTL description of the top hierarchy.
Explanation
When an ASIC is designed, it is divided into blocks by taking into account the
functionality and the allocation of work to each designer. When the design is divided
into four blocks (geometo, lendar, cpuif, video), as illustrated in Figure 1-27, and the
geometo block is rather large, it is split up further into blocks such as gtexc, gclip,
and gmap. In this case, the design ends up with a hierarchical structure such as
G3DDengin (top level), geometo (upper levels), and gtexc (lower levels).
These hierarchical structures have two important meanings in ASIC design. One is the
division into a hierarchy that is easy to verify and easy to reuse. The other is the
allocation of hierarchical modules that are easy to handle using logic synthesis and
layout tools.
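The three-level structure described above can be sketched in Verilog as follows. Only
the module names come from the text; the ports, the 8-bit width, the placeholder bodies
(one FF each) and the way the blocks are chained together are assumptions made purely so
that the skeleton is complete.

```verilog
// Lower levels inside geometo. Bodies are placeholders (one FF each).
module gtexc (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

module gclip (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

module gmap (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

// Upper level: geometo contains only instances and their connections.
module geometo (input wire clk, input wire [7:0] din, output wire [7:0] dout);
    wire [7:0] texc_out, clip_out;
    gtexc gtexc (.clk(clk), .din(din),      .dout(texc_out));
    gclip gclip (.clk(clk), .din(texc_out), .dout(clip_out));
    gmap  gmap  (.clk(clk), .din(clip_out), .dout(dout));
endmodule

// The other upper-level blocks, again as placeholders.
module lendar (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

module cpuif (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

module video (input wire clk, input wire [7:0] din, output reg [7:0] dout);
    always @(posedge clk) dout <= din;
endmodule

// Top level: block instances only, no logic gates.
module G3DDengin (input wire clk, input wire [7:0] din, output wire [7:0] dout);
    wire [7:0] cpuif_out, geometo_out, lendar_out;
    cpuif   cpuif   (.clk(clk), .din(din),         .dout(cpuif_out));
    geometo geometo (.clk(clk), .din(cpuif_out),   .dout(geometo_out));
    lendar  lendar  (.clk(clk), .din(geometo_out), .dout(lendar_out));
    video   video   (.clk(clk), .din(lendar_out),  .dout(dout));
endmodule
```

Instance names are set equal to the module names, in line with the naming conventions
in 1.1.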
When considering the hierarchy in terms of logic synthesis and layout, there must be a
clear conception of what the basic block (basic level of hierarchical design) will be. The
basic block is constrained by size and structure. The size of each basic block should
fundamentally be between 2,000 and 10,000 gates. [1] If the size of each of the four
upper-level blocks under TOP (geometo, lendar, cpuif, video) is between 2,000 and 10,000
gates, as illustrated in Figure 1-28, then these four blocks will constitute the basic blocks.
If any of the geometo, lendar, cpuif or video blocks is larger than 20,000 gates,
then the next lower levels (gtexc, gclip, gmap) would constitute the basic blocks.
* Logic synthesis is usually performed from these basic blocks.
* The hierarchical structure of the basic blocks is maintained by logic synthesis
and is passed on to the layout tool.
* Levels below the basic blocks are referred to as sub-blocks.
Sub-block hierarchies are frequently ungrouped by the logic synthesis tool; therefore,
their hierarchical levels are not taken into account when the design is transferred to the
layout tool.
In the RTL TOP hierarchy, only hierarchical blocks should be placed and no logic gates
should be placed directly. [2] If an RTL description is written to generate logic gates in the
TOP hierarchy, it will be problematic to optimize these gates during logic synthesis.
In addition to RTL TOP descriptions, the topmost hierarchy may contain I/O cells, clock
generation modules, RAM and ROM. Logic gates should never be placed directly. [4]
[Figure: topmost hierarchy containing RAM, ROM, the clock generation module, and the
geometo, lendar, cpuif and video blocks]
Logic synthesis is performed in units of basic blocks. If there is an RTL description in the
top level that contains logic, it may be difficult to synthesize, and it will be difficult to
increase the operating speed even if logic synthesis using hierarchical compile is performed.
If there are logic gates that do not belong to any level, it will be difficult to place cells in
their proper locations during the layout phase, circuit speed will slow down, and wiring
efficiency will decrease.
Logic synthesis performance will not be adversely affected if the basic blocks have up to
about 20,000 gates, thanks in part to recent improvements in the performance of logic
synthesis tools. However, the faster the operating speed required by the circuit, the
smaller the basic block should be made; otherwise, the required performance cannot be met.
At higher operating speeds, a smaller basic block is also advantageous to layout tools.
The limitation of basic block size is one constraint to be observed. However, the circuit
structure constraint noted in “1.6.2. Make basic blocks FF output & combinational circuit
input” is even more important. Size constraints will be meaningless if the circuit
structure constraint is not observed.
[1] Make all basic blocks combinational circuit input and FF output (recommend 3)
[2] When the above is impossible, have the timing path cover no more than
    two blocks (recommend 1)
[3] The above restrictions do not apply to smaller levels below the basic blocks
    (2,000 to 10,000 gates in scale) (reference)
Explanation
The outputs of basic blocks should be FF outputs whenever possible. [1] Adhering to this
rule brings advantages during the synthesis phase such as:
* The drive capacity and output arrival time at the outputs of basic blocks will be clearer.
* The input delay attributes (set_input_delay) and the input drive capacity values
  (set_driving_cell) provided during block synthesis can be given with greater consistency.
As a result, the quality of the optimized circuit can be enhanced.
Also, the timing path can be kept within a single block when such a circuit structure is
employed. As a result, it is possible to prevent excessive loss of speed during layout since
the wiring length is restricted. Consider this to be a mandatory circuit structure for
designs that require tight timing constraints.
[Figure: blocks C and D, each consisting of LOGIC followed by FF (combinational circuit
input, FF output)]
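A basic block with combinational circuit input and FF output could look like the sketch
below. The module name blk_c, the adder used as the input logic, and the port names are
assumptions for illustration only.

```verilog
// Hypothetical basic block: inputs feed combinational logic directly,
// and every output port is driven by an FF.
module blk_c (
    input  wire       clk,
    input  wire       rst_x,      // asynchronous reset, negative logic
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] sum_q       // FF output
);
    wire [7:0] sum;

    assign sum = a + b;           // combinational logic on the input side

    always @(posedge clk or negedge rst_x) begin
        if (!rst_x)
            sum_q <= 8'h00;
        else
            sum_q <= sum;         // registered before leaving the block
    end
endmodule
```

When every basic block is written this way, a path that starts at an FF output of one
block ends at an FF inside the next block, so no path crosses more than one block
boundary.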
Nevertheless, in actual design it is difficult to make all basic blocks combinational circuit
input and FF output. In certain situations, FF input and combinational circuit output
will result. Even in these cases, however, you should observe the restriction in
“Figure 1-30 Make paths two-modules paths if possible”. [2] This is because if there is a
path that passes through a basic block as combinational logic only, it becomes very
difficult to improve the speed of this path in the layout.
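[Figure: block C (FF followed by LOGIC) connected to block D (LOGIC followed by FF)]
Figure 1-30 Make paths two-modules paths if possible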
1621(W3) Signal "<name>" outputted from this module as a combinational circuit.
If a path passes through a combinational circuit in a module that is larger than the value set
by the variable basic_block_line_number (default=500), the following message is output.
1622(W1) There is a path that passes through a combinational circuit.
[1] Limit paths that are critical in terms of speed to within two sub-blocks
    inside each basic block, if possible (reference)
[2] If there are any speed issues, stay within three sub-blocks
    whenever possible (recommend 3)
Explanation
Item “1.6.2. Make basic blocks FF output & combinational circuit input” is not applicable
to sub-blocks (levels below the basic blocks). However, it is not desirable to span a large
number of sub-blocks for designs that are tight in terms of speed. It is advisable to avoid
the use of designs in which timing paths span multiple blocks as much as possible. [1] Try
to keep the timing path to within no more than three sub-blocks, especially if there are
any speed problems. [2]
Figure 1-31 Keep paths to within three modules even inside sub-block
Optimizing combinational circuits which span different hierarchical levels presents the
following problems:
mented by combinational circuits, there are cases in which the wiring will become
longer and the timing will deteriorate.
When the operating speed is somewhat fast (100 MHz at 0.13 um), the above restrictions
RTqualify changes variables and uses 1611 and 1612 to check within sub-blocks.
[1] The upper levels of basic blocks should contain only the connections
    of each block (mandatory)
[2] Paths with severe speed constraints should only contain connections of
    each block, even in the sub-blocks (recommend 3)
[3] If the scale of a basic block is about 10,000 gates, the number of I/O ports
    should be 200 or fewer (recommend 2)
Explanation
Do not place logic (except for I/O cells and CTS buffers) in the upper levels of basic blocks.[1]
This is because a timing path that spans three blocks will result, as illustrated in Figure
1-32, even if only one AND gate is placed there. This would violate the constraint
in “1.6.2. Make basic blocks FF output & combinational circuit input”, which states
that the timing paths of basic blocks must span no more than two blocks. If such a path
exists, floor planning in the layout will become more difficult and a very long
timing path may result. There may also be cases in which logic cells that do
not belong to any basic block cannot be placed in appropriate positions in the layout.
This makes floor planning problematic and makes layouts that take speed into account
impossible.
When a timing path passes across multiple blocks, it becomes difficult to distribute the
optimum timing constraint necessary for properly optimizing the timing path. Also, when
synthesizing upper level hierarchies in which multiple basic blocks exist (50,000 to 200,000
gates), synthesis will take a long time to run and it will become difficult to use synthesis
methods that improve the operating speed. Any combinational circuits not belonging to
any basic blocks will make certain logic synthesis methods impossible.
Selectors for binding busses are sometimes placed in the upper level hierarchies of designs
in which busses exist. Also, selectors used for sharing I/O pins with test I/O are placed in
the top level (Figure 1-33). You should consider it a prerequisite to at least modularize
such logic.
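As a minimal sketch of what modularizing such logic can mean, the selector that shares
an I/O pin between the normal output and a test output can be written as its own small
module instead of as an assign statement in the top level. The module name pin_sel and
all port names are hypothetical.

```verilog
// Hypothetical selector module for sharing an output pin with test I/O.
// The upper level instantiates this module and contains no logic itself.
module pin_sel (
    input  wire test_mode,    // 1: drive the pin from the test output
    input  wire func_out,     // normal functional output
    input  wire test_out,     // test output
    output wire pad_out       // signal going to the I/O cell
);
    assign pad_out = test_mode ? test_out : func_out;
endmodule
```

The selector then belongs to a named level of hierarchy, so the synthesis and layout
tools can treat it like any other block.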
in each level of hierarchy and to implement an efficient layout. Thus, by placing small
selectors as modules in a hierarchical level higher than the basic blocks, paths that span
two or three blocks may exist. Bus signals in particular are typically 16 or 32 bits wide,
so the wiring area will expand.
If possible, try to include such selectors inside some of the blocks in your design. However,
even if a selector to bundle the bus is included inside the ctrl block, the output from alu
will pass through the ctrl block and therefore does not follow the guideline of the timing
[Figure: block diagram including the alu and ram blocks. Note in figure: it is ideal to
have the above two selectors under separate hierarchical blocks.]
If possible, it is best not to use a bus structure but rather to exchange signals directly,
such as from alu to arbter or from ram to ctrl. It may appear to be disadvantageous
since the number of ports for each module will increase, but this method offers a number
of advantages with current layout tools.
However, in a design where signals are supplied from alu to all the other blocks, and from
each of ctrl, ram and arbter to all the other blocks, the number of I/O ports for each block
increases so much that there is no alternative but to choose a bus structure. Whether
a bus structure is needed is determined at the time of system design, which is when
consideration should be given to how to avoid using such a structure.
When the number of ports for each basic block increases, the number of paths which pass
through many blocks increases as well. In this situation, it may become difficult to perform
placement of basic blocks during the layout (floor plan) process. To increase the degree of
freedom at the time of layout, careful consideration must be given to the number of I/O ports of
each basic block. If the number of gates in basic blocks is about 10,000, 200 or fewer I/O ports
is preferable.
[1] Description styles are different for the data path section and the controller (reference)
[2] Different synthesis methods can be chosen for the data path section
    and the controller (reference)
Explanation
The description for the data path section consists primarily of FFs and operators. The
description for the controller consists mainly of control constructs such as if statements
and case statements, including encoders/decoders, and of state machine descriptions such as
those described in “2.11. State machine descriptions”. [1]
[Figure: data path example consisting of FF, selector, +/- operator, barrel shifter and selector]
Since descriptions for the data path block primarily consist of operators, most logic
synthesis tools perform processes such as allocating resources corresponding to the operators,
sharing resources, and optimizing operational expressions. See “5.6. Circuit synthesis
including operators” for more information.
Because control blocks primarily consist of state machine descriptions, logic operational
expressions, and FF descriptions, most logic synthesis tools perform logic optimization
processes such as structuring, flattening, or optimizing the state machine. See “2.11. State
machine descriptions” for more information. [2]
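The difference in description style can be illustrated with the two small modules below.
They are a sketch only: the module names dp_alu and ctl_fsm, the signals and the
three-state sequence are assumptions, not taken from this guide.

```verilog
// Data path style: FFs and operators (add/subtract, shift).
module dp_alu (
    input  wire       clk,
    input  wire       sub,        // 1: subtract, 0: add
    input  wire       load,       // register enable from the controller
    input  wire [2:0] shamt,      // shift amount
    input  wire [7:0] a,
    input  wire [7:0] b,
    output reg  [7:0] result
);
    wire [7:0] addsub;
    wire [7:0] shifted;

    assign addsub  = sub ? (a - b) : (a + b);
    assign shifted = addsub << shamt;

    always @(posedge clk)
        if (load)
            result <= shifted;
endmodule

// Controller style: state machine written with if and case statements.
module ctl_fsm (
    input  wire clk,
    input  wire rst_x,            // asynchronous reset, negative logic
    input  wire start,
    output reg  load
);
    localparam IDLE = 2'b00, EXEC = 2'b01, DONE = 2'b10;
    reg [1:0] state, next_state;

    always @(posedge clk or negedge rst_x)
        if (!rst_x) state <= IDLE;
        else        state <= next_state;

    always @(state or start) begin
        next_state = state;       // defaults avoid unintended latches
        load       = 1'b0;
        case (state)
            IDLE:    if (start) next_state = EXEC;
            EXEC:    begin load = 1'b1; next_state = DONE; end
            DONE:    next_state = IDLE;
            default: next_state = IDLE;
        endcase
    end
endmodule
```

Here the load signal is an example of a path from the control part to the data path
part.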
Separating the data path part from the control part is said to be useful in establishing
policies for RTL description and for logic synthesis. However, paths from the control to
the data path are often critical in terms of timing, and thus the control and the data path
must have meaningful hierarchical relationships to each other.
Ideally, the data path part and the control part would exist as sub-blocks of the same
basic block, so that the paths from control parts to data path parts are contained within
that basic block. However, it is difficult to place both modules in one basic block when
the circuits in the data path are large. Additionally, the relationship between control
parts and data path parts is often not one-to-one; rather, multiple control parts are
related to multiple data path parts.
It is not easy to create the ideal hierarchy structure. Thinking in terms of basic
blocks is more important than separating data path parts from control parts.
The rules described in this section should be considered an additional policy to the rules
described in “1.6.2. Make basic blocks FF output & combinational circuit input”, to be
implemented when possible.
1.6.6. Designate buffer outputs in upper levels with 200,000 or more gates
[1] ASIC I/O cells should be inserted only in the top level or the I/O cell level (recommend 3)
[2] In large-scale ASICs, it is not possible to synthesize everything
    from the top level (reference)
[3] When a higher drive buffer is required for the output of a level with
    200,000 to 800,000 gates, create a separate module containing only buffers (reference)
Recent increases in the speed of logic synthesis tools have made it possible to
perform logic synthesis on a relatively large scale. Even logic synthesis of a circuit
containing as many as 500,000 gates, starting from the uppermost level, can take as little
as four to five hours, and even on a slow system it can be completed within
two days. However, when one takes into account the execution time and performance,
500,000 to 800,000 gates is probably about the limit for performing logic synthesis. In
designs in the two to three million gate range, only timing analysis can be done at the
topmost level.[2] In such large designs, the creation of a
higher-level module for the basic blocks is extremely important. It is not particularly
desirable for a circuit design to have some modules with 800,000 gates and other modules
with 2,000 to 3,000 gates, or for the topmost hierarchy to contain 200 or 300 modules.
There are some cases in which it is difficult to completely fulfill operating speed
requirements using the logic synthesis tool within only the basic blocks. Even if logic synthesis
began with the basic blocks, fine-tuning the timing using circuits in the scope of 200,000
to 800,000 gates is a wise policy. The topmost level in the design of large LSI circuits
should have modules with no more than 200,000 to 800,000 gates.
Special caution is required regarding the inputs and outputs of the 200,000 to 800,000
gate blocks because logic synthesis using hierarchical compile to fine-tune the final
circuit is not performed in the top level. The interfaces between these blocks require long
interconnects in the layout.
When the interconnect lines are long, it is necessary to provide buffers with strong drive
capabilities. Buffers that have strong drive capabilities can be produced by the logic
synthesis tool, but a delicate balance between the added capacitance and the delay values
at the output ports is required as well, and ultimately it is difficult to derive an ideal
drive capability. If one wishes to place a higher drive buffer, it is probably best to
specify the buffer cell explicitly.
However, problems may arise when this buffer is called directly from the RTL description.
If the cell is frozen through the use of the “set_dont_touch” command in logic synthesis
for the higher drive buffer called by the RTL description, then this cell itself will not
be deleted. However, the output net of this cell may be cut off and the signal may bypass
the cell through an indirect route.
To avoid such circumstances, prepare separate modules for higher drive buffers as shown
in Figure 1-35, and use a method for calling these modules. When this is the case, as long
as "set_dont_touch" is specified for the module, the output network will never be cut off in
this way.
Adding buffers in an output port means not only securing the usage of higher drive cells
but also closing timing analysis within the block. If output signals of a module are used
not only as output ports, but also as internal signals, the delay values inside the module
change depending on additional capacitance connected to the output ports. Therefore,
buffers should be inserted just before the output port to facilitate timing analysis of a
large scale design.
With the latest versions of logic synthesis tools, the buffer just before the output port can
be inserted by a tool command. Please refer to "5.7.4.13 set_isolate_ports" for details.
Manual buffer insertion just before the output port is not necessary if the logic synthesis
tool generates these buffers.
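A separate module that contains only buffers might be sketched as follows. The module
name out_buf, the parameter W and the port names are hypothetical, and the Verilog buf
primitive is used here only as a stand-in for the high-drive buffer cell that would be
taken from the ASIC library in an actual design.

```verilog
// Hypothetical buffer-only module placed just before the output ports of a
// large block. In synthesis, the dont-touch attribute would be set on this
// module rather than on individual cells.
module out_buf #(
    parameter W = 8               // number of buffered signals (assumed)
) (
    input  wire [W-1:0] din,      // internal signals of the block
    output wire [W-1:0] dout      // connected to the block's output ports
);
    genvar i;
    generate
        for (i = 0; i < W; i = i + 1) begin : g_buf
            // Stand-in primitive; replace with a library high-drive buffer.
            buf buf_cell (dout[i], din[i]);
        end
    endgenerate
endmodule
```

Because the internal signals stay on the din side, the capacitance connected outside the
block loads only dout, and the delays inside the block no longer depend on it.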