
Asynchronous Data Transfer

Dinesh Sharma

The most common method for digital design is the synchronous model. It assumes that
signals are binary (with values ‘0’ or ‘1’) and all subcircuits in an electronic system have a
common notion of time (the existence of a global clock).

As systems have become bigger and faster, these assumptions have become harder to
justify. Modern VLSI circuits are complete systems on chip with dimensions of the order of
centimetres, with clock speeds of multiple GHz. Delays of much more than a clock period are
therefore expected for global interconnects. Complete electronic circuits involve data transfers
on chip, from chip to chip and across multiple PC boards. The synchronous model breaks
down for such systems. This has necessitated alternative timing models for data transfer
between sub-systems.

1 Timing models
Timing models for data transfer at interfaces between subsystems are divided into several
classes.

Fully synchronous: These are the traditional systems with a global clock such that the
clock frequency is the same everywhere (∆f = 0) and the phase difference ∆φ is zero,
or at most fixed and known.

Mesochronous: Here the clock source is the same everywhere, so (∆f = 0). However, clock
distribution introduces unknown delays. Thus ∆φ is unknown, though bounded.

Plesiochronous: Here the clock frequencies are nominally matched, but derived from
independent oscillators. Thus (∆f ≠ 0), but bounded. The phase will therefore drift with
time. Hence, ∆φ is unbounded.

Rationally clocked: Here, the clocks for various sub-systems are derived by dividing a
master clock by integral numbers. Thus (∆f ≠ 0), and the clock frequencies of communicating
systems are rationally related. ∆φ is cyclic in nature, but known for different clock
cycles.

Heterogeneous clocks: In this model, different subsystems have unrelated clocks. Thus
(∆f ≠ 0) and is unbounded. Obviously, there is no defined phase relationship between
the clocks of subsystems. Such systems are also known as Globally Asynchronous,
Locally Synchronous or GALS systems. Data transfer between subsystems requires special
synchronization methods. One such method involves clock stretching or Pausible Clocks.

Fully Asynchronous: These systems use no clocks. Data transfer is through asynchronous
hand-shake protocols.
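
The difference between a bounded and an unbounded ∆φ can be illustrated with a small
calculation. The sketch below assumes ideal oscillators with a constant frequency offset;
the function name and the numbers (1 GHz clocks that differ by 100 ppm, an initial offset
of 0.3 cycles) are illustrative assumptions and are not taken from these notes.

```python
def phase_difference_cycles(delta_f_hz, t_seconds, initial_phase_cycles=0.0):
    """Phase difference, in clock cycles, between two ideal clocks after t seconds.

    Assumes a constant frequency offset delta_f_hz between the two clocks.
    """
    return initial_phase_cycles + delta_f_hz * t_seconds


times = (0.0, 1e-3, 1.0)  # seconds

# Mesochronous: delta_f = 0, so the phase offset stays where it started.
meso = [phase_difference_cycles(0.0, t, initial_phase_cycles=0.3) for t in times]

# Plesiochronous: 1 GHz nominal clocks that differ by 100 ppm, so delta_f = 100 kHz.
# The phase difference keeps growing with time (it drifts without bound).
plesio = [phase_difference_cycles(100e3, t) for t in times]

print(meso)
print(plesio)
```

With these assumed numbers the two clocks slip by about a hundred cycles every
millisecond, which is why a plesiochronous interface cannot rely on a fixed sampling phase.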

2 Motivation for Asynchronous Design
Synchronous design isolates circuits into small stages divided by clocked latches or flip-flops.
This permits easy design and debugging methods because each stage can be independently
designed. For this reason, synchronous design has been the dominant design technique.
However, larger circuits and faster clock speeds have made the synchronous model difficult
to implement. The clock, which has the highest switching rate in a system, has to be
distributed over the entire system. This means the distribution network is heavily loaded. The
combination of high switching rate and high capacitive load leads to high power dissipation.
Indeed, the clock distribution network itself can account for about half (or even more) of the
total power dissipation of a complex chip. Additionally, the slowest element along a clocked
path determines the fastest clock that can be used, and hence the performance is limited by
the slowest stage in a design.

This has necessitated a move to asynchronous design. A. L. Davis summarized the
top ten reasons for asynchronous design at the Async conference of 1994. He claimed that
asynchronous circuits

1. Achieve Average Case Performance,

• Exploit data-dependent processing times
• Best if difference between average and worst case is large
• Be careful not to spend too much time on completion detection

2. Consume power only when needed,

• CMOS, in particular, consumes power only during transitions
• Clocks make a lot of transitions, not all of them do useful work
• Demonstrated ability for async circuits to consume power only on demand

3. Provide easy modular composition,

• “LEGO” approach
• Allows incremental improvement
• Object-oriented approach to hardware
• Operating parameter robustness

4. Do not require clock alignment at interfaces,

• Synchronizing an incoming signal to a clock requires great care and wastes time.
• Metastability can cause hard-to-find errors.
• Naturally adaptive to a variety of data rates.

5. Metastability has time to resolve,

• Any bistable device can get caught in a metastable region for an unpredictable
amount of time
• Assuming fixed resolution time leaves possibility of errors
• Arbiters can be used to ensure correctness

6. Avoid clock distribution problems,

• Major design time drain
• Major power budget drain
• Major chip area drain

7. Exploit concurrency more gracefully,

• Natural way to describe systems with lots of concurrency
• Let concurrency happen rather than plan all interleavings

8. Provide intellectual challenge,

• Lots of good puzzles
• Informal reasoning is dangerous
• Room for innovation

9. Exhibit intrinsic elegance,

• Provide direct mapping of sequence domain
• Tangible target for theoretical work
• Correct-by-construction design
• Measurement vs. trust

10. Global synchrony does not exist anyway!

• High clock speeds, large chips, and even larger systems
• Global synchrony is a useful abstraction, but it is not reality
• Requires great care, and wastes time
• May as well admit it, and figure out where async techniques can help solve problems

These are fair arguments, but even a quarter century after these claims were made,
synchronous design remains the dominant design model. This is because testing of
asynchronous circuits remains a problem and the actual gains in practice are not as dramatic
as were expected. However, many concepts which were analyzed for the design of asynchronous
systems have found their way into modern system design. Also, some kinds of asynchronous
circuits, such as First In First Out buffers (FIFOs), are in wide use now.

3 Asynchronous Hand Shake protocols


Let us look at some of the hand shake protocols used for asynchronous data transfer.

3.1 Four Phase Request-Acknowledge Protocol


The widely used Request-Acknowledge protocol uses 4 phases.

• The initiator puts up Req to send/receive data.

• The responder puts up Ack when it has received/sent data. This can take any amount of
time after Req has been asserted.

• The initiator removes Req on seeing Ack.

• The responder removes Ack on removal of Req.

[Figure: Req and Ack waveforms.]


The last two phases are used just to return the signals to their passive state.
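
The four phases can be written out as a sequence of (Req, Ack) levels. The following is a
minimal sketch in Python; the function name is hypothetical and the model ignores all
timing, showing only the order in which the two signals change.

```python
def four_phase_transfer():
    """Return the (Req, Ack) levels after each phase of a single transfer."""
    req, ack = 0, 0                 # passive state before the transfer
    trace = [(req, ack)]

    req = 1                         # phase 1: initiator puts up Req
    trace.append((req, ack))
    ack = 1                         # phase 2: responder puts up Ack
    trace.append((req, ack))
    req = 0                         # phase 3: initiator removes Req on seeing Ack
    trace.append((req, ack))
    ack = 0                         # phase 4: responder removes Ack
    trace.append((req, ack))
    return trace


trace = four_phase_transfer()
print(trace)  # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

One transfer therefore costs four signal transitions, and the signals end in the same
passive state in which they started.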

3.2 Two Phase Request-Acknowledge Protocol
Faster data transfer is possible if we use events (change in logic value) rather than logic levels
to signal request and acknowledge. This is used in two phase Request-Acknowledge protocol.
Any change in the state of Req or Ack is considered an event.

• The initiator toggles Req to send/receive data. (Could be 0 → 1 or 1 → 0).

• The responder toggles Ack when data has been received/sent. (Could be 0 → 1 or 1 → 0).

[Figure: Req and Ack waveforms; each Req transition marks a new data transfer.]

There is no need to have states where signals return to their passive states. Faster data
transfer is therefore possible. However, this requires event signaling: we need to have logic
elements which operate on events rather than binary levels.
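
A minimal Python sketch of the two phase protocol is given below. As before, the function
name is hypothetical and only the order of the events is modelled, not their timing.

```python
def two_phase_transfers(n, req=0, ack=0):
    """Return the (Req, Ack) levels seen while performing n transfers."""
    trace = [(req, ack)]
    for _ in range(n):
        req ^= 1                    # initiator toggles Req: a request event
        trace.append((req, ack))
        ack ^= 1                    # responder toggles Ack: an acknowledge event
        trace.append((req, ack))
    return trace


trace = two_phase_transfers(2)
print(trace)  # [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]
```

Here four signal transitions complete two transfers, whereas the four phase protocol
spends the same four transitions on a single transfer.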

4 Logic components for Event based logic


Several circuit configurations have been developed for event based logic. Since event sensing
requires the previous as well as the current state, logic which appears combinational often
requires storage elements as well. Another technique is to use two control signals and devise
structures which depend on whether these are equal or unequal in their logic value. Any
change in value of either control signal will change equal to unequal and vice versa. If the
output depends on equality or inequality of the two control signals, it will be event sensitive.
Often the output is also an event and a transition from ‘0’ to ‘1’ or from ‘1’ to ‘0’ is considered
equivalent.

4.1 The C element


• The C element is one such hardware structure. It uses 2 PMOS and 2 NMOS transistors,
all in series.

• The output of this 4 transistor structure goes to a capacitor, which stores state.

• The logic level on the capacitor is inverted to generate the final output.

[Figure: dynamic C element connected between Vdd and Gnd, with inputs A and B and output Out.]

When A and B are unequal, one of these should be ‘0’ and the other ‘1’. Thus, one out
of the two series connected NMOS transistors is off. Similarly, one out of the two series
connected PMOS transistors is off. In this case both the pull up and pull down are disabled
and the capacitor holds its previous value. So the output remains at its previous value.

When both inputs are ‘0’, the P channel transistors are ON while the N channel transistors
are OFF in the first stage. The capacitor charges up to ‘1’ and so the output is ‘0’.

When both inputs are ‘1’, the N channel transistors are ON while the P channel transistors
are OFF. The capacitor is discharged and the output goes to ‘1’.

Thus when the inputs are equal, the output is the same as the inputs. When the inputs are
unequal, the C element holds its previous state.

Due to this behaviour, the C element acts as a logical AND of events. Assume that both
inputs are at ‘0’ initially. So the output is also at ‘0’. If either input goes to ‘1’, the inputs
become unequal and the output holds its previous value of ‘0’. Subsequently, if the other input
also goes to ‘1’, both inputs are now equal and are at ‘1’. So the output will be driven to ‘1’.
If the other input does not go to ‘1’ but the first one returns to ‘0’, both inputs are equal to
‘0’ and the output remains ‘0’. Thus the output has an “event” only when both inputs have
had an event.

Assume that both inputs are at ‘1’ initially. So the output is also at ‘1’. If either input goes
to ‘0’, the inputs become unequal and the output holds its previous value of ‘1’. Subsequently,
if the other input also goes to ‘0’, the output will be driven to ‘0’. Thus the output again has
an “event” only when both inputs have had an event.
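
The behaviour described above can be checked with a small switch-level model. This is a
sketch in Python with a hypothetical class name; it treats the transistors as ideal
switches and the capacitor as a perfect storage node, which ignores leakage and all timing.

```python
class DynamicCElement:
    """Series PMOS/NMOS stack driving a storage capacitor, followed by an inverter."""

    def __init__(self, out=0):
        self.node = 1 - out          # the capacitor holds the inverse of the output

    def update(self, a, b):
        if a == 0 and b == 0:        # both PMOS ON: capacitor charges to '1'
            self.node = 1
        elif a == 1 and b == 1:      # both NMOS ON: capacitor discharges to '0'
            self.node = 0
        # Unequal inputs: pull up and pull down are both disabled, the node holds.
        return 1 - self.node         # output inverter


c = DynamicCElement(out=0)
inputs = [(1, 0), (1, 1), (0, 1), (0, 0)]
outputs = [c.update(a, b) for a, b in inputs]
print(outputs)  # [0, 1, 1, 0]
```

The output changes only at the second and fourth steps, that is, only after both inputs
have had an event.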

The C element output has an event only when both its inputs have an event. Thus the C
element performs “AND” logic on events. It is therefore represented as an AND gate with a C
inside the symbol.

[Figure: dynamic C element circuit with inputs A and B and output Out, and its symbol.]

If we want to use a static C element without relying on the capacitor to hold the previous
state, we can use the circuit shown in the figure. We make use of the fact that state needs to
be stored only when the inputs are unequal.

[Figure: static C element connected between Vdd and Gnd, with inputs A and B and output Out.]

When inputs are unequal, one of these must be a ‘1’ and the other must be ‘0’. This ensures
that in each of the parallel PMOS and NMOS structures used to power the second (last)
inverter, one of the switches is ON, so this inverter is powered. For unequal inputs, the first
stage has both pull up and pull down disabled, as was the case for the dynamic C element.
However, the powered second inverter forms a latch with the first inverter, which retains the
previous state.

If both inputs are ‘0’, the series connected PMOS transistors are ON, while the series
connected NMOS transistors are OFF in the first stage. This outputs a ‘1’, which is inverted to
‘0’ by the first inverter. The parallel connected PMOS transistors are ON, while the parallel
connected NMOS transistors are OFF. The PMOS pull up of the second inverter is ON
because the output of the first inverter is ‘0’. So the output of the second inverter is ‘1’. (This
helps the output of the first stage, to which it is shorted).
So the static C element output is ‘0’ when both inputs are ‘0’.

If both inputs are ‘1’, the series connected PMOS transistors are OFF, while the series
connected NMOS transistors are ON in the first stage. This outputs a ‘0’, which is inverted
to ‘1’ by the first inverter. The parallel connected PMOS transistors are OFF, while the
parallel connected NMOS transistors are ON. The NMOS pull down of the second inverter is
ON because the output of the first inverter is ‘1’. So the output of the second inverter is ‘0’,
(which helps the output of the first stage, to which it is shorted).
Thus the behaviour of the static C element is the same as that of the dynamic C element.

4.2 The XOR logic gate as an OR of events


We have seen that the C element acts as an AND of events. What about the OR function for
events? We need to look no further than the conventional XOR gate. Its output is already
dependent on equality or inequality of its inputs. If either input has an event, equal becomes
unequal or vice versa. This causes the output to change. Thus the output has an event if
either input has an event.
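
A short Python sketch makes this concrete. The helper names and the input traces are
illustrative assumptions. Only one input changes at a time in this example; if both inputs
changed at the same instant, the two events would cancel and the XOR output would not
change.

```python
def xor_gate(a, b):
    return a ^ b


def events(trace):
    """Indices at which a signal changes value."""
    return [i for i in range(1, len(trace)) if trace[i] != trace[i - 1]]


a_trace = [0, 1, 1, 1, 0]            # events at steps 1 and 4
b_trace = [0, 0, 0, 1, 1]            # event at step 3
out_trace = [xor_gate(a, b) for a, b in zip(a_trace, b_trace)]

print(out_trace)          # [0, 1, 1, 0, 1]
print(events(out_trace))  # [1, 3, 4]
```

The output has an event at every step where either input has one.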

4.3 The selector element


By itself, the selector element is not an event sensitive circuit. However, it is used in event
sensitive storage elements. The selector element is also used in conventional circuits as a
multiplexer.

[Figure: select element circuit with inputs X and Y, control C and output Z, and its two way
switch symbol.]

• When C = ‘0’, the PMOS and NMOS connected to supply and ground in the left half are ON
while the right half is OFF. So Z = X.

• When C = ‘1’, the PMOS and NMOS connected to supply and ground in the right half are ON
while the left half is OFF. So Z = Y.

Thus, the behaviour of the circuit is that of a two way switch followed by inversion, as shown
in the figure.
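
A behavioural sketch of the selector in Python is given below. The function name is
hypothetical. The bullets above write Z = X and Z = Y for the selected input; this sketch
follows the closing sentence instead and applies the inversion explicitly, which is an
interpretation of the text rather than something read off the circuit.

```python
def select_element(c, x, y):
    """Two way switch followed by inversion: selects X when C is 0, Y when C is 1."""
    selected = x if c == 0 else y
    return 1 - selected              # assumed inverting output stage


for c in (0, 1):
    for x, y in ((0, 1), (1, 0)):
        print(c, x, y, select_element(c, x, y))
```

With C = 0 the output depends only on X, and with C = 1 only on Y, as in a conventional
(inverting) multiplexer.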

4.4 Capture Element


We can construct an event sensitive latch using selector elements. The circuit has two control
inputs C1 and C2. Assume that the selector elements are so configured that C1 = ‘1’ puts the
selectors S1 and S2 in the up position, while C2 = ‘1’ puts its selector S3 in the down position.

[Figure: capture element built from selectors S1, S2 and S3, with data input In, output Out
and control inputs C1 and C2.]

C1 and C2 can be either equal or unequal.

The circuit behaviour is the same when C1 = 0, C2 = 0 or when C1 = 1, C2 = 1.
Similarly, the circuit behaviour is the same for the two unequal combinations C1 = 0, C2 = 1 or
C1 = 1, C2 = 0. This makes it event sensitive (similar to the C element).

When C1 = C2, both can be ‘1’ or both can be ‘0’.

When both are ‘1’, the two switches on the left (S1 and S2) will be up and the right switch
(S3) will be down. Data will flow through S2, the lower inverter on the left, S3 and the
output inverter.

When both are ‘0’, the two switches on the left (S1 and S2) will be down and the switch on
the right (S3) will be up. Data will flow through S1, the upper input inverter, S3 and the
output inverter.

[Figure: capture element switch positions for C1 = C2 = 1 and for C1 = C2 = 0; data buffered
from In to Out.]

In either case, the input is just buffered to the output when the control inputs are equal.

When C1 ≠ C2, either C1 = 0, C2 = 1 and all switches are down; or C1 = 1, C2 = 0 and all
switches are up.

When all switches are in the down position, the lower input inverter and the output inverter
will form a latch through S2 and S3.

When all switches are in the up position, the upper input inverter and the output inverter
will form a latch through S1 and S3.

[Figure: capture element switch positions for C1 = 0, C2 = 1 and for C1 = 1, C2 = 0; input
data latched.]

In either case the data present at the input when this condition occurred will be latched and
the output will be isolated from the input.
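
The behaviour of the capture element can be summarised in a few lines of Python. This is a
behavioural sketch only, with a hypothetical class name: it abstracts away the selectors and
inverters and keeps just the rule that equal control inputs make the element transparent
while unequal control inputs make it hold.

```python
class CaptureElement:
    """Event sensitive latch: transparent when C1 == C2, latched when C1 != C2."""

    def __init__(self, out=0):
        self.out = out

    def update(self, data_in, c1, c2):
        if c1 == c2:
            self.out = data_in       # input is buffered to the output
        # Unequal controls: output is isolated from the input and holds its value.
        return self.out


cap = CaptureElement()
seen = [
    cap.update(1, 0, 0),   # transparent: output follows In = 1
    cap.update(0, 1, 0),   # event on C1: latched, In = 0 is ignored
    cap.update(0, 1, 1),   # event on C2: transparent again, output follows In = 0
    cap.update(1, 0, 1),   # event on C1: latched again
]
print(seen)  # [1, 1, 0, 0]
```

Each event on either control input switches the element between the transparent and the
latched condition, whatever the direction of the transition.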

5 Two Phase Pipeline


We can now construct a two phase event sensitive pipeline. Only the control signals are
described below. It is assumed that data will be latched using event sensitive latches. Assume
initially that all Req and Ack signals are ‘0’ and the FIFO is empty.
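[Figure: four stage pipeline control built from four C elements and four delay elements.
Signals, from left to right: Rin and Ain, A1 and R1, R2 and A2, A3 and R3, Rout and Aout.]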

• When the external source of data on the left has data available, it will toggle the Rin
line.

• Now Rin = ‘1’, A1 = ‘0’. The acknowledge input of the C element is inverted, so both
inputs of the first C element are ‘1’ and its output Ain = ‘1’.

• This latches the data (using unequal values of Rin and A1) and acknowledges it to the
source on the left. This stage is now ‘full’.

Suppose another request comes when the first stage is still full. The sender will toggle Rin
to ‘0’ now. A1 is still at ‘0’.

Since the two inputs to the leftmost C element are unequal, it will hold its output. No
acknowledge event is generated. The previous stage will continue to hold its data and Rin
lines, waiting for an acknowledge event.

After generation of an event on Ain, it appears as request R1 to the next stage with
some delay. Now R1 = ‘1’, A2 = ‘0’. Both inputs of the second C element are ‘1’, so output A1
= ‘1’. Data from the first stage is now latched into the second stage using unequal R1 and A2.

Thus the second stage has accepted data and the first stage is empty again. It can now
accept fresh data from the source on the left.

(Set up a VHDL or Verilog description for this pipeline with a depth of 4 stages and test it
for various scenarios: fresh input arriving before the data has been transferred to the second
stage, fresh input arriving after the data has been transferred etc.)
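
Before writing the hardware description, it can help to try the control logic in software.
The following is a rough Python sketch, not a VHDL or Verilog solution to the exercise. It
models only the chain of C elements, under these assumptions: the acknowledge input of every
C element is inverted (as implied by the walk-through above), each delay element is
approximated by one evaluation step, and the data path is not modelled. The class and method
names are hypothetical.

```python
class TwoPhasePipeline:
    """Control signals of a two phase pipeline built from a chain of C elements."""

    def __init__(self, depth=4):
        self.c = [0] * depth   # C element outputs: Ain, A1, A2, A3 for depth 4
        self.rin = 0           # request from the source on the left
        self.aout = 0          # acknowledge from the sink on the right

    def step(self):
        """Evaluate every C element once from the previous outputs (one unit delay)."""
        old = self.c
        new = []
        for i, held in enumerate(old):
            req = self.rin if i == 0 else old[i - 1]
            ack = self.aout if i == len(old) - 1 else old[i + 1]
            not_ack = 1 - ack                      # assumed inverted acknowledge input
            new.append(req if req == not_ack else held)
        self.c = new
        return new != old                          # True if any output had an event

    def settle(self, max_steps=100):
        for _ in range(max_steps):
            if not self.step():
                return
        raise RuntimeError("control signals did not settle")


p = TwoPhasePipeline(depth=4)

p.rin = 1                      # first item: the source toggles Rin
p.step()
after_first = list(p.c)        # Ain has toggled: the first stage has accepted the item

p.rin = 0                      # fresh input before the item has moved to the second stage
p.step()
after_second = list(p.c)       # Ain holds: no acknowledge event for the new request yet

p.settle()                     # let both items move as far right as they can
after_settle = list(p.c)

p.aout = 1                     # the sink acknowledges the first item
p.settle()
after_ack = list(p.c)

print(after_first, after_second, after_settle, after_ack)
```

In this run the second request is acknowledged only after the first item has moved on to the
second stage, and the last C element output (Rout in this model) toggles again once the sink
acknowledges the first item. The same scenarios can then be repeated against the VHDL or
Verilog description.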
