ISSCC 2023 Tutorials: Digital Equalization and Timing Recovery Techniques For ADC-DSP-based Highspeed Links
In recent years, ADC-DSP solutions have been rapidly deployed at data rates of 56+ Gb/s in nanometer FinFET technologies.
Masum Hossain T11: ADC-DSP-based Digital Equalization 7 of 108
1 UI = 100 ps @ 10 Gb/s
[Figure: single-bit pattern 0 0 1 0 0; x-axis: Frequency (GHz)]
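[Figure: received levels h0+h1, h0-h1, -h0+h1, -h0-h1 for the bit patterns 1 1, 1 0, 0 1, 0 0, with the decision threshold in the middle; main cursor h0 and post-cursor h1]
Combined ‘Noise’ and ‘ISI’ PDFs can be extended to estimate the achievable ‘BER’.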
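The idea of combining the ISI and noise PDFs can be sketched as follows. This is a minimal illustration, assuming NRZ symbols of +/-1, a decision threshold at 0, Gaussian noise, and equally likely ISI patterns; the function names and the example values (`h0 = 1.0`, `h1 = 0.3`, `sigma = 0.1`) are hypothetical and not taken from the tutorial.

```python
import math
from itertools import product

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def ber_with_isi(h0, isi_taps, sigma):
    """Average BER for NRZ (+/-1 symbols) sliced at 0.

    Each equally likely ISI pattern shifts the received '1' level to
    h0 + sum(+/- h_i); the Gaussian noise PDF is then integrated past
    the decision threshold. The '0' level behaves the same by symmetry.
    """
    patterns = list(product((-1.0, 1.0), repeat=len(isi_taps)))
    total = 0.0
    for pattern in patterns:
        level = h0 + sum(s * h for s, h in zip(pattern, isi_taps))
        total += q_func(level / sigma)
    return total / len(patterns)

ber_clean = ber_with_isi(1.0, [], 0.1)      # no ISI: single level at h0
ber_isi = ber_with_isi(1.0, [0.3], 0.1)     # levels at h0 + h1 and h0 - h1
print(f"no ISI: {ber_clean:.3e}   with h1: {ber_isi:.3e}")
```

The worst-case level h0 - h1 dominates the average, which is why even a modest post-cursor raises the estimated BER by many orders of magnitude.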
[Figure: Rx CDR and signal slope: sample where the slope is max. Rx CDR outputs: data and recovered clock.]
[Figure: Rx CDR jitter transfer |HCDR| versus f, with corner frequency fCDR; data and recovered-clock zero crossings; UI of jitter versus time. Panels: within the tracking bandwidth; outside the tracking bandwidth.]
Within the tracking bandwidth (fjitter << fCDR), the CDR can track several UI of jitter.
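The tracking behaviour can be illustrated numerically. This sketch assumes a first-order loop, H_CDR(f) = 1 / (1 + j f / f_CDR), which is a simplification and not necessarily the loop used in the tutorial; `margin_ui` (the untracked jitter the sampler can tolerate) is a hypothetical parameter.

```python
def cdr_tracking(f_jitter, f_cdr):
    """Return (|H_CDR|, |1 - H_CDR|) for an assumed first-order loop."""
    h = 1.0 / complex(1.0, f_jitter / f_cdr)
    return abs(h), abs(1.0 - h)

def tolerable_jitter_ui(f_jitter, f_cdr, margin_ui=0.3):
    """Sinusoidal input jitter (UI) that leaves margin_ui of untracked jitter."""
    _, untracked = cdr_tracking(f_jitter, f_cdr)
    return margin_ui / untracked

f_cdr = 10e6  # hypothetical 10 MHz tracking bandwidth
for f in (0.1e6, 10e6, 1e9):
    print(f"{f:10.0f} Hz -> {tolerable_jitter_ui(f, f_cdr):8.2f} UI")
```

Well below f_CDR the tolerable jitter is many UI; well above it the tolerance flattens to the sampler's own margin.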
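[Figure: equalizer stage with M1, R, C, RL, CL and its frequency response; x-axis: Frequency (GHz)]
$$A = A_{DC}\,\frac{1 + s/\omega_Z}{\left(1 + s/\omega_{P1}\right)\left(1 + s/\omega_{P2}\right)}$$
$$\omega_Z = \frac{1}{RC}, \qquad \omega_{P1} = \frac{1 + g_m R/2}{RC}, \qquad \omega_{P2} = \frac{1}{R_L C_L}$$
$$A_{DC} = g_{m\text{-eff}}\,R_L = \frac{g_m R_L}{1 + g_m R/2}$$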
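The transfer function above can be evaluated directly. The sketch below plugs in hypothetical component values (gm = 20 mS, R = 400 ohm, C = 200 fF, RL = 100 ohm, CL = 30 fF) purely for illustration; they are not taken from the tutorial.

```python
import math

def ctle_params(gm, R, C, RL, CL):
    """Zero, poles (rad/s) and DC gain from the expressions above."""
    degeneration = 1.0 + gm * R / 2.0
    return {
        "wz": 1.0 / (R * C),
        "wp1": degeneration / (R * C),
        "wp2": 1.0 / (RL * CL),
        "adc": gm * RL / degeneration,
    }

def ctle_gain_db(f_hz, p):
    """|A(j*2*pi*f)| in dB for the one-zero, two-pole response."""
    s = 1j * 2.0 * math.pi * f_hz
    a = p["adc"] * (1 + s / p["wz"]) / ((1 + s / p["wp1"]) * (1 + s / p["wp2"]))
    return 20.0 * math.log10(abs(a))

p = ctle_params(gm=20e-3, R=400.0, C=200e-15, RL=100.0, CL=30e-15)
for f in (1e6, 2e9, 10e9, 50e9):
    print(f"{f / 1e9:7.3f} GHz  {ctle_gain_db(f, p):6.2f} dB")
```

The ratio of the first pole to the zero equals 1 + gm R / 2, so the degeneration factor sets the ideal high-frequency peaking relative to DC.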
A combination of DFE and CTLE can achieve 20 Gb/s while compensating 20 dB of loss.
[Figure: frequency response; x-axis: Frequency (GHz)]
[Sontag’06]
Area efficient
Noiseless digital accumulator + DPC
Synthesizable and portable
Less supply sensitivity
[Figure: phase interpolator. Labels: Phase Rotation, I-clk, Q-clk, I, IB, Q, QB, PI clk, Dig. Clk, +1 / 0 / -1 versus time, Quad 3, Quad 4, VDD, MUXI, MUXIB, MUXQ, MUXQB, OUTMIX, OUTBMIX]
DT Phase Accumulator = Digital accumulator + digital-to-phase converter (Mixer)
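A behavioural sketch of the accumulator plus digital-to-phase converter is given below. It assumes 16 steps per quadrant and linear mixing of adjacent quadrature clocks; both choices, and all names, are hypothetical and chosen only to show the structure.

```python
import math

STEPS_PER_QUADRANT = 16               # hypothetical resolution
N_CODES = 4 * STEPS_PER_QUADRANT

def accumulate(code, vote):
    """Digital accumulator: vote is +1 (advance), 0 (hold) or -1 (retard)."""
    return (code + vote) % N_CODES

def code_to_iq(code):
    """Digital-to-phase conversion: pick the quadrant (clock polarities),
    then linearly mix the two adjacent quadrature clocks."""
    quadrant, step = divmod(code, STEPS_PER_QUADRANT)
    a = step / STEPS_PER_QUADRANT     # mixing weight inside the quadrant
    corners = [(1, 0), (0, 1), (-1, 0), (0, -1), (1, 0)]
    (i0, q0), (i1, q1) = corners[quadrant], corners[quadrant + 1]
    return (1 - a) * i0 + a * i1, (1 - a) * q0 + a * q1

def output_phase_deg(code):
    """Phase of the mixed clock for a given accumulator code."""
    i_w, q_w = code_to_iq(code)
    return math.degrees(math.atan2(q_w, i_w)) % 360.0

code = 0
for vote in (+1, +1, 0, -1, -1, -1):  # example early/late votes
    code = accumulate(code, vote)
print(code, output_phase_deg(code))
```

Because the accumulator wraps modulo the number of codes, the phase can rotate indefinitely, which is what allows a frequency offset to be tracked.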
BiCMOS
Power          7.5 W (180 pJ/bit)     2.04 W (50 pJ/bit)     0.230 W (5.74 pJ/bit)
(w/o CMU)
Area           1.7 mm x 2 mm          1.7 mm x 2.9 mm        0.57 mm x 0.32 mm
Loop filter    Passive loop filter    Passive loop filter    Digital loop filter
Existing equalization strategy does not scale well with technology, channel loss and data rate
2-bit information is encoded in 1 UI
2x improvement in spectral efficiency
3x reduction in eye height
[Figure: NRZ and PAM4 eyes for symbols S[n-1] and S[n]; levels +1, -1, -3 shown; eye-opening marked]
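The 2-bits-per-UI mapping and the eye-height cost can be made concrete with a short sketch. The Gray mapping used here is an assumption (the slides do not state the bit-to-level assignment), and the function names are hypothetical.

```python
import math

# Assumed Gray mapping: adjacent levels differ in exactly one bit.
GRAY_PAM4 = {(0, 0): -3, (0, 1): -1, (1, 1): +1, (1, 0): +3}

def pam4_encode(bits):
    """Map a flat bit list to PAM-4 symbols, 2 bits per UI."""
    pairs = zip(bits[0::2], bits[1::2])
    return [GRAY_PAM4[pair] for pair in pairs]

def eye_height_penalty_db(levels=4):
    """Same peak swing split into (levels - 1) stacked eyes, relative to NRZ."""
    return 20.0 * math.log10(levels - 1)

bits = [0, 0, 0, 1, 1, 1, 1, 0]
print(pam4_encode(bits))              # 8 bits occupy 4 UI
print(round(eye_height_penalty_db(), 2), "dB")
```

The 3x smaller eye corresponds to roughly 9.5 dB, which is the price paid for halving the symbol rate.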
[Figure: channel response with 37.5 dB loss at 28 GHz (-37.5 dB @ 28 GHz) and single-bit response (SBR) with TxFIR, for NRZ and PAM4; axes: Frequency Response (dB), Time (second)]
[Figure panels. Before DSP EQ: sampled SBR after CTLE. Freq response of DSP FFE: sampled DSP FFE response with h.f. boost. After DSP EQ: sampled SBR after FFE-only DSP, 4 levels. Without equalization.]
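The effect of a DSP FFE on a sampled SBR can be sketched numerically. Here the taps are obtained with a least-squares fit toward a unit impulse, which is a stand-in technique chosen for illustration and not necessarily how the tutorial's taps were derived; the sampled SBR values are hypothetical.

```python
import numpy as np

def ls_ffe_taps(sbr, n_taps, delay):
    """Least-squares FFE taps that push conv(sbr, taps) toward a unit impulse."""
    sbr = np.asarray(sbr, dtype=float)
    n_out = len(sbr) + n_taps - 1
    conv_matrix = np.zeros((n_out, n_taps))
    for j in range(n_taps):
        conv_matrix[j:j + len(sbr), j] = sbr   # column j = SBR delayed by j
    target = np.zeros(n_out)
    target[delay] = 1.0
    taps, *_ = np.linalg.lstsq(conv_matrix, target, rcond=None)
    return taps

def isi_ratio(pulse):
    """Sum of |ISI cursors| divided by the main cursor (>= 1 means closed eye)."""
    pulse = np.abs(np.asarray(pulse, dtype=float))
    main = pulse.max()
    return (pulse.sum() - main) / main

sbr = [0.1, 0.6, 0.35, 0.15, 0.05]     # hypothetical sampled SBR after the CTLE
taps = ls_ffe_taps(sbr, n_taps=7, delay=3)
equalized = np.convolve(sbr, taps)
print("ISI ratio before:", round(isi_ratio(sbr), 3))
print("ISI ratio after: ", round(isi_ratio(equalized), 3))
```

The taps alternate in sign around the main tap, which is the time-domain signature of the high-frequency boost shown in the FFE frequency response.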
[Figure: DSP-based TxFIR. Taps hpre,1, hmain, hpost,1 ... hpost,N with Z-1 delays; 8-bit entries; 7-bit DAC; labels: pre2, Offset]
Add an offset value to all entries to re-center them to 0-127.
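One way to picture the offset step is a lookup-table TxFIR whose signed outputs are shifted into the unsigned DAC range. The sketch below assumes PAM-4 symbols, three taps, and a scaling that maps the largest output to +/-63; the tap values and function names are hypothetical.

```python
from itertools import product

PAM4 = (-3, -1, 1, 3)
DAC_BITS = 7
OFFSET = 1 << (DAC_BITS - 1)          # 64: mid-scale of the 0-127 DAC range

def build_lut(taps):
    """Precompute the FIR output for every symbol window, then add the
    offset so the signed result is re-centered into 0..127.
    taps are ordered to match the window, oldest symbol first."""
    peak = 3 * sum(abs(t) for t in taps)
    scale = (OFFSET - 1) / peak        # largest |output| maps to 63
    lut = {}
    for window in product(PAM4, repeat=len(taps)):
        signed = round(scale * sum(t * s for t, s in zip(taps, window)))
        lut[window] = signed + OFFSET
    return lut

def tx_codes(symbols, lut, n_taps):
    """DAC codes for a symbol stream (arbitrary +1 start-up history)."""
    padded = [1] * (n_taps - 1) + list(symbols)
    return [lut[tuple(padded[k:k + n_taps])] for k in range(len(symbols))]

lut = build_lut((-0.15, 0.75, -0.10))  # hypothetical (post1, main, pre1)
print(min(lut.values()), max(lut.values()))
print(tx_codes([3, -3, 3, 1], lut, 3))
```

With the table precomputed, the datapath at speed reduces to indexing by the recent symbols, and the DAC only ever sees unsigned codes.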
Transmitter power breakdown (mW)
Pre-driver                   17
Serializer                   11
DSP-based TxFIR              15
Clock buffer, DCD            40
Tx PLL                       45
Bias circuit                 5
Amortized global clocking    48/4 = 12
Total                        175
The DSP share of the transmitter power is progressively shrinking with technology scaling.
4nm implementation: achieves 200+ Gb/s data rate with the flexibility of modulation from PAM to OFDM.
The Tx-FIR and CTLE only partially compensate for the loss, so the ‘eye’ will remain closed.
Time-interleaved ADC outputs can also be used as delayed versions of the data.
It is easier to understand the equalization using the sampled single-bit response.
$$Q_n^{(m)} = \sum_{l=-N+1}^{0} c_{m,l}\, r_n^{(m+l)}, \qquad m = 0, 1, \ldots, M-1$$
[Agazzi’JSSC 2008]
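The parallel form above can be checked against a serial FIR. This sketch assumes lane m of block n holds serial sample r[nM + m], that a negative lane index m + l refers to the previous block, and that samples before the start are zero; these conventions and the function name are assumptions made for illustration.

```python
import numpy as np

def parallel_ffe(r, c, M):
    """Block-parallel FFE over M time-interleaved lanes.

    c has shape (M, N); c[m, -l] holds c_{m,l} for l = -N+1 .. 0.
    Returns q with q[n, m] = Q_n^(m).
    """
    r = np.asarray(r, dtype=float)
    c = np.asarray(c, dtype=float)
    N = c.shape[1]
    n_blocks = len(r) // M
    q = np.zeros((n_blocks, M))
    for n in range(n_blocks):
        for m in range(M):
            for l in range(-N + 1, 1):
                k = n * M + m + l      # serial index of lane (m + l) in block n
                if k >= 0:             # samples before the start count as 0
                    q[n, m] += c[m, -l] * r[k]
    return q

rng = np.random.default_rng(0)
r = rng.standard_normal(16)
h = np.array([0.9, -0.3, 0.1])         # hypothetical taps, identical on every lane
M = 4
q = parallel_ffe(r, np.tile(h, (M, 1)), M)
serial = np.convolve(r, h)[:len(r)].reshape(-1, M)
print(np.allclose(q, serial))
```

When every lane uses the same coefficients the result matches a serial FIR; per-lane coefficients c_{m,l} additionally let each lane absorb its own interleaving mismatch.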
[Figure: plot versus analog input frequency (1 to 7 GHz); vertical axis from -40 to -100]
Although the digital FFE output can be 2N bits, the ADC's N-bit resolution still limits us.
If the FFE can be moved ahead of the ADC, then we can minimize the ADC's quantization noise penalty.
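The resolution limit can be seen by passing the ADC's quantization error through the FFE. The sketch assumes a uniform mid-rise quantizer, a full-scale uniformly distributed input (so the error is white with variance step^2 / 12), and hypothetical FFE taps.

```python
import numpy as np

def quantize(x, n_bits, full_scale=1.0):
    """Uniform mid-rise quantizer over [-full_scale, full_scale)."""
    step = 2.0 * full_scale / (1 << n_bits)
    return (np.floor(x / step) + 0.5) * step, step

rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, 200_000)        # stand-in for the ADC input
taps = np.array([-0.2, 1.0, -0.4, 0.1])    # hypothetical FFE coefficients
xq, step = quantize(x, n_bits=6)

# The FFE is linear, so the quantization error passes through the same taps.
err_out = np.convolve(xq - x, taps, mode="valid")
measured = err_out.var()
predicted = (step ** 2 / 12.0) * np.sum(taps ** 2)
print(f"measured {measured:.3e}   predicted {predicted:.3e}")
```

However wide the FFE output word is, its noise floor is the ADC's quantization noise scaled by the sum of squared taps, which exceeds 1 whenever the FFE provides boost.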
How can we build a digital FFE with a resolution better than the ADC?
[Figure: FFE output value (-400 to 300) versus occurrence (times, 0 to 80); labels: 512 entries, selected FFE output, to LUT]
[Die photo: 1000 µm x 500 µm]
Required ADC resolution and EQ complexity are highest for 30 dB loss channels
Less lossy channels allow opportunistic power savings by reducing ADC resolution and DSP resolution.
[S.Kasturia’1991]
Loop-unrolled architecture can be implemented in the digital domain.
N-Mux and flop delay should be less than N UI to meet the timing requirements.
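A 1-tap NRZ example shows why unrolling relaxes the feedback timing. In this sketch both candidate decisions are computed ahead of time and only a mux remains in the loop; the channel, the initial decision of +1, and the function names are assumptions for illustration.

```python
import random

def dfe_direct(samples, h1):
    """Reference 1-tap DFE: feedback is subtracted before slicing."""
    decisions, prev = [], 1
    for y in samples:
        prev = 1 if (y - h1 * prev) >= 0 else -1
        decisions.append(prev)
    return decisions

def dfe_unrolled(samples, h1):
    """Loop-unrolled 1-tap DFE: both candidate decisions are precomputed
    and a mux driven by the previous decision picks one."""
    if_prev_pos = [1 if (y - h1) >= 0 else -1 for y in samples]
    if_prev_neg = [1 if (y + h1) >= 0 else -1 for y in samples]
    decisions, prev = [], 1
    for d_pos, d_neg in zip(if_prev_pos, if_prev_neg):
        prev = d_pos if prev == 1 else d_neg   # only a mux sits in the loop
        decisions.append(prev)
    return decisions

random.seed(0)
data = [random.choice((-1, 1)) for _ in range(1000)]
h1 = 0.5                                        # hypothetical post-cursor
samples = [d + h1 * p for d, p in zip(data, [1] + data[:-1])]
print(dfe_unrolled(samples, h1) == dfe_direct(samples, h1) == data)
```

The precomputed comparisons have no dependency on earlier decisions, so they can be pipelined freely; only the mux-and-flop chain has to meet the loop timing.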
$$\xi = E[e^2]$$
$$P_i(n) = P_i(n-1) + 2\mu\,\mathrm{sign}(e)\,S_i$$
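The update rule can be exercised on a toy channel. This sketch assumes P_i are FFE coefficients, S_i are the samples in the FFE delay line, and e is the difference between the known symbol and the FFE output; the channel, tap count and step size are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
data = rng.choice([-1.0, 1.0], size=20_000)
received = np.convolve(data, [1.0, 0.4])[:len(data)]   # hypothetical channel

n_taps, step = 5, 1e-3           # step plays the role of 2*mu
P = np.zeros(n_taps)
P[0] = 1.0                       # start from a pass-through equalizer
sq_err = np.zeros(len(data))

for n in range(n_taps - 1, len(data)):
    S = received[n - n_taps + 1:n + 1][::-1]   # S[i] = received[n - i]
    e = data[n] - P @ S                        # error against the known symbol
    P += step * np.sign(e) * S                 # P_i(n) = P_i(n-1) + 2*mu*sign(e)*S_i
    sq_err[n] = e * e

mse_start = sq_err[n_taps - 1:1000].mean()
mse_end = sq_err[-1000:].mean()
print(np.round(P, 3), f"MSE {mse_start:.4f} -> {mse_end:.5f}")
```

Using only the sign of the error removes a multiplier from every tap update, at the cost of a small steady-state dither set by the step size.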
Unlike data decisions, timing recovery does not require every PD decision to be correct, so we can under-equalize to reduce latency: a simpler, dedicated 3-tap FFE saves a couple of cycles.
Latency can be significantly improved if CDR can handle ISI and DDJ
[Figure: eye diagrams, amplitude (V) from -0.3 to 0.3 versus time (UI) from -1 to 1, with adjustable D+1 and D-1 thresholds. Panels: no D+1 / D-1 thresholds; smaller D+1 / D-1 thresholds; larger D+1 / D-1 thresholds.]
A higher data threshold results in a tighter edge distribution (good for DDJ performance) but reduces CDR bandwidth and may impact low-frequency SJ tracking.
By appropriately adjusting the data threshold, it is possible to improve the DDJ seen by the phase detector.
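The trade-off can be illustrated with a toy model, which is one possible reading of the thresholding idea and not the tutorial's implementation. It assumes noise-free symbol-spaced samples with residual ISI, a linear transition between neighbouring samples D-1 and D+1, and an edge that is used only when both neighbours exceed the threshold in magnitude.

```python
import numpy as np

rng = np.random.default_rng(2)
d = rng.choice([-1.0, 1.0], size=4000)
y = np.convolve(d, [1.0, 0.4, 0.2])[:len(d)]    # hypothetical residual-ISI channel

def edge_times(y, threshold):
    """Linear-interpolation zero-crossing estimate (UI, relative to the
    mid-point) for every sign change whose neighbouring samples D-1 and
    D+1 both reach the threshold in magnitude."""
    before, after = y[:-1], y[1:]
    is_edge = np.sign(before) != np.sign(after)
    strong = np.minimum(np.abs(before), np.abs(after)) >= threshold
    keep = is_edge & strong
    a, b = np.abs(before[keep]), np.abs(after[keep])
    return a / (a + b) - 0.5

all_edges = edge_times(y, threshold=0.0)
qualified = edge_times(y, threshold=0.7)
print(f"all edges:       n={len(all_edges):4d}  spread={all_edges.std():.3f} UI")
print(f"qualified edges: n={len(qualified):4d}  spread={qualified.std():.3f} UI")
```

The qualified edges cluster tightly, but far fewer of them reach the phase detector, which lowers the effective loop gain and hence the tracking bandwidth.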
Both data and edge thresholds are adapted to maximize proportional gain without exceeding the jitter target.
Comp.        Power
AFE          80 mW
ADC          195 mW
Ring PLL     35 mW
TDC + CDR    20 mW
Misc         8 mW
Total        338 mW
‘TDC’ power is overhead, but it can be kept low compared to the ‘ADC’: 10% overhead (20 mW vs 195 mW).
[Figure: data and edge samples; sampled PD output; MMPD; SBR; symbol error rate @ PAM-4; slicer input; measurement; x-axis: Time (ns)]
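Assuming MMPD in the figure denotes a Mueller-Muller phase detector, its baud-rate timing function can be sketched as below. The triangular SBR, the use of ideal decisions, and the function names are assumptions made to keep the example short.

```python
import numpy as np

def pulse(t):
    """Hypothetical symmetric SBR: triangular, 4 UI wide, peak at t = 0."""
    return np.maximum(0.0, 1.0 - np.abs(t) / 2.0)

def mmpd_mean(tau, n_sym=20_000, seed=3):
    """Average Mueller-Muller output for a sampling-phase error tau (UI)."""
    rng = np.random.default_rng(seed)
    d = rng.choice([-1.0, 1.0], size=n_sym)     # ideal decisions assumed
    cursors = pulse(np.arange(-2, 3) + tau)     # h(-2+tau) ... h(2+tau)
    y = np.convolve(d, cursors, mode="same")    # y[k] = sum_i h(i+tau) d[k-i]
    e = y[1:] * d[:-1] - y[:-1] * d[1:]         # y[k] d[k-1] - y[k-1] d[k]
    return e[2:-2].mean()                       # drop edge-affected samples

for tau in (-0.1, 0.0, 0.1):
    print(f"tau = {tau:+.1f} UI  ->  mean PD output {mmpd_mean(tau):+.3f}")
```

On average the output equals h(1 + tau) - h(-1 + tau), so the loop settles where the first pre-cursor and first post-cursor of the SBR are equal.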
At the decision point, the SNR vs BER plot correlates well with the theoretical plot.
At the FFE input, the SNR vs BER plot is offset by the FFE amplification factor.
[Figure: x-axis: SNR (dB)]
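A theoretical reference curve of the kind mentioned above can be generated from the textbook expression for equally spaced PAM in Gaussian noise. The sketch assumes SNR is defined as average signal power over noise variance at the slicer; whether the tutorial uses the same definition is not stated.

```python
import math

def q_func(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def pam_ser(snr_db, levels=4):
    """Textbook symbol error rate of equally spaced PAM in Gaussian noise,
    with SNR = average signal power / noise variance at the slicer."""
    snr = 10.0 ** (snr_db / 10.0)
    arg = math.sqrt(3.0 * snr / (levels ** 2 - 1))
    return 2.0 * (1.0 - 1.0 / levels) * q_func(arg)

for snr_db in (14, 18, 22, 26):
    print(f"{snr_db} dB  NRZ {pam_ser(snr_db, 2):.2e}  PAM-4 {pam_ser(snr_db, 4):.2e}")
```

With Gray coding the bit error rate is approximately half the PAM-4 symbol error rate, since most symbol errors land on an adjacent level.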
[Figure: FFE & 1-tap DFE versus FFE only; channel shortening to 4 taps; signal attenuation; axes: Time (ns), Freq (Hz)]
[Figure: Tx pulse response with cursors h-1, h0, h+1 versus time]
r[k] = received sample
r[k] includes both signal and noise
[Figure: candidate sequences 101, 010, 100, 001, 000 versus time, built from cursors h-1, h0, h+1]
Branch Metric:
$$e_{ij} = \left(\hat{r}_{ij}[k] - r[k]\right)^2$$
e_ij indicates how ‘likely’ it is that the received sample belongs to a particular sequence.
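The branch metric can be computed for every 3-bit sequence as sketched below. It assumes bits map to +/-1 symbols, that h-1 weights the next symbol and h+1 the previous one, and hypothetical cursor values; a full sequence detector would accumulate these metrics along a trellis, which is not shown.

```python
from itertools import product

def expected_sample(bits, h_pre, h_main, h_post):
    """Noise-free sample for a 3-bit sequence (previous, current, next)."""
    prev_s, cur_s, next_s = (2 * b - 1 for b in bits)
    return h_pre * next_s + h_main * cur_s + h_post * prev_s

def branch_metrics(r_k, h_pre, h_main, h_post):
    """Squared distance between r[k] and the expected sample of each sequence."""
    return {bits: (expected_sample(bits, h_pre, h_main, h_post) - r_k) ** 2
            for bits in product((0, 1), repeat=3)}

h_pre, h_main, h_post = 0.2, 1.0, 0.4            # hypothetical h-1, h0, h+1
r_k = expected_sample((1, 0, 1), h_pre, h_main, h_post) + 0.05   # small noise
metrics = branch_metrics(r_k, h_pre, h_main, h_post)
best = min(metrics, key=metrics.get)
print(best, round(metrics[best], 4))
```

The smallest metric identifies the most likely sequence for this sample; the remaining metrics indicate how close the competing sequences are.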
ADC + BM Cal. = Analog to seq. conv.
Directly calculating the ‘probability’ achieves:
Reduced computation
Less quantization noise effect
Bit-wise operation
Low-cost sequence set and Branch Metric generation for NRZ: 35 mW @ 10 Gb/s NRZ in 65nm CMOS
1-bit Branch Metric based short (1 UI) traceback error correction: 82 mW @ 28 Gb/s PAM-4 in 28nm CMOS
[Figure, three panels: signal levels (+1, -1, -3 shown), decoded state, and the corresponding 3b trellis]
[Aurangozeb’ 2020]