High Speed Fpga
Abstract – This paper describes a low-cost extension module used to extend an FPGA-based development platform that enables digital testing up to 40Gbps. This platform typically operates up to 13.1Gbps and can be applied to test current mainstream I/O standards such as PCIE3.0 (8Gbps), USB3.1 (10Gbps) and Thunderbolt (10Gbps). The high bandwidth of an ultra-high-speed test module allows testing capability for future high-speed standards such as PCIE4.0 (32Gbps) and 40G Ethernet. An FPGA main board is built and programmed to control this plugin module for testing across a wide range of data rates. Using such “state of the art” FPGAs and a careful design strategy for an economical FR4 plugin board, the data rate is extended to 40Gbps. This economical plugin module is implemented by multiplexing four high-speed channels from the FPGA into a single 40Gbps serial bit stream.

Keywords – ATE; FPGA; High-speed I/O; DUT; Multi-GHz

I. INTRODUCTION

Traditionally, high-speed digital test becomes very limited as the data rate goes above 10Gbps. At such speeds, most digital designers can only rely on software-based simulation to verify the function of their new designs. Although some high-end automated test equipment (ATE) might be used to test the most cutting-edge designs at this speed, the high price of the ATE also increases the testing cost significantly. Since the testing cost dominates the total cost of developing new devices, an economical solution is required. Furthermore, the limited performance (typically 3.2Gbps) of the ATE is also becoming a bottleneck for high-speed digital test. Therefore, an alternative method with both low cost and high performance is urgently needed in the market.

Several methods have been developed to solve this issue. ATE extensions [1] provide multi-channel and high-speed testing; however, the price and power consumption of this method are still high. Built-In Self-Test (BIST) [2] [3] is another approach to testing high-speed digital systems. In this scheme, BIST uses several low-speed signals from an ATE and generates high-speed data for digital testing within the device under test (DUT). This design can be built inside a DUT module and can also extend the use of out-of-date ATE. The drawback of customized BIST is that it may have limited flexibility to handle a variety of input/output (I/O) standards. Also, the additional time needed to design the BIST circuit might delay the delivery of new products to the market.

In order to find an ideal solution to this challenge, the use of field-programmable gate arrays (FPGAs) has recently become a popular method [4] [5] for low-cost, high-performance digital test. Due to the explosive development of semiconductor technology, the most recent FPGAs are now able to operate at nearly 1GHz [6] on each of hundreds of normal I/Os. This performance allows FPGAs to nearly match or exceed the speed of current ATEs. Furthermore, FPGAs have been widely used in industry to implement and verify new designs and algorithms for the past three decades. Therefore, they can easily be reconfigured for different testing applications. The limited flexibility of ATE systems has often caused difficulties in testing new digital products that require new I/O protocols and waveforms. In contrast, the high performance and flexibility of FPGAs provide a natural solution for building a testing platform to replace costly and outdated ATE.

In this paper we take advantage of an FPGA-based ATE extension module [7] to build a general testing platform. The previous prototype module was implemented with only a single channel, targeting the 3.2Gbps data rate specifically needed for memory testing. With serializer/deserializer (SERDES) technology, the latest FPGAs are very powerful and capable of even higher data rates on multiple channels. The power consumption of the FPGA is low, which reduces the cost of the power and cooling systems when implementing this platform. The Xilinx Kintex-7 series FPGA was selected to build this platform because of its balance of cost and performance. This FPGA uses 28nm CMOS semiconductor technology and supports up to 4.5 million logic “cells”, block memories, normal I/O ports, and 16 high-speed transceivers. The high-speed transceivers (GTX) support data rates from 500Mbps to 13.1Gbps and are able to test several mainstream I/O standards such as USB3.1 (10Gbps) and PCIE3.0 (8Gbps).

The use of FPGAs enables solutions for many testing issues. FPGAs that provide multi-channel I/Os and support different I/O standards are able to replace the core function of traditional ATE. Some designs already exist that use FPGAs for testing memory systems. The most widely used method for designing test equipment is to use an FPGA evaluation board from the manufacturer, a memory slot on the board [8], and the existing I/O interfaces built on the evaluation board to perform different I/O testing. However, this approach is restricted to the standards offered by the evaluation board. A more flexible design is to build an FPGA-based platform for testing [9] [10] that uses the built-in SERDES [11] [12] and the high-performance transceivers within FPGAs. In addition to the multi-channel capability and high-speed I/Os, configurability gives FPGAs the flexibility to adapt to different test requirements. Moreover, the cost of FPGAs is very low compared to high-performance ATE systems [13]. Considering the additional flexibility that this platform offers and the lower component cost, FPGAs are an appropriate approach for building a new high-performance testing platform.

Even with the approaches described above, we still lack a solution for testing I/Os above 13.1Gbps on this test platform. Therefore, this paper describes a plugin module designed using multiplexing [14] to achieve data rates up to 40Gbps. This paper is arranged as follows: in section II we first discuss the design and setup of the FPGA main board for various data rates, then we introduce the design of the ultra-high-speed module in section III, and in section IV and V
If the DUT I/Os require data rates of several Gbps, then the high-speed GTX transceivers are used. These transceivers are composed of high-performance SERDES and programmable drivers/receivers. The fundamental concept of SERDES is to take a relatively low-frequency (e.g. 200MHz) reference clock and serialize multiple bits (typically 32) to form a single high-speed signal. The specific output data rate of this signal is defined by a serial clock generated by a phase-locked loop (PLL) circuit. The formula to set the frequency of this serial clock in the PLL is shown below:
$$ f_{\mathrm{serial}} = f_{\mathrm{ref}} \times \frac{N}{M \times 2} $$
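As a rough numerical illustration of the serial-clock relation, the sketch below evaluates the reference frequency multiplied by N and divided by (M × 2). The function name, the reading of N and M as integer PLL divider settings, and the example values N = 64 and M = 1 are assumptions chosen only so that a 200MHz reference maps to 6.4GHz; they are not settings taken from this platform, and the step from serial clock to line rate is not modeled.

```python
def serial_clock_hz(f_ref_hz: float, n: int, m: int) -> float:
    """Evaluate f_serial = f_ref * N / (M * 2).

    n and m are treated as positive integer PLL divider settings
    (an assumption for this sketch).
    """
    if n <= 0 or m <= 0:
        raise ValueError("divider settings must be positive")
    return f_ref_hz * n / (m * 2)


# Hypothetical example: 200 MHz reference, N = 64, M = 1.
f_serial = serial_clock_hz(200e6, n=64, m=1)
print(f"{f_serial / 1e9:.1f} GHz")  # prints "6.4 GHz"
```

Changing only N or M while holding the reference fixed moves the serial clock in discrete steps, which is the sense in which a single reference clock can cover a range of data rates.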
Authorized licensed use limited to: Indraprastha Institute of Information Technology. Downloaded on December 01,2024 at 09:23:36 UTC from IEEE Xplore. Restrictions apply.
6.4Gbps output signal. The combinations of these parameters allow for a wide range of data rates with a fixed reference clock.

Similar to the moderate-speed I/O block, PRBS generators and static memories are also implemented for these transceivers. The configurable TX driver is able to provide a differential voltage swing (peak to peak) from 300mV to 1000mV, and pre/post-emphasis adjustment is provided to compensate for the distortion of gigabit signal transmission. The emphasis settings become critical as the data rate goes above 6.4Gbps. As mentioned, the signal travels between the FPGA and the connectors. Although the board is designed to minimize distortion at this data rate, the pre/post-emphasis settings can still be adjusted to improve the transmitted data eye through long traces on the PCB to the DUT I/Os.

With the above hardware design and software setup, this platform is able to perform wideband testing from DC to 13.1Gbps. This range covers most applications in the current market. However, even higher-speed standards are coming soon, and this powerful testing platform will then be out of date. Therefore, a new hardware approach based on this platform is needed to meet this challenge.

III. ULTRA-HIGH-SPEED MODULE DESIGN

The FPGA development platform described in the previous section provides low-cost and flexible wideband testing capability. However, this architecture has the potential to run at even higher speeds. The basic concept is to build an extension module that plugs into the original main board and extends the speed of the FPGA-based development platform. The simplified operation can be described as follows. On the transmitter side (TX), the extension module takes four 10Gbps high-speed signals from the FPGA main board and multiplexes (MUX) those four FPGA GTX signals to form a 40Gbps signal. On the receiver side (RX), the high-speed signal is connected back to a 40Gbps receiver, which de-multiplexes (DEMUX) this ultra-high-speed signal into four 10Gbps channels so that the FPGA is able to receive the data.

core generates parallel PRBS-31 data patterns and serializes this data to create 10Gbps bit streams. Then high-speed connectors bring the four-channel differential 10Gbps GTX signals from the FPGA main board to this module. These differential signals go into a high-performance HMC847 MUX chip. The MUX chip uses a 20GHz phase-adjustable reference clock, taken from one output of an HMC910 delay chip, to serialize these GTX signals into a 40Gbps signal. The HMC910 delay chip takes the reference clock from an E8257D signal generator and provides up to 70ps of controllable delay. This delay allows us to control the sampling phase of the MUX chip, which is critical for synthesizing the high-speed signal. The red arrows in Fig. 3 show the direction of the signal receiving path, which is basically the opposite of the TX side. The loopback 40Gbps signal is first received by an HMC848 DEMUX chip. This chip takes the other output of the delayed reference clock from the HMC910 to de-serialize the 40Gbps signal into four 10Gbps channels. These de-serialized signals then go through connectors and back to the FPGA. The FPGA GTXs receive the data and use built-in PRBS pattern checkers to check the received data pattern.

Fig. 4 shows the timing diagram of the multiplexing process. The four sampling edges (two rising edges and two falling edges) of the 20GHz reference clock must be aligned to the window of the data pins (D1~D4). Since we take 10Gbps signals from the FPGA, there is only a 100ps slot available for sampling. Although we have optimized the skew of each data channel, differences of several picoseconds might still exist between the channels. At this data rate, skew of this scale is relatively significant and might corrupt the sampling result. That is the main reason the delay chip is used on this board. The 70ps variable delay from the delay chip provides about 0.7UI of adjustment (relative to the 100ps unit interval of the 10Gbps inputs) to optimize the sampling location; this delay range should be enough to find the appropriate sampling location. The delay value is set by the analog control pin on the chip, and an output of a digital-to-analog converter (DAC) chip provides the control voltage to this pin. The process of locating the appropriate sampling clock can be described as follows. In the beginning, we program the delay control pin to 0V, which gives the minimum delay on the output of the MUX chip. Then we sweep the delay value (from 0 to 70ps); if the waveform shows a clear eye on the scope (like the eyes shown in the results section), we can confirm that the sampling clock has been located correctly. In an automated test, alignment of the clock is optimized by minimizing the bit error rate (BER) during loopback testing.
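The automated alignment step can be sketched as a simple sweep: try each delay setting, record the loopback BER, and pick the middle of the widest run of settings that reach the lowest BER. This is a minimal sketch under stated assumptions rather than the procedure used on the actual board: `measure_ber` stands in for whatever routine programs the DAC and reads the PRBS checker, and `fake_ber`, the 5ps step, and the 20ps to 50ps error-free window are invented purely for illustration.

```python
from typing import Callable, Sequence

UI_PS = 100.0        # bit period of one 10 Gbps input lane
MAX_DELAY_PS = 70.0  # delay range quoted for the delay chip


def best_delay(measure_ber: Callable[[float], float],
               delays_ps: Sequence[float]) -> float:
    """Return the centre of the longest run of delays that reach the minimum BER."""
    bers = [measure_ber(d) for d in delays_ps]
    floor = min(bers)
    best_start, best_len, start = 0, 0, None
    # A sentinel value closes a run that extends to the end of the sweep.
    for i, ber in enumerate(bers + [float("inf")]):
        if ber == floor:
            if start is None:
                start = i
        elif start is not None:
            if i - start > best_len:
                best_start, best_len = start, i - start
            start = None
    return delays_ps[best_start + (best_len - 1) // 2]


def fake_ber(delay_ps: float) -> float:
    """Hypothetical stand-in for a loopback BER measurement."""
    return 0.0 if 20.0 <= delay_ps <= 50.0 else 1e-3


sweep = [float(d) for d in range(0, 71, 5)]  # 0, 5, ..., 70 ps
print(MAX_DELAY_PS / UI_PS)         # 0.7 (UI of adjustment)
print(best_delay(fake_ber, sweep))  # 35.0
```

Choosing the centre of the error-free window, rather than the first setting with zero errors, leaves margin on both sides against residual channel skew.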
triggering clock is required. Therefore, we connect this clean output clock to trigger the 86100D and measure the 40Gbps signals on the scope.

Fig. 5 shows the PCB layout of this module. The control pins on the delay chips, MUX and DEMUX parts can be driven by a multi-channel output DAC, trimmer-pot resistors, or power supplies. All the connectors are high-performance SMA or SMP connectors (with >26.5GHz bandwidth). The extension module is manufactured using a simple, low-cost four-layer FR4 process. The high-speed traces are designed with a width chosen for the stack-up to match 50-Ohm impedance and minimize reflections. Furthermore, we keep the trace length of the 40Gbps signal as short as possible to minimize distortion on these paths. Appropriate decoupling capacitors are also added to improve high-speed performance.
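The TX/RX data path of this section (four PRBS-31 lanes multiplexed 4:1 into one serial stream, de-multiplexed back into four lanes, then checked) can be modeled at the bit level. The sketch below is a behavioral illustration only: the lane order D1 to D4 within each group of four bits, the seeds, and the software checker based on the PRBS-31 recurrence are assumptions for illustration, not a description of the HMC847/HMC848 parts or of the built-in GTX checker.

```python
from typing import List


def prbs31(n_bits: int, seed: int = 0x7FFFFFFF) -> List[int]:
    """Generate n_bits of PRBS-31 (x^31 + x^28 + 1) from a non-zero 31-bit seed."""
    state = seed & 0x7FFFFFFF
    if state == 0:
        raise ValueError("seed must be non-zero")
    out = []
    for _ in range(n_bits):
        new_bit = ((state >> 30) ^ (state >> 27)) & 1
        state = ((state << 1) | new_bit) & 0x7FFFFFFF
        out.append(new_bit)
    return out


def mux4(lanes: List[List[int]]) -> List[int]:
    """Bit-interleave four equal-length lanes into one serial stream."""
    if len(lanes) != 4:
        raise ValueError("expected exactly four lanes")
    return [bit for group in zip(*lanes) for bit in group]


def demux4(serial: List[int]) -> List[List[int]]:
    """Split a serial stream back into four lanes (inverse of mux4)."""
    return [serial[i::4] for i in range(4)]


def prbs31_errors(bits: List[int]) -> int:
    """Count positions that violate the PRBS-31 recurrence b[n] = b[n-31] ^ b[n-28]."""
    return sum(bits[n] != (bits[n - 31] ^ bits[n - 28])
               for n in range(31, len(bits)))


lanes = [prbs31(1000, seed=s) for s in (1, 2, 3, 4)]
serial = mux4(lanes)                    # 4000 bits on the "40Gbps" side
received = demux4(serial)
print([prbs31_errors(lane) for lane in received])  # [0, 0, 0, 0]

serial[2000] ^= 1                       # inject one bit error on the serial link
print([prbs31_errors(lane) for lane in demux4(serial)])  # [3, 0, 0, 0]
```

Because the checker only uses the recurrence, it needs no knowledge of the transmitter seed; a single corrupted bit appears as three violations on the affected lane, once as the current bit and twice as a tap.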
generated by this extension module. The signal is measured by an 86100D scope with a 54752A 50GHz module, and the scope is triggered by the divided-by-two output clock from the MUX. The sampling clock of the MUX/DEMUX is generated by a Keysight E8257D 40GHz signal generator. The signal in the following figures is AC coupled and measured differentially.

(b) 36Gbps with 600mV amplitude (7ps/div, 140mV/div)
(c) 40Gbps with 600mV amplitude (6.25ps/div, 130mV/div)
Fig. 9 – UHS module output with PRBS-31 pattern at different data rates.

In Fig. 9(a), the FPGA main board is programmed to output four channels of 5Gbps signals, and the sampling clock needs to be set to 10GHz to get 20Gbps data out. The jitter is as low as 1.2ps RMS and 9.5ps peak-to-peak, and the rise/fall time is around 15ps. The amplitude of the signal is about 600mV and is adjusted by the voltage swing control pin on the module. Similarly, higher data rates are shown in Fig. 9(b) and Fig. 9(c). A 36Gbps output requires 9Gbps signals from the FPGA and an 18GHz sampling clock from the signal generator. Finally, we reach our target 40Gbps signal with four 10Gbps data streams and a 20GHz sampling clock. The rise/fall time remains about 15ps and the amplitude is controlled at 600mV. Jitter increases to 1.5ps RMS and 12ps peak-to-peak due to the limitations of the equipment. Considering that the random jitter from the 86100D trigger circuit is about 1ps RMS and the jitter of the reference source is about 1ps RMS, and adding the jitter introduced in the data transmission (the effect of cables, connectors, and PCB), the jitter performance is about as expected.

In Fig. 10, a 36Gbps signal with a one-hour run time is presented. This figure shows very clear eyes and reasonable jitter with sharp rise/fall times, which indicates that the performance of this module is stable even over a long run time.

Fig. 11 shows several examples of the UHS module output at 40Gbps with various amplitudes spanning the full range of 200mV to 500mV. The smallest amplitude (200mV) behavior is shown in Fig. 11(a). This amplitude is near the minimum amplitude specified for the MUX part, and the data eyes are open at the full rate of 40Gbps. Larger signal amplitudes of 300mV and 500mV are shown in Fig. 11(b) and Fig. 11(c), respectively. The rise/fall time goes up slightly as the amplitude increases, and the jitter at this data rate was found to be about 12ps peak-to-peak across the different amplitudes. This experiment shows that the module offers a wide amplitude range with minimal limitation, and the eye openings are large enough to be received by most receivers on the market.

A duty-cycle adjustment is shown in Fig. 12. By programming the analog control pin on the driver to various voltage levels, we are able to get a non-50% duty cycle; this also indicates that the driver is able to run at an even higher data rate. The waveform in Fig. 12 is programmed to have a duty cycle of around 60%-40%, and a clearly open eye with a width of about 20ps is shown on the right side, which means there is potential for this part to run at 50Gbps.
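The rate and jitter figures quoted in this section can be cross-checked with a few lines of arithmetic. The sketch below assumes a 4:1 multiplexer clocked at half the output bit rate (matching the 10GHz/18GHz/20GHz settings above) and, for the jitter estimate, that the trigger and reference-source contributions are independent random jitter that add in root-sum-square fashion. The function names are illustrative, and the residual term is only what that assumption implies, not a measured value.

```python
import math
from typing import Tuple


def mux_settings(output_gbps: float, lanes: int = 4) -> Tuple[float, float, float]:
    """Return (per-lane rate in Gbps, sampling clock in GHz, unit interval in ps)."""
    lane_gbps = output_gbps / lanes
    clock_ghz = output_gbps / 2   # half-rate clock: both edges are used
    ui_ps = 1000.0 / output_gbps  # width of one bit at the output
    return lane_gbps, clock_ghz, ui_ps


def rss(*terms_ps: float) -> float:
    """Root-sum-square of independent RMS jitter terms."""
    return math.sqrt(sum(t * t for t in terms_ps))


for rate in (20, 36, 40):
    lane, clock, ui = mux_settings(rate)
    print(f"{rate} Gbps: 4 x {lane:g} Gbps lanes, {clock:g} GHz clock, UI = {ui:.2f} ps")

# A 20 ps eye corresponds to the unit interval of a 50 Gbps stream.
print(mux_settings(50)[2])  # 20.0

# Trigger (~1 ps RMS) and reference source (~1 ps RMS) combined.
instrument_floor = rss(1.0, 1.0)
# What is left for cables, connectors and PCB if the total is 1.5 ps RMS.
residual = math.sqrt(1.5 ** 2 - instrument_floor ** 2)
print(f"floor = {instrument_floor:.2f} ps RMS, residual = {residual:.2f} ps RMS")
```

Under these assumptions the two instrument terms alone account for about 1.4ps RMS of the 1.5ps RMS measured at 40Gbps, which is consistent with the statement that the result is limited by the equipment.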
VI. CONCLUSIONS

This paper has presented a prototype FPGA-based development platform with 500 moderate-speed (up to 1Gbps) testing channels and 16 high-speed transceivers for testing up to 13.1Gbps. The FPGA main board was realized using a 28nm Kintex-7 FPGA, which balances the cost and performance of building a system to replace current high-performance ATEs. A plugin module is also implemented to extend the FPGA output data rate. With this extension module, the FPGA-based development platform is able to operate at 40Gbps with 15ps rise/fall time and about 12ps peak-to-peak jitter over different data rates. The amplitude and duty cycle are controllable, which enables characterization and go/no-go testing across a wide range of I/O signaling standards in the future.

(a) 200mV (6.25ps/div, 120mV/div)
(b) 300mV (6.25ps/div, 120mV/div)
(c) 500mV (6.25ps/div, 120mV/div)
Fig. 11 – Extension module amplitude adjustment at 40Gbps.

Fig. 12 – Duty-cycle adjustment (60%-40%) at 40Gbps.

REFERENCES

[1] J. Moreira, M. Moessinger, K. Sasaki, and T. Nakamura, "Driver sharing challenges for DDR4 high-volume testing with ATE," IEEE Intl. Test Conf. (ITC), 2012.
[2] H. Kim and J.A. Abraham, "A built-in self-test scheme for DDR memory output timing test and measurement," IEEE 30th VLSI Test Symposium (VTS), p.7-12, 2012.
[3] P. Bernardi, M. Grosso, M.S. Reorda, and Y. Zhang, "A programmable BIST for DRAM testing and diagnosis," IEEE Intl. Test Conf. (ITC), 2010.
[4] R.L. Ladbury, M.D. Berg, E.P. Wilcox, K.A. LaBel, H.S. Kim, A.M. Phan, and C.M. Seidleck, "Use of commercial FPGA-based evaluation boards for single-event testing of DDR2 and DDR3 SDRAMS," IEEE Trans. on Nuclear Science, Vol.60, No.6, p.4457-4463, 2013.
[5] I. Aleksejev, A. Jutman, S. Devadze, S. Odintsov, and T. Wenzel, "FPGA-based synthetic instrumentation for board test," IEEE Intl. Test Conf. (ITC), 2012.
[6] "Ultrascale Architecture and Product Overview," http://www.xilinx.com/support/documentation/datasheets/ds890ultrascale-overview.pdf
[7] D.C. Keezer, T.H. Chen, T. Moon, D.T. Stonecypher, A. Chatterjee, H.W. Choi, S.Y. Kim, and H. Yoo, "An FPGA-based ATE extension module for low-cost multi-GHz memory test," 20th IEEE European Test Symposium (ETS), pp. 1-6, 2015.
[8] J. Ferry, "FPGA-based universal embedded digital instrument," IEEE Intl. Test Conf. (ITC), 2013.
[9] S.H. Lee, S. Cho, K.J. Song, E.J. Byun, S.H. Joo, S.D. Suh, K. Ha, S.J. Oh, and W. Lee, "A serial optical link based memory test system for high-speed and multi-parallel test," J. of Lightwave Technology, Vol.28, No.1, p.104-110, 2010.
[10] S. Kojima, Y. Arai, T. Fujibe, T. Ataka, A. Ono, K. Sawada, and D. Watanabe, "8Gbps CMOS pin electronics hardware macro with simultaneous bi-directional capability," IEEE Intl. Test Conf. (ITC), 2012.
[11] D.C. Keezer, C. Gray, A. Majid, D. Minier, and P. Ducharme, "A development platform and electronic modules for automated test up to 20Gbps," Proc. of the IEEE Intl. Test Conf. (ITC), 2009.
[12] A.M. Majid and D.C. Keezer, "Multi-function multi-GHz ATE extension using state-of-the-art FPGAs," Proc. of the IEEE Intl. Test Conf. (ITC), 2011.
[13] D.C. Keezer, T.H. Chen, and C.E. Gray, "Multi-gigahertz arbitrary timing generator and data pattern serializer/formatter," Proc. of the IEEE Intl. Test Conf. (ITC), 2012.
[14] J. Moreira, B. Roth, H. Werkmann, L. Klapproth, M. Howieson, M. Broman, W. Ouedraogo, and M. Lin, "An Active Test Fixture Approach for 40 Gbps and Above At-Speed Testing Using a Standard ATE System," 22nd Asian Test Symposium, pp. 271-276, 2013.
[15] T. Moon, H.W. Choi, D.C. Keezer, and A. Chatterjee, "Multi-channel testing architecture for high-speed eye-diagram using pin electronics and subsampling monobit reconstruction algorithms," Proc. of the IEEE VLSI Test Symposium (VTS), 2014.