High Speed USB 2.0 Interface for FPGA Based Embedded Systems
Fatemeh Arbab Jolfaei, Neda Mohammadizadeh, Mohammad Sadegh Sadri, Fatemeh FaniSani
Isfahan University of Technology, Department of Electrical & Computer Engineering
f.arbabjolfaee@ec.iut.ac.ir
* Paper presented at EM-COM 2009, Korea.

Abstract

FPGA implementations of high-speed serial peripherals such as USB 2.0 are of great use. The Cypress SX2 USB 2.0 controller is a suitable choice for developing FPGA-based USB peripherals. A simple interface module capable of sustaining data rates above 400 Mbits/s can be implemented to communicate with the SX2. FPGAs can be used efficiently for building embedded systems, and Xilinx's complete set of development tools makes the implementation of large System-on-Chip designs feasible. We present two complete architectures for connecting the SX2 to an FPGA. The first design minimizes FPGA resource usage while keeping a reasonable speed. In the second design, optimizations are made to reach the maximum USB 2.0 interface speed at the cost of some additional logic. To use the developed module in the Xilinx embedded design flow, we build a custom peripheral that includes the SX2 interface as its core, together with additional logic capable of connecting to OPB and PLB.

1. Introduction

High-speed, easy-to-use peripheral interfaces such as USB 2.0 are of great use in today's world. The USB 2.0 standard specifies a transfer rate of 480 Mbits/s for its peripheral devices [1,7]. The high data transfer rate of the USB 2.0 interface makes it a suitable choice for many purposes, such as data acquisition and processing.

The Xilinx Spartan-3 FPGA family, built on the success of four previous generations of cost-optimized Spartan FPGAs, offers platform capabilities with a wide range of I/O and density options. Built on 90 nanometer technology, Spartan-3 devices combine competitive speed, pricing and density in a feature-rich, low-cost FPGA [2].

Hardware implemented on a Spartan-3 FPGA can easily handle data throughputs as high as USB 2.0 demands. Using dedicated resources such as multipliers, it is also possible to perform signal processing tasks on these data. In addition, the large amount of logic available on the FPGA allows a complete system to be integrated on a single chip. This makes the Spartan-3 FPGA a suitable choice for USB peripheral designs.

In this paper, we connect a USB 2.0 controller to an FPGA to build a complete USB 2.0 system. Generally, commercial USB controllers can be divided into two categories. The first category includes stand-alone controllers, which contain an integrated microcontroller in addition to the USB interfacing modules. The second category includes USB controllers that handle most of the required USB interactions automatically and rely on an external master for other operations, so the user develops all of the remaining functionality on their own FPGA, DSP or CPU.

Cypress is one of the best-known providers of USB 2.0 controllers. The SX2 is a Cypress device that has a built-in USB transceiver and Serial Interface Engine (SIE), along with a command decoder for sending and receiving USB data [3]. It automatically responds to standard USB requests without any intervention from the external master. The SX2 presents two interfaces to the external master: a FIFO interface and a command interface. The FIFO/command interface can be asynchronous or synchronous. At startup, the external master either uses the default descriptor built into the SX2 or loads a complete descriptor into the SX2. Here we use the second option.

The SX2 is mainly designed to be connected to microcontroller-like devices, so one idea for an FPGA implementation of an interface to the SX2 is to use a microprocessor core.

Xilinx provides several microprocessor cores. The Virtex-4 family contains a high-performance PowerPC hard core inside the FPGA. For other families such as Spartan-3, soft microprocessor cores such as MicroBlaze and PicoBlaze are available. MicroBlaze provides a complete microprocessor system on an FPGA, with a powerful 32-bit processing engine for different tasks. PicoBlaze can be used in simpler designs, and developers can become familiar with it in an hour.

KCPSM3 (PicoBlaze) is a very simple 8-bit microcontroller [4], primarily designed for Spartan-3, Virtex-II, Virtex-II Pro, Virtex-4 and Virtex-5 devices. Although it can be used for processing data, it is most likely to be employed in applications requiring a complex but non-time-critical state machine.
The revised version of the popular KCPSM macro occupies just 96 Spartan-3 slices, which is just 5% of an XC3S200 device and less than 0.3% of an XC3S5000 device. Together with this small amount of logic, a single block RAM is used to form a ROM store for a program of up to 1024 instructions. Even with such size constraints, the performance is respectable at approximately 43 to 100 MIPS, depending on device type and speed grade.

Xilinx provides a complete set of tools for developing FPGA-based embedded systems. The Xilinx Embedded Development Kit (EDK) allows one to build a complete system containing a CPU, memory controller and peripherals with just a few mouse clicks. EDK contains a large set of ready-to-use cores for all parts of the system. For example, EDK provides the MicroBlaze core as a 32-bit CPU, a dedicated DDR SDRAM controller core, and various peripherals such as UART, I2C, SPI, VGA, PS/2, GPIO and audio interfaces.

EDK uses the Processor Local Bus (PLB) and the On-Chip Peripheral Bus (OPB) to connect the CPU and other modules. These buses, designed mainly by IBM, are among the most efficient and widely used buses for System-on-Chip applications.

One of the keys to success in building FPGA-based embedded systems is the ability to create custom peripherals that can connect to PLB and OPB. A custom peripheral can be built and added to the design repository, after which the core can easily be used in any embedded system built with EDK.

One of the main goals of this paper is to design a fully functional USB 2.0 interface and to add it to the Xilinx embedded base system. This way we can take advantage of a high-speed USB 2.0 peripheral in our FPGA-based embedded designs. This can be useful for anybody who is using EDK and needs a reliable high-speed data transfer interface to the outside world.

There are some problems in the practical design of a USB 2.0 interface. 400 Mbits/s is a very high data transfer rate, so data transmission between the FPGA and the USB 2.0 chip must be performed at the highest possible speed. Correct clocking schemes must be used to ensure that all setup and hold timing requirements are met. Connecting the embedded USB interface module to the PLB bus raises some additional considerations. In addition to the necessary HDL coding, the related software, executed on MicroBlaze or PowerPC, must be developed carefully.

This report provides a complete design of a USB 2.0 peripheral using a Spartan-3 FPGA. This includes system schematics, FPGA state machines and flowcharts, and a description of the program running on the PC side along with the required drivers. We try to reduce logic resource usage as much as possible while supporting the 480 Mbits/s data transfer rate.

2. USB Interface Description

Figure 1 shows how the SX2 is connected to an FPGA. The tasks required of the FPGA in this work can be divided into two groups: (1) complex tasks and (2) routine tasks. Most of the time, complex tasks are not time-critical, but some routine tasks must be performed as fast as possible. We therefore offer the user two choices. The first is a design that uses fewer than 100 FPGA slices and has a limited transfer rate of up to 100 Mbits/s. The second is a more complicated design that uses more than 300 slices and achieves a higher performance of more than 400 Mbits/s.

Figure 1. FPGA - SX2 connection

The simplified design consists only of the KCPSM module and a very small amount of FPGA logic. The block diagram of the high-performance design is shown in Figure 2. It consists of the following modules. The ClockGen module generates the clock signals needed by the entire system, with the desired frequencies and phase shifts. The PicoBlaze module performs the required initialization tasks and manages exceptions and interrupts. The FSM module is a high-speed module that acts as the main control unit and provides synchronization between different parts of the system. The IOInterface module handles the FPGA connection to the SX2. The DataManager module connects the USB-related modules to the rest of the modules on the FPGA.
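
As a rough sanity check on the two design options described in Section 2, the peak bandwidth of the parallel bus between the FPGA and the SX2 can be estimated as the bus width multiplied by the interface clock. The sketch below is illustrative arithmetic only: the function names are hypothetical, the 16-bit bus width and 50 MHz clock are the values reported in Section 5, and protocol overhead and idle cycles are ignored.

```python
# Illustrative arithmetic only; function names are hypothetical and are not
# part of the described design.

USB2_HIGH_SPEED_MBITS = 480  # signaling rate of USB 2.0 high speed


def raw_bus_rate_mbits(bus_width_bits, clock_mhz):
    """Peak rate of a parallel bus that moves one word per clock cycle."""
    return bus_width_bits * clock_mhz


def min_clock_mhz(target_mbits, bus_width_bits):
    """Lowest clock that reaches the target rate if no cycle is idle."""
    return target_mbits / bus_width_bits


if __name__ == "__main__":
    # 16-bit bus at 50 MHz (values from Section 5)
    print(raw_bus_rate_mbits(16, 50))
    # Clock needed to match the USB 2.0 signaling rate on a 16-bit bus
    print(min_clock_mhz(USB2_HIGH_SPEED_MBITS, 16))
```

Under these assumptions, a 16-bit bus needs a 30 MHz clock to match 480 Mbits/s, which is in line with the clock requirement quoted in Section 2.1, and a 50 MHz clock leaves headroom for cycles in which no data is transferred.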
Figure 2. USB interface block diagram in high performance mode

2.1. Clock Generator

Reaching a rate above 400 Mbits/s requires the SX2 to be clocked at more than 30 MHz in synchronous mode. To meet the SX2's required setup and hold times, a special clocking scheme is used. Figure 3 shows the clock generator's block diagram.

Figure 3. Clock generator block diagram

2.2. PicoBlaze

Figures 4.a and 4.b show the execution flowcharts for the simplified design and the high-performance design, respectively. After enabling the SX2 by asserting CS#, PicoBlaze waits for the ready interrupt by running the checkInt routine. When an interrupt occurs, PicoBlaze initiates a read request. The received byte is the interrupt status, which indicates the type of interrupt.

After the ready interrupt is received, the SX2 registers are configured by PicoBlaze. Some of the configuration steps are as follows. A suitable value for the INTENABLE register is chosen, and the corresponding interrupts are enabled or disabled. The IFCONFIG register contains important settings such as the SX2's clock source (internal/external) and its synchronous/asynchronous mode. In the simplified design, the SX2 is configured to use its own internal clock source, and data transfers between the FPGA and the SX2 are asynchronous. In the high-performance design, data transfers between the FPGA and the SX2 are synchronous and use a clock source provided by the FPGA.

In the high-performance design, the FSM module is responsible for sending the descriptor to the SX2 upon receiving the "DESC report" signal from PicoBlaze. In the simplified version, this task is done by PicoBlaze itself.

After the descriptor is sent, PicoBlaze waits for the ENUMOK interrupt to ensure that the enumeration process has been successful. It then checks whether the SX2 has enumerated in full-speed or high-speed mode.

Having finished the initialization process, PicoBlaze handles received interrupts by executing the corresponding routines.
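
The start-up order described in Section 2.2 (ready interrupt, register configuration, descriptor download, ENUMOK) can be sketched as a small software model. This is not the authors' PicoBlaze program: the `FakeSX2` class, the `initialize` function and the event queue are hypothetical stand-ins, and the register values are placeholder strings because the actual bit settings are not given in the text. The sketch also folds the descriptor download into one routine, as in the simplified design; in the high-performance design that step is delegated to the FSM.

```python
# Hypothetical model of the start-up sequence; all names and the event queue
# are assumptions for illustration, not the authors' PicoBlaze code.
from collections import deque


class FakeSX2:
    """Stand-in for the SX2: queues interrupts and records register writes."""

    def __init__(self):
        self.interrupts = deque(["READY"])  # the device reports READY first
        self.writes = []                    # (register name, value), in order
        self.descriptor = None

    def read_interrupt_status(self):
        """Return the oldest pending interrupt, like reading the status byte."""
        return self.interrupts.popleft()

    def write_register(self, name, value):
        self.writes.append((name, value))

    def load_descriptor(self, data):
        self.descriptor = bytes(data)
        self.interrupts.append("ENUMOK")    # pretend enumeration succeeds


def initialize(sx2, descriptor, synchronous):
    """Follow the order in the text: READY, configure, descriptor, ENUMOK."""
    if sx2.read_interrupt_status() != "READY":
        raise RuntimeError("expected READY interrupt first")
    sx2.write_register("INTENABLE", "selected interrupts")  # placeholder value
    mode = "sync, FPGA clock" if synchronous else "async, internal clock"
    sx2.write_register("IFCONFIG", mode)                    # placeholder value
    sx2.load_descriptor(descriptor)
    if sx2.read_interrupt_status() != "ENUMOK":
        raise RuntimeError("enumeration did not complete")
    return "initialized"
```

Calling `initialize(FakeSX2(), descriptor_bytes, synchronous=True)` mirrors the high-performance configuration, while `synchronous=False` mirrors the simplified one.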
Figure 4. PicoBlaze execution flow: a) simplified mode, b) high performance mode

2.3. FSM

The designed FSM consists of six states (Figure 5). All transitions in this FSM are synchronized with the positive edge of the clock. The FSM also has an asynchronous reset input.

Table 1 briefly shows the function of each state. At reset, the FSM unconditionally enters the PICOBLAZE state, in which PicoBlaze performs the required initialization tasks. When finished, PicoBlaze sends a message to the FSM indicating that it can go to the DESC state. In the DESC state, the descriptor is transferred to the SX2's descriptor RAM in synchronous command mode. After the entire descriptor has been transferred, the FSM enters the IDLE state. When enumeration is completed, the SX2 notifies the FSM with an ENUMOK interrupt, which is handled like any other interrupt. The system is then ready for normal operation.

The FSM leaves the IDLE state in three cases. If an interrupt occurs, the FSM enters the INT_HANDLE state, in which it reads the interrupt status byte to determine the interrupt source. System control is then passed to PicoBlaze, which performs the required interrupt tasks and then informs the FSM to continue its normal operation. If the write FIFO has data to send, the FSM goes to the WRITE state. Data is sent to the SX2 using synchronous slave FIFO mode. The FSM stays in the
WRITE state until all the data in the write FIFO has been sent or an interrupt occurs. In the latter case, the transition to the INT_HANDLE state happens after the write operation for the current 16-bit word is completed. When the USB host writes to one of the OUT endpoints, the related FIFO flag is asserted and the FSM enters the READ state. As in the WRITE state, the FSM stays in this state until all data has been read or an interrupt occurs.
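
The state transitions described in Section 2.3 can be summarized as a next-state function. The following is a minimal software sketch, not the authors' HDL. The input names are hypothetical, and three points are assumptions rather than statements from the text: WRITE and READ return to IDLE when their data is exhausted, interrupt handling is collapsed into a direct return from INT_HANDLE to IDLE once PicoBlaze reports completion, and WRITE takes priority over READ when both are pending.

```python
# Minimal sketch of the six-state controller; input names are hypothetical.
from enum import Enum, auto


class State(Enum):
    PICOBLAZE = auto()
    DESC = auto()
    IDLE = auto()
    INT_HANDLE = auto()
    WRITE = auto()
    READ = auto()


RESET_STATE = State.PICOBLAZE  # entered unconditionally on asynchronous reset


def next_state(state, *, init_done=False, desc_done=False, interrupt=False,
               handled=False, write_pending=False, read_pending=False):
    """Return the state after one rising clock edge."""
    if state is State.PICOBLAZE:
        return State.DESC if init_done else State.PICOBLAZE
    if state is State.DESC:
        return State.IDLE if desc_done else State.DESC
    if state is State.INT_HANDLE:
        # Assumption: return to IDLE once PicoBlaze reports the interrupt served.
        return State.IDLE if handled else State.INT_HANDLE
    if interrupt:
        # IDLE, WRITE and READ all yield to a pending interrupt.
        return State.INT_HANDLE
    if state is State.WRITE:
        return State.WRITE if write_pending else State.IDLE
    if state is State.READ:
        return State.READ if read_pending else State.IDLE
    # IDLE: write-before-read priority is an assumption.
    if write_pending:
        return State.WRITE
    if read_pending:
        return State.READ
    return State.IDLE
```

In hardware, the interrupt check in the WRITE state would take effect only after the current 16-bit word has been written, as the text notes; the sketch treats each call as one completed word.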
Figure 5. FSM execution diagram

Table 1. FSM states' descriptions
State        Description
PICOBLAZE    Performing basic initialization tasks and responding to interrupts
DESC         Sending the descriptor to the Cypress SX2
IDLE         System has no special task
INT_HANDLE   Determining the interrupt source
WRITE        Writing data to a USB IN endpoint
READ         Reading data from a USB OUT endpoint

2.4. I/O Interface

The IOINTERFACE module is responsible for controlling the I/O pins connected to the SX2. This block uses the 100 MHz clock to provide appropriate timing for the control signals. For the rest of its operations, it uses the same 50 MHz clock as the other modules.

3. Custom Module for EDK

Using the Embedded Development Kit Create and Import Peripheral wizard, we made a custom module for our SX2 interface. The custom module connects easily to the PLB bus. The base system CPU can be either MicroBlaze or PowerPC. MicroBlaze can be used in designs targeting any Spartan-II/III or Virtex device. The PowerPC hard CPU core is available only in Virtex-II Pro and Virtex-4/5 FX FPGAs. Our custom module communicates with the base system CPU and other modules on the PLB in three different ways:

1- Soft registers: dedicated registers inside the custom module, mainly used for initialization and confirmation.
2- Interrupt: the custom module can interrupt the CPU when an event occurs, for example when a new USB packet arrives.
3- FIFOs: the main logic inside the custom module, responsible for transmitting data to and receiving data from the PLB.

Figure 6. Connecting USB custom module to PLB bus

The SX2 interface module described in Section 2 is added inside the custom module as user logic. Additional code is developed to connect the signals and produce the required timing.

Initialization and control of the custom module are performed by software running on the CPU. A set of library routines has been developed that lets the user configure the custom module, send and receive data, and get its status.

The custom module can easily be inserted into any EDK base system. By adding the library routines to the project, the user can readily take advantage of the module. Figure 6 shows the block diagram of a sample EDK project that includes the custom module.

4. PC Side Software

Two complete sets of example programs have been developed, for Windows and Linux. The most important point in USB software development is the kernel-mode driver needed for communication with the USB device.

Sample code for sending data to the USB peripheral and receiving data from it using libUSB [5] has been developed and tested on both Windows and Linux. The Qt development environment is used for the Linux-based samples. Another solution is to use the USB 2.0 drivers provided by Cypress [6]. This
solution is available only for the Windows operating system.
Sample Windows applications are developed using MFC (MS Visual Studio).

Figure 7. USB 2.0 interface on XSB300E board

5. Practical Implementation

This design is implemented on a Xess XSB300E board, which carries a Xilinx® Spartan-IIE XC2S300E FPGA connected to a Cypress CY7C68001 USB controller. We also developed a new board with a Spartan-3 FPGA, composite video interfaces and an SX2 USB 2.0 interface. This board is capable of transferring digitized video over USB to a PC.

Table 2. Performance test results
Experiment description                                         Device     Operating system   Achieved rate
Transfer a 400 MByte file to board in high performance mode    XC2S300E   Windows (CyUSB)    432 Mbits/s
Transfer a 400 MByte file to board in high performance mode    XC2S300E   Linux (libUSB)     410 Mbits/s
Transfer digitized video to PC using simple design             XC3S400    Linux (libUSB)     113 Mbits/s

In the high-performance design, the 16-bit SX2 bus is selected and a 50 MHz clock is used to synchronize the FPGA and the SX2. This gives a raw bus rate of 16 bits x 50 MHz = 800 Mbits/s, so the data transmission rate between the FPGA and the SX2 is high enough to make the 480 Mbits/s rate of the USB 2.0 standard achievable.

A wide range of USB 2.0 products, design cores and IPs are available on the market; however, the authors could not find any solution that can be implemented on all of the Spartan-2/3, Virtex-4/5 and Virtex-6 FPGAs. Moreover, in many cases the provided core supports only the USB full-speed interface and not the 480 Mbits/s rate.

Table 3. FPGA resource usage
Design type        Device     Slices                  BRAMs               DLLs
High performance   XC2S300E   344 out of 3072 (11%)   9 out of 16 (56%)   1 out of 4 (25%)
Simple design      XC2S300E   191 out of 3072 (6%)    5 out of 16 (31%)   ---
Simple design      XC3S400    34 out of 1792 (1%)     1 out of 16 (6%)    ---

6. Conclusion

USB 2.0 interfaces can play an important role in embedded systems. FPGAs provide a high level of flexibility and processing power and so can be widely used in various kinds of embedded systems for different applications. Xilinx provides a complete set of development tools for building a complete system on one FPGA. Being able to add a high-speed USB 2.0 interface to an FPGA-based embedded system is a significant advantage. We have developed and tested a complete design of a USB 2.0 interface that provides a data transfer rate of over 400 Mbits/s. Finally, we have integrated the developed high-speed module into the Embedded Development Kit. The module can easily be used in any FPGA-based embedded design, whatever the application.

7. References

[1] Jan Axelson, USB Complete - Everything You Need to Develop Custom USB Peripherals, Lakeview Research, 3rd Edition, Aug 2005
[2] "Spartan-3 FPGA Family: Complete Data Sheet", Xilinx Corp., Aug 2005
[3] "CY7C68001 EZ-USB SX2 High-Speed USB Interface Device", Cypress Corp., Jun 2005
[4] "PicoBlaze 8-Bit Embedded Microcontroller User Guide", Xilinx Corp., Nov 2005
[5] "libUSB Manual", libUSB Project, Available at: http://libusb.sourceforge.net/
[6] "Cypress CyUSB.sys Programmer's Reference", Cypress Corp., 2003
[7] Craig Peacock, "USB in a Nutshell: Making Sense of the USB Standard", Nov 2002, Available at: www.beyondlogic.org