3 Digital Systems Implementation
Programmable Logic Devices
Why Programmable Logic Devices (PLDs)?
Low cost, low risk way of implementing digital circuits as application specific ICs (ASICs).
Technology of choice for low to medium volume products (say hundreds to a few tens of thousands per year).
Good and low cost design software.
Latest high density devices are over 1 million gates!
Currently the four main players in this field are:
Actel
Altera
Xilinx
Atmel
Basic FPGA Architectures
All FPGAs have the following key elements:
The programming technology
The basic logic cells
The I/O logic cells
Programmable interconnect
Software to design and program the FPGA
Imperial College, 2005
Digital System Design
3.1
PLD Technologies: Antifuse
Invented at Stanford and developed by Actel. Currently mainly used for military applications. See www.actel.com.
Actel FPGAs
Uses antifuse technology
Based on channelled gate array architecture
Each logic element (labelled L in the diagram) is a combination of multiplexers which can be configured as a multi-input gate
Actel Logic Element
PLD Technologies: EPROM & EEPROM
Generally used in product-term type of PLDs.
Non-volatile and reprogrammable.
Good for FSMs, less good for arithmetic circuits.
Altera MAX CPLDs
CPLD = Complex Programmable Logic Device
FPGA = Field Programmable Gate Array
Altera has four different PLD families:
MAX family: product-term based macrocells (CPLDs)
FLEX family: SRAM based lookup tables (LUTs)
APEX family: mixture of product-term and LUT based devices
Stratix family: advanced FPGAs with embedded blocks (Stratix-2 is currently the most advanced FPGA device)
MAX7000 Logic Element
Each macrocell implements a Boolean expression in the form of a sum-of-products (SOP).
An example of such a sum-of-products is: a!bc!d + ace + !af
Each product term could have many input variables ANDed together.
A SOP could have a number of product terms ORed together.
Each macrocell also contains a flip-flop, essential for implementing FSMs.
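As a rough illustration of what one macrocell computes, the sketch below evaluates the example expression a!bc!d + ace + !af in Python. The names product_term, sum_of_products and EXAMPLE_SOP are invented for this note and are not Altera terminology. Each literal is written as (name, inverted).

```python
def product_term(inputs, literals):
    """AND together the literals of one product term.

    Each literal is (name, inverted); an inverted literal is true
    when the named input is False.
    """
    return all(inputs[name] != inverted for name, inverted in literals)


def sum_of_products(inputs, terms):
    """OR together the product terms of a sum-of-products expression."""
    return any(product_term(inputs, term) for term in terms)


# a!bc!d + ace + !af, written as a list of product terms
EXAMPLE_SOP = [
    [("a", False), ("b", True), ("c", False), ("d", True)],
    [("a", False), ("c", False), ("e", False)],
    [("a", True), ("f", False)],
]


def evaluate_example(a, b, c, d, e, f):
    """Evaluate the example expression for one set of input values."""
    inputs = {"a": bool(a), "b": bool(b), "c": bool(c),
              "d": bool(d), "e": bool(e), "f": bool(f)}
    return sum_of_products(inputs, EXAMPLE_SOP)
```

For instance, evaluate_example(1, 0, 1, 0, 0, 0) is true because the first product term a!bc!d is satisfied.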
MAX7000 Logic Element
Each horizontal line represents a product term.
Inputs are presented to the product term as a signal and its inverse.
Each macrocell can normally OR 4 product terms together.
Each LAB has an additional 16 shared product terms in order to cope with more complex Boolean equations.
The output XOR gate allows either efficient implementation of the XOR function or programmable logic inversion.
The SOP output can drive the output directly or can be passed through a register.
MAX7000 LABs
This architecture is particularly good for implementing finite state machines.
Each register can store one state variable. This can be fed back to the logic array via the Programmable Interconnect Array (PIA).
This is not efficient for adder or multiplier circuits, or as buffer storage (such as register files or FIFO buffers), because these waste the potential of the logic array.
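The behaviour described above (sum-of-products, optional inversion through the XOR gate, optional register, state fed back to the logic array) can be sketched in Python as follows. This is a simplified behavioural model, not the real MAX7000 circuit. The class Macrocell, the constant TOGGLE_TERMS and the one-bit toggle FSM are assumptions chosen only for illustration.

```python
class Macrocell:
    """Simplified macrocell: SOP -> programmable inversion -> optional register."""

    def __init__(self, terms, invert=False, registered=True):
        self.terms = terms            # list of product terms, literal = (name, inverted)
        self.invert = invert          # programmable inversion bit on the XOR gate
        self.registered = registered  # True: output comes from the flip-flop
        self.q = False                # flip-flop state

    def logic(self, signals):
        """Combinational value: sum-of-products XORed with the inversion bit."""
        sop = any(all(signals[name] != inverted for name, inverted in term)
                  for term in self.terms)
        return sop != self.invert

    def clock(self, signals):
        """Clock edge: load the flip-flop with the combinational value."""
        self.q = self.logic(signals)

    def output(self, signals):
        return self.q if self.registered else self.logic(signals)


# Next state = !q.en + q.!en, i.e. the state bit toggles when en is high.
TOGGLE_TERMS = [[("q", True), ("en", False)],
                [("q", False), ("en", True)]]


def run_toggle(enables):
    """Feed the state variable q back into the logic, one clock per input."""
    cell = Macrocell(TOGGLE_TERMS)
    trace = []
    for en in enables:
        cell.clock({"q": cell.q, "en": bool(en)})
        trace.append(cell.q)
    return trace
```

One register holds the single state variable here. A larger FSM would use one such macrocell per state bit, all reading the fed-back state.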
A MAX7000 device consists of Logic Array Blocks (LABs), each with 16 macrocells.
PIA = Programmable Interconnect Array
PLD Technologies: Static RAM
Almost all Field Programmable Gate Arrays (FPGAs) are based on static RAMs.
Static RAM cells are used for three purposes:
As lookup tables (LUTs) for implementing logic (as truth tables).
As embedded block RAM (for buffer storage etc.).
As control for routing and configuration switches.
Advantages:
Easily changeable (even dynamic reconfiguration)
Good density
Tracks the latest SRAM technology (moving even faster than technology for logic)
Flexible: not only good for FSMs, also good for arithmetic circuits
Disadvantages:
Volatile
Generally high power
Xilinx FPGAs
Started with XC4000
Virtex Family
Virtex-II Family
Virtex-II PRO (we use this for your coursework)
Virtex-4 Families
Xilinx FPGA (XC4000)
Xilinx was first to introduce SRAM based FPGAs using Lookup Tables (LUTs)
Xilinx 4000 series contains four main building blocks:
Configurable Logic Block (CLB)
Switch Matrix
VersaRing
Input/Output Block
Xilinx FPGA Switch Matrix Routing
Xilinx FPGA (XC4000)
Each CLB has two 4-input Lookup Tables (LUTs) and two registers.
The two LUTs implement two independent logic functions F and G.
The outputs F and G from the two LUTs inside each CLB can be combined to form a more complex function H.
CLBs are linked together to form carry and cascade chain circuits (not shown in diagram).
For the 4000E families, each CLB can be configured as synchronous RAM. Write address, data, and control are synchronized to the write clock. This is called distributed RAM.
Possible configurations are:
Two independent 16 x 1 RAMs
One 32 x 1 or 16 x 2 RAM
One 16 x 1 dual-port RAM (second port is read-only)
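To make the LUT idea concrete, here is a minimal Python sketch of a lookup table as a block of stored bits addressed by its inputs, with F and G combined by a third table H. The class LUT and the function clb_and8 are hypothetical names. Modelling H as a 3-input table with one extra input h1 is an assumption of this sketch. The write method hints at how the same 16 bits can serve as a 16 x 1 distributed RAM.

```python
class LUT:
    """A k-input lookup table: 2**k stored bits addressed by the inputs."""

    def __init__(self, k, func):
        self.k = k
        # Fill the truth table by evaluating func on every input combination.
        self.bits = [bool(func(*self._unpack(addr))) for addr in range(2 ** k)]

    def _unpack(self, addr):
        # Bit i of the address is input i.
        return [(addr >> i) & 1 for i in range(self.k)]

    def read(self, *inputs):
        addr = sum(int(bool(bit)) << i for i, bit in enumerate(inputs))
        return self.bits[addr]

    def write(self, addr, value):
        """Overwrite one stored bit, as when the LUT is used as a 16 x 1 RAM."""
        self.bits[addr] = bool(value)


# Two independent 4-input functions F and G, combined by H (h1 unused here).
F = LUT(4, lambda a, b, c, d: a and b and c and d)
G = LUT(4, lambda a, b, c, d: a and b and c and d)
H = LUT(3, lambda f, g, h1: f and g)


def clb_and8(x):
    """8-input AND built from F, G and H inside one modelled CLB."""
    f = F.read(*x[:4])
    g = G.read(*x[4:])
    return H.read(f, g, 0)
```

Any function of four inputs costs the same single LUT, which is why LUT based devices are equally at home with FSMs and arithmetic.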
Xilinx new families: Virtex-2 CLB
All CLB inputs have access to interconnect on all four sides
Two identical slices in each CLB
Slices have bit pitch of 2
Fast local feedback within CLB
Direct connects to adjacent horizontal CLBs
[Figure: CLB containing two slices with carry chains and local feedback, connected through a switch matrix to single, hex and long lines, direct connects and tristate busses]
Virtex Slice Model
Virtex Carry Select
[Figure: carry select logic shown as a functional view and as the actual circuit, each using 2 LCs within a CLB, with Cin, Cinit and Cout signals]
Virtex LUT as Pipeline Delay
Dynamically Addressable Shift Register (DASR)
Ultra-efficient programmable clock cycle delays
One LUT for up to 16 cycle delay
Can cascade DASRs for longer than 16 clock cycle delay
[Figure: chain of D flip-flops with clock enable (D, CE, CLK), with the output tap 0 to 15 selected by A[3:0]]
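A behavioural sketch of the addressable shift register may help. It assumes a 16-stage shift chain whose output tap is selected by a 4-bit address, so that address n-1 gives an n cycle delay. The names AddressableShiftRegister and delay_line are invented for this note and are not Xilinx primitives. Delays longer than 16 cycles are modelled by cascading stages.

```python
class AddressableShiftRegister:
    """16 storage bits of one LUT used as a shift register with a selectable tap."""

    def __init__(self, depth=16):
        self.stages = [0] * depth

    def clock(self, d, ce=True):
        """On a clock edge with CE high, shift the new sample into stage 0."""
        if ce:
            self.stages = [d] + self.stages[:-1]

    def read(self, addr):
        """Combinational read of the stage selected by the address inputs."""
        return self.stages[addr]


def delay_line(samples, cycles, depth=16):
    """Delay a sample stream by `cycles` clocks (cycles >= 1), cascading if needed."""
    # Split the requested delay across as many cascaded stages as needed.
    taps = []
    remaining = cycles
    while remaining > 0:
        taps.append(min(remaining, depth))
        remaining -= taps[-1]
    chain = [AddressableShiftRegister(depth) for _ in taps]
    out = []
    for sample in samples:
        # Sample every tap before the clock edge, then shift all stages.
        tapped = [srl.read(t - 1) for srl, t in zip(chain, taps)]
        feeds = [sample] + tapped[:-1]
        for srl, d in zip(chain, feeds):
            srl.clock(d)
        out.append(tapped[-1])
    return out
```

With this model, delay_line([1, 2, 3, 4, 5], 2) yields [0, 0, 1, 2, 3], and a 20 cycle delay uses two cascaded stages (16 + 4).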
XCV600 (48x72) with Block RAMs
SelectShift Replaces Register Files
[Figure: 32-bit buses A[31:0], B[31:0] and C[31:0] passing through Function F (8 cycles), Function G (5 cycles) and Function H (1 cycle), balanced by a 32-bit 8 cycle delay (32 LUTs replace 256 registers) and a 32-bit 13 cycle delay (32 LUTs replace 416 registers)]
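The register counts quoted above follow from simple arithmetic: a delay built from flip-flops needs one register per bit per cycle, while a LUT used as a shift register covers up to 16 cycles per bit. A small Python check, with registers_needed and luts_needed as hypothetical helper names:

```python
import math


def registers_needed(width, cycles):
    """Flip-flops needed to delay a `width`-bit bus by `cycles` clocks."""
    return width * cycles


def luts_needed(width, cycles, lut_depth=16):
    """LUT shift registers needed for the same delay (16 cycles per LUT)."""
    return width * math.ceil(cycles / lut_depth)


# 32-bit bus, 8 cycle delay: 256 registers versus 32 LUTs
# 32-bit bus, 13 cycle delay: 416 registers versus 32 LUTs
```

Both delays are at most 16 cycles, so each needs only one LUT per bit, which is where the figure of 32 LUTs comes from in both cases.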
Virtex Block SelectRAM
Columns of Blocks on left and right
1 Block per 4 CLB rows
4K bits of data
Full Synchronous operation
No Asynchronous Read
Synchronous reset for Finite State Machine
Ports can be configured to different widths:

Width   Depth   ADDR     DATA
1       4096    (11:0)   (0:0)
2       2048    (10:0)   (1:0)
4       1024    (9:0)    (3:0)
8       512     (8:0)    (7:0)
16      256     (7:0)    (15:0)

Virtex-II Pro Platform FPGA
3.125 Gbps Multi-Gigabit Transceivers (MGTs)
Supports 10 Gbps standards
Up to 24 per device
PowerPC 405 Core
300+ MHz / 450+ DMIPS Performance
Up to 4 per device
IP-Immersion Fabric
ActiveInterconnect
18Kb Dual-Port RAM
Xtreme Multipliers
16 Global Clock Domains
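The Block SelectRAM port widths listed above all address the same 4K bits, so the depth and bus ranges follow directly from the chosen width. A short Python sketch, where TOTAL_BITS and port_config are names assumed for illustration:

```python
TOTAL_BITS = 4096  # 4K bits of data per block


def port_config(width):
    """Return depth and bus ranges for one port width (1, 2, 4, 8 or 16)."""
    depth = TOTAL_BITS // width
    addr_bits = (depth - 1).bit_length()  # bits needed to address `depth` words
    return {
        "width": width,
        "depth": depth,
        "addr": (addr_bits - 1, 0),   # e.g. (11, 0) means ADDR(11:0)
        "data": (width - 1, 0),       # e.g. (15, 0) means DATA(15:0)
    }
```

Doubling the width halves the depth and removes one address bit, as in the port table.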
Virtex-II PRO - Platform FPGA Technology
Leading edge 130nm process technology
Ultra-high performance 92nm gate length transistors
1.5V core
9 metal layers - all copper technology
Processor Integration Technology
IP-Immersion Tiles provide IP-to-Fabric connectivity
[Figure: PowerPC 405 Core embedded in the FPGA CLB array, surrounded by DOCM Controller, IOCM Controller, Control Logic, Interface Logic, BRAM and Gb blocks]
Virtex-2 PRO Board for coursework
Virtex-II Pro & MicroBlaze
RISC
32-bit ALU, 32-bit data bus, 32-bit instruction word, 32 x 32 General Purpose Register file
Harvard architecture (i.e. separate program and data memory space)
3 stage pipeline (IF, OF, EX)
A proprietary instruction set has been created for MicroBlaze.
4th Generation Virtex V4
Advanced 90nm Process Technology
Advanced 90-nm process
11-layer copper metallization
New triple-oxide technology enables lower quiescent power consumption
Exclusive Benefits:
Best Cost
Greatest Performance
Lowest Power
Highest Density
Enables 2x Performance, 2x Capacity, Power, Cost
New ASMBL Columnar Architecture
Revolutionary advance in FPGA architecture
Enables dial-in resource allocation mix: Logic, DSP, BRAM, I/O, MGT, DCM, PowerPC
Enabled by flip-chip packaging technology
I/O columns distributed throughout the device
Three Virtex-4 Platforms

Resource         LX              FX              SX
Logic            14K-200K LCs    12K-140K LCs    23K-55K LCs
Memory           0.9-6Mb         0.6-10Mb        2.3-5.7Mb
DCMs             4-12            4-20            4-8
DSP Slices       32-96           32-192          128-512
SelectIO         240-960         240-896         320-640
RocketIO         N/A             0-24 Channels   N/A
PowerPC          N/A             1 or 2 Cores    N/A
Ethernet MAC     N/A             2 or 4 Cores    N/A
System Monitor   0-1+ADC         0-1+ADC         0-1+ADC

Choose the Platform that Best Fits the Application