Design and Verification of Asynchronous FIFO
using SystemVerilog
By
Rahul Kumar
FIFO (First In First Out)
FIFO stands for First-In, First-Out, a type of memory buffer or queue in which the first data written is the
first to be read. It is widely used in digital systems for temporary storage and data synchronization
between two processes.
A FIFO sits between two modules, which may run at the same clock frequency or at different ones. On this
basis, FIFOs are divided into two types:
1. Synchronous FIFO
2. Asynchronous FIFO
1. Synchronous FIFO
A Synchronous FIFO (First-In, First-Out) is a type of FIFO memory where both the write and read
operations are controlled by the same clock signal.
Q. If both modules have the same frequency, why do we need a FIFO?
Both modules are driven by the same clock, so their frequency is identical. In practical VLSI design,
however, the two sides rarely transfer data in lockstep: one module may be clock-gated or stalled for
some cycles, or may produce data in bursts. Over short intervals the modules therefore behave as if they
ran at slightly different rates, and the FIFO absorbs that difference.
Synchronous FIFOs are easier to implement than asynchronous FIFOs because they don't face challenges
like metastability or clock domain crossing.
Applications of Synchronous FIFO:
➢ Used for reliable data transfer between blocks within a chip that operate on the same clock,
such as between a CPU core and a tightly coupled accelerator.
➢ In cache or memory subsystems, synchronous FIFOs buffer read/write requests or data lines to
optimize memory bandwidth usage.
➢ Used to buffer data or control signals within bus-based communication protocols where the
master and slave share the same clock.
➢ In peripherals like UART, SPI, or ADC, synchronous FIFOs temporarily hold data to prevent data
loss when the processor is momentarily busy.
2. Asynchronous FIFO
An Asynchronous FIFO is a special type of FIFO where the write and read operations are controlled by
different clock domains.
Key challenges in asynchronous FIFO design include:
➢ Metastability due to clock domain crossing
➢ Safe pointer synchronization (write pointer in read clock domain and vice versa).
➢ Full and empty flag generation without timing errors.
These challenges are typically addressed using techniques like Gray code counters and double flip-flop
synchronizers to ensure reliable operation.
Applications of Asynchronous FIFO:
➢ In large SoCs, different subsystems (e.g., CPU, GPU, DSP, I/O) often run on independent clocks.
Asynchronous FIFOs act as buffers between these blocks to ensure smooth communication.
➢ Interfaces like USB, Ethernet, PCIe, and UART often require CDC between fast and slow
domains. Asynchronous FIFOs buffer incoming/outgoing data to handle rate mismatch and
burst transfers.
➢ In low-power designs, different blocks might power up/down independently. Asynchronous FIFO
helps retain data integrity when a block wakes up or sleeps.
Architecture of Asynchronous FIFO:-
Working of Asynchronous FIFO
An asynchronous FIFO is a memory buffer used to safely transfer data between two clock domains that
are not synchronized (i.e., have different clocks: wr_clk and rd_clk).
Initially, before writing any data, the FIFO is empty. This means both the write pointer (wr_ptr) and the
read pointer (rd_ptr) are pointing to the same memory location.
Write Operation
On a rising edge of wr_clk, when wr_en (write enable) is asserted and the FIFO is not full:
➢ The data is written into the memory at the address pointed to by the write pointer
(wr_ptr).
➢ After writing, the wr_ptr is incremented by one.
➢ This process continues with each write clock cycle, storing data sequentially.
Read Operation
On a rising edge of rd_clk, when rd_en (read enable) is asserted and the FIFO is not empty:
➢ The data is read from the memory location pointed to by the read pointer (rd_ptr).
➢ After reading, the rd_ptr is incremented by one.
➢ This continues for every read clock cycle, retrieving data sequentially.
FIFO Conditions
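➢ Empty Condition: The FIFO is empty when wr_ptr equals rd_ptr in every bit, including the extra
wrap bit (MSB) that each pointer carries.
➢ Full Condition: The FIFO is full when the write pointer has wrapped around once more than the
read pointer and caught up with it, i.e., the address bits of wr_ptr and rd_ptr are equal but
their wrap bits (MSBs) differ.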
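One common way to implement these flags, and the one the design code in this document follows, is to give each pointer one extra MSB that records how often it has wrapped. The sketch below is illustrative only: the module name ptr_flags_demo and the fixed ADDR_WIDTH = 2 are assumptions, and clock-domain crossing is deliberately left out so that only the comparison logic is visible.

```systemverilog
// Illustrative only: single-domain model of the pointer comparison.
module ptr_flags_demo;
  localparam int ADDR_WIDTH = 2;              // assumed: 4-entry FIFO
  logic [ADDR_WIDTH:0] wr_ptr, rd_ptr;        // extra MSB acts as a wrap bit
  logic empty, full;

  // Empty: every bit matches, so reads have caught up with writes.
  assign empty = (wr_ptr == rd_ptr);
  // Full: same address bits, but the write side has wrapped once more.
  assign full  = (wr_ptr[ADDR_WIDTH] != rd_ptr[ADDR_WIDTH]) &&
                 (wr_ptr[ADDR_WIDTH-1:0] == rd_ptr[ADDR_WIDTH-1:0]);

  initial begin
    wr_ptr = '0;
    rd_ptr = '0;
    #1 assert (empty && !full) else $error("expected empty after reset");
    wr_ptr = 3'd4;                            // four writes, no reads
    #1 assert (full && !empty) else $error("expected full after 4 writes");
    rd_ptr = 3'd1;                            // one read frees a slot
    #1 assert (!full && !empty) else $error("expected neither flag");
    $display("pointer flag checks done");
  end
endmodule
```

In the real FIFO the same comparison is made on Gray-coded pointers, one of which has passed through a synchronizer.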
Use of Synchronizers in Asynchronous FIFO
In asynchronous FIFO design, there’s a major challenge when signals cross from one clock
domain to another — this can cause metastability, where the signal's value becomes
unpredictable and may lead to system failure.
Q. Why are synchronizers needed?
➢ Only pointers (wr_ptr and rd_ptr) need to cross clock domains, not the actual data in or data
out.
➢ Hence, the synchronizer is applied only to pointers, not to data.
➢ To avoid metastability, a double flip-flop synchronizer (also called two-stage synchronizer) is
used.
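A two-stage synchronizer can be sketched as a small parameterized module. This is a minimal illustration under stated assumptions: the module name sync_2ff, its port names, and the default WIDTH are hypothetical and do not come from the design in this document, which writes the same two flops inline.

```systemverilog
// Illustrative two-stage synchronizer; module and port names are assumptions.
module sync_2ff #(parameter int WIDTH = 3) (
  input  logic             clk,   // destination-domain clock
  input  logic             rst,   // asynchronous reset, active high
  input  logic [WIDTH-1:0] d,     // Gray-coded pointer from the source domain
  output logic [WIDTH-1:0] q      // value used in the destination domain
);
  // The first flop may go metastable; it gets one full cycle to settle
  // before the second flop samples it.
  logic [WIDTH-1:0] stage1;

  always_ff @(posedge clk or posedge rst) begin
    if (rst) begin
      stage1 <= '0;
      q      <= '0;
    end else begin
      stage1 <= d;
      q      <= stage1;
    end
  end
endmodule
```

Because q lags d by up to two destination clock cycles, the full and empty flags derived from it are conservative: they may stay asserted slightly longer than necessary, but they do not deassert too early.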
Q. How does synchronization work?
First, the binary pointer (wr_ptr or rd_ptr) is converted into Gray code. Gray code is used because only
one bit changes between successive values, so a pointer sampled in the middle of a transition resolves to
either its old or its new value, never to an unrelated one.
The Gray-coded pointer is then passed through a two-stage synchronizer before being used in the other
clock domain.
Example:
The rd_ptr (in read clock domain) is converted to Gray code and synchronized into the write clock
domain.
This synchronized pointer is then used in the write logic to check whether the FIFO is full.
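The one-bit-change property of the Gray conversion can be checked with a short simulation. The sketch below is an illustration, not part of the original design: the module name gray_demo, the helper bin2gray, and the width W = 3 are assumptions.

```systemverilog
// Illustrative check that successive Gray codes differ in exactly one bit.
module gray_demo;
  localparam int W = 3;   // assumed pointer width (ADDR_WIDTH + 1 for ADDR_WIDTH = 2)

  // Binary to Gray: XOR each bit with its more significant neighbour.
  function automatic logic [W-1:0] bin2gray(input logic [W-1:0] b);
    return b ^ (b >> 1);
  endfunction

  initial begin
    logic [W-1:0] cur, nxt;
    for (int i = 0; i < (1 << W); i++) begin
      cur = bin2gray(W'(i));
      nxt = bin2gray(W'(i + 1));   // W'() truncates, so the last step wraps to 0
      // Exactly one bit differs between neighbours, including the wrap-around.
      assert ($countones(cur ^ nxt) == 1)
        else $error("more than one bit changed at i=%0d", i);
      $display("bin=%b gray=%b", W'(i), cur);
    end
  end
endmodule
```

The size cast before the conversion matters: the incremented value has to be truncated to the pointer width first, otherwise the carry bit corrupts the Gray code at wrap-around.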
Asynchronous FIFO Design Code
`timescale 1ns / 1ps
module asyn_fifo #(
    parameter DATA_WIDTH = 8,
    parameter ADDR_WIDTH = 2
)(
    input wr_clk,
    input rd_clk,
    input rst,
    input wr_en,
    input rd_en,
    input [DATA_WIDTH-1:0] data_in,
    output reg [DATA_WIDTH-1:0] data_out,
    output full,
    output empty
);
    localparam DEPTH = 1 << ADDR_WIDTH;
    // Memory
    reg [DATA_WIDTH-1:0] mem [0:DEPTH-1];
    // Binary pointers (one extra MSB distinguishes full from empty)
    reg [ADDR_WIDTH:0] wr_bin_ptr, rd_bin_ptr;
    // Gray code pointers
    reg [ADDR_WIDTH:0] wr_gray_ptr, rd_gray_ptr;
    // Synchronized pointers (across clock domains)
    reg [ADDR_WIDTH:0] rd_gray_sync1, rd_gray_sync2;
    reg [ADDR_WIDTH:0] wr_gray_sync1, wr_gray_sync2;
    // Next binary pointer values, sized to the pointer width. With an unsized
    // "+ 1" inside the Gray expression, the carry out of the MSB would leak
    // into the Gray code when the pointer wraps around.
    wire [ADDR_WIDTH:0] wr_bin_next = wr_bin_ptr + 1'b1;
    wire [ADDR_WIDTH:0] rd_bin_next = rd_bin_ptr + 1'b1;
    // Write logic
    always @(posedge wr_clk or posedge rst) begin
        if (rst) begin
            wr_bin_ptr <= 0;
            wr_gray_ptr <= 0;
        end else if (wr_en && !full) begin
            mem[wr_bin_ptr[ADDR_WIDTH-1:0]] <= data_in;
            wr_bin_ptr <= wr_bin_next;
            wr_gray_ptr <= wr_bin_next ^ (wr_bin_next >> 1);
        end
    end
    // Read logic
    always @(posedge rd_clk or posedge rst) begin
        if (rst) begin
            rd_bin_ptr <= 0;
            rd_gray_ptr <= 0;
            data_out <= 0;
        end else if (rd_en && !empty) begin
            data_out <= mem[rd_bin_ptr[ADDR_WIDTH-1:0]];
            rd_bin_ptr <= rd_bin_next;
            rd_gray_ptr <= rd_bin_next ^ (rd_bin_next >> 1);
        end
    end
    // Synchronize write pointer into read clock domain
    always @(posedge rd_clk or posedge rst) begin
        if (rst) begin
            wr_gray_sync1 <= 0;
            wr_gray_sync2 <= 0;
        end else begin
            wr_gray_sync1 <= wr_gray_ptr;
            wr_gray_sync2 <= wr_gray_sync1;
        end
    end
    // Synchronize read pointer into write clock domain
    always @(posedge wr_clk or posedge rst) begin
        if (rst) begin
            rd_gray_sync1 <= 0;
            rd_gray_sync2 <= 0;
        end else begin
            rd_gray_sync1 <= rd_gray_ptr;
            rd_gray_sync2 <= rd_gray_sync1;
        end
    end
    // Full detection (in write clock domain): the two MSBs of the Gray
    // pointers differ and the remaining bits match
    assign full = (wr_gray_ptr == {~rd_gray_sync2[ADDR_WIDTH:ADDR_WIDTH-1],
                                   rd_gray_sync2[ADDR_WIDTH-2:0]});
    // Empty detection (in read clock domain)
    assign empty = (rd_gray_ptr == wr_gray_sync2);
endmodule
Verification of Asynchronous FIFO using SystemVerilog
Q. If we can verify the design with a Verilog testbench, why use SystemVerilog?
Verilog testbenches can be used to verify digital designs, but SystemVerilog was developed to address
limitations in Verilog and to make verification more powerful, modular, and reusable.
Limitations of Verilog Testbenches
➢ No object-oriented features
Verilog is not object-oriented, so writing reusable and scalable testbenches is difficult.
➢ Hard to manage complex protocols
In Verilog, modeling complex behaviors (like handshake protocols, AXI, AMBA, etc.) requires
writing long procedural code, making it error-prone and messy.
➢ Limited stimulus generation
Verilog lacks advanced stimulus generation like randomization, constraints, and functional
coverage.
SystemVerilog Verification Flow Using Class-Based Testbench
The DUT (Design Under Test) is the module to be verified. It communicates via ports. In SystemVerilog,
test data is packaged into transaction objects for easier generation and checking, whereas in a Verilog
testbench, data is generated and applied directly in pin-level form.
Pin-Level Data Generation (Verilog Testbench)
In Verilog, we write testbenches by manually assigning values to DUT ports. This means we deal directly
with signals, like a, b, sel, etc.
Ex:-
initial begin
a = 1'b0;
b = 1'b1;
sel = 1'b0;
#10;
a = 1'b1;
b = 1'b0;
sel = 1'b1;
end
Package-Level Data Generation (SystemVerilog using Transactions)
In SystemVerilog, we use transaction classes to group related input/output signals into a single
object.
// Transaction class
class mux_trans;
rand bit a;
rand bit b;
rand bit sel;
bit out; // output (not randomized)
endclass
// In Generator
mux_trans tx;
tx = new();
tx.randomize(); // Generates random values for a, b, sel
// Driver will take tx.a, tx.b, tx.sel and apply to DUT via interface
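Constraints can shape the random values further. The sketch below is a minimal illustration, not code from the original testbench: the class name mux_trans_weighted, the module name constraint_demo, and the 70/30 weighting are all assumptions chosen for the example.

```systemverilog
// Illustrative only: class name and weights are assumptions.
class mux_trans_weighted;
  rand bit a;
  rand bit b;
  rand bit sel;

  // Bias sel toward 1; the weights are arbitrary example values.
  constraint sel_bias { sel dist {1 := 70, 0 := 30}; }
endclass

module constraint_demo;
  initial begin
    mux_trans_weighted tx;
    int ones;
    tx   = new();
    ones = 0;
    repeat (1000) begin
      // randomize() returns 0 if the constraints cannot be solved
      if (!tx.randomize()) $fatal(1, "randomize failed");
      ones += tx.sel;
    end
    $display("sel was 1 in %0d of 1000 draws", ones);
  end
endmodule
```

Checking the return value of randomize() is worthwhile in any generator, since a failed solve otherwise leaves the previous field values in place without warning.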
➢ Pin-Level vs Package-Level
Feature | Verilog (Pin-Level) | SystemVerilog (Package-Level)
Signal handling | Direct assignment to pins | Abstracted through objects (transaction class)
Code reusability | Low | High (easy reuse of transaction logic)
Randomization | Manual/random at signal level | Built-in randomization with constraints
Debugging | Tedious (many signals) | Easier (one object holds all info)
Abstraction and scalability | Poor for large designs | Excellent for complex designs
Note
The DUT will always communicate through its pins (ports), regardless of whether the testbench
is written in Verilog or SystemVerilog.
SystemVerilog Testbench Architecture
1. TestBench_Top: It contains both the testbench and the DUT.
2. Interface: Acts as a communication bridge between the testbench and the DUT; it holds
the signal definitions used by the test components.
3. Test: This is where the Environment class is instantiated and controlled.
4. Environment: Contains the Generator, Driver, Monitor, and Scoreboard components.
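The document's testbench includes a file interface.sv that is not listed. A minimal sketch of what such an interface might contain is given below; it is an assumption, with signal names taken from the DUT port list and an 8-bit data path matching DATA_WIDTH = 8 in the testbench.

```systemverilog
// Assumed contents of interface.sv; the source does not show this file.
interface intf;
  logic       wr_clk;    // write-domain clock
  logic       rd_clk;    // read-domain clock
  logic       rst;       // asynchronous reset, active high
  logic       wr_en;     // write enable, driven by the driver
  logic       rd_en;     // read enable, driven by the driver
  logic [7:0] data_in;   // write data
  logic [7:0] data_out;  // read data, driven by the DUT
  logic       full;      // driven by the DUT
  logic       empty;     // driven by the DUT
endinterface
```

Classes reach these signals through a virtual interface handle (virtual intf vif), which is how the driver, monitor, and scoreboard in this document refer to them.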
1. Transaction Class
• This is the first class we write.
• It contains all input and output signals of the DUT.
• Randomization is applied only to input signals (e.g., rand bit wr_en, rd_en;).
• Output signals are defined but not randomized.
• The transaction class acts like a "package of data", describing a single operation or
scenario.
2. Generator Class
• This class is responsible for creating different test cases.
• It creates multiple randomized transaction objects.
• These objects (data packets) are placed into a mailbox, which is a communication
mechanism between generator and driver.
3. Driver Class
• The driver picks up the transaction object from the mailbox.
• It converts the abstract data (from the transaction) into pin-level signals.
• These pin-level signals are driven to the DUT via the interface.
4. DUT Output Handling
• The DUT processes the inputs and generates outputs on its pins.
5. Monitor Class
• The monitor reads DUT output signals (pins) via the interface.
• It converts these pin-level outputs back into a transaction object (package form).
• This transaction is then passed to the scoreboard using a mailbox.
6. Scoreboard
• The scoreboard compares the DUT output transaction with the expected (golden)
output.
• If they match, the test passes. If not, it fails.
7. Environment Class
• This is the top-level block that connects all components:
o Generator
o Driver
o Monitor
o Scoreboard
• It creates and manages instances of these components.
8. Test Class
• The test class instantiates the environment.
• It defines which type of test case to run (e.g., basic, corner, random).
• The test communicates with the DUT and triggers simulation.
9. Final Testbench
• The testbench contains:
o DUT instance
o Interface connection
o Clock and reset logic
o Test class instance
Transaction Class
class transaction; // packet class
    parameter DATA_WIDTH = 8;
    parameter ADDR_WIDTH = 4;
    rand bit wr_en;
    rand bit rd_en;
    rand bit [DATA_WIDTH-1:0] data_in;
    bit [DATA_WIDTH-1:0] data_out;
    bit full;
    bit empty;
    // Print all fields of the transaction, tagged with the caller's name
    function void display (string name);
        $display ("---%s---", name);
        $display ("wr_en=%0b | rd_en=%0b | data_in=%0d | data_out=%0d | full=%0b | empty=%0b",
                  wr_en, rd_en, data_in, data_out, full, empty);
        $display ("....................");
    endfunction
endclass
Generator Class
class generator;
    transaction trans;
    mailbox gen2drv;
    function new (mailbox gen2drv);
        this.gen2drv = gen2drv;
    endfunction
    task main();
        repeat (50) begin
            trans = new();
            // Stop if the solver cannot produce a valid transaction
            if (!trans.randomize()) $fatal(1, "Generator: randomization failed");
            trans.display("Generator class signal");
            gen2drv.put(trans);
            #1;
        end
    endtask
endclass
Driver
class driver;
    virtual intf vif;
    mailbox gen2drv;
    function new (virtual intf vif, mailbox gen2drv);
        this.vif = vif;
        this.gen2drv = gen2drv;
    endfunction
    task main();
        transaction trans;
        repeat (50) begin
            gen2drv.get(trans);
            trans.display("Driver class signals");
            // Apply write operation if wr_en is set
            if (trans.wr_en) begin
                @(posedge vif.wr_clk);
                vif.data_in <= trans.data_in;
                vif.wr_en <= 1;
                @(posedge vif.wr_clk);
                vif.wr_en <= 0;
            end
            // Apply read operation if rd_en is set
            // (trans.empty is never set by the generator, so it is always 0 here)
            if (trans.rd_en && !trans.empty) begin
                @(posedge vif.rd_clk);
                vif.rd_en <= 1;
                @(posedge vif.rd_clk);
                vif.rd_en <= 0;
                //@(posedge vif.rd_clk);
            end
        end
    endtask
endclass
Monitor
class monitor;
    virtual intf vif;
    mailbox mon2scb;
    function new(virtual intf vif, mailbox mon2scb);
        this.vif = vif;
        this.mon2scb = mon2scb;
    endfunction
    // Write side monitor
    task monitor_write();
        forever begin
            @(posedge vif.wr_clk);
            if (vif.wr_en && !vif.full) begin
                transaction trans = new();
                trans.wr_en = vif.wr_en;
                trans.data_in = vif.data_in;
                trans.full = vif.full;
                mon2scb.put(trans);
                // trans.display("monitor: WRITE");
            end
        end
    endtask
    // Read side monitor
    task monitor_read();
        forever begin
            @(posedge vif.rd_clk);
            if (vif.rd_en && !vif.empty) begin
                transaction trans = new();
                trans.rd_en = vif.rd_en;
                trans.empty = vif.empty;
                trans.data_out = vif.data_out;
                mon2scb.put(trans); // Send read info
                // trans.display("monitor: READ");
            end
        end
    endtask
    // Launch both monitors in parallel
    task main();
        fork
            monitor_write();
            monitor_read();
        join
    endtask
endclass
Scoreboard
class scoreboard;
    bit [7:0] expected_data;
    bit [7:0] model_fifo[$];
    mailbox mon2scb;
    virtual intf vif;
    function new(virtual intf vif, mailbox mon2scb);
        this.vif = vif;
        this.mon2scb = mon2scb;
    endfunction
    task main();
        transaction trans;
        repeat (50) begin
            mon2scb.get(trans);
            @(posedge vif.wr_clk);
            // -------------------- Overflow Check --------------------
            if (trans.wr_en && trans.full) begin
                $display("!!!! ERROR: FIFO Overflow Attempt Detected !!!!");
                $display("At time %0t: wr_en=1 while full=1. Data = %0d", $time, trans.data_in);
            end
            // Push only if not full
            else if (trans.wr_en && !trans.full) begin
                model_fifo.push_front(trans.data_in);
            end
            // -------------------- Underflow Check --------------------
            if (trans.rd_en && trans.empty) begin
                $display("!!!! ERROR: FIFO Underflow Attempt Detected !!!!");
                $display("At time %0t: rd_en=1 while empty=1", $time);
            end
            // Check data_out if reading from non-empty FIFO
            else if (trans.rd_en && !trans.empty) begin
                expected_data = model_fifo.pop_back();
                if (trans.data_out == expected_data) begin
                    $display("****** PASS: Correct Data Read ******");
                    $display("Expected = %0d, Got = %0d", expected_data, trans.data_out);
                end else begin
                    $display("!!!! FAIL: Data Mismatch !!!!");
                    $display("Expected = %0d, Got = %0d", expected_data, trans.data_out);
                end
            end
            // $display(""); // Separator
        end
    endtask
endclass
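Environment Class
//include all lower classes
`include "transaction.sv"
`include "generator.sv"
`include "driver.sv"
`include "monitor.sv"
`include "scoreboard.sv"
class environment;
    generator gen;
    driver drv;
    monitor mon;
    scoreboard scb;
    mailbox gen2drv;
    mailbox mon2scb;
    virtual intf vif;
    function new (virtual intf vif);
        this.vif = vif;
        gen2drv = new();
        mon2scb = new();
        gen = new(gen2drv);
        drv = new(vif, gen2drv);
        mon = new(vif, mon2scb);
        scb = new(vif, mon2scb);
    endfunction
    // NOTE: the source listing ends after the constructor, although the test
    // program calls env.test_run(). The task below is an assumed
    // reconstruction that starts all four components in parallel; the
    // monitor runs forever, so the simulation ends at $finish in the testbench.
    task test_run();
        fork
            gen.main();
            drv.main();
            mon.main();
            scb.main();
        join
    endtask
endclass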
Test
`include "environment.sv"
program test(intf intff);
environment env;
initial begin
env = new(intff);
env.test_run();
end
endprogram
Testbench
`include "interface.sv"
`include "test.sv"
module testbench;
    // Clock signals: the asynchronous FIFO needs two, wr_clk and rd_clk
    logic wr_clk = 0, rd_clk = 0;
    always #3 wr_clk = ~wr_clk; // Write clock
    always #5 rd_clk = ~rd_clk; // Read clock
    // Interface instantiation
    intf intff();
    // Connect clocks to interface
    assign intff.wr_clk = wr_clk;
    assign intff.rd_clk = rd_clk;
    // Test program instantiation
    test tst(intff);
    // DUT instantiation
    asyn_fifo #(
        .DATA_WIDTH(8),
        .ADDR_WIDTH(4)
    ) dut (
        .wr_clk(intff.wr_clk),
        .rd_clk(intff.rd_clk),
        .rst(intff.rst),
        .wr_en(intff.wr_en),
        .rd_en(intff.rd_en),
        .data_in(intff.data_in),
        .data_out(intff.data_out),
        .full(intff.full),
        .empty(intff.empty)
    );
    // VCD dump
    initial begin
        $dumpfile("dump.vcd");
        $dumpvars(0, testbench);
    end
    // Reset and initial values
    initial
    begin
        intff.rst=1;
        intff.wr_en=0;
        intff.rd_en=0;
        #10 intff.rst=0;
    end
    // Simulation end
    initial begin
        #2000 $finish;
    end
endmodule
Output Waveform
TCL Console
# run -all
# ****** PASS: Correct Data Read ******
# Expected = 49, Got = 49
# ****** PASS: Correct Data Read ******
# Expected = 112, Got = 112
# ****** PASS: Correct Data Read ******
# Expected = 99, Got = 99
# ****** PASS: Correct Data Read ******
# Expected = 209, Got = 209
# ****** PASS: Correct Data Read ******
# Expected = 221, Got = 221
# ****** PASS: Correct Data Read ******
# Expected = 38, Got = 38
# ****** PASS: Correct Data Read ******
# Expected = 234, Got = 234
# ****** PASS: Correct Data Read ******
# Expected = 240, Got = 240
# ****** PASS: Correct Data Read ******
# Expected = 12, Got = 12
# ****** PASS: Correct Data Read ******
# Expected = 198, Got = 198
# ****** PASS: Correct Data Read ******
# Expected = 4, Got = 4
# ****** PASS: Correct Data Read ******
# Expected = 71, Got = 71
# ****** PASS: Correct Data Read ******
# Expected = 217, Got = 217
# ****** PASS: Correct Data Read ******
# Expected = 47, Got = 47
# ****** PASS: Correct Data Read ******
# Expected = 27, Got = 27
# ****** PASS: Correct Data Read ******
# Expected = 173, Got = 173
# ****** PASS: Correct Data Read ******
# Expected = 179, Got = 179
# ****** PASS: Correct Data Read ******
# Expected = 164, Got = 164
# ****** PASS: Correct Data Read ******
# Expected = 175, Got = 175
# ****** PASS: Correct Data Read ******
# Expected = 64, Got = 64
# ** Note: $finish : testbench.sv(57)
Output Observation
1. The golden data generated in the scoreboard matches the data coming from the DUT via the
monitor class.
2. Output data appears one rd_clk cycle after the read is requested, because data_out is registered.
3. The overflow and underflow conditions implemented in the scoreboard are running successfully,
indicating that these conditions are being correctly detected and handled.
Tool used
1. Siemens Questa