Lecture 5 Communicating With Peripherals 2021

This document discusses communicating with peripherals through interfaces. It provides examples of communicating with a LED by configuring GPIO pins as outputs and toggling them, as well as using UART to communicate serially with external devices. Offloading UART processing to a dedicated controller can improve performance by serializing/deserializing data in shift registers and buffering transactions in FIFOs. The CPU can poll peripherals by checking their status registers to see when data is ready.

Uploaded by

Pavan Dhake
Communicating with Peripherals
(Interfaces Part II)
Advanced Digital VLSI Design I
Bar-Ilan University, Course 83-614
Semester B, 2021
13 May 2021

Heavily based on the wonderful lecture “Interfaces: External/Internal, or why CPUs suck” by Tzachi Noy, 2019
What do a car and a router have in common?

© Adam Teman, May 13, 2021
Both the car and the router have interfaces

So let’s go ahead and try to build a router…


Reminder: Memory-Mapped I/O
• Registers and I/O Devices are given an address in the system’s memory map:
• Everything is treated the same as memory.
• To communicate with an I/O, we write to and read from these addresses.
• These are achieved with simple load and store assembler commands.
• In C, we can define two functions, peek and poke, to accomplish this easily:

    int peek (volatile char *location) {
        // Read from a memory-mapped address
        // (volatile keeps the compiler from caching or removing the access)
        return *location;
    }

    void poke (volatile char *location, char newval) {
        // Write to a memory-mapped address
        (*location) = newval;
    }

• Now to access a register, just define its address, and use these functions:

    #define DEV1 ((char *)0x1000)
    ...
    dev_status = peek(DEV1);
    ...
    poke(DEV1, 8);
General Purpose I/O (GPIO)
• Most microcontrollers have a set of general purpose input/output (GPIO) pins.
• Can be configured as input pins or output pins.
• Can be programmed by software for various purposes.
[Figure: GPIO pin circuit. A GPIO config register drives the output enable of the external GPIO pin, a memory-mapped output register drives the pin value, and a memory-mapped input register samples the pin.]
Example: Blinking a LED
• First, configure the GPIO to be an output.
• Next, create an infinite loop that:
• Toggles the state of the output (0→1→0→1).
• Waits for a given period.

    #include <stdbool.h>

    #define GPIO_CONFIG_REG ((char *)0x10000000)
    #define GPIO_OUTPUT_REG ((char *)0x10000001)
    #define GPIO_BLINK_PIN 0b00000001
    #define BLINK_PERIOD 1000000

    int main () {
        // Set LED connected GPIO PIN to output
        char toggle_config = peek(GPIO_CONFIG_REG) | GPIO_BLINK_PIN;
        poke(GPIO_CONFIG_REG, toggle_config);
        while (true) {
            // Toggle the state of the GPIO output register (XOR flips the bit)
            char output_status = peek(GPIO_OUTPUT_REG);
            poke(GPIO_OUTPUT_REG, output_status ^ GPIO_BLINK_PIN);
            // Wait for a predefined delay
            wait(BLINK_PERIOD);
        }
    }
Communicating Off-Chip
• What if we want to communicate with something more sophisticated
than a LED or a button?
• We need a communication protocol.
• Introducing UART
• The Universal Asynchronous Receiver/Transmitter
• Baud Rate
• Number of bits per unit time: $\text{Baud Rate} = \frac{1}{\text{bit time}}$
• Bandwidth
• Data per unit time: $\text{BW} = \frac{\text{data bits}}{\text{frame bits}} \times \text{Baud Rate}$
[Figure: UART1 connected to UART2]
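The two formulas above can be sketched in C as follows. This is a minimal illustration, not part of the original slides; the function names are hypothetical, and the worked numbers assume 8N1 framing (1 start bit, 8 data bits, 1 stop bit, so 10 frame bits).

```c
#include <stdint.h>

/* Effective data bandwidth in bits/sec:
 * BW = (data bits / frame bits) * baud rate.
 * Integer math, so the result is truncated toward zero. */
uint32_t uart_bandwidth(uint32_t baud_rate, uint32_t data_bits,
                        uint32_t frame_bits)
{
    return (uint32_t)(((uint64_t)baud_rate * data_bits) / frame_bits);
}

/* Bit time in nanoseconds: 1 / baud rate (truncated). */
uint32_t uart_bit_time_ns(uint32_t baud_rate)
{
    return 1000000000u / baud_rate;
}
```

Under the 8N1 assumption, `uart_bandwidth(115200, 8, 10)` evaluates to 92160 bits/sec, so a fifth of the line rate goes to framing.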
Can we use UART for our router?
BAUD RATE: 115,200 bits/sec

SAMPLE RATE: 230,400 samples/sec

CODE: 40 instructions/sample

OVERHEAD: 9,216,000 instructions/sec

9.2% of CPU time*

* Assuming a 100 MHz clock frequency
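The arithmetic on this slide can be reproduced with a short C sketch. The function names are hypothetical, and the CPU fraction assumes one instruction per clock cycle, which is how the 9.2% figure appears to be derived.

```c
#include <stdint.h>

/* Instructions per second spent handling the UART in software:
 * baud rate * samples per bit * instructions per sample. */
uint64_t uart_sw_overhead(uint64_t baud_rate, uint64_t samples_per_bit,
                          uint64_t instr_per_sample)
{
    return baud_rate * samples_per_bit * instr_per_sample;
}

/* Share of the CPU consumed, assuming one instruction per cycle. */
double cpu_fraction(uint64_t instr_per_sec, uint64_t clock_hz)
{
    return (double)instr_per_sec / (double)clock_hz;
}
```

With 115,200 bits/sec, 2 samples per bit, and 40 instructions per sample, `uart_sw_overhead` gives 9,216,000 instructions/sec, which is about 9.2% of a 100 MHz CPU.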
Offload the CPU with a controller
• UART is a slow serial protocol
• One bit is transferred at a time at a low baud rate (e.g., 1200-115200 bits/sec).
• Integrate a specific UART controller that offloads the CPU
• Communicate with the UART through a wider register (e.g., byte, 32-bit).
• Use a Shift Register to serialize/deserialize parallel data (1-byte → 16X speedup)
• Use a FIFO to buffer several CPU transactions (64 byte FIFO → 512X speedup)
[Figure: UART transmit and receive paths. Transmit: data written to the transmit data register (UART0_DR_R) enters a 16-element FIFO (TXEF: FIFO empty flag, TXFF: FIFO full flag) that feeds a Parallel-In Serial-Out (PISO) shift register driving U0Tx with a start bit, data bits 0-7, and a stop bit. Receive: U0Rx feeds a Serial-In Parallel-Out (SIPO) shift register into a 12-bit, 16-element FIFO (RXFE: FIFO empty flag, RXFF: FIFO full flag; error bits OE, BE, PE, FE) that is read through the receive data register (UART0_DR_R). Source: Bard, EE319K]
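To make the shift-register idea concrete, here is a small software model of what the PISO and SIPO registers do to one byte. It is only a sketch: the function names are hypothetical, and it assumes a 10-bit frame (start bit, 8 data bits sent LSB first, stop bit) with no parity.

```c
#include <stdint.h>

#define FRAME_BITS 10  /* 1 start + 8 data + 1 stop (assumed framing) */

/* Transmit side (PISO model): expand a byte into the sequence of line
 * levels. bits[0] is the first bit on the wire. */
void uart_serialize(uint8_t data, uint8_t bits[FRAME_BITS])
{
    bits[0] = 0;                                  /* start bit is low */
    for (int i = 0; i < 8; i++)
        bits[1 + i] = (uint8_t)((data >> i) & 1u); /* data, LSB first */
    bits[9] = 1;                                  /* stop bit is high */
}

/* Receive side (SIPO model): rebuild the byte from the line levels.
 * Returns 0 on success, -1 on a framing error (bad start or stop bit). */
int uart_deserialize(const uint8_t bits[FRAME_BITS], uint8_t *data)
{
    if (bits[0] != 0 || bits[9] != 1)
        return -1;
    uint8_t value = 0;
    for (int i = 0; i < 8; i++)
        value |= (uint8_t)((bits[1 + i] & 1u) << i);
    *data = value;
    return 0;
}
```

In hardware this work happens once per bit clock without any CPU involvement; the CPU only sees whole bytes in the data register.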
But how do we know when it’s done?
• How can the CPU know when a new byte of data is received?
• Simple way: “Polling”
• Check on the status of the UART every so often to see if data has been received (or if it is ready to receive new data)

    do {
        // Play games
        ...
        // Poll to see if we're there yet.
        status = areWeThereYet();
    } while (status == NO);

• Polling can be carried out with a “busy-wait” loop:

Busy-wait on input from the UART:

    while (TRUE) {
        // Wait until a new character has been read
        while (peek(UART_IN_STATUS)==0);
        // Read the new character
        achar=(char)peek(UART_DATA);
    }

Busy-wait on writing to the UART:

    current_char = mystring;
    // Continue until the end of string
    while (*current_char != '\0') {
        // Wait until the UART is ready
        while (peek(UART_OUT_STATUS)!=0);
        // Send character to UART
        poke(UART_DATA_OUT, *current_char);
        // update character pointer
        current_char++;
    }
Interrupts
• An interrupt is an asynchronous signal from a peripheral to the processor.
• Can be generated from peripherals external or internal to the processor, as well as by software.
• Frees up the CPU, while the peripheral is doing its job.
• Upon receiving an interrupt:
• The CPU decides when to handle the interrupt
• When ready, the CPU acknowledges the interrupt
• The CPU calls an interrupt service routine (ISR)
• Upon finishing, the ISR returns and the CPU continues operation
Source: Computers as Components
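A common way to structure interrupt-driven reception is for the ISR to push each byte into a small ring buffer that the main loop drains when it has time. The sketch below models that pattern in plain C; all names are hypothetical, and on real hardware `uart_rx_isr` would be registered in the interrupt vector table and would read the byte from the UART data register rather than take it as an argument.

```c
#include <stdint.h>

#define RX_BUF_SIZE 16u  /* power of two, so indices wrap with a mask */

static volatile uint8_t  rx_buf[RX_BUF_SIZE];
static volatile uint32_t rx_head;  /* written only by the ISR */
static volatile uint32_t rx_tail;  /* written only by the main loop */

/* Hypothetical ISR body: store the received byte and return quickly. */
void uart_rx_isr(uint8_t received)
{
    uint32_t next = (rx_head + 1u) & (RX_BUF_SIZE - 1u);
    if (next != rx_tail) {          /* drop the byte if the buffer is full */
        rx_buf[rx_head] = received;
        rx_head = next;
    }
}

/* Called from the main loop. Returns 1 and stores a byte if one is
 * pending, 0 otherwise. It never blocks, unlike a busy-wait loop. */
int uart_rx_get(uint8_t *out)
{
    if (rx_tail == rx_head)
        return 0;
    *out = rx_buf[rx_tail];
    rx_tail = (rx_tail + 1u) & (RX_BUF_SIZE - 1u);
    return 1;
}
```

Keeping the ISR this short matters because the interrupted code is stalled for as long as the ISR runs.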
RATE: 115200 bit/sec (0.1152 Mbps)
RANGE: 15m
Ethernet
• Widely used for realization of Local Area Networks (LANs)
• Bus with single signal path
• Nodes are not synchronized → Collisions
• Arbitration: “Carrier Sense Multiple Access with Collision Detection (CSMA/CD)”
• If collision → wait for random time → retransmit.
• Ethernet packet:
• Addresses
• Variable-length data payload: 46 – 1500 bytes (frame size: 64 – 1518 bytes)
• Throughput:
• 10M = 2.5 MHz x 4 bit, 100M = 25 MHz x 4 bit, 1G = 125 MHz x 8 bit
Source: Computers as Components
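The "wait for random time" step is usually implemented as truncated binary exponential backoff: after each collision the range of possible delays doubles, up to a cap. The sketch below illustrates the idea; the function name is hypothetical, the constants (window capped at 2^10 slots, give up at 16 collisions) are stated as assumptions, and `rand()` stands in for whatever random source a real MAC would use.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_BACKOFF_EXP 10u  /* assumed cap on the backoff window exponent */
#define MAX_ATTEMPTS    16u  /* assumed limit before the frame is dropped */

/* Number of slot times to wait after `collisions` consecutive
 * collisions, or -1 if the frame should be dropped. */
int32_t backoff_slots(unsigned collisions)
{
    if (collisions >= MAX_ATTEMPTS)
        return -1;
    unsigned k = collisions < MAX_BACKOFF_EXP ? collisions : MAX_BACKOFF_EXP;
    uint32_t window = 1u << k;             /* 2^k possible delays */
    return (int32_t)((uint32_t)rand() % window);
}
```

Doubling the window spreads retransmissions out as the bus gets busier, which makes a repeat collision between the same nodes less likely.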
Side note: The OSI Model
• The Open Systems Interconnection (OSI) model defines seven network layers.
1. Physical: electrical and physical components
2. Data Link: peer-to-peer communication across a single physical layer.
3. Network: basic routing over the link.
4. Transport: ensure data is delivered in the proper order and without errors across multiple links.
5. Session: interaction of end-user services across a network
6. Presentation: defines data exchange formats including encryption and compression.
7. Application: interface between the network and end-user
Source: Computers as Components
Ethernet

Let’s try a simple interface: APB
• 32-bit bus
• Two phase access:
• Address phase
• Read/Write phase

Source: ARM

Is APB Sufficient?
ETH RATE: $10^9$ bits/sec
APB TRANSFER WIDTH: 32 bits
APB RATE: 2 cycles/transfer
CLOCK: $10^8$ cycles/sec
APB THROUGHPUT: $1.6 \times 10^9$ bits/sec
So let’s make it faster: AHB
• Wider bus (>32 bits)
• Pipelined address and R/W phases (X2 throughput)
• Supports Bursts

Source: ARM
Is AHB fast enough?
ETHERNET RATE: $2 \times 10^9$ bits/sec
AHB TRANSFER WIDTH: 64 bits
AHB RATE: 1 cycle/transfer
CLOCK: $10^8$ cycles/sec
AHB THROUGHPUT: $6.4 \times 10^9$ bits/sec
Can the CPU support this?
ETHERNET RATE: $2 \times 10^9$ bits/sec
CPU WORD: 32 bits
INSTRUCTIONS PER SW/LW: 3 inst/load
CLOCK: $10^8$ cycles/sec
CPU THROUGHPUT: $1.1 \times 10^9$ bits/sec
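The APB, AHB, and CPU throughput figures all come from the same calculation: transfer width times clock rate, divided by cycles per transfer. A minimal C sketch follows; the function names are hypothetical, and the CPU case assumes one instruction per clock cycle.

```c
#include <stdint.h>

/* Peak throughput in bits/sec: one transfer of `width_bits` every
 * `cycles_per_transfer` clock cycles (integer math, truncated). */
uint64_t throughput_bps(uint64_t width_bits, uint64_t cycles_per_transfer,
                        uint64_t clock_hz)
{
    return width_bits * clock_hz / cycles_per_transfer;
}

/* Returns 1 if the throughput covers the required rate, 0 otherwise. */
int is_sufficient(uint64_t throughput, uint64_t required_bps)
{
    return throughput >= required_bps;
}
```

At a 100 MHz clock, `throughput_bps(32, 2, 100000000)` gives 1.6 Gbps for APB and `throughput_bps(64, 1, 100000000)` gives 6.4 Gbps for AHB. The CPU case, `throughput_bps(32, 3, 100000000)`, comes to roughly 1.07 Gbps, which falls short of the 2 Gbps Ethernet rate.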
DMA
• Direct memory access (DMA) is a bus operation that allows reads and writes not controlled by the CPU.
• A DMA transfer is controlled by a DMA controller (DMAC) that requests control of the bus from the CPU.
• After gaining control, the DMA controller performs read and write operations directly between devices and memory.
• DMA adds two new signals:
• Bus request
• Bus grant
Source: Computers as Components
DMA Registers
• The CPU controls the DMA operation through registers in the DMA controller.
• Starting address register
• Length register
• Status register – to start and stall the DMA
• After the DMA operation is complete, the DMA controller interrupts the CPU to tell it that the transfer is done.
• DMA controllers usually use short bursts (e.g., 4-16 words) to only occupy the bus for a few cycles at a time
Source: Computers as Components
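The register interface described above can be modeled in software. The sketch below is illustrative only: the struct layout, bit values, and function names are hypothetical, real controllers differ, and the destination pointer is an added assumption since the slide lists only a starting address, a length, and a status register.

```c
#include <stddef.h>
#include <stdint.h>

#define DMA_STATUS_START 0x1u  /* hypothetical bit: CPU sets it to start */
#define DMA_STATUS_DONE  0x2u  /* hypothetical bit: DMAC sets it when done */
#define DMA_BURST_WORDS  4u    /* words moved per bus grant */

typedef struct {
    const uint32_t *src;   /* starting address register */
    uint32_t *dst;         /* destination address (assumed second register) */
    size_t length;         /* length register, in words */
    uint32_t status;       /* status register */
} dmac_t;

/* CPU side: program the registers and start the transfer. */
void dma_start(dmac_t *d, const uint32_t *src, uint32_t *dst, size_t words)
{
    d->src = src;
    d->dst = dst;
    d->length = words;
    d->status = DMA_STATUS_START;
}

/* Model of one bus grant: move at most one burst, then release the bus.
 * Returns the number of words moved. */
size_t dma_step(dmac_t *d)
{
    size_t n = 0;
    if (!(d->status & DMA_STATUS_START))
        return 0;
    while (n < DMA_BURST_WORDS && d->length > 0) {
        *d->dst++ = *d->src++;
        d->length--;
        n++;
    }
    if (d->length == 0)
        d->status = DMA_STATUS_DONE;  /* real hardware would interrupt the CPU here */
    return n;
}
```

Moving a 10-word buffer this way takes three bus grants (4, 4, and 2 words), and the CPU is free to use the bus between them.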
Data Rate or Packet Rate?
BIT RATE: $2 \times 10^9$ bits/sec
each packet includes 20 bytes of overhead

DATA RATE: BIT-RATE x P/(P+20)
P=64 → $2 \times 0.76 \times 10^9$ bits/sec
P=1518 → $2 \times 0.98 \times 10^9$ bits/sec
Larger packets mean interconnect is busier

PACKET RATE: BIT-RATE / ((P+20) x 8)
P=64 → $2 \times 1.48 \times 10^6$ packets/sec
P=1518 → $2 \times 81.2 \times 10^3$ packets/sec
Smaller packets mean CPU has to do more
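The two formulas on this slide translate directly into C. This is a small sketch with hypothetical function names; P is the packet size in bytes and the 20-byte overhead is the figure given on the slide.

```c
#include <stdint.h>

#define PKT_OVERHEAD_BYTES 20u  /* per-packet overhead, from the slide */

/* Payload data rate in bits/sec: bit_rate * P / (P + 20). */
double data_rate_bps(double bit_rate, uint32_t packet_bytes)
{
    return bit_rate * packet_bytes / (packet_bytes + PKT_OVERHEAD_BYTES);
}

/* Packets per second: bit_rate / ((P + 20) * 8). */
double packet_rate_pps(double bit_rate, uint32_t packet_bytes)
{
    return bit_rate / ((packet_bytes + PKT_OVERHEAD_BYTES) * 8.0);
}
```

At a bit rate of 2 x 10^9 bits/sec, 64-byte packets arrive at roughly 2.98 million packets/sec, while 1518-byte packets arrive at roughly 162.5 thousand packets/sec.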
How does this affect the CPU?
• The CPU (in a router) needs to handle the packet
• i.e., figure out where to send the packet to.
• So, all it cares about is packet rate
• How much work can the CPU do on each packet?
• For packets with 1518 bytes of data, the packet rate is about 160K packets/sec
• At 100 MHz → 615 instructions per packet → Not that much
• For 64 byte packets → 34 instructions per packet → Infeasible!

• What can we do?
• Trivial solution: Raise the frequency
• Still not enough: Add additional CPUs (ASIPs)
• Better solution: Integrate dedicated hardware (Accelerators)
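The per-packet instruction budget, and the clock frequency that a given budget would require, can be sketched as follows. The function names are hypothetical and the model assumes one instruction per clock cycle.

```c
#include <stdint.h>

/* Instructions available per packet (truncated toward zero). */
uint64_t instr_per_packet(uint64_t clock_hz, uint64_t packets_per_sec)
{
    return clock_hz / packets_per_sec;
}

/* Clock frequency needed to spend `instr_needed` instructions on every
 * packet at the given packet rate. */
uint64_t required_clock_hz(uint64_t packets_per_sec, uint64_t instr_needed)
{
    return packets_per_sec * instr_needed;
}
```

At 100 MHz and about 162,549 packets/sec, `instr_per_packet` returns 615. At about 2,976,190 packets/sec (64-byte packets) it returns 33, since integer division truncates the slide's rounded figure of 34. Giving small packets the same 615-instruction budget would need a clock of about 1.83 GHz, which is why raising the frequency alone is listed as the trivial solution.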
A typical router SoC

[Figure: block diagram of a typical router SoC, showing a HOST, three ASIPs (ASIP 0, ASIP 1, ASIP 2), PARSER, CLASSIFIER, POLICER, and TRAFFIC MANAGEMENT blocks, and six interfaces (I/F 0 to I/F 5).]
But what about memory?
• We now have a processor that can communicate with peripherals, with an off-chip network, etc. But what about memory?
• Our router needs a lot of memory:
• To buffer packets
• To store routing tables
• To host the operating system
• …
• The on-chip memory (~MB) is nowhere near enough. We need to use DRAM.
DRAM Organization

[Figure: DRAM hierarchy, from Rank to Chip to Bank. Sources: Onur Mutlu; Bruce Jacob]
DRAM Organization
• So, for a 1GB DIMM (i.e., 8 chips), we need chips with 1Gb of memory
• e.g., 128K x 8K
• But that is a lot of rows…
[Figure: a 1 Gb array of 128K rows by 8K columns]
DRAM Banks
• So, we break it into 8 banks of 16K rows
• And read out an entire row to a buffer
• Once the row is buffered, we can directly access any byte in the buffer.
[Figure: banks of 128 Mb each, 16K rows by 8K columns]
Source: Computers as Components
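The chip geometry above can be checked, and an address split into bank, row, and column, with a few lines of C. This is a sketch under assumptions: the names are hypothetical, and the bit ordering (column in the low bits, then row, then bank) is one possible mapping, not necessarily the one a real memory controller uses.

```c
#include <stdint.h>

#define COLS_PER_ROW   (8u * 1024u)   /* 8K bits per row */
#define ROWS_PER_BANK  (16u * 1024u)  /* 16K rows per bank */
#define BANKS_PER_CHIP 8u

typedef struct { uint32_t bank, row, col; } dram_addr_t;

/* Split a bit index within one chip into bank/row/column.
 * Assumed mapping: column in the low bits, then row, then bank. */
dram_addr_t dram_decode(uint32_t bit_index)
{
    dram_addr_t a;
    a.col  = bit_index % COLS_PER_ROW;
    a.row  = (bit_index / COLS_PER_ROW) % ROWS_PER_BANK;
    a.bank = (bit_index / COLS_PER_ROW / ROWS_PER_BANK) % BANKS_PER_CHIP;
    return a;
}

/* Bits per bank and per chip, for checking the numbers on the slides. */
uint64_t bank_bits(void) { return (uint64_t)COLS_PER_ROW * ROWS_PER_BANK; }
uint64_t chip_bits(void) { return bank_bits() * BANKS_PER_CHIP; }
```

Here `bank_bits()` is 2^27 (128 Mb) and `chip_bits()` is 2^30 (1 Gb), matching the figures above. Consecutive accesses that decode to the same bank and row can be served from the row buffer without opening a new row.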
DRAM Banks
[Figure: example sequences of bank/row accesses (e.g., B0/R0, B1/R1, B0/R3, B1/R2) shown against two 128 Mb banks of 16K rows by 8K columns]
AXI
[Figure: MASTER and SLAVE connected by five channels]
WRITE ADDR CHANNEL
WRITE DATA CHANNEL
WRITE RESPONSE CHANNEL
READ ADDR CHANNEL
READ DATA CHANNEL
CACHE
But what happens on Startup?
• Start by reading from a BootROM
• A small piece of memory, which contains the very first code that is executed upon reset.
• Either hard-wired (mask-ROM) or rewriteable (embedded Flash)
• Can use bootstraps/fuses to change configuration.
• Then move on to the Bootloader
• Usually stored on rewriteable flash (e.g., SD card)
• Configures the chip and some of the peripherals
• Loads the end application (e.g., OS) from storage (flash, SSD, HDD)
• Passes control to the end application
• The BootROM and bootloader can be combined
• The BootROM of an x86 system is called the BIOS
Source: Intel
Summary
• Processors are great at processing.
• They are not so great at data movement.
• They are not so great at doing simple tasks.

• For a well-defined task, dedicated hardware will never lose to a processor.

• With Processors we gain flexibility.
• SW development is faster than HW.
• Bugs in SW are much cheaper to fix.
Main References
• Tzachi Noy, “Interfaces: External/Internal, or why CPUs suck”, 2019
• Wolf, “Computers as Components: Principles of Embedded Computing System Design,” Elsevier, 2012