
Kernel-Bypass Techniques For High-Speed Network Packet Processing

The document outlines the journey of a packet through the Linux network stack, from arriving at the network interface card (NIC) to being processed by applications. It discusses the need for kernel bypass techniques to improve packet processing performance. Several kernel bypass techniques are presented, including user-space packet processing with Data Plane Development Kit (DPDK) and Netmap, and a user-space network stack called mTCP.


Kernel-bypass techniques for high-speed network packet processing

CS 744

Presenters: Rinku Shah, Priyanka Naik
{rinku, ppnaik}@cse.iitb.ac.in

Course Instructor: Prof. Umesh Bellur

Department of Computer Science & Engineering
Indian Institute of Technology Bombay

Outline
● The journey of a packet through the Linux network stack

● Need for kernel bypass techniques for packet processing

● Kernel-bypass techniques

○ User-space packet processing

■ Data Plane Development Kit (DPDK)

■ Netmap

○ User-space network stack

■ mTCP

● What’s trending?

Typical packet flow
● TX (down the stack): Application, Transport (L4), Network (L3), Data link (L2), NIC driver, NIC hardware
● RX (up the stack): NIC hardware, NIC driver, Data link (L2), Network (L3), Transport (L4), Application

What does a packet contain?
Ethernet header | IP header | TCP header | payload | FCS
● Ethernet header: dest MAC, src MAC, type
● IP header: ... length ... type, IP header csum, src IP, dst IP
● TCP header: src port, dst port, ... checksum ...
FCS: Frame Check Sequence

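The header layout above can be made concrete with a short parser. This is a minimal sketch, assuming an untagged Ethernet II frame that carries IPv4 and TCP, with the FCS already stripped. The helper names (parse_frame, build_sample_frame) and the sample addresses are hypothetical and do not come from the slides.

```python
import socket
import struct

def mac_str(raw: bytes) -> str:
    """Format a 6-byte MAC address as aa:bb:cc:dd:ee:ff."""
    return ":".join(f"{b:02x}" for b in raw)

def parse_frame(frame: bytes) -> dict:
    """Split an Ethernet II / IPv4 / TCP frame (without FCS) into fields."""
    # Ethernet header: dest MAC (6), src MAC (6), type (2)
    dst_mac, src_mac, eth_type = struct.unpack("!6s6sH", frame[:14])
    ip = frame[14:]
    ihl = (ip[0] & 0x0F) * 4                      # IP header length in bytes
    total_len = struct.unpack("!H", ip[2:4])[0]   # IP header + everything after
    tcp = ip[ihl:total_len]
    src_port, dst_port = struct.unpack("!HH", tcp[:4])
    data_offset = (tcp[12] >> 4) * 4              # TCP header length in bytes
    return {
        "dst_mac": mac_str(dst_mac),
        "src_mac": mac_str(src_mac),
        "eth_type": eth_type,
        "src_ip": socket.inet_ntoa(ip[12:16]),
        "dst_ip": socket.inet_ntoa(ip[16:20]),
        "src_port": src_port,
        "dst_port": dst_port,
        "payload": tcp[data_offset:],
    }

def build_sample_frame(payload: bytes) -> bytes:
    """Build a test frame; checksums are left as zero for brevity."""
    eth = (bytes.fromhex("020000000002") + bytes.fromhex("020000000001")
           + struct.pack("!H", 0x0800))
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40 + len(payload), 0, 0,
                     64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
    tcp = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, 0x18, 65535, 0, 0)
    return eth + ip + tcp + payload
```

Calling parse_frame(build_sample_frame(b"hi")) returns the MAC addresses, IP addresses, ports and payload as separate fields, which is the information the kernel extracts layer by layer.
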
RX path: Packet arrives at the destination NIC
NIC receives the packet
● Match destination MAC address
● Verify Ethernet checksum (FCS)
Packets accepted at the NIC
● DMA the packet to RX ring buffer
● NIC triggers an interrupt
TX/RX rings
● Circular queue
● Shared between NIC and NIC driver
● Content: Length + packet buffer pointer
[Figure: applications in user space; NIC driver, packet buffers and TX/RX rings in kernel space; NIC with its hardware RX queue raising a hardware interrupt]

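The acceptance checks and the RX ring described above can be sketched as follows. This is an illustrative model, not driver code: RxRing, Descriptor and nic_receive are hypothetical names, the "DMA" is a plain list append, and broadcast or multicast addresses are ignored. It assumes the Ethernet FCS is the standard CRC-32 appended least significant byte first, which is what zlib.crc32 computes.

```python
import zlib
from collections import namedtuple

# One ring entry: packet length plus a reference to the packet buffer
Descriptor = namedtuple("Descriptor", "length buf_index")

class RxRing:
    """Fixed-size circular queue shared by the NIC (producer) and driver (consumer)."""
    def __init__(self, size):
        self.slots = [None] * size
        self.head = 0      # next slot the NIC fills
        self.tail = 0      # next slot the driver drains
        self.count = 0

    def full(self):
        return self.count == len(self.slots)

    def push(self, desc):
        self.slots[self.head] = desc
        self.head = (self.head + 1) % len(self.slots)   # wrap around
        self.count += 1

    def pop(self):
        if self.count == 0:
            return None
        desc = self.slots[self.tail]
        self.tail = (self.tail + 1) % len(self.slots)
        self.count -= 1
        return desc

def add_fcs(frame: bytes) -> bytes:
    """Append the frame check sequence the way a sender would."""
    return frame + zlib.crc32(frame).to_bytes(4, "little")

def nic_receive(wire_frame, my_mac, buffers, ring):
    """Return True if the frame is accepted and queued on the RX ring."""
    frame, fcs = wire_frame[:-4], wire_frame[-4:]
    if frame[:6] != my_mac:                                  # match destination MAC
        return False
    if zlib.crc32(frame).to_bytes(4, "little") != fcs:       # verify FCS
        return False
    if ring.full():                                          # no free descriptor: drop
        return False
    buffers.append(frame)                                    # stands in for the DMA write
    ring.push(Descriptor(len(frame), len(buffers) - 1))
    return True
```

The driver later calls ring.pop() to obtain each descriptor and follows buf_index to the packet data.
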
Interrupt processing in the Linux kernel
● Top-half
○ Minimal processing
● Bottom-half
○ Rest of interrupt processing

Top-half interrupt processing
CPU interrupts the process in execution
Switch from user space to kernel space
Top-half interrupt processing
● Lookup IDT (Interrupt Descriptor Table)
● Call corresponding ISR (Interrupt Service Routine)
○ Acknowledge the interrupt
○ Schedule bottom-half processing
● Switch back to user space
[Figure: RX stack from NIC hardware up through NIC driver, Data link (L2), Network (L3), Transport (L4) to Application]

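The split between the two halves can be illustrated with a toy model. This sketch assumes nothing about the real kernel interfaces: InterruptController, make_nic_isr and the dictionary standing in for the NIC are hypothetical. The point is only that the ISR acknowledges the interrupt and queues work, and the queued work runs later.

```python
from collections import deque

class InterruptController:
    """Toy model: a vector table plus a queue of deferred bottom halves."""
    def __init__(self):
        self.idt = {}             # vector number -> ISR (stand-in for the IDT)
        self.softirq = deque()    # bottom halves waiting to run

    def register(self, vector, isr):
        self.idt[vector] = isr

    def raise_irq(self, vector):
        isr = self.idt[vector]    # look up the IDT
        isr(self)                 # run the top half immediately

    def run_softirqs(self):
        """Run deferred work when the CPU is free; return how many handlers ran."""
        ran = 0
        while self.softirq:
            self.softirq.popleft()()
            ran += 1
        return ran

def make_nic_isr(nic, processed):
    """Build the ISR for a toy NIC; 'processed' collects handled packets."""
    def bottom_half():
        # Rest of interrupt processing: drain every packet in the RX ring
        while nic["rx_ring"]:
            processed.append(nic["rx_ring"].popleft())

    def isr(ctrl):
        nic["irq_pending"] = False          # acknowledge the interrupt
        ctrl.softirq.append(bottom_half)    # schedule bottom-half processing
    return isr
```

After raise_irq returns, no packet has been processed yet. The packets are handled only when run_softirqs is called, which mirrors the deferred bottom half.
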
Bottom-half processing
CPU initiates the bottom-half when it is free (soft-irq)
Switch from user space to kernel space
Driver dynamically allocates an sk-buff (a.k.a. skb). Oops!!

sk-buff (sk-buff tutorial link)
In-memory data structure that contains packet metadata
● Pointers to packet headers and payload
● More packet related information ...
[Figure: applications in user space; NIC driver with an skb, packet buffers and TX/RX rings in kernel space; NIC raising a hardware interrupt]

Bottom-half processing
NIC driver processing
For all packets in buffer:
1. Driver dynamically allocates an sk-buff
2. Update sk-buff with packet metadata
3. Remove the Ethernet header
4. Pass sk-buff to the network stack
Call L3 protocol handler
[Figure: applications in user space; NIC driver with an skb, packet buffers and TX/RX rings in kernel space; NIC raising a hardware interrupt]

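The driver loop above can be sketched with a simplified stand-in for the sk-buff. The SkBuff class below is hypothetical and holds only a few offsets. The real kernel structure carries far more state. "Removing" the Ethernet header is modelled as advancing an offset rather than copying data.

```python
from dataclasses import dataclass

ETH_HLEN = 14  # dest MAC (6) + src MAC (6) + type (2)

@dataclass
class SkBuff:
    """Simplified packet metadata: the data plus offsets to its headers."""
    data: bytes           # the frame as received
    mac_header: int       # offset of the Ethernet header
    network_header: int   # offset of the L3 header
    protocol: int         # EtherType taken from the Ethernet header

    def l3_bytes(self) -> bytes:
        """Everything from the L3 header onwards."""
        return self.data[self.network_header:]

def driver_rx(frames, l3_handlers):
    """Process every received frame; return how many reached an L3 handler."""
    delivered = 0
    for frame in frames:                                   # for all packets in buffer
        eth_type = int.from_bytes(frame[12:14], "big")
        skb = SkBuff(data=frame,                           # 1. allocate an sk-buff
                     mac_header=0,                         # 2. fill in metadata
                     network_header=ETH_HLEN,              # 3. skip the Ethernet header
                     protocol=eth_type)
        handler = l3_handlers.get(eth_type)                # 4. pass to the network stack
        if handler is not None:
            handler(skb)
            delivered += 1
    return delivered
```

A handler table such as {0x0800: ipv4_handler} plays the role of the L3 protocol handler lookup. Frames with an unregistered type are dropped.
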
L3/L4 processing
Common processing
1. Match destination IP/socket
2. Verify checksum
3. Remove header
L3-specific processing
1. Route lookup
2. Combine fragmented packets
3. Call L4 protocol handler
L4-specific processing
[Figure: RX stack from NIC hardware up through NIC driver, Data link (L2), Network (L3), Transport (L4) to Application]

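The "verify checksum" step can be illustrated with the 16-bit one's-complement Internet checksum used by IPv4 headers. This is a minimal sketch. The function names are hypothetical, and TCP verification would additionally need the pseudo-header, which is omitted here.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement sum of the data, complemented."""
    if len(data) % 2:
        data += b"\x00"                       # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:                        # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def header_is_valid(header: bytes) -> bool:
    """A header that includes its own checksum field sums to zero."""
    return internet_checksum(header) == 0
```

The sender computes internet_checksum over the header with the checksum field set to zero and stores the result. The receiver runs the same sum over the header as received and accepts it only if the result is zero.
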
L3/L4 processing
L3-specific processing
1. Route lookup
2. Combine fragmented packets
3. Call L4 protocol handler
L4-specific processing
1. Handle TCP state machine
2. Enqueue to socket read queue
3. Signal the socket
[Figure: application in user space; network stack with the socket write queue (WQ) and read queue (RQ) holding skbs, NIC driver, packet buffers and TX/RX rings in kernel space; NIC raising a hardware interrupt]

Application processing
On socket read: user space to kernel space
● Dequeue packet from socket receive queue (kernel space)
● Copy packet to application buffer (user space)
● Release sk-buff
● Return back to the application: kernel space to user space
[Figure: application in user space; system calls, network stack with socket queues (WQ, RQ) and skb, NIC driver, packet buffers and TX/RX rings in kernel space; NIC raising a hardware interrupt]

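From the application's side, the whole sequence above is a single receive call. The sketch below uses a local socket pair instead of a NIC so that it runs anywhere. That substitution is an assumption made for illustration, and the function names are hypothetical. The copy into the application buffer still happens inside the system call.

```python
import socket

def read_from_socket(sock: socket.socket, app_buffer: bytearray) -> bytes:
    """One receive system call.

    The kernel takes data from the socket's receive queue and copies it
    into app_buffer, which lives in user space.
    """
    n = sock.recv_into(app_buffer)      # user space -> kernel space -> back
    return bytes(app_buffer[:n])

def demo() -> bytes:
    """Send a few bytes through a connected socket pair and read them back."""
    sender, receiver = socket.socketpair()
    try:
        sender.sendall(b"hello")        # data is now queued in the kernel
        app_buffer = bytearray(2048)    # application buffer in user space
        return read_from_socket(receiver, app_buffer)
    finally:
        sender.close()
        receiver.close()
```

Every call to read_from_socket pays for one user/kernel transition and one copy, which are two of the per-packet costs that kernel bypass aims to remove.
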
Transmit path of an application packet
On socket write: user space to kernel space
● Writes the packet to the kernel buffer
● Calls socket’s send function (e.g., sendmsg)
[Figure: application in user space; system calls, network stack, NIC driver, packet buffers and RX/TX rings in kernel space; NIC raising a hardware interrupt]

L4/L3 processing
L4-specific processing
1. Allocate sk-buff
2. Enqueue sk-buff to socket write queue
3. Call L3 protocol handler
Common processing
1. Build header
2. Add header to packet buffer
3. Update sk-buff
L3-specific processing
1. Fragment, if needed
2. Call L2 protocol handler
[Figure: application in user space; network stack with socket queues (WQ, RQ) and skb, NIC driver, packet buffers and RX/TX rings in kernel space; NIC raising a hardware interrupt]

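The "fragment, if needed" step can be sketched for IPv4, where every fragment except the last must carry a multiple of 8 bytes because the offset field counts 8-byte units. The function name and the tuple layout are hypothetical. Header construction is left out.

```python
def fragment(payload: bytes, mtu: int, ip_header_len: int = 20):
    """Split an IP payload so that each fragment plus its header fits in the MTU.

    Returns a list of (offset_in_8_byte_units, more_fragments, data) tuples.
    """
    max_data = (mtu - ip_header_len) // 8 * 8    # round down to a multiple of 8
    if max_data <= 0:
        raise ValueError("MTU too small for any payload")
    fragments = []
    for start in range(0, len(payload), max_data):
        chunk = payload[start:start + max_data]
        more = start + max_data < len(payload)   # set on all but the last fragment
        fragments.append((start // 8, more, chunk))
    return fragments or [(0, False, b"")]        # empty payload: one empty fragment
```

With a 1500-byte MTU and a 20-byte header, a 4000-byte payload is split into pieces of 1480, 1480 and 1040 bytes. A payload that already fits is returned as a single fragment with the more-fragments flag cleared.
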
L2 processing
Enqueue packet to queue discipline (qdisc)
● Hold packets in a queue
● Apply scheduling policies (e.g. FIFO, priority)
qdisc
● Dequeue sk-buff (if NIC has free buffers)
● Post process sk-buff
○ Calculate IP/TCP checksum
○ … (tasks that h/w cannot do)
● Call NIC driver’s send function
[Figure: application in user space; socket queues (RQ, WQ), skb, qdisc queue, NIC driver, packet buffers and RX/TX rings in kernel space; NIC raising a hardware interrupt]

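A queue discipline with a priority policy can be sketched as below. PriorityQdisc is a hypothetical class, not a kernel qdisc. A lower number means higher priority, packets of equal priority leave in FIFO order, and the transmit ring is a plain list with an assumed capacity.

```python
import heapq
import itertools

class PriorityQdisc:
    """Holds packets and releases them by priority while the TX ring has room."""
    def __init__(self):
        self._heap = []
        self._seq = itertools.count()   # tie-breaker keeps FIFO order per priority
        self.stopped = False            # True while the TX ring is full

    def enqueue(self, skb, priority=1):
        heapq.heappush(self._heap, (priority, next(self._seq), skb))

    def dequeue_to(self, tx_ring, tx_ring_size):
        """Move packets to the TX ring until it is full or the qdisc is empty."""
        sent = 0
        while self._heap and len(tx_ring) < tx_ring_size:
            _, _, skb = heapq.heappop(self._heap)
            tx_ring.append(skb)         # stands in for the driver's send function
            sent += 1
        # Packets are left over only when the ring filled up: stop the queue
        self.stopped = bool(self._heap)
        return sent
```

When dequeue_to leaves packets behind, the queue is marked stopped. Calling it again after the ring has drained restarts transmission.
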
NIC processing
NIC driver
● If hardware transmit queue full
○ Stop qdisc queue
● Otherwise:
○ Map packet data for DMA
○ Tells NIC to send the packet
NIC
● Calculates ethernet frame checksum (FCS)
● Sends packet to the wire
● Sends an interrupt “Packet is sent” (kernel space to user space)
● Driver frees the sk-buff; starts the qdisc queue
[Figure: application in user space; qdisc queue, NIC driver, packet buffers and RX/TX rings in kernel space; NIC with its hardware TX queue raising a hardware interrupt]

Transmit and receive packet processing pipeline DONE!!

Packet processing overheads in the kernel
● Too many context switches!!

○ Pollutes CPU cache

● Per-packet interrupt overhead

● Dynamic allocation of sk-buff

● Packet copy between kernel and user space

● Shared data structures

Cannot achieve line-rate for recent high speed NICs!! (40Gbps/100Gbps)
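
The line-rate claim can be turned into a per-packet time budget. The sketch below assumes minimum-size 64-byte Ethernet frames and adds the 8-byte preamble and 12-byte inter-frame gap that each frame occupies on the wire. The function names are hypothetical.

```python
def max_packet_rate(link_bps: float, frame_bytes: int = 64) -> float:
    """Packets per second at line rate for a given frame size."""
    wire_bytes = frame_bytes + 8 + 12      # frame + preamble/SFD + inter-frame gap
    return link_bps / (wire_bytes * 8)

def per_packet_budget_ns(link_bps: float, frame_bytes: int = 64) -> float:
    """Time available to process one packet, in nanoseconds."""
    return 1e9 / max_packet_rate(link_bps, frame_bytes)
```

At 100 Gbps this gives roughly 148.8 million packets per second, or about 6.7 ns per packet. That is the whole budget in which to take an interrupt, allocate an sk-buff and copy the data, which is why the overheads listed above prevent line rate.
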

Optimizations to accelerate kernel packet processing
● NAPI (New API) [link: Reading link]
● GRO (Generic Receive Offload) [link: GRO+GSO]
● GSO (Generic Segmentation Offload) [link: GRO+GSO with DPDK]
● Use of multiple hardware queues [links: Multiqueue NIC; Supplement: RSS+RPS+...]
● ...

Packet Processing Overheads in Kernel
● Context switch between kernel and userspace
● Packet copy between kernel and userspace
● Dynamic allocation of sk_buff
● Per packet interrupt
● Shared data structures
[Figure: the application (user space, with its application buffer) reads from the kernel (kernel space, with a buffer in kernel memory and an skb on both the receive and transmit paths), which sits above the NIC]

Overcome Overheads in Kernel: Bypass the kernel
● Without bypass: Application (user space), packet processing in the Kernel (kernel space), NIC
● With bypass: Application does L2-L4 packet processing in user space over shared, pre-allocated buffers, directly above the NIC
Overheads avoided:
● Context switch between kernel and userspace
● Packet copy between kernel and userspace
● Dynamic allocation of sk_buff

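The idea of pre-allocated buffers can be sketched with a fixed pool that is carved out of one allocation made at start-up. BufferPool is a hypothetical class for illustration and is not the API of any bypass framework. Handing out a buffer is a list pop, and returning one is a list append, so no memory is allocated on the packet path.

```python
class BufferPool:
    """A fixed set of equally sized packet buffers allocated once."""
    def __init__(self, count: int, size: int):
        self._mem = bytearray(count * size)   # single allocation up front
        self._size = size
        self._free = list(range(count))       # indices of free buffers

    def alloc(self):
        """Return (index, writable view) or None when the pool is exhausted."""
        if not self._free:
            return None
        idx = self._free.pop()
        start = idx * self._size
        return idx, memoryview(self._mem)[start:start + self._size]

    def free(self, idx: int) -> None:
        """Give a buffer back to the pool for reuse."""
        self._free.append(idx)

    @property
    def available(self) -> int:
        return len(self._free)
```

Because alloc returns a view into the shared memory rather than a new object holding a copy, the producer and consumer of a packet can work on the same bytes.
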
Interrupt vs Poll Mode

Interrupt Mode (NIC signals the CPU)
● NIC notifies it needs servicing
● Interrupt is a hardware mechanism
● Handled using interrupt handler
● Interrupt overhead for high speed traffic
● Interrupt for a batch of packets

Poll Mode (CPU checks the NIC)
● CPU keeps checking the NIC
● Polling is done with help of control bits (Command-ready bit)
● Handled by the CPU
● Consumes CPU cycles but handles high speed traffic

Interrupt vs Poll Mode: Kernel bypass techniques
● Interrupt mode: Netmap
● Poll mode: DPDK

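Poll mode can be sketched as a loop that repeatedly asks the receive ring for a burst of packets. The names and the burst size of 32 are assumptions for illustration. A real poll-mode loop spins indefinitely on a dedicated core, whereas this one stops after a few empty polls so that the example terminates.

```python
from collections import deque

def poll_rx(rx_ring: deque, burst: int = 32) -> list:
    """Take up to 'burst' packets from the ring without waiting."""
    batch = []
    while rx_ring and len(batch) < burst:
        batch.append(rx_ring.popleft())
    return batch

def poll_loop(rx_ring: deque, handle, max_idle_polls: int = 3, burst: int = 32) -> int:
    """Poll until the ring has been empty max_idle_polls times in a row.

    Returns the total number of polls, including the empty ones.
    """
    idle = 0
    polls = 0
    while idle < max_idle_polls:
        polls += 1
        batch = poll_rx(rx_ring, burst)
        if batch:
            idle = 0
            for pkt in batch:           # process the whole burst in one go
                handle(pkt)
        else:
            idle += 1                   # CPU cycles spent with nothing to do
    return polls
```

The empty polls are the cost noted above: the CPU stays busy even when no traffic arrives. In return, no interrupt is taken for any packet.
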
Intel Data Plane Development Kit (DPDK)
• Poll mode user space drivers (uio)
○ Unbinds NIC from kernel
• Mempool: huge pages to reduce TLB misses
• rte_mbuf: metadata + packet buffer
• Cooperative multiprocessing
○ Safe for trusted applications
[Figure: in user space, the application sits on rte_mbuf, rte_ring, rte_mempool and the poll mode drivers, which reach the NIC without going through kernel space]

Netmap
• Netmap rings are memory regions in kernel space shared between the application and the kernel
• No extra copy of a packet
• NIC can work with netmap as well as kernel drivers (transparent mode)
[Figure: the application in user space reaches the NIC either through the netmap driver or through sockets, the kernel TCP stack and the regular drivers (ixgbe) in kernel space]

DPDK and netmap handle processing only up to L2 of the network stack.

What about L3-L7 processing?
● Overheads with L3-L7 processing in kernel
○ Shared data structure
● Userspace network stack
○ Over netmap or DPDK
● mTCP: multicore TCP processing
[Figure: the application on several CPU cores contends for the shared socket and TCP data structures of the kernel network stack, above the NIC]

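Multiqueue NIC
● Receive Side Scaling (RSS) on the NIC
○ Hash of (src_ip, dst_ip, src_port, dst_port)
[Figure: an incoming packet arrives at the NIC, RSS picks one of several RX/TX queue pairs, and each queue pair is served by its own core running the application]
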
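Queue selection from the flow hash can be sketched as follows. NICs typically compute a keyed Toeplitz hash and map it through an indirection table. This example substitutes CRC-32 and a plain modulo purely for illustration, and rss_queue is a hypothetical name.

```python
import socket
import struct
import zlib

def rss_queue(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_queues: int) -> int:
    """Map a flow's 4-tuple to an RX queue index in [0, n_queues)."""
    flow = (socket.inet_aton(src_ip) + socket.inet_aton(dst_ip)
            + struct.pack("!HH", src_port, dst_port))
    # CRC-32 is a stand-in for the NIC's hash function
    return zlib.crc32(flow) % n_queues
```

Every packet of a given connection hashes to the same queue, and so to the same core, while different connections spread across the queues.
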
mTCP: Userspace network stack
● Designed for multicore scalable applications
● Per core TCP data structures
○ E.g. accept queue, socket list
○ Lock free
○ Connection locality
● Leverages multiqueue support of NIC
[Figure: the application runs over a per-core mTCP thread on top of netmap/DPDK, replacing shared data structures; incoming packets arrive through the NIC]

What’s trending?
● Offload application processing to the kernel
○ BPF (Berkeley Packet Filter)
○ eBPF (eXtended BPF) [links: BPF+eBPF+XDP link-1; BPF+eBPF+XDP tutorial link-2]
● Offload application processing to the NIC driver
○ XDP (eXpress DataPath) [link: Sample apps for eBPF + XDP]
● Offload application processing to programmable hardware
○ Programmable SmartNICs (NPU/DPU) [link: Video on smartNIC architecture + Netronome NIC specifics]
■ Netronome, Mellanox, Bluefield, Pensando
○ Programmable FPGAs
■ Xilinx, Altera
○ Programmable hardware ASICs [links: Programmable network: Intro video; Detailed video link]
■ Barefoot Tofino, Cisco’s Doppler, Intel Flexpipe, Cavium’s Xpliant

