
Linux TCP/IP Tuning

The document presents an analysis of TCP performance, covering topics such as connection establishment, congestion control, and flow control. It discusses various tools for network analysis and performance testing, as well as different TCP congestion control algorithms like Reno, Vegas, and Westwood. The presentation aims to provide insights into improving TCP performance over high bandwidth-delay product links and highlights the importance of understanding network behavior and tuning parameters.

Uploaded by

qexing
Copyright
© Attribution Non-Commercial (BY-NC)

Analyzing TCP Performance

Stephen Hemminger
Sr. Staff Engineer
Linux Kongress 2004
2004-09-09

Copyright 2004 OSDL, All rights reserved.
Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Outside of scope

■ Non TCP protocols


■ SCTP, multicast, etc
■ Queuing theory - “no math”
■ Hardware and product comparisons


My Background

■ Did TCP back in the “old school”


■ BSD 4.2, Ethernet
■ SMP Unix versions of OSI, Netware, Appletalk, ...
■ Plan9 Hypercube communication
■ Linux
■ Incorporation of TCP research in 2.6 kernel
■ Performance tests for LWE
■ Wizard gap


Limits of my knowledge

■ Only worked with current Linux (2.4/2.6)


■ Will mention tools here that I have not used extensively
■ Involved in development of Linux, not deployment or research


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


TCP for “muggles”

■ connection establishment
■ slow start

■ windows

■ congestion control

■ silly window


Connection establishment

Client                              Server

connect   ---- SYN ------------->
          <--- SYN + ACK --------   accept
write     ---- Data 1 (10) ----->
          <--- Ack 11 -----------   read

ethereal


tcpdump trace

13:28:21.745624 IP 172.20.1.60.38052 > 216.239.39.99.http: S 1765497548:1765497548(0)
win 5840 <mss 1460,sackOK,timestamp 1563951453 0,nop,wscale 7>
13:28:21.831935 IP 216.239.39.99.http > 172.20.1.60.38052: S 227058185:227058185(0)
ack 1765497549 win 8190 <mss 1460>
13:28:21.832035 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 1 win 5840
13:28:21.832321 IP 172.20.1.60.38052 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840
13:28:21.939237 IP 216.239.39.99.http > 172.20.1.60.38052: . ack 126 win 31460
13:28:21.972448 IP 216.239.39.99.http > 172.20.1.60.38052: P 1:485(484) ack 126 win 31460
13:28:21.972529 IP 172.20.1.60.38052 > 216.239.39.99.http: . ack 485 win 6432
13:28:21.973016 IP 172.20.1.60.38052 > 216.239.39.99.http: F 126:126(0) ack 485 win 6432
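
The options negotiated on the SYN and SYN+ACK sit between angle brackets in the trace. A minimal Python sketch for pulling them out of a tcpdump text line follows; the function name `parse_tcp_options` is hypothetical and the parsing assumes the comma-separated layout shown in this trace, not every tcpdump output format.

```python
import re


def parse_tcp_options(line):
    """Return the TCP options inside <...> of one tcpdump text line as a dict.

    Assumes the layout seen in the trace above: options separated by commas,
    each option a name followed by zero or more integer values.
    """
    match = re.search(r"<([^>]*)>", line)
    if match is None:
        return {}
    options = {}
    for item in match.group(1).split(","):
        parts = item.split()
        if not parts:
            continue
        name, values = parts[0], parts[1:]
        # Flag-style options such as sackOK and nop carry no value.
        options[name] = [int(v) for v in values] if values else True
    return options


syn = ("13:28:21.745624 IP 172.20.1.60.38052 > 216.239.39.99.http: "
       "S 1765497548:1765497548(0) win 5840 "
       "<mss 1460,sackOK,timestamp 1563951453 0,nop,wscale 7>")
synack = ("13:28:21.831935 IP 216.239.39.99.http > 172.20.1.60.38052: "
          "S 227058185:227058185(0) ack 1765497549 win 8190 <mss 1460>")

print(parse_tcp_options(syn))
print(parse_tcp_options(synack))
```

Comparing the two dictionaries shows at a glance that the client offered SACK, timestamps and window scaling while the server answered with an MSS only.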


Flow control

Sender                                   Receiver

          <--- Ack 1010 (5000) -------
write     ---- Data 1011 (1400) ----->
          ---- Data 2411 (1400) ----->
          ---- Data 3811 (1400) ----->
          ---- Data 5211 (800) ------>
          <--- Ack 6010 (0) ----------
          <--- Ack 6010 (1000) -------   read (1000)
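
The segment sizes in the diagram follow from the advertised window and the segment size. As a rough sketch in Python, assuming a 1400-byte segment size read off the diagram and using hypothetical helper names, the sender's behaviour can be reproduced like this:

```python
def segments_for_window(window, mss):
    """Split `window` bytes of permitted data into segments of at most `mss` bytes."""
    sizes = []
    remaining = window
    while remaining > 0:
        size = min(mss, remaining)
        sizes.append(size)
        remaining -= size
    return sizes


def sequence_numbers(first_seq, sizes):
    """Return the starting sequence number of each segment."""
    seqs = []
    seq = first_seq
    for size in sizes:
        seqs.append(seq)
        seq += size
    return seqs


sizes = segments_for_window(5000, 1400)
print(sizes)                          # [1400, 1400, 1400, 800]
print(sequence_numbers(1011, sizes))  # [1011, 2411, 3811, 5211]
```

Once the 5000 bytes are in flight the window is used up, which is why the receiver's next ack advertises 0 until the application reads.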


Retransmission

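Sender                                   Receiver

write     ---- Data 1 --------------->
          <--- Ack 1 -----------------
          <--- Ack 1 -----------------
Multiple acks = fast retransmit
          ---- Data 2 --------------->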
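
The sender's side of this can be sketched as a duplicate-ack counter. The slide only says "multiple" acks; the threshold of three duplicates used below is the conventional value and is an assumption here, as is the function name.

```python
DUP_ACK_THRESHOLD = 3  # assumption: conventional value, the slide only says "multiple"


def fast_retransmit_points(acks, threshold=DUP_ACK_THRESHOLD):
    """Return the ack numbers at which a sender would trigger fast retransmit.

    `acks` is the sequence of ack numbers in arrival order. An ack that
    repeats the previous one counts as a duplicate.
    """
    retransmits = []
    last_ack = None
    duplicates = 0
    for ack in acks:
        if ack == last_ack:
            duplicates += 1
            if duplicates == threshold:
                retransmits.append(ack)
        else:
            last_ack = ack
            duplicates = 0
    return retransmits


print(fast_retransmit_points([1, 1, 1, 1, 2, 3]))  # [1]
```

The point of the mechanism is that the sender resends the missing segment as soon as the duplicates arrive instead of waiting for the retransmission timer.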


Tcptrace

http://tcptrace.org
Tool to convert captured data into graphs
■ Time sequence graph
■ Throughput

■ RTT

Much more than there is time to cover here!


Xplot

http://xplot.org
■ Takes plot command scripts

■ Mouse

■ Zoom – drag with the left button


■ Zoom out – click the left button
■ Scroll – drag with middle button
■ Dump – shift-left button produces postscript
■ Shift-middle and shift-right also


Time Sequence Graph


Windows & Buffering

■ Used to isolate TCP from application read/write


■ Used for congestion control

■ Upper bound determined by system parameters


Congestion window

■ slow start
■ Window normally starts small
■ Grows in response to ack
■ congestion control
■ Packet loss = congestion


Silly Window

Sender                                   Receiver

write 8k bytes
          <--- Ack [10] --------------
“Hey, I am not going to try and send this data now, give me a bigger window first”
                                         Read 8k bytes
          <--- Ack [2000] ------------
“OK, thanks”
          ---- Data (2000) ---------->


Model of TCP networks

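Sender (Send Window)  ---- Data ---->  Network  ---- Data ---->  Receiver (Receive Window)
Sender (Send Window)  <--- Ack ------  Network  <--- Ack ------  Receiver (Receive Window)

BDP = Bandwidth (bytes/sec) * Delay (secs)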


BDP - Bandwidth Delay Product

■ BDP = amount of data in transit
■ Examples
■ DSL/Cable modem (international)
1,000,000 bit/sec * 1/8 byte/bit * 500 ms = 62,500 bytes
■ Gigabit across US
1,000,000,000 bit/sec * 1/8 byte/bit * 70 ms = 8.75 Mbytes
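
The two examples can be checked with a few lines of Python. This is only the arithmetic from the slide wrapped in a function; the name `bdp_bytes` is made up for illustration.

```python
def bdp_bytes(bandwidth_bits_per_sec, delay_ms):
    """Bandwidth delay product in bytes: bits/sec * (1/8 byte/bit) * delay."""
    # Integer arithmetic: divide by 8 for bytes and by 1000 for milliseconds.
    return bandwidth_bits_per_sec * delay_ms // 8000


print(bdp_bytes(1_000_000, 500))     # 62500 bytes, DSL/cable modem example
print(bdp_bytes(1_000_000_000, 70))  # 8750000 bytes, gigabit across US
```

A connection needs a window at least this large to keep the link full.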


Bandwidth Delay Product (BDP)

[Figure: bandwidth (Mbits/sec, 0.1 to 1000) versus delay (ms, 0.1 to 1000), with BDP lines marked 8K, 64K and 1M and regions labeled LAN, Research and Broadband]


Internet

■ Router queues
■ Delays

■ Speed of light (70ms coast/coast)


■ Slow routers
■ Packet correlation, sizes
■ DoS


Extensions for larger windows

■ TCP Selective Acknowledgement (SACK), RFC2018
■ Don't have to retransmit everything
■ Window scaling (RFC1323)
■ Window size multiplied by 2^n
■ Protection Against Wrapped Sequence (PAWS)
■ Timestamp inside each packet


TCP options negotiation 1

Window scale by 4
IP 172.20.1.60.32820 > 216.239.39.99.http: S 3599527174:3599527174(0) win 5840
<mss 1460,sackOK,timestamp 2519711 0,nop,wscale 2>
IP 216.239.39.99.http > 172.20.1.60.32820: S 3820474812:3820474812(0) ack 3599527175
win 8190 <mss 1460>
IP 172.20.1.60.32820 > 216.239.39.99.http: . ack 1 win 5840
IP 172.20.1.60.32820 > 216.239.39.99.http: P 1:126(125) ack 1 win 5840

But server doesn't support scaling


TCP options negotiation 2

Window scale by 4
IP 172.20.1.60.32823 > 65.172.181.13.http: S 4120108902:4120108902(0) win 5840
<mss 1460,sackOK,timestamp 3036627 0,nop,wscale 2>
IP 65.172.181.13.http > 172.20.1.60.32823: S 2295773021:2295773021(0) ack 4120108903
win 5792
<mss 1460,sackOK,timestamp 1818411318 3036627,nop,wscale 0>
IP 172.20.1.60.32823 > 65.172.181.13.http: . ack 1 win 1460 <nop,nop,timestamp
3036628 1818411318>
IP 172.20.1.60.32823 > 65.172.181.13.http: P 1:144(143) ack 1 win 1460
<nop,nop,timestamp 3036628 1818411318>

Your scaling is okay, but don't scale mine
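
The two negotiations differ in how the later `win` values should be read. A small Python sketch of the rule, with a hypothetical function name: the window field in a SYN is never scaled, and scaling applies afterwards only if the peer also sent a window scale option.

```python
def effective_window(win_field, my_wscale, peer_sent_wscale, is_syn=False):
    """Bytes actually advertised by a segment carrying `win_field`.

    my_wscale        -- shift count this host put in its own SYN
    peer_sent_wscale -- True if the peer's SYN also carried a wscale option
    is_syn           -- the window field in SYN segments is never scaled
    """
    if is_syn or not peer_sent_wscale:
        return win_field
    return win_field << my_wscale


# Negotiation 1: server sent no wscale, so "win 5840" means 5840 bytes.
print(effective_window(5840, 2, peer_sent_wscale=False))  # 5840

# Negotiation 2: server sent wscale 0, so the client's "win 1460" is
# shifted by the client's own scale of 2.
print(effective_window(1460, 2, peer_sent_wscale=True))   # 5840
```

Both connections advertise the same 5840 bytes; only the encoding on the wire changes.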


Linux TCP window tuning

■ Send window - net.ipv4.tcp_wmem


■ three values : initial default max
■ default is 4K 16K 128K

■ also limited by net.core.wmem_max


■ Receive window – net.ipv4.tcp_rmem
■ three values : initial default max

■ default is 4K 85K 170K

■ also limited by net.core.rmem_max
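
On Linux these sysctls are readable as files under /proc/sys. The Python sketch below parses the three values using the slide's field names; the helper names are hypothetical, and the sample string simply spells out 4K, 85K and 170K in bytes for illustration rather than quoting a real system.

```python
import os
from collections import namedtuple

# Field names follow the slide: initial, default, max.
TcpMem = namedtuple("TcpMem", "initial default max")


def parse_triple(text):
    """Parse the whitespace-separated three-value form used by tcp_rmem/tcp_wmem."""
    fields = text.split()
    if len(fields) != 3:
        raise ValueError("expected three values, got %r" % (text,))
    return TcpMem(*(int(f) for f in fields))


def read_sysctl_triple(name, proc_root="/proc/sys"):
    """Read a dotted sysctl name such as net.ipv4.tcp_rmem from /proc/sys."""
    path = os.path.join(proc_root, *name.split("."))
    with open(path) as handle:
        return parse_triple(handle.read())


# Illustrative values only: 4K, 85K and 170K written out in bytes.
sample = parse_triple("4096 87040 174080")
print(sample.max)

# On a Linux machine the live values can be read directly.
if os.path.exists("/proc/sys/net/ipv4/tcp_rmem"):
    print(read_sysctl_triple("net.ipv4.tcp_rmem"))
```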


Linux TCP window tuning

■ Overall memory – net.ipv4.tcp_mem


■ three values : low pressure max
■ automatic value based on system memory
■ Application window – net.ipv4.tcp_app_mem
■ reserved space to handle slow applications


But!

■ Some firewalls and routers are buggy


■ Corrupt window scale: change N to 0
■ Forget to track state, or read RFC wrong
■ Connections will hang because initial window looks like a silly window
■ 1% of the net is buggy
■ Linux 2.6.9 chooses window scale based on maximum possible receive window
■ Default tcp_rmem => window scale of 2
■ Buggy devices will see ¼ of the real window
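
The "scale of 2" and "¼" figures can be reproduced with a short Python sketch. It picks the smallest shift that lets a 16-bit window field cover the maximum receive window; this is an illustration of the idea and not the kernel's code, the function names are hypothetical, and the slide's "170K" maximum is taken as 170 * 1024 bytes.

```python
MAX_UNSCALED_WINDOW = 65535  # largest value of the 16-bit window field
MAX_WINDOW_SCALE = 14        # largest shift allowed for the window scale option


def choose_window_scale(max_window):
    """Smallest shift such that the 16-bit field can express `max_window`."""
    scale = 0
    while (MAX_UNSCALED_WINDOW << scale) < max_window and scale < MAX_WINDOW_SCALE:
        scale += 1
    return scale


def window_seen_by_buggy_device(real_window, scale):
    """What a device sees if it ignores the scale and reads the raw field."""
    return real_window >> scale


rmem_max = 170 * 1024  # assumption: the slide's "170K" default maximum
scale = choose_window_scale(rmem_max)
print(scale)                                         # 2
print(window_seen_by_buggy_device(rmem_max, scale))  # 43520, a quarter of 174080
```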


Break


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Performance Engineering process

■ Define your goal


■ Capture information

■ Analyze and form hypothesis

■ Prototype to validate hypothesis

■ If successful

■ Make changes on production system

■ Report problems or patches to others


Goal setting

■ Know what is possible:


■ bus bandwidth, network latency, etc.

■ Know your application

■ Compare with similar applications


TCP performance testing

■ Goal: Improve TCP performance over high bandwidth * delay links
■ Plan:

■ New TCP congestion control


■ Validate and test


Testing TCP over WAN

■ Want to test performance of TCP over high BDP links
■ Can't afford a 10Gbit trans-continental link
■ Proposal: emulate network delay over 1Gbit Ethernet


Existing network emulation tools

■ Dummynet
http://info.iet.unipi.it/~luigi/ip_dummynet/
I don't want to set up a separate FreeBSD machine
■ NISTnet
http://snad.ncsl.nist.gov/itg/nistnet/
Only on 2.4 and not ready to be in main tree


Netem

TCP -> IP -> netem -> Ethernet (eth0)
http://developer.osdl.org/shemminger/netem
■ Started out as simple delay only hack

■ Grown up to do all the functionality of NISTnet
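
For the delay-only case, netem is configured as a queueing discipline through the `tc` command. The Python sketch below only builds the command line and does not run it; the function name is hypothetical, and the syntax shown is the one used by current iproute2, which is assumed here rather than taken from the slides.

```python
def netem_delay_command(device, delay_ms):
    """Build the tc command that adds a fixed netem delay on `device`.

    Running the resulting command requires root and a kernel with netem.
    """
    return ["tc", "qdisc", "add", "dev", device,
            "root", "netem", "delay", "%dms" % delay_ms]


print(" ".join(netem_delay_command("eth0", 100)))
# tc qdisc add dev eth0 root netem delay 100ms
```

Sweeping `delay_ms` over a range of values is one way to emulate links of different lengths over a single Gigabit Ethernet connection.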


Current TCP research

■ Alternative TCP congestion control


■ Vegas
■ Westwood
■ Binary Increase Congestion Control (BIC)
■ Research community based around Web100


TCP Reno

■ Standard default in 2.4/2.6


■ Adjusts congestion window based on packet loss

■ Slow start – window grows slowly

■ Additive Increase window on each Ack

■ Multiplicative Decrease on loss
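
A toy Python model of these rules, one step per round trip, is sketched below. It is a simplification and not the kernel's implementation: the window is counted in segments, slow start starts from one segment and doubles each round trip up to a threshold, and every loss halves the window. The starting threshold of 16 and all names are assumptions for illustration.

```python
def reno_step(cwnd, ssthresh, loss):
    """Advance the congestion window by one round trip (units: segments)."""
    if loss:
        # Multiplicative decrease: halve the window on loss.
        ssthresh = max(cwnd // 2, 2)
        return ssthresh, ssthresh
    if cwnd < ssthresh:
        # Slow start: start small, double each round trip up to the threshold.
        return min(cwnd * 2, ssthresh), ssthresh
    # Additive increase: one more segment per round trip.
    return cwnd + 1, ssthresh


def simulate(rounds, loss_rounds, cwnd=1, ssthresh=16):
    """Return the window at the start of each round trip."""
    history = []
    for current_round in range(rounds):
        history.append(cwnd)
        cwnd, ssthresh = reno_step(cwnd, ssthresh, current_round in loss_rounds)
    return history


print(simulate(10, {6}))  # [1, 2, 4, 8, 16, 17, 18, 9, 10, 11]
```

The sawtooth in the output is the reason a single loss is so costly on a high BDP link: the window is halved and then recovers by only one segment per round trip.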


TCP Vegas

■ Original work by Larry Peterson


■ Patches existed for 2.2, 2.4 and part of web100
■ sysctl net.ipv4.tcp_cong_avoid
■ Measure bandwidth based on RTT
■ Adjust congestion window on bandwidth

■ Avoids packet loss
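
The idea can be sketched in Python as a comparison between the rate the window would give on an empty path and the rate actually measured. This follows the commonly described Vegas rule and is not the code from the patches mentioned above; the `alpha` and `beta` thresholds are illustrative values and the function name is hypothetical.

```python
def vegas_adjust(cwnd, base_rtt, rtt, alpha=2, beta=4):
    """Return the new congestion window (segments) after one RTT sample.

    base_rtt -- smallest RTT seen, taken as the delay of an empty path
    rtt      -- latest measured RTT
    alpha, beta -- assumed thresholds, in segments queued in the network
    """
    expected = cwnd / base_rtt               # rate if nothing were queued
    actual = cwnd / rtt                      # rate actually achieved
    queued = (expected - actual) * base_rtt  # estimated segments in queues
    if queued < alpha:
        return cwnd + 1                      # path has room, grow
    if queued > beta:
        return max(cwnd - 1, 1)              # queues building, back off
    return cwnd


print(vegas_adjust(10, 0.100, 0.100))  # 11
print(vegas_adjust(20, 0.100, 0.200))  # 19
```

Because the window shrinks when RTT rises, the sender backs off before queues overflow, which is how loss is avoided.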


TCP Westwood

■ Work by Claudio Casetti


■ Patches for 2.4 by Angelo Dell'Aera
■ sysctl net.ipv4.tcp_westwood
■ Focused on wireless
■ packet loss != congestion
■ Measure bandwidth based on RTT
■ Use normal Reno till congestion then adjust congestion window based on bandwidth


Binary Increase Congestion Control (BIC)

■ Work by Lisong Xu
■ Patches for Web100 (2.4)
■ sysctl net.ipv4.tcp_bic
■ Designed for high speed networks
■ Modification of Reno
■ Use additive increase when congestion window is large
■ Binary search increase when window is small


Tuning

■ Default tcp parameters not big enough


■ Need bigger send and receive window

■ Send window autosized based on rtt already


■ Receive window autosizing was done in Web100
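
Why the defaults are too small follows from the fact that a connection can deliver at most one window of data per round trip. A quick Python check, with a hypothetical function name and the earlier "170K" default receive maximum taken as 170 * 1024 bytes:

```python
def window_limited_mbits(window_bytes, rtt_ms):
    """Upper bound on throughput in Mbit/sec: one window per round trip."""
    bits_per_ms = window_bytes * 8 / rtt_ms
    return bits_per_ms / 1000  # bits/ms * 1000 ms/sec / 1,000,000 bits/Mbit


default_max = 170 * 1024  # assumption: the "170K" default receive maximum
for delay_ms in (10, 50, 100, 200):
    print(delay_ms, "ms:", round(window_limited_mbits(default_max, delay_ms), 1), "Mbit/sec")
```

At 100 ms this bound is about 14 Mbit/sec, far below a gigabit link, so the window has to grow with the delay.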


Receiver Tuning

■ Patches from John Heffner


■ sysctl net.ipv4.tcp_moderate_rcvbuf
■ Dynamic Right Sizing (DRS)
■ adjust receive window based on RTT
■ If application doesn't set window then do it for them
■ Window will grow from default to max


Receiver auto-tuning

[Figure: throughput (Mbits/sec, 0 to 1000) versus delay (ms, 0 to 200), comparing Default and Auto Tuned receive windows]
Throughput vs Delay (initial run)

[Figure: bandwidth (Mbits/sec, 0 to 800) versus delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC]
What's happening

■ NAPI
■ Driver API to allow avoiding interrupts
■ Trades off latency for overall performance
■ E1000 driver
■ Uses NAPI for transmit
Answer: Transmit ring gets full and driver flow blocks
Solution: set TxDescriptors=1000


Throughput vs Delay (rerun)

[Figure: throughput (axis labeled bits/sec, ticks 0 to 800) versus delay (ms, 0 to 200) for Reno, Vegas, Westwood and BIC]
Performance still slow

■ Vegas and Westwood are terrible


■ Not at full link speed

■ Performance falling off with delay


Vegas trace with 100ms delay


Vegas detail


Westwood (70ms)


Westwood detail


BIC trace (100ms)


BIC detail (100ms)


How to squeeze out more performance

■ Large MTU (4k): +63%
■ LAN driver built in, not a module: up to 10%
■ Turn off timestamps: +4%
■ Bind IRQ to processor: varies


Congestion more work

■ Vegas doesn't use available window


■ Does it underestimate bandwidth?
■ Westwood
■ Another bandwidth problem
■ BIC
■ When does it make it into binary mode?
■ What is holding back window?
■ Netem
■ Higher resolution? Packet groups?


Break


Agenda

■ Introduction
■ TCP for muggles

■ Engineering Process

■ Problem examples

■ Network Tools

■ Wrapup


Other tools

■ Information about
■ ISP connection
■ Sockets open
■ Testing infrastructure
■ More data capture

■ Monitoring


Tools: basic

■ Network path information


■ Ping – send icmp echo
■ Measure of round trip time and loss

■ Can be blocked by firewall

■ Traceroute – probes with increasing TTL (optionally IP source routing)
■ Source routing usually blocked now

■ Pathcapture (pcap)
■ Bandwidth and delay measurement


Tools: Network interface

■ ifconfig
■ Basic statistics, packets sent/received/errors
■ ip -stats link
■ Newer alternative, may have more info
■ SNMP
■ Remote access to same information
■ Slightly more work


Tools: Sockets

■ Netstat
■ TCP statistics
■ Open sockets
■ Ss
■ More statistics available (rtt, etc)
■ Recvmsg
■ Application can see TCP info (cmsg)


Tools: test servers

■ SYN test
telnet syntest.psc.edu 7960
■ TCP bandwidth
http://www.epm.ornl.gov/~dunigan/java/misc/tcpbw.html
http://dslreports.com
■ ANL network config
http://miranda.ctd.anl.gov:7123
■ Path MTU
http://www.ncne.org/jumbogram/mtu_discovery.php

Tools: testing

■ Ttcp
■ Basic send /receive throughput
■ Iperf
■ Longer running tests and turnaround
■ Netperf
■ Includes cpu and other statistics
■ Dbs
■ Multiclient testing


Tools: monitoring

■ Ntop
■ Measure of network activity by service
■ Nice web interface
■ Mailgraph
■ Long term mail statistics
■ Web server activity log analysis


Tools: data capture

■ Tcpdump
■ Filter packets by protocol, address, etc
■ Decode many protocols
■ Ethereal
■ GUI interface
■ RMON
■ Remote monitoring
■ Kismet
■ Wireless activity


Tools: generators

■ Pktgen
■ Kernel level packet generation
■ Can generate maximum hardware packet rate
■ Network packet generator
■ Application level


Tools: simulation

■ Ns
■ Describe overall system
■ Event based simulation
■ Used for protocol analysis
■ SSFnet
■ More detailed models of real hardware


Tools: client simulator

■ Web
■ SPECweb, Apache (as), httpload
■ NFS
■ Nfsstone
■ FTP
■ Dkftpbench


Conclusion

■ Data capture can provide clues about:


■ Application problems
■ Device problems
■ TCP/IP problems
■ Nothing is ever simple

