GuptaIN01-Algorithms For Packet Classification
GuptaIN01-Algorithms For Packet Classification
Abstract
The process of categorizing packets into “flows” in an Internet router is called packet
classification. All packets belonging to the same flow obey a predefined rule and
are processed in a similar manner by the router. For example, all packets with the
same source and destination IP addresses may be defined to form a flow. Packet
classification is needed for non-best-effort services, such as firewalls and quality of
service; services that require the capability to distinguish and isolate traffic in differ-
ent flows for suitable processing. In general, packet classification on multiple fields is
a difficult problem. Hence, researchers have proposed a variety of algorithms which,
broadly speaking, can be categorized as basic search algorithms, geometric algorithms,
heuristic algorithms, or hardware-specific search algorithms. In this tutorial we
describe algorithms that are representative of each category, and discuss which type
of algorithm might be suitable for different applications.
Accounting and Treat all video traffic to E1 (via interface Y) Flow Relevant packet fields
as highest priority and perform accounting
for the traffic sent this way. Email and from ISP2 Source link-layer address, source transport
port number
Traffic rate limiting Ensure that ISP2 does not inject more than
10 Mb/s of e-mail traffic and 50 Mb/s of From ISP2 Source link-layer address
total traffic on interface X.
From ISP3 and going Source link-layer address,
Traffic shaping Ensure that no more than 50 Mb/s of Web to E2 destination network-layer address
traffic is injected into ISP2 on interface X.
■ Table 2. Flows into which an incoming packet must be classi-
■ Table 1. Customer services provided by ISP1. fied by the router at interface X.
ISP1
Performance Metrics for Classification ISP2
Algorithms NAP X
Z
• Search speed — Faster links require faster classifica-
ISP3 E2
tion. For example, links running at 10 Gb/s can Router
bring 31.25 million packets/s (assuming minimum-
sized 40-byte TCP/IP packets).
■ Figure 2. An example network of an ISP (ISP1) connected to two enter-
• Low storage requirements — Small storage require-
prise networks (E1 and E2) and to two other ISP networks across a net-
ments enable the use of fast memory technologies
work access point (NAP).
like static random access memory (SRAM). SRAM
can be used as an on-chip cache by a software algo-
rithm and as on-chip SRAM for a hardware algorithm.
• Ability to handle large real-life classifiers. • Flexibility in specification — A classification algorithm should
• Fast updates — As the classifier changes, the data struc- support general rules, including prefixes, operators (range,
ture needs to be updated. We can categorize data struc- less than, greater than, equal to, etc.), and wildcards. In some
tures into those which can add or delete entries applications, noncontiguous masks may be required.
incrementally, and those which need to be reconstructed
from scratch each time the classifier changes. When the Classification Algorithms
data structure is reconstructed from scratch, we call it pre-
processing. The update rate differs among different appli- Background
cations: a very low update rate may be sufficient in For the next few sections, we will use the example classifier in
firewalls where entries are added manually or infrequent- Table 5 repeatedly. The classifier has six rules in two fields
ly, whereas a router with per-flow queues may require very labeled F1 and F2; each specification is a prefix of maximum
frequent updates. length 3 bits. We will refer to the classifier as C = {Rj} and
• Scalability in the number of header fields used for classification. each rule Rj as a 2-tuple: <Rjl, Rj2>.
Packet Network-layer destination Network-layer source Transport-layer Transport-layer protocol Best matching rule,
header destination action
■ Figure 5. A set-pruning trie data structure. The gray pointers are the "next-
2The time bound for d ≤ 3 is O(loglogN) [1] but has large trie" pointers. The path traversed by the query algorithm on an incoming
constant factors. packet (000, 010) is shown.
R1 00* 00*
matches all relevant fields. While simple and stor- prefix of n1, and so on for all dimensions. The rules
age-efficient, this algorithm clearly has poor scaling R2 0* 01*
are replicated to ensure that every matching rule
properties; the time to classify a packet grows lin- will be encountered in the path. The query time is
early with the number of rules. R3 1* 0* reduced to O(dW) at the expense of increased stor-
age of O(NddW) since a rule may need to be repli-
Hierarchical Tries — A d-dimensional hierarchical R4 00* 0* cated O(Nd) times. Update complexity is O(Nd);
radix trie is a simple extension of the one dimen- hence, this data structure works only for relatively
sional radix trie data structure, and is constructed R5 0* 1* static classifiers.
recursively as follows. If d is greater than 1, we first
construct a 1-dimensional trie, called the F1-trie, R6 * 1* Geometric Algorithms
on the set of prefixes {Rj1}, belonging to dimension ■ Table 5. An exam- Grid-of-tries — The grid-of-tries data structure,
F1 of all rules in the classifier, C = {Rj}. For each ple classifier. proposed by Srinivasan et al. [5] for 2D classifica-
prefix, p, in the F1-trie, we recursively tion, reduces storages space by allo-
construct a (d – 1) -dimensional hierar- cating a rule to only one trie node as
chical trie, T p , on those rules which Range Constituent prefixes in a hierarchical trie, and yet achieves
specify exactly p in dimension F1, that O(W) query time by precomputing
is, the set of rules {Rj:Rj1 = p}. Prefix [4,7] 01** and storing a switch pointer in some
p is linked to the trie Tp using a next- trie nodes. A switch pointer is labeled
trie pointer. The storage complexity of [3,8] 0011, 01**, 1000 0 or 1 and guides the search process.
the data structure for an N-rule classi- [1,14] 0001, 001*, 01**, 10**, 110*, 1110
The conditions that must be satisfied
fier is O(NdW). The data structure for for a switch pointer labeled b (b = 0
the classifier in Table 5 is shown in Fig. ■ Table 6. Examples of range-to-prefix conver- or 1) to exist from a node w in the
4. Hierarchical tries are sometimes sions for W = 4. trie Tw to a node x of another trie Tx
called multilevel tries, backtracking- are (Fig. 6):
search tries, or trie-of-tries. • T x and T w are distinct tries built
Classification of an incoming packet (n 1, n 2, …, n d) pro- on the prefix components of dimension F2. Tx and Tw are
ceeds as follows. The query algorithm first traverses the F1- pointed to by two distinct nodes, say r and s respectively of
trie based on the bits in n1. At each F1-trie node encountered, the same trie, T, built on prefix components of dimension
the algorithm follows the next-trie pointer (if present) and tra- F1.
verses the (d – 1)-dimensional trie. The query time complexity • The bit-string that denotes the path from the root node to
for d dimensions is therefore O(Wd). Incremental updates can node w in trie Tw concatenated with the bit b is identical to
be carried out similarly in O(d2W) time since each component the bit-string that denotes the path from the root node to
of the updated rule is stored in exactly one location at maxi- node x in the trie Tx.
mum depth O(dW). • Node w does not have a child pointer labeled b.
• Node s in trie T is the closest ancestor of node r that satis- is then found in the cross-product table CT which contains the
fies the above conditions. precomputed best matching rule. Figure 8 shows an example.
If the query algorithm traverses paths U1(s, root (Tx), y, x) Given that N prefixes leads to at most 2N – 2 ranges, sk £ 2N
and U2(r, root (Tw), w) in a hierarchical trie, it need only tra- and CT is of size O(Nd). The lookup time is O(dtRL) where tRL is
verse the path (s, r, root (Tw), w, x) on a grid-of-tries. This is the time complexity of finding a range in one dimension.
because paths U1 and U2 are identical (by condition 2 above) Because of its high worst case storage complexity, cross-product-
till U1 terminates at node w because it has no child branch ing is suitable for very small classifiers. Reference [5] proposes
(by condition 3). The switch pointer eliminates the need for using an on-demand cross- producting scheme together with
backtracking in a hierarchical trie without the storage of a set- caching for classifiers bigger than 50 rules in five dimensions.
pruning trie. Each bit of the packet header is examined at Updates require reconstruction of the cross-product table, so
most once, so the time complexity reduces to O(W), while cross-producting is suitable for relatively static classifiers.
storage complexity O(NW) is the same as a 2D hierarchical
trie. However, switch pointers makes incremental updates dif- A 2D Classification Scheme [6] — Lakshman and Stiliadis [6]
ficult, so the authors [5] recommend rebuilding the data struc- propose a 2D classification algorithm where one dimension,
ture (in time O(NW)) for each update. An example of the say F1, is restricted to have prefix specifications while the sec-
grid-of-tries data structure is shown in Fig. 7. ond dimension, F2, is allowed to have arbitrary range specifi-
Reference [5] reports 2 Mbytes of storage for a 20,000 2D cations. The data structure first builds a trie on the prefixes of
classifier with destination and source IP prefixes. The stride of dimension F1, and then associates a set Gw of nonoverlapping
the destination (source) prefix trie was 8 (5) bits, respectively, ranges to each trie node, w, that represents prefix p. These
leading to a maximum of nine memory accesses. ranges are created by (possibly overlapping) projections on
Grid-of-tries works well for 2D classification, and can be used dimension F2 of those rules, S w , that specify exactly p in
for the last two dimensions of a multidimensional hierarchical dimension F1. A range lookup data structure (e.g., an array or
trie, decreasing the classification time complexity by a factor of a binary search tree) is then constructed on Gw and associated
W to O(NWd–1). As with hierarchical and set-pruning tries, grid- with trie node w. An example is shown in Fig. 9.
of-tries handles range specifications by splitting into prefixes. Searching for point P(n1, n2) involves a range lookup in data
structure Gw for each trie node, w, encountered. The search in
Cross-Producting — Cross-producting [5] is suitable for an Gw returns the range containing n2, and hence the best match-
arbitrary number of dimensions. Packets are classified by com- ing rule. The highest priority rule is selected from the rules
posing the results of separate 1D range lookups for each {Rw} for all trie nodes encountered during the traversal.
dimension, as explained below. The storage complexity is O(NW) because each rule is
Constructing the data structure involves computing a set of stored only once in the data structure. Queries take O(WlogN)
time because an O(logN) range lookup is performed for every
node encountered in the F1-trie. This can be reduced
to O(W + logN) using fractional cascading [7], but
that makes incremental updates impractical.
0 1
F1-trie Area-Based Quadtree — The area-based quadtree
0 (AQT) was proposed by Buddhikot et al. [8] for 2D
classification. AQT allows incremental updates
Search path whose complexity can be traded off with query time
by a tunable parameter. Each node of a quadtree [7]
represents a 2D space that is decomposed into four
equal-sized quadrants, each of which is represented
000, 001, 011 010, 011, 100, 111 100, 111 000, 011 by a child node. The initial 2D space is recursively
decomposed into four equal-sized quadrants till each
R1 R2 R5 R6 R3 quadrant has at most one rule in it (Fig. 10 shows an
R4 example of the decomposition). Rules are allocated
to each node as follows. A rule is said to cross a
■ Figure 9. The data structure for the example classifier of Table 5. The quadrant if it completely spans at least one dimen-
search path for example packet P(011, 110) resulting in R5 is also shown. sion of the quadrant. For instance, rule R6 spans the
Cross-producting dW Nd
References
[1] M.H. Overmars and A.F. van der Stappen, “Range Searching and Point
Location Among Fat Objects,” J. Algorithms, vol. 21, no. 3, Nov. 1996,
FIS-tree (l + 1)W l ¥ N1+1/l pp. 629–56.
[2] F. Preparata and M. I. Shamos, Computational Geometry: An Introduction,
RFC d Nd Springer-Verlag, 1985.
[3] A. Feldman and S. Muthukrishnan, “Tradeoffs for Packet Classification,”
Bitmap-intersection dW + N/memwidth dN2 Proc. INFOCOM, vol. 3, Mar. 2000, pp. 1193–1202.
[4] P. Tsuchiya, “A Search Algorithm for Table Entries with Non-contiguous
Wildcarding,” unpublished report, Bellcore.
HiCuts d Nd [5] V. Srinivasan et al., “Fast and Scalable Layer four Switching,” Proc. ACM
Sigcomm, Sept. 1998, pp. 203–14.
Tuple space search N N [6] T. V. Lakshman and D. Stiliadis, “High-Speed Policy-based Packet Forward-
ing Using Efficient Multi-dimensional Range Matching,” Proc. ACM Sig-
■ Table 8. A summary of classification schemes. comm, pp. 191–202, Sept. 1998.
[7] M. de Berg, M. van Kreveld, and M. Overmars, Computational Geometry:
Algorithms and Applications, Springer-Verlag, 2nd rev. ed. 2000.
[8] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space Decomposition Tech-
dissipates about 7 W [13, 14]. In comparison, an 8 Mb SRAM niques for Fast Layer-4 Switching,” Proc. Conf. Protocols for High Speed
running at 200 MHz dissipates approximately 2 W [15]. Networks, Aug. 1999, pp. 25–41.
TCAMs are appealing for relatively small classifiers, but [9] P. Gupta and N. McKeown, “Packet Classification on Multiple Fields,” Proc.
will probably remain unsuitable in the near future for: Sigcomm, Comp. Commun. Rev., vol. 29, no. 4, Sept. 1999, pp. 147–60.
[10] P. Gupta and N. McKeown, “ Classification Using Hierarchical Intelligent
• Large classifiers (256K–1M rules) used for microflow recog- Cuttings,” Proc. Hot Interconnects VII, Aug. 1999, Stanford,; also available
nition at the edge of the network in IEEE Micro, vol. 20, no. 1, Jan./Feb.2000, pp. 34–41.
• Large classifiers (128–256K rules) used at edge routers that [11] V. Srinivasan, S. Suri, and G. Varghese, “Packet Classification using Tuple
manage thousands of subscribers (with a few rules per sub- Space Search,” Proc. ACM Sigcomm, Sept. 1999, pp. 135–46.
[12] F. Shafai et al., “Fully Parallel 30-Mhz, 2.5 Mb CAM,” IEEE J. Solid-State
scriber) Circuits, vol. 33, no. 11, Nov. 1998.
• Extremely high-speed (greater than 200 Mpackets/s) classifi- [13] Netlogic Microsystems: http://www.netlogicmicro.com. CIDR products:
cation http://www.netlogicmicro.com/products/cidr/cidr.html
• Price-sensitive applications [14] Sibercore Technologies: http://www.sibercore.com
[15] ZBT-Datasheet docs from IDT: http://www.idt.com/products/
pages/ZBT_DS_p.html
Bitmap-Intersection — The bitmap-intersection classification
scheme proposed in [6] is based on the observation that the Biographies
set of rules, S, that match a packet is the intersection of d P ANKAJ G UPTA (pankaj@stanford.edu) got his Ph.D. in computer science from
sets, Si, where Si is the set of rules that match the packet in Stanford University in 2000. His research interests lie in the architecture and
the ith dimension alone. While cross-producting precomputes design of high-speed switches and routers, and in traffic engineering for inte-
S and stores the best matching rule in S, this scheme com- grated optical and data networks. He has previously worked for Cisco Systems,
and is the cofounder of Sahasra Networks, Inc.
putes S and the best matching rule during each classification
operation. NICK MCKEOWN (nickm@stanford.edu) is assistant professor of electrical engi-
In order to compute intersection of sets in hardware, each neering and computer science at Stanford University, where he works on the the-
set is encoded as an N-bit bitmap with each bit corresponding ory, architecture, and implementation of high-speed Internet routers and
switches. He completed his Ph.D. at Berkeley in 1995. He has worked for
to a rule. The set of matching rules is the set of rules whose Hewlett-Packard Labs and Cisco Systems, and co-founded Abrizio Inc. He is as
corresponding bits are 1 in the bitmap. A query is similar to an editor of IEEE Transactions on Networking, the Robert Noyce Faculty Fellow
cross-producting: First, a range lookup is performed in each at Stanford, and a research fellow of the Alfred P. Sloan Foundation.