[go: up one dir, main page]

0% found this document useful (0 votes)
51 views9 pages

GuptaIN01-Algorithms For Packet Classification

The document discusses packet classification, a process essential for categorizing packets into flows in Internet routers, which is crucial for services like firewalls and quality of service. It outlines various algorithms for packet classification, categorized into basic search, geometric, heuristic, and hardware-specific algorithms, and emphasizes the challenges of classifying packets based on multiple fields. The paper also highlights performance metrics for classification algorithms, including search speed, storage requirements, and flexibility in rule specification.

Uploaded by

zoukaiass
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
51 views9 pages

GuptaIN01-Algorithms For Packet Classification

The document discusses packet classification, a process essential for categorizing packets into flows in Internet routers, which is crucial for services like firewalls and quality of service. It outlines various algorithms for packet classification, categorized into basic search, geometric, heuristic, and hardware-specific algorithms, and emphasizes the challenges of classifying packets based on multiple fields. The paper also highlights performance metrics for classification algorithms, including search speed, storage requirements, and flexibility in rule specification.

Uploaded by

zoukaiass
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 9

Algorithms for Packet Classification

Pankaj Gupta and Nick McKeown, Stanford University

Abstract
The process of categorizing packets into “flows” in an Internet router is called packet
classification. All packets belonging to the same flow obey a predefined rule and
are processed in a similar manner by the router. For example, all packets with the
same source and destination IP addresses may be defined to form a flow. Packet
classification is needed for non-best-effort services, such as firewalls and quality of
service; services that require the capability to distinguish and isolate traffic in differ-
ent flows for suitable processing. In general, packet classification on multiple fields is
a difficult problem. Hence, researchers have proposed a variety of algorithms which,
broadly speaking, can be categorized as basic search algorithms, geometric algorithms,
heuristic algorithms, or hardware-specific search algorithms. In this tutorial we
describe algorithms that are representative of each category, and discuss which type
of algorithm might be suitable for different applications.

U ntil recently, Internet routers provided only best-


effort service, servicing packets in a first-come-
first-served manner. Routers are now called on to
provide different qualities of service to different
applications, which means routers need new mechanisms
such as admission control, resource reservation, per-flow
queuing, and fair scheduling. All of these mechanisms
Table 2 shows the flows into which an incoming packet
must be classified by the router at interface X. Note that the
flows specified may or may not be mutually exclusive. For
example, the first and second flow in Table 2 overlap. This is
common in practice, and when no explicit priorities are speci-
fied, we follow the convention that rules closer to the top of
the list take priority.
require the router to distinguish packets belonging to differ-
ent flows. Problem Statement
Flows are specified by rules applied to incoming packets. Each rule of a classifier has d components. R[i] is the ith com-
We call a collection of rules a classifier. Each rule specifies a ponent of rule R, and is a regular expression on the ith field
flow to which a packet may belong based on some criteria of the packet header. A packet P is said to match rule R, if "i,
applied to the packet header (shown in Fig. 1). To illustrate the ith field of the header of P, satisfies the regular expression
the variety of classifiers, consider some examples of how pack- R[i]. In practice, a rule component is not a general regular
et classification can be used by an Internet service provider expression but is often limited by syntax to a simple
(ISP) to provide different services. Figure 2 shows ISP1 con- address/mask or operator/number(s) specification. In an
nected to three different sites: enterprise networks E1 and E2, address/mask specification, a 0 (1) at bit position x in the
and a network access point1 (NAP), which is in turn connect- mask denotes that the corresponding bit in the address is a
ed to ISP2 and ISP3. ISP1 provides a number of different ser- don’t care (significant) bit. Examples of operator/number(s)
vices to its customers, as shown in Table 1. specifications are eq 1232 and range 34-9339. Note that a pre-
fix can be specified as an address/mask pair where the mask is
contiguous (i.e., all bits with value 1 appear to the left of bits
Service Example with value 0 in the mask). It can also be specified as a range
of width equal to 2 t where t = 32 – prefixlength. Most com-
Packet filtering Deny all traffic from ISP3 (on interface X)
destined to E2.
1 A network access point is a network site which acts as an exchange point for
Policy routing Send all voice-over-IP traffic arriving from
Internet traffic. ISPs connect to the NAP to exchange traffic with other ISPs.
E1 (on interface Y) and destined to E2
via a separate ATM network.

Accounting and Treat all video traffic to E1 (via interface Y) Flow Relevant packet fields
as highest priority and perform accounting
for the traffic sent this way. Email and from ISP2 Source link-layer address, source transport
port number
Traffic rate limiting Ensure that ISP2 does not inject more than
10 Mb/s of e-mail traffic and 50 Mb/s of From ISP2 Source link-layer address
total traffic on interface X.
From ISP3 and going Source link-layer address,
Traffic shaping Ensure that no more than 50 Mb/s of Web to E2 destination network-layer address
traffic is injected into ISP2 on interface X.
■ Table 2. Flows into which an incoming packet must be classi-
■ Table 1. Customer services provided by ISP1. fied by the router at interface X.

24 0890-8044/01/$10.00 © 2001 IEEE IEEE Network • March/April 2001


L4-SP L4-DP L4-PROT L3-SA L3-DA L3-PROT L2-SA L2-DA
Payload 16b 16b 8b 32b 32b 8b 48b 48b
monly occurring specifications can be
represented by ranges.
An example real-life classifier in Transport layer header Network layer header Link layer header
four dimensions is shown in Table 3.
By convention, the first rule R1 is of L2: Layer 2 (e.g., Ethernet) DA: Destination address
highest priority, and rule R7 is of low- L3: Layer 3 (e.g., IP) SA: Source address
est priority. Some example classifica- L4: Layer 4 (e.g., TCP) PROT: Protocol
tion results are shown in Table 4. SP: Source port
DP: Destination port
Longest prefix matching for routing
lookups is a special case of one-dimen- ■ Figure 1. This figure shows some of the header fields (and their widths) that might be used
sional packet classification. All packets for classifying the packet. Although not shown in this figure, higher-layer (e.g., application-
destined to the set of addresses level) headers may also be used.
described by a common prefix may be
considered to be part of the same flow.
The address of the next hop to which the packet should
be forwarded is the associated action. The length of the Y E1
prefix defines the priority of the rule.

ISP1
Performance Metrics for Classification ISP2
Algorithms NAP X
Z
• Search speed — Faster links require faster classifica-
ISP3 E2
tion. For example, links running at 10 Gb/s can Router
bring 31.25 million packets/s (assuming minimum-
sized 40-byte TCP/IP packets).
■ Figure 2. An example network of an ISP (ISP1) connected to two enter-
• Low storage requirements — Small storage require-
prise networks (E1 and E2) and to two other ISP networks across a net-
ments enable the use of fast memory technologies
work access point (NAP).
like static random access memory (SRAM). SRAM
can be used as an on-chip cache by a software algo-
rithm and as on-chip SRAM for a hardware algorithm.
• Ability to handle large real-life classifiers. • Flexibility in specification — A classification algorithm should
• Fast updates — As the classifier changes, the data struc- support general rules, including prefixes, operators (range,
ture needs to be updated. We can categorize data struc- less than, greater than, equal to, etc.), and wildcards. In some
tures into those which can add or delete entries applications, noncontiguous masks may be required.
incrementally, and those which need to be reconstructed
from scratch each time the classifier changes. When the Classification Algorithms
data structure is reconstructed from scratch, we call it pre-
processing. The update rate differs among different appli- Background
cations: a very low update rate may be sufficient in For the next few sections, we will use the example classifier in
firewalls where entries are added manually or infrequent- Table 5 repeatedly. The classifier has six rules in two fields
ly, whereas a router with per-flow queues may require very labeled F1 and F2; each specification is a prefix of maximum
frequent updates. length 3 bits. We will refer to the classifier as C = {Rj} and
• Scalability in the number of header fields used for classification. each rule Rj as a 2-tuple: <Rjl, Rj2>.

Rule Network-layer destination Network-layer source (address/ Transport-layer Transport-layer Action


(address/mask) mask) destination protocol

R1 152.163.190.69/255.255.255.255 152.163.80.11/255.255.255.255 * * Deny

R2 152.168.3.0/255.255.255.0 152.163.200.157/255.255.255.255 eq www udp Deny

R5 152.163.198.4/255.255.255.255 152.163.160.0/255.255.252.0 gt 1023 tcp Permit

R6 0.0.0.0/0.0.0.0 0.0.0.0/0.0.0.0 * * Permit

■ Table 3. A real-life classifier in four dimensions.

Packet Network-layer destination Network-layer source Transport-layer Transport-layer protocol Best matching rule,
header destination action

P1 152.163.190.69 152.163.80.11 www tcp R1, deny

P2 152.168.3.21 152.163.200.157 www udp R2, deny

P3 152.168.198.4 152.163.160.10 1024 tcp R5, permit

■ Table 4. Example classification results.

IEEE Network • March/April 2001 25


111
P
Definition 1 — Given a set of N disjoint ranges G = {Gi = [li,
110
ui]} that form a partition of the number line [0,2W – 1], i.e., li
and ui are such that l1 = 0, li £ ui, li+1 = ui + 1, uN = 2W – 1;
R5 R6
101 the range lookup problem is to find the range GP (and any asso-
ciated information) that contains an incoming point.
To assess the increased complexity of ranges, we can con-
100
vert each range to a set of prefixes (a prefix of length corre-
sponds to a range where the least significant bits of l are all 0
011 and those of u are all 1) and use a longest prefix matching
R2 algorithm. Table 6 shows some examples of range-to-prefix
010
conversions for W = 4.
R3 A W-bit range can be represented by at most 2W – 2 prefixes
(see the last row of Table 6 as an example), which means a pre-
001 fix matching algorithm can find ranges with times as much stor-
R1
age. Feldman and Muthukrishnan [3] show a reduction of
000 ranges to prefix lookup with a twofold storage increase that can
be used in some specific multidimensional classification schemes.
000 010 100 110
001 011 101 111 Taxonomy of Classification Algorithms
The classification algorithms we will describe here can be cat-
■ Figure 3. A geometric representation of the classifier in Table 5. egorized into the four classes shown in Table 7.
A packet represents a point, for instance P(011,110), in two- We now proceed to describe representative algorithms
dimensional space. Note that R4 is hidden by R1 and R2. from each class.

Basic Data Structures


Bounds from Computational Geometry — There is a simple geo- Linear Search — The simplest data structure is a linked list of
metric interpretation of packet classification. While a prefix repre- rules stored in order of decreasing priority. A packet is com-
sents a contiguous interval on the number line, a two-dimensional pared with each rule sequentially until a rule is found that
rule represents a rectangle in two-dimensional Euclidean
space, and a rule in d dimensions represents a d-dimen-
sional hyper-rectangle. A classifier is therefore a collec- Search path
tion of prioritized hyper-rectangles, and a packet header 1
represents a point in d dimensions. For example, Fig. 3 0
shows the classifier in Table 5 geometrically in which F1-trie
high-priority rules overlay lower-priority rules. Classify- 1
ing a packet is equivalent to finding the highest-priority 0
rectangle that contains the point representing the pack-
et. For example, point P(011,110) in Fig. 3 would be
classified by rule R5. 0 1
1 0
There are several standard geometry problems R3
(e.g., ray shooting, point location, and rectangle 0
R4
enclosure) that resemble packet classification. Point 1 R5 R6 F2-tries
location involves finding the enclosing region of a R1
point, given a set of nonoverlapping regions. The
R2
best bounds for point location in N rectangular
regions and d > 3 dimensions are O(logN) time with ■ Figure 4. A hierarchical trie data structure. The gray pointers are the next-
O(Nd) space;2 or O(logN)d–1) time with O(N) space trie pointers. The path traversed by the query algorithm on an incoming
[1, 2]. In packet classification, hyper-rectangles can packet (000, 010) is shown.
overlap, making classification at least as hard as
point location. Hence, a solution is either impracti-
cably large (with 100 rules and 4 fields, Nd space is Search path F1-trie
about 100 MBytes) or too slow ((logN) d–1 is about 1
350 memory accesses). 0
We can conclude that:
• Multifield classification is considerably more com- 0
plex than one-dimensional longest prefix matching.
• Complexity may require that practical solutions
use heuristics. 0
1 0 1 F2-tries
1 0 1 R3

Range Lookups — Packet classification is made yet 0 R4 R5 x


1 1 R6 R6
more complex by the need to match on ranges as R5 R5
well as prefixes. A range lookup for a dimension of
R1 R2
W width bits can be defined as: R2

■ Figure 5. A set-pruning trie data structure. The gray pointers are the "next-
2The time bound for d ≤ 3 is O(loglogN) [1] but has large trie" pointers. The path traversed by the query algorithm on an incoming
constant factors. packet (000, 010) is shown.

26 IEEE Network • March/April 2001


Rule F1 F2

R1 00* 00*
matches all relevant fields. While simple and stor- prefix of n1, and so on for all dimensions. The rules
age-efficient, this algorithm clearly has poor scaling R2 0* 01*
are replicated to ensure that every matching rule
properties; the time to classify a packet grows lin- will be encountered in the path. The query time is
early with the number of rules. R3 1* 0* reduced to O(dW) at the expense of increased stor-
age of O(NddW) since a rule may need to be repli-
Hierarchical Tries — A d-dimensional hierarchical R4 00* 0* cated O(Nd) times. Update complexity is O(Nd);
radix trie is a simple extension of the one dimen- hence, this data structure works only for relatively
sional radix trie data structure, and is constructed R5 0* 1* static classifiers.
recursively as follows. If d is greater than 1, we first
construct a 1-dimensional trie, called the F1-trie, R6 * 1* Geometric Algorithms
on the set of prefixes {Rj1}, belonging to dimension ■ Table 5. An exam- Grid-of-tries — The grid-of-tries data structure,
F1 of all rules in the classifier, C = {Rj}. For each ple classifier. proposed by Srinivasan et al. [5] for 2D classifica-
prefix, p, in the F1-trie, we recursively tion, reduces storages space by allo-
construct a (d – 1) -dimensional hierar- cating a rule to only one trie node as
chical trie, T p , on those rules which Range Constituent prefixes in a hierarchical trie, and yet achieves
specify exactly p in dimension F1, that O(W) query time by precomputing
is, the set of rules {Rj:Rj1 = p}. Prefix [4,7] 01** and storing a switch pointer in some
p is linked to the trie Tp using a next- trie nodes. A switch pointer is labeled
trie pointer. The storage complexity of [3,8] 0011, 01**, 1000 0 or 1 and guides the search process.
the data structure for an N-rule classi- [1,14] 0001, 001*, 01**, 10**, 110*, 1110
The conditions that must be satisfied
fier is O(NdW). The data structure for for a switch pointer labeled b (b = 0
the classifier in Table 5 is shown in Fig. ■ Table 6. Examples of range-to-prefix conver- or 1) to exist from a node w in the
4. Hierarchical tries are sometimes sions for W = 4. trie Tw to a node x of another trie Tx
called multilevel tries, backtracking- are (Fig. 6):
search tries, or trie-of-tries. • T x and T w are distinct tries built
Classification of an incoming packet (n 1, n 2, …, n d) pro- on the prefix components of dimension F2. Tx and Tw are
ceeds as follows. The query algorithm first traverses the F1- pointed to by two distinct nodes, say r and s respectively of
trie based on the bits in n1. At each F1-trie node encountered, the same trie, T, built on prefix components of dimension
the algorithm follows the next-trie pointer (if present) and tra- F1.
verses the (d – 1)-dimensional trie. The query time complexity • The bit-string that denotes the path from the root node to
for d dimensions is therefore O(Wd). Incremental updates can node w in trie Tw concatenated with the bit b is identical to
be carried out similarly in O(d2W) time since each component the bit-string that denotes the path from the root node to
of the updated rule is stored in exactly one location at maxi- node x in the trie Tx.
mum depth O(dW). • Node w does not have a child pointer labeled b.

Set-Pruning Tries — A set-pruning trie data structure [4] is simi-


lar, but with reduced query time obtained by replicating rules to Category Algorithms
eliminate recursive traversals. The data structure for the classifi-
er in Table 5 is shown in Fig. 5. The query algorithm for an Basic data structures Linear search, caching, hierarchical tries,
incoming packet (n1, n2, …, nd) need only traverse the F1-trie to set-pruning tries
find the longest matching prefix of n1, follow its next-trie point- Geometry-based Grid-of-tries, AQT, FIS
er (if present), traverse the F2-trie to find the longest matching
Heuristic RFC, hierarchical cuttings, tuple-space
search

Hardware only Ternary CAM, bitmap-intersection

■ Table 7. Four classes of classification algorithms.

s F1-trie Search path


1 F1-trie
r 0
1
T 0
0
R3
F2-tries
0 1 1 1
1
y 0 0
R4 1 F2-tries
W 1
x R5 R6
R1
R2
Tw Tx
■ Figure 7. The grid-of-tries data structure. The switch pointers
■ Figure 6. The conditions under which a switch pointer exists are shown dashed. The path traversed by the query algorithm on
from node w to x. an incoming packet (000, 010) is shown.

IEEE Network • March/April 2001 27


Cross-product table
r11 r12 r13
111 (r11,r21) R1
P ranges, Gk, of size sk = |Gk|, projected by rule specifica-
110 (r11,r22) R2 tions in each dimension k,1 £ k £ d. Let r kj , 1 £ j £ s k ,
r23 R5 R6 101 denote the jth range in Gk. A cross-product table CT of size
(r11,r23) R5
100 (r12,r21) d
011 P sk
r22 R2 (r12,r22) R2 k =1
R3
010 R5 is constructed, and the best matching rule for each entry
r21 R1 001 (r13,r21) R3
Ê r i1 , r i2 ,º r id ˆ ,1 £ i £ s ,1 £ k £ d
000 (r13,r22) d ¯ k k
R3 Ë1 2
000 010 100 110
001 011 101 111 (r13,r23) R5 is precomputed and stored. Classifying a packet (n1, n2,
…, n d) involves a range lookup in each dimension k to
■ Figure 8. The table produced by the cross-producting algorithm and identify the range r kik containing point vk. The tuple
its geometric representation.
r1i1 , r2i2 ,º rdid

• Node s in trie T is the closest ancestor of node r that satis- is then found in the cross-product table CT which contains the
fies the above conditions. precomputed best matching rule. Figure 8 shows an example.
If the query algorithm traverses paths U1(s, root (Tx), y, x) Given that N prefixes leads to at most 2N – 2 ranges, sk £ 2N
and U2(r, root (Tw), w) in a hierarchical trie, it need only tra- and CT is of size O(Nd). The lookup time is O(dtRL) where tRL is
verse the path (s, r, root (Tw), w, x) on a grid-of-tries. This is the time complexity of finding a range in one dimension.
because paths U1 and U2 are identical (by condition 2 above) Because of its high worst case storage complexity, cross-product-
till U1 terminates at node w because it has no child branch ing is suitable for very small classifiers. Reference [5] proposes
(by condition 3). The switch pointer eliminates the need for using an on-demand cross- producting scheme together with
backtracking in a hierarchical trie without the storage of a set- caching for classifiers bigger than 50 rules in five dimensions.
pruning trie. Each bit of the packet header is examined at Updates require reconstruction of the cross-product table, so
most once, so the time complexity reduces to O(W), while cross-producting is suitable for relatively static classifiers.
storage complexity O(NW) is the same as a 2D hierarchical
trie. However, switch pointers makes incremental updates dif- A 2D Classification Scheme [6] — Lakshman and Stiliadis [6]
ficult, so the authors [5] recommend rebuilding the data struc- propose a 2D classification algorithm where one dimension,
ture (in time O(NW)) for each update. An example of the say F1, is restricted to have prefix specifications while the sec-
grid-of-tries data structure is shown in Fig. 7. ond dimension, F2, is allowed to have arbitrary range specifi-
Reference [5] reports 2 Mbytes of storage for a 20,000 2D cations. The data structure first builds a trie on the prefixes of
classifier with destination and source IP prefixes. The stride of dimension F1, and then associates a set Gw of nonoverlapping
the destination (source) prefix trie was 8 (5) bits, respectively, ranges to each trie node, w, that represents prefix p. These
leading to a maximum of nine memory accesses. ranges are created by (possibly overlapping) projections on
Grid-of-tries works well for 2D classification, and can be used dimension F2 of those rules, S w , that specify exactly p in
for the last two dimensions of a multidimensional hierarchical dimension F1. A range lookup data structure (e.g., an array or
trie, decreasing the classification time complexity by a factor of a binary search tree) is then constructed on Gw and associated
W to O(NWd–1). As with hierarchical and set-pruning tries, grid- with trie node w. An example is shown in Fig. 9.
of-tries handles range specifications by splitting into prefixes. Searching for point P(n1, n2) involves a range lookup in data
structure Gw for each trie node, w, encountered. The search in
Cross-Producting — Cross-producting [5] is suitable for an Gw returns the range containing n2, and hence the best match-
arbitrary number of dimensions. Packets are classified by com- ing rule. The highest priority rule is selected from the rules
posing the results of separate 1D range lookups for each {Rw} for all trie nodes encountered during the traversal.
dimension, as explained below. The storage complexity is O(NW) because each rule is
Constructing the data structure involves computing a set of stored only once in the data structure. Queries take O(WlogN)
time because an O(logN) range lookup is performed for every
node encountered in the F1-trie. This can be reduced
to O(W + logN) using fractional cascading [7], but
that makes incremental updates impractical.
0 1
F1-trie Area-Based Quadtree — The area-based quadtree
0 (AQT) was proposed by Buddhikot et al. [8] for 2D
classification. AQT allows incremental updates
Search path whose complexity can be traded off with query time
by a tunable parameter. Each node of a quadtree [7]
represents a 2D space that is decomposed into four
equal-sized quadrants, each of which is represented
000, 001, 011 010, 011, 100, 111 100, 111 000, 011 by a child node. The initial 2D space is recursively
decomposed into four equal-sized quadrants till each
R1 R2 R5 R6 R3 quadrant has at most one rule in it (Fig. 10 shows an
R4 example of the decomposition). Rules are allocated
to each node as follows. A rule is said to cross a
■ Figure 9. The data structure for the example classifier of Table 5. The quadrant if it completely spans at least one dimen-
search path for example packet P(011, 110) resulting in R5 is also shown. sion of the quadrant. For instance, rule R6 spans the

28 IEEE Network • March/April 2001


quadrant represented by the root node in NW (00) NE (10) 00 01 10 11
Fig. 10, while R5 does not. If we divide the
2D space into four quadrants, rule R5 cross-
es the northwest quadrant while rule R3
crosses the southwest quadrant. We call the
set of rules crossing the quadrant repre- SW (01) SE (11)
sented by a node in dimension k the k-
crossing filter set (k- CFS) of that node.
Two instances of the same data structure
are associated with each quadtree node — ■ Figure 10. A quadtree constructed by decomposition of two-dimensional space.
each stores the rules in k-CFS (k = 1,2). Each decomposition results in four quadrants.
Since rules in crossing filter sets span at
least one dimension, only the range speci-
fied in the other dimension need be stored.
111
Queries proceed two bits at a time by trans- P
posing one bit from each dimension, with 110
two 1D lookups being performed (one for R5 R6 Search path {R6}
101
each dimension on k-CFS) at each node. 00
100 10
Figure 11 shows an example. 01
Reference [8] proposes an efficient update 011 {R2, R4} {R5}
algorithm that, for N two-dimensional rules, R2 R3
has O(NW) space complexity, O(aW) search 010 01 {R3}
time and O(aa÷N) update time, where a is a R1 001 {R1}
tunable integer parameter.
000
Fat Inverted Segment Tree (FIS-Tree) — Feld- 000 010 100 110
001 011 101 111
man and Muthukrishnan [3] propose the fat
inverted segment tree (FIS-tree) for 2D ■ Figure 11. An AQT data structure. The path traversed by the query algorithm for an
classification as a modification of a segment incoming packet P(001, 010), yields R1 as the best matching rule.
tree. A segment tree [7] stores a set S of
possibly overlapping line segments to
answer queries such as finding the highest priority line seg- n1. The query algorithm then follows the up-pointers from this
ment containing a given point. A segment tree is a balanced leaf node toward the root node, carrying out 1D range lookups
binary search tree containing the endpoints of the line seg- at each node. The highest-priority rule containing the given
ments in S. Each node, w, represents a range Gw, leaves rep- point is calculated at the end of the traversal.
resent the original line segments in S, and parent nodes Queries on an l-level FIS-tree have complexity O((l +
represent the union of the ranges represented by their chil- 1)tRL) with storage complexity O(lnl+1), where tRL is the time
dren. A line segment is allocated to a node w if it contains Gw for a 1D range lookup. Storage space can be traded off with
but not Gparent(w). The highest-priority line segment allocated search time by varying l. Modifications to the FIS-tree are
to a node is precomputed and stored at the node. A query tra- necessary to support incremental updates; even then, it is easi-
verses the segment tree from the root, calculating the highest er to support inserts than deletes [3]. The static FIS-tree can
priority of all the precomputed segments encountered. Figure be extended to multiple dimensions by building hierarchical
12 shows an example segment tree.
An FIS-tree is a segment tree
with two modifications:
R6
• The segment tree is compressed
R5
(made “fat”) by increasing the
degree to more than two in order R2
to decrease its depth, and occu- R4 R3
pies a given number of levels l. R1
• Up-pointers from child to parent
nodes are used.
The data structure for two dimen- 000 001 011 111
sions consists of an FIS-tree on
dimension F1 and a range lookup {000,111}
{R1, R2, R4, R5} {R2, R5} {R3}
data associated with each node. An {R6}
{000,001}
instance of the range lookup data
structure associated with node w of {010,011} {100,111}
{R2, R5} {R3}
the FIS-tree stores the ranges
{000,011} {100,111}
formed by the F2-projections of {R6}
those classifier rules whose F1-pro-
jections were allocated to w. {R1, R4} {000,111}
A query for point P(n1, n2) first {000,001} {010,011}
solves the range lookup problem
on dimension F1. This returns a Segment tree FIS-tree
leaf node of the FIS-tree represent-
ing the range containing the point ■ Figure 12. The segment tree and the 2-level FIS-tree for the classifier of Table 5.

IEEE Network • March/April 2001 29


Simple one-step classification
2S=2128 2T=212 life 4D classifiers of up to 1700 rules, RFC
appears practical for 10 Gb/s line rates in
hardware and 2.5 Gb/s rates in software. How-
ever, the storage space and preprocessing time
grow rapidly for classifiers larger than about
Recursive classification 6000 rules. An optimization described in [9]
reduces the storage requirement of a 15,000
2S=2128 264 224 2T=212 four-field classifier to below 4 Mbytes.

Phase 0 Phase 1 Phase 2 Phase 3 Hierarchical Intelligent Cuttings — HiCuts [10]


partitions the multidimensional search space
■ Figure 13. Showing the basic idea of RFC. The reduction is carried out in multi- guided by heuristics that exploit the structure
ple phases, with a reduction in phase I being carried out recursively on the image of the classifier. Each query leads to a leaf
of the phase I – 1. The example shows the mapping of bits to bits in three phases. node in the HiCuts tree, which stores a small
number of rules that can be searched sequen-
tially to find the best match. The characteris-
FIS-trees, but the bounds are similar to other methods stud- tics of the decision tree (its depth, degree of each node, and
ied earlier [3]. the local search decision to be made at each node) are chosen
Measurements on real-life 2D classifiers are reported in [3] while preprocessing the classifier based on its characteristics
using the static FIS-tree data structure. Queries took 15 or (see [10] for the heuristics used).
less memory operations with a two-level tree, 4–60K rules, Each node, n, of the tree represents a portion of the geo-
and 5 Mbytes of storage. Large classifiers with one million 2D metric search space. The root node represents the complete
rules required three levels, 18 memory accesses per query, and d-dimensional space, which is partitioned into smaller geomet-
100 Mbytes of storage. ric subspaces, represented by its child nodes, by cutting across
one of the d dimensions. Each subspace is recursively parti-
Heuristics tioned until no subspace has more than B rules, where B is a
As we saw earlier, the packet classification problem is expen- tunable parameter of the preprocessing algorithm. An exam-
sive to solve in the worst case: theoretical bounds state that ple is shown in Fig. 15 for two dimensions with B = 2.
solutions to multifield classification require either storage that Parameters of the HiCuts algorithm can be tuned to trade
is geometric or a number of memory accesses that is polyloga- off query time against storage requirements. On 40 real-life
rithmic in the number of classification rules. We can expect 4D classifiers with up to 1700 rules, HiCuts requires less than
that classifiers in real networks have considerable structure 1 Mbyte of storage with a worst case query time of 20 memory
and redundancy that might be exploited by a heuristic. That is accesses, and supports fast updates.
the motivation behind the algorithms described in this section.
Tuple Space Search — The basic tuple space search algorithm
Recursive Flow Classification — RFC [9] is a heuristic for packet [11] decomposes a classification query into a number of exact
classification on multiple fields. Classifying a packet involves match queries. The algorithm first maps each d-dimensional
mapping bits in the packet header to a T bit action identifier, rule into a d-tuple whose ith component stores the length of the
where T = logN, T « S. A simple but impractical method could prefix specified in the ith dimension of the rule (the scheme
precompute the action for each of the
2S different packet headers, yielding the
action in one step. RFC attempts to Preprocessed tables
perform the same mapping over several
phases, as shown in Fig. 13; at each
stage the algorithm maps one set of val-
ues to a smaller set. In each phase a set
of memories return a value shorter (i.e.,
expressed in fewer bits) than the index
of the memory access. The algorithm, indx
illustrated in Fig. 14, operates as fol-
lows:
• In the first phase, d fields of the
packet header are split up into mul-
tiple chunks that are used to index
into multiple memories in parallel.
The contents of each memory are Packet
chosen so that the result of the
lookup is narrower than the index.
• In subsequent phases, memories are
indexed using the results from earli-
er phases.
• In the final phase, the memory yields
the action.
The algorithm requires construction
of the contents of each memory, Phase 0 Phase 1 Phase 2 Phase 3
detailed in [9].
Reference [9] reports that with real- ■ Figure 14. Packet flow in RFC.

30 IEEE Network • March/April 2001


111 (8*8,X,4)
P Cut across X
supports only prefix specifications). 110
Hence, the set of rules mapped to R5 R6 101
the same tuple are of a fixed and (2*8,Y,2)
100 R2 R3 R3
known length, and can be stored in R5 R6 R6
a hash table. Queries perform exact Cut
011 across Y
match operations on each of the R2
hash tables corresponding to all pos- R3 010
sible tuples in the classifier. An R1 R5
R1 001
example is shown in Fig. 16. R2
Query time is M hashed memory 000
accesses, where M is the number of 000 010 100 110
tuples in the classifier. Storage com- 001 011 101 111
plexity is O(N) since each rule is
stored in exactly one hash table. ■ Figure 15. A possible HiCuts tree for the example classifier in Table 5. Each ellipse in the tree
Incremental updates are supported denotes an internal node with a tuple (size of 2D space represented, dimension to cut across,
and require just one hashed memo- number of children). Each square is a leaf node that contains the actual classifier rules.
ry access to the hashed table associ-
ated with the tuple of the modified
rule. In summary, the tuple space
search algorithm performs well for Rule Specification Tuple Tuple Hash table entries
multiple dimensions in the average
R1 (00*,00*) (2,2) (0,1) {R6}
case if the number of tuples is
small. However, the use of hashing R2 (0**,01*) (1,2) (1,1) {R3,R5}
makes the time complexity of R3 (1**,0**) (1,1) (1,2) {R2}
searches and updates nondetermin-
istic. The number of tuples could R4 (00*,0**) (2,1) (2,1) {R4}
be very large, up to O(W d ), in the R5 (0**,1**) (1,1) (2,2) {R1}
worst case. Furthermore, since the
scheme supports only prefixes, the R6 (***,1**) (0,1)
storage complexity increases by a
factor of O(W d ) for generic rules ■ Figure 16. The tuples and associated hash tables in the tuple space search scheme for the
because each range could be split example classifier of Table 5.
into O(W) prefixes in the manner
explained earlier.
plicative factor of 900. Newer TCAMs, based on DRAM
Hardware-Based Algorithms technology, have been proposed and promise higher densi-
Ternary CAMs — A TCAM stores each W-bit field as a (val, ties. One unresolved issue with DRAM-based CAMs is the
mask) pair; where val and mask are each W-bit numbers. For detection of soft errors caused by alpha particles.
example, if W = 5, a prefix 10* will be stored as the pair • TCAMs dissipate more power than RAM solutions because an
(10000, 11000). An element matches a given input key by address is compared against every TCAM element in parallel.
checking if those bits of val for which the mask bit is 1 match At the time of writing, a 2 Mb TCAM chip running at 50 MHz
those in the key.
A TCAM is used as shown in Fig. 17. The TCAM
memory array stores rules in decreasing order of pri-
orities, and compares an input key against every ele- Destination
address
ment in the array in parallel. The N-bit bit vector,
matched, indicates which rules match, so the N-bit
priority encoder indicates the address of the highest- Memory location: 1 2 N
priority match. The address is used to index into a
Memory
RAM to find the action associated with this prefix. array TCAM
TCAMs are increasingly being deployed because
of their simplicity and speed (the promise of single Matched: 0 1 0 1
clock-cycle classification). Several companies produce
2 Mb TCAMs capable of single and multifield classi-
fication in as little as 10 ns. Both faster and denser Priority
TCAMs can be expected in the near future. There encoder
are, however, some disadvantages to TCAMs:
• A TCAM is less dense than a RAM, storing fewer Memory
bits in the same chip area. One bit in an SRAM typ- location
ically requires 4–6 transistors, while one bit in a
Action RAM
TCAM requires 11–15 transistors [12]. A 2 Mb memory
TCAM running at 100 MHz costs about $70 today,
while 8 Mb of SRAM running at 200 MHz costs
about $30. Furthermore, range specifications need
to be split into multiple masks, reducing the number Next hop
of entries by up to (2W–2)d in the worst case. If only
two 16-bit dimensions specify ranges, this is a multi- ■ Figure 17. The lookup operation using a ternary CAM.

IEEE Network • March/April 2001 31


Dimension 1 Dimension 2
of the d dimensions. Each lookup
r11 {R1,R2,R4,R5,R6} 110111 r21 {R1,R3,R4} 101100 returns a bitmap representing the
r12 {R2,R5,R6} 010011 r22 {R2,R3} 011000 matching rules (precomputed for
each range) in that dimension. The
r13 {R3,R6} 001001 r23 {R5,R6} 000111 d sets are intersected (a simple bit-
wise AND operation) to give the
Query on P(011,010): 010011 Dimension 1 bitmap set of matching rules, from which
000111 Dimension 2 bitmap the best matching rule is found
(Fig. 18).
000011 Since each bitmap is N bits wide,
R5 Best matching rule and there are O(N) ranges in each
of d dimensions, the storage space
■ Figure 18. Bitmap tables used in the bitmap-intersection classification scheme. See Fig. 8 for a consumed is O(dN2). Query time is
description of the ranges. Also shown is a classification query on an example packet P(011, 110). O(dt RL + dN/w) where t RL is the
time to do one range lookup and w
is the memory width. Time com-
Algorithm Worst-case time Worst-case storage plexity can be reduced by a factor of d by looking up each
complexity complexity dimension independently in parallel. Incremental updates are
not supported.
Linear search N N Reference [6] reports that the scheme can support up to
512 rules with a 33 MHz field-programmable gate array and
Ternary CAM 1 N five 1 Mb SRAMs, classifying 1 Mpacket/s. The scheme
d
works well for a small number of rules in multiple dimen-
Hierarchical tries W NdW
sions, but suffers from a quadratic increase in storage space
d and linear increase in classification time with the size of the
Set-pruning tries dW N
classifier. A variation is described in [6] that decreases stor-
Grid-of-tries Wd–1 NdW age at the expense of increased query time.

Cross-producting dW Nd
References
[1] M.H. Overmars and A.F. van der Stappen, “Range Searching and Point
Location Among Fat Objects,” J. Algorithms, vol. 21, no. 3, Nov. 1996,
FIS-tree (l + 1)W l ¥ N1+1/l pp. 629–56.
[2] F. Preparata and M. I. Shamos, Computational Geometry: An Introduction,
RFC d Nd Springer-Verlag, 1985.
[3] A. Feldman and S. Muthukrishnan, “Tradeoffs for Packet Classification,”
Bitmap-intersection dW + N/memwidth dN2 Proc. INFOCOM, vol. 3, Mar. 2000, pp. 1193–1202.
[4] P. Tsuchiya, “A Search Algorithm for Table Entries with Non-contiguous
Wildcarding,” unpublished report, Bellcore.
HiCuts d Nd [5] V. Srinivasan et al., “Fast and Scalable Layer four Switching,” Proc. ACM
Sigcomm, Sept. 1998, pp. 203–14.
Tuple space search N N [6] T. V. Lakshman and D. Stiliadis, “High-Speed Policy-based Packet Forward-
ing Using Efficient Multi-dimensional Range Matching,” Proc. ACM Sig-
■ Table 8. A summary of classification schemes. comm, pp. 191–202, Sept. 1998.
[7] M. de Berg, M. van Kreveld, and M. Overmars, Computational Geometry:
Algorithms and Applications, Springer-Verlag, 2nd rev. ed. 2000.
[8] M. M. Buddhikot, S. Suri, and M. Waldvogel, “Space Decomposition Tech-
dissipates about 7 W [13, 14]. In comparison, an 8 Mb SRAM niques for Fast Layer-4 Switching,” Proc. Conf. Protocols for High Speed
running at 200 MHz dissipates approximately 2 W [15]. Networks, Aug. 1999, pp. 25–41.
TCAMs are appealing for relatively small classifiers, but [9] P. Gupta and N. McKeown, “Packet Classification on Multiple Fields,” Proc.
will probably remain unsuitable in the near future for: Sigcomm, Comp. Commun. Rev., vol. 29, no. 4, Sept. 1999, pp. 147–60.
[10] P. Gupta and N. McKeown, “ Classification Using Hierarchical Intelligent
• Large classifiers (256K–1M rules) used for microflow recog- Cuttings,” Proc. Hot Interconnects VII, Aug. 1999, Stanford,; also available
nition at the edge of the network in IEEE Micro, vol. 20, no. 1, Jan./Feb.2000, pp. 34–41.
• Large classifiers (128–256K rules) used at edge routers that [11] V. Srinivasan, S. Suri, and G. Varghese, “Packet Classification using Tuple
manage thousands of subscribers (with a few rules per sub- Space Search,” Proc. ACM Sigcomm, Sept. 1999, pp. 135–46.
[12] F. Shafai et al., “Fully Parallel 30-Mhz, 2.5 Mb CAM,” IEEE J. Solid-State
scriber) Circuits, vol. 33, no. 11, Nov. 1998.
• Extremely high-speed (greater than 200 Mpackets/s) classifi- [13] Netlogic Microsystems: http://www.netlogicmicro.com. CIDR products:
cation http://www.netlogicmicro.com/products/cidr/cidr.html
• Price-sensitive applications [14] Sibercore Technologies: http://www.sibercore.com
[15] ZBT-Datasheet docs from IDT: http://www.idt.com/products/
pages/ZBT_DS_p.html
Bitmap-Intersection — The bitmap-intersection classification
scheme proposed in [6] is based on the observation that the Biographies
set of rules, S, that match a packet is the intersection of d P ANKAJ G UPTA (pankaj@stanford.edu) got his Ph.D. in computer science from
sets, Si, where Si is the set of rules that match the packet in Stanford University in 2000. His research interests lie in the architecture and
the ith dimension alone. While cross-producting precomputes design of high-speed switches and routers, and in traffic engineering for inte-
S and stores the best matching rule in S, this scheme com- grated optical and data networks. He has previously worked for Cisco Systems,
and is the cofounder of Sahasra Networks, Inc.
putes S and the best matching rule during each classification
operation. NICK MCKEOWN (nickm@stanford.edu) is assistant professor of electrical engi-
In order to compute intersection of sets in hardware, each neering and computer science at Stanford University, where he works on the the-
set is encoded as an N-bit bitmap with each bit corresponding ory, architecture, and implementation of high-speed Internet routers and
switches. He completed his Ph.D. at Berkeley in 1995. He has worked for
to a rule. The set of matching rules is the set of rules whose Hewlett-Packard Labs and Cisco Systems, and co-founded Abrizio Inc. He is as
corresponding bits are 1 in the bitmap. A query is similar to an editor of IEEE Transactions on Networking, the Robert Noyce Faculty Fellow
cross-producting: First, a range lookup is performed in each at Stanford, and a research fellow of the Alfred P. Sloan Foundation.

32 IEEE Network • March/April 2001

You might also like