Weirong Jiang
My research focuses on designing parallel algorithms and architectures for high-performance, low-power, flexible, and robust networking systems.
Specifically, I work on developing novel customized hardware engines (mainly on FPGAs) to solve a wide range of network processing problems, from basic packet forwarding (e.g., IP lookup and packet classification in routers and firewalls) to advanced traffic analysis (e.g., regular expression matching and application identification in NIDS). I have participated in multiple R&D projects and published over 20 research papers in major conferences and journals. Four of my papers have received best paper awards.
Supervisor: Viktor K. Prasanna
Address: 3740 McClintock Avenue, EEB-244,
Department of EE-Systems,
University of Southern California,
Los Angeles, CA 90089-2562
Papers by Weirong Jiang
While SRAM-based pipeline architectures have recently been developed as a promising alternative to power-hungry TCAM-based solutions for high-throughput IP forwarding, achieving low power remains a challenge. This paper proposes several novel architecture-specific techniques to reduce the dynamic power consumption of SRAM-based pipelined IP forwarding engines. First, the pipeline architecture itself is built as an inherent cache, exploiting the data locality in Internet traffic. The number of memory accesses, which contribute the majority of the power consumption, is thus reduced; no external cache is needed. Second, instead of using a global clock, different pipeline stages are driven by separate clocks. The local clocking scheme is carefully designed to exploit traffic rate variation and improve caching performance. Third, a fine-grained memory enabling scheme is developed to eliminate unnecessary memory accesses while preserving packet order. Simulation experiments using real-life traces show that our solutions achieve up to a 15-fold reduction in dynamic power dissipation over a baseline pipeline architecture that does not employ the proposed schemes. FPGA implementation results show that our design sustains 40 Gbps throughput for minimum-size (40-byte)
packets while consuming a small amount of logic resources.
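The "inherent cache" idea above can be illustrated with a small simulation. This is a hedged sketch, not the paper's implementation: it assumes a linear trie pipeline in which stage i resolves one more address bit than stage i-1, and models each stage's output register as a one-entry cache, so a stage skips its SRAM read (the dominant source of dynamic power) whenever consecutive packets need the same trie node. The function name and trace are illustrative.

```python
def count_accesses(addresses, stages=8, cached=True):
    """Count SRAM reads in a simplified linear trie pipeline.

    Stage i resolves the top (i + 1) bits of a 32-bit address. With
    caching enabled, a stage skips its memory read when the current
    packet maps to the same trie node as the previous packet, since
    that node is already held in the stage register.
    """
    last = [None] * stages  # per-stage register acting as a 1-entry cache
    accesses = 0
    for addr in addresses:
        for i in range(stages):
            key = addr >> (32 - (i + 1))  # trie-node index at stage i
            if not cached or last[i] != key:
                accesses += 1  # an SRAM read is actually performed
                last[i] = key
    return accesses


# A bursty trace (back-to-back packets to the same destinations) shows
# the locality the paper exploits: repeats cost no memory accesses.
trace = [0xC0A80001] * 4 + [0x0A000001] * 4
baseline = count_accesses(trace, cached=False)  # every stage reads every packet
inherent = count_accesses(trace, cached=True)   # only the first of each burst reads
```

With 8 stages and 8 packets, the baseline performs 64 reads while the cached pipeline performs 16; real traces give smaller but still substantial savings, which is the effect the abstract's 15-fold figure quantifies for the full design.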