Disclosure of Invention
The technical problem to be solved by the invention is as follows: to meet the performance and flexibility requirements of a reconfigurable switching chip, and on the basis of analyzing the message forwarding and processing characteristics of such a chip, a large-scale network data processing method based on a reconfigurable switching chip architecture is provided, which achieves flexibility while guaranteeing message processing performance.
The technical solution of the invention is as follows: a large-scale network data processing method based on a reconfigurable switching chip architecture comprises the following steps:
(1) receiving and storing a plurality of paths of messages from the physical link;
(2) dividing each message stored in step (1) into N message slices according to a preset slice size, where N is greater than or equal to 1 and each slice is at least as large as the message header; when N is greater than 1, executing steps (3)-(5) and step (6); otherwise, executing step (4) and step (6);
(3) storing the message slices containing the message data payload, and adding the corresponding payload storage address pointer information to the header slice containing the message header;
(4) allocating a sequence number to the header slice containing the message header information, parsing the header to obtain the message type, and parsing, classifying and forwarding the header independently in parallel according to the message type so as to update the header slice;
(5) according to the message data payload storage address information carried by the message header, extracting the message data payload from the cache, and splicing the message data payload and the corresponding message header into a complete message;
(6) according to the sequence numbers carried in the message headers, performing flow shaping and queue management on the parallel-processed messages in sequence, and then dividing them into multiple paths for forwarding.
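By way of illustration only, the following minimal C sketch traces steps (1)-(6) for a single message; the helper names, the in-memory payload arena and the buffer sizes are assumptions of the sketch, not features of the chip interface.

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define SLICE_SIZE 80u                      /* preset slice size (bytes), >= header size */

static uint8_t  payload_arena[65536];       /* stand-in for the payload cache */
static uint32_t arena_top;
static uint32_t next_seq;                   /* sequence numbers as issued in step (4) */

static uint32_t store_payload(const uint8_t *p, uint32_t len)   /* step (3) */
{
    uint32_t addr = arena_top;              /* address pointer carried in the header slice */
    memcpy(payload_arena + addr, p, len);
    arena_top += len;
    return addr;
}

static void process_packet(const uint8_t *pkt, uint32_t len)
{
    uint32_t n = (len + SLICE_SIZE - 1) / SLICE_SIZE;   /* step (2): N >= 1 slices */
    uint32_t seq = next_seq++;                          /* step (4): sequence number */
    uint32_t payload_addr = 0, payload_len = 0;

    if (n > 1) {                                        /* steps (3) and (5) only run when N > 1 */
        payload_len  = len - SLICE_SIZE;
        payload_addr = store_payload(pkt + SLICE_SIZE, payload_len);
    }

    /* step (4): the header slice would be parsed/classified/forwarded on a
     * microengine here; step (5): splice the payload back behind the header */
    uint8_t out[512];
    memcpy(out, pkt, n > 1 ? SLICE_SIZE : len);
    if (n > 1)
        memcpy(out + SLICE_SIZE, payload_arena + payload_addr, payload_len);

    /* step (6): order restoration, shaping and queueing would follow */
    printf("seq %u: %u slice(s), %u bytes reassembled\n", seq, n, len);
}

int main(void)
{
    uint8_t small[40] = {0}, big[200] = {0};
    process_packet(small, sizeof small);    /* N == 1: header-only path */
    process_packet(big, sizeof big);        /* N > 1: slice, store, splice */
    return 0;
}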
The step (1) is realized specifically as follows:
(1.1) receiving messages from a physical link through a plurality of ports;
(1.2) carrying out message identification, verification and filtering on the received messages, filtering out invalid messages, and storing the remaining valid messages in a receiving buffer area;
(1.3) converging the messages into one path of data in order of arrival time;
(1.4) caching the messages obtained in step (1.3) in sequence.
The step (4) adopts a plurality of parallel microengines to analyze, classify and forward the message header independently in parallel, and specifically comprises the following steps:
(4.1) polling the working state of each thread of each microengine, and submitting the received message header to the microengine with the largest number of idle threads;
(4.2) the microengine receiving the message header loads the corresponding microcode instructions and, according to those instructions, schedules multiple threads to access the relevant table entries in the corresponding storage units of the memory module in a round-robin non-preemptive manner, completing the parsing, classification and forwarding of the message header data frame so as to update the header slice.
The threads in each microengine work in a pipelined mode.
The specific method for accessing the relevant entries in the corresponding storage units in the memory module in the round-robin non-preemptive manner in step (4.2) is as follows:
(4.2.1) recording the thread numbers of all microengine threads that are ready to access storage units in the memory, together with the storage units they need to access;
(4.2.2) polling whether each storage unit is being accessed; when a thread finishes accessing a storage unit, searching the recorded thread numbers in order for a thread ready to access that storage unit, and granting it the access right.
When a microengine accesses the DDR memory, it first calls the search engine and designates it to search the table entries in the DDR using a hash algorithm or a binary tree search algorithm, find the entry matching the message header being processed, and feed the search result back to the microengine.
The microengines are integrated on one chip.
The chip incorporates a special instruction set dedicated to network packet processing, including a multiplication instruction, a cyclic redundancy check instruction, a content addressing instruction and an FFS (find first set) instruction; the microengines schedule threads to execute these instructions according to the microcode so as to complete the corresponding message processing.
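As a purely software illustration of two of these instructions, the following C functions compute a bitwise CRC-32 (the standard IEEE 802.3 polynomial) and a find-first-set; the microcode would issue each as a single hardware instruction.

#include <stdint.h>

/* Software analogue of the cyclic redundancy check instruction:
 * bitwise CRC-32 with the IEEE 802.3 polynomial (reflected form 0xEDB88320). */
uint32_t crc32_sw(const uint8_t *data, uint32_t len)
{
    uint32_t crc = 0xFFFFFFFFu;
    for (uint32_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int b = 0; b < 8; b++)
            crc = (crc & 1u) ? (crc >> 1) ^ 0xEDB88320u : crc >> 1;
    }
    return ~crc;
}

/* Software analogue of the FFS (find first set) instruction:
 * returns the 1-based index of the lowest set bit, or 0 if none is set. */
int ffs_sw(uint32_t x)
{
    if (x == 0)
        return 0;
    int i = 1;
    while (!(x & 1u)) {
        x >>= 1;
        i++;
    }
    return i;
}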
In step (6), flow shaping is performed on the messages using a priority-based token bucket algorithm.
In step (6), queue management is performed on the messages using priority queuing, flow-based weighted queuing, fair queuing, or PQ/CQ queuing methods.
Compared with the prior art, the invention has the beneficial effects that:
(1) The invention inherits the reconfigurable core idea and separates data forwarding from control. Data-plane operations run mainly on the microengine processing cores, realizing high-speed packet forwarding between input and output ports at line rate; the independence of individual packets is fully exploited through parallel processing. Control-plane operations run on hardware coprocessors, which handle routing table lookup and traffic management and complete high-level QoS control.
(2) Message processing in the invention is programmable: the microcode runs on the microengines, and its reloadability greatly facilitates system upgrades.
(3) In terms of protocol identification and classification, the invention can identify packets according to protocol-specific information such as protocol type, port number and destination address.
(4) In terms of message disassembly and reassembly, the invention can process messages in slices and guarantees the forwarding order of messages when they are reassembled.
(5) In terms of header processing, multiple parallel microengines simultaneously perform complete end-to-end processing on multiple message headers, and each microengine contains multiple threads, so high-bandwidth line-rate processing can be achieved.
(6) The invention can shape traffic according to the requirements of a given protocol or application so that the output traffic meets delay and delay-jitter requirements, and after shaping sends messages to the corresponding queues for priority processing, thereby realizing QoS guarantees.
(7) The invention uses dedicated hardware acceleration units to co-process specific tasks, such as the search engine SE, the order-preserving engine OE, traffic shaping TM and queue management QM, thereby increasing processing speed.
Detailed Description
The invention will be further described with reference to the accompanying drawings and specific embodiments.
As shown in fig. 1, the present invention provides a large-scale network data processing method based on a reconfigurable switching chip architecture, which specifically comprises the following steps:
(1) receiving and storing the multi-channel messages from the physical link; the method specifically comprises the following steps:
(1.1) receiving messages from a plurality of ports;
(1.2) carrying out message identification, verification and filtering on the received messages, filtering out invalid messages, and storing the remaining valid messages in a receiving buffer area;
(1.3) converging the messages into one path of data in order of arrival time;
(1.4) caching the messages obtained in step (1.3) in sequence.
(2) Dividing each message stored in step (1) into N message slices according to the preset slice size, where N is greater than or equal to 1 and each slice is at least as large as the message header; when N is greater than 1, executing steps (3)-(5) and step (6); otherwise, executing step (4) and step (6);
(3) storing the message slice containing the message data payload, and adding corresponding message data payload storage address pointer information to the message slice containing the message header;
(4) Allocating a sequence number to the header slice containing the message header information, parsing the header to obtain the message type, and parsing, classifying and forwarding the header independently in parallel according to the message type so as to update the header slice;
the specific method for parsing, classifying and forwarding the message headers independently in parallel using multiple parallel microengines is as follows:
(4.1) polling the working state of each thread of each microengine, and submitting the received message header to the microengine with the largest number of idle threads;
(4.2) the microengine receiving the message header loads the corresponding microcode instructions and, according to those instructions, schedules multiple threads to access the relevant table entries in the corresponding storage units of the memory module in a round-robin non-preemptive manner, completing the parsing, classification and forwarding of the message header data frame so as to update the header slice. The specific method is as follows:
(4.2.1) recording the thread numbers of all microengine threads that are ready to access storage units in the memory, together with the storage units they need to access;
(4.2.2) polling whether each storage unit is being accessed; when a thread finishes accessing a storage unit, searching the recorded thread numbers in order for a thread ready to access that storage unit, and granting it the access right.
When a microengine accesses the DDR memory, it first calls the search engine and designates it to search the table entries in the DDR using a hash algorithm or a binary tree search algorithm, find the entry matching the message header being processed, and feed the search result back to the microengine.
The threads in each microengine work in a pipelined mode. The microengines are integrated on one chip. The chip incorporates a special instruction set dedicated to network packet processing, including a multiplication instruction, a cyclic redundancy check instruction, a content addressing instruction and an FFS (find first set) instruction; the microengines schedule threads to execute these instructions according to the microcode so as to complete the corresponding message processing.
(5) According to the message data payload storage address information carried by the message header, extracting the message data payload from the cache, and splicing the message data payload and the corresponding message header into a complete message;
(6) According to the sequence numbers carried in the message headers, the parallel-processed messages undergo flow shaping and queue management in sequence and are then divided into multiple paths for forwarding.
As shown in fig. 2, the specific method for performing traffic shaping on messages using a priority-based token bucket algorithm is as follows: first, messages are classified according to preset matching rules; messages that do not match the rules are sent directly without token bucket processing, while messages that match the rules must be processed by the token bucket. When there are enough tokens in the bucket, the message can be sent, and the number of tokens in the bucket is reduced according to the message length; when the tokens in the bucket are insufficient, the message cannot be sent until new tokens are generated in the bucket. The message traffic is thus limited to at most the token generation rate, achieving the purpose of rate limiting. After flow shaping, the messages are passed to the QM module.
In step (6), queue management is performed on the messages using priority queuing, flow-based weighted queuing, fair queuing, or PQ/CQ queuing methods.
Based on the above large-scale network data processing method, the invention provides a large-scale network data processing system based on the reconfigurable switching chip architecture, whose structure is shown in fig. 4. The system comprises XGE1-XGEn ports, a MAC module (Medium Access Control), an aggregation module RMUX (Roll Multiplexer), an input cache module IBM (Ingress Buffer Management), a packet analysis module PA (Packet Analysis), a polling scheduling module PBA (Packet Allocation), an order-ensuring engine module OE (Order-ensuring Engine), a microengine cluster module NPE (Network Processing Engine), a message editing module PE (Packet Editing), a traffic shaping module TM (Traffic Management), a queue management module QM (Queue Management), and an output cache module EBM.
S1: XGE1-XGEn ports: the received messages are sent to the MAC module; XGE denotes Ten-Gigabit Ethernet.
S2: the MAC (Medium Access Control) module identifies, checks and filters the received messages, filters out invalid messages, and stores the remaining valid messages in a receiving buffer. The MAC module consists of three parts, a control module, a sending module and a receiving module, and supports full-duplex communication.
The control module comprises a general processor interface, registers and the like, and realizes control of the MAC by the general processor; it also provides statistics on the messages received and sent by the interface, including statistics on unicast, multicast, broadcast, short packets, long packets, CRC correct/error counts and the like.
The sending module mainly completes the transmission of data frames: it reads data byte by byte from the sending buffer, fills in the Ethernet frame CRC and preamble, and converts the data into the physical-layer XGE format for transmission; during transmission, a frame gap counter guarantees the minimum interval between two Ethernet frames.
The receiving module mainly completes the reception of data frames: it receives data from the physical-layer XGE interface, identifies, checks and filters the messages, and stores them in the receiving buffer.
S3: the aggregation module RMUX (Roll Multiplexer) converges the message paths into one path of data in order of arrival time and sends it to the IBM module.
S4: the input cache module IBM (Ingress Buffer Management) caches the input messages in sequence and divides each into N message slices according to the preset slice size, where N is greater than or equal to 1 and each slice is at least as large as the message header (a typical slice size is 80 bytes); the slices are then sent to the packet analysis module PA.
S5: the packet analysis module PA (Packet Analysis): when N is greater than 1, the message slices containing the message data payload are stored in the RB (Resource Buffer) module, and the corresponding payload storage address pointer information is added to the slice containing the message header; the header is parsed to obtain the message type, which may be ARP (Address Resolution Protocol), IPv4 (Internet Protocol version 4) or IPv6 (Internet Protocol version 6), and the header is then forwarded to the polling scheduling module PBA.
Further, if PA analysis finds that a message requires processing by protocols above Layer 4, the message is sent, after processing by the NPE module, to the general processor for higher-level protocol processing.
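A minimal sketch of the type classification the PA performs, assuming standard Ethernet framing (EtherType at bytes 12-13 in network byte order: 0x0806 for ARP, 0x0800 for IPv4, 0x86DD for IPv6); the enum names are illustrative only.

#include <stdint.h>

enum pkt_type { PKT_ARP, PKT_IPV4, PKT_IPV6, PKT_OTHER };

/* Classify a header slice by the EtherType field of its Ethernet header. */
enum pkt_type pa_classify(const uint8_t *hdr_slice)
{
    uint16_t ethertype = (uint16_t)((hdr_slice[12] << 8) | hdr_slice[13]);
    switch (ethertype) {
    case 0x0806: return PKT_ARP;
    case 0x0800: return PKT_IPV4;
    case 0x86DD: return PKT_IPV6;
    default:     return PKT_OTHER;  /* e.g. handed on for higher-level processing */
    }
}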
S6: the polling scheduling module PBA polls the working state of each thread of each microengine in the network message header processor, attaches the sequence number issued by the order-ensuring engine module OE to each received message header, and submits the header to the microengine with the largest number of idle threads.
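A minimal sketch of the PBA selection policy, assuming a per-engine bitmap of idle threads; the bitmap layout and sizes are assumptions of the sketch, while the real PBA polls hardware thread-state flags.

#include <stdint.h>

#define N_ENGINES 16
#define N_THREADS 8                       /* threads per microengine */

/* One bit per thread, 1 = idle; updated as threads start and finish work. */
static uint8_t idle_mask[N_ENGINES];

static int popcount8(uint8_t m)
{
    int n = 0;
    while (m) {
        n += m & 1u;
        m >>= 1;
    }
    return n;
}

/* Pick the microengine with the most idle threads for the next header. */
int pba_select_engine(void)
{
    int best = 0, best_idle = popcount8(idle_mask[0]);
    for (int e = 1; e < N_ENGINES; e++) {
        int idle = popcount8(idle_mask[e]);
        if (idle > best_idle) {
            best = e;
            best_idle = idle;
        }
    }
    return best_idle > 0 ? best : -1;     /* -1: all threads busy, hold the header */
}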
S7: the order-ensuring engine module OE (Order-ensuring Engine): to prevent messages from becoming out of order after processing by the microengines, a sequence number is allocated to each message header before it enters a microengine and is sent to the polling scheduling module PBA.
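A minimal sketch of the order-preserving mechanism: headers may finish on the microengines in any order but are released strictly by the sequence number assigned on entry. The reorder window size and the types are assumptions of the sketch.

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define WINDOW 256                       /* reorder window size */

static void    *slot[WINDOW];            /* finished headers, indexed by seq % WINDOW */
static bool     done[WINDOW];
static uint32_t next_release;            /* next sequence number allowed out */

/* A microengine reports completion of the header with sequence number seq. */
void oe_complete(uint32_t seq, void *hdr)
{
    slot[seq % WINDOW] = hdr;
    done[seq % WINDOW] = true;
}

/* Release headers strictly in sequence order; NULL until the next one is ready. */
void *oe_pop_in_order(void)
{
    uint32_t i = next_release % WINDOW;
    if (!done[i])
        return NULL;
    done[i] = false;
    next_release++;
    return slot[i];
}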
S8: the network message header processor parses, classifies and forwards the message headers independently in parallel to update the header slices, and sends the processed header slices to the message editing module PE.
The network message header processor comprises a micro-engine cluster module NPE, a task scheduling module RBA and a storage unit. Wherein:
The microengine cluster module NPE (Network Processing Engine) consists of multiple parallel microengines; each microengine completes the full processing of one message, contains multiple threads, and each thread works in a pipelined mode. The microengine that receives a message loads the corresponding microcode instructions from the instruction memory IMEM and, according to those instructions, its threads are scheduled by the task scheduling module RBA to access the relevant table entries in the corresponding storage units of the memory module in a round-robin non-preemptive manner, completing the parsing, classification and forwarding of the message header data frame and updating the header slice. The processed message header is then sent to the PE module.
The specific method by which the RBA grants access to the relevant table entries in the corresponding storage units in a round-robin non-preemptive manner is as follows: the RBA records the thread numbers of all microengine threads that are ready to access storage units in the memory, together with the storage units they need to access; it polls whether each storage unit is being accessed, and when a thread finishes accessing a storage unit, it searches the recorded thread numbers in order for a thread ready to access that unit and grants it the access right.
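A minimal sketch of this round-robin non-preemptive arbitration, with an illustrative request table and rotating search start; a grant is never revoked, only handed over when the owning thread completes its access.

#include <stdbool.h>

#define N_UNITS   4                       /* storage units: e.g. DDR, TCAM, LMEM */
#define N_REQS   64                       /* total hardware threads across microengines */

static struct { bool waiting; int unit; } req[N_REQS];  /* recorded requests */
static int owner[N_UNITS];                /* thread currently holding each unit, -1 = free */
static int rr_next[N_UNITS];              /* round-robin search start per unit */

void rba_init(void)
{
    for (int u = 0; u < N_UNITS; u++)
        owner[u] = -1;
}

/* A thread announces it is ready to access a storage unit; if the unit is
 * idle it is granted at once, otherwise the request stays recorded. */
void rba_request(int thread, int unit)
{
    req[thread].unit = unit;
    if (owner[unit] < 0)
        owner[unit] = thread;             /* immediate grant */
    else
        req[thread].waiting = true;
}

/* Called when `thread` finishes its access. Non-preemptive: access is only
 * handed to the next waiting thread, searched in round-robin order.
 * Returns the thread now granted the unit, or -1 if none is waiting. */
int rba_release(int thread)
{
    int u = req[thread].unit;
    owner[u] = -1;
    for (int i = 0; i < N_REQS; i++) {
        int t = (rr_next[u] + i) % N_REQS;
        if (req[t].waiting && req[t].unit == u) {
            req[t].waiting = false;
            owner[u] = t;
            rr_next[u] = (t + 1) % N_REQS;
            return t;
        }
    }
    return -1;
}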
The storage unit comprises a DDR memory, a TCAM and an on-chip memory LMEM. Wherein:
The DDR memory stores table entries that relate to services such as the VLAN table and the MPLS table and have relatively low processing speed requirements. The microengine calls the search engine through the task scheduler and designates it to search the entries in the DDR (Double Data Rate) memory using the corresponding search algorithm, find the entry matching the message header being processed, and feed the search result back to the microengine.
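Functional sketches of the two lookup modes the search engine may be designated to use (hash and binary search over sorted entries); the entry layout and table sizes are assumptions of the sketch.

#include <stddef.h>
#include <stdint.h>

/* Illustrative DDR table entry: a key (e.g. a VLAN id or MPLS label)
 * and an opaque lookup result. */
typedef struct { uint32_t key; uint32_t result; } entry_t;

/* Hash lookup over a bucketed table, as the SE might use for a VLAN table. */
#define N_BUCKETS    1024
#define BUCKET_DEPTH 4
static entry_t hash_table[N_BUCKETS][BUCKET_DEPTH];

const entry_t *se_hash_lookup(uint32_t key)
{
    uint32_t b = (key * 2654435761u) >> 22;    /* multiplicative hash -> 10-bit bucket */
    for (int i = 0; i < BUCKET_DEPTH; i++)
        if (hash_table[b][i].key == key)
            return &hash_table[b][i];
    return NULL;                               /* miss is fed back to the microengine */
}

/* Binary search over a table kept sorted by key (the binary tree mode). */
const entry_t *se_binary_search(const entry_t *tbl, int n, uint32_t key)
{
    int lo = 0, hi = n - 1;
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;
        if (tbl[mid].key == key) return &tbl[mid];
        if (tbl[mid].key < key)  lo = mid + 1;
        else                     hi = mid - 1;
    }
    return NULL;
}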
The TCAM (Ternary Content Addressable Memory) stores entries with higher processing speed requirements, such as the MAC address table and the routing table. These tables are stored in TCAM form; during a lookup, the task scheduling module converts the information in the message header into a TCAM key, matches it against the MAC address table and the routing table, finds the required matching entry, and feeds it back to the microengine.
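A functional model of a TCAM match: each row carries a value and a care mask, and the lowest-index matching row wins. A real TCAM compares all rows in a single cycle; the loop below is only a software sketch with illustrative types.

#include <stdint.h>

/* One TCAM row: a mask bit of 1 means "care", 0 means wildcard.
 * Rows are held in priority order, lowest index first. */
typedef struct { uint64_t value, mask; int result; } tcam_row_t;

int tcam_match(const tcam_row_t *rows, int n, uint64_t key)
{
    for (int i = 0; i < n; i++)
        if (((key ^ rows[i].value) & rows[i].mask) == 0)
            return rows[i].result;        /* e.g. next hop / egress port */
    return -1;                            /* no matching entry */
}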
The on-chip memory LMEM (Local Memory) stores the flow table and is accessed directly by the microengine threads through the task scheduler.
S9: the message editing module PE (Packet Editing) modifies the data content of the message header, extracts the message data payload from the cache according to the payload storage address information carried in the header, splices the payload and the corresponding header into a complete message, and sends it to the traffic shaping module TM.
S10: the traffic shaping module TM (Traffic Management) performs traffic shaping on the messages and sends the shaped messages to the queue management module QM.
Specifically, as shown in fig. 2, the priority-based token bucket algorithm guarantees network QoS as follows: first, messages are classified according to preset matching rules; messages that do not match the rules are sent directly without token bucket processing, while messages that match the rules must be processed by the token bucket. When there are enough tokens in the bucket, the message can be sent, and the number of tokens in the bucket is reduced according to the message length; when the tokens in the bucket are insufficient, the message cannot be sent until new tokens are generated in the bucket. The message traffic is thus limited to at most the token generation rate, achieving the purpose of rate limiting.
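A minimal sketch of the token bucket check applied to messages that match the rules; the refill-on-demand timing is an assumption of the sketch, as hardware would typically refill on a clock tick.

#include <stdbool.h>
#include <stdint.h>

/* Single token bucket; one such bucket would exist per priority class. */
typedef struct {
    uint64_t tokens;                      /* current tokens, in bytes */
    uint64_t burst;                       /* bucket depth, in bytes */
    uint64_t rate;                        /* token generation rate, bytes/second */
    uint64_t last_ns;                     /* time of the last refill */
} tbucket_t;

static void refill(tbucket_t *tb, uint64_t now_ns)
{
    uint64_t add = (now_ns - tb->last_ns) * tb->rate / 1000000000u;
    tb->tokens = (tb->tokens + add > tb->burst) ? tb->burst : tb->tokens + add;
    tb->last_ns = now_ns;
}

/* true: enough tokens, send the message and debit its length;
 * false: insufficient tokens, hold the message until new ones are generated. */
bool tb_conforms(tbucket_t *tb, uint32_t msg_len, uint64_t now_ns)
{
    refill(tb, now_ns);
    if (tb->tokens >= msg_len) {
        tb->tokens -= msg_len;
        return true;
    }
    return false;
}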
S11: the queue management module QM (Queue Management) performs queue management on the messages and sends the managed messages to the output cache module EBM (Egress Buffer Management).
Specifically, as shown in fig. 3, queues are created according to an index. When no congestion occurs at an interface, a message is sent out immediately upon arrival; if congestion occurs, messages are classified into different queues, and the queue scheduling mechanism processes the different priorities separately, serving high-priority queues first. When a queue length reaches a set maximum, a RED or WRED policy can be used to drop packets and avoid network overload. After queue management, the messages are passed to the EBM module.
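Minimal sketches of the two mechanisms described above: a strict-priority queue pick and a WRED-style drop decision. The thresholds and queue layout are illustrative assumptions.

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Strict-priority pick: always serve the highest-priority non-empty queue. */
int pq_pick(const uint32_t *qlen, int n_queues)   /* queue 0 = highest priority */
{
    for (int q = 0; q < n_queues; q++)
        if (qlen[q] > 0)
            return q;
    return -1;                                    /* all queues empty */
}

/* WRED-style drop decision: never drop below min_th, always drop at or
 * above max_th, and in between drop with probability rising to max_p. */
typedef struct { uint32_t min_th, max_th; double max_p; } wred_t;

bool wred_should_drop(const wred_t *w, uint32_t avg_qlen)
{
    if (avg_qlen < w->min_th)
        return false;
    if (avg_qlen >= w->max_th)
        return true;
    double p = w->max_p * (avg_qlen - w->min_th) / (double)(w->max_th - w->min_th);
    return (double)rand() / RAND_MAX < p;
}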
S12: the output cache module EBM caches the output messages and sends them to the MAC module.
S13: the MAC module receives the messages, stores them in the sending buffer, reads data from the sending buffer, fills in the Ethernet frame CRC (Cyclic Redundancy Check) and preamble, and converts them into the physical-layer XGE format for transmission.
Further, in the large-scale network data processing system based on the reconfigurable switching chip architecture shown in fig. 4, the units inside the box are on-chip processing units and those outside the box are off-chip processing units.
Further, in the large-scale network data processing system based on the reconfigurable switching chip architecture shown in fig. 4, OE, SE, TM, and QM are hardware coprocessors.
Further, in the large-scale network data processing system based on the reconfigurable switching chip architecture shown in fig. 4, TM and QM can be implemented either on-chip or off-chip; an on-chip implementation, for example, gives the switching chip higher processing speed and lower power consumption.
For packet data processing, the invention adopts an optimized architecture, a dedicated instruction set and dedicated hardware units, and can meet the requirements of high-speed line-rate packet processing. The programmable message processing covers message parsing, match lookup, message editing and message forwarding, making message processing more flexible and faster and better suited to large-scale network data processing. High-speed, high-capacity intelligent packet processing functions, including message parsing, classification and forwarding, are completed by the microengines; complex and frequently executed functions such as routing table lookup, packet order preservation, traffic management and queue management use hardware coprocessors to further improve processing performance, thereby achieving an organic combination of service flexibility and high performance.
Details not described in the invention belong to the common knowledge of a person skilled in the art.