CN118656316B - Method and chip for integrating negation and timeout retransmission strategies in chiplet interconnection interface - Google Patents
- Publication number
- CN118656316B (application number CN202411073144.XA)
- Authority
- CN
- China
- Prior art keywords
- retransmission
- state
- timeout
- link
- strategy
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F13/00—Interconnection of, or transfer of information or other signals between, memories, input/output devices or central processing units
- G06F13/10—Program control for peripheral devices
- G06F13/12—Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor
- G06F13/124—Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine
- G06F13/126—Program control for peripheral devices using hardware independent of the central processor, e.g. channel or peripheral processor where hardware is a sequential transfer control unit, e.g. microprocessor, peripheral processor or state-machine and has means for transferring I/O instructions and statuses between control unit and main processor
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F11/00—Error detection; Error correction; Monitoring
- G06F11/07—Responding to the occurrence of a fault, e.g. fault tolerance
- G06F11/0703—Error or fault processing not based on redundancy, i.e. by taking additional measures to deal with the error or fault not making use of redundancy in operation, in hardware, or in data representation
- G06F11/0751—Error or fault detection not based on redundancy
- G06F11/0754—Error or fault detection not based on redundancy by exceeding limits
- G06F11/0757—Error or fault detection not based on redundancy by exceeding limits by exceeding a time limit, i.e. time-out, e.g. watchdogs
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/16—Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
- G06F15/163—Interprocessor communication
- G06F15/17—Interprocessor communication using an input/output type connection, e.g. channel, I/O port
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computer Hardware Design (AREA)
- Quality & Reliability (AREA)
- Software Systems (AREA)
- Microelectronics & Electronic Packaging (AREA)
- Communication Control (AREA)
Abstract
The invention discloses a method and a chip for fusing negative and timeout retransmission strategies in a chiplet interconnection interface. The method comprises: configuring, for a sender and a receiver, a retransmission strategy selected from a negative retransmission strategy and a timeout retransmission strategy, the two strategies sharing a fused hardware resource design in which both strategies multiplex the same retransmission buffer memory bank and the timeout retransmission strategy alone uses a retransmission timeout counter; and, when the sender sends a retransmission micro-packet to the receiver and micro-packet retransmission is needed, completing the micro-packet retransmission on the fused hardware resource design according to the preselected negative retransmission strategy or timeout retransmission strategy. The invention aims to realize a configurable reliable transmission mode, supporting both negative retransmission and timeout retransmission, in the protocol adapter layer of the chiplet interconnection interface, so that a user can select the retransmission strategy to use as required, with low hardware cost while maintaining flexibility and compatibility.
Description
Technical Field
The invention relates to the technical field of chiplet interconnection interface protocols and circuits, and in particular to a method and a chip for fusing negative and timeout retransmission strategies in a chiplet interconnection interface.
Background
The chiplet interconnection interface is the basis for implementing die-to-die (D2D) interconnection. Unlike traditional inter-chip interconnection protocols, its hierarchy is simplified and generally divided into three layers: a protocol layer, an adapter layer and a physical layer. The protocol layer defines how several mainstream protocols are carried over the interconnection interface for typical inter-chiplet communication scenarios; the supported protocols include CXL 2.0, PCIe 6.0, AXI and the like. The adapter layer provides reliable link management for both communicating parties, including the standard micro-packet (Flit) format definition, reliable link transmission, and the like. The physical layer typically includes a logical physical layer and an electrical physical layer, which provide bit-stream transport matched to the channel characteristics of the package substrate.
At present, reliable transmission between chiplets mainly borrows the traditional inter-chip retransmission mechanism. However, on the one hand, chiplet interconnection links are short and have a high wiring density, which gives them new characteristics in terms of transmission reliability; on the other hand, interconnecting chiplets that use different protocols places higher compatibility requirements on the chiplet interconnection interface. As for the strategy adopted by the retransmission mechanism, the initiators of the different chiplet interconnection standards each inherit from their own existing inter-chip interconnection ecosystem. This keeps them compatible with that original ecosystem, but compatibility with other chiplet interconnection standards is not considered, so the adapter-layer retransmission mechanisms of different chiplets cannot work together and the adapter-layer logic cannot be reused.
At present, reliable inter-chip transmission is mainly based on a sliding-window retransmission mechanism and adopts either a negative retransmission strategy (Nak+Ack, NA strategy for short) or a timeout retransmission strategy (TimeoutRetry+Ack, TA strategy for short); the adapter-layer retransmission mechanisms in chiplet interconnection standards are mainly inherited from these two strategies. The negative retransmission strategy is currently adopted by the PCIe 6.0 and CXL 2.0 protocols of Intel and has been inherited by the UCIe chiplet interconnection interface standard newly issued by Intel. The timeout retransmission strategy is used for long-distance reliable transmission in the retransmission mechanism of the Transmission Control Protocol (TCP), and is also used in the retransmission mechanism of the inter-chip coherent interconnection interface of the domestic Feiteng multi-socket server CPU (FT-2500), which supports reliable interconnection among up to 16 CPUs. When a domestic chiplet interconnection standard is formulated, both strategies need to be considered in order to build a thriving ecosystem of interoperable chiplets.
The principle of the timeout retransmission strategy (TA strategy) is as follows. The TA strategy is implemented with a timeout counter and a positive acknowledgement (ACK). The sender assigns a sequence number to each data packet in the retransmission buffer and maintains a timeout counter. After the sender sends a data packet to the receiver, it starts the timeout counter. When the receiver successfully receives the data packet, it returns an ACK message indicating that the data has been received, and the sender clears the acknowledged data packet from the retransmission buffer. If the timeout counter overflows before the sender has received the ACK message from the receiver, timeout retransmission is triggered and the data packets for which no ACK has been received are retransmitted.
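The timeout principle described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the patented hardware: the class name `TimeoutSender`, the tick-based timing and the `timeout_ticks` threshold are all hypothetical, and each buffered packet is given its own elapsed-tick count for simplicity.

```python
from collections import OrderedDict


class TimeoutSender:
    """Toy sender for the timeout retransmission (TA) principle."""

    def __init__(self, timeout_ticks):
        self.timeout_ticks = timeout_ticks  # hypothetical overflow threshold, in ticks
        self.next_seq = 0
        self.retry_buffer = OrderedDict()   # seq -> [payload, elapsed ticks]

    def send(self, payload):
        """Buffer the packet under a sequence number and start its counter at 0."""
        seq = self.next_seq
        self.retry_buffer[seq] = [payload, 0]
        self.next_seq += 1
        return seq, payload

    def on_ack(self, seq):
        """An ACK means the packet arrived, so it is cleared from the buffer."""
        self.retry_buffer.pop(seq, None)

    def tick(self):
        """Advance every counter; on overflow return the unacknowledged packets."""
        for entry in self.retry_buffer.values():
            entry[1] += 1
        if any(e[1] >= self.timeout_ticks for e in self.retry_buffer.values()):
            resend = [(seq, e[0]) for seq, e in self.retry_buffer.items()]
            for e in self.retry_buffer.values():
                e[1] = 0  # restart the counters of the retransmitted packets
            return resend
        return []


sender = TimeoutSender(timeout_ticks=3)
sender.send("A")
sender.send("B")
sender.on_ack(0)                              # "A" is acknowledged, "B" is not
resent = [sender.tick() for _ in range(3)]    # "B" is resent on the third tick
```

In this model only the packet whose ACK never arrived is retransmitted, which matches the description that acknowledged packets leave the retransmission buffer.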
The principle of the negative retransmission strategy (NA strategy) is as follows. The NA strategy is implemented with positive acknowledgements (ACK) and negative acknowledgements (NAK). The sender assigns a sequence number to each data packet in the retransmission buffer and sends the data packet to the receiver. When the receiver successfully receives the data packet, it returns an ACK message indicating that the data has been received, and the sender clears the data from the retransmission buffer. When a received data packet fails its check, the receiver returns a NAK message; on receiving the NAK message, the sender triggers retransmission and resends the data packets held in the retransmission buffer.
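The ACK/NAK exchange described above can be sketched in a few lines. The names `NakSender`, `receive` and `checksum` are hypothetical, and the byte-sum check is only a stand-in for whatever integrity check a real link would use (an assumption, not something the text specifies).

```python
def checksum(payload: bytes) -> int:
    """Stand-in integrity check (assumption; real links would use a CRC or similar)."""
    return sum(payload) % 256


class NakSender:
    """Toy sender for the negative retransmission (NA) principle."""

    def __init__(self):
        self.next_seq = 0
        self.retry_buffer = {}  # seq -> payload, held until acknowledged

    def send(self, payload: bytes):
        seq = self.next_seq
        self.retry_buffer[seq] = payload
        self.next_seq += 1
        return seq, payload, checksum(payload)

    def on_message(self, kind, seq):
        """ACK clears the packet; NAK resends everything still buffered."""
        if kind == "ACK":
            self.retry_buffer.pop(seq, None)
            return []
        return [(s, p, checksum(p)) for s, p in sorted(self.retry_buffer.items())]


def receive(seq, payload, received_checksum):
    """Receiver side: ACK on a good check, NAK on a check error."""
    ok = checksum(payload) == received_checksum
    return ("ACK" if ok else "NAK", seq)


sender = NakSender()
seq0, data0, chk0 = sender.send(b"flit-0")
good = receive(seq0, data0, chk0)             # arrives intact
seq1, data1, chk1 = sender.send(b"flit-1")
corrupted = receive(seq1, b"flit-X", chk1)    # payload damaged in transit
sender.on_message(*good)
retransmit = sender.on_message(*corrupted)    # NAK triggers the resend
```

Unlike the timeout principle, the sender here needs no timer: the receiver's NAK is the retransmission trigger.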
If a chiplet adapter layer supports both the TA strategy and the NA strategy by simply implementing each one in full, only one of them is configured in actual use, after negotiation between the interconnected chiplets, so the other implementation is redundant adapter-layer logic. A comparison of the two retransmission mechanisms shows that, apart from a few counters and management mechanisms, the retransmission buffer, which is the component that occupies the most logic resources, is required by both and is identical in both. It is therefore technically feasible and worthwhile to study the characteristics of the two mechanisms, identify what they have in common, and support both in a fused design that reuses most of the logic resources at a small additional logic cost.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: in view of the above problems in the prior art, the invention provides a method and a chip for fusing negative and timeout retransmission strategies in a chiplet interconnection interface. The invention aims to realize a configurable reliable transmission mode, supporting both negative retransmission and timeout retransmission, in the protocol adapter layer of the chiplet interconnection interface, so that a user can select the retransmission strategy to use as required, with low hardware cost while maintaining flexibility and compatibility.
In order to solve the technical problems, the invention adopts the following technical scheme:
A method for fusing negative and timeout retransmission strategies in a chiplet interconnection interface comprises the following steps:
S101, configuring, for a sender and a receiver, a retransmission strategy selected from a negative retransmission strategy and a timeout retransmission strategy, wherein the negative retransmission strategy and the timeout retransmission strategy have a fused hardware resource design, in which the two strategies multiplex the same retransmission buffer memory bank and the timeout retransmission strategy alone uses a retransmission timeout counter;
S102, when the sender sends a retransmission micro-packet to the receiver and micro-packet retransmission is needed, the sender and the receiver complete the micro-packet retransmission on the fused hardware resource design according to the preselected negative retransmission strategy or timeout retransmission strategy.
Optionally, the fused hardware resource design in step S101 further includes: variables multiplexed by the negative retransmission strategy and the timeout retransmission strategy, including WritePointer, ReadPointer, Ack_Info, RetryNum and LinkRetrainNum; variables used only by the negative retransmission strategy, including AccAck and FreeRB; and the variable AckSentTimeout used only by the timeout retransmission strategy. Here WritePointer is the retransmission buffer write pointer, ReadPointer is the retransmission buffer read pointer, Ack_Info is the response information, RetryNum is the retransmission count counter, LinkRetrainNum is the physical layer reinitialization count counter, AccAck is the accumulated response count counter, FreeRB is the retransmission buffer free-entry counter, and AckSentTimeout is the acknowledgement timeout counter.
Optionally, the fused hardware resource design in step S101 further includes the receiver multiplexing a receiver state machine RSM for both the negative retransmission strategy and the timeout retransmission strategy. The receiver state machine RSM takes the receiver state machine used by the negative retransmission strategy, which includes at least the normal state NORMAL, and adds the receiver response state R_RSP required by the receiver state machine of the timeout retransmission strategy; R_RSP is used to send the retransmission management micro-packet. Under the timeout retransmission strategy, the receiver state is managed with the normal state NORMAL and the receiver response state R_RSP. The state machine enters NORMAL when link recovery is requested. In NORMAL, if a retransmission micro-packet has been received and the acknowledgement timeout counter overflows, the state machine enters R_RSP, which indicates that a retransmission management micro-packet Retry.Rcvd needs to be sent at intervals for the retransmission micro-packets that have been received. In R_RSP, if the adapter layer management micro-packet control module receives the retransmission management micro-packet retry.
Optionally, the fused hardware resource design in step S101 further includes the sender multiplexing a sender state machine SSM for both the negative retransmission strategy and the timeout retransmission strategy. The sender state machine SSM takes the sender state machine used by the negative retransmission strategy, which includes at least the normal state NORMAL and the retransmission state RETRY, and adds the sender request state S_REQ, the link retraining state LINK_RETRAIN and the link termination state END required by the sender state machine of the timeout retransmission strategy. S_REQ is used to send the retransmission management micro-packet Retry.Timeout to the adapter layer management micro-packet control module; LINK_RETRAIN is used to retrain the link when the number of retransmissions exceeds a threshold; END indicates a link failure when the number of retrainings exceeds a threshold. Under the timeout retransmission strategy, the sender state is managed with five states: NORMAL, RETRY, S_REQ, LINK_RETRAIN and END. At reset the state machine is in LINK_RETRAIN. In LINK_RETRAIN, if the link has been recovered successfully and there is no link retraining request, the link is considered recovered and the state machine enters S_REQ. In S_REQ, if the adapter layer management micro-packet control module receives the retransmission management micro-packet Retry.Timeout sent by the sender, the state machine enters RETRY; if link termination is requested, it enters END. In RETRY, if the retransmission is completed, the state machine enters NORMAL; if the timeout retransmission counter overflows, it returns to S_REQ. In NORMAL, if the timeout retransmission counter overflows, the retransmission management micro-packet Retry.Timeout needs to be sent again and the state machine enters S_REQ; if link recovery is requested, it enters LINK_RETRAIN.
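The sender state machine transitions listed above for the timeout retransmission strategy can be written as a transition table. This is a minimal software sketch, not the hardware implementation: the state names follow the text, but the event names (`link_recovered`, `retry_timeout_sent` and so on) are hypothetical labels for the conditions described, and any state/event pair not listed simply holds the current state.

```python
from enum import Enum, auto


class SSM(Enum):
    NORMAL = auto()
    RETRY = auto()
    S_REQ = auto()
    LINK_RETRAIN = auto()
    END = auto()


# (current state, event) -> next state, for the timeout strategy only.
TA_TRANSITIONS = {
    (SSM.LINK_RETRAIN, "link_recovered"): SSM.S_REQ,
    (SSM.S_REQ, "retry_timeout_sent"): SSM.RETRY,
    (SSM.S_REQ, "link_end_requested"): SSM.END,
    (SSM.RETRY, "retry_done"): SSM.NORMAL,
    (SSM.RETRY, "timeout_overflow"): SSM.S_REQ,
    (SSM.NORMAL, "timeout_overflow"): SSM.S_REQ,
    (SSM.NORMAL, "link_retrain_requested"): SSM.LINK_RETRAIN,
}


def step(state, event):
    """Return the next state; unlisted combinations keep the current state."""
    return TA_TRANSITIONS.get((state, event), state)


def run(events, state=SSM.LINK_RETRAIN):
    """Replay a list of events starting from the reset state LINK_RETRAIN."""
    trace = [state]
    for event in events:
        state = step(state, event)
        trace.append(state)
    return trace


trace = run(["link_recovered", "retry_timeout_sent", "retry_done", "timeout_overflow"])
```

A table-driven form makes it easy to see that END has no outgoing transitions in this description, so a link that reaches it stays terminated.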
Optionally, the fused hardware resource design in step S101 further includes the sender multiplexing a retransmission buffer state machine RBSM for both the negative retransmission strategy and the timeout retransmission strategy. The retransmission buffer state machine used by the negative retransmission strategy, which includes the idle state IDLE and the read state READ, is reused for the timeout retransmission strategy. Under the timeout retransmission strategy, the state machine is in IDLE at reset; in IDLE, if the retransmission timeout counter overflows and retransmission is required, it enters READ; in READ, when the retransmission is completed, it returns to IDLE.
Optionally, the receiver state machine RSM, the sender state machine SSM and the retransmission buffer state machine RBSM all select between the negative retransmission strategy and the timeout retransmission strategy according to the enable signal ta_enable: if ta_enable is 1, the timeout retransmission strategy is used; otherwise, the negative retransmission strategy is used.
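How a single enable signal can steer a shared state machine is easiest to see on the two-state retransmission buffer state machine. The sketch below is a software model under assumptions: the function name `rbsm_next` and its boolean inputs are hypothetical, and the trigger used when `ta_enable` is 0 (a received retransmission request) is inferred from the general description of the negative strategy rather than stated for this state machine.

```python
from enum import Enum


class RBSM(Enum):
    IDLE = "IDLE"
    READ = "READ"


def rbsm_next(state, ta_enable, timeout_overflow=False,
              retry_requested=False, retry_done=False):
    """Next state of the shared IDLE/READ machine under the selected strategy."""
    if state is RBSM.IDLE:
        # ta_enable == 1: timeout strategy, triggered by counter overflow.
        # Otherwise: negative strategy, triggered by a retransmission request
        # (assumed trigger for this sketch).
        trigger = timeout_overflow if ta_enable == 1 else retry_requested
        return RBSM.READ if trigger else RBSM.IDLE
    # READ: stay until the retransmission completes, then return to IDLE.
    return RBSM.IDLE if retry_done else RBSM.READ


ta_start = rbsm_next(RBSM.IDLE, ta_enable=1, timeout_overflow=True)
na_ignores_timeout = rbsm_next(RBSM.IDLE, ta_enable=0, timeout_overflow=True)
na_start = rbsm_next(RBSM.IDLE, ta_enable=0, retry_requested=True)
back_to_idle = rbsm_next(RBSM.READ, ta_enable=1, retry_done=True)
```

The states and the READ-to-IDLE condition are shared; only the IDLE-to-READ trigger is multiplexed, which is the kind of small, configurable difference the fused design relies on.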
In addition, the invention also provides a multi-chiplet chip comprising a plurality of chiplets connected with each other through a chiplet interconnection interface, wherein the chiplets are programmed or configured to execute the method for fusing negative and timeout retransmission strategies in the chiplet interconnection interface.
The invention further provides a computer-readable storage medium storing a computer program or instructions that are programmed or configured to be executed by a processor to perform the method for fusing negative and timeout retransmission strategies in the chiplet interconnection interface.
The invention further provides a computer program product comprising a computer program or instructions that are programmed or configured to be executed by a processor to perform the method for fusing negative and timeout retransmission strategies in the chiplet interconnection interface.
Compared with the prior art, the invention has the following advantages:
1. Good flexibility. Whereas a traditional chiplet interconnection interface adapter layer implements reliable transmission with a single strategy, the invention makes the negative retransmission strategy and the timeout retransmission strategy configurable, and the user can select which strategy to use for reliable transmission as needed.
2. Good compatibility. The negative and timeout retransmission strategies realized by the invention can be configured to be compatible with the reliable transmission strategy of the UCIe protocol adapter layer and with the retransmission management micro-packets of that adapter layer.
3. Low hardware overhead. In the invention, the negative retransmission strategy and the timeout retransmission strategy have a fused hardware resource design, in which the two strategies multiplex the same retransmission buffer memory bank and the timeout retransmission strategy alone uses a retransmission timeout counter, so the hardware cost is low while flexibility and compatibility are maintained.
Drawings
FIG. 1 is a schematic diagram of a basic flow of a method according to an embodiment of the present invention.
Fig. 2 is a state transition diagram of a multiplexed receiver state machine RSM according to an embodiment of the present invention.
Fig. 3 is a state transition diagram of a multiplexed sender state machine SSM in an embodiment of the present invention.
Fig. 4 is a state transition diagram of a retransmission buffer state machine RBSM for multiplexing in an embodiment of the present invention.
Detailed Description
As shown in Fig. 1, the method for fusing negative and timeout retransmission strategies in the chiplet interconnection interface in this embodiment includes the following steps:
S101, configuring, for a sender and a receiver, a retransmission strategy selected from the negative retransmission strategy (NA strategy) and the timeout retransmission strategy (TA strategy), wherein the two strategies have a fused hardware resource design. The resources used by the two strategies are fused as follows: the retransmission buffer memory bank is multiplexed by both strategies; the timeout retransmission counter is used only by the timeout retransmission strategy; the variables that control correct retransmission of micro-packets, namely the read pointer, the write pointer, the retransmission count counter, the link retraining count counter and the response information, are multiplexed by both strategies; and the retransmission state machines, divided into a receiver state machine, a sender state machine and a retransmission buffer state machine, are multiplexed by both strategies. Multiplexing the retransmission buffer memory bank, the retransmission control variables and the retransmission state machines keeps the hardware cost low;
S102, when the sender sends a retransmission micro-packet to the receiver and micro-packet retransmission is needed, the sender and the receiver complete the micro-packet retransmission on the fused hardware resource design according to the preselected negative retransmission strategy or timeout retransmission strategy. In doing so, each time the sender sends a retransmission micro-packet, it also places the micro-packet in the retransmission buffer and assigns it a micro-packet request sequence number (Seq). The receiver maintains a variable Rcvd_Seq that records the expected sequence number of the next correctly received retransmission micro-packet. The initial value of Rcvd_Seq is 0, and its maximum value depends on the depth of the retransmission buffer. Each time the receiver correctly receives a retransmission micro-packet, Rcvd_Seq is incremented by 1; if the received retransmission micro-packet contains an error, Rcvd_Seq remains unchanged. If the sequence number of a received retransmission micro-packet does not match Rcvd_Seq, the micro-packet is discarded. The receiver passes Rcvd_Seq to the sender in a retransmission management micro-packet. The sender maintains a variable Send_Seq that records the latest response sequence number in the received management micro-packets, and updates Send_Seq from the Rcvd_Seq information carried in each received retransmission management micro-packet.
The NA strategy and the TA strategy differ in what triggers a retransmission. Under the NA strategy, when the receiver receives an erroneous micro-packet, it generates a retransmission request micro-packet, which must carry Rcvd_Seq, and sends it to the sender. When the sender receives the retransmission request micro-packet, it retransmits all micro-packets stored in the retransmission buffer starting from sequence number Rcvd_Seq. Under the TA strategy, each micro-packet has its own retransmission timeout counter, which is started when the sender sends the micro-packet. When any retransmission timeout counter expires, retransmission is triggered and the sender generates a retransmission request micro-packet, which must carry Send_Seq. All micro-packets stored in the retransmission buffer starting from sequence number Send_Seq are then retransmitted.
Each time the receiver correctly receives a retransmission micro-packet, Rcvd_Seq increases by 1 and some additional information is generated, collectively called retransmission response information. The retransmission response information may be carried by a dedicated retransmission response management micro-packet or by a separate response management micro-packet. Under the NA strategy, the retransmission response information is the number of responses generated by correctly received micro-packets (AckNum). The receiver counts the responses it has generated and, at an appropriate time, returns the accumulated count to the sender in a retransmission response micro-packet (Retry.Ack); the count carried in the micro-packet is denoted RtnAck. After receiving the management micro-packet carrying RtnAck, the sender clears the corresponding number of micro-packets from the retransmission buffer. Under the TA strategy, the retransmission response information is a response sequence number (AckNo). Each time the receiver has successfully received a certain number of micro-packets, it sends a retransmission response micro-packet (Retry.Rcvd) to the sender; after the sender receives this management micro-packet and obtains the response sequence number, it clears the micro-packet with that sequence number and all retransmission micro-packets before it from the retransmission buffer.
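The sequence-number bookkeeping and the two ways of clearing the retransmission buffer can be sketched as follows. This is an illustrative model only: the names `Receiver`, `retransmit_from`, `clear_by_ack_num` and `clear_by_ack_no` are hypothetical, the buffer is modelled as a list of (sequence number, payload) pairs in send order, and wrapping Rcvd_Seq at the buffer depth is an assumption (the text says only that the maximum depends on the depth).

```python
class Receiver:
    """Tracks Rcvd_Seq, the sequence number the receiver expects next."""

    def __init__(self, depth):
        self.depth = depth   # retransmission buffer depth bounds the sequence space
        self.rcvd_seq = 0    # initial value is 0

    def on_flit(self, seq, check_ok):
        """Accept a micro-packet only if it is error-free and in sequence."""
        if not check_ok or seq != self.rcvd_seq:
            return False     # Rcvd_Seq unchanged; the micro-packet is not accepted
        self.rcvd_seq = (self.rcvd_seq + 1) % self.depth  # wrap-around is assumed
        return True


def retransmit_from(buffer, seq):
    """Return the entry with sequence number seq and every later entry."""
    seqs = [s for s, _ in buffer]
    return buffer[seqs.index(seq):] if seq in seqs else []


def clear_by_ack_num(buffer, ack_num):
    """NA style: drop the oldest ack_num entries (a count)."""
    return buffer[ack_num:]


def clear_by_ack_no(buffer, ack_no):
    """TA style: drop the entry numbered ack_no and everything before it."""
    seqs = [s for s, _ in buffer]
    return buffer[seqs.index(ack_no) + 1:] if ack_no in seqs else list(buffer)


buffer = [(0, "a"), (1, "b"), (2, "c"), (3, "d")]
rx = Receiver(depth=8)
accepted = [rx.on_flit(0, True),    # in order and clean
            rx.on_flit(1, False),   # check error: Rcvd_Seq stays at 1
            rx.on_flit(2, True),    # out of sequence: discarded
            rx.on_flit(1, True)]    # retransmitted copy is accepted
resend = retransmit_from(buffer, 1)
after_count = clear_by_ack_num(buffer, 2)
after_seqno = clear_by_ack_no(buffer, 1)
```

Both clearing styles leave the same buffer contents here; they differ only in whether the response carries a count (AckNum) or a sequence number (AckNo).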
In this embodiment, based on a comparative analysis of the two strategies and on the premise that both multiplex the same retransmission buffer memory bank, the counters, pointers, management micro-packets and other logic common to the two strategies are fused, that is, the common logic is designed only once, while the logic resources specific to each strategy are designed separately. For the retransmission management state machines in particular, the NA implementation is extended: some TA states are added, the transition conditions between the original states are made configurable, and new transition conditions into the new states are added, so that configurable support for both the NA and TA strategies is achieved at a small hardware cost.
The fused hardware resource design in step S101 of this embodiment further includes: variables multiplexed by the negative retransmission strategy and the timeout retransmission strategy, including WritePointer, ReadPointer, Ack_Info, RetryNum and LinkRetrainNum; variables used only by the negative retransmission strategy, including AccAck and FreeRB; and the variable AckSentTimeout used only by the timeout retransmission strategy. Here WritePointer is the retransmission buffer write pointer, ReadPointer is the retransmission buffer read pointer, Ack_Info is the response information, RetryNum is the retransmission count counter, LinkRetrainNum is the physical layer reinitialization count counter, AccAck is the accumulated response count counter, FreeRB is the retransmission buffer free-entry counter, and AckSentTimeout is the acknowledgement timeout counter. Under the fused NA and TA design, the hardware resources of the retransmission mechanism therefore fall into three classes.
The first class consists of resources that are fully multiplexed and uniformly controlled: the retransmission buffer memory bank, which stores micro-packets that have been sent but not yet acknowledged. The width of the memory bank is the micro-packet width, and its depth is the number of micro-packets that can be stored.
The second class consists of resources that are multiplexed but configured for different functions according to the selected strategy. Each item in this class uses the same hardware under both strategies but holds different content or performs a different function. These resources include:
Retransmission buffer write pointer (WritePointer): points to the location in the retransmission buffer where the next retransmission micro-packet to be sent is written. The write pointer is a wrap-around counter whose maximum value is the depth of the retransmission buffer and whose reset value is 0. When retransmission occurs, the retransmission buffer stops accepting new micro-packets, the write pointer remains unchanged, and all micro-packets in the retransmission buffer are retransmitted. After the retransmission finishes, new micro-packets can again be written into the retransmission buffer and the write pointer continues to increase.
Retransmission buffer read pointer (ReadPointer): points to the location from which the next micro-packet is to be read from the retransmission buffer. The read pointer is a wrap-around counter whose maximum value is the depth of the retransmission buffer and whose reset value is 0. During retransmission, the read pointer is incremented by 1 each time the sender reads a micro-packet from the retransmission buffer. The retransmission finishes when all micro-packets in the retransmission buffer have been sent, that is, when the next position of the read pointer is the write pointer.
Response information (Ack_Info): sent from the receiver to the sender. After the receiver checks the data of the received micro-packets, the response information carries either the latest sequence number of a correctly checked micro-packet (AckNo) or the number of correctly received micro-packets to be returned (AckNum). This information takes on a different function depending on the configured strategy: it is AckNum when the NA strategy is configured and AckNo when the TA strategy is configured.
Retransmission count counter (RetryNum): records the number of retransmission operations performed on the retransmission buffer (each retransmission resends all valid micro-packets in the retransmission buffer). When the number of retransmissions exceeds the configured maximum number of consecutive retransmissions and still no response has been received, the physical link is considered to have failed and needs to be reinitialized.
Physical layer reinitialization count counter (LinkRetrainNum): records the number of physical layer reinitializations. When the number of reinitializations exceeds the configured maximum number of consecutive reinitializations and the physical link failure is still not resolved, the link is considered unrecoverable.
Retransmission management state machines: to implement the two configurable retransmission strategies, the state machines are designed as a sender state machine (Sender State Machine, SSM), a receiver state machine (Receiver State Machine, RSM) and a retransmission buffer state machine (Retransmission Buffer State Machine, RBSM). These three state machines are extended to support the TA strategy while still satisfying the NA strategy. This part of the design is described in detail later.
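The behaviour of the two wrap-around pointers during a retransmission can be modelled in software. The sketch below is hypothetical in its naming (`RetryBuffer`, `start_retry`, `read_next`) and makes two simplifying assumptions: the caller supplies the slot of the first unacknowledged micro-packet, and the caller never overfills the buffer (free-space tracking is outside this sketch).

```python
class RetryBuffer:
    """Toy retransmission buffer with wrap-around read and write pointers."""

    def __init__(self, depth):
        self.depth = depth
        self.slots = [None] * depth
        self.write_pointer = 0   # reset value 0
        self.read_pointer = 0    # reset value 0
        self.retrying = False

    def write(self, flit):
        """Store a new micro-packet unless a retransmission is in progress."""
        if self.retrying:
            return False         # buffer stops accepting new micro-packets
        self.slots[self.write_pointer] = flit
        self.write_pointer = (self.write_pointer + 1) % self.depth
        return True

    def start_retry(self, first_unacked):
        """Begin retransmission from a given slot (assumed to be known)."""
        self.retrying = True
        self.read_pointer = first_unacked % self.depth

    def read_next(self):
        """Read one micro-packet; finish when the read pointer meets the write pointer."""
        flit = self.slots[self.read_pointer]
        self.read_pointer = (self.read_pointer + 1) % self.depth
        if self.read_pointer == self.write_pointer:
            self.retrying = False
        return flit


buf = RetryBuffer(depth=4)
for name in ("f0", "f1", "f2"):
    buf.write(name)
buf.start_retry(first_unacked=1)
blocked = buf.write("f3")                       # rejected during retransmission
resent = [buf.read_next(), buf.read_next()]     # f1 then f2
accepted = buf.write("f3")                      # allowed again; pointer wraps to 0
```

The write pointer does not move while `retrying` is set, which mirrors the description that it remains unchanged until the retransmission finishes.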
The third class comprises independent resources that are each used under only one policy and cannot be multiplexed. Whether such a resource is used is decided by the configuration: once the TA policy is configured, the NA resources sit idle and unused, and conversely, once the NA policy is configured, the TA resources sit idle and unused. These resources include:
Retransmission buffer free-entry counter (FreeRB): indicates how much free space remains in the retransmission buffer; it is set to the depth of the retransmission buffer at reset. When a new micro-packet is buffered, the counter is decremented by 1; when previously sent micro-packets are correctly received by the receiver, the counter is incremented by the number of correctly received micro-packets (RtnAck). A FreeRB value of 1 indicates that the retransmission buffer has only one free position left, and no new micro-packets may be written.
Accumulated acknowledgement counter (AccAck): records the accumulated number of micro-packets that the receiver has correctly received and whose acknowledgements are still waiting to be sent. Each time a retransmittable micro-packet is correctly received, AccAck is incremented by 1; after a number of acknowledgements have been returned to the other side in a retransmission management micro-packet, AccAck is decremented by the returned value (RtnAck).
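By way of example only, the FreeRB and AccAck bookkeeping described above can be sketched as two small counters. The class and method names are hypothetical, and the sketch assumes that the receiver returns its whole accumulated count in a single management micro-packet, which the description does not specify.

```python
class NaSenderCredits:
    """Sender-side FreeRB bookkeeping (illustrative only)."""

    def __init__(self, depth: int):
        self.free_rb = depth  # set to the retransmission buffer depth at reset

    def can_write(self) -> bool:
        # FreeRB == 1 means one free position is left and writes are blocked.
        return self.free_rb > 1

    def on_buffered(self) -> None:
        """A new micro-packet was stored in the retransmission buffer."""
        if not self.can_write():
            raise RuntimeError("no new micro-packet may be written")
        self.free_rb -= 1

    def on_acks_returned(self, rtn_ack: int) -> None:
        """The receiver reported RtnAck correctly received micro-packets."""
        self.free_rb += rtn_ack


class NaReceiverAcks:
    """Receiver-side AccAck bookkeeping (illustrative only)."""

    def __init__(self):
        self.acc_ack = 0

    def on_flit_ok(self) -> None:
        """A retransmittable micro-packet passed the data check."""
        self.acc_ack += 1

    def return_acks(self) -> int:
        """Hand the accumulated count (RtnAck) to a management micro-packet."""
        rtn_ack = self.acc_ack      # assumption: everything accumulated is returned
        self.acc_ack -= rtn_ack
        return rtn_ack
```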
Retransmission timeout counter (RetryTimeout): each micro-packet stored in the sender's retransmission buffer has its own timeout counter. Initially, all timeout counters are stopped. When a micro-packet enters the retransmission buffer, the timeout counter for that micro-packet's sequence number is started. When the sender receives the acknowledged sequence number AckNo, it clears to 0 the timeout counters for AckNo and all earlier sequence numbers. If any timeout counter expires, all timeout counters stop and the sender prepares for retransmission. Once the retransmission starts, the timeout counter for each retransmitted micro-packet's sequence number is started again, so that timeouts continue to be monitored during the retransmission.
Acknowledgement timeout counter (AckSentTimeout): the receiver does not return a retransmission management micro-packet for every micro-packet it receives; instead, it returns one after a certain number of correctly received micro-packets has accumulated. Each time the receiver sends a retransmission management micro-packet, it resets the acknowledgement timeout counter and restarts counting; when the counter overflows, a retransmission management micro-packet (Retry.Rcvd) is sent immediately. If a received micro-packet is erroneous or out of order, a retransmission management micro-packet (Retry.Req) is sent immediately without waiting for the counter to overflow, and the acknowledgement timeout counter is reset.
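For illustration, the per-micro-packet retransmission timeout counters can be modelled as below. This is a sketch under stated assumptions: the class `RetryTimers`, the `limit` threshold measured in ticks, and the `oldest` argument (the earliest unacknowledged sequence number) are hypothetical and are not taken from the embodiment.

```python
class RetryTimers:
    """One timeout counter per retransmission buffer entry (illustrative only)."""

    def __init__(self, depth: int, limit: int):
        self.depth = depth
        self.limit = limit                  # hypothetical timeout threshold, in ticks
        self.count = [0] * depth
        self.running = [False] * depth      # all counters are stopped initially

    def start(self, seq: int) -> None:
        """A micro-packet with sequence number `seq` was buffered or retransmitted."""
        self.count[seq] = 0
        self.running[seq] = True

    def ack(self, oldest: int, ack_no: int) -> None:
        """Clear the counters for AckNo and all earlier sequence numbers."""
        seq = oldest
        while True:
            self.count[seq] = 0
            self.running[seq] = False
            if seq == ack_no:
                break
            seq = (seq + 1) % self.depth

    def tick(self) -> bool:
        """Advance one time step; return True if any counter timed out."""
        expired = False
        for seq in range(self.depth):
            if self.running[seq]:
                self.count[seq] += 1
                expired = expired or self.count[seq] >= self.limit
        if expired:
            # Any timeout stops every counter while retransmission is prepared.
            self.running = [False] * self.depth
        return expired
```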
To guarantee reliable transmission, the sender and the receiver exchange and store retransmission-related information using retransmission management micro-packets. Besides retransmission management micro-packets, the adapter layer also has link management micro-packets, power management micro-packets and others. To manage micro-packets with different functions uniformly, an adapter layer management micro-packet control module is introduced. All retransmission management micro-packets exchanged between the sender and the receiver are passed to this module, which acts as an intermediary and returns the corresponding signals to the sender or the receiver, thereby realizing the interaction among the sender, the receiver and the retransmission buffer.
The fused hardware resource design in step S101 of this embodiment further includes defining two retransmission management micro-packets, Retry.Err and Retry.Ack, for the negative retransmission policy, and two retransmission management micro-packets, Retry.Timeout and Retry.Rcvd, for the timeout retransmission policy. The fields of these four retransmission management micro-packets comprise the retransmission policy to which the micro-packet belongs, the adapter layer micro-packet type, the retransmission management micro-packet type and the filled fields. The filled fields are as follows: Retry.Err carries Rcvd_seq, Ack_Info, RetryNum and LinkRetrainNum; Retry.Ack carries WritePointer, Rcvd_seq and FreeRB; Retry.Timeout carries WritePointer, Send_seq, RetryNum, LinkRetrainNum and FreeRB; Retry.Rcvd carries Send_seq and Ack_Info. Here Rcvd_seq is the expected sequence number maintained by the receiver, Ack_Info is the response information, RetryNum is the retransmission number counter, LinkRetrainNum is the physical layer reinitialization counter, WritePointer is the retransmission buffer write pointer, FreeRB is the retransmission buffer free-entry counter, and Send_seq is the sequence number maintained by the sender, used to record the latest acknowledged sequence number AckNo carried in a received management micro-packet.
When the NA policy is configured, Retry.Err and Retry.Ack are used. Retry.Err is the retransmission management micro-packet sent by the receiver when a CRC check error occurs; its function is to issue a retransmission request. Retry.Ack is the response micro-packet sent by the sender when a retransmission request occurs; it controls the retransmission. When the TA policy is configured, Retry.Timeout and Retry.Rcvd are used. Retry.Timeout is the retransmission request micro-packet that the sender issues after a timeout counter overflows; it controls the retransmission. Retry.Rcvd is the response micro-packet, containing the sequence number of the received micro-packets, that the receiver sends after accumulating a certain number of acknowledgements. In this embodiment, the retransmission management micro-packets Retry.Err, Retry.Ack, Retry.Timeout and Retry.Rcvd are specified in Table 1.
Table 1. Fields of the four adapter layer retransmission management micro-packets
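As an illustrative aid only, the filled fields of the four retransmission management micro-packets can be captured in a lookup table with a small validating constructor. The sketch assumes that the third and fourth field lists in the description belong to Retry.Timeout and Retry.Rcvd respectively; the header key names (`policy`, `adapter_flit_type`, `retry_flit_type`), the value `"retry"` and the function `make_packet` are hypothetical, and no bit widths or encodings are implied.

```python
# Filled fields per retransmission management micro-packet, keyed by type.
# Each entry is (policy, tuple of filled field names).
FILLED_FIELDS = {
    "Retry.Err": ("NA", ("Rcvd_seq", "Ack_Info", "RetryNum", "LinkRetrainNum")),
    "Retry.Ack": ("NA", ("WritePointer", "Rcvd_seq", "FreeRB")),
    "Retry.Timeout": ("TA", ("WritePointer", "Send_seq", "RetryNum",
                             "LinkRetrainNum", "FreeRB")),
    "Retry.Rcvd": ("TA", ("Send_seq", "Ack_Info")),
}


def make_packet(kind: str, **filled) -> dict:
    """Build a management micro-packet as a dict, checking its filled fields."""
    policy, names = FILLED_FIELDS[kind]
    missing = [n for n in names if n not in filled]
    extra = [n for n in filled if n not in names]
    if missing or extra:
        raise ValueError(f"{kind}: missing {missing}, unexpected {extra}")
    header = {
        "policy": policy,               # retransmission policy the packet belongs to
        "adapter_flit_type": "retry",   # hypothetical adapter layer type value
        "retry_flit_type": kind,        # retransmission management micro-packet type
    }
    return {**header, **filled}


def packets_for_policy(policy: str) -> list:
    """List the micro-packet types used under the given policy."""
    return [k for k, (p, _) in FILLED_FIELDS.items() if p == policy]
```

For instance, `make_packet("Retry.Rcvd", Send_seq=5, Ack_Info=5)` yields a TA-policy packet, whereas omitting a required field raises `ValueError`.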
The fused hardware resource design in step S101 of this embodiment further includes the receiver multiplexing the receiver state machine RSM for both the negative retransmission policy and the timeout retransmission policy, the sender multiplexing the sender state machine SSM for both policies, and the retransmission buffer state machine RBSM likewise being multiplexed for both the negative retransmission policy (NA policy) and the timeout retransmission policy (TA policy). The three state machines function as follows:
The receiver state machine RSM sends a retransmission management micro-packet, Retry.Err (NA policy) or Retry.Rcvd (TA policy), to the adapter layer management micro-packet control module in a timely manner when retransmittable micro-packets are received; it maintains the receiver timeout counter, which controls how long to wait before a retransmission management micro-packet is issued; and, under the NA policy, it maintains the RetryNum counter, performing link retraining when that counter overflows, and the LinkRetrainNum counter, declaring the link failed when that counter overflows.
The sender state machine SSM sends a retransmission management micro-packet, Retry.Ack (NA policy) or Retry.Timeout (TA policy), to the adapter layer management micro-packet control module in a timely manner when a retransmission condition is triggered; it retransmits all micro-packets in the retransmission buffer (RB) starting from the updated read pointer; and, under the TA policy, it maintains the RetryNum counter, performing link retraining when that counter overflows, and the LinkRetrainNum counter, declaring the link failed when that counter overflows.
The retransmission buffer state machine (Retransmission Buffer State Machine, RBSM) writes the RB through the write pointer when there is a retransmittable micro-packet; when there is a retransmission request, it steers the read pointer to the micro-packet that needs to be retransmitted, namely the micro-packet corresponding to Rcvd_seq (NA policy) or Send_seq (TA policy); and it manages the Rcvd_seq and Send_seq registers.
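The escalation from retransmission to link retraining to link failure, driven by the RetryNum and LinkRetrainNum counters, can be illustrated with the following sketch. The class name, method names and returned strings are hypothetical. The sketch also assumes that RetryNum restarts after each link retraining and that both counters are cleared when a response arrives, which is one reading of "consecutive" and is not stated in the description.

```python
class RetryEscalation:
    """RetryNum / LinkRetrainNum escalation (illustrative only)."""

    def __init__(self, max_retry: int, max_retrain: int):
        self.max_retry = max_retry       # maximum consecutive retransmissions
        self.max_retrain = max_retrain   # maximum consecutive reinitializations
        self.retry_num = 0
        self.link_retrain_num = 0

    def on_retry_without_response(self) -> str:
        """Record one more retransmission that drew no response."""
        self.retry_num += 1
        if self.retry_num <= self.max_retry:
            return "retry"
        # RetryNum exceeded its maximum: treat the physical link as failed.
        self.retry_num = 0               # assumption: restart after retraining
        self.link_retrain_num += 1
        if self.link_retrain_num <= self.max_retrain:
            return "link_retrain"
        return "link_fail"               # the link is considered unrecoverable

    def on_response(self) -> None:
        """A response arrived; assumed to clear both consecutive counters."""
        self.retry_num = 0
        self.link_retrain_num = 0
```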
As shown in fig. 2, the receiver state machine RSM takes the receiver state machine used by the negative retransmission policy, which comprises three states, namely the normal state NORMAL, the receiver retransmission request state R_REQ and the link retraining state LINK_RETRAIN, and adds the receiver response state R_RSP required by the receiver state machine of the timeout retransmission policy. R_RSP is used for sending the retransmission management micro-packet Retry.Rcvd to the adapter layer management micro-packet control module. Under the timeout retransmission policy, the receiver state is managed with the NORMAL state and the R_RSP state: the state machine enters NORMAL when link recovery is requested; in NORMAL, if a retransmittable micro-packet has been received and the acknowledgement timeout counter overflows, it enters R_RSP, which indicates that the receiver of the retransmittable micro-packets needs to send out a Retry.Rcvd at intervals; in R_RSP, once the adapter layer management micro-packet control module has received the Retry.Rcvd sent by the receiver, the state machine returns to NORMAL.
In fig. 2, the state machine used by the negative retransmission policy (NA policy), shown in the box, comprises the NORMAL state, the receiver retransmission request state R_REQ and the link retraining state LINK_RETRAIN. Their functions are as follows: micro-packets are received normally in NORMAL; when a micro-packet with a check error appears, the state machine enters R_REQ to request retransmission; and if the number of retransmissions exceeds a threshold, it enters LINK_RETRAIN. When the TA policy is fused in, NORMAL remains the receiver's normal working state and is used under both the NA policy and the TA policy, while the receiver response state R_RSP is added for sending the Retry.Rcvd management micro-packet to the adapter layer management micro-packet control module.
The relevant signals and state transitions of the TA policy state machine are as follows: ① rcvd_timeout & retryable_flit indicates that a retransmittable micro-packet has been received and the acknowledgement timeout counter has overflowed. ② retry_rcvd_send indicates that the adapter layer management micro-packet control module has received the Retry.Rcvd sent by the receiver. ③ req_recovery indicates that link recovery is requested. The state machine is in the NORMAL state at reset. If req_recovery is set to 1, the state machine enters the NORMAL state. When rcvd_timeout and retryable_flit are both set to 1, it enters the R_RSP state, indicating that the receiver of the retransmittable micro-packets needs to send out a Retry.Rcvd at intervals. When retry_rcvd_send is set to 1, it enters the NORMAL state, indicating that the adapter layer management micro-packet control module has received the Retry.Rcvd response micro-packet.
It should be noted that, referring to fig. 2, the receiver state machine used by the negative retransmission policy (NA policy), shown in the box, only needs to include the NORMAL state in order to be multiplexed with the receiver state machine used by the TA policy. The NA receiver state machine shown in the box of fig. 2 is therefore only an example in this embodiment; other receiver state machines for the negative retransmission policy that include the NORMAL state may also be used, and they can likewise be multiplexed with the receiver state machine used by the timeout retransmission policy (TA policy).
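The TA-policy transitions of the receiver state machine described above can be sketched as a pure next-state function. This is an illustration rather than the embodiment's circuit: the names `RState` and `rsm_ta_next` are hypothetical, and giving req_recovery priority over the other signals is an assumption.

```python
from enum import Enum


class RState(Enum):
    NORMAL = "NORMAL"
    R_RSP = "R_RSP"


def rsm_ta_next(state, rcvd_timeout=False, retryable_flit=False,
                retry_rcvd_send=False, req_recovery=False):
    """Return the next receiver state under the TA policy (illustrative only)."""
    if req_recovery:
        # Signal 3: link recovery requested (priority is an assumption).
        return RState.NORMAL
    if state is RState.NORMAL and rcvd_timeout and retryable_flit:
        # Signal 1: a retransmittable micro-packet was received and the
        # acknowledgement timeout counter overflowed.
        return RState.R_RSP
    if state is RState.R_RSP and retry_rcvd_send:
        # Signal 2: the control module has received the Retry.Rcvd.
        return RState.NORMAL
    return state


RSM_RESET_STATE = RState.NORMAL
```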
As shown in fig. 3, the sender state machine SSM takes the sender state machine used by the negative retransmission policy, which comprises three states, namely the normal state NORMAL, the sender retransmission response state S_RSP and the retransmission state RETRY, and adds the sender retransmission request state S_REQ, the link retraining state LINK_RETRAIN and the link termination state END required by the sender state machine of the timeout retransmission policy. S_REQ is used for sending the retransmission management micro-packet Retry.Timeout to the adapter layer management micro-packet control module; LINK_RETRAIN is used for retraining the link when the number of retransmissions exceeds a threshold; END is entered when the number of link retrainings exceeds a threshold, which means the link has failed. Under the timeout retransmission policy, the sender state is managed with five states: NORMAL, RETRY, S_REQ, LINK_RETRAIN and END. At reset, the state machine is in LINK_RETRAIN. In LINK_RETRAIN, if link recovery succeeds and there is no link retraining request, the link has been restored and the state machine enters S_REQ. In S_REQ, if the adapter layer management micro-packet control module has received the Retry.Timeout sent by the sender, the state machine enters RETRY; if link termination is requested, it enters END. In RETRY, if the retransmission completes, the state machine enters NORMAL; if the timeout retransmission counter overflows, it returns to S_REQ. In NORMAL, if the timeout retransmission counter overflows, a Retry.Timeout request micro-packet must be sent again and the state machine enters S_REQ; if link recovery is requested, it enters LINK_RETRAIN.
In fig. 3, the state machine used by the negative retransmission policy (NA policy), shown in the box, comprises the NORMAL state, the sender retransmission response state S_RSP and the retransmission state RETRY. Their functions are as follows: in NORMAL no micro-packet needs to be retransmitted; the state machine enters the retransmission response state when a retransmission request micro-packet is received, and enters the retransmission state when the retransmission response micro-packet has been sent. When the TA policy is fused in, the NORMAL and RETRY states are used under both the NA policy and the TA policy. S_REQ is added for sending the Retry.Timeout management micro-packet to the adapter layer management micro-packet control module; LINK_RETRAIN is added for retraining the link when the number of retransmissions exceeds a threshold; END is added for the case where the number of link retrainings exceeds a threshold and the link fails.
The relevant signals and state transitions of the TA policy state machine are as follows: ① req_recovery indicates that link recovery is requested. ② rsp_active & ~req_retrain indicates that link recovery succeeded and there is no link retraining request. ③ req_retrain indicates a link retraining request. ④ ta_timeout indicates that the timeout retransmission counter of the TA policy has overflowed. ⑤ retry_timeout_send indicates that the adapter layer management micro-packet control module has received the Retry.Timeout sent by the sender. ⑥ retry_done indicates that the retransmission is complete. ⑦ ta_timeout indicates that the timeout retransmission counter of the TA policy has overflowed. ⑧ req_end indicates that link termination is requested. The state machine is in the LINK_RETRAIN state at reset. If rsp_active is set to 1 and req_retrain is 0, the link has been restored and the state machine enters the S_REQ state. If retry_timeout_send is set to 1, the adapter layer management micro-packet control module has received the Retry.Timeout request micro-packet, and the state machine enters the RETRY state. If retry_done is set to 1, the retransmission is complete and the state machine enters the NORMAL state. If ta_timeout is set to 1 in the RETRY state, a timeout retransmission counter has overflowed again during this retransmission, a Retry.Timeout request micro-packet must be sent again, and the state machine enters the S_REQ state. If ta_timeout is set to 1 in the NORMAL state, a timeout retransmission counter has overflowed, a Retry.Timeout request micro-packet must be sent, and the state machine enters the S_REQ state. If req_retrain is set to 1, link retraining is required and the state machine enters the LINK_RETRAIN state. If req_end is set to 1, the link has failed and the state machine enters the END state.
It should be noted that, referring to fig. 3, the sender state machine used by the negative retransmission policy (NA policy), shown in the box, only needs to include the NORMAL state and the RETRY state in order to be multiplexed with the sender state machine used by the timeout retransmission policy (TA policy). The NA sender state machine shown in the box of fig. 3 is therefore only an example in this embodiment; other sender state machines for the NA policy that include the NORMAL and RETRY states may also be used, and they can likewise be multiplexed with the sender state machine used by the timeout retransmission policy (TA policy).
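The TA-policy transitions of the sender state machine listed above can be illustrated with the following next-state function. It is a sketch under assumptions: the names `SState` and `ssm_ta_next` are hypothetical, signals are passed as a set of strings for brevity, the priority among simultaneously asserted signals is an assumption, and req_retrain is assumed to be honoured from NORMAL because the description does not name its source state.

```python
from enum import Enum


class SState(Enum):
    NORMAL = "NORMAL"
    S_REQ = "S_REQ"
    RETRY = "RETRY"
    LINK_RETRAIN = "LINK_RETRAIN"
    END = "END"


SSM_RESET_STATE = SState.LINK_RETRAIN   # the state machine resets into LINK_RETRAIN


def ssm_ta_next(state, signals=frozenset()):
    """Return the next sender state under the TA policy (illustrative only)."""
    s = set(signals)
    if state is SState.LINK_RETRAIN:
        # Signal 2: link recovered and no link retraining request pending.
        if "rsp_active" in s and "req_retrain" not in s:
            return SState.S_REQ
    elif state is SState.S_REQ:
        if "req_end" in s:                   # signal 8: link termination requested
            return SState.END
        if "retry_timeout_send" in s:        # signal 5: Retry.Timeout was received
            return SState.RETRY
    elif state is SState.RETRY:
        if "ta_timeout" in s:                # signal 7: timeout during retransmission
            return SState.S_REQ
        if "retry_done" in s:                # signal 6: retransmission complete
            return SState.NORMAL
    elif state is SState.NORMAL:
        # Signals 1 and 3; accepting req_retrain here is an assumption.
        if "req_recovery" in s or "req_retrain" in s:
            return SState.LINK_RETRAIN
        if "ta_timeout" in s:                # signal 4: timeout counter overflowed
            return SState.S_REQ
    return state                             # END has no exit in this sketch
```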
As shown in fig. 4, for the timeout retransmission policy the retransmission buffer state machine RBSM reuses the retransmission buffer state machine of the negative retransmission policy, which comprises two states, the idle state IDLE and the read state READ. Under the timeout retransmission policy, the state machine is in IDLE at reset; in IDLE, if a retransmission timeout counter overflows and a retransmission is required, it enters READ; in READ, if the retransmission completes, it returns to IDLE. In the RBSM, the NA policy and the TA policy multiplex all states, but the signals that trigger the state transitions differ. Both states are used under both the negative retransmission policy (NA policy) and the timeout retransmission policy (TA policy). In IDLE there is no retransmission request and only the write pointer is incremented; in READ a retransmission request has occurred, writing to the retransmission buffer stops, and retransmission starts from the position of the read pointer. The relevant signals and state transitions of the state machine are as follows: ① retry_done indicates that the retransmission is complete. ② na_err | ta_timeout indicates that a retransmission has been triggered, either under the NA policy or under the TA policy. The state machine is in the IDLE state at reset. Under the NA policy, if na_err is set to 1, a check error has occurred and a retransmission is needed, and the state machine enters the READ state. Under the TA policy, if ta_timeout is set to 1, a retransmission timeout counter has overflowed and a retransmission is needed, and the state machine enters the READ state. If retry_done is set to 1, the retransmission is complete and the state machine enters the IDLE state.
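A minimal sketch of the two-state retransmission buffer state machine follows. The function names `rbsm_next` and `write_allowed` are hypothetical, and the states are represented as plain strings for brevity.

```python
def rbsm_next(state: str, na_err: bool = False, ta_timeout: bool = False,
              retry_done: bool = False) -> str:
    """Return the next RBSM state, "IDLE" or "READ" (illustrative only)."""
    if state == "IDLE" and (na_err or ta_timeout):
        # Signal 2: a retransmission was triggered under either policy.
        return "READ"
    if state == "READ" and retry_done:
        # Signal 1: the retransmission is complete.
        return "IDLE"
    return state


def write_allowed(state: str) -> bool:
    """Writes to the retransmission buffer are only performed in IDLE."""
    return state == "IDLE"


RBSM_RESET_STATE = "IDLE"
```

Because both policies share the IDLE and READ states, only the trigger signal differs between them, which is what allows the same state machine to serve both.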
In this embodiment, the receiver state machine RSM, the sender state machine SSM and the retransmission buffer state machine RBSM all select between the negative retransmission policy and the timeout retransmission policy according to the enable signal ta_enable: if ta_enable is 1, the timeout retransmission policy is used; otherwise, the negative retransmission policy is used.
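As a summary sketch, the effect of ta_enable on the shared resources can be written as a single selection function. The names `PolicyConfig` and `select_policy` and the tuple layout are hypothetical; the values themselves restate what the description assigns to each policy.

```python
from typing import NamedTuple


class PolicyConfig(NamedTuple):
    name: str               # "NA" or "TA"
    receiver_packet: str    # management micro-packet sent by the receiver
    sender_packet: str      # management micro-packet sent by the sender
    ack_info: str           # meaning of the Ack_Info field
    rbsm_trigger: str       # signal that moves the RBSM from IDLE to READ
    counter_owner: str      # state machine maintaining RetryNum / LinkRetrainNum


def select_policy(ta_enable: int) -> PolicyConfig:
    """Map the ta_enable signal to the per-policy behaviour (illustrative only)."""
    if ta_enable == 1:
        return PolicyConfig("TA", "Retry.Rcvd", "Retry.Timeout",
                            "AckNo", "ta_timeout", "SSM")
    return PolicyConfig("NA", "Retry.Err", "Retry.Ack",
                        "AckNum", "na_err", "RSM")
```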
In summary, in the method of this embodiment for fusing negative and timeout retransmission policies in a chiplet interconnection interface, the sender and the receiver select, by configuration, one of the negative retransmission policy (NA policy) and the timeout retransmission policy (TA policy), and the two policies share a fused hardware resource design. When the sender sends retransmittable micro-packets to the receiver and a micro-packet needs to be retransmitted, the sender and the receiver complete the retransmission on the basis of the fused hardware resource design according to the pre-selected negative retransmission policy or timeout retransmission policy. This design has the following advantages: (1) Compared with the single policy by which a traditional chiplet interconnection interface adapter layer achieves reliable transmission, this embodiment makes the traditional negative retransmission policy and the timeout retransmission policy configurable, so a user can choose whichever policy suits their needs to achieve reliable transmission. (2) The configurable negative and timeout retransmission policies implemented in this embodiment are compatible with the reliable transmission policy of the UCIe protocol adapter layer and with the adapter layer's retransmission management micro-packets.
(3) In this embodiment, the negative retransmission policy and the timeout retransmission policy have a fused hardware resource design in which the two policies multiplex the same retransmission buffer storage, while only the timeout retransmission policy uses the retransmission timeout counters, so the hardware cost is lower while flexibility and compatibility are retained.
In addition, this embodiment also provides a multi-chiplet chip comprising a plurality of chiplets connected to each other through chiplet interconnection interfaces, wherein the chiplets are programmed or configured to execute the above method for fusing negative and timeout retransmission policies in a chiplet interconnection interface.
In addition, this embodiment also provides a computer-readable storage medium storing a computer program or instructions programmed or configured to be executed by a processor to perform the above method for fusing negative and timeout retransmission policies in a chiplet interconnection interface.
Furthermore, this embodiment provides a computer program product comprising a computer program or instructions programmed or configured to be executed by a processor to perform the above method for fusing negative and timeout retransmission policies in a chiplet interconnection interface.
The above description is only a preferred embodiment of the present invention, and the protection scope of the present invention is not limited to the above examples; all technical solutions falling under the concept of the present invention belong to its protection scope. It should be noted that modifications and adaptations that occur to those skilled in the art without departing from the principles of the present invention are also to be regarded as within the protection scope of the present invention.
Claims (8)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411073144.XA CN118656316B (en) | 2024-08-06 | 2024-08-06 | Method and chip for integrating negation and timeout retransmission strategies in chiplet interconnection interface |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| CN118656316A (en) | 2024-09-17 |
| CN118656316B (en) | 2024-11-08 |
Family
ID=92704229
Family Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN202411073144.XA Active CN118656316B (en) | 2024-08-06 | 2024-08-06 | Method and chip for integrating negation and timeout retransmission strategies in chiplet interconnection interface |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN118656316B (en) |
Families Citing this family (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN120723699B (en) * | 2025-08-22 | 2025-11-25 | 上海壁仞科技股份有限公司 | UCIe-based data transmission methods, artificial intelligence chips, media, and electronic devices |
Citations (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN110784289A (en) * | 2019-10-31 | 2020-02-11 | 海光信息技术有限公司 | Data retransmission method and data retransmission device |
Family Cites Families (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN1165129C (en) * | 2002-05-16 | 2004-09-01 | 武汉汉网高技术有限公司 | ARQ mechanism able to automatically request retransmissions for multiple rejections |
| CN113868172B (en) * | 2021-09-28 | 2024-06-18 | 上海兆芯集成电路股份有限公司 | Interconnect interface |
| US12360934B2 (en) * | 2021-12-30 | 2025-07-15 | Intel Corporation | Parameter exchange for a die-to-die interconnect |
| CN117453596B (en) * | 2023-11-03 | 2024-10-11 | 海光信息技术股份有限公司 | Protocol controller, protocol control method, chip, system on chip and electronic device |
| CN117834755B (en) * | 2024-03-04 | 2024-05-10 | 中国人民解放军国防科技大学 | Interface circuit and chip between protocol layer and adapter layer for chiplet interconnection interface |
Also Published As
| Publication number | Publication date |
|---|---|
| CN118656316A (en) | 2024-09-17 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| US11412078B2 (en) | Data transmission method and first device | |
| CN103248467B (en) | Based on the RDMA communication means of sheet inner connection tube reason | |
| US11381514B2 (en) | Methods and apparatus for early delivery of data link layer packets | |
| US20070174608A1 (en) | Distributed (modular) internal architecture | |
| CN108966046B (en) | An FPGA-based fusion MAC controller with two communication interfaces | |
| CN118656316B (en) | Method and chip for integrating negation and timeout retransmission strategies in chiplet interconnection interface | |
| CN117834755B (en) | Interface circuit and chip between protocol layer and adapter layer for chiplet interconnection interface | |
| KR19990067626A (en) | Packet transmitter and receiver | |
| US7305605B2 (en) | Storage system | |
| CN111030747A (en) | FPGA-based SpaceFibre node IP core | |
| CN100375466C (en) | A data packet forwarding control device and method | |
| WO2023109891A1 (en) | Multicast transmission method, apparatus and system | |
| US9900257B2 (en) | Method and universal interface chip for achieving high-speed data transmission | |
| CN101145968B (en) | Data sending and receiving method between network management system and transmission equipment | |
| CN120562359A (en) | A high-speed serial communication system protocol adaptation layer circuit and working method for chiplet-to-chip interconnection | |
| CN100571183C (en) | A barrier operation network system, device and method based on fat tree topology | |
| CN101304296A (en) | Network device and transmission method thereof | |
| CN104426866A (en) | Data transmission method and apparatus | |
| CN118764450A (en) | A packet granularity load balancing method and system | |
| US7664863B2 (en) | Data transferring method | |
| CN115174496B (en) | A processing terminal and switch for intra-network aggregation transmission | |
| CN119854220B (en) | A method and apparatus for initializing flow control | |
| CN100459483C (en) | A method of controlling the sending frequency of status reports | |
| CN120880999B (en) | Message transmission methods, PCIe devices and computer-readable storage media | |
| CN114221905B (en) | Processing unit and flow control unit, and related methods |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||
| GR01 | Patent grant | ||
| GR01 | Patent grant |