
CN115914110B - Deadlock release circuit and deadlock release method based on a network on chip - Google Patents

Deadlock release circuit and deadlock release method based on a network on chip

Info

Publication number
CN115914110B
CN115914110B
Authority
CN
China
Prior art keywords
state
buffer
channel
data
circuit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110943339.5A
Other languages
Chinese (zh)
Other versions
CN115914110A (en)
Inventor
黄凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Simm Computing Technology Co ltd
Guangzhou Ximu Semiconductor Technology Co ltd
Original Assignee
Beijing Simm Computing Technology Co ltd
Guangzhou Ximu Semiconductor Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Simm Computing Technology Co ltd, Guangzhou Ximu Semiconductor Technology Co ltd filed Critical Beijing Simm Computing Technology Co ltd
Priority to CN202110943339.5A priority Critical patent/CN115914110B/en
Publication of CN115914110A publication Critical patent/CN115914110A/en
Application granted granted Critical
Publication of CN115914110B publication Critical patent/CN115914110B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Data Exchanges In Wide-Area Networks (AREA)
  • Multi Processors (AREA)

Abstract


An embodiment of the present invention discloses a deadlock release circuit and a deadlock release method based on a network on chip. The deadlock release circuit includes a plurality of block circuits, where each block circuit includes a synchronizer and a plurality of data buffers, and the block circuits are sequentially connected through data channels to form a ring path; a channel buffer is provided on the ring path and is configured to, in a cache state, cache data transmitted on the ring path; a deadlock management circuit is connected to the block circuits and the channel buffer, and is used to receive the first signal and send cache enable information to the channel buffer; the cache enable information instructs the channel buffer to switch from a pass-through state to a cache state, so as to provide data exchange space for releasing the deadlock on the ring path. The deadlock management circuit and the channel buffer provided in this circuit can release deadlocks caused by the ring path during data transmission.

Description

Network-on-chip-based deadlock release circuit and deadlock release method
Technical Field
The invention relates to the technical field of communication, and in particular to a network-on-chip-based deadlock release circuit and deadlock release method.
Background
Communication between subsystems in a System on Chip (SoC) is implemented using a Network on Chip (NoC). Specifically, the NoC architecture includes a plurality of subsystems (i.e., compute cores) interconnected by Broadcast Routers (BRs), which route data in the form of packets to a target subsystem.
Because a NoC-based SoC can better meet high-bandwidth, low-latency data transmission requirements, the NoC is the preferred interconnection mechanism for an SoC. However, in complex NoC network topology designs, deadlock caused by a ring-shaped routing path (ring path) often occurs during data transmission.
In summary, how to resolve the deadlock caused by the ring path during data transmission is a problem that needs to be solved at present.
Disclosure of Invention
In view of the above, embodiments of the invention provide a network-on-chip-based deadlock release circuit and deadlock release method, which can release deadlocks caused by a ring path during data transmission.
In a first aspect, an embodiment of the present invention provides a network-on-chip-based deadlock release circuit, including:
a plurality of block circuits, each block circuit comprising a synchronizer and a plurality of data buffers, the block circuits being sequentially connected through data channels to form a ring path;
a channel buffer, disposed on the ring path and configured to, in a cache state, cache the data transmitted on the ring path;
and a deadlock management circuit, connected to the block circuits and the channel buffer, for receiving the first signal and sending cache enable information to the channel buffer, where the cache enable information instructs the channel buffer to switch from a pass-through state to a cache state so as to provide data exchange space for releasing the deadlock on the ring path.
Optionally, in response to all data buffers in any block circuit on the ring path reaching a preset high watermark, the synchronizer of that block circuit sends the first signal to the deadlock management circuit, and/or a data buffer enters a blocked state to indicate that its occupancy has reached the preset high watermark.
Optionally, in response to the deadlock management circuit receiving the first signals sent by the synchronizers of all block circuits connected to it, the deadlock management circuit sends cache enable information to the channel buffer, and the channel buffer switches from the pass-through state to the cache state.
Optionally, each block circuit includes an arbiter;
the deadlock management circuit is further configured to send a deadlock signal to the arbiters located on the ring path in the block circuits;
and after receiving the deadlock signal, an arbiter switches from a fair scheduling state to a priority scheduling state and preferentially schedules the data on the ring path.
Optionally, the deadlock management circuit is further configured to receive a second signal sent by a synchronizer of a block circuit and send cache disable information to the channel buffer;
the second signal indicates that all data buffers in the block circuit have entered a second state, the second state indicating that the data buffers have drained to a preset low watermark, and the cache disable information instructs the channel buffer to switch from the cache state to the pass-through state.
Optionally, in response to all data buffers in any block circuit on the ring path reaching the preset low watermark, the synchronizer of that block circuit sends the second signal to the deadlock management circuit.
Optionally, in response to the deadlock management circuit receiving the second signals sent by the synchronizers of all block circuits connected to it, the deadlock management circuit sends cache disable information to the channel buffer, switching the channel buffer from the cache state to the pass-through state.
Optionally, the deadlock management circuit is further configured to send a deadlock release signal to the arbiters located on the ring path in the block circuits;
after receiving the deadlock release signal, an arbiter switches from the priority scheduling state back to the fair scheduling state.
Optionally, the channel buffer being disposed on the ring path specifically includes:
the channel buffer being disposed between any two block circuits, or
the channel buffer being disposed inside any block circuit.
Optionally, the ring path transmits data clockwise, or the ring path transmits data counterclockwise.
In a second aspect, an embodiment of the present invention provides a network-on-chip-based deadlock release circuit, including:
a plurality of block circuits, each block circuit comprising a plurality of data buffers, the block circuits being sequentially connected through data channels to form a ring path;
a channel buffer, disposed on the ring path and configured to, in a cache state, cache the data transmitted on the ring path;
and a deadlock management circuit, connected to the block circuits and the channel buffer, for receiving the third signal and sending cache enable information to the channel buffer, where the cache enable information instructs the channel buffer to switch from a pass-through state to a cache state so as to provide data exchange space for releasing the deadlock on the ring path.
In a third aspect, an embodiment of the present invention provides a network-on-chip-based deadlock release method, including:
receiving a plurality of first signals sent by the synchronizers of all block circuits on a ring path, where the first signals indicate that all data buffers in the block circuits have entered a blocked state;
and sending cache enable information to the channel buffer to switch the channel buffer from a pass-through state to a cache state, and/or sending a deadlock signal to the arbiters on the ring path, where an arbiter, after receiving the deadlock signal, switches from a fair scheduling state to a priority scheduling state and preferentially schedules the data on the ring path.
Optionally, a plurality of second signals sent by all block circuits on the ring path are received, where the second signals indicate that all data buffers in the block circuits have entered a second state, the second state indicating that the data buffers have drained to a preset low watermark;
and cache disable information is sent to the channel buffer to switch the channel buffer from the cache state to the pass-through state, while a deadlock release signal is sent to the arbiters on the ring path, where an arbiter, after receiving the deadlock release signal, switches from the priority scheduling state back to the fair scheduling state.
In a fourth aspect, an embodiment of the present invention provides a network-on-chip-based deadlock release method, including:
receiving a plurality of third signals sent by the data buffers of all block circuits on a ring path, where the third signals indicate that the data buffers have entered a blocked state;
and sending cache enable information to the channel buffer to switch the channel buffer from a pass-through state to a cache state, and/or sending a deadlock signal to the arbiters on the ring path, where an arbiter, after receiving the deadlock signal, switches from a fair scheduling state to a priority scheduling state and preferentially schedules the data on the ring path.
In a fifth aspect, an embodiment of the present invention provides an integrated circuit comprising a plurality of cores, a network on chip, and the network-on-chip-based deadlock release circuit of the first aspect, any possible implementation of the first aspect, or the second aspect.
In a sixth aspect, an embodiment of the present invention provides a board card, where the board card includes the integrated circuit of the fifth aspect.
In a seventh aspect, an embodiment of the present invention provides a server, where the server includes the board card of the sixth aspect.
An embodiment of the invention discloses a circuit comprising a plurality of block circuits, a channel buffer, and a deadlock management circuit. Each block circuit includes a synchronizer and a plurality of data buffers, and the block circuits are sequentially connected through data channels to form a ring path. The synchronizer sends a first signal to the deadlock management circuit, the first signal indicating that all data buffers of the block circuit have entered a blocked state. The channel buffer is disposed on the ring path and is configured to, in a cache state, cache the data transmitted on the ring path. The deadlock management circuit is connected to the block circuits and the channel buffer, receives the first signal, and sends cache enable information to the channel buffer; the cache enable information instructs the channel buffer to switch from a pass-through state to a cache state so as to provide data exchange space for releasing the deadlock on the ring path. With this circuit, when all data buffers on the ring path have entered the blocked state, that is, when the ring path is deadlocked, the channel buffer provides data exchange space for releasing the deadlock, thereby relieving the deadlock caused by the ring path during data transmission.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent from the following description of embodiments of the present invention with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of a prior-art network-on-chip SoC architecture;
FIG. 2 is a schematic diagram of a prior-art data transmission process;
FIG. 3 is a schematic diagram of a deadlock release circuit according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a deadlock release circuit according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of the internal circuit of a BR according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a deadlock release circuit according to an embodiment of the present invention;
FIG. 7 is a flowchart of a deadlock release method according to an embodiment of the present invention.
Detailed Description
The present disclosure is described below based on examples, but it is not limited to these examples. In the following detailed description, certain specific details are set forth; those skilled in the art can fully understand the present disclosure without some of these details. Well-known methods, procedures, flows, components, and circuits have not been described in detail so as not to obscure the essence of the disclosure.
Moreover, those of ordinary skill in the art will appreciate that the drawings are provided herein for illustrative purposes and that the drawings are not necessarily drawn to scale.
Unless the context clearly requires otherwise, the words "comprise," "comprising," and the like throughout the specification are to be construed in an inclusive rather than an exclusive or exhaustive sense, that is, as "including but not limited to."
In the description of the present disclosure, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Furthermore, in the description of the present disclosure, unless otherwise indicated, the meaning of "a plurality" is two or more.
Communication between subsystems in a system on chip (SoC) is implemented using a network on chip (NoC). Specifically, the SoC includes a plurality of blocks (BANKs), and each BANK includes a plurality of cores (COREs). Assume the SoC includes 4 BANKs and each BANK includes 4 COREs, as shown in fig. 1: the 4 BANKs are BANK0, BANK1, BANK2, and BANK3. Taking BANK0 as an example, its cores are CORE0, CORE1, CORE2, and CORE3; the names of the cores in the other BANKs are shown in fig. 1 and are not repeated here. Each BANK further comprises 4 broadcast routers (BRs): the broadcast routers in BANK0 are BR0, BR1, BR2, and BR3; in BANK1 they are BR4, BR5, BR6, and BR7; in BANK2 they are BR8, BR9, BR10, and BR11; and in BANK3 they are BR12, BR13, BR14, and BR15. Each BANK further comprises two asynchronous bridges (Async Bridges) for data connections with the other BANKs.
BR0, BR1, BR4, BR5, BR8, BR9, BR12, and BR13 of the 4 BANKs, together with the two asynchronous bridges in each BANK, form a closed ring routing path, that is, a ring path. During data transmission, data can be transmitted clockwise or counterclockwise on the ring path. Whether transmitting counterclockwise or clockwise, the ring path has the characteristic of unidirectional circular transmission, so the problem of ring-path deadlock caused by mutual waiting for resources can arise. Deadlock is described here taking clockwise data transmission on the ring path as an example. Specifically, as shown in fig. 2, since data is transmitted clockwise, the buffer of each BANK holds data destined for other BANKs: the buffer of BANK0 is entirely filled with data destined for BANK3, the buffer of BANK1 with data destined for BANK2, the buffer of BANK3 with data destined for BANK0, and the buffer of BANK2 with data destined for BANK1. A resource-waiting cycle therefore arises: the data destined for BANK3 can only continue to route forward once there is free buffer space in BANK1; the data destined for BANK2 must wait for free buffer space in BANK3; the data destined for BANK0 must wait for free buffer space in BANK2; and the data destined for BANK1 must wait for free buffer space in BANK0. The buffer resources thus form a closed waiting loop with no free space anywhere, no data can route forward, and the ring path is deadlocked. The number of buffers in the four BANKs shown in fig. 2 is merely exemplary and is determined according to the actual situation.
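The circular wait just described can be illustrated with a small simulation sketch (all names, the ring order, and the buffer capacity are illustrative, not taken from the patent): four full BANK buffers on a clockwise ring, each holding packets that can only advance into the next buffer, make no progress at all.

```python
# Illustrative sketch of the ring-path deadlock: every buffer is full and
# each packet can only move into the downstream buffer, which is also full.

CAPACITY = 2  # assumed per-BANK buffer depth (exemplary, as in fig. 2)

# ring order: BANK0 -> BANK1 -> BANK2 -> BANK3 -> BANK0 (clockwise)
ring = ["BANK0", "BANK1", "BANK2", "BANK3"]

# every buffer is full of packets addressed further around the ring
buffers = {b: ["pkt"] * CAPACITY for b in ring}

def downstream(bank):
    """Next BANK in the clockwise direction."""
    return ring[(ring.index(bank) + 1) % len(ring)]

def try_forward(buffers):
    """One routing round: a packet moves only if the downstream buffer
    has a free slot. Returns the number of packets that moved."""
    moved = 0
    for bank in ring:
        nxt = downstream(bank)
        if buffers[bank] and len(buffers[nxt]) < CAPACITY:
            buffers[nxt].append(buffers[bank].pop())
            moved += 1
    return moved

# all buffers full -> no packet can move: the ring path is deadlocked
print(try_forward(buffers))  # 0
```

Freeing even one slot anywhere on the ring (which is exactly what the channel buffer below provides) lets packets start moving again.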
In the prior art, in order to solve the deadlock problem, the ring path is structurally broken. For the system shown in fig. 2, the physical channel between some pair of BANKs is disconnected, for example the channel between BANK0 and BANK2. If BANK0 needs to send data to BANK2, the routing rule can no longer send it directly: the original BANK0 -> BANK2 channel must be replaced by BANK0 -> BANK3 -> BANK2. Likewise, if BANK2 needs to send data to BANK0, it must detour via BANK2 -> BANK3 -> BANK0. Breaking the ring in this way requires changing the routing rules and lengthens the transmission paths, which significantly degrades data access performance.
In summary, how to resolve the deadlock on the ring path while avoiding the problems caused by breaking the ring structure in the prior art is what needs to be solved at present.
In an embodiment of the invention, in order to solve the ring-path deadlock problem, a network-on-chip-based deadlock release circuit is provided, as shown in fig. 3, which is a schematic diagram of the network-on-chip-based deadlock release circuit of an embodiment of the invention. Specifically, the circuit comprises a plurality of block circuits 300, a channel buffer 301, and a deadlock management circuit 302.
The plurality of block circuits 300 are sequentially connected through data channels to form a ring path; the data channels are used for data transmission between the block circuits. In one form, each block circuit comprises a synchronizer and a plurality of data buffers. The data buffers cache the data transmitted on the ring path; the synchronizer, after detecting that all data buffers in its block circuit have entered the blocked state, sends a first signal to the deadlock management circuit, the first signal indicating that all data buffers of the block circuit have entered the blocked state, at which point that block circuit is blocked. Alternatively, each block circuit comprises a plurality of data buffers, and each data buffer sends a third signal to the deadlock management circuit, the third signal indicating that that data buffer has entered the blocked state; in this case only the data buffer sending the third signal is blocked. Since the signals sent to the deadlock management circuit come from the block circuit whether or not it contains a synchronizer, the connection lines between the block circuits and the deadlock management circuit are shown only schematically in fig. 3; in actual operation they may connect to the synchronizer in a block circuit or to each data buffer in a block circuit, which is not elaborated here.
The channel buffer 301 is disposed on the ring path and is configured to, in a cache state, cache data transmitted on the ring path. The channel buffer has two states: a pass-through state and a cache state. In the pass-through state the channel buffer does not cache data on the ring path; in the cache state it can cache data on the ring path.
The deadlock management circuit 302 is connected to the block circuits and the channel buffer, receives the first signal or the third signal, and, in response to receiving the first signals sent by the synchronizers of the block circuits or the third signals sent by the data buffers of the block circuits, sends cache enable information to the channel buffer. The cache enable information instructs the channel buffer to switch from the pass-through state to the cache state so as to provide data exchange space for releasing the deadlock on the ring path.
In the embodiment of the invention, only a channel buffer needs to be added on the ring path; the NoC topology and routing rules do not need to change, so the scheme is simple to implement. During deadlock release, data transmission on channels other than the ring path is unaffected, and the performance loss is small.
In one possible implementation, the block circuit includes a synchronizer and a plurality of data buffers. The synchronizer is connected to the data buffers and to the deadlock management circuit and collects the states of the data buffers; when all data buffers have entered the blocked state, the synchronizer generates the first signal and sends it to the deadlock management circuit. A data buffer whose occupancy reaches the preset high watermark enters the blocked state, that is, the buffer is full. When all data buffers on the ring path within a block circuit have entered the blocked state, that block circuit is blocked on the ring path. In this implementation, after the synchronizer has collected the full state of all data buffers in its block circuit, the block circuit sends a single signal to the deadlock management circuit, and on receiving that signal the deadlock management circuit determines that the sending block circuit is in the blocked state.
In one possible implementation, the block circuit includes a plurality of data buffers, each data buffer being configured to send a third signal to the deadlock management circuit, the third signal indicating that the amount of data in the data buffer has reached the preset high watermark.
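As a rough behavioral sketch of the watermark logic just described (the class, field, and function names are assumptions, not taken from the patent), the blocked state and the synchronizer's first signal could be modeled as:

```python
# Illustrative model: a buffer is "blocked" once it reaches the preset
# high watermark; the synchronizer asserts the first signal only when
# every data buffer in its block circuit is blocked.

class DataBuffer:
    def __init__(self, depth, high_wm):
        self.depth = depth
        self.high_wm = high_wm   # preset high watermark (assumed value)
        self.items = []

    def push(self, pkt):
        if len(self.items) < self.depth:
            self.items.append(pkt)

    @property
    def blocked(self):
        # blocked state: occupancy has reached the high watermark
        return len(self.items) >= self.high_wm

def synchronizer_first_signal(data_buffers):
    """AND of all blocked flags in the block circuit."""
    return all(b.blocked for b in data_buffers)

bufs = [DataBuffer(depth=4, high_wm=4) for _ in range(3)]
for b in bufs:
    for i in range(4):
        b.push(i)

print(synchronizer_first_signal(bufs))  # True
```

Setting the high watermark below the full depth (e.g. `high_wm=3` for `depth=4`) would let the first signal fire slightly before the buffers are completely full, a design choice the watermark formulation leaves open.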
In the embodiment of the present invention, the data buffer may be a first-in first-out (FIFO) queue.
In the embodiment of the invention, the number of channel buffers on the ring path may be 1 or more. In theory, adding just 1 channel buffer is enough to release the deadlock, and the more channel buffers there are on the ring path, the faster the deadlock is released.
In one possible implementation, in response to receiving the first signals sent by all block circuits connected to it, the deadlock management circuit sends cache enable information to the channel buffer, and the channel buffer switches from the pass-through state to the cache state. That is, when the deadlock management circuit determines that all block circuits are in the blocked state, it enables the channel buffers on the ring path.
In another possible implementation, if one or more channel buffers are provided for each block circuit, the deadlock management circuit may, on receiving the first signal of a particular block circuit, send cache enable information to the channel buffer corresponding to that blocked block circuit.
In one possible implementation, the block circuit includes an arbiter, and the deadlock management circuit is further configured to send a deadlock signal to the arbiters located on the ring path in the block circuits. After receiving the deadlock signal, an arbiter switches from the fair scheduling state to the priority scheduling state and preferentially schedules the data on the ring path. In the embodiment of the invention, by enabling the channel buffer on the ring path and controlling the arbiters on the ring path, the data on the ring path can be transmitted preferentially.
In another embodiment, the arbiters on the ring path may be switched from the fair scheduling state to the priority scheduling state before the channel buffer is enabled. Specifically, after determining that the ring path is deadlocked, the deadlock management circuit first sends the deadlock signal to the arbiters located on the ring path in the block circuits; each arbiter switches from the fair scheduling state to the priority scheduling state and then feeds back a response signal (ack) to the deadlock management circuit. Only after receiving the response signals of all on-ring arbiters in every block circuit does the deadlock management circuit send cache enable information to the channel buffer. Constraining the enabling order of the arbiters and the channel buffer in this way ensures that the ring path is released efficiently: if the channel buffer were enabled first, the arbiters would still be in the fair scheduling state and might schedule data from other channels into the ring path, so the channel buffer could itself become blocked and fail to release the deadlock.
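The enabling order constrained above can be sketched in a few lines (a minimal illustration; the class and method names are assumptions, not from the patent): the management circuit switches every on-ring arbiter to priority scheduling, collects all their acks, and only then enables the channel buffer cache.

```python
# Illustrative sequencing sketch: arbiters first, channel buffer second.

class Arbiter:
    def __init__(self):
        self.mode = "fair"
    def on_deadlock_signal(self):
        self.mode = "priority"   # ring-path data now scheduled first
        return "ack"             # response signal back to management

class ChannelBuffer:
    def __init__(self):
        self.state = "pass_through"
    def enable_cache(self):
        self.state = "cache"     # provides data exchange space on the ring

def release_deadlock(on_ring_arbiters, channel_buffer):
    # step 1: switch every on-ring arbiter and wait for all acks
    acks = [a.on_deadlock_signal() for a in on_ring_arbiters]
    # step 2: only after every arbiter has acknowledged, enable the buffer
    if all(ack == "ack" for ack in acks):
        channel_buffer.enable_cache()

arbiters = [Arbiter() for _ in range(4)]
cb = ChannelBuffer()
release_deadlock(arbiters, cb)
print(cb.state)  # "cache"
```

Reversing the two steps would reproduce the failure mode the paragraph describes: a cache enabled while arbiters are still fair can fill with off-ring traffic.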
In another alternative embodiment, the block circuit includes a synchronizer and a plurality of data buffers. The synchronizer collects the states of the data buffers in its block circuit and, after observing that all of them are in the second state, sends a second signal to the deadlock management circuit. On receiving the second signal, the deadlock management circuit determines that the blocked state of that block circuit has been removed, that is, that all data buffers on the ring path in that block circuit are no longer blocked.
Optionally, the block circuit includes a plurality of data buffers, and each data buffer may send a fourth signal to the deadlock management circuit, the fourth signal indicating that the data buffer has drained to the preset low watermark, that is, the buffer is close to empty. On receiving the fourth signals of all data buffers of a block circuit, the deadlock management circuit can determine that the blocked state of that block circuit has been removed.
In one possible implementation, the deadlock management circuit is further configured to send cache disable information to the channel buffer, where the cache disable information instructs the channel buffer to switch from the cache state to the pass-through state. The deadlock management circuit issues the cache disable information after the data cached in the channel buffer has been emptied and the block circuits have left the blocked state.
In one possible implementation, the deadlock management circuit is further configured to send a deadlock release signal to the arbiters located on the ring path in the block circuits. After receiving the deadlock release signal, an arbiter switches from the priority scheduling state back to the fair scheduling state; that is, the ring path has been released from deadlock and data transmission returns to normal. Alternatively, the deadlock management circuit may send deadlock release signals separately to all arbiters located on the ring path in each block circuit. In another alternative embodiment, the block circuit further comprises a synchronizer: the deadlock management circuit may send the deadlock release signal to the synchronizer of each block circuit, which then synchronizes the signal to all arbiters in the block circuit that are located on the ring path.
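A matching sketch of this return-to-normal sequence (the function and variable names are illustrative assumptions): once every buffer level is at or below the low watermark, the cache is disabled and arbitration returns to fair scheduling.

```python
# Illustrative model of the release path: low watermark reached on every
# buffer -> disable the channel buffer cache and restore fair arbitration.

def restore_normal(buffer_levels, low_wm, arbiter_modes, channel_state):
    """buffer_levels: current occupancy of each data buffer on the ring.
    Returns the updated (arbiter_modes, channel_state)."""
    if all(level <= low_wm for level in buffer_levels):  # "second signals"
        channel_state = "pass_through"                   # cache disable info
        arbiter_modes = ["fair"] * len(arbiter_modes)    # release signal
    return arbiter_modes, channel_state

modes, state = restore_normal(
    buffer_levels=[0, 1, 0],           # all drained to the low watermark
    low_wm=1,
    arbiter_modes=["priority"] * 4,
    channel_state="cache",
)
print(state)  # "pass_through"
```

If any buffer were still above the low watermark, the function would leave both the cache state and the priority scheduling untouched, mirroring the condition that all second (or fourth) signals must arrive first.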
In the embodiment of the present invention, the channel buffer 301 in fig. 3 is disposed between any two of the blocking circuits.
Optionally, the channel buffer may also be disposed inside any of the blocking circuits. Specifically, as shown in fig. 4, the channel buffer is disposed between two broadcast routers of any of the blocking circuits. In the embodiment of the present invention, there may be a plurality of channel buffers; fig. 3 and fig. 4 above each take only one as an example for illustration.
In the embodiment of the present invention, to express the ring-path data transmission more clearly, the internal circuit structure of each BR is shown in fig. 5. Each BR can receive and transmit data in each of four directions. The ports for receiving data are denoted RX; each BR receives data from four directions and therefore includes four RX ports, RX0, RX1, RX2 and RX3. The ports for transmitting data are denoted TX; each BR transmits data in four directions and therefore includes four TX ports, TX0, TX1, TX2 and TX3. A splitter (Splitter), which may also be referred to as a distribution module, is disposed on the RX of each direction and forwards the data received in that direction to the TX of the other three directions; the BR thus includes four splitters, Splitter0, Splitter1, Splitter2 and Splitter3. An arbiter (Arbiter) is disposed on the TX of each direction and arbitrates among the data arriving from the other three directions; the BR thus includes four arbiters, Arbiter0, Arbiter1, Arbiter2 and Arbiter3. A buffer, implemented as a first-in first-out queue (First In First Out, FIFO), is disposed between each splitter and each arbiter it feeds, forming a channel between each RX direction and each of the other three TX directions; the BR therefore includes 12 FIFOs in total.
Specifically, after data enters the BR, it is distributed by the splitter to the FIFO of the target direction according to the shortest-path routing principle. Data that needs to be routed to the same TX direction converges on the TX side: the three FIFOs of a target direction respectively collect the data from the RX of the three other directions, and the data in each FIFO share the same RX direction and the same TX direction. The arbiter of the target direction then schedules the data in the corresponding three FIFOs and sends the scheduled data to the TX port for output.
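The distribution-and-arbitration path described above can be sketched behaviorally in software. The sketch below is illustrative only: the class and method names (`BR.receive`, `BR.arbitrate`) and the compass-direction labels are assumptions, not from the patent; it shows how each RX-side splitter feeds one FIFO per target TX and how the TX-side arbiter drains the three FIFOs feeding it.

```python
from collections import deque

DIRS = ["N", "E", "S", "W"]  # the four RX/TX directions of one BR

class BR:
    """Behavioral sketch of one broadcast router (BR) as in fig. 5."""
    def __init__(self):
        # One FIFO per (rx, tx) pair with rx != tx: 4 x 3 = 12 FIFOs,
        # each sitting between the RX-side splitter and the TX-side arbiter.
        self.fifos = {(rx, tx): deque() for rx in DIRS for tx in DIRS if rx != tx}

    def receive(self, rx, data, tx):
        # Splitter on the RX side: route the data to the FIFO of its target TX
        # (the shortest-path routing decision is taken as given here).
        self.fifos[(rx, tx)].append(data)

    def arbitrate(self, tx):
        # Arbiter on the TX side: drain one entry from the three FIFOs feeding
        # this TX (a simple fixed scan stands in for the real fair scheduler).
        for rx in DIRS:
            if rx != tx and self.fifos[(rx, tx)]:
                return self.fifos[(rx, tx)].popleft()
        return None  # nothing pending for this TX

br = BR()
br.receive("W", "pkt0", tx="E")  # data enters at one RX, targets the opposite TX
assert br.arbitrate("E") == "pkt0"
```

The 12-FIFO count falls out of the structure: four source directions, each fanning out to the three other output directions.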
In the embodiment of the present invention, assume that 4 BANKs form a ring path and each BANK includes 4 COREs. In the clockwise data transmission direction shown in fig. 1, data is input at a port on one side of each BR in the BANKs and output at the port on the opposite side; for BR1, BR0, BR4, BR5, BR12, BR13, BR9 and BR8, data is input at RX3 and output at TX1. A specific schematic diagram is shown in fig. 6, where the Splitter2 corresponding to RX3, the Arbiter on the TX1 side, and the FIFO between that Splitter2 and that Arbiter in BR1, BR0, BR4, BR5, BR12, BR13, BR9 and BR8 form the ring path of the 4 BANKs. The FIFOs located on this ring path in the BANKs are the data buffers. In fig. 6, one hang-up management circuit is connected with the four BANKs, and four channel buffers are provided, one between each pair of adjacent BANKs, for example between TX1 of BANK0 and RX3 of BANK1; the dotted line in fig. 6 indicates the clockwise ring path.
In the embodiment of the invention, the processing flow of the network-on-chip based hang-up release method is shown in fig. 7 and specifically includes the following steps:
Step S700: receive the first signals sent by all blocking circuits on the ring path.
The first signal is used for indicating that all the data buffers in the blocking circuit have entered a blocking state. A data buffer entering the blocking state indicates that its occupancy has reached a preset high waterline.
In one possible implementation, the data buffer is a FIFO, and the FIFO reaching the preset high waterline indicates that it is about to be unable to receive new data. For example, if the depth of a FIFO in the BR is 8, the high waterline of the FIFOs on the ring path may be set to 6 and the low waterline to 2. When the number of buffered entries in the FIFO exceeds the high waterline, the FIFO is considered to be in the full state; when the number of buffered entries is less than the low waterline, the FIFO is considered to be in the empty state.
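The waterline test on a depth-8 FIFO can be sketched as follows. The thresholds mirror the example in the text (depth 8, high waterline 6, low waterline 2); the class name and property interface are illustrative assumptions.

```python
from collections import deque

class WatermarkFIFO:
    """FIFO with high/low waterlines, per the text's example (8 / 6 / 2)."""
    def __init__(self, depth=8, high=6, low=2):
        self.buf = deque(maxlen=depth)
        self.high, self.low = high, low

    def push(self, x):
        self.buf.append(x)

    def pop(self):
        return self.buf.popleft()

    @property
    def full_state(self):
        # First-signal condition: occupancy exceeds the high waterline.
        return len(self.buf) > self.high

    @property
    def empty_state(self):
        # Second-signal condition: occupancy is below the low waterline.
        return len(self.buf) < self.low

f = WatermarkFIFO()
for i in range(7):
    f.push(i)
assert f.full_state and not f.empty_state  # 7 entries > high waterline of 6
while len(f.buf) > 1:
    f.pop()
assert f.empty_state                       # 1 entry < low waterline of 2
```

The gap between the two waterlines gives the detection logic hysteresis: a FIFO hovering near one threshold does not toggle full/empty signals on every push and pop.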
In response to all data buffers in any blocking circuit on the ring path reaching the preset high waterline, that blocking circuit sends the first signal to the hang-up management circuit.
For example, as shown in fig. 6, in BR0 a part of the ring path runs from RX3 of BR0 to TX1 of BR0, and the FIFO it passes through is the gray FIFO in BR0 in the figure. When the gray FIFO reaches or exceeds the preset high waterline, the BANK corresponding to the gray FIFO sends the first signal to the hang-up management circuit.
In a possible implementation, the FIFOs on the ring path in fig. 6 may also be directly connected to the hang-up management circuit, which is not limited by the embodiment of the present invention. Assuming that all 8 FIFOs on the ring path shown in fig. 6 send a full signal to the hang-up management circuit, all 8 FIFOs on the ring path are in the full state; that is, the ring path is hung up and data transmission cannot proceed.
Step S701: send enabling cache information to the channel buffer, switching the channel buffer from the pass-through state to the cache state, and at the same time send a hang-up signal to the arbiters on the ring path.
In the embodiment of the invention, when the hang-up management circuit has received the first signals sent by all blocking circuits connected to it, it sends enabling cache information to the channel buffer to switch the channel buffer from the pass-through state to the cache state. Assume the depth of the channel buffer is set to 2: in the pass-through state, no data is buffered in the channel buffer during transmission; when the channel buffer switches from the pass-through state to the cache state, that is, when the ring path is hung up, the channel buffer is enabled so that data is buffered in it, providing an exchange space for releasing the hang-up of the data on the ring path.
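A minimal behavioral sketch of the channel buffer's two states, assuming a depth of 2 as in the text; the method names (`enable_cache`, `disable_cache`, `forward`) are hypothetical, chosen only to illustrate the state switch.

```python
from collections import deque

class ChannelBuffer:
    """Channel buffer sketch: transparent in the pass-through state, absorbing
    up to `depth` entries (2 in the text's example) in the cache state."""
    def __init__(self, depth=2):
        self.depth = depth
        self.caching = False  # starts in the pass-through state
        self.buf = deque()

    def enable_cache(self):
        # "Enabling cache information" from the hang-up management circuit.
        self.caching = True

    def disable_cache(self):
        # "Disabling cache information"; per the text this is issued only
        # after the buffered data has been drained.
        self.caching = False

    def forward(self, data):
        """Return data to emit downstream, or None if it is absorbed."""
        if not self.caching:
            return data  # pass-through: nothing is ever held here
        if len(self.buf) < self.depth:
            self.buf.append(data)  # cache state: absorb, freeing the ring path
            return None
        return None  # buffer full: upstream must keep waiting (backpressure)

cb = ChannelBuffer()
assert cb.forward("a") == "a"   # pass-through state is transparent
cb.enable_cache()
assert cb.forward("b") is None  # cache state absorbs the data
assert list(cb.buf) == ["b"]
```

The two absorbed slots are exactly the "exchange space" the text refers to: they take in-flight data off the ring so the stalled stages behind it can advance.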
Optionally, after the arbiter receives the hang-up signal, it switches from the fair scheduling state to the priority scheduling state and preferentially schedules the data on the ring path.
As can be seen from fig. 6, each arbiter is connected with a plurality of FIFOs. When an arbiter on the ring path in any BR in fig. 6 receives the hang-up signal sent by the hang-up management circuit, it preferentially schedules the data in the gray FIFO; if data of another channel is being transmitted when the hang-up signal arrives, the arbiter switches channels only after one granularity of data transmission on that channel has completed.
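The arbiter's switch between the two scheduling states can be sketched as below. The round-robin fair scheduler and the method names are assumptions; the patent only specifies that ring-path (gray FIFO) data is scheduled first after the hang-up signal and that fair scheduling resumes on release.

```python
from itertools import cycle

class RingArbiter:
    """Arbiter sketch: fair round-robin normally; after the hang-up signal it
    drains the ring-path (gray) FIFO first."""
    def __init__(self, fifos, ring_fifo):
        self.fifos = fifos          # {name: list of pending data} feeding one TX
        self.ring_fifo = ring_fifo  # name of the FIFO that lies on the ring path
        self.priority = False       # fair scheduling state by default
        self.rr = cycle(fifos)      # round-robin pointer used in the fair state

    def on_hang_up(self):           # hang-up signal: fair -> priority
        self.priority = True

    def on_hang_up_release(self):   # release signal: priority -> fair
        self.priority = False

    def grant(self):
        if self.priority and self.fifos[self.ring_fifo]:
            return self.fifos[self.ring_fifo].pop(0)  # ring-path data first
        for _ in range(len(self.fifos)):              # fair round-robin scan
            name = next(self.rr)
            if self.fifos[name]:
                return self.fifos[name].pop(0)
        return None

arb = RingArbiter({"rx0": ["a"], "rx2": [], "ring": ["r1", "r2"]}, ring_fifo="ring")
assert arb.grant() == "a"   # fair state: round-robin order
arb.on_hang_up()
assert arb.grant() == "r1"  # priority state: ring-path FIFO is served first
```

The one-granularity handover described in the text would sit between `on_hang_up` and the next `grant` in real hardware; it is omitted here for brevity.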
In the embodiment of the present invention, the FIFOs on the ring path may be directly connected to the hang-up management circuit, or may be connected to it through the BANK in which they are located.
Step S702: receive the second signals sent by all blocking circuits on the ring path.
The second signal is used for indicating that all the data buffers in the blocking circuit have entered a second state, where the second state indicates that the data buffer has fallen to a preset low waterline.
In one possible implementation, the second state may also be referred to as the empty state.
In one possible implementation, because the channel buffer has been enabled, the ring path is released from the hang-up: the data buffered in the FIFOs on the ring path falls below the low waterline and the FIFOs enter the empty state. In response to all data buffers in any blocking circuit on the ring path reaching the preset low waterline, that blocking circuit sends the second signal to the hang-up management circuit. When every blocking circuit on the ring path has sent a second signal to the hang-up management circuit through its synchronizer, or all FIFOs on the ring path have directly sent empty signals to the hang-up management circuit, the ring path has been released from the hang-up.
Step S703: send disabling cache information to the channel buffer, switching the channel buffer from the cache state to the pass-through state, and send a hang-up release signal to the arbiters on the ring path.
The embodiment of the invention provides an integrated circuit, which comprises a plurality of cores, a network-on-chip, and the above network-on-chip based hang-up release circuit.
The embodiment of the invention provides a board card, which comprises the integrated circuit.
The embodiment of the invention provides a server, which comprises the board card.
As will be appreciated by one skilled in the art, aspects of embodiments of the present invention may be embodied as a system, method or computer program product. Accordingly, aspects of embodiments of the invention may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a "circuit," "module," or "system." Furthermore, aspects of embodiments of the invention may take the form of a computer program product embodied in one or more computer-readable media having computer-readable program code embodied therein.
Any combination of one or more computer readable media may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of embodiments of the present invention, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, such as in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of embodiments of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, smalltalk, C ++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer and partly on a remote computer, as a stand-alone software package, or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention described above describe aspects of embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer-readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.
The above description is only of the preferred embodiments of the present invention and is not intended to limit the present invention, and various modifications and variations may be made to the present invention by those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (12)

1. A network-on-chip based hang-up release circuit, comprising:
a plurality of blocking circuits, wherein each blocking circuit comprises a synchronizer and a plurality of data buffers, and the plurality of blocking circuits are sequentially connected through data channels to form an annular passage;
a channel buffer, arranged on the annular passage and configured to, in a buffer state, buffer the data transmitted on the annular passage;
and a hang-up management circuit, connected with the blocking circuits and the channel buffer and configured to receive a first signal and send enabling buffer information to the channel buffer, wherein the enabling buffer information instructs the channel buffer to switch from a pass-through state to the buffer state so as to provide a data exchange space for releasing the hang-up of the annular passage.
2. The circuit of claim 1, wherein the hang-up management circuit sends the enabling buffer information to the channel buffer to switch the channel buffer from the pass-through state to the buffer state in response to receiving the first signals sent by the synchronizers of all the blocking circuits connected thereto.
3. The circuit of claim 1 or 2, wherein the blocking circuit comprises an arbiter;
the hang-up management circuit is further configured to send a hang-up signal to the arbiter located on the annular passage in the blocking circuit;
and after receiving the hang-up signal, the arbiter switches from a fair scheduling state to a priority scheduling state so as to preferentially schedule the data on the annular passage.
4. The circuit of claim 1 or 2, wherein the hang-up management circuit is further configured to receive a second signal sent by the synchronizer of the blocking circuit and send disabling cache information to the channel buffer;
The second signal is used for indicating that all the data buffers in the block circuit enter a second state, and the second state indicates that the data buffers reach a preset low waterline;
The disabling cache information is used for indicating that the channel buffer area is switched from a cache state to a through state.
5. The circuit of claim 4, wherein the hang-up management circuit sends the disabling cache information to the channel buffer to switch the channel buffer from the cache state to the pass-through state in response to receiving the second signals sent by the synchronizers of all the blocking circuits connected thereto.
6. The circuit of claim 3, wherein the hang-up management circuit is further configured to send a hang-up release signal to the arbiter located on the annular passage in the blocking circuit;
after receiving the hang-up release signal, the arbiter switches from the priority scheduling state to the fair scheduling state.
7. The circuit of claim 1, wherein the channel buffer is disposed on the annular path, comprising:
the channel buffer is arranged between any two of the block circuits, or
The channel buffer is arranged inside any of the block circuits.
8. A network-on-chip based hang-up release circuit, comprising:
a plurality of blocking circuits, wherein each blocking circuit comprises a plurality of data buffers, and the plurality of blocking circuits are sequentially connected through data channels to form an annular passage;
a channel buffer, arranged on the annular passage and configured to, in a buffer state, buffer the data transmitted on the annular passage;
and a hang-up management circuit, connected with the blocking circuits and the channel buffer and configured to, upon receiving the third signals from all the data buffers of the blocking circuits, send enabling buffer information to the channel buffer, wherein the enabling buffer information instructs the channel buffer to switch from a pass-through state to the buffer state so as to provide a data exchange space for releasing the hang-up of the annular passage.
9. A network-on-chip based hang-up release method, comprising:
receiving a plurality of first signals sent by synchronizers of all block circuits on a ring-shaped passage, wherein the ring-shaped passage is formed by sequentially connecting a plurality of block circuits through data channels, and the first signals are used for indicating that all data buffer areas of the block circuits enter a blocking state;
sending enabling buffer information to a channel buffer, switching the channel buffer from a pass-through state to a buffer state, and/or sending a hang-up signal to an arbiter on the annular passage, wherein after receiving the hang-up signal the arbiter switches from a fair scheduling state to a priority scheduling state to preferentially schedule the data on the annular passage, and the channel buffer is arranged on the annular passage.
10. The method of claim 9, wherein the method further comprises:
receiving a plurality of second signals sent by all the block circuits on the annular channel, wherein the second signals are used for indicating that all the data buffer areas of the block circuits enter a second state;
And sending disabling cache information to the channel buffer area, switching the channel buffer area from a cache state to a straight-through state, and/or sending a hang-up release signal to an arbiter on the ring channel, wherein the arbiter receives the hang-up release signal and then switches from the priority scheduling state to the fair scheduling state.
11. A network-on-chip based hang-up release method, comprising:
receiving a plurality of third signals sent by a plurality of data buffers of all block circuits on a ring-shaped passage, wherein the ring-shaped passage is formed by sequentially connecting a plurality of block circuits through data channels, and the third signals are used for indicating that the data buffers enter a blocking state;
sending enabling buffer information to a channel buffer, switching the channel buffer from a pass-through state to a buffer state, and/or sending a hang-up signal to an arbiter on the annular passage, wherein after receiving the hang-up signal the arbiter switches from a fair scheduling state to a priority scheduling state to preferentially schedule the data on the annular passage, and the channel buffer is arranged on the annular passage.
12. An integrated circuit comprising a plurality of network-on-chip based hang-up release circuits according to any one of claims 1-8.
CN202110943339.5A 2021-08-17 2021-08-17 A hang-release circuit and hang-release method based on on-chip network Active CN115914110B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110943339.5A CN115914110B (en) 2021-08-17 2021-08-17 A hang-release circuit and hang-release method based on on-chip network


Publications (2)

Publication Number Publication Date
CN115914110A CN115914110A (en) 2023-04-04
CN115914110B true CN115914110B (en) 2025-09-23

Family

ID=86471417

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110943339.5A Active CN115914110B (en) 2021-08-17 2021-08-17 A hang-release circuit and hang-release method based on on-chip network

Country Status (1)

Country Link
CN (1) CN115914110B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108632172A (en) * 2017-03-23 2018-10-09 华为技术有限公司 Network-on-chip and the dead release method of extension that liquidates
CN110322390A (en) * 2018-03-29 2019-10-11 畅想科技有限公司 For controlling the method and system of processing

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5072420A (en) * 1989-03-16 1991-12-10 Western Digital Corporation FIFO control architecture and method for buffer memory access arbitration
US20030191862A1 (en) * 2001-07-02 2003-10-09 Globespan Virata Incorporated Communications system using rings architecture
US9036482B2 (en) * 2011-12-08 2015-05-19 The Hong Kong University Of Science And Technology Bufferless nonblocking networks on chip
CN104022950B (en) * 2014-06-10 2017-06-06 复旦大学 It is a kind of to share the router topology cached with self-configuring
CN106792831B (en) * 2017-01-25 2019-08-30 合肥工业大学 Congestion avoidance module and method for wireless node level and wireless link level



Similar Documents

Publication Publication Date Title
US9053072B2 (en) End node transactions at threshold-partial fullness of storage space
US9444740B2 (en) Router, method for controlling router, and program
US7042891B2 (en) Dynamic selection of lowest latency path in a network switch
JP5895202B2 (en) Repeater, repeater control method, and computer program
US10749811B2 (en) Interface virtualization and fast path for Network on Chip
US9660942B2 (en) Automatic buffer sizing for optimal network-on-chip design
US11099906B2 (en) Handling tenant requests in a system that uses hardware acceleration components
KR101665035B1 (en) Server node interconnect devices and methods
US8155134B2 (en) System-on-chip communication manager
US20020118692A1 (en) Ensuring proper packet ordering in a cut-through and early-forwarding network switch
US8204054B2 (en) System having a plurality of nodes connected in multi-dimensional matrix, method of controlling system and apparatus
US9185026B2 (en) Tagging and synchronization for fairness in NOC interconnects
US7729258B2 (en) Switching device
US10305825B2 (en) Bus control device, relay device, and bus system
CN115914110B (en) A hang-release circuit and hang-release method based on on-chip network
WO2005093591A1 (en) Integrated circuit and method for transaction abortion
CN108833307B (en) Data exchange device
CN111966736B (en) A high-throughput, low-latency, large-capacity Flume channel and its transmission method
JP2009251652A (en) Multi-core system
CN115695292B (en) Broadcast transmission circuit and method based on network on chip
US9678905B2 (en) Bus controller, bus control system and network interface
JP2015069345A (en) Information processor, data transfer device, and data transfer method
CN117041186B (en) Data transmission method, chip system, computing device and storage medium
JP2007325178A (en) Packet processing system, packet processing method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Country or region after: China

Address after: Room 201, No. 6 Fengtong Heng Street, Huangpu District, Guangzhou City, Guangdong Province

Applicant after: Guangzhou Ximu Semiconductor Technology Co.,Ltd.

Address before: Building 202-24, No. 6, Courtyard 1, Gaolizhang Road, Haidian District, Beijing

Applicant before: Beijing SIMM Computing Technology Co.,Ltd.

Country or region before: China

GR01 Patent grant