CN106453072A - Greedy distribution method and device of on-chip network router channel resources and router - Google Patents
Greedy distribution method and device of on-chip network router channel resources and router Download PDFInfo
- Publication number
- CN106453072A CN106453072A CN201610460938.0A CN201610460938A CN106453072A CN 106453072 A CN106453072 A CN 106453072A CN 201610460938 A CN201610460938 A CN 201610460938A CN 106453072 A CN106453072 A CN 106453072A
- Authority
- CN
- China
- Prior art keywords
- port
- packet
- network
- output port
- greedy
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Links
- 238000000034 method Methods 0.000 title claims abstract description 43
- 238000007781 pre-processing Methods 0.000 claims abstract description 56
- 238000013507 mapping Methods 0.000 claims abstract description 17
- 230000005540 biological transmission Effects 0.000 claims description 38
- 230000008569 process Effects 0.000 claims description 6
- 238000002203 pretreatment Methods 0.000 claims 1
- 238000005516 engineering process Methods 0.000 abstract description 3
- 238000004364 calculation method Methods 0.000 description 12
- 238000012545 processing Methods 0.000 description 11
- 238000010586 diagram Methods 0.000 description 8
- 239000000872 buffer Substances 0.000 description 6
- 239000012634 fragment Substances 0.000 description 6
- 230000007246 mechanism Effects 0.000 description 6
- 238000007726 management method Methods 0.000 description 4
- 238000011160 research Methods 0.000 description 4
- 238000012546 transfer Methods 0.000 description 4
- 238000013467 fragmentation Methods 0.000 description 3
- 238000006062 fragmentation reaction Methods 0.000 description 3
- 230000010365 information processing Effects 0.000 description 3
- 230000008901 benefit Effects 0.000 description 2
- 230000007547 defect Effects 0.000 description 2
- 238000005457 optimization Methods 0.000 description 2
- 230000000694 effects Effects 0.000 description 1
- 239000000284 extract Substances 0.000 description 1
- 230000006870 function Effects 0.000 description 1
- 235000019580 granularity Nutrition 0.000 description 1
- 238000005192 partition Methods 0.000 description 1
- 238000013468 resource allocation Methods 0.000 description 1
- 230000004044 response Effects 0.000 description 1
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L45/00—Routing or path finding of packets in data switching networks
- H04L45/60—Router architectures
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
本发明提供了一种片上网络路由器通道资源的贪婪分配方法,适用于片上网络技术领域,所述贪婪分配方法在交叉开关分配阶段为路由器的内部的输入端口与输出端口进行映射之前进行,包括:预处理步骤,当数据包进入所述输入端口的子通道和虚通道中时,将数据包信息进行归类处理得到预处理信息表;执行步骤,根据所述预处理信息表以及所述数据包对应的所述输出端口的大小,选择最大限度多个所述数据包匹配所述子通道和所述输出端口。同时还提供一种片上网络路由器通道资源的贪婪分配装置。借此,本发明利用贪婪算法思想对输入端口和输出端口之间进行通道的匹配,达到趋向于最优的映射,提高通道的利用率。
The present invention provides a method for greedily allocating channel resources of a network-on-chip router, which is applicable to the field of network-on-chip technology. The greedy allocation method is performed before the internal input port and output port of the router are mapped in the crossbar allocation stage, including: In the preprocessing step, when the data packet enters the sub-channel and virtual channel of the input port, the data packet information is classified and processed to obtain a preprocessing information table; the execution step is based on the preprocessing information table and the data packet Corresponding to the size of the output port, select a maximum number of the data packets to match the sub-channel and the output port. At the same time, it also provides a device for greedily allocating channel resources of the on-chip network router. Thereby, the present invention uses the idea of greedy algorithm to match the channel between the input port and the output port, so as to achieve the mapping that tends to be optimal and improve the utilization rate of the channel.
Description
技术领域technical field
本发明涉及片上网络技术领域,尤其涉及一种片上网络路由器通道资源的贪婪分配方法、装置及路由器。The present invention relates to the field of on-chip network technology, in particular to a greedy allocation method, device and router for on-chip network router channel resources.
背景技术Background technique
在大规模众核处理器中,为了满足众核之间的数据传输需求,片上网络越来越发挥着重要的作用。随着面向互联网、云计算等服务应用的多样化和复杂化,传输在片上网络的数据包类型也变的多样化和复杂化。同时,为了降低Cache功耗和提高Cache的性能和利用率,细粒度的Cache管理也被越来越多的研究工作所提出。因此,网络中传输的数据包大小不再是规则的Cache块大小,各种粒度的数据包大小(控制信息+小粒度访存)穿梭在网络中,严重降低了传统固定宽度的片上网络通道的利用率。针对此现象,众多的研究工作提出了细粒度的片上网络划分机制,将固定宽度的片上网络分割成多个细粒度的链路通道,这样可以适应性地传输不同大小的数据包,提高片上网络的有效利用率。In large-scale many-core processors, in order to meet the data transmission requirements among many-cores, the network-on-chip is playing an increasingly important role. With the diversification and complexity of Internet-oriented, cloud computing and other service applications, the types of data packets transmitted in the network-on-chip are also becoming more diverse and complex. At the same time, in order to reduce Cache power consumption and improve Cache performance and utilization, fine-grained Cache management has been proposed by more and more research work. Therefore, the size of data packets transmitted in the network is no longer the regular Cache block size, and data packet sizes of various granularities (control information + small-grained memory access) shuttle in the network, seriously reducing the traditional fixed-width on-chip network channel. utilization rate. In response to this phenomenon, numerous research works have proposed a fine-grained on-chip network partition mechanism, which divides a fixed-width on-chip network into multiple fine-grained link channels, so that data packets of different sizes can be adaptively transmitted, and the network on a chip can be improved. effective utilization rate.
在已报导和所能查阅到的国内外相关研究中,关于片上网络路由器通道资源分配算法研究情况可列举、分析、总结如下:Among the related domestic and foreign researches that have been reported and can be consulted, the research on channel resource allocation algorithms of network-on-chip routers can be listed, analyzed, and summarized as follows:
Hesse等人提出了将宽通道链路拆分成细粒度的通道宽度,同时数据包也变为与细粒度通道宽度一致的数据包。并且,通道设置为双向可变的通道,根据双向的数据包的流量压力动态调节通道的传输方向。Hesse et al. proposed to split wide-channel links into fine-grained channel widths, and at the same time, data packets also become packets consistent with the fine-grained channel widths. Moreover, the channel is set as a two-way variable channel, and the transmission direction of the channel is dynamically adjusted according to the flow pressure of the two-way data packets.
Krishna等人根据大量的数据包通常会分享一些传输通道,因此提出了将分享同一路线的小数据包合并后传输,将目的地不同的合并数据包进行拆分后传输至不同的方向,提高通道的利用率。通常这样的数据包出现在支持广播机制的片上网络中。Krishna et al. based on the fact that a large number of data packets usually share some transmission channels, so they proposed to combine small data packets sharing the same route and then transmit them, split the merged data packets with different destinations and transmit them to different directions, and improve the channel. utilization rate. Typically such packets appear in a network-on-chip that supports a broadcast mechanism.
Jin等人提出了将数据包进行压缩,压缩之后的数据包可以共享使用同一传输通道,进而提高传输通道的利用率和吞吐量。Jin et al. proposed to compress the data packets, and the compressed data packets can share the same transmission channel, thereby improving the utilization rate and throughput of the transmission channel.
Junghee Lee等人提出了细粒度片上通道控制并配合数据包在不同虚通道中的“窃取”和“分享”技术,提高端口虚通道的利用率,降低数据包传输的延 迟,提高传输效率。Junghee Lee et al. proposed fine-grained on-chip channel control and combined with the "stealing" and "sharing" technology of data packets in different virtual channels to improve the utilization of port virtual channels, reduce the delay of data packet transmission, and improve transmission efficiency.
Reetuparna Das等人提出了灵活的路由器内部输入端口和输出端口之间的仲裁管理机制。对于申请同一输出端口的输入端口来说,如果两个输入端口都存在小数据包时,那么就将小数据包在同一时刻传输至输出端口,提高数据传输的并行性同时提高通道的有效利用率。Reetuparna Das et al. proposed a flexible arbitration management mechanism between input ports and output ports inside a router. For input ports that apply for the same output port, if there are small data packets in both input ports, then the small data packets are transmitted to the output port at the same time, improving the parallelism of data transmission and improving the effective utilization of the channel .
随着片上数据处理单元的逐步增加,片上网络的传输性能必将受到越来越多的关注和压力。新型的大数据网络服务应用的出现,对传统的片上网络提出了更高的要求。因此,如何进一步提高片上网络的传输效率也将会成为研究者不断探索的热点方向之一。With the gradual increase of on-chip data processing units, the transmission performance of on-chip networks will receive more and more attention and pressure. The emergence of new big data network service applications has put forward higher requirements for traditional on-chip networks. Therefore, how to further improve the transmission efficiency of the network on chip will also become one of the hot directions that researchers continue to explore.
综上可知,现有技术在实际使用上显然存在不便与缺陷,所以有必要加以改进。In summary, there are obviously inconveniences and defects in the actual use of the prior art, so it is necessary to improve it.
发明内容Contents of the invention
针对上述的缺陷,本发明的目的在于提供一种高密度片上网络的路网实现方法、装置及路由器,其目的在于改善片上网络路由器体系结构设计及路由器资源分配,使得面向细粒度控制的路由器内部通道资源的高效分配,实现局部最优的路由器内部输入端口和输出端口仲裁映射算法。In view of the above-mentioned defects, the purpose of the present invention is to provide a road network implementation method, device and router of a high-density network-on-chip. The efficient allocation of channel resources realizes the locally optimal arbitration mapping algorithm for the input port and output port inside the router.
为了实现上述目的,本发明提供一种片上网络路由器通道资源的贪婪分配方法,所述贪婪分配方法在片上网络路由器的交叉开关分配阶段为路由器的内部的输入端口与输出端口进行映射之前进行,包括:In order to achieve the above object, the present invention provides a method for greedily allocating network-on-chip router channel resources. The greedy allocation method is carried out before the cross-bar switch allocation stage of the network-on-chip router is mapped to the internal input port and output port of the router, including :
预处理步骤,当数据包进入所述输入端口的子通道和虚通道中时,将数据包信息进行归类处理得到预处理信息表;A preprocessing step, when a data packet enters the sub-channel and the virtual channel of the input port, classify the data packet information to obtain a pre-processing information table;
执行步骤,根据所述预处理信息表以及所述数据包对应的所述输出端口的大小,选择最大限度多个所述数据包匹配所述子通道和所述输出端口。Executing a step of selecting a maximum number of data packets to match the sub-channel and the output port according to the preprocessing information table and the size of the output port corresponding to the data packet.
根据本发明所述片上网络路由器通道资源的贪婪分配方法,所述预处理步骤还包括:According to the greedy allocation method of network-on-chip router channel resources of the present invention, the preprocessing step also includes:
当所述数据包传至所述输入端口时,收集路由计算处理结果;When the data packet is transmitted to the input port, collecting routing calculation processing results;
根据路由计算的所述数据包信息按预期发往的所述输出端口分类创建独立的所述预处理信息表;Create an independent preprocessing information table according to the packet information calculated by routing according to the expected output port classification;
将每个所述子通道的所述数据包期望的所述输出端口更新至对应的所述预 处理信息表;Updating the expected output port of the data packet of each subchannel to the corresponding preprocessing information table;
所述预处理信息表包括发往的所述输出端口的子端口信息、数据包的大小信息和/或所述数据包占用的所述子通道信息。The preprocessing information table includes subport information of the output port sent to, size information of the data packet, and/or information of the subchannel occupied by the data packet.
根据本发明所述片上网络路由器通道资源的贪婪分配方法,所述执行步骤还包括:According to the greedy allocation method of network-on-chip router channel resources of the present invention, the execution step also includes:
以轮询算法轮询选择一所述输入端口作为基准端口;Polling and selecting one of the input ports as a reference port by a polling algorithm;
所述轮询算法轮询的包括依次选择不同的所述输入端口,每次选择执行i=(i+1)mod n,选出第i个端口,n为端口总数;所述基准端口的所述数据包优先传输到同一所述输出端口。The polling algorithm polling includes selecting different input ports in turn, each selection is performed i=(i+1) mod n, and the i-th port is selected, and n is the total number of ports; all of the reference ports said data packets are preferentially transmitted to the same said output port.
根据本发明所述片上网络路由器通道资源的贪婪分配方法,所述基准端口的所述数据包优先传输到同一所述输出端口的步骤还包括:According to the greedy allocation method of network-on-chip router channel resources of the present invention, the step of preferentially transmitting the data packets of the reference port to the same output port further includes:
若所述基准端口的所述子通道都有所述数据包期望发往同一个所述输出端口,则不再选择后续所述输入端口的子通道的所述数据包;If the sub-channels of the reference port all have the data packets expected to be sent to the same output port, then no longer select the data packets of the subsequent sub-channels of the input port;
若所述基准端口的所述子通道中的所述数据包没有填满传输链路,依次选择后续所述输入端口的所述数据包。If the data packets in the sub-channels of the reference port do not fill up the transmission link, sequentially select the data packets of the subsequent input ports.
根据本发明所述片上网络路由器通道资源的贪婪分配方法,还包括:According to the greedy allocation method of network-on-chip router channel resources of the present invention, it also includes:
在所述片上网络路由器通道资源的贪婪分配方法前执行虚通道分配;Perform virtual channel allocation before the greedy allocation method of channel resources of the network-on-chip router;
根据所述片上网络路由器通道资源的贪婪分配方法的处理结果在所述交叉开关分配阶段传输所述数据包。The data packet is transmitted in the crossbar allocation stage according to the processing result of the greedy allocation method of the network-on-chip router channel resources.
本发明一种片上网络路由器通道资源的贪婪分配装置,所述贪婪分配装置设置于路由器中,所述贪婪分配装置包括:The present invention is a greedy allocation device for network-on-chip router channel resources, the greedy allocation device is set in the router, and the greedy allocation device includes:
预处理模块,用于当数据包进入所述输入端口的子通道和虚通道中时,将数据包信息进行归类处理得到预处理信息表;A pre-processing module, configured to classify the data packet information to obtain a pre-processing information table when the data packet enters the sub-channel and the virtual channel of the input port;
执行模块,用于根据所述预处理信息表以及所述数据包对应的所述输出端口的大小,选择最大限度多个所述数据包匹配所述子通道和所述输出端口。An execution module, configured to select a maximum number of data packets to match the sub-channel and the output port according to the preprocessing information table and the size of the output port corresponding to the data packet.
根据本发明所述片上网络路由器通道资源的贪婪分配装置,所述预处理模块还包括:According to the greedy allocation device of network-on-chip router channel resources of the present invention, the preprocessing module further includes:
路由收集子模块,用于当所述数据包传至输入端口时,收集路由计算处理结果;信息处理子模块,用于根据路由计算的所述数据包信息按预期发往的所述输出端口分类创建独立的所述预处理信息表;The routing collection submodule is used to collect the routing calculation processing results when the data packet is transmitted to the input port; the information processing submodule is used to classify the output port to which the data packet information according to the routing calculation is expected to be sent creating a separate table of said preprocessing information;
信息更新子模块,用于将每个所述子通道的所述数据包期望的所述输出端口更新至对应的所述预处理信息表;An information update submodule, configured to update the expected output port of the data packet of each subchannel to the corresponding preprocessing information table;
所述预处理信息表包括发往的所述输出端口的子端口信息、数据包的大小信息和/或所述数据包占用的所述子通道信息。The preprocessing information table includes subport information of the output port sent to, size information of the data packet, and/or information of the subchannel occupied by the data packet.
根据本发明所述片上网络路由器通道资源的贪婪分配装置,所述执行模块还包括:According to the device for greedily allocating network-on-chip router channel resources of the present invention, the execution module further includes:
基准选择子模块,用于以轮询算法轮询选择一所述输入端口作为基准端口;A reference selection sub-module, configured to use a polling algorithm to poll and select one of the input ports as a reference port;
所述轮询算法轮询的包括依次选择不同的所述输入端口,每次选择执行i=(i+1)mod n,选出第i个端口,n为端口总数;The polling algorithm polling includes selecting different input ports in turn, each selection is performed i=(i+1) mod n, and the i-th port is selected, and n is the total number of ports;
局部优先子模块,用于所述基准端口的所述数据包优先传输到同一所述输出端口。The local priority sub-module is configured to preferentially transmit the data packets for the reference port to the same output port.
根据本发明所述片上网络路由器通道资源的贪婪分配装置,所述执行模块还包括:According to the device for greedily allocating network-on-chip router channel resources of the present invention, the execution module further includes:
第一选择子模块,用于若所述基准端口的所述子通道都有所述数据包期望发往同一个所述输出端口,则不再选择后续所述输入端口的子通道的所述数据包;The first selection sub-module is configured to no longer select the data of the subsequent sub-channels of the input port if the data packets in the sub-channels of the reference port are expected to be sent to the same output port Bag;
第二选择子模块,用于若所述基准端口的所述子通道中的所述数据包没有填满传输链路,依次选择后续所述输入端口的所述数据包。The second selection submodule is configured to sequentially select the data packets of the subsequent input ports if the data packets in the sub-channels of the reference port do not fill up the transmission link.
本发明还提供一种包括如上述任一项所述片上网络路由器通道资源的贪婪分配装置的路由器。The present invention also provides a router including the device for greedily allocating network-on-chip router channel resources as described above.
本发明在路由器内部添加具有预处理模块和执行模块的贪婪分配装置,通过利用贪婪算法思想为路由器内部的输入端口和输出端口之间进行趋向最优的映射,适用于细粒度通道网络中,即将传统的片上宽通路分割成低宽度的子传输通道,在细粒度的片上网络路由器中,当各子通道的数据包经过路由计算后,将子通道的信息更新至预处理信息表中,执行模块通过对预处理表信息的处理,实现了对输入端口和输出端口之间进行通道的匹配,达到趋向于最优的映射,提高通道的利用率。The present invention adds a greedy allocation device with a preprocessing module and an execution module inside the router, and uses the idea of greedy algorithm to perform an optimal mapping between the input port and the output port inside the router, which is suitable for fine-grained channel networks. The traditional on-chip wide channel is divided into low-width sub-transmission channels. In the fine-grained on-chip network router, after the data packets of each sub-channel are routed and calculated, the information of the sub-channel is updated to the preprocessing information table, and the execution module Through the processing of the preprocessing table information, the channel matching between the input port and the output port is realized, the mapping tends to be optimal, and the utilization rate of the channel is improved.
附图说明Description of drawings
图1是本发明片上网络路由器通道资源的贪婪分配装置的贪婪分配装置结 构示意图;Fig. 1 is the greedy distribution device structure schematic diagram of the greedy distribution device of network-on-chip router channel resource of the present invention;
图2是本发明片上网络路由器通道资源的贪婪分配装置的贪婪分配装置的优选实施例的结构示意图;Fig. 2 is a schematic structural diagram of a preferred embodiment of the greedy allocation device of the greedy allocation device of the network-on-chip router channel resources of the present invention;
图3是本发明片上网络路由器通道资源的贪婪分配装置的路由器具体实施例示意图;Fig. 3 is a schematic diagram of a specific embodiment of a router of a device for greedily allocating network-on-chip router channel resources according to the present invention;
图4是本发明片上网络路由器通道资源的贪婪分配方法流程示意图;Fig. 4 is a schematic flow chart of a greedy allocation method of network-on-chip router channel resources of the present invention;
图5是本发明片上网络路由器通道资源的贪婪分配方法优选流程实施例示意图;Fig. 5 is a schematic diagram of an embodiment of a greedy allocation method of network-on-chip router channel resources in the present invention;
图6是片上网络路由器通道资源的贪婪分配方法实现的具体实施例mesh网络示意图;Fig. 6 is a schematic diagram of a mesh network of a specific embodiment realized by a method for greedy allocation of network-on-chip router channel resources;
图7是本发明片上网络路由器通道资源的贪婪分配方法的具体实施例数据包处理完整流程示意图;Fig. 7 is a schematic diagram of a specific embodiment of the method for greedily allocating network-on-chip router channel resources of the present invention and a complete flow of data packet processing;
图8是本发明片上网络路由器通道资源的贪婪分配方法具体实施例之执行步骤示意图;Fig. 8 is a schematic diagram of execution steps of a specific embodiment of a method for greedily allocating network-on-chip router channel resources of the present invention;
图9是本发明片上网络路由器通道资源的贪婪分配方法传输状态示意图。FIG. 9 is a schematic diagram of the transmission state of the method for greedy allocation of network-on-chip router channel resources according to the present invention.
具体实施方式detailed description
为了使本发明的目的、技术方案及优点更加清楚明白,以下结合附图及实施例,对本发明进行进一步详细说明。应当理解,此处所描述的具体实施例仅仅用以解释本发明,并不用于限定本发明。In order to make the object, technical solution and advantages of the present invention clearer, the present invention will be further described in detail below in conjunction with the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention, not to limit the present invention.
为了解决上述问题,本发明提供一种片上网络路由器通道资源的贪婪分配装置,结合图示说明,如图1~图3所示,所述贪婪分配装置40设置在路由器100中,贪婪分配装置40包括:In order to solve the above-mentioned problems, the present invention provides a device for greedily allocating network-on-chip router channel resources. With reference to illustrations, as shown in FIGS. include:
预处理模块41,用于当数据包进入所述输入端口的子通道和虚通道中时,将数据包信息进行归类处理得到预处理信息表;A pre-processing module 41, configured to classify the data packet information to obtain a pre-processing information table when the data packet enters the sub-channel and the virtual channel of the input port;
执行模块42,用于根据所述预处理信息表以及所述数据包对应的所述输出端口的大小,选择最大限度多个所述数据包匹配所述子通道和所述输出端口。The execution module 42 is configured to select a maximum number of the data packets to match the sub-channel and the output port according to the preprocessing information table and the size of the output port corresponding to the data packet.
进一步地,如图3所示,贪婪分配装置40还包括:Further, as shown in Figure 3, the greedy distribution device 40 also includes:
路由收集子模块411,用于当所述数据包传至输入端口时,收集路由计算处理结果;The routing collection sub-module 411 is used to collect routing calculation processing results when the data packet is transmitted to the input port;
相应地,如图2所示,预处理模块41还包括:Correspondingly, as shown in Figure 2, the preprocessing module 41 also includes:
信息处理子模块412,用于根据路由计算的所述数据包信息按预期发往的所述输出端口分类创建独立的所述预处理信息表;The information processing sub-module 412 is configured to create an independent preprocessing information table according to the expected output port classification of the data packet information calculated by routing;
信息更新子模块413,用于将每个所述子通道的所述数据包期望的所述输出端口更新至对应的所述预处理信息表;An information update submodule 413, configured to update the expected output port of the data packet of each subchannel to the corresponding preprocessing information table;
所述预处理信息表包括发往的所述输出端口的子端口信息、数据包的大小信息和/或所述数据包占用的所述子通道信息,具体地可参见图9。The preprocessing information table includes subport information of the output port sent to, size information of the data packet and/or information of the subchannel occupied by the data packet, specifically refer to FIG. 9 .
进一步地,贪婪分配装置40,优选的是,所述执行模块42还包括:Further, the greedy allocation device 40, preferably, the execution module 42 also includes:
基准选择子模块421,用于以轮询算法轮询选择一所述输入端口作为基准端口;A reference selection submodule 421, configured to use a polling algorithm to poll and select one of the input ports as a reference port;
基准选择子模块421还用于以轮询的方式依次选择不同的所述输入端口,每次选择执行i=(i+1)mod n,选出第i个端口,n为端口总数;The reference selection sub-module 421 is also used to sequentially select different input ports in a polling manner, and execute i=(i+1) mod n each time to select the i-th port, where n is the total number of ports;
局部优先子模块422,用于所述基准端口的所述数据包优先传输到同一所述输出端口;A local priority sub-module 422, configured to preferentially transmit the data packets of the reference port to the same output port;
第一选择子模块423,用于若所述基准端口的所述子通道都有所述数据包期望发往同一个所述输出端口,则不再选择后续所述输入端口的子通道的所述数据包;The first selection sub-module 423 is configured to no longer select the sub-channel of the subsequent input port if the data packets in the sub-channels of the reference port are expected to be sent to the same output port. data pack;
第二选择子模块424,用于若所述基准端口的所述子通道中的所述数据包没有填满传输链路,依次选择后续所述输入端口的所述数据包。The second selection sub-module 424 is configured to sequentially select the data packets of the subsequent input ports if the data packets in the sub-channel of the reference port do not fill up the transmission link.
与贪婪分配装置40对应的是,本发明提供一种包括如上述任一项所述片上网络路由器通道资源的贪婪分配装置40的路由器100。如图3所示的一个具体实施例,路由器100还包括:Corresponding to the greedy allocating device 40, the present invention provides a router 100 including the greedy allocating device 40 for network-on-chip router channel resources as described above. A specific embodiment shown in Figure 3, router 100 also includes:
路由计算模块20,用于当所述数据包传至输入端口时,进行路由计算;A route calculation module 20, configured to perform route calculation when the data packet is transmitted to the input port;
虚通道分配器30,用于在片上网络路由器通道资源的贪婪分配方法之前执行虚通道分配;交叉开关分配器50,用于根据片上网络路由器通道资源的贪婪分配方法的处理结果在所述交叉开关分配阶段传输所述数据包;路由器100的所述子通道与所述输出端口采用全相联的映射方式。The virtual channel allocator 30 is used to perform virtual channel allocation before the greedy allocation method of the network-on-chip router channel resources; the crossbar allocator 50 is used to perform the virtual channel allocation according to the processing result of the greedy allocation method of the network-on-chip router channel resources. The data packet is transmitted in the allocation stage; the sub-channel of the router 100 and the output port adopt a full-associative mapping manner.
本发明的总体思路是:在路由器100内部流水线一般存在虚通道分配(VirtualChannel Allocation,VA)、路由计算(Route Computing,RC)、交叉开关分配(SwitchAllocation,SA)、交叉开关传输(Switch Transfer,ST)、 链路传输(Link Transfer,LT)等阶段。其中,在交叉开关分配(SA)阶段,是为路由器内部输入端口与所述输出端口进行映射,为下一阶段的数据传输做准备。在传统的片上网络实现机制中,数据包的大小与通道宽度一致,而在细粒度管理的片上网络实现机制中,一定数量的数据包大小以及一些控制信息的大小远低于通道宽度,按所述子通道的数量分配对应的路由器100之上的输入缓冲区、交叉开关、输出缓冲区、虚通道、多选一选择器和路由到路由之间的链路,多选一选择器包括二选一选择器或是四选一选择器等类型的多选一选择器。The general idea of the present invention is: generally there are virtual channel allocation (VirtualChannel Allocation, VA), route calculation (Route Computing, RC), crossbar allocation (SwitchAllocation, SA) and crossbar transmission (Switch Transfer, ST) in the internal assembly line of router 100. ), link transfer (Link Transfer, LT) and other stages. Wherein, in the crossbar assignment (SA) stage, the internal input port of the router is mapped to the output port, so as to prepare for the data transmission in the next stage. In the traditional implementation mechanism of network-on-chip, the packet size is consistent with the channel width, while in the implementation mechanism of network-on-chip with fine-grained management, the size of a certain number of data packets and some control information are much lower than the channel width. The number of sub-channels is allocated to the corresponding input buffer, crossbar switch, output buffer, virtual channel, multi-choice one selector and the link between the router and the router 100. The multi-choice one selector includes the two-choice One selector or four select one selector and other types of multi-select one selector.
如图6所示为细粒度片上网络通道实现机制,传统的高宽度通道被分割成细粒度的子通道13,子通道与所述输出端口采用全相联的映射方式。假设为Mesh结构的片上网络路由器,输入端口11,12为5个输入端口中的其中之一。如图6所示的子通道贪婪算法映射原理,在输入端口中,128bit宽度数据通道被分割成4个子通道,每个子通道VC Creditchannel1为32bit宽,包含1个二选一选择器14和2个虚通道VC0、VC1,处于同等的地位,每个输入端口共4个二选一选择器和8个虚通道。在数据包的传输过程中,每个所述输入端口中,如输入端口0每个所述子通道可能传输不同的数据包,也存在几个不同的虚通道传输同一个大数据包的可能性,即当某一数据包大于子通道宽度,它便可以占据满足自己需求宽度的多个相邻子通道。因此对于每个输入端口来说,并不是所有输入端口中等待发送到所述输出端口0~8的数据包的大小都是128bit,每个所述输出端口每次最多只能接受128bit的数据传输。因此在进行交叉开关分配(SA)阶段时,有交叉开关分配器执行处理,数据包通过交叉开关51,5个输入端口和5个所述输出端口进行匹配。然而,当传统的端口被划分为细粒度的子通道时,输入端口和所述输出端口之间的映射会变的复杂很多,传统的轮询(Round-Robin)算法之类的算法并不一定能保证每次选出的内容是合理或是趋向于最优的,提出贪婪算法映射来最大化提高数据通道的利用率,即片上网络路由器通道资源的贪婪分配方法实现贪婪算法思想的映射方式,通过对基准端口到非基准端口逐个局部最优化获得整体最优化的输入端口到输出端口的映射。As shown in FIG. 6 , the implementation mechanism of the fine-grained on-chip network channel, the traditional high-width channel is divided into fine-grained sub-channels 13, and the sub-channels and the output ports are mapped in a fully associative manner. It is assumed that it is an on-chip network router with a Mesh structure, and the input ports 11 and 12 are one of the five input ports. The sub-channel greedy algorithm mapping principle shown in Figure 6, in the input port, the 128bit wide data channel is divided into 4 sub-channels, and each sub-channel VC Creditchannel1 is 32bit wide, including 1 alternative selector 14 and 2 The virtual channels VC0 and VC1 are in the same position, and each input port has 4 alternative selectors and 8 virtual channels. During the transmission of data packets, in each of the input ports, such as input port 0, each of the sub-channels may transmit different data packets, and there is also the possibility that several different virtual channels transmit the same large data packet , that is, when a data packet is larger than the sub-channel width, it can occupy multiple adjacent sub-channels that meet its own required width. Therefore, for each input port, not all the data packets waiting to be sent to the output ports 0 to 8 in all input ports have a size of 128 bits, and each output port can only accept 128 bits of data transmission at a time. . So in the Crossbar Assignment (SA) phase, there is a Crossbar Assignor to perform the processing, the data packet is matched through the Crossbar 51, 5 input ports and 5 said output ports. However, when a traditional port is divided into fine-grained sub-channels, the mapping between the input port and the output port will become much more complicated, and algorithms such as the traditional round-robin (Round-Robin) algorithm do not necessarily It can ensure that the content selected each time is reasonable or tends to be optimal. A greedy algorithm mapping is proposed to maximize the utilization rate of the data channel, that is, the greedy allocation method of the router channel resources of the on-chip network realizes the mapping method of the greedy algorithm idea. The overall optimized input port to output port mapping is obtained by local optimization of the reference port to the non-reference port one by one.
更进一步地,为了使得本发明一种片上网络路由器通道资源的贪婪分配方法阐述更清楚,基于所述贪婪分配装置40及所述路由器100实现,所述片上网 络路由器通道资源的贪婪分配方法在交叉开关分配阶段为路由器的内部的输入端口与输出端口进行映射之前进行,如图4所示的流程图,包括:Furthermore, in order to make the description of a greedy allocation method of network-on-chip router channel resources in the present invention clearer, based on the realization of the greedy allocation device 40 and the router 100, the greedy allocation method of network-on-chip router channel resources in the cross The switch allocation stage is performed before the internal input port and output port of the router are mapped, as shown in the flow chart in Figure 4, including:
步骤S401,预处理步骤,当数据包进入所述输入端口的子通道和虚通道中时,将数据包信息进行归类处理得到预处理信息表;Step S401, a preprocessing step, when a data packet enters the sub-channel and virtual channel of the input port, classify the data packet information to obtain a pre-processing information table;
步骤S402,执行步骤,根据所述预处理信息表以及所述数据包对应的所述输出端口的大小,选择最大限度多个所述数据包匹配所述子通道和所述输出端口;Step S402, performing a step of selecting a maximum number of data packets to match the sub-channel and the output port according to the preprocessing information table and the size of the output port corresponding to the data packet;
这两步骤分别由预处理模块41和执行模块42执行,预处理信息表作为后续数据包和子通道管理依据,执行模块42根据预处理信息表获得数据包的大小、所在的子通道、要去往的输出端口。These two steps are respectively carried out by the preprocessing module 41 and the execution module 42, the preprocessing information table is used as the basis for subsequent data packet and sub-channel management, and the execution module 42 obtains the size of the data packet, the sub-channel where it is located, and the destination to and from according to the preprocessing information table. output port.
由于每个所述输出端口每次最多只能接受128bit的数据传输,对于具有5个输入端口,每个所述输入端口划分4个子通道,那么任意一个所述输出端口的每次传输的多个数据包的总大小不能超过128bit,而由于数据包大小不一,每次的数量也就只能根据能传输到同一所述输出端口的所有数据包的大小信息进行集合,得到以下的预处理信息表,将其中传输到所述输出端口直到子通道填满或不能再合并传输后续的数据包,获得局部最优化的效果。Since each said output port can only accept data transmission of 128bit at most each time, for having 5 input ports, each said input port is divided into 4 sub-channels, so the number of each transmission of any one of said output ports The total size of the data packet cannot exceed 128bit, and because the data packet size is different, the number of each time can only be collected according to the size information of all data packets that can be transmitted to the same output port, and the following preprocessing information is obtained table, and transmit it to the output port until the sub-channel is full or the subsequent data packets cannot be combined and transmitted, so as to obtain the effect of local optimization.
在预处理步骤,预处理模块10获取了数据包信息,所有子通道中,预期发往同一个所述输出端口的所有数据包的信息发往对应的信息表中:In the preprocessing step, the preprocessing module 10 has obtained the data packet information, and in all sub-channels, the information of all data packets expected to be sent to the same output port is sent to the corresponding information table:
预处理信息表Preprocessing information table
如预处理信息表所示,第一列属性sub_channel代表对应的是输入端口的哪个子端口;第二列属性size代表数据包的大小;第三至第五列S0、S1、S2,代表如果size大于1时,数据包的其它部分在哪个子通道中,因为这个数据包是一个整体,传输的时候不能将其拆开。As shown in the preprocessing information table, the attribute sub_channel in the first column represents which sub-port of the corresponding input port; the attribute size in the second column represents the size of the data packet; the third to fifth columns S0, S1, S2 represent if the size When it is greater than 1, which sub-channel is the other part of the data packet in, because the data packet is a whole and cannot be disassembled during transmission.
执行步骤中,执行模块42的作用是根据预处理过程的结果,以输出端口为中心,在端口大小为上限的条件下,选择尽可能多的数据包。但是,为了保证 公平性,算法需遵循一定的规则,即:一、轮询选择基准端口;二、基准端口中的数据包优先传输。In the execution step, the function of the execution module 42 is to select as many data packets as possible with the output port as the center and the port size as the upper limit according to the result of the preprocessing process. However, in order to ensure fairness, the algorithm needs to follow certain rules, namely: 1. Polling to select the reference port; 2. Data packets in the reference port are transmitted preferentially.
进一步地,如图5所示,所述片上网络路由器通道资源的贪婪分配方法一个优选实施例,步骤还包括:Further, as shown in FIG. 5, in a preferred embodiment of the method for greedy allocation of network-on-chip router channel resources, the steps further include:
步骤S501,当所述数据包传至所述输入端口时,进行路由计算;Step S501, when the data packet is transmitted to the input port, perform route calculation;
这一步由路由收集子模块411实现,与路由器的路由计算模块实现协作;This step is realized by the route collection sub-module 411, which cooperates with the route calculation module of the router;
步骤S502,根据路由计算的所述数据包信息按预期发往的所述输出端口分类创建独立的所述预处理信息表;信息处理子模块412根据输出端口进行分类创建不同的预处理信息表,方便执行。Step S502, creating an independent preprocessing information table according to the expected output port classification of the data packet information calculated according to the route; the information processing sub-module 412 classifies and creates different preprocessing information tables according to the output port, Easy to execute.
步骤S503,将每个所述子通道的所述数据包期望的所述输出端口更新至对应的所述预处理信息表;Step S503, updating the expected output port of the data packet of each sub-channel to the corresponding preprocessing information table;
所述预处理信息表包括发往的所述输出端口的子端口信息、数据包的大小信息和/或所述数据包占用的所述子通道信息。The preprocessing information table includes subport information of the output port sent to, size information of the data packet, and/or information of the subchannel occupied by the data packet.
信息更新子模块413通过路由计算的结果将每个所述子通道的所述数据包期望的所述输出端口更新至对应的所述预处理信息表,与现有路由器其它的部分进行协作。The information update sub-module 413 updates the expected output port of the data packet of each sub-channel to the corresponding pre-processing information table according to the route calculation result, and cooperates with other parts of the existing router.
步骤S504,以轮询(Round-Robin)算法轮询选择一所述输入端口作为基准端口;这一步通过基准选择子模块421执行。Step S504 , polling and selecting one of the input ports as a reference port by using a round-robin algorithm; this step is executed by the reference selection sub-module 421 .
优选的是,步骤S504中:Preferably, in step S504:
所述轮询(Round-Robin)算法包括以轮询的方式依次选择不同的所述输入端口,每次选择执行i=(i+1)mod n,选出第i个端口,n为端口总数;以轮询(Round-Robin)算法选择基准端口。基准选择子模块421,用于以轮询算法轮询选择一所述输入端口作为基准端口;基准选择子模块421依次选择不同的所述输入端口,局部优先子模块422将所述基准端口的所述数据包优先传输到同一所述输出端口;所谓基准端口即以此输入端口的数据包大小为初始大小,通过贪婪算法选择其它输入端口的数据包,并且若基准端口数据包大小和链路宽度一致,如链路宽度为128bit,数据包为128bit,即数据包已经填满所有子通道(链路),此时不会去选择其它端口的数据包,最终获取局部最优策略。轮询选择基准端口的原因是防止造成端口传输的不公平,因为如果不进行轮询的话,第一个被选择为基准端口的输入端口存在极端可能,即每次都被选择为基准端 口。The polling (Round-Robin) algorithm includes sequentially selecting different input ports in a polling manner, each selection is performed i=(i+1)mod n, and the i-th port is selected, and n is the total number of ports ; Select the reference port by round-robin algorithm. The reference selection sub-module 421 is used to poll and select one of the input ports as the reference port with a polling algorithm; the reference selection sub-module 421 selects different input ports in turn, and the local priority sub-module 422 selects all of the reference ports The above data packets are preferentially transmitted to the same output port; the so-called reference port is to use the data packet size of this input port as the initial size, select the data packets of other input ports through a greedy algorithm, and if the data packet size of the reference port and the link width Consistent, if the link width is 128bit, the data packet is 128bit, that is, the data packet has filled all the sub-channels (links), at this time, the data packets of other ports will not be selected, and finally the local optimal strategy will be obtained. The reason for polling to select the reference port is to prevent unfair port transmission, because if there is no polling, the first input port selected as the reference port is extremely likely to be selected as the reference port every time.
步骤S505,所述基准端口的所述数据包优先传输到同一所述输出端口;这一步由局部优先子模块422执行;Step S505, the data packets of the reference port are preferentially transmitted to the same output port; this step is performed by the local priority sub-module 422;
步骤S506,若所述基准端口的所述子通道都有所述数据包期望发往同一个所述输出端口,则不再选择后续所述输入端口的子通道的所述数据包;Step S506, if all of the sub-channels of the reference port have the data packets expected to be sent to the same output port, then no longer select the data packets of the subsequent sub-channels of the input port;
这一步由第一选择子模块423对当前传输的数据包进行检测和执行;In this step, the first selection sub-module 423 detects and executes the currently transmitted data packet;
步骤S507,若所述基准端口的所述子通道中的所述数据包没有填满传输链路,依次选择后续所述输入端口的所述数据包。这一步由第二选择子模块424实现。Step S507, if the data packets in the subchannel of the reference port do not fill up the transmission link, sequentially select the data packets of the subsequent input ports. This step is realized by the second selection sub-module 424 .
对于第一选择子模块423从输入端口中选择数据包,是从所有子通道所涉及的虚通道、虚通道缓存等交叉开关分配阶段前的器件中选择,对于某次传输,被选择的所述基准端口(每次只有一个)具有优先传输权。所谓优先传输权是指,当输入端口的虚通道或是虚通道内的缓冲中存在多个数据包期望发往同一个所述输出端口时,优先组合基准端口中的各子通道的数据包,而同一次传输的多个数据包看作整体,那么其中的一个数据包很大可以占据多个子通道,对于其中一个子通道的数据即为数据分片,对此需要计算所述基准端口中的各个子通道中等待传输的数据分片的大小和所述输出端口,若未填满当次输入时的所有子通道,则第一选择子模块423从虚通道、虚通道缓冲中提取合适大小的数据包,直到填满子通道为止,若仍然无法填满,则第二选择子模块424从其它子通道中选择数据包,即从其它的输入端口中选择同类数据包,发送到同一所述输出端口,使其达到最优的传输效率。第一选择子模块423尽量选择同一端口子通道的数据包的优点是降低输出端口所一次映射的输入端口,而基准选择子模块421选择了基准端口,可以尽量使除了基准端口的其它端口与除了基准端口对应的输出端口之外的其它输出端口映射,提高端口的利用率,降低复杂度。若基准端口子通道中的数据包无法填满传输链路,则选择其它输入端口的子通道中的数据包进行合并传输。For the first selection sub-module 423 to select data packets from the input port, it is selected from the devices before the crossbar allocation stage such as virtual channels and virtual channel buffers involved in all sub-channels. For a certain transmission, the selected The reference port (only one at a time) has the priority to transmit. The so-called priority transmission right means that when there are multiple data packets expected to be sent to the same output port in the virtual channel of the input port or in the buffer in the virtual channel, the data packets of each sub-channel in the reference port are preferentially combined, However, multiple data packets transmitted at the same time are regarded as a whole, so one of the data packets is very large and can occupy multiple sub-channels, and the data of one of the sub-channels is data fragmentation. For this, it is necessary to calculate the The size of the data fragments waiting to be transmitted in each sub-channel and the output port, if not filling all the sub-channels of the current input, the first selection sub-module 423 extracts a suitable size from the virtual channel and the virtual channel buffer. Data packet, until filling up sub-passage, if still can't fill up, then the second selection submodule 424 selects data packet from other sub-passages, promptly selects similar data packet from other input ports, sends to same described output port to achieve optimal transmission efficiency. The first selection sub-module 423 has the advantage of selecting the data packets of the same port sub-channel as far as possible to reduce the input port mapped by the output port once, and the reference port selection sub-module 421 has selected the reference port, so that other ports except the reference port can be compared with other ports except the reference port as much as possible. Mapping of other output ports other than the output port corresponding to the reference port improves port utilization and reduces complexity. If the data packets in the sub-channels of the reference port cannot fill the transmission link, select the data packets in the sub-channels of other input ports for combined transmission.
在一个具体实施例中,基于所述贪婪分配装置40及所述路由器100实现所述片上网络路由器通道资源的贪婪分配方法,数据包以此实现一个完整流程,所述流程包括:In a specific embodiment, based on the greedy allocation device 40 and the router 100, the greedy allocation method of the network-on-chip router channel resource is implemented, and the data packet realizes a complete process in this way, and the process includes:
步骤701,数据包输入;Step 701, data packet input;
步骤702,虚通道分配;Step 702, virtual channel allocation;
步骤703,路由计算;Step 703, route calculation;
步骤704,预处理步骤;Step 704, a preprocessing step;
步骤705,执行步骤;Step 705, execute the step;
步骤706,交叉开关分配(SA);Step 706, crossbar assignment (SA);
步骤707,交叉开关传输(ST);Step 707, crossbar transmission (ST);
步骤708,链路传输(LT)阶段。Step 708, link transfer (LT) phase.
经过贪婪算法选择之后,端口之间能够达到局部最优的传输组合(即尽可能多的选择数据包,使其能够填满传输链路),之后经过交叉开关分配(SA),交叉开关传输(ST)和链路传输(LT)阶段,根据所述贪婪算法映射的处理结果在所述交叉开关分配阶段传输所述数据包,将数据包传输至下一跳路由器。After being selected by the greedy algorithm, the ports can achieve a locally optimal transmission combination (that is, select as many data packets as possible so that they can fill the transmission link), and then pass through the crossbar allocation (SA), and the crossbar transmission ( ST) and link transmission (LT) stages, according to the processing result of the greedy algorithm mapping, the data packet is transmitted in the crossbar distribution stage, and the data packet is transmitted to the next-hop router.
由于本发明在图4和图5的步骤基础上,在传统的交叉开关分配(SA)阶段,添加了GA(Greedy Algorithm)阶段,主要目的是进行基于贪婪算法思想的映射,有贪婪分配装置40实现。在GA阶段,收集所有的输入端口的信息,按照贪婪算法思想局部最优的原则,为每个输出端口选择对应的输入端口的子通道。适用贪婪算法的片上路由传输过程中,优选的是执行步骤在同一基准端口中,区分好不同子通道的数据分片,单个子通道的数据分片,发往同一输入端口的数据分片称为同方向,如图8流程图所示,步骤包括:Since the present invention adds the GA (Greedy Algorithm) stage to the traditional crossbar allocation (SA) stage on the basis of the steps in Fig. 4 and Fig. 5, the main purpose is to carry out mapping based on the greedy algorithm idea, and there is a greedy allocation device 40 accomplish. In the GA stage, the information of all input ports is collected, and the sub-channel corresponding to the input port is selected for each output port according to the local optimal principle of the greedy algorithm. In the on-chip routing transmission process where the greedy algorithm is applied, it is preferable to execute the steps in the same reference port, distinguish the data fragments of different sub-channels, and the data fragments of a single sub-channel, and the data fragments sent to the same input port are called In the same direction, as shown in the flowchart in Figure 8, the steps include:
步骤S801,Round-Robin算法选择基准端口;Step S801, the Round-Robin algorithm selects the reference port;
步骤S802,基准端口所有子通道所有虚通道后面有同方向分片;若可以,执行步骤S803,若不可以,执行步骤S804;Step S802, there are fragments in the same direction behind all sub-channels and all virtual channels of the reference port; if possible, perform step S803, if not, perform step S804;
步骤S803,计算基准端口后续数据分片是否可取;若可以,执行步骤S805,若不可以,执行步骤S804;Step S803, calculate whether the subsequent data fragmentation of the reference port is desirable; if yes, execute step S805; if not, execute step S804;
步骤S804,计算后续输入端口的子通道的数据分片是否可取;若可以,执行步骤S805,若不可以,返回;Step S804, calculate whether the data fragmentation of the sub-channel of the subsequent input port is desirable; if yes, execute step S805, if not, return;
步骤S801,取出数据分片。回到执行步骤S802;Step S801, taking out data fragments. Go back to step S802;
如图9所示的为了具体实施例,每个Router有五个输入端口,输入端口一401、输入端口二402、输入端口三403、输入端口四404、输入端口五405,五个输出端口(图中省略)。每个输入端口假设分为4个子通道,每个子通道的缓存深度为2。每个子通道的4选1片选器406,连接四个输出端口。图9中所 示为某一时刻,某一路由器的传输状态,数据包407都期望发往同一个输出端口。As shown in FIG. 9 for a specific embodiment, each Router has five input ports, input port one 401, input port two 402, input port three 403, input port four 404, input port five 405, and five output ports ( omitted in the figure). Each input port is assumed to be divided into 4 sub-channels, and the buffer depth of each sub-channel is 2. The 4-to-1 chip selector 406 of each sub-channel is connected to four output ports. Shown in Fig. 9 is a certain moment, the transmission state of a certain router, and data packet 407 all expects to send to the same output port.
首先,各个数据包到达输入端口之后,经过路由计算之后,将自己的期望输出端口信息统一发往贪婪算法预处理装置。预处理模块41会对收集到的端口信息进行分类和整理。其中输入端口一(以端口一为例)的四个子通道(深度为二)的预处理信息表:Firstly, after each data packet arrives at the input port, after routing calculation, it uniformly sends its expected output port information to the greedy algorithm preprocessing device. The preprocessing module 41 will classify and organize the collected port information. Among them, the preprocessing information table of the four sub-channels (the depth is two) of the input port 1 (taking port 1 as an example):
预处理信息表(输入端口一)Preprocessing information table (input port 1)
在预处理模块41收集了所有端口的信息之后,执行模块42为每个输出端口选择局部最优的端口映射方案。After the preprocessing module 41 collects the information of all ports, the execution module 42 selects a locally optimal port mapping scheme for each output port.
假设本次的基准端口为输入端口一401,则根据贪婪算法执行原则,选择的数据包分别为输入端口一401中,右边是深度为一的1个数据包和左边是深度为二的2个数据包,组合之后宽度与端口和链路的最大宽度相等,因此不选择其它端口的数据包,即输入端口一401的所有数据包可满足最优传输,无需再去选择其它端口;Assuming that the reference port this time is input port 1 401, according to the principle of greedy algorithm execution, the selected data packets are input port 1 401, the right side is 1 data packet with depth 1 and the left side is 2 data packets with depth 2 The width of the data packet after combination is equal to the maximum width of the port and the link, so the data packets of other ports are not selected, that is, all the data packets of the input port one 401 can meet the optimal transmission, and there is no need to select other ports;
假设本次的基准端口为输入端口二402,则根据贪婪算法的执行原则,其将会顺序选择输入端口三403的数据包;Assuming that the reference port this time is the input port 2 402, according to the execution principle of the greedy algorithm, it will sequentially select the data packets of the input port 3 403;
假设本次的基准端口为输入端口三403,则其会选择输入端口四404的一个数据包之后从输入端口五405选择数据包;Assuming that the reference port this time is input port three 403, it will select a data packet from input port four 404 and then select a data packet from input port five 405;
假设本次的基准端口为输入端口四404,则其会选择输入端口五405,因输入端口五405的四个子通道被数据包填满,因此不需要选择其它输入端口的数据包。Assuming that the reference port this time is input port 4 404 , it will select input port 5 405 , because the four sub-channels of input port 5 405 are filled with data packets, so there is no need to select data packets from other input ports.
综上所述,本发明在路由器内部添加预处理模块和执行模块,通过利用贪婪算法思想为路由器内部的输入端口和输出端口之间进行趋向最优的映射,适用于细粒度通道网络中,即将传统的片上宽通路分割成低宽度的子传输通道, 在细粒度的片上网络路由器中,当各子通道的数据包经过路由计算后,将子通道的信息更新至预处理信息表中,执行模块通过对预处理表信息的处理,利用贪婪算法思想对输入端口和输出端口之间进行通道的匹配,达到趋向于最优的映射,提高通道的利用率。In summary, the present invention adds a preprocessing module and an execution module inside the router, and uses the greedy algorithm idea to perform an optimal mapping between the input port and the output port inside the router, which is suitable for fine-grained channel networks. The traditional on-chip wide channel is divided into low-width sub-transmission channels. In the fine-grained on-chip network router, after the data packets of each sub-channel are routed and calculated, the information of the sub-channel is updated to the preprocessing information table, and the execution module Through the processing of the preprocessing table information, the greedy algorithm is used to match the channel between the input port and the output port, so as to achieve the mapping that tends to be optimal and improve the utilization rate of the channel.
当然,本发明还可有其它多种实施例,在不背离本发明精神及其实质的情况下,熟悉本领域的技术人员当可根据本发明作出各种相应的改变和变形,但这些相应的改变和变形都应属于本发明所附的权利要求的保护范围。Certainly, the present invention also can have other multiple embodiments, without departing from the spirit and essence of the present invention, those skilled in the art can make various corresponding changes and deformations according to the present invention, but these corresponding Changes and deformations should belong to the scope of protection of the appended claims of the present invention.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610460938.0A CN106453072A (en) | 2016-06-22 | 2016-06-22 | Greedy distribution method and device of on-chip network router channel resources and router |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610460938.0A CN106453072A (en) | 2016-06-22 | 2016-06-22 | Greedy distribution method and device of on-chip network router channel resources and router |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106453072A true CN106453072A (en) | 2017-02-22 |
Family
ID=58183363
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610460938.0A Pending CN106453072A (en) | 2016-06-22 | 2016-06-22 | Greedy distribution method and device of on-chip network router channel resources and router |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106453072A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111030927A (en) * | 2019-11-20 | 2020-04-17 | 中国人民解放军国防科技大学 | Network-on-chip routing method and network router with sequential perception |
CN114866564A (en) * | 2022-03-24 | 2022-08-05 | 煤炭工业合肥设计研究院有限责任公司 | Communication optimization method, apparatus, electronic device, and readable storage medium |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102835081A (en) * | 2012-05-21 | 2012-12-19 | 华为技术有限公司 | Scheduling method, device and system based on three-level interaction and interchange network |
CN102857445A (en) * | 2012-09-10 | 2013-01-02 | 西安电子科技大学 | Low-expenditure distributing structure and distributing method of network-on-chip router |
CN104158738A (en) * | 2014-08-29 | 2014-11-19 | 中国航空无线电电子研究所 | Network-on-chip router with low buffer area and routing method |
-
2016
- 2016-06-22 CN CN201610460938.0A patent/CN106453072A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102835081A (en) * | 2012-05-21 | 2012-12-19 | 华为技术有限公司 | Scheduling method, device and system based on three-level interaction and interchange network |
CN102857445A (en) * | 2012-09-10 | 2013-01-02 | 西安电子科技大学 | Low-expenditure distributing structure and distributing method of network-on-chip router |
CN104158738A (en) * | 2014-08-29 | 2014-11-19 | 中国航空无线电电子研究所 | Network-on-chip router with low buffer area and routing method |
Non-Patent Citations (1)
Title |
---|
WENMING LI ET AL.: "A High-Density Data Path Implementation fitting for HTC Applications", 《GREEN COMPUTING CONFERENCE AND SUSTAINABLE COMPUTING CONFERENCE》 * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111030927A (en) * | 2019-11-20 | 2020-04-17 | 中国人民解放军国防科技大学 | Network-on-chip routing method and network router with sequential perception |
CN114866564A (en) * | 2022-03-24 | 2022-08-05 | 煤炭工业合肥设计研究院有限责任公司 | Communication optimization method, apparatus, electronic device, and readable storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN1201532C (en) | Quick-circulating port dispatcher for high-volume asynchronous transmission mode exchange | |
US6768716B1 (en) | Load balancing system, apparatus and method | |
CN104767694B (en) | A kind of stream compression forwarding method towards Fat Tree data center network architectures | |
CN104579962B (en) | A kind of method and device of qos policy that distinguishing different messages | |
CN103152281B (en) | Two-level switch-based load balanced scheduling method | |
CN110995598B (en) | Variable-length message data processing method and scheduling device | |
CN107819695A (en) | A kind of distributed AC servo system SiteServer LBS and method based on SDN | |
EP2880828A1 (en) | System and method for virtual ethernet interface binding | |
CN104022950B (en) | It is a kind of to share the router topology cached with self-configuring | |
CN108833299A (en) | A large-scale network data processing method based on reconfigurable switching chip architecture | |
CN103973564B (en) | The adaptive routing method of interconnected network system | |
CN104683242B (en) | A kind of topological structure and method for routing of two dimension network-on-chip | |
CN106161254A (en) | A kind of many purposes data transmission network road route device, method, chip, router | |
CN1579075A (en) | Method and systems for ordered dynamic distribution of packet flows over network processing means | |
CN107046500B (en) | A Two-Stage Split Router and Its Routing Algorithm for Hierarchical On-Chip Networks | |
CN107566275B (en) | Multi-path transmission method based on the delay inequality opposite sex in data center network | |
CN105721354A (en) | Network-on-chip interconnection method and device | |
CN101227297A (en) | A QoS Guarantee Method for Designing Network-on-Chip | |
CN106453072A (en) | Greedy distribution method and device of on-chip network router channel resources and router | |
CN101695052B (en) | Small cross point buffer high-property crossbar dispatching method | |
CN106168940A (en) | The road network implementation method of high density network-on-chip and device | |
CN103546397A (en) | Support out-of-order self-routing Omega network structure | |
Vaidya et al. | LAPSES: A recipe for high performance adaptive router design | |
Banerjee et al. | Flow-aware allocation for on-chip networks | |
Jeong et al. | Deadlock-free XY-YX router for on-chip interconnection network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
WD01 | Invention patent application deemed withdrawn after publication |
Application publication date: 20170222 |
|
WD01 | Invention patent application deemed withdrawn after publication |