
CN111090611B - Small heterogeneous distributed computing system based on FPGA - Google Patents


Info

Publication number
CN111090611B
CN111090611B
Authority
CN
China
Prior art keywords
data
module
fpga
cpu
calculation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201811247613.XA
Other languages
Chinese (zh)
Other versions
CN111090611A (en)
Inventor
陈钰文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Xuehu Information Technology Co ltd
Original Assignee
Shanghai Xuehu Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Xuehu Information Technology Co ltd filed Critical Shanghai Xuehu Information Technology Co ltd
Priority to CN201811247613.XA priority Critical patent/CN111090611B/en
Publication of CN111090611A publication Critical patent/CN111090611A/en
Application granted
Publication of CN111090611B publication Critical patent/CN111090611B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7807System on chip, i.e. computer system on a single chip; System in package, i.e. computer system on one or more chips in a single package
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7839Architectures of general purpose stored program computers comprising a single central processing unit with memory
    • G06F15/7842Architectures of general purpose stored program computers comprising a single central processing unit with memory on one IC chip (single chip microcontrollers)
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/76Architectures of general purpose stored program computers
    • G06F15/78Architectures of general purpose stored program computers comprising a single central processing unit
    • G06F15/7867Architectures of general purpose stored program computers comprising a single central processing unit with reconfigurable architecture
    • G06F15/7871Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS
    • G06F15/7878Reconfiguration support, e.g. configuration loading, configuration switching, or hardware OS for pipeline reconfiguration

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • Microelectronics & Electronic Packaging (AREA)
  • Multi Processors (AREA)

Abstract

The invention discloses a small FPGA-based heterogeneous distributed computing system, belonging to the technical field of computation-intensive hardware design, which comprises a data input module, a data calculation module and a data return module. The data input module scatters and reorganizes data and sends it to the data calculation module serially, in pipelined form; the data calculation module receives data from the data input module and transmits it to the data return module; the data return module groups the out-of-order returned data according to the order in which the calculation output results of the preceding stage arrive. The invention exploits to the greatest extent the advantages of FPGA (field-programmable gate array) pipelined computation and high throughput, and is well suited to computation over large volumes of data; and the distributed core computing units adopt an FPGA cascade configurable strategy and are configured according to specific computing requirements.

Description

Small heterogeneous distributed computing system based on FPGA
Technical Field
The invention relates to the technical field of computationally intensive hardware design, in particular to a small heterogeneous distributed computing system based on an FPGA.
Background
Most existing open-source software frameworks run on an operating system, which in turn runs on a hardware platform whose core computational unit is the CPU. CPUs may be divided into different architectures such as x86, MIPS, PowerPC and ARM according to manufacturer or instruction set, but all are essentially von Neumann architectures: every operation is reduced to the execution of individual instructions, and each instruction passes through the basic stages of fetch, decode, execute, memory access and write-back to complete its life cycle. Viewed microscopically, therefore, every computation on a CPU involves a relatively complex and time-consuming instruction translation and execution process. Moreover, instructions must largely execute sequentially: the next instruction must wait for the previous one to complete, so the delays accumulated at the micro level cannot satisfy real-time, high-density computation at the macro level. Although various optimizations such as branch prediction, superscalar execution and hyper-threading have been proposed to address the CPU's computational shortcomings, they are merely optimizations; the fundamental architectural problem remains.
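The latency argument above can be made concrete with a small back-of-the-envelope sketch (an illustration, not part of the patent): with a classic five-stage instruction cycle, strictly sequential execution costs five cycles per instruction, while an ideal pipeline retires one instruction per cycle once the pipeline is full.

```python
# Illustrative sketch (not from the patent): cycle counts for a classic
# 5-stage instruction cycle (fetch, decode, execute, memory access,
# write-back), showing why sequential execution accumulates latency.

STAGES = 5  # fetch, decode, execute, memory access, write-back

def sequential_cycles(n_instructions: int) -> int:
    """Each instruction must fully retire before the next one starts."""
    return n_instructions * STAGES

def pipelined_cycles(n_instructions: int) -> int:
    """An ideal pipeline retires one instruction per cycle once full."""
    return STAGES + (n_instructions - 1)

if __name__ == "__main__":
    n = 1000
    print(sequential_cycles(n))  # 5000 cycles
    print(pipelined_cycles(n))   # 1004 cycles: roughly 5x the throughput
```

The same arithmetic is what makes the FPGA's deep pipelines attractive: the deeper the pipeline, the larger the steady-state gap over sequential execution.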
GPUs are also increasingly widely used to meet growing market demands for computation and complexity. Compared with a CPU, the GPU has data-parallel capability the CPU lacks: it can operate on data in parallel blocks, giving it a much higher data throughput and better support for high-volume streaming computation such as multimedia, image, audio and video processing. However, in most applications the GPU still runs on an operating system and must interact with the CPU, so the computing process remains wound around a CPU-centric framework, with obvious drawbacks. More critically, the GPU provides only data parallelism and cannot implement a deeply pipelined computing module: when the data entering the GPU carry dependencies across computing passes, the GPU must wait for the data of the previous pass to be fully prepared before entering the next computing pass. Thus, although data parallelism is achieved, it cannot be fully exploited; genuine computation on dependent data must still wait for the data of the previous operation to complete.
Existing distributed computing systems employ CPUs or GPUs of the von Neumann architecture in their computing units. The CPU is unsuitable for intensive data computation and better suited to task scheduling; the GPU is more efficient but still offers only data parallelism, and its instruction pipeline depth remains limited, so neither is suitable for intensive computation. Existing acceleration-oriented FPGA computing modules all use high-performance FPGA chips cascaded via the PCIe protocol to form an FPGA computing block, which imposes heavy requirements on PCB design, cost and the like; moreover, this approach limits the number of FPGAs that can be integrated, and the failure of a single FPGA in the integrated module paralyzes the whole system. Finally, at the computing nodes of existing distributed computing systems, node data is received in a CPU+NIC mode.
Based on the above, the invention designs a small heterogeneous distributed computing system based on an FPGA to solve the problems.
Disclosure of Invention
The invention aims to provide a small FPGA-based heterogeneous distributed computing system, so as to solve the problems of the prior art noted in the background: the computing units of existing distributed computing systems adopt CPUs or GPUs of the von Neumann architecture, which are unsuitable for intensive data computation, the CPU being better suited to task scheduling and the GPU, though more efficient, offering only data parallelism with a still-limited instruction pipeline depth; existing acceleration-oriented FPGA computing modules all use high-performance FPGA chips cascaded via the PCIe protocol to form an FPGA computing block, which imposes heavy requirements on PCB design, cost and the like, limits the number of FPGAs that can be integrated, and paralyzes the whole system when a single FPGA in the integrated module fails; and at the computing nodes of the distributed computing system, node data is received in a CPU+NIC mode.
In order to achieve the above purpose, the present invention provides the following technical solutions: a small heterogeneous distributed computing system based on an FPGA comprises a data input module, a data computing module and a data return module;
The data input module is used for scattering and reorganizing data and sending the data to the data calculation module in a serial form in a pipeline form;
the data calculation module is used for receiving data from the data input module and transmitting it to the data return module;
the data return module is used for grouping the out-of-order returned data according to the order in which the calculation output results of the preceding stage arrive.
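The return module's regrouping step can be pictured with a small sketch (an illustration under assumptions, not the patent's implementation): if the front stage tags each work item with a sequence number, results arriving out of order can be buffered and released in order as soon as they become contiguous.

```python
# Illustrative sketch: reorder calculation results that arrive out of
# order, using a sequence tag assumed to be attached by the front stage.
# The tagging scheme and names are assumptions for illustration.
import heapq

def reorder(results):
    """results: iterable of (seq_no, payload) in arrival order.
    Yields payloads in sequence order as soon as they are contiguous."""
    heap = []
    next_seq = 0
    for seq, payload in results:
        heapq.heappush(heap, (seq, payload))
        # release every result whose predecessors have all arrived
        while heap and heap[0][0] == next_seq:
            yield heapq.heappop(heap)[1]
            next_seq += 1

if __name__ == "__main__":
    arrived = [(2, "c"), (0, "a"), (1, "b"), (3, "d")]
    print(list(reorder(arrived)))  # ['a', 'b', 'c', 'd']
```

In hardware this buffering would live in the return module's FPGA and DDR rather than a software heap, but the grouping logic is the same.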
Preferably, the data input module includes, but is not limited to, a CPU, an FPGA and a DDR hardware module;
The FPGA module is used for receiving data and scattering and reorganizing the data;
The CPU module is directly connected to the FPGA module at high speed through the QPI protocol and is used to rapidly and dynamically configure how the FPGA module receives and transmits data.
Preferably, the data input module further comprises at least two groups of ethernet physical interfaces, and one group of ethernet physical interfaces is used for receiving data;
and the other group of the Ethernet physical interfaces is used for data forwarding.
Preferably, the data input module further comprises a reassembly pipeline module, and the ethernet physical interface for receiving data can expand serial input data into parallel data and pass it to the reassembly pipeline module.
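The serial-to-parallel expansion on the receive path can be sketched as follows (a behavioral illustration; the word width is an assumption, since the patent does not fix one):

```python
# Illustrative sketch: "expanding" a serial symbol stream into parallel
# words before handing them to a reassembly pipeline, as the input
# module's receive path is described as doing. The width of 8 is an
# assumed example value.

def serial_to_parallel(stream, width=8):
    """Group a serial stream of symbols into parallel words of `width`."""
    word = []
    for symbol in stream:
        word.append(symbol)
        if len(word) == width:
            yield tuple(word)
            word = []
    if word:  # trailing partial word, zero-padded
        yield tuple(word + [0] * (width - len(word)))

if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
    print(list(serial_to_parallel(bits, width=8)))
    # [(1, 0, 1, 1, 0, 0, 1, 0), (1, 1, 0, 0, 0, 0, 0, 0)]
```

In the FPGA this is a deserializer register; the point of the model is only that each parallel word carries `width` serial symbols per word-clock.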
Preferably, the data calculation module comprises at least one group of data calculation units, and the data calculation units comprise a single group of FPGA, DDR and at least two groups of ethernet physical interfaces.
Preferably, the data return module comprises a post-stage processing module, and the post-stage processing module is used to improve data throughput by deep-pipelining the reorganized data.
Compared with the prior art, the invention has the following beneficial effects: it exploits to the greatest extent the advantages of FPGA pipelined computation and high throughput, and is well suited to computation over large volumes of data; the distributed core computing units adopt an FPGA cascade configurable strategy and are configured according to specific computing requirements; in the data distribution module and the data return module, the FPGA communicates with the CPU over the QPI bus, so the CPU can directly access the FPGA's memory controller and directly instruct the FPGA to read and write data, saving a great deal of time compared with the traditional mode in which the CPU and the FPGA share memory; and the network protocol stack is implemented in the FPGA, which transmits and receives network packets directly, saving the large amount of decoding the CPU would otherwise perform during transmission and checking and improving overall transmit/receive time by an order of magnitude.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed for the description of the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a diagram of the overall framework of a distributed heterogeneous computing system of the present invention;
FIG. 2 is a diagram of a distributed heterogeneous computing system hardware framework in accordance with the present invention;
FIG. 3 is a block diagram of the embodiment of FIG. 2 of the present invention;
FIG. 4 is an enlarged view of the left end of FIG. 3 in accordance with the present invention;
FIG. 5 is an enlarged view of the right end connection of FIG. 4 in accordance with the present invention;
FIG. 6 is an enlarged view of the right end connection of FIG. 5 in accordance with the present invention;
FIG. 7 is an enlarged view of the right end connection of FIG. 6 in accordance with the present invention;
fig. 8 is a block diagram of a data computing unit according to the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to fig. 1-8, the present invention provides a technical solution: a small heterogeneous distributed computing system based on an FPGA comprises a data input module, a data computing module and a data return module;
The data input module is used for scattering and reorganizing data and sending the data to the data calculation module in a serial form in a pipeline form;
the data calculation module is used for receiving data from the data input module and transmitting it to the data return module;
the data return module is used for grouping the out-of-order returned data according to the order in which the calculation output results of the preceding stage arrive.
It should be noted that the system consists of three parts: the data input module, the data calculation module and the data return module. The input module is composed of hardware modules such as a CPU, an FPGA and DDR. After input data reaches the input module over the network, it is received directly by the FPGA, then scattered, reorganized and forwarded in pipelined fashion. The CPU in the input module is directly connected to the FPGA at high speed through the QPI protocol; the CPU rapidly and dynamically configures the FPGA's policies for receiving and transmitting data, but does not directly participate in receiving, transmitting, verifying or reorganizing the data. The FPGA in the input module implements a complete TCP/IP protocol stack internally and is externally configured with one group (two in total) of Ethernet physical interfaces, one dedicated to receiving data and the other dedicated to forwarding data. At the data receiving end, serial input data is expanded into parallel data and passed to the reassembly pipeline module; at the data output end, before forwarding, the parallel output of the reassembly pipeline module is converted back to serial data by local frequency multiplication. The reorganized data is distributed serially to the subsequent computing modules at a rate several times higher than the input rate. The computing module is composed of a group of computing units, each of which is a single FPGA with DDR and two Ethernet physical interfaces.
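The "local frequency multiplication" step has simple rate bookkeeping behind it, sketched below (an illustrative model; the clock figures are assumed example values, not from the patent): if the reassembly pipeline emits words of `width` symbols per input clock, the serializer clock must run at `width` times the input clock to re-serialize without back-pressure.

```python
# Illustrative sketch: rate relation for the parallel-to-serial step
# behind "local frequency multiplication". Clock values are assumed
# examples, not figures from the patent.

def serializer_clock_mhz(input_clock_mhz: float, width: int) -> float:
    """Minimum output clock for lossless parallel-to-serial conversion."""
    return input_clock_mhz * width

def parallel_to_serial(words):
    """Flatten parallel words back into a serial symbol stream."""
    for word in words:
        yield from word

if __name__ == "__main__":
    # e.g. a 125 MHz word clock with 8-symbol words needs a 1 GHz bit clock
    print(serializer_clock_mhz(125.0, 8))            # 1000.0
    print(list(parallel_to_serial([(1, 0), (1, 1)])))  # [1, 0, 1, 1]
```

This is also why the text can claim the reorganized data leaves the module "several times higher than the input rate": the multiplied output clock makes that headroom available.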
The data distributed from the input module reaches each computing unit after passing through a switch. The computing unit receives the data through an IP core that implements a TCP/IP protocol stack internally, passes it to the dedicated computation IP core, and after computation forwards the result to the post-stage return module through an Ethernet interface. The hardware composition of the data return module is the same as that of the data input module, except that its FPGA returns data out of order; specifically, it groups the data according to the order in which the calculation output results of the preceding computing module arrive.
In still further embodiments, the data input module includes, but is not limited to, a CPU, an FPGA, and a DDR hardware module;
The FPGA module is used for receiving data and scattering and reorganizing the data;
The CPU module is directly connected to the FPGA module at high speed through the QPI protocol and is used to rapidly and dynamically configure how the FPGA module receives and transmits data.
In a further embodiment, the data input module further includes at least two groups of ethernet physical interfaces, and one group of ethernet physical interfaces is configured to receive data;
and the other group of the Ethernet physical interfaces is used for data forwarding.
In a further embodiment, the data input module further includes a reassembly pipeline module, and the ethernet physical interface for receiving data can expand serial input data into parallel data and pass it to the reassembly pipeline module.
In a further embodiment, the data computing module includes at least one group of data computing units, and the data computing units include a single group of FPGA, DDR, and at least two groups of ethernet physical interfaces.
In a further embodiment, the data return module includes a post-stage processing module, and the post-stage processing module is configured to improve data throughput by deep-pipelining the reorganized data;
as shown in fig. 2, the hardware framework of the distributed heterogeneous computing system designed by the present invention includes a front-end data distribution module, data computing units and a data return unit. Fig. 3 is a specific design of fig. 2. The data distribution module adopts a CPU+FPGA architecture, the CPU and the FPGA being connected through a PCIe or QPI bus. Front-end network data is input to the data distribution module through a router or switch and is cached by the FPGA in the data distribution module together with its cascaded DDR. If the subsequent computing module does not need the data reorganized, the FPGA distributes the cached data directly in parallel through its internally integrated data-distribution IP unit. If the subsequent FPGA computing unit needs the data reorganized before computation, the data is passed serially from the FPGA cache module to the data reorganization module and then forwarded to the subsequent computing unit. If the reorganization is complex and the reorganization strategy must be changed dynamically, the operations required by the reorganization can be converted into instructions for the MIG module in the FPGA and sent directly to the FPGA over the PCIe or QPI bus connecting the CPU and the FPGA, so that the FPGA can change the data-reorganization strategy quickly while still caching data efficiently. The data computing section is composed entirely of multiple single-FPGA units, their total number allocated dynamically according to the actual computation or communication tasks. Computation within each single FPGA is completed by a dedicated internal IP core.
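The idea of converting a reorganization strategy into instructions pushed over the direct CPU-FPGA bus can be sketched as a fixed command encoding (entirely hypothetical: the patent does not disclose its MIG instruction format, so the field layout, opcode and helper names below are illustrative assumptions):

```python
# Hypothetical sketch: packing a dynamically chosen reorganization
# strategy into a fixed-size command word, in the spirit of the CPU
# converting reorganization operations into MIG-style instructions and
# sending them to the FPGA over PCIe/QPI. The 12-byte layout is an
# assumption for illustration, not the patent's actual format.
import struct

# opcode (1 B) | source offset (4 B) | length (4 B) | stride (2 B) | flags (1 B)
CMD_FMT = ">BIIHB"
OP_REORDER = 0x01

def encode_reorder_cmd(src_offset, length, stride, interleave=False):
    flags = 0x01 if interleave else 0x00
    return struct.pack(CMD_FMT, OP_REORDER, src_offset, length, stride, flags)

def decode_cmd(cmd_bytes):
    op, off, length, stride, flags = struct.unpack(CMD_FMT, cmd_bytes)
    return {"op": op, "offset": off, "length": length,
            "stride": stride, "interleave": bool(flags & 0x01)}

if __name__ == "__main__":
    cmd = encode_reorder_cmd(0x1000, 4096, 64, interleave=True)
    print(len(cmd))        # 12
    print(decode_cmd(cmd))
```

A fixed-width encoding like this is what makes the strategy cheap to update on the fly: the CPU writes one small word over the bus instead of reconfiguring the fabric.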
The hardware composition of the data return unit is consistent with that of the data distribution module; the differences lie in the MIG instructions the CPU transmits to the FPGA and in the specific design and implementation of the FPGA's internal result-reorganization module and result-return module.
As shown in fig. 3, the data distribution module is cascaded with the data computing module through one switch or other network device, and the data computing units are cascaded with the post-stage data return module through another. Two groups of network devices are used in order to fully match the deep pipeline structure inside the computing unit module and to guarantee the system's high data throughput.
As shown in figs. 4-7, the hardware architecture of the data distribution and data return modules is the same: two network physical interfaces, which may be RJ45, ST or SC, are provided at the periphery of the FPGA. The data distribution module receives network computing data through one port, reorganizes the data in a deep pipeline through a dedicated internal IP core, and then forwards it to the post-stage processing module through the other port; the dual ports and deep pipelining greatly increase the data throughput rate. For the data return module, the design of the dedicated internal IP core differs from that of the data receiving module: its function is to repack the calculation results that arrive out of order according to rules, attach labels, and return them to the subsequent modules. Both the data distribution and data return modules implement the network protocol stack inside the FPGA. As shown in fig. 8, the data computing unit consists of a single FPGA plus dual network interfaces. According to actual requirements, a computing unit may be deployed as a single node, or units may be partially interconnected into a star or ring network depending on the complexity of the computing task; the local network so formed, together with the other nodes, constitutes the computing-unit portion of the computing system. The computing-unit portion is thus dynamically configured so that its structure matches the needs of the task. Inside each computing-unit node, a dedicated IP core performs parallel pipelined computation.
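The star-or-ring interconnect choice for the computing-unit nodes can be sketched as adjacency lists (an illustrative model; the node names and the decision to represent links this way are assumptions, since the patent leaves the concrete wiring to the deployment):

```python
# Illustrative sketch: building star or ring interconnects for a set of
# computing-unit nodes, mirroring the configurable topology the patent
# describes. Node names are illustrative assumptions.

def star_topology(nodes):
    """Hub-and-spoke: the first node is the hub, linked to every other."""
    hub, *spokes = nodes
    return {hub: list(spokes), **{s: [hub] for s in spokes}}

def ring_topology(nodes):
    """Each node links to its two neighbours on a closed ring."""
    n = len(nodes)
    return {nodes[i]: [nodes[(i - 1) % n], nodes[(i + 1) % n]]
            for i in range(n)}

if __name__ == "__main__":
    fpgas = ["fpga0", "fpga1", "fpga2", "fpga3"]
    print(star_topology(fpgas))  # hub fpga0 linked to fpga1..fpga3
    print(ring_topology(fpgas))  # each node linked to its two neighbours
```

A star suits tasks with a natural aggregation point; a ring suits tasks whose stages pass intermediate results to a neighbour, which is consistent with the deep-pipeline emphasis of the design.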
In the description of the present specification, the descriptions of the terms "one embodiment," "example," "specific example," and the like, mean that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present invention. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The preferred embodiments of the invention disclosed above are intended only to assist in the explanation of the invention. The preferred embodiments are not exhaustive or to limit the invention to the precise form disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and the practical application, to thereby enable others skilled in the art to best understand and utilize the invention. The invention is limited only by the claims and the full scope and equivalents thereof.

Claims (6)

1. A small heterogeneous distributed computing system based on FPGA, characterized in that: the system comprises a data input module, a data calculation module and a data return module;
The data input module is used for scattering and reorganizing data and sending the data to the data calculation module in a serial form in a pipeline form;
The data calculation module is connected with the data input module and is used for transmitting data to the data return module;
the data return module groups the out-of-order returned data according to the order in which the calculation output results of the data calculation module arrive;
The system consists of three parts: the data input module, the data calculation module and the data return module. The input module consists of a CPU, an FPGA and a DDR hardware module. Input data, after being transmitted to the input module over the network, is received directly by the FPGA, then scattered, reorganized and forwarded in pipelined fashion. The CPU in the input module is directly connected to the FPGA at high speed through the QPI protocol; the CPU rapidly and dynamically configures the FPGA's policies for receiving and transmitting data, but does not directly participate in receiving, verifying or reorganizing the data. The FPGA in the input module implements a complete TCP/IP protocol stack internally and is externally configured with one group of Ethernet physical interfaces, one dedicated to receiving data and the other dedicated to forwarding data. At the data receiving end, serial input data is expanded into parallel data and passed to the reassembly pipeline module; at the data output end, before forwarding, the parallel output of the reassembly pipeline module is converted to serial data by local frequency multiplication, and the reorganized data is distributed serially to the subsequent computing module at a rate several times higher than the input rate;
The hardware framework of the system comprises a front-end data distribution module, data computing units and a data return unit. The data distribution module adopts a CPU+FPGA architecture, the CPU and the FPGA being connected through a PCIe or QPI bus; front-end network data is input to the data distribution module through a router or switch and is cached jointly by the FPGA in the data distribution module and its cascaded DDR. If the subsequent computing module does not need the data reorganized, the FPGA distributes the cached data directly in parallel through its internally integrated data-distribution IP unit; if the subsequent FPGA computing unit needs the data reorganized before computation, the data is passed serially from the FPGA cache module to the data reorganization module and then forwarded to the subsequent computing unit; if the reorganization is complex and the reorganization strategy must be changed dynamically, the operations required by the reorganization are converted into instructions for the MIG module in the FPGA and sent directly to the FPGA over the PCIe or QPI bus connecting the CPU and the FPGA, so that the FPGA can change the data-reorganization strategy quickly while still caching data efficiently.
2. The FPGA-based small heterogeneous distributed computing system of claim 1, wherein: the data input module comprises a CPU, an FPGA and a DDR hardware module;
The FPGA module is used for receiving data and scattering and reorganizing the data;
The CPU module is directly connected to the FPGA module at high speed through the QPI protocol and is used to rapidly and dynamically configure how the FPGA module receives and transmits data.
3. The FPGA-based small heterogeneous distributed computing system of claim 2, wherein: the data input module also comprises at least two groups of Ethernet physical interfaces, and one group of Ethernet physical interfaces is used for receiving data; and the other group of the Ethernet physical interfaces is used for data forwarding.
4. A FPGA-based small heterogeneous distributed computing system as claimed in claim 3, wherein: the data input module further comprises a reorganization pipeline module, and the Ethernet physical interface for receiving data is used for expanding serial input data into parallel data and transmitting the parallel data to the reorganization pipeline module.
5. The FPGA-based small heterogeneous distributed computing system of claim 1, wherein: the data computing module comprises at least one group of data computing units, and the data computing units comprise a single group of FPGA, DDR and at least two groups of Ethernet physical interfaces.
6. The FPGA-based small heterogeneous distributed computing system of claim 1, wherein: the data return module comprises a post-stage processing module, and the post-stage processing module is used to improve data throughput by deep-pipelining the reorganized data.
CN201811247613.XA 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA Active CN111090611B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811247613.XA CN111090611B (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811247613.XA CN111090611B (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Publications (2)

Publication Number Publication Date
CN111090611A CN111090611A (en) 2020-05-01
CN111090611B true CN111090611B (en) 2024-08-27

Family

ID=70392706

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811247613.XA Active CN111090611B (en) 2018-10-24 2018-10-24 Small heterogeneous distributed computing system based on FPGA

Country Status (1)

Country Link
CN (1) CN111090611B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114531459B (en) * 2020-11-03 2024-05-07 深圳市明微电子股份有限公司 Cascade device parameter self-adaptive acquisition method, device, system and storage medium
CN114138481A (en) * 2021-11-26 2022-03-04 浪潮电子信息产业股份有限公司 Data processing method, device and medium
CN114595185A (en) * 2022-02-25 2022-06-07 山东云海国创云计算装备产业创新中心有限公司 A multi-CPU system and its communication method

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107145467A (en) * 2017-05-13 2017-09-08 贾宏博 A kind of distributed computing hardware system in real time

Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8059650B2 (en) * 2007-10-31 2011-11-15 Aruba Networks, Inc. Hardware based parallel processing cores with multiple threads and multiple pipeline stages
CH705650B1 (en) * 2007-11-12 2013-04-30 Supercomputing Systems Ag Parallel computer system, method for parallel processing of data.
CN104657330A (en) * 2015-03-05 2015-05-27 浪潮电子信息产业股份有限公司 High-performance heterogeneous computing platform based on x86 architecture processor and FPGA
CN106339351B (en) * 2016-08-30 2019-05-10 浪潮(北京)电子信息产业有限公司 An SGD algorithm optimization system and method
US10558575B2 (en) * 2016-12-30 2020-02-11 Intel Corporation Processors, methods, and systems with a configurable spatial accelerator
CN107066802B (en) * 2017-01-25 2018-05-15 人和未来生物科技(长沙)有限公司 A kind of heterogeneous platform calculated towards gene data
US20180239725A1 (en) * 2017-02-17 2018-08-23 Intel Corporation Persistent Remote Direct Memory Access
CN107273331A (en) * 2017-06-30 2017-10-20 山东超越数控电子有限公司 A kind of heterogeneous computing system and method based on CPU+GPU+FPGA frameworks
CN108563808B (en) * 2018-01-05 2020-12-04 中国科学技术大学 Design Method of Heterogeneous Reconfigurable Graph Computation Accelerator System Based on FPGA
CN108052839A (en) * 2018-01-25 2018-05-18 知新思明科技(北京)有限公司 Mimicry task processor


Also Published As

Publication number Publication date
CN111090611A (en) 2020-05-01

Similar Documents

Publication Title
CN103345461B (en) Based on the polycaryon processor network-on-a-chip with accelerator of FPGA
CN111090611B (en) Small heterogeneous distributed computing system based on FPGA
CN106250103A (en) A kind of convolutional neural networks cyclic convolution calculates the system of data reusing
US10140124B2 (en) Reconfigurable microprocessor hardware architecture
US7069372B1 (en) Processor having systolic array pipeline for processing data packets
CN101488922B (en) Network-on-chip router with adaptive routing capability and its implementation method
CN108537331A (en) A kind of restructural convolutional neural networks accelerating circuit based on asynchronous logic
JP7389231B2 (en) synchronous network
CN102306371B (en) Hierarchical parallel modular sequence image real-time processing device
Haghi et al. A reconfigurable compute-in-the-network fpga assistant for high-level collective support with distributed matrix multiply case study
CN114138707B (en) A Data Transmission System Based on FPGA
CN113114593B (en) Dual-channel router in network on chip and routing method thereof
CN104035896B (en) Off-chip accelerator applicable to fusion memory of 2.5D (2.5 dimensional) multi-core system
CN106941488B (en) Multi-layer protocol packet encapsulation device and method based on FPGA
Jonna et al. Minimally buffered single-cycle deflection router
Wang et al. A flexible high speed star network based on peer to peer links on FPGA
Su et al. Technology trends in large-scale high-efficiency network computing
CN113704169B (en) Embedded configurable many-core processor
Ueno et al. VCSN: Virtual circuit-switching network for flexible and simple-to-operate communication in HPC FPGA cluster
Zhu et al. BiLink: A high performance NoC router architecture using bi-directional link with double data rate
Mahafzah et al. Performance evaluation of broadcast and global combine operations in all-port wormhole-routed OTIS-Mesh interconnection networks
RU2830044C1 (en) Vector computing device
Pande et al. Performance optimization for system-on-chip using network-on-chip and data compression
Bertozzi et al. An asynchronous soft macro for ultra-low power communication in neuromorphic computing
CN110516800A (en) Deep learning network application distributed self-assembly instruction processor core, processor, circuit and processing method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant