
CN109564562A - Big data operation acceleration system and chip - Google Patents


Info

Publication number
CN109564562A
Authority
CN
China
Prior art keywords
data
interface
chip
storage unit
unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201880002364.XA
Other languages
Chinese (zh)
Other versions
CN109564562B (en
Inventor
桂文明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Suneng Technology Co ltd
Original Assignee
Bitmain Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Bitmain Technologies Inc
Publication of CN109564562A
Application granted
Publication of CN109564562B
Legal status: Active
Anticipated expiration


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306: Intercommunication techniques
    • G06F15/17312: Routing techniques specific to parallel machines, e.g. wormhole, store and forward, shortest path problem congestion
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356: Indirect interconnection networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00: Digital computers in general; Data processing equipment in general
    • G06F15/16: Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163: Interprocessor communication
    • G06F15/173: Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17356: Indirect interconnection networks
    • G06F15/17368: Indirect interconnection networks non hierarchical topologies
    • G06F15/17375: One dimensional, e.g. linear array, ring

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Advance Control (AREA)

Abstract

The present application provides a big data operation acceleration system and a chip. Multiple cores are arranged in the chip, each core performs computation and storage-control functions, and at least one storage unit is connected to each core outside the chip. With this technical solution, each core can read both the storage unit connected to itself and the storage units connected to the other cores, so that each core effectively has a large-capacity memory. This reduces the number of times data must be moved into or out of memory from external storage space, which speeds up data processing. In addition, because the cores can operate either independently or cooperatively, data processing is accelerated further.

Description

Big data operation acceleration system and chip
Technical field
The embodiments of the present invention relate to the field of integrated circuits, and more particularly to a big data operation acceleration system and chip.
Background art
An ASIC (Application Specific Integrated Circuit) is an integrated circuit designed and manufactured to meet the requirements of a particular user and the needs of a particular electronic system. An ASIC is characterized by targeting the needs of a specific user. Compared with general-purpose integrated circuits, ASICs in mass production offer smaller size, lower power consumption, higher reliability, higher performance, better confidentiality, and lower cost.
With the development of science and technology, more and more fields, such as artificial intelligence and secure computing, involve specific computations with large workloads. For certain operations, an ASIC chip can bring its characteristics into play, such as fast operation and low power consumption. In these computation-heavy fields, N operation chips usually need to be controlled to work at the same time in order to improve data processing speed and capacity. As data precision keeps rising, fields such as artificial intelligence and secure computing must operate on ever larger amounts of data, and an ASIC chip generally needs to be configured with multiple storage units to store that data; for example, one ASIC chip may be configured with four 2G memories. When N operation chips work at the same time, 4N 2G memories are therefore required. However, when multiple operation chips work at the same time, the amount of data stored does not exceed 2G, which wastes storage units and raises system cost.
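The memory count in the example above is simple arithmetic. A minimal sketch, assuming four memories per chip as in the text (the function name is illustrative and not from the source):

```python
def dedicated_memory_count(num_chips, memories_per_chip=4):
    """Memories needed when every operation chip gets its own set of memories."""
    if num_chips < 1 or memories_per_chip < 1:
        raise ValueError("counts must be positive")
    return num_chips * memories_per_chip


# With 4 memories per chip, N chips need 4N memories.
for n in (1, 2, 4):
    print(n, "chip(s) ->", dedicated_memory_count(n), "memories")
```

The count grows linearly with the number of chips, which is the cost the application sets out to reduce.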
The above background content is provided only to aid understanding of the present application. It does not constitute an admission or acknowledgment that any of the content mentioned belongs to the common general knowledge relative to the present application.
Summary of the invention
The embodiments of the present invention provide a big data operation acceleration system and chip. Two or more ASIC operation chips are each connected to two or more storage units through a bus, and the operation chips exchange data through the storage units. This not only reduces the number of storage units but also reduces the connecting lines between ASIC operation chips, simplifying the system structure. Moreover, because each ASIC operation chip is connected to multiple storage units, no conflicts arise from the use of a shared bus, and no cache needs to be provided for each ASIC operation chip.
To achieve the above objectives, a first aspect of the embodiments provides a big data operation acceleration system comprising two or more operation chips and two or more storage units, wherein:
The operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two cores (110, 111, 112, 113);
The storage unit (20) includes two or more memories, a routing unit (230), and two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit through a bus, and the routing unit is in turn connected to the two or more memories;
The second data interfaces (150, 151, 152, 153) of the operation chip are connected through a bus to the third data interfaces (250, 251, 252, 253) of the storage unit.
A second aspect of the embodiments provides a big data operation acceleration system comprising two or more operation chips and two or more storage units, wherein:
The operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one core, the at least two cores are connected to the routing unit, and the at least one first data interface (130) is connected to one core (110);
The storage unit (20) includes two or more memories, a routing unit (230), and two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit through a bus, and the routing unit is in turn connected to the two or more memories;
The second data interfaces (150, 151, 152, 153) of the operation chip are connected through a bus to the third data interfaces (250, 251, 252, 253) of the storage unit.
A third aspect of the embodiments provides a big data operation chip, characterized in that the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two cores (110, 111, 112, 113); the second data interfaces and the third data interfaces are serdes interfaces; and the second data interfaces (150, 151, 152, 153) of the operation chip are connected to storage units through a bus.
A fourth aspect of the embodiments provides a big data operation chip, characterized in that the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one core, the at least two cores are connected to the routing unit, and the at least one first data interface (130) is connected to one core (110); the second data interfaces and the third data interfaces are serdes interfaces; and the second data interfaces (150, 151, 152, 153) of the operation chip are connected to storage units through a bus.
By connecting each of the multiple operation chips in the big data operation acceleration system to every storage unit, the embodiments of the present invention reduce the number of storage units and lower system cost. They also reduce the connecting lines between ASIC operation chips and simplify the system structure. Because each ASIC operation chip is connected to multiple storage units, no conflicts arise from the use of a shared bus, and no cache needs to be provided for each ASIC operation chip.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some exemplary embodiments; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of a first embodiment of a big data operation acceleration system with 4 operation chips and 4 storage units;
Fig. 2a is a schematic structural diagram of a first embodiment of an operation chip with 4 cores;
Fig. 2b is a signal flow diagram of the first embodiment of the operation chip with 4 cores;
Fig. 3a is a schematic structural diagram of a second embodiment of an operation chip with 4 cores;
Fig. 3b is a signal flow diagram of the second embodiment of the operation chip with 4 cores;
Fig. 4a is a schematic structural diagram of a third embodiment of a storage unit corresponding to an operation chip with 4 cores;
Fig. 4b is a signal flow diagram of the third embodiment of the storage unit corresponding to an operation chip with 4 cores;
Fig. 5 is a schematic diagram of the connection structure of a big data operation acceleration system with 4 operation chips and 4 storage units;
Fig. 6 is a schematic diagram of a data structure according to the present embodiment.
Detailed description of the embodiments
Exemplary embodiments are described below with reference to the drawings. These embodiments are provided only to enable those skilled in the art to better understand and thereby implement the present invention, and do not limit the scope of the present invention in any way. Rather, they are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
It should also be noted that the up, down, left, and right directions in the drawings are merely illustrations for specific embodiments. Those skilled in the art may, according to actual needs, change the orientation of some or all of the components shown in the drawings without affecting the overall function of each component or of the system. Technical solutions with such changed orientations still fall within the scope of protection of the present invention.
A multi-core chip is a multiprocessing system embodied on a single large-scale integrated semiconductor chip. Typically, two or more chip cores are embodied on the multi-core chip and interconnected by a bus (which may also be formed on the same multi-core chip). Anywhere from two to many chip cores may be embodied on the same multi-core chip; the upper limit on the number of cores is set only by manufacturing capability and performance constraints. Multi-core chips have applications that perform specialized arithmetic and/or logic operations in multimedia and signal processing algorithms (such as video encoding and decoding, 2D/3D graphics, audio and speech processing, image processing, telephony, speech recognition and sound synthesis, and encryption).
Although only ASICs are mentioned in the background section, the specific wiring implementation in the embodiments can also be applied to multi-core CPUs, GPUs, FPGAs, and the like. In the present embodiments, the multiple cores may be identical cores or different cores.
For convenience of description, the big data operation acceleration system with 4 operation chips and 4 storage units shown in Fig. 1 is used as an example below. Those skilled in the art will understand that choosing 4 operation chips and 4 storage units is merely illustrative. The number of operation chips may be N, where N is a positive integer greater than or equal to 2, for example 6, 10, or 12. The number of storage units may be M, where M is a positive integer greater than or equal to 2, for example 6, 9, or 12. In the embodiments, N and M may or may not be equal. The multiple operation chips may be identical operation chips or different operation chips.
Fig. 1 is a schematic structural diagram of the first embodiment of the big data operation acceleration system with 4 operation chips and 4 storage units. As shown in Fig. 1, the system includes 4 operation chips (10, 11, 12, 13) and 4 storage units (20, 21, 22, 23). Each operation chip is connected to all storage units through a bus. The operation chips exchange data through the storage units and do not exchange data directly with one another; control instructions are sent between operation chips.
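The connectivity just described can be sketched as a small graph. This is an illustration only: the reference numerals come from Fig. 1, while the data structures and the function name are assumptions introduced here.

```python
from itertools import product

CHIPS = [10, 11, 12, 13]          # operation chips, numbered as in Fig. 1
STORAGE_UNITS = [20, 21, 22, 23]  # storage units, numbered as in Fig. 1

# Every chip has a bus link to every storage unit.
# There are no chip-to-chip data links in this model.
DATA_LINKS = set(product(CHIPS, STORAGE_UNITS))


def exchange_paths(src_chip, dst_chip):
    """Return every two-hop data path src_chip -> storage unit -> dst_chip."""
    return [
        (src_chip, unit, dst_chip)
        for unit in STORAGE_UNITS
        if (src_chip, unit) in DATA_LINKS and (dst_chip, unit) in DATA_LINKS
    ]


print(len(DATA_LINKS))         # number of chip-to-storage links
print(exchange_paths(10, 13))  # any storage unit can relay data between two chips
```

Because the topology is fully connected between chips and storage units, any storage unit can serve as the relay between any pair of chips.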
Each storage unit is provided with a private storage region and a shared storage region. The private storage region stores the temporary operation results of one operation chip; a temporary operation result is an intermediate result that this operation chip will continue to use and that other operation chips will not use. The shared storage region stores data operation results of the operation chips that will be used by other operation chips or that need to be fed back to the outside. Of course, the storage unit may also be left unpartitioned for ease of management. The storage unit here may be a high-speed external memory such as DDR, SDDR, DDR2, DDR3, DDR4, GDDR5, GDDR6, HMC, or HBM. DDR-series memory is preferred here. DDR (Double Data Rate) memory is double data rate synchronous dynamic random access memory. DDR uses a synchronous circuit, so that the main steps of specifying the address and of transferring and outputting data are executed independently while remaining fully synchronized with the CPU. DDR also uses a DLL (Delay Locked Loop) to provide a data strobe signal: when data is valid, the storage controller can use this signal, which is provided once for every 16 outputs, to locate the data precisely and to resynchronize data from different memory modules. The frequency of DDR memory can be expressed as either the working frequency or the equivalent frequency. The working frequency is the actual operating frequency of the memory chips, but because DDR memory transfers data on both the rising and falling edges of the clock pulse, the equivalent data transfer frequency is twice the working frequency. DDR2 (Double Data Rate 2) memory is a newer-generation memory standard developed by JEDEC (Joint Electron Device Engineering Council); in each clock, DDR2 memory can read and write data at 4 times the speed of the external bus and can run at 4 times the speed of the internal control bus. DDR3, DDR4, GDDR5, GDDR6, HMC, and HBM memory are all prior art and are not described in detail here.
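The private/shared split can be sketched as follows. This is a hypothetical model, not the patented layout: keeping one private area per chip and rejecting reads by non-owners are assumptions made here to illustrate the stated intent that private results are not used by other chips.

```python
class PartitionedStorageUnit:
    """Hypothetical storage unit with private regions and one shared region."""

    def __init__(self):
        self.private = {}  # chip id -> {key: intermediate result}
        self.shared = {}   # key -> result visible to every chip

    def write_private(self, chip_id, key, value):
        self.private.setdefault(chip_id, {})[key] = value

    def read_private(self, chip_id, owner_id, key):
        # Assumption: only the owning chip reads its own intermediate results.
        if chip_id != owner_id:
            raise PermissionError("private results are used only by their owner")
        return self.private[owner_id][key]

    def write_shared(self, key, value):
        self.shared[key] = value

    def read_shared(self, key):
        return self.shared[key]


unit = PartitionedStorageUnit()
unit.write_private(10, "partial_sum", 3)  # chip 10 keeps an intermediate result
unit.write_shared("final", 7)             # result published for other chips
print(unit.read_private(10, 10, "partial_sum"), unit.read_shared("final"))
```

A chip would publish to the shared region only results that other chips consume or that must be fed back to the outside.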
The 4 ASIC operation chips are each connected to the 4 storage units through a bus, and the operation chips exchange data through the storage units. This not only reduces the number of storage units but also reduces the connecting lines between ASIC operation chips, simplifying the system structure. Because each ASIC operation chip is connected to multiple storage units, no conflicts arise from the use of a shared bus, and no cache needs to be provided for each ASIC operation chip.
Fig. 2a is a schematic structural diagram of the first embodiment of the operation chip with 4 cores. Those skilled in the art will understand that choosing 4 cores is merely illustrative; the number of cores in the operation chip may be Q, where Q is a positive integer greater than or equal to 2, for example 6, 10, or 12. In the present embodiment, the cores of the operation chip may have the same function or different functions.
The 4-core operation chip (10) includes 4 cores (110, 111, 112, 113), a routing unit (120), a data exchange control unit (130), and 4 serdes interfaces (150, 151, 152, 153). The data exchange control unit and the 4 serdes interfaces are each connected to the routing unit through a bus, and the routing unit is in turn connected to each core. The data exchange control unit may be implemented with various protocols, such as UART, SPI, PCIE, SERDES, or USB; in the present embodiment it is a UART (Universal Asynchronous Receiver/Transmitter) control unit (130). A UART is an asynchronous transceiver that converts the data to be transmitted between serial and parallel form, and it is usually integrated into the connections of various communication interfaces. The UART protocol is used here only as an example; other protocols may also be used. The UART control unit (130) can receive external data or control instructions, send control instructions to other chips, receive control instructions from other chips, and feed operation results or intermediate data back to the outside.
Serdes is short for SERializer/DESerializer. It is a mainstream time-division multiplexing (TDM), point-to-point (P2P) serial communication technology: at the transmitting end, multiple low-speed parallel signals are converted into a high-speed serial signal, which passes through the transmission medium (optical cable or copper wire) and is converted back into low-speed parallel signals at the receiving end. This point-to-point serial communication technology makes full use of the channel capacity of the transmission medium, reduces the number of transmission channels and device pins required, and raises the signal transmission speed, thereby greatly reducing communication cost. Of course, other communication interfaces, such as SSI or UART, may be used in place of the serdes interface. Data and control instructions are transmitted between the chip and the storage units through the serdes interfaces and transmission lines.
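The parallel-to-serial conversion and its inverse can be illustrated with a bit-level sketch. This shows only the data reshaping; a real serdes link also handles clocking and line coding, which are omitted here. The function names and the most-significant-bit-first order are assumptions.

```python
def serialize(words, width=8):
    """Flatten parallel words into one bit stream, most significant bit first."""
    return [(word >> shift) & 1
            for word in words
            for shift in range(width - 1, -1, -1)]


def deserialize(bits, width=8):
    """Regroup a serial bit stream into parallel words of `width` bits."""
    if len(bits) % width:
        raise ValueError("bit stream length must be a multiple of the word width")
    words = []
    for start in range(0, len(bits), width):
        value = 0
        for bit in bits[start:start + width]:
            value = (value << 1) | bit
        words.append(value)
    return words


stream = serialize([0xA5, 0x3C])     # transmitting end: parallel -> serial
assert deserialize(stream) == [0xA5, 0x3C]  # receiving end: serial -> parallel
```

One serial lane replaces `width` parallel wires, which is the pin-count saving the paragraph above refers to.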
The main functions of a core are to execute external or internal control instructions, to perform data computation, and to control data storage.
The routing unit sends data or control instructions to the cores (110, 111, 112, 113) and receives data or control instructions sent by the cores, thereby implementing communication between the cores. On receiving an internal or external control instruction, it writes data to a storage unit, reads data from a storage unit, or sends a control instruction to a storage unit through a serdes interface. If an internal or external control instruction is intended to control another chip, the routing unit sends the control instruction to the UART control unit (130), which sends it to the other chip. If data needs to be sent to another chip, the routing unit sends the data to a storage unit through a serdes interface, and the other chip obtains the data from the storage unit. If data needs to be received from another chip, the routing unit obtains the data from a storage unit through a serdes interface. The routing unit also receives external control instructions through the UART control unit (130) and sends them to the cores (110, 111, 112, 113), and it receives external data through the UART control unit (130) and sends it to a core (110, 111, 112, 113) or to a storage unit according to the address of the external data. Internal data or internal control instructions are those generated by the chip itself; external data or external control instructions are those generated outside the chip, for example data or control instructions sent by an external host or an external network.
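The forwarding rules of the routing unit (120) in this first embodiment can be summarized as a small decision function. This is a sketch under stated assumptions: the `Dest` enum, the string port names, and the `route` function are all hypothetical labels, not part of the source.

```python
from enum import Enum


class Dest(Enum):
    CORE = "core"              # a core on this chip
    STORAGE = "storage"        # a storage unit behind a serdes interface
    OTHER_CHIP = "other_chip"  # another operation chip


def route(kind, dest):
    """Return the port the routing unit forwards to; kind is 'data' or 'control'."""
    if kind not in ("data", "control"):
        raise ValueError("kind must be 'data' or 'control'")
    if dest is Dest.CORE:
        return "core"
    if dest is Dest.STORAGE:
        return "serdes"
    # Destination is another chip: control instructions leave through the UART
    # control unit, while data is written to a storage unit for the other chip
    # to pick up.
    return "uart" if kind == "control" else "serdes"


print(route("control", Dest.OTHER_CHIP))  # uart
print(route("data", Dest.OTHER_CHIP))     # serdes
```

The asymmetry in the last branch reflects the text: chips exchange data only through storage units, but exchange control instructions through the UART path.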
Fig. 2 b illustrates that first embodiment has the operation chip signal flow diagram of 4 kernels.The UART interface (130) being used to obtain chip exterior data, perhaps control instruction routing unit (120) will according to data or control instruction address Perhaps control instruction is sent to kernel core to data or routing unit (120) is sent to serdes by serdes interface and connects The storage unit of mouth connection.If the destination address of external control instruction is directed toward other chips, routing unit is by control instruction UART control unit (130) are sent to, are sent from UART control unit (130) to other chips.UART interface (130) is according to outer Portion's control instruction or internal control, which are instructed, is sent to outside for operation result, and operation result can be from the kernel of operation chip Core is obtained, and the storage unit that the connection of serdes interface can also be obtained by serdes interface obtains.Outside described here It can refer to external host, external network or external platform etc..External host can pass through UART control unit initial configuration Storage unit parameter carries out unified addressing to multiple storage particles.
Kernel core can send the control instruction for obtaining or data being written to routing unit, carry number in control instruction According to address, routing unit is read to storage unit or is written data by serdes interface according to address.Kernel core can also To send data or control instruction to other kernels core by routing unit according to address, and by routing unit from its His kernel core obtains data or control instruction.Kernel core is calculated according to the data of acquisition, and calculated result is deposited It stores up in storage unit.Proprietary storage region and shared storage area are set in each storage unit;The proprietary storage region For storing the interim operation result of an operation chip, which is continued with Results of intermediate calculations, and the results of intermediate calculations that other operation chips not will use;The shared storage area is for storing fortune The data operation of chip is calculated as a result, data operation result is used by other operation chips, or needs to carry out feedback biography to outside It is defeated.If the control instruction that kernel core is generated is used to control the operation of other chips, routing unit sends control instruction UART control unit (130) are given, are sent from UART control unit (130) to other chips.If the control that kernel core is generated Instruction is for controlling storage unit, then routing unit sends control instruction to storage unit by serdes interface.
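Address-based access can be pictured with a simple decoder. The source does not specify how unified addresses are laid out, so the contiguous layout, the per-unit capacity, and the names below are all assumptions chosen for illustration.

```python
UNIT_SIZE = 2 * 1024 ** 3  # assumed capacity of one storage unit, in bytes
NUM_UNITS = 4              # four storage units, as in Fig. 1


def decode(address):
    """Map a unified address to (storage unit index, offset inside that unit)."""
    if not 0 <= address < UNIT_SIZE * NUM_UNITS:
        raise ValueError("address outside the unified address space")
    return divmod(address, UNIT_SIZE)


print(decode(0))              # first byte of storage unit 0
print(decode(UNIT_SIZE + 5))  # sixth byte of storage unit 1
```

With such a map, the routing unit needs only the address carried in a control instruction to choose the serdes interface that leads to the right storage unit.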
Fig. 3 a illustrates the second embodiment of the operation chip structure schematic diagram with 4 kernels.According to Fig. 3 a it is found that The operation chip of 4 kernels includes 4 kernel core (110,111,112,113), a routing unit (120), a UART Control unit (130) and 4 serdes interfaces (150,151,152,153).Each serdes interface connects a kernel core, 4 kernel core are connected to routing unit, and the UART control unit (130) is connected to kernel core (110).
Fig. 3 b illustrates that second embodiment has the operation chip signal flow diagram of 4 kernels.The UART control For obtaining chip exterior data, perhaps external data or control instruction are transferred to and UART unit (130) by control instruction The kernel core (110) of control unit connection.External data or control instruction are transferred to routing unit by kernel core (110) (120), it is corresponding according to data or control instruction address to be sent to data address by routing unit for data or control instruction Kernel core (111,112,113).If the destination address of data or control instruction is the kernel core of this operation chip, Data or control instruction are sent to kernel core (110,111,112,113) by routing unit.If data or control instruction Destination address be storage unit, then by kernel core (111,112,113) by serdes interface (151,152,153) transmission To corresponding storage unit.What kernel core (110) can also directly be connected data or control instruction by itself Serdes interface (150) is sent to corresponding storage unit.In this case, routing unit stores all memory unit addresses Corresponding serdes interface.If the destination address of data or control instruction is other operation chips, data are by kernel Core (111,112,113) is sent to corresponding storage unit by serdes interface (151,152,153);Control instruction passes through UART control unit is sent to other operation chips.Kernel core is instructed according to external control instruction or internal control by operation As a result when perhaps intermediate data feeds back to outside kernel core from serdes interface from storage unit obtain operation result or in Between data, by operation result, perhaps intermediate data is sent to routing unit routing unit and sends out operation result or intermediate data The kernel core (110) for giving the connection of UART control unit, finally by UART control unit by operation result or mediant 
According to being sent to outside.Operation result is obtained if it is serdes interface corresponding to the kernel core connected as UART control unit Perhaps at this moment operation result or intermediate data are just directly sent to outside by UART control unit by intermediate data.Here The outside can refer to external host, external network or external platform etc..External host can pass through UART control unit Initial configuration storage unit parameter carries out unified addressing to multiple storage units.
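In this second embodiment each serdes interface hangs off exactly one core, so a message bound for a storage unit may need an extra hop through the routing unit. A minimal sketch follows; the pairing of storage units with serdes interfaces is an assumption (the source says only that the routing unit stores this mapping), and the function name is hypothetical.

```python
# Each serdes interface is attached to one core (Fig. 3a).
SERDES_TO_CORE = {150: 110, 151: 111, 152: 112, 153: 113}

# Hypothetical lookup table held by the routing unit: storage unit -> serdes interface.
STORAGE_TO_SERDES = {20: 150, 21: 151, 22: 152, 23: 153}


def hops_to_storage(src_core, storage_unit):
    """List the hops a message takes from src_core to the given storage unit."""
    serdes = STORAGE_TO_SERDES[storage_unit]
    owner = SERDES_TO_CORE[serdes]
    if owner == src_core:
        # The sending core owns the right interface and can use it directly.
        return [src_core, serdes, storage_unit]
    # Otherwise the routing unit forwards to the core that owns the interface.
    return [src_core, "routing unit 120", owner, serdes, storage_unit]


print(hops_to_storage(110, 20))  # direct path through the core's own interface
print(hops_to_storage(110, 22))  # relayed through the routing unit and core 112
```

Compared with the first embodiment, the routing unit here never touches a serdes interface itself; it only moves traffic between cores.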
A kernel core can send a control instruction to the routing unit, and the routing unit forwards it, according to the address in the control instruction, to another kernel core, another chip, or a storage unit; on receiving the control instruction, that kernel core, chip, or storage unit performs the corresponding operation. When a kernel core sends a control instruction or data to another kernel core, the routing unit forwards it directly. Control instructions that a kernel core sends to other chips go out through the UART control unit. When a kernel core sends a control instruction to a storage unit, the routing unit looks up the serdes interface corresponding to the address and sends the control instruction to the kernel core corresponding to that serdes interface; that kernel core passes it to the serdes interface, and the serdes interface sends the control instruction to the storage unit. When a kernel core sends data to another chip or to a storage unit, the routing unit likewise looks up the serdes interface corresponding to the address and sends the control instruction to the kernel core corresponding to that serdes interface; that kernel core passes it to the serdes interface, and the serdes interface sends the data to the storage unit. Other chips obtain the data from the storage unit. When a kernel core obtains data from a storage unit, the read control instruction carries the data address; the routing unit looks up the serdes interface corresponding to the address and sends the control instruction to the kernel core corresponding to that serdes interface; that kernel core passes it to the serdes interface, and the serdes interface sends the read control instruction, which carries the destination address and the source address, to the storage unit. After the serdes interface obtains the data from the storage unit, it sends the data to its corresponding kernel core; that kernel core sends a data packet including the source address and the destination address to the routing unit, and the routing unit sends the data packet to the corresponding kernel core according to the destination address. If a kernel core finds that the destination address is its own address, it takes the data and processes it. A kernel core can also send data or commands to other kernel cores through the routing unit, and obtain data or commands from other kernel cores through the routing unit. The kernel core performs its calculation on the acquired data and stores the calculation result in a storage unit. Each storage unit is provided with a private storage region and a shared storage region. The private storage region stores the temporary operation results of one operation chip, that is, intermediate results that this operation chip will continue to use and that the other operation chips will not use. The shared storage region stores the operation data results of the operation chips, which are used by other operation chips or need to be fed back to the outside.
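The forwarding rules above can be sketched as a small model. This is only an illustration under stated assumptions: the source does not define an address encoding, so the `CORE`/`CHIP`/`STORAGE` kinds, the `RoutingUnit` class, and the convention that serdes interface *i* is attached to kernel core *i* and leads to storage unit *i* are all hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical address kinds; the source does not define an address encoding.
CORE, CHIP, STORAGE = "core", "chip", "storage"


@dataclass
class RoutingUnit:
    """Toy model of the routing unit (120) inside one operation chip."""
    log: list = field(default_factory=list)

    def route(self, src_core: int, kind: str, index: int) -> list:
        """Return the list of hops a message from core `src_core` takes."""
        if kind == CORE:
            # Core-to-core traffic is forwarded directly by the routing unit.
            hops = [f"core{src_core}", "router", f"core{index}"]
        elif kind == CHIP:
            # Control instructions for other chips leave through the UART control unit.
            hops = [f"core{src_core}", "router", "uart", f"chip{index}"]
        elif kind == STORAGE:
            # Storage traffic goes to the core that owns the matching serdes
            # interface (assumed: serdes i belongs to core i), then out over it.
            hops = [f"core{src_core}", "router", f"core{index}",
                    f"serdes{index}", f"storage{index}"]
        else:
            raise ValueError(f"unknown destination kind: {kind}")
        self.log.append(hops)
        return hops


router = RoutingUnit()
print(router.route(0, STORAGE, 2))
```

The model only records the path; it does not move data or model the return path of a read, which the text describes as following the destination address carried in the packet.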
Fig. 4 a illustrates the first embodiment of memory cell structure schematic diagram corresponding with having the operation chip of 4 kernels. Storage unit (20) includes C memory, is illustrated by taking C=4 as an example here, and wherein C is just whole more than or equal to 2 certainly Number, such as can be 6,10,12 etc.;Memory (240,241,242,243) include storage control (220,221,222, 223) and particle (210,211,212,213) are stored;Storage control is used to be written or read to storage particle according to instruction Data, storage particle is for storing data.Storage unit (20) further comprises that (230) 4 serdes of a routing unit connect Mouth (250,251,252,253).4 serdes interfaces are connected with routing unit respectively by bus, and routing unit is again and each Memory is connected.
Fig. 4 b illustrates that the first of memory cell signal flow diagram corresponding with having the operation chip of 4 kernels implements Example.Storage unit (20) receives control instruction by serdes interface (250,251,252,253), and control instruction is sent to road By unit (230), routing unit according to the address in control instruction, by control instruction be sent to corresponding memory (240, 241,242,243), storage control (220,221,222,223) executes relevant operation according to control instruction.Such as according to initial Change configuration memory parameter, unified addressing is carried out to multiple storage particles;Or according to reset indication, weight is carried out to storage particle Set reset;The operation such as write instruction or sense order.Receive operation chip by serdes interface (250,251,252,253) The acquisition data command of transmission, the address that obtain data is carried in instruction, and routing unit is obtained according to address to memory transmission Data command is taken, storage control obtains data from storage particle according to acquisition data command, leads to data according to source address Cross the operation chip that serdes interface is sent to demand data.Receive operation by serdes interface (250,251,252,253) The write-in data command and data that chip is sent, carry the address that data are written in instruction, and routing unit is according to address to depositing Reservoir sends write-in data command and data, and data are written to storage particle according to write-in data command in storage control.Write-in Data command and data can be synchronous transfer, be also possible to asynchronous transmission.Proprietary storage region is set in each storage unit The shared storage area and;The proprietary storage region is used to store the interim operation result of an operation chip, the interim operation As a result the results of intermediate calculations continued with 
for one operation chip, and the intermediate computations that other operation chips not will use As a result;The shared storage area is used to store the operational data of operation chip as a result, the operational data result is by other operations Chip uses, or needs to carry out feedback transmission to outside.
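The read and write flow of the storage unit can be sketched as follows. This is a minimal model, not the patented circuit: the `Memory` and `StorageUnit` classes, the dictionary-based command format, and the choice of unified addressing as "memory index times memory size plus offset" are assumptions made for illustration.

```python
class Memory:
    """One memory (240..243): a storage controller in front of storage particles."""

    def __init__(self, size: int):
        self.cells = [0] * size  # stands in for the storage particles

    def write(self, offset: int, value: int) -> None:
        self.cells[offset] = value

    def read(self, offset: int) -> int:
        return self.cells[offset]


class StorageUnit:
    """Toy storage unit (20): a routing step that dispatches to C memories."""

    def __init__(self, num_memories: int = 4, memory_size: int = 1024):
        self.memory_size = memory_size
        self.memories = [Memory(memory_size) for _ in range(num_memories)]

    def _locate(self, address: int):
        # Assumed unified addressing: address = memory index * size + offset.
        index, offset = divmod(address, self.memory_size)
        if not 0 <= index < len(self.memories):
            raise IndexError(f"address {address} is outside the storage unit")
        return self.memories[index], offset

    def handle(self, command: dict):
        """Route one command by its address, as the routing unit (230) would."""
        memory, offset = self._locate(command["address"])
        if command["op"] == "write":
            memory.write(offset, command["data"])
            return None
        if command["op"] == "read":
            # The reply is addressed to the requester using the source address.
            return {"dst": command["src"], "data": memory.read(offset)}
        raise ValueError(f"unknown operation: {command['op']}")


unit = StorageUnit()
unit.handle({"op": "write", "address": 1030, "data": 7, "src": 1})
reply = unit.handle({"op": "read", "address": 1030, "src": 1})
```

Here address 1030 falls in the second memory at offset 6, and the read reply carries the requester's source address as its destination.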
Fig. 5 is a schematic diagram of the connection structure of a big data operation acceleration system with 4 operation chips and 4 storage units. The system in Fig. 5 has 4 operation chips (10, 11, 12, 13) and 4 storage units (20, 21, 22, 23). The operation chip may have the chip structure disclosed in the first or second embodiment; it may also have an equivalently improved chip structure that those skilled in the art derive from the first and second embodiments, and such equivalently improved chip structures also fall within the scope of protection of this embodiment. The storage unit may have the storage unit structure disclosed in the third embodiment; it may also have an equivalently improved storage unit structure that those skilled in the art derive from the third embodiment, and such equivalently improved storage unit structures also fall within the scope of protection of this embodiment. In the big data operation acceleration system, the UART control unit (130) of operation chip (10) is connected to an external host, and the UART control units (130) of the chips (10, 11, 12, 13) are connected to one another by a bus. Each serdes interface (150, 151, 152, 153) of a chip (10, 11, 12, 13) is connected to a serdes interface (250, 251, 252, 253) of one storage unit (20, 21, 22, 23), so that every operation chip is connected to all storage units by bus, and the operation chips exchange data with one another through the storage units. The internal and external signal flows of the operation chip and the storage unit have been described in detail in the first, second, and third embodiments and are not repeated here.
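The connection pattern of Fig. 5 can be enumerated with a short sketch. The port numbering is an assumption: the source only says that each serdes interface of a chip connects to a serdes interface of one storage unit, so the rule "chip port *u* goes to storage unit *u*, arriving on that unit's port *c*" and the function name `build_links` are hypothetical.

```python
def build_links(num_chips: int = 4, num_storage_units: int = 4) -> list:
    """Return serdes links as (chip, chip_port, storage_unit, storage_port)."""
    links = []
    for chip in range(num_chips):
        for unit in range(num_storage_units):
            # Assumed numbering: the chip uses one port per storage unit, and
            # the storage unit uses one port per chip.
            links.append((chip, unit, unit, chip))
    return links


links = build_links()
# Every chip reaches every storage unit, so chips can exchange data
# through any storage unit.
reachable = {chip: {u for c, _, u, _ in links if c == chip} for chip in range(4)}
```

With 4 chips and 4 storage units this yields 16 point-to-point links, and no port on either side is used twice.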
When the system is applied in the field of artificial intelligence, the UART control unit (130) of operation chip (10) stores the image data or video data sent by the external host into the storage units (20, 21, 22, 23) through the serdes interfaces (150, 151, 152, 153). The operation chips (10, 11, 12, 13) generate the mathematical model of the neural network; alternatively, the external host can store the model into the storage units (20, 21, 22, 23) through the serdes interfaces (150, 151, 152, 153), from which each operation chip (10, 11, 12, 13) reads it. The first-layer mathematical model of the neural network runs on operation chip (10): operation chip (10) reads data from the storage units (20, 21, 22, 23) through the serdes interfaces, performs the operation, and stores the operation result through the serdes interfaces in at least one of the storage units (20, 21, 22, 23). Operation chip (10) then sends a control instruction through the UART control unit (130) to operation chip (11) to start its operation. The second-layer mathematical model of the neural network runs on operation chip (11): operation chip (11) reads data from the storage units (20, 21, 22, 23) through the serdes interfaces, performs the operation, and stores the operation result through the serdes interfaces in at least one of the storage units (20, 21, 22, 23). Each chip executes one layer of the neural network, obtaining its data from the storage units (20, 21, 22, 23) through the serdes interfaces, until the last layer of the neural network has computed the operation result. Operation chip (10) obtains the operation result from the storage units (20, 21, 22, 23) through the serdes interfaces and feeds it back to the external host through the UART control unit (130).
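The layer-per-chip hand-off described above can be sketched in a few lines. This is a toy illustration only: the dictionary stands in for the storage units, the two layer functions are made-up placeholders rather than real neural network layers, and `run_layers` is a hypothetical name.

```python
def run_layers(layers, input_data):
    """Run one layer per chip, passing results through shared storage."""
    storage = {"data": list(input_data)}  # stands in for the storage units
    started = []                          # order in which chips were started
    for chip_id, layer in enumerate(layers):
        # In the system, the previous chip would start this one with a
        # control instruction sent through the UART control unit.
        started.append(chip_id)
        inputs = storage["data"]          # read through a serdes interface
        storage["data"] = layer(inputs)   # write the result back
    return storage["data"], started


def scale(values):
    """Placeholder first layer: multiply every value by 2."""
    return [2 * v for v in values]


def shift_relu(values):
    """Placeholder second layer: subtract 1, then clamp at 0."""
    return [max(0, v - 1) for v in values]


result, order = run_layers([scale, shift_relu], [1, -2, 3])
```

Each chip touches only the storage units, never another chip's kernels, which mirrors the text: data moves through the storage units while the UART path carries only the start signal.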
When the system is applied in the field of encrypted digital currency, the UART control unit (130) of operation chip (10) stores the block information sent by the external host into at least one of the storage units (20, 21, 22, 23). The external host sends control instructions to the 4 operation chips (10, 11, 12, 13) through their UART control units (130) to start the data operation, and the 4 operation chips (10, 11, 12, 13) begin operating. Alternatively, the external host sends a control instruction to the UART control unit (130) of one operation chip (10), operation chip (10) then sends control instructions to the other 3 operation chips (11, 12, 13) in turn, and the 4 operation chips (10, 11, 12, 13) begin operating. As a further alternative, the external host sends a control instruction to the UART control unit (130) of operation chip (10); the first operation chip (10) sends a control instruction to the second operation chip (11), the second operation chip (11) sends a control instruction to the third operation chip (12), the third operation chip (12) sends a control instruction to the fourth operation chip (13), and the 4 operation chips (10, 11, 12, 13) begin operating. The 4 operation chips (10, 11, 12, 13) read the block information data from the storage units through the serdes interfaces and perform the proof-of-work operation simultaneously; operation chip (10) obtains the operation result from the storage units (20, 21, 22, 23) and feeds it back to the external host through the UART control unit (130).
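As a rough illustration of several chips working on the same block information, the sketch below gives each simulated chip its own nonce range. The source does not say how the work is divided or which hash is used, so SHA-256, the 8-byte nonce, the per-chip ranges, and the difficulty of 8 leading zero bits are all assumptions; the chips are also simulated one after another here rather than in parallel.

```python
import hashlib


def search_range(block: bytes, start: int, stop: int, zero_bits: int):
    """Return the first nonce in [start, stop) whose hash meets the target."""
    target = 1 << (256 - zero_bits)
    for nonce in range(start, stop):
        digest = hashlib.sha256(block + nonce.to_bytes(8, "big")).digest()
        if int.from_bytes(digest, "big") < target:
            return nonce
    return None  # nothing found in this range


def mine(block: bytes, num_chips: int = 4, span: int = 1 << 16,
         zero_bits: int = 8) -> list:
    """Give each simulated chip a disjoint nonce range over the same block."""
    results = []
    for chip in range(num_chips):
        results.append(search_range(block, chip * span, (chip + 1) * span,
                                    zero_bits))
    return results


nonces = mine(b"block information")
```

In the system described above, the block bytes would be read from a storage unit over the serdes interfaces and the found nonces written back for operation chip (10) to collect.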
In the embodiments above, the number of operation chips equals the number of storage units; in that case, the number of second data interfaces of each storage unit and the number of second data interfaces of each operation chip both equal the number of storage units.
However, those skilled in the art will appreciate that the number of operation chips and the number of storage units may also be unequal. In that case, the number of second data interfaces of each storage unit equals the number of operation chips, and the number of second data interfaces of each operation chip equals the number of storage units. For example, with 4 operation chips and 5 storage units, each operation chip is provided with 5 second data interfaces and each storage unit is provided with 4 second data interfaces.
The bus may use a centralized arbitration bus structure or a ring topology bus structure. Bus technology is common in the art and is therefore not described in detail here.
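The interface-count rule is simple enough to state as a function. The name `interface_counts` and the dictionary keys are illustrative only.

```python
def interface_counts(num_chips: int, num_storage_units: int) -> dict:
    """Second data interfaces needed on each chip and each storage unit."""
    return {
        # Each chip needs one interface per storage unit.
        "per_chip": num_storage_units,
        # Each storage unit needs one interface per chip.
        "per_storage_unit": num_chips,
    }


counts = interface_counts(4, 5)
```

For the example in the text (4 operation chips, 5 storage units) this gives 5 interfaces per chip and 4 per storage unit, and both sides account for the same 20 links.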
Fig. 6 is a schematic diagram of the data structure according to the present invention. The data referred to here covers several kinds of data, such as command data, numeric data, and character data. The data format consists of a valid bit (valid), a destination address (dst id), a source address (src id), and the data itself (data). A kernel can tell from the valid bit whether a data packet is a command or a numeric value; it may be assumed here that 0 represents a numeric value and 1 represents a command. From the data structure a kernel can determine the destination address, the source address, and the data type. In terms of instruction timing, this embodiment uses a traditional six-stage pipeline whose stages are instruction fetch, decode, execute, memory access, alignment, and write-back. In terms of instruction set architecture, a reduced instruction set architecture may be adopted. Following the general design method of reduced instruction set architectures, the instruction set of the present invention can be divided by function into register-register instructions, register-immediate instructions, jump instructions, memory access instructions, control instructions, and inter-core communication instructions.
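The packet layout of Fig. 6 can be sketched as a small data class. The field order follows the text (valid, dst id, src id, data), but the bit widths are assumptions, since the source gives none: 1 bit for valid, 8 bits for each address, and 32 bits for data. The `Packet` class and its methods are hypothetical.

```python
from dataclasses import dataclass

ID_BITS = 8     # assumed width of dst id and src id
DATA_BITS = 32  # assumed width of the data field
ID_MASK = (1 << ID_BITS) - 1
DATA_MASK = (1 << DATA_BITS) - 1


@dataclass(frozen=True)
class Packet:
    valid: int   # 0 = numeric value, 1 = command (as assumed in the text)
    dst_id: int  # destination address
    src_id: int  # source address
    data: int    # payload

    def pack(self) -> int:
        """Pack the fields into one integer, valid bit first."""
        word = self.valid & 1
        word = (word << ID_BITS) | (self.dst_id & ID_MASK)
        word = (word << ID_BITS) | (self.src_id & ID_MASK)
        word = (word << DATA_BITS) | (self.data & DATA_MASK)
        return word

    @classmethod
    def unpack(cls, word: int) -> "Packet":
        """Inverse of pack(): peel the fields off from the low end."""
        data = word & DATA_MASK
        word >>= DATA_BITS
        src_id = word & ID_MASK
        word >>= ID_BITS
        dst_id = word & ID_MASK
        word >>= ID_BITS
        return cls(word & 1, dst_id, src_id, data)

    def is_command(self) -> bool:
        return self.valid == 1


packet = Packet(valid=1, dst_id=3, src_id=0, data=0xDEADBEEF)
assert Packet.unpack(packet.pack()) == packet
```

A kernel receiving such a word would compare `dst_id` with its own address and use `is_command()` to decide whether to execute or consume the payload.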
Using the description provided herein, the embodiments may be implemented as a machine, process, or article of manufacture by using standard programming and/or engineering techniques to produce programming software, firmware, hardware, or any combination thereof.
Any resulting program or programs, having computer-readable program code, may be embodied on one or more computer-usable media, such as resident storage devices, smart cards or other removable storage devices, or transmission devices, thereby making a computer program product and an article of manufacture according to the embodiments. As such, the terms "article of manufacture" and "computer program product" as used herein are intended to cover a computer program that exists permanently or temporarily on any non-transitory computer-usable medium.
As noted above, memory/storage devices include, but are not limited to, magnetic disks, optical discs, removable storage devices (such as smart cards, subscriber identity modules (SIM), and wireless identity modules (WIM)), and semiconductor memories (such as random access memory (RAM), read-only memory (ROM), and programmable read-only memory (PROM)). Transmission media include, but are not limited to, transmission via wireless communication networks, the Internet, intranets, telephone/modem-based network communication, hard-wired/cabled communication networks, satellite communication, and other fixed or mobile network systems/communication links.
Although specific example embodiments have been disclosed, those skilled in the art will appreciate that changes can be made to the specific example embodiments without departing from the spirit and scope of the present invention.
The present invention has been described above on the basis of embodiments and with reference to the accompanying drawings, but the present invention is not limited to those embodiments. Schemes obtained by appropriately combining or substituting parts of the embodiments and their variations according to layout needs and the like are also included within the scope of the present invention. In addition, the combinations and processing order of the embodiments may be suitably rearranged on the basis of the knowledge of those skilled in the art, and modifications such as various design changes may be applied to the embodiments; embodiments to which such modifications have been applied may also fall within the scope of the present invention.

Claims (17)

1. A big data operation acceleration system, characterized by comprising two or more operation chips and two or more storage units, wherein:
the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two kernel cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two kernel cores (110, 111, 112, 113);
the storage unit includes two or more third data interfaces (250, 251, 252, 253); the storage unit (20) includes two or more memories, a routing unit (230), and the two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit by a bus, and the routing unit is in turn connected to the two or more memories;
the second data interfaces (150, 151, 152, 153) of the operation chip are connected by a bus to the third data interfaces (250, 251, 252, 253) of the storage unit.
2. A big data operation acceleration system, characterized by comprising two or more operation chips and two or more storage units, wherein:
the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two kernel cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one kernel core, the at least two kernel cores are connected to the routing unit, and the at least one first data interface (130) is connected to one kernel core (110);
the storage unit includes two or more third data interfaces (250, 251, 252, 253); the storage unit (20) includes two or more memories, a routing unit (230), and the two or more third data interfaces (250, 251, 252, 253); the two or more third data interfaces (250, 251, 252, 253) are each connected to the routing unit by a bus, and the routing unit is in turn connected to the two or more memories;
the second data interfaces (150, 151, 152, 153) of the operation chip are connected by a bus to the third data interfaces (250, 251, 252, 253) of the storage unit.
3. The system according to claim 1 or 2, characterized in that the second data interface and the third data interface are serdes interfaces, and the first data interface is the UART interface of a UART control unit.
4. The system according to claim 1 or 2, characterized in that the number of operation chips equals the number of storage units, and the number of third data interfaces of the storage unit and the number of second data interfaces of the operation chip both equal the number of storage units.
5. The system according to claim 1 or 2, characterized in that the routing unit sends control instructions to an external chip through the at least one first data interface (130).
6. The system according to claim 1 or 2, characterized in that data is sent or received between chips through the second data interface and the storage unit.
7. The system according to claim 1 or 2, characterized in that the routing unit receives external data or control instructions through the at least one first data interface (130) and sends the received external data or control instructions to a kernel core or a storage unit.
8. The system according to claim 1 or 2, characterized in that the memory (240, 241, 242, 243) includes a storage controller (220, 221, 222, 223) and storage particles (210, 211, 212, 213), wherein the storage controller writes data to or reads data from the storage particles according to instructions, and the storage particles store the data.
9. The system according to claim 8, characterized in that the routing unit of the storage unit receives control instructions through the two or more third data interfaces (250, 251, 252, 253) and, according to the address in a control instruction, sends the control instruction to the corresponding memory (240, 241, 242, 243).
10. The system according to claim 8, characterized in that the routing unit of the storage unit sends the acquired data to an operation chip through the two or more third data interfaces (250, 251, 252, 253).
11. The system according to claim 8, characterized in that a private storage region and a shared storage region are provided in the storage unit.
12. The system according to claim 8, characterized in that the storage particles are HMC memory.
13. The system according to claim 1 or 2, characterized in that the two or more operation chips can execute one or more of encryption operations and convolution calculations.
14. The system according to claim 1 or 2, characterized in that the two or more operation chips each execute independent operations, each computing unit calculating its result separately.
15. The system according to claim 1 or 2, characterized in that the two or more operation chips can execute collaborative operations, each computing unit performing its operation according to the calculation results of the other operation chips among the two or more operation chips.
16. A big data operation chip, characterized in that the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two kernel cores (110, 111, 112, 113), and a routing unit (120); the at least one first data interface (130) and the two or more second data interfaces (150, 151, 152, 153) are each connected to the routing unit, and the routing unit is connected to the at least two kernel cores (110, 111, 112, 113); the second data interface and the third data interface are serdes interfaces;
the second data interfaces (150, 151, 152, 153) of the operation chip are connected by a bus to a storage unit.
17. A big data operation chip, characterized in that the operation chip includes at least one first data interface (130), two or more second data interfaces (150, 151, 152, 153), at least two kernel cores (110, 111, 112, 113), and a routing unit (120); each second data interface is connected to one kernel core, the at least two kernel cores are connected to the routing unit, and the at least one first data interface (130) is connected to one kernel core (110); the second data interface and the third data interface are serdes interfaces;
the second data interfaces (150, 151, 152, 153) of the operation chip are connected by a bus to a storage unit.
CN201880002364.XA 2018-10-30 2018-10-30 Big data operation acceleration system and chip Active CN109564562B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/112688 WO2020087276A1 (en) 2018-10-30 2018-10-30 Big data operation acceleration system and chip

Publications (2)

Publication Number Publication Date
CN109564562A true CN109564562A (en) 2019-04-02
CN109564562B CN109564562B (en) 2022-05-13

Family

ID=65872661

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880002364.XA Active CN109564562B (en) 2018-10-30 2018-10-30 Big data operation acceleration system and chip

Country Status (2)

Country Link
CN (1) CN109564562B (en)
WO (1) WO2020087276A1 (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023125448A1 (en) * 2021-12-30 2023-07-06 声龙(新加坡)私人有限公司 Proof-of-work operation method, proof-of-work chip, and upper computer

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112214448B (en) * 2020-10-10 2024-04-09 声龙(新加坡)私人有限公司 Data dynamic reconstruction circuit and method of heterogeneous integrated workload proving operation chip
CN114691591B (en) * 2020-12-31 2024-06-28 中科寒武纪科技股份有限公司 Circuit, method and system for inter-chip communication
CN118330446B (en) * 2024-06-13 2024-08-20 电子科技大学 Cross-core ASIC chip aging prediction method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050169355A1 (en) * 2004-01-30 2005-08-04 Broadcom Corporation Transceiver device with a transmit clock signal phase that is phase-locked with a receiver clock signal phase
CN105550140A (en) * 2014-11-03 2016-05-04 联想(北京)有限公司 Electronic device and data processing method
CN107451075A (en) * 2017-09-22 2017-12-08 算丰科技(北京)有限公司 Data processing chip and system, data storage forwarding and reading and processing method
CN108536642A (en) * 2018-06-13 2018-09-14 北京比特大陆科技有限公司 Big data operation acceleration system and chip
CN209784995U (en) * 2018-10-30 2019-12-13 北京比特大陆科技有限公司 Big Data Operation Acceleration System and Chip

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102314377B (en) * 2010-06-30 2014-08-06 国际商业机器公司 Accelerator and method thereof for supporting virtual machine migration
CN103634945A (en) * 2013-11-21 2014-03-12 安徽海聚信息科技有限责任公司 SOC-based high-performance cloud terminal
CN105183683B (en) * 2015-08-31 2018-06-29 浪潮(北京)电子信息产业有限公司 A kind of more fpga chip accelerator cards

Also Published As

Publication number Publication date
WO2020087276A1 (en) 2020-05-07
CN109564562B (en) 2022-05-13

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20210813

Address after: 100192 Building No. 25, No. 1 Hospital, Baosheng South Road, Haidian District, Beijing, No. 301

Applicant after: SUANFENG TECHNOLOGY (BEIJING) Co.,Ltd.

Address before: 100192 2nd Floor, Building 25, No. 1 Hospital, Baosheng South Road, Haidian District, Beijing

Applicant before: BITMAIN TECHNOLOGIES Inc.

TA01 Transfer of patent application right

Effective date of registration: 20220225

Address after: Room 501-1, unit 4, floor 5, building 2, yard 9, FengHao East Road, Haidian District, Beijing 100089

Applicant after: Beijing suneng Technology Co.,Ltd.

Address before: 100192 Building No. 25, No. 1 Hospital, Baosheng South Road, Haidian District, Beijing, No. 301

Applicant before: SUANFENG TECHNOLOGY (BEIJING) CO.,LTD.

GR01 Patent grant