CN116185498A - Storage and calculation integrated chip, calculation method and device thereof - Google Patents
- Publication number
- CN116185498A (application CN202310195494.2A)
- Authority
- CN
- China
- Prior art keywords
- data
- instruction
- storage
- calculation
- module
- Legal status
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/30—Arrangements for executing machine instructions, e.g. instruction decode
- G06F9/30098—Register arrangements
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F15/00—Digital computers in general; Data processing equipment in general
- G06F15/76—Architectures of general purpose stored program computers
- G06F15/78—Architectures of general purpose stored program computers comprising a single central processing unit
- G06F15/7839—Architectures of general purpose stored program computers comprising a single central processing unit with memory
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/06—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
- G06N3/063—Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Software Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Biophysics (AREA)
- Health & Medical Sciences (AREA)
- Life Sciences & Earth Sciences (AREA)
- Biomedical Technology (AREA)
- Computer Hardware Design (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Neurology (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Health & Medical Sciences (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Microcomputers (AREA)
Abstract
Description
Technical field
The present invention relates to the field of chip technology, and in particular to a computing-in-memory chip and a calculation method and device therefor.
Background
This section is intended to provide background or context for the embodiments of the invention recited in the claims. Nothing described here is admitted to be prior art merely because it is included in this section.
In recent years, computing-in-memory technology (also called in-memory computing or memory computing) has attracted wide attention as a way to address the challenges of device scaling and the von Neumann bottleneck. Its basic idea is to merge storage and computation on the same chip so that the memory itself performs computation, which reduces power consumption while improving performance.
Computing-in-memory is currently regarded as one of the efficient hardware solutions for real-time intelligent processing of big data in the post-Moore era. It is also one of the efficient ways to implement deep-learning neural networks. In deep-learning neural network applications the most frequent operation is the multiply-accumulate (MAC) operation. Computing-in-memory performs MAC operations efficiently, which greatly improves performance while reducing power consumption. Complex neural network applications, however, also involve peripheral interaction and computations other than MAC, so a single computing-in-memory chip often cannot meet the design requirements. An additional processor (CPU) is therefore needed, and its control unit and computing modules handle the complex computing tasks. At present, the external control unit and computing modules are mostly designed as components separate from the computing-in-memory chip. The CPU and the computing-in-memory chip exchange data only over the bus, and the large volume of data transfer limits the computing efficiency of the neural network system.
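For reference, the multiply-accumulate operation named above can be written as follows, together with the dot product that repeated MAC steps build up. This is the standard textbook definition, not notation taken from the patent. Here $w_i$ denotes a weight, $x_i$ an input, $N$ the vector length and $\mathrm{acc}$ the running sum.

```latex
\begin{aligned}
\mathrm{acc} &\leftarrow \mathrm{acc} + w_i\, x_i \qquad (i = 0, 1, \dots, N-1), \\
y &= \sum_{i=0}^{N-1} w_i\, x_i .
\end{aligned}
```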
Summary of the invention
One object of the present invention is to provide a computing-in-memory chip that reduces data transfer and improves computing efficiency. Further objects of the present invention are to provide a calculation method for the computing-in-memory chip, a neural network computing device, a computer device and a computer-readable medium.
To achieve the above objects, in one aspect the present invention discloses a computing-in-memory chip comprising a RISC-V processor, at least one memory array and a computing module;
the RISC-V processor comprises a processor core module and a data memory;
the processor core module is configured to receive an external instruction and determine whether it is a standard RISC-V instruction or an extended instruction; if it is an extended instruction, the processor core module determines weight data and data to be processed according to the external instruction, writes the weight data into the memory array and sends the data to be processed to the data memory;
the computing module obtains a calculation result from the weight data in the memory array and the data to be processed in the data memory.
Preferably, the chip further comprises a local instruction memory, a bus and an external memory;
the processor core module is connected to the external memory through the bus;
the processor core module is further configured to store the external instruction in the local instruction memory after receiving it.
Preferably, the data memory comprises a local data storage module and a computing-in-memory cache module;
the local data storage module is configured to store the operand data required to process RISC-V instructions;
the computing-in-memory cache module is configured to store the data to be processed.
Preferably, the chip further comprises a write module and a read module corresponding to the memory array;
the write module comprises a row decoder and a column decoder corresponding to the memory array;
the read module comprises a read decoder.
Preferably, the computing module comprises an adder tree and an accumulator;
the processor core module is configured to determine a bit selection signal according to the external instruction, select one bit of the data to be processed in the data memory according to the bit selection signal, and send that bit to the adder tree, so that the adder tree multiplies the weight data by the one-bit data and sends the product to the accumulator;
the accumulator is configured to add the products corresponding to all bits of the data to be processed to obtain the calculation result.
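The following identity shows why the bit-serial scheme above produces a full dot product. It is a sketch that assumes unsigned $B$-bit inputs $x_i$ with bits $x_{i,b} \in \{0,1\}$. The inner sum is what the adder tree could produce in one cycle, because multiplying a weight by a single bit only gates it on or off. The outer sum is the accumulator's job. The text above says only that the accumulator adds the per-bit products; the $2^{b}$ weighting is an assumption about how those products are combined.

```latex
\begin{aligned}
x_i &= \sum_{b=0}^{B-1} 2^{b}\, x_{i,b}, \qquad x_{i,b} \in \{0,1\}, \\
y   &= \sum_{i=0}^{N-1} w_i\, x_i
     = \sum_{b=0}^{B-1} 2^{b} \underbrace{\sum_{i=0}^{N-1} w_i\, x_{i,b}}_{\text{adder-tree output in cycle } b} .
\end{aligned}
```

For example, take $w = (1, 2, 3)$ and $x = (4, 5, 6)$ with $B = 3$. The per-cycle adder-tree sums are 2, 3 and 6, and $2 + 2 \cdot 3 + 4 \cdot 6 = 32 = 1 \cdot 4 + 2 \cdot 5 + 3 \cdot 6$.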
Preferably, the processor core module comprises an instruction fetch module, a decoding module, an execution module and a memory access and write-back module connected in sequence, and further comprises a computing-in-memory control unit;
the instruction fetch module comprises a program counter and is configured to fetch external instructions from the local instruction memory;
the decoding module comprises a judging unit and a register file; the judging unit is configured to determine whether the external instruction is a RISC-V instruction or an extended instruction, to send it to the computing-in-memory control unit if it is an extended instruction, and to send it to the register file if it is a RISC-V instruction;
the execution module is configured to execute the RISC-V instruction to obtain RISC-V computation data;
the memory access and write-back module comprises a memory access unit configured to receive and store the RISC-V computation data and to transfer it to the register file;
the computing-in-memory control unit is configured to determine the weight data and the data to be processed according to the external instruction, write the weight data into the memory array, send the data to be processed to the data memory, and control the computing module to obtain the calculation result from the weight data and the data to be processed.
The present invention further discloses a calculation method for a computing-in-memory chip, the chip comprising at least one memory array, a computing module and a data memory;
the method comprises:
receiving an external instruction;
determining whether the external instruction is a RISC-V instruction or an extended instruction; if it is an extended instruction, determining weight data and data to be processed according to the external instruction, writing the weight data into the memory array and sending the data to be processed to the data memory;
obtaining, by the computing module, a calculation result from the weight data in the memory array and the data to be processed in the data memory.
The present invention further discloses a neural network computing device comprising the computing-in-memory chip described above.
The present invention further discloses a computer device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the method described above when it executes the computer program.
The present invention further discloses a computer-readable storage medium storing a computer program that implements the method described above when executed by a processor.
In summary, the computing-in-memory chip of the present invention comprises a RISC-V processor (with a processor core module and a data memory), at least one memory array and a computing module. The processor core module receives external instructions and identifies extended instructions. For each extended instruction it writes the weight data into the memory array and sends the data to be processed to the data memory, and the computing module computes the result from the two. By integrating the RISC-V processor and the computing-in-memory function on one chip, the invention reduces data transfer during storage and computation and improves computing efficiency.
Brief description of the drawings
The drawings used in describing the embodiments or the prior art are briefly introduced below to illustrate the technical solutions of the embodiments of the present invention and of the prior art more clearly. The drawings described below show only some embodiments of the present invention. A person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 is a schematic diagram of a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 2 is a schematic diagram of the RISC-V processor in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 3 is a schematic diagram of the computing-in-memory unit in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 4 is a schematic diagram of the data memory in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 5 is a schematic diagram of the computing module in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 6 is a flowchart of vector multiplication in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 7 is a schematic diagram of the processor core module in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 8 is a schematic diagram of the operation of the decoding module in a specific embodiment of the computing-in-memory chip of the present invention;
Fig. 9 is a flowchart of a specific embodiment of the calculation method of the computing-in-memory chip of the present invention;
Fig. 10 is a schematic diagram of a specific embodiment of the neural network computing device of the present invention;
Fig. 11 is a schematic structural diagram of a computer device suitable for implementing embodiments of the present invention.
Detailed description of the embodiments
The embodiments of the present invention are described in further detail below with reference to the drawings, to make their objects, technical solutions and advantages clearer. The exemplary embodiments and their descriptions serve to explain the present invention, not to limit it.
It should be noted that in one or more embodiments of the present application, RISC-V is an open-source instruction set architecture (ISA) based on reduced instruction set computer (RISC) principles.
According to one aspect of the present invention, this embodiment discloses a computing-in-memory chip. As shown in Fig. 1, the chip comprises a RISC-V processor, at least one memory array and a computing module.
The RISC-V processor comprises a processor core module and a data memory.
The processor core module is configured to receive an external instruction and determine whether it is a RISC-V instruction or an extended instruction. If it is an extended instruction, the processor core module determines weight data and data to be processed according to the external instruction, writes the weight data into the memory array and sends the data to be processed to the data memory.
The computing module obtains a calculation result from the weight data in the memory array and the data to be processed in the data memory.
Thus, after receiving an external instruction, the chip determines whether it is an ordinary RISC-V instruction or an extended instruction. For an extended instruction, the chip first determines the weight data and the data to be processed. It sends the weight data to the memory array for storage and the data to be processed to the data memory. The computing module then reads the data to be processed from the data memory and computes it with the weight data in the memory array to obtain the calculation result. Integrating the RISC-V processor and the computing-in-memory function on one chip reduces data transfer during storage and computation and improves computing efficiency.
In a preferred embodiment, the chip further comprises a local instruction memory, a bus and an external memory, and the processor core module is connected to the external memory through the bus.
It will be appreciated that the chip is provided with a local instruction memory, a bus and an external memory so that it can process basic RISC-V instructions. The processor core module, local instruction memory, bus, external memory and data memory together form a complete RISC-V processor, as shown in Fig. 2.
The local instruction memory stores the RISC-V instructions and extended instructions awaiting execution. After receiving an external instruction, the processor core module stores it in the local instruction memory. The external memory expands the chip's storage capacity. The processor core module exchanges data with it over the bus and keeps part of the data there.
At least one memory array, the computing module and the data memory together form a complete computing-in-memory unit that implements the computing-in-memory function, as shown in Fig. 3. The memory array stores weight data with a high reuse rate, and the data to be processed is buffered in the data memory. The computing module fetches the data to be processed from the data memory and computes it with the weight data to obtain the calculation result. Note that extended instructions are typically data-operation instructions. When several operands are each computed with the same operand, that shared operand has a higher reuse rate than the others. It is written into the memory array as weight data, and the other operands are buffered in the data memory as data to be processed.
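The reuse-rate rule above can be illustrated with a short Python sketch. It assumes each operation is given as a pair of hashable operands, with tuples standing in for vectors, and that every pair shares one common operand. The function name `partition_operands` is hypothetical and does not come from the patent. The most frequently used operand plays the role of the weight data kept in the memory array, and the remaining partners become the data to be processed.

```python
from collections import Counter


def partition_operands(pairs):
    """Split (a, b) operation pairs into one shared weight and per-pair inputs.

    The operand occurring most often across all pairs is treated as the
    high-reuse weight (written once into the memory array); the partner in
    each pair becomes data to be processed (buffered in the data memory).
    """
    if not pairs:
        raise ValueError("no operations given")
    counts = Counter(op for pair in pairs for op in pair)
    weight, _ = counts.most_common(1)[0]
    inputs = []
    for a, b in pairs:
        if a == weight:
            inputs.append(b)
        elif b == weight:
            inputs.append(a)
        else:
            raise ValueError("pair does not share the high-reuse operand")
    return weight, inputs


w = (1, 2, 3)
xs = [(4, 5, 6), (7, 8, 9), (1, 0, 1)]
weight, inputs = partition_operands([(w, x) for x in xs])
print(weight, inputs)
```

Keeping the shared operand stationary in the array means it is written once and reused for every input. This is the source of the data-movement saving the description claims.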
In a preferred embodiment, the data memory comprises a local data storage module and a computing-in-memory cache module.
The local data storage module stores the operand data required to process RISC-V instructions, and the computing-in-memory cache module stores the data to be processed.
Specifically, in this preferred embodiment part of the storage space of the existing RISC-V processor is allocated as the local data storage module. This module holds the operand data that the processor core module needs to process RISC-V instructions. Another part is allocated as the computing-in-memory cache module, which holds the data to be processed for extended instructions. In a specific example, the data memory is formed from a static random-access memory (SRAM) array. The local data storage module and the computing-in-memory cache module reside in the same SRAM array and are separated by physical address. The computing-in-memory cache module further includes operational amplifiers and readout circuits so that the data to be processed can be output in parallel.
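Both modules live in one SRAM array and are separated only by physical address, so routing an access reduces to a range check. The Python sketch below assumes a hypothetical 16 KiB data memory split into a 12 KiB local data region and a 4 KiB computing-in-memory cache region. The patent gives no sizes or base addresses, so these numbers and the names `Region` and `route` are illustrative only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    name: str
    base: int  # first physical address of the region
    size: int  # region length in bytes

    def contains(self, addr: int) -> bool:
        return self.base <= addr < self.base + self.size


# Hypothetical split of one SRAM array: 12 KiB local data, 4 KiB CIM input cache.
DATA_MEMORY = (
    Region("local_data", 0x0000, 0x3000),
    Region("cim_cache", 0x3000, 0x1000),
)


def route(addr: int) -> tuple[str, int]:
    """Map a physical address to (region name, offset inside that region)."""
    for region in DATA_MEMORY:
        if region.contains(addr):
            return region.name, addr - region.base
    raise ValueError(f"address {addr:#06x} is outside the data memory")


print(route(0x0010))  # an operand for a RISC-V instruction
print(route(0x3004))  # an input buffered for an extended instruction
```

Because the split is purely an address boundary, moving data between the two roles needs no bus transfer. This matches the saving that the following paragraph attributes to merging the modules.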
In the prior art, the local data memory and the computing-in-memory input buffer are usually separate, and data must be transferred between them over the bus. When the transfer volume is large, this places a significant extra burden on the system. In this preferred embodiment, as shown in Fig. 4, the local data storage module and the computing-in-memory cache module are merged into one. This eliminates that transfer and simplifies data exchange between the RISC-V processor and the computing-in-memory unit, which reduces both power consumption and latency.
In a preferred embodiment, the RISC-V processor further comprises a write module and a read module corresponding to the memory array. The write module comprises a row decoder and a column decoder corresponding to the memory array, and the read module comprises a read decoder.
It will be appreciated that a memory array typically comprises multiple memory cells arranged in an array. Each cell may include a control switch and a magnetic tunnel junction (MTJ). The MTJ stores the data. The control switch is turned on or off by an external control signal and thereby determines whether data is written into the MTJ.
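A toy Python model of the cell described above shows how the control switch gates writes. It assumes that the antiparallel (high-resistance) MTJ state stores 1 and the parallel (low-resistance) state stores 0. This is a common convention, but the patent does not specify it. The class `MtjCell` is hypothetical and ignores all analog behavior.

```python
class MtjCell:
    """Toy model of a memory cell: a control switch in series with an MTJ.

    Assumption: antiparallel (high-resistance) encodes 1 and parallel
    (low-resistance) encodes 0; real designs may use the opposite mapping.
    """

    def __init__(self) -> None:
        self.antiparallel = False  # start in the parallel state (stores 0)

    def write(self, bit: int, switch_on: bool) -> None:
        # Write current only reaches the MTJ while the control switch conducts.
        if switch_on:
            self.antiparallel = bool(bit)

    def read(self) -> int:
        return int(self.antiparallel)


cell = MtjCell()
cell.write(1, switch_on=False)  # switch off: the stored state is unchanged
print(cell.read())
cell.write(1, switch_on=True)   # switch on: the MTJ is set to the antiparallel state
print(cell.read())
```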
In a specific example, after determining the weight data, the processor core module determines the write address of the memory cells that will hold the weight data and the weight data to be written. It then writes the weight data into the memory array through the row decoder and the column decoder.
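Row and column decoding of a write address can be sketched as follows. The sketch assumes a row-major flat address and models each cell as holding one stored value. The helper names `decode` and `write_weight` are hypothetical. In hardware a multi-bit weight would typically occupy several cells, which this abstraction omits.

```python
def decode(addr: int, n_rows: int, n_cols: int) -> tuple[list[int], list[int]]:
    """Turn a flat cell address into one-hot row and column select lines."""
    if not 0 <= addr < n_rows * n_cols:
        raise ValueError("address out of range")
    row, col = divmod(addr, n_cols)  # assumed row-major address layout
    row_sel = [int(r == row) for r in range(n_rows)]  # row decoder output
    col_sel = [int(c == col) for c in range(n_cols)]  # column decoder output
    return row_sel, col_sel


def write_weight(array: list[list[int]], addr: int, value: int) -> None:
    """Write one value into the cell where the selected row and column cross."""
    row_sel, col_sel = decode(addr, len(array), len(array[0]))
    for r, row_on in enumerate(row_sel):
        for c, col_on in enumerate(col_sel):
            if row_on and col_on:  # only the addressed cell is enabled
                array[r][c] = value


weights = [[0] * 4 for _ in range(3)]  # 3 rows x 4 columns
write_weight(weights, 6, 1)            # address 6 -> row 1, column 2
print(weights)  # [[0, 0, 0, 0], [0, 0, 1, 0], [0, 0, 0, 0]]
```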
It will be appreciated that the read decoder is used when computing the result from the weight data in the memory array and the data to be processed in the data memory. The read decoder sends a column selection signal to the memory array to select one column. Each element of the selected column is multiplied by one bit of the data to be processed, and the products are summed to give a partial result. Summing the partial results for all bits of the data to be processed completes the multiply-accumulate operation.
In a preferred embodiment, as shown in Fig. 5, the computing module comprises an adder tree and an accumulator.
The processor core module is configured to determine a bit selection signal according to the external instruction and to select one bit of the data to be processed in the data memory according to that signal. It sends the selected bit to the adder tree, which multiplies the weight data by the one-bit data and sends the product to the accumulator;
the accumulator is configured to add the products corresponding to all bits of the data to be processed to obtain the calculation result.
Specifically, as shown in Fig. 6, during a MAC operation the processor core module sends a bit selection signal to the data memory to select one bit position of the data to be processed. A column of data is selected from the memory array according to the column selection signal and fed in parallel into the adder tree. According to the bit selection signal, the selected bit of every item of data to be processed in the data memory is also fed in parallel into the adder tree. The adder tree multiplies the two and sums the products in parallel. The accumulator accumulates the results of multiple cycles to obtain the calculation result. This process implements vector multiplication (a dot product). Performing several such vector multiplications in parallel extends it to vector-matrix multiplication.
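The per-cycle flow described above can be simulated in Python. This is a behavioral sketch under stated assumptions, not the patent's circuit. It assumes that inputs are unsigned integers of `n_bits` bits and that each element of the selected weight column pairs with the selected bit of the corresponding input. It also assumes that the accumulator weights cycle `b` by `2**b` (shift-and-add), which the patent does not spell out. `adder_tree`, `bit_serial_dot` and `matvec` are hypothetical names.

```python
def adder_tree(values):
    """Sum values by pairwise reduction, like a log2(N)-level adder tree."""
    level = list(values)
    if not level:
        return 0
    while len(level) > 1:
        if len(level) % 2:  # pad an odd level with a zero input
            level.append(0)
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]


def bit_serial_dot(weight_column, inputs, n_bits):
    """Dot product of a weight column with unsigned inputs, one input bit per cycle."""
    if len(weight_column) != len(inputs):
        raise ValueError("length mismatch")
    if any(x < 0 or x >= 1 << n_bits for x in inputs):
        raise ValueError("inputs must be unsigned and fit in n_bits")
    acc = 0  # accumulator register
    for b in range(n_bits):  # the bit selection signal picks bit b of every input
        bits = [(x >> b) & 1 for x in inputs]
        # Multiplying by a single bit only gates the weight on or off.
        partial = adder_tree(w * bit for w, bit in zip(weight_column, bits))
        acc += partial << b  # assumed shift-and-add by bit significance
    return acc


def matvec(weight_columns, inputs, n_bits):
    """Vector-matrix product: one bit-serial dot product per weight column."""
    return [bit_serial_dot(col, inputs, n_bits) for col in weight_columns]


print(bit_serial_dot([1, 2, 3], [4, 5, 6], n_bits=3))        # 32
print(matvec([[1, 2, 3], [-1, 0, 2]], [4, 5, 6], n_bits=3))  # [32, 8]
```

`matvec` runs one `bit_serial_dot` per weight column. This corresponds to the parallel vector multiplications that, according to the description, extend the scheme to vector-matrix multiplication.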
In a preferred embodiment, as shown in FIG. 7, the processor core module includes an instruction fetch module, a decoding module, an execution module, and a memory access and write-back module connected in sequence. It further includes a storage-computing integrated control unit.
The instruction fetch module includes a program counter and fetches external instructions from the local instruction memory.
The decoding module includes a judging unit and a register file. The judging unit determines whether the external instruction is a RISC-V instruction or an extended instruction.
- If it is an extended instruction, the judging unit sends it to the storage-computing integrated control unit. The control unit determines the weight data and the data to be processed according to the instruction, writes the weight data into the storage array, and sends the data to be processed to the data memory. It then controls the calculation module to obtain the calculation result from the weight data in the storage array and the data to be processed in the data memory.
- If it is a RISC-V instruction, the instruction is sent to the register file, as shown in FIG. 8.
The execution module executes the RISC-V instruction to obtain RISC-V calculation data.
The memory access and write-back module includes a memory access unit, which receives and stores the RISC-V calculation data and transmits it to the register file.
Specifically, after an external instruction has been loaded into the local instruction memory, the processor core module reads it and feeds it into the pipeline. The conventional five-stage pipeline consists of instruction fetch, decode, execute, memory access, and write-back. Here these stages are implemented by the instruction fetch module, the decoding module, the execution module, and the memory access and write-back module. This embodiment improves on that pipeline by adding a judging unit to the decoding module. When an extended instruction is read, the judging unit sends it to the storage-computing integrated control unit, which controls the storage-computing integrated calculation. When a RISC-V instruction is read, the five-stage pipeline continues as usual and executes the instruction.
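The judging unit's decision can be sketched as a check on the instruction's major opcode. The patent does not specify how extended instructions are encoded. This sketch assumes they use the RISC-V custom-0 (`0001011`) and custom-1 (`0101011`) major opcodes, which the RISC-V specification reserves for custom extensions. `judge` and `Route` are hypothetical names.

```python
from enum import Enum


class Route(Enum):
    CIM_CONTROL = "storage-computing integrated control unit"
    RISCV_PIPELINE = "execute / memory access / write-back"


# Assumption: extended instructions use the RISC-V custom-0 and custom-1 major opcodes.
# The patent does not state which encoding its extended instructions use.
CUSTOM_OPCODES = {0b0001011, 0b0101011}


def judge(instr: int) -> Route:
    """Model of the judging unit: classify a 32-bit instruction word by its major opcode."""
    opcode = instr & 0x7F  # bits [6:0] hold the major opcode in the RISC-V base formats
    if opcode in CUSTOM_OPCODES:
        return Route.CIM_CONTROL
    return Route.RISCV_PIPELINE
```

For example, `judge(0x003100B3)` (the encoding of `add x1, x2, x3`) returns `Route.RISCV_PIPELINE`. Any word whose low seven bits are `0001011` is routed to the control unit.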
Based on the same principle, an embodiment of the present invention also provides a calculation method for a storage-computing integrated chip. The chip includes at least one storage array, a calculation module, and a data memory.
As shown in FIG. 9, the method includes:
S100: Receive an external instruction.
S200: Determine whether the external instruction is a RISC-V instruction or an extended instruction. If it is an extended instruction, determine the weight data and the data to be processed according to the instruction, write the weight data into the storage array, and send the data to be processed to the data memory.
S300: Use the calculation module to obtain the calculation result from the weight data in the storage array and the data to be processed in the data memory.
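Steps S100 to S300 can be traced in a small behavioral model. For illustration only, the hypothetical `ExtInstr` carries its weight matrix and input vector inline. A real extended instruction would more likely carry addresses or register indices. The model also skips everything the ordinary RISC-V pipeline would do. The class and field names are assumptions and are not taken from the patent.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Union


@dataclass
class RiscvInstr:
    word: int  # ordinary RISC-V instruction; handled by the normal pipeline (not modeled)


@dataclass
class ExtInstr:
    weights: List[List[int]]  # hypothetical: operands carried inline for illustration
    inputs: List[int]


@dataclass
class CimChip:
    storage_array: List[List[int]] = field(default_factory=list)
    data_memory: List[int] = field(default_factory=list)

    def compute(self) -> List[int]:
        # S300: each output column is the sum over rows of weight * input
        rows, cols = len(self.storage_array), len(self.storage_array[0])
        return [sum(self.storage_array[r][c] * self.data_memory[r] for r in range(rows))
                for c in range(cols)]

    def run(self, instr: Union[RiscvInstr, ExtInstr]) -> Optional[List[int]]:
        # S100: the external instruction is received as `instr`
        # S200: classify it; for an extended instruction, load weights and data
        if isinstance(instr, ExtInstr):
            self.storage_array = [row[:] for row in instr.weights]  # write weights into the array
            self.data_memory = list(instr.inputs)  # send data to be processed to data memory
            return self.compute()  # S300
        return None  # RISC-V instructions go to the ordinary pipeline, not modeled here
```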
Since the method solves the problem on the same principle as the chip described above, the implementation of the method can refer to the implementation of the chip. Repeated details are omitted here.
Based on the same principle, an embodiment of the present invention also provides a neural network computing device, which includes the storage-computing integrated chip of this embodiment.
Specifically, in neural network computing, the storage-computing integrated function of the chip can perform the MAC operations required by fully connected layers, convolutional layers, and the like. The RISC-V processor performs operations such as activation and pooling, so no additional data transfer is needed between layers. Take a fully connected layer as an example. As shown in FIG. 10, the input data to be processed is stored in the local data storage module, and the weight data is stored in the storage-computing integrated storage array. The storage array and the calculation module of the chip complete the MAC calculation, and the result is written back to the local data storage module of the data memory. The processor core module then reads the result from the local data memory and applies the activation function, which completes the fully connected layer. Because the result stays in the local data memory, the next layer can read it directly, which reduces data transfer.
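The fully connected layer flow in FIG. 10 can be sketched as follows. The MAC step stands in for the storage array and calculation module, and the activation stands in for the RISC-V core. A dictionary plays the role of the local data memory, so each layer reads the previous layer's output in place. The patent does not name the activation function. ReLU is assumed here, and all names are hypothetical.

```python
from typing import Dict, List


def cim_mac(weights: List[List[int]], x: List[int]) -> List[int]:
    """MAC done by the storage array and calculation module: y[c] = sum_r W[r][c] * x[r]."""
    return [sum(weights[r][c] * x[r] for r in range(len(x))) for c in range(len(weights[0]))]


def relu(v: List[int]) -> List[int]:
    """Activation run on the RISC-V core (ReLU is an assumption; the patent names no function)."""
    return [max(0, t) for t in v]


def run_fc_layers(layers: List[List[List[int]]], x: List[int]) -> Dict[str, List[int]]:
    """Chain fully connected layers; every intermediate result stays in the local data memory."""
    data_memory: Dict[str, List[int]] = {"act0": x}  # hypothetical buffers keyed by name
    for i, w in enumerate(layers):
        mac = cim_mac(w, data_memory[f"act{i}"])  # MAC result written back to local memory
        data_memory[f"act{i + 1}"] = relu(mac)  # core applies activation; next layer reads it directly
    return data_memory
```

For example, two layers with weights `[[1, -1], [2, 0]]` and `[[1], [1]]` applied to input `[3, 1]` leave `[5, 0]` in `act1` and `[5]` in `act2`.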
Since the device solves the problem on the same principle as the chip above, the implementation of the device can refer to the implementation of the chip. Repeated details are omitted here.
An embodiment of the present invention also provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor. The processor implements the above method when executing the computer program.
An embodiment of the present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the above method.
The systems, devices, modules, or units described in the above embodiments may be implemented by computer chips or entities, or by products having certain functions. A typical implementing device is a computer device. The computer device may be, for example, a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email device, a game console, a tablet computer, a wearable device, or a combination of any of these devices.
In a typical example, the computer device includes a memory, a processor, and a computer program stored in the memory and executable on the processor. When executing the program, the processor implements either the method described above as executed by the client or the method described above as executed by the server.
Referring now to FIG. 11, it shows a schematic structural diagram of a computer device 600 suitable for implementing the embodiments of the present application.
As shown in FIG. 11, the computer device 600 includes a central processing unit (CPU) 601. The CPU 601 can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 602 or a program loaded from a storage section 608 into a random access memory (RAM) 603. The RAM 603 also stores various programs and data required for the operation of the system 600. The CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604. An input/output (I/O) interface 605 is also connected to the bus 604.
The following components are connected to the I/O interface 605:
- an input section 606 including a keyboard, a mouse, and the like;
- an output section 607 including a cathode ray tube (CRT), a liquid crystal display (LCD), a speaker, and the like;
- a storage section 608 including a hard disk and the like;
- a communication section 609 including a network interface card such as a LAN card or a modem.

The communication section 609 performs communication processing via a network such as the Internet. A drive 610 is also connected to the I/O interface 605 as needed. A removable medium 611, such as a magnetic disk, an optical disc, a magneto-optical disc, or a semiconductor memory, is mounted on the drive 610 as needed, so that a computer program read from it can be installed into the storage section 608 as needed.
In particular, according to embodiments of the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present invention include a computer program product comprising a computer program tangibly embodied on a machine-readable medium, the computer program comprising program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication section 609, and/or installed from the removable medium 611.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to:
- phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), and other types of random access memory (RAM);
- read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and flash memory or other memory technologies;
- compact disc read-only memory (CD-ROM), digital versatile discs (DVD), or other optical storage;
- magnetic cassettes, magnetic tape or magnetic disk storage, or other magnetic storage devices;
- any other non-transmission medium that can be used to store information accessible by a computing device.

As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
For convenience of description, the above devices are described as separate units divided by function. When implementing the present application, the functions of the units may be implemented in one or more pieces of software and/or hardware.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. Computer program instructions can implement each flow and/or block in the flowcharts and/or block diagrams, as well as combinations of flows and/or blocks in them. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine. The instructions executed by that processor then produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a specific manner. The instructions stored in the computer-readable memory then produce an article of manufacture including an instruction apparatus, which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce a computer-implemented process. The instructions executed on the computer or other programmable device thus provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
It should also be noted that the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion. A process, method, article, or device that includes a series of elements therefore includes not only those elements, but also other elements not explicitly listed or elements inherent to that process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
Those skilled in the art should understand that embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
The present application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. Generally, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments, where remote processing devices connected through a communication network perform the tasks. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.
The embodiments in this specification are described progressively. For parts that are the same or similar across embodiments, refer to one another; each embodiment focuses on its differences from the others. In particular, the system embodiment is essentially similar to the method embodiment and is therefore described briefly. For related details, refer to the description of the method embodiment.
The above are merely embodiments of the present application and are not intended to limit it. Those skilled in the art may make various modifications and changes to the present application. Any modification, equivalent replacement, improvement, or the like made within the spirit and principles of the present application shall fall within the scope of its claims.
Claims (10)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310195494.2A CN116185498A (en) | 2023-02-24 | 2023-02-24 | Storage and calculation integrated chip, calculation method and device thereof |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310195494.2A CN116185498A (en) | 2023-02-24 | 2023-02-24 | Storage and calculation integrated chip, calculation method and device thereof |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116185498A true CN116185498A (en) | 2023-05-30 |
Family
ID=86444205
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310195494.2A Pending CN116185498A (en) | 2023-02-24 | 2023-02-24 | Storage and calculation integrated chip, calculation method and device thereof |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116185498A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117472847A (en) * | 2023-11-08 | 2024-01-30 | 海光信息技术股份有限公司 | Memory core, data processing method, computer system |
CN119003139A (en) * | 2024-10-24 | 2024-11-22 | 中科南京智能技术研究院 | Memory calculation hardware accelerator, electronic equipment and data processing method |
Events
- 2023-02-24: Application CN202310195494.2A filed in CN; publication CN116185498A, status Pending
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108427990B (en) | Neural network computing system and method | |
CN107169563B (en) | Processing system and method applied to two-value weight convolutional network | |
CN111310910A (en) | A computing device and method | |
US20120054468A1 (en) | Processor, apparatus, and method for memory management | |
US20190034833A1 (en) | Model Training Method and Apparatus | |
CN116185498A (en) | Storage and calculation integrated chip, calculation method and device thereof | |
US20110106916A1 (en) | Apparatus and method for executing an application | |
CN106557436A (en) | The memory compression function enabled method of terminal and device | |
CN107315716B (en) | Device and method for executing vector outer product operation | |
CN115033188B (en) | Storage hardware acceleration module system based on ZNS solid state disk | |
CN116010299A (en) | Data processing method, device, equipment and readable storage medium | |
JP2020042782A (en) | Computing method applied to artificial intelligence chip, and artificial intelligence chip | |
EP3991097A1 (en) | Managing workloads of a deep neural network processor | |
Min et al. | NeuralHMC: An efficient HMC-based accelerator for deep neural networks | |
CN113065643A (en) | Apparatus and method for performing multi-task convolutional neural network prediction | |
CN115249057A (en) | System and computer-implemented method for graph node sampling | |
CN112965788A (en) | Task execution method, system and equipment in hybrid virtualization mode | |
CN118963843B (en) | Coprocessor and computer equipment | |
Liu et al. | Enabling efficient large recommendation model training with near cxl memory processing | |
WO2023115529A1 (en) | Data processing method in chip, and chip | |
CN107179883B (en) | Spark architecture optimization method of hybrid storage system based on SSD and HDD | |
CN117436528A (en) | An AI network model reasoning unit and network model reasoning pipeline technology based on RISC-V | |
US20200026669A1 (en) | Memory system | |
CN114840886B (en) | Safe read-write storage device, method and equipment based on data flow architecture | |
CN116931876A (en) | Matrix operation system, matrix operation method, satellite navigation method and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination |