CN115329935A

CN115329935A - Pulse neural network accelerator

Info

Publication number: CN115329935A
Application number: CN202210987309.9A
Authority: CN
Inventors: 刘怡俊; 陈岳海; 叶武剑
Original assignee: Guangdong University of Technology
Current assignee: Guangdong University of Technology
Priority date: 2022-08-17
Filing date: 2022-08-17
Publication date: 2022-11-11

Abstract

The application discloses a pulse neural network accelerator.A controller drives a time step generator to generate a time step t after receiving input data; the pulse encoding unit encodes the input data into pulse data of the next time step t +1 based on the time step t; the scheduler decodes the pulse data and stores the decoded pulse data into the FIFO array; the controller reads a pulse source address from the FIFO array based on the current time step and sends the pulse source address to the neuron computing unit; the neuron computing unit reads corresponding neuron state data from the state memory according to the pulse source address so as to update the state of the neurons of the corresponding layer, writes the updated state data back to the state memory, and sends the output pulse data to the scheduler for storage, so that the neurons of the next layer can update the state, and the technical problem that the existing accelerator cannot determine whether unprocessed events remain at the current time in terms of hardware implementation and the performance of the accelerator is low is solved.

Description

A Spiking Neural Network Accelerator

技术领域technical field

本申请涉及脉冲神经网络加速技术领域，尤其涉及一种脉冲神经网络加速器。The present application relates to the technical field of spiking neural network acceleration, in particular to a spiking neural network accelerator.

背景技术Background technique

人工智能技术及应用在日常生活中发挥着重大的作用，其中以脉冲神经网络为基础的模型在分类、回归以及目标检测等领域展示极高的准确度以及性能，能够较准确地模拟人类大脑神经网络的运行机制，且随着网络层数的加深，对系统资源的消耗较少，利于推进人工智能应用的发展。Artificial intelligence technology and applications play an important role in daily life. Among them, the model based on spiking neural network shows extremely high accuracy and performance in the fields of classification, regression and target detection, and can more accurately simulate the human brain. The operating mechanism of the network, and as the number of network layers deepens, the consumption of system resources is less, which is conducive to promoting the development of artificial intelligence applications.

现有的硬件脉冲神经网络驱动算法普遍为事件驱动，在硬件实现上无法确定当前时间下是否属于未处理的事件，导致加速器的性能较低。The existing hardware spiking neural network driving algorithms are generally event-driven, and it is impossible to determine whether the current time is an unprocessed event in hardware implementation, resulting in low performance of the accelerator.

发明内容Contents of the invention

本申请提供了一种脉冲神经网络加速器，用于改善现有的加速器在硬件实现上无法确定当前时间下是否剩余未处理的事件，导致加速器的性能较低的技术问题。The present application provides a spiking neural network accelerator, which is used to improve the technical problem that the existing accelerator cannot determine whether there are unprocessed events at the current time in terms of hardware implementation, resulting in low performance of the accelerator.

有鉴于此，本申请第一方面提供了一种脉冲神经网络加速器方法，包括：In view of this, the first aspect of the present application provides a spiking neural network accelerator method, including:

所述控制器，用于在接收到输入数据后驱动所述时间步生成器生成时间步t，并将时间步t和输入数据发送给所述脉冲编码器单元；The controller is configured to drive the time step generator to generate a time step t after receiving the input data, and send the time step t and the input data to the pulse encoder unit;

所述脉冲编码单元，用于基于时间步t将输入数据编码为下一个时间步t+1的脉冲数据，并将所述脉冲数据发送给所述控制器，由所述控制器将所述脉冲数据发送给所述调度器；The pulse encoding unit is configured to encode the input data into the pulse data of the next time step t+1 based on the time step t, and send the pulse data to the controller, and the controller converts the pulse sending data to the scheduler;

所述调度器，用于对所述脉冲数据进行解码得到脉冲源地址和时间步t+1，并将脉冲源地址和时间步t+1存储到FIFO阵列中；The scheduler is used to decode the pulse data to obtain the pulse source address and time step t+1, and store the pulse source address and time step t+1 in the FIFO array;

所述控制器还用于基于当前的时间步从所述FIFO阵列中读取脉冲源地址，并将该脉冲源地址发送给所述神经元计算单元；The controller is further configured to read the pulse source address from the FIFO array based on the current time step, and send the pulse source address to the neuron computing unit;

所述神经元计算单元，用于根据脉冲源地址从所述状态存储器中读取对应的神经元状态数据，根据所述神经元状态数据对对应层的神经元进行状态更新，将更新后的状态数据写回所述状态存储器中，并将输出的脉冲数据发送给所述调度器进行存储，以便下一层神经元进行状态更新。The neuron calculation unit is configured to read corresponding neuron state data from the state memory according to the pulse source address, update the state of the neurons of the corresponding layer according to the neuron state data, and update the updated state The data is written back into the state memory, and the output pulse data is sent to the scheduler for storage, so that the next layer of neurons can update the state.

可选的，所述脉冲编码单元包括：伪随机数生成器和泊松编码器；Optionally, the pulse encoding unit includes: a pseudo-random number generator and a Poisson encoder;

所述伪随机数生成器，用于在接收到时间步后，随机生成0到1之间的随机数；The pseudo-random number generator is used to randomly generate a random number between 0 and 1 after receiving the time step;

所述泊松编码器，用于基于所述随机数对输入数据进行泊松编码，生成下一个时间步t+1的脉冲数据。The Poisson encoder is configured to perform Poisson encoding on the input data based on the random number to generate pulse data at the next time step t+1.

可选的，所述脉冲编码单元，还用于在编码完下一个时间步t+1的输入数据后，触发所述时间步生成器生成下一个时间步t+1。Optionally, the pulse encoding unit is further configured to trigger the time step generator to generate the next time step t+1 after encoding the input data of the next time step t+1.

可选的，还包括：延迟存储单元和突触延迟计算单元；Optionally, it also includes: a delay storage unit and a synaptic delay calculation unit;

所述延迟存储单元，用于存储目标神经元的突触延迟时间；The delay storage unit is used to store the synaptic delay time of the target neuron;

所述突触延迟计算单元，用于根据所述目标神经元的突触延迟和当前时间步计算目标时间步，将所述目标神经元的目标时间步发送给所述调度器，由所述调度器根据所述目标神经元的脉冲源地址和所述目标时间步进行存储。The synaptic delay calculation unit is configured to calculate a target time step according to the synaptic delay of the target neuron and the current time step, and send the target time step of the target neuron to the scheduler, and the scheduler The device stores according to the pulse source address of the target neuron and the target time step.

可选的，所述神经元计算单元，具体用于：Optionally, the neuron computing unit is specifically used for:

当神经元为IF或LIF神经元时，基于四级流水线根据脉冲源地址从所述状态存储器中读取对应的神经元状态数据，根据所述神经元状态数据计算脉冲神经网络参数，基于脉冲神经网络参数更新神经元状态，以及将更新后的神经元状态写回到所述状态存储器中；When the neuron is an IF or LIF neuron, based on the four-stage pipeline, the corresponding neuron state data is read from the state memory according to the pulse source address, and the pulse neural network parameters are calculated according to the neuron state data. The network parameters update the neuron state, and write the updated neuron state back into the state memory;

当神经元为Izhikevich神经元时，基于六级流水线根据脉冲源地址从所述状态存储器中读取对应的神经元状态数据，根据所述神经元状态数据计算脉冲神经网络参数，基于脉冲神经网络参数更新神经元状态，以及将更新后的神经元状态写回到所述状态存储器中。When the neuron is an Izhikevich neuron, based on the six-stage pipeline, the corresponding neuron state data is read from the state memory according to the pulse source address, and the pulse neural network parameters are calculated according to the neuron state data, based on the pulse neural network parameters updating the neuron state, and writing the updated neuron state back into the state memory.

可选的，当神经元为LIF神经元时，所述脉冲神经网络参数包括基于一阶欧拉方法计算的LIF神经元的膜电位电压，计算公式为：Optionally, when the neuron is a LIF neuron, the spiking neural network parameters include the membrane potential voltage of the LIF neuron calculated based on the first-order Euler method, and the calculation formula is:

β₁＝α·V[n]，

β ₁ =α·V[n],

式中，V[n]为当前时刻的膜状态，V[n+1]为所求的膜电位大小，

为采用一阶欧拉方法求得的膜电位电压，t_n为计算时的第n个离散时间步，v_n为输入膜电位大小，α为神经元膜电阻产生的电压受神经元此时膜电位的影响大小，f₁(t,v)为第一目标方程，f₂(t,v)为第二目标方程，β₁、β₂为不同阶段V的大小偏置，V_reset为神经元静息电位，h为时间步，τ_m为时间常数。In the formula, V[n] is the membrane state at the current moment, V[n+1] is the required membrane potential,

is the membrane potential voltage obtained by the first-order Euler method, t _n is the nth discrete time step in the calculation, v _n is the input membrane potential, α is the voltage generated by the neuron membrane resistance affected by the neuron membrane at this time The influence of potential, f ₁ (t,v) is the first objective equation, f ₂ (t,v) is the second objective equation, β ₁ and β ₂ are the biases of V in different stages, V _reset is the neuron Resting potential, h is the time step, τ _m is the time constant.

可选的，当神经元为Izhikevich神经元时，所述脉冲神经网络参数包括基于一阶欧拉方法计算的Izhikevich神经元的膜电位电压，计算公式为：Optionally, when the neuron is an Izhikevich neuron, the pulse neural network parameters include the membrane potential voltage of the Izhikevich neuron calculated based on the first-order Euler method, and the calculation formula is:

V[n+1]＝(0.04V²+5V+140-U+I)·h；V[n+1]=(0.04V ² +5V+140-U+I) h;

U[n+1]＝[a(bV-U)]·h；U[n+1]=[a(bV-U)] h;

式中，V为所求的膜电位电压，U为膜电位恢复变量，I为神经元的输入电流，h为时间步，a、b、c、d为神经元的模型参数，V_threhold为膜电压阈值。In the formula, V is the required membrane potential voltage, U is the membrane potential recovery variable, I is the input current of the neuron, h is the time step, a, b, c, d are the model parameters of the neuron, and V _threshold is the membrane voltage threshold.

可选的，所述状态存储器包括用于存储神经元的权重的权重存储器和用于存储神经元参数的神经元参数存储单元。Optionally, the state memory includes a weight memory for storing weights of neurons and a neuron parameter storage unit for storing neuron parameters.

可选的，所述调度器中的FIFO阵列为16个并行的FIFO阵列。Optionally, the FIFO arrays in the scheduler are 16 parallel FIFO arrays.

从以上技术方案可以看出，本申请具有以下优点：As can be seen from the above technical solutions, the present application has the following advantages:

本申请提供了一种脉冲神经网络加速器，包括：脉冲编码单元、控制器、时间步生成器、调度器、状态存储器和神经元计算单元；控制器，用于在接收到输入数据后驱动时间步生成器生成时间步t，并将时间步t和输入数据发送给脉冲编码器单元；脉冲编码单元，用于基于时间步t将输入数据编码为下一个时间步t+1的脉冲数据，并将脉冲数据发送给控制器，由控制器将脉冲数据发送给调度器；调度器，用于对脉冲数据进行解码得到脉冲源地址和时间步t+1，并将脉冲源地址和时间步t+1存储到FIFO阵列中；控制器还用于基于当前的时间步从FIFO阵列中读取脉冲源地址，并将该脉冲源地址发送给神经元计算单元；神经元计算单元，用于根据脉冲源地址从状态存储器中读取对应的神经元状态数据，根据神经元状态数据对对应层的神经元进行状态更新，将更新后的状态数据写回状态存储器中，并将输出的脉冲数据发送给调度器进行存储，以便下一层神经元进行状态更新。The application provides a spiking neural network accelerator, including: a spiking unit, a controller, a time step generator, a scheduler, a state memory, and a neuron computing unit; the controller is used to drive the time step after receiving input data The generator generates a time step t, and sends the time step t and input data to the pulse encoder unit; the pulse encoding unit is used to encode the input data into the pulse data of the next time step t+1 based on the time step t, and sends The pulse data is sent to the controller, and the controller sends the pulse data to the scheduler; the scheduler is used to decode the pulse data to obtain the pulse source address and time step t+1, and the pulse source address and time step t+1 Stored in the FIFO array; the controller is also used to read the pulse source address from the FIFO array based on the current time step, and send the pulse source address to the neuron computing unit; the neuron computing unit is used for according to the pulse source address Read the corresponding neuron state data from the state memory, update the state of the neurons in the corresponding layer according to the neuron state data, write the updated state data back to the state memory, and send the output pulse data to the scheduler Stored so that the next layer of neurons can update the state.

本申请中，采用脉冲编码的时间步比时间步生成器产生的时间步延迟一个时间步，使得神经元计算单元在计算t时刻的输入脉冲时，脉冲编码单元在编码t+1时刻的脉冲，从而神经元计算单元在t时刻时便可以直接得到待计算的脉冲，而无需等待编码，提高了计算效率，通过时间步可以确定当前时间下的脉冲是否计算完成，从而改善了现有的加速器在硬件实现上无法确定当前时间下是否剩余未处理的事件，导致加速器的性能较低的技术问题。In this application, the time step using pulse encoding is delayed by one time step than the time step generated by the time step generator, so that when the neuron calculation unit calculates the input pulse at time t, the pulse encoding unit encodes the pulse at time t+1, Thus, the neuron calculation unit can directly obtain the pulse to be calculated at time t without waiting for encoding, which improves the calculation efficiency, and can determine whether the calculation of the pulse at the current time is completed through the time step, thereby improving the existing accelerator. Hardware implementation cannot determine whether there are unprocessed events left at the current time, resulting in a technical problem of low performance of the accelerator.

附图说明Description of drawings

为了更清楚地说明本申请实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作简单地介绍，显而易见地，下面描述中的附图仅仅是本申请的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动性的前提下，还可以根据这些附图获得其它的附图。In order to more clearly illustrate the technical solutions in the embodiments of the present application or the prior art, the following will briefly introduce the drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description are only These are some embodiments of the present application. Those skilled in the art can also obtain other drawings based on these drawings without any creative effort.

图1为本申请实施例提供的一种脉冲神经网络加速器的一个结构示意图；FIG. 1 is a schematic structural diagram of a spiking neural network accelerator provided in an embodiment of the present application;

图2为本申请实施例提供的一种伪随机数生成器的一个结构示意图；Fig. 2 is a schematic structural diagram of a pseudo-random number generator provided by the embodiment of the present application;

图3为本申请实施例提供的一种脉冲神经网络加速器的一个结构示意图；FIG. 3 is a schematic structural diagram of a spiking neural network accelerator provided in an embodiment of the present application;

图4为本申请实施例提供的LIF神经元流水线计算方式示意图；FIG. 4 is a schematic diagram of the calculation method of the LIF neuron pipeline provided by the embodiment of the present application;

图5为本申请实施例提供的Izhikevich神经元神经元流水线计算方式示意图；Fig. 5 is a schematic diagram of the Izhikevich neuron neuron pipeline calculation method provided by the embodiment of the present application;

图6为本申请实施例提供的并行缓存及转发的示意图。FIG. 6 is a schematic diagram of parallel buffering and forwarding provided by an embodiment of the present application.

具体实施方式Detailed ways

为了使本技术领域的人员更好地理解本申请方案，下面将结合本申请实施例中的附图，对本申请实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例仅是本申请一部分实施例，而不是全部的实施例。基于本申请中的实施例，本领域普通技术人员在没有做出创造性劳动前提下所获得的所有其他实施例，都属于本申请保护的范围。In order to enable those skilled in the art to better understand the solution of the present application, the technical solution in the embodiment of the application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiment of the application. Obviously, the described embodiment is only It is a part of the embodiments of this application, not all of them. Based on the embodiments in this application, all other embodiments obtained by persons of ordinary skill in the art without making creative efforts belong to the scope of protection of this application.

为了便于理解，请参阅图1，本申请实施例提供了一种脉冲神经网络加速器，包括：脉冲编码单元、控制器、时间步生成器、调度器、状态存储器和神经元计算单元；For ease of understanding, please refer to FIG. 1. The embodiment of the present application provides a pulse neural network accelerator, including: a pulse encoding unit, a controller, a time step generator, a scheduler, a state memory, and a neuron computing unit;

控制器，用于在接收到输入数据后驱动时间步生成器生成时间步t，并将时间步t和输入数据发送给脉冲编码器单元；A controller is used to drive the time step generator to generate a time step t after receiving the input data, and send the time step t and the input data to the pulse encoder unit;

脉冲编码单元，用于基于时间步t将输入数据编码为下一个时间步t+1的脉冲数据，并将脉冲数据发送给控制器，由控制器将脉冲数据发送给调度器；a pulse encoding unit, configured to encode the input data into pulse data of the next time step t+1 based on the time step t, and send the pulse data to the controller, and the controller sends the pulse data to the scheduler;

调度器，用于对脉冲数据进行解码得到脉冲源地址和时间步t+1，并将脉冲源地址和时间步t+1存储到FIFO阵列中；The scheduler is used to decode the pulse data to obtain the pulse source address and time step t+1, and store the pulse source address and time step t+1 in the FIFO array;

控制器还用于基于当前的时间步从FIFO阵列中读取脉冲源地址，并将该脉冲源地址发送给神经元计算单元；The controller is also used to read the pulse source address from the FIFO array based on the current time step, and send the pulse source address to the neuron computing unit;

神经元计算单元，用于根据脉冲源地址从状态存储器中读取对应的神经元状态数据，根据神经元状态数据对对应层的神经元进行状态更新，将更新后的状态数据写回状态存储器中，并将输出的脉冲数据发送给调度器进行存储，以便下一层神经元进行状态更新。The neuron computing unit is used to read the corresponding neuron state data from the state memory according to the pulse source address, update the state of the neurons of the corresponding layer according to the neuron state data, and write the updated state data back into the state memory , and send the output pulse data to the scheduler for storage, so that the next layer of neurons can update the state.

本申请中的脉冲神经网络加速器的核心模块为控制器，控制器在接收到输入数据后，驱动时间步生成器运行生成时间步t，然后发送时间步t和输入数据给脉冲编码单元。脉冲编码单元在接收到时间步t和输入数据后，对时间步t进行加一，并将输入数据编码为脉冲数据，得到下一个时间步t+1的脉冲数据，并将脉冲数据发送给控制器，由控制器将脉冲数据发送给调度器；脉冲编码单元，还用于在编码完下一个时间步t+1的输入数据后，触发时间步生成器生成下一个时间步t+1。The core module of the spiking neural network accelerator in this application is the controller. After receiving the input data, the controller drives the time step generator to run to generate the time step t, and then sends the time step t and the input data to the pulse encoding unit. After receiving the time step t and the input data, the pulse encoding unit adds one to the time step t, encodes the input data into pulse data, obtains the pulse data of the next time step t+1, and sends the pulse data to the control The controller sends the pulse data to the scheduler; the pulse encoding unit is also used to trigger the time step generator to generate the next time step t+1 after encoding the input data of the next time step t+1.

进一步，为了解决将输入数据转换为脉冲数据，本申请实施例设计了一种基于伪随机数的泊松编码器。本申请实施例中的脉冲编码单元包括：伪随机数生成器和泊松编码器；Further, in order to solve the problem of converting input data into pulse data, the embodiment of the present application designs a Poisson encoder based on pseudo-random numbers. The pulse encoding unit in the embodiment of the present application includes: a pseudo-random number generator and a Poisson encoder;

伪随机数生成器，用于在接收到时间步t后，随机生成0到1之间的随机数；A pseudo-random number generator, used to randomly generate a random number between 0 and 1 after receiving time step t;

泊松编码器，用于基于随机数对输入数据进行泊松编码，生成下一个时间步t+1的脉冲数据。Poisson encoder, which is used to Poisson encode the input data based on random numbers to generate the pulse data of the next time step t+1.

请参考图2，本申请实施例采用12bit线性反馈移位寄存器(LSFR)生成0到1之间的随机数rand，泊松编码器的编码公式为：Please refer to FIG. 2, the embodiment of the present application uses a 12bit linear feedback shift register (LSFR) to generate a random number rand between 0 and 1, and the encoding formula of the Poisson encoder is:

if rand<c×I_ijk,then spike；if rand<c×I _ijk , then spike;

式中，c为控制参数，用于控制最大的脉冲发放频率，优选设置c＝1，即最大发放脉冲频率为1Khz，I_ijk为输入数据，输入数据可以是归一化后的数据，根据上式可知，当c＝1时，只要生成的随机数小于输入数据，便会产生一个脉冲。In the formula, c is a control parameter, which is used to control the maximum pulse emission frequency, preferably setting c=1, that is, the maximum emission pulse frequency is 1Khz, I _ijk is input data, and the input data can be normalized data, according to the above It can be seen from the formula that when c=1, as long as the generated random number is smaller than the input data, a pulse will be generated.

控制器接受脉冲编码单元通过AER总线发送的脉冲数据，并转发给调度器，调度器对脉冲数据进行解码得到脉冲源地址和时间步，将脉冲源地址和时间步存储到FIFO阵列中。The controller receives the pulse data sent by the pulse encoding unit through the AER bus and forwards it to the scheduler. The scheduler decodes the pulse data to obtain the pulse source address and time step, and stores the pulse source address and time step in the FIFO array.

进一步，本申请实施例中的脉冲神经网络加速器还包括：延迟存储单元和突触延迟计算单元；Further, the spiking neural network accelerator in the embodiment of the present application also includes: a delay storage unit and a synaptic delay calculation unit;

延迟存储单元，用于存储目标神经元的突触延迟时间；延迟存储单元存储着每一个突触后神经元发送脉冲时需要的延迟大小，该延迟只会在产生脉冲时才会作用于调度器；The delay storage unit is used to store the synaptic delay time of the target neuron; the delay storage unit stores the delay required for each post-synaptic neuron to send a pulse, and the delay will only act on the scheduler when a pulse is generated ;

突触延迟计算单元，用于根据目标神经元的突触延迟和当前时间步计算目标时间步，将目标神经元的目标时间步发送给调度器，由调度器根据目标神经元的脉冲源地址和目标时间步进行存储。The synaptic delay calculation unit is used to calculate the target time step according to the synaptic delay of the target neuron and the current time step, and send the target time step of the target neuron to the scheduler, and the scheduler uses the pulse source address of the target neuron and The target time step is stored.

调度器中的FIFO阵列为16个并行的FIFO阵列，如图3所示，可以实现16种不同的突触延迟，16个并行的FIFO阵列组成一个环形阵列，实现对不同时间步下的脉冲源地址的读取。The FIFO array in the scheduler is 16 parallel FIFO arrays, as shown in Figure 3, which can realize 16 different synaptic delays, and 16 parallel FIFO arrays form a ring array to realize the pulse source under different time steps address read.

当脉冲编码单元完成对输入数据的编码后，控制器基于当前的时间步从FIFO阵列中读取脉冲源地址，并将该脉冲源地址发送给神经元计算单元，神经元计算单元根据脉冲源地址从状态存储器中读取对应的神经元状态数据，根据神经元状态数据对对应层的神经元进行状态更新，将更新后的状态数据写回状态存储器中，并将生成的脉冲数据发送给调度器进行存储，以便下一层神经元进行状态更新。在当前时间步完成对应神经元状态更新后，将输出的脉冲数据存储到调度器中，以便对下一层神经元的状态更新。After the pulse coding unit finishes encoding the input data, the controller reads the pulse source address from the FIFO array based on the current time step, and sends the pulse source address to the neuron computing unit, and the neuron computing unit Read the corresponding neuron state data from the state memory, update the state of the neurons of the corresponding layer according to the neuron state data, write the updated state data back to the state memory, and send the generated pulse data to the scheduler Stored so that the next layer of neurons can update the state. After the current time step completes the corresponding neuron state update, the output pulse data is stored in the scheduler, so as to update the state of the next layer of neurons.

本申请实施例中的加速器当完成对输入的图片数据完成编码和对应神经元时间单位的脉冲事件处理完成(即当前神经元单位下待处理的脉冲事件，其中一个脉冲事件代表需更新一次对应的神经网络层的所有神经元膜电位，N个表示更新N次)后，便可以完成神经元时间单位的更新，开始下一时间单位的神经元状态更新。采用结合泊松编码器、时间步和环形FIFO设计的加速器，16个并行的FIFO阵列组成一个环形阵列，实现对不同时间步下的脉冲源地址的读取，加速器不仅可以完成对外部数据的脉冲编码，还可以减少神经元状态更新的等待时间，有效的提高了硬件执行脉冲神经网络计算的速度。When the accelerator in the embodiment of the present application completes the encoding of the input picture data and the processing of the pulse event corresponding to the neuron time unit (that is, the pulse event to be processed under the current neuron unit, one of the pulse events represents the need to update the corresponding After the membrane potentials of all neurons in the neural network layer, N means updating N times), the update of the neuron time unit can be completed, and the neuron state update of the next time unit can be started. Using an accelerator designed with Poisson encoder, time step and ring FIFO, 16 parallel FIFO arrays form a ring array to realize the reading of pulse source addresses at different time steps. The accelerator can not only complete the pulse of external data Coding can also reduce the waiting time for neuron state updates, effectively improving the speed at which hardware performs spiking neural network calculations.

进一步，神经元计算单元，具体用于：Further, the neuron computing unit is specifically used for:

当神经元为IF或LIF神经元时，基于四级流水线根据脉冲源地址从状态存储器中读取对应的神经元状态数据，根据神经元状态数据计算脉冲神经网络参数，基于脉冲神经网络参数更新神经元状态，以及将更新后的神经元状态写回到状态存储器中；When the neuron is an IF or LIF neuron, based on the four-stage pipeline, the corresponding neuron state data is read from the state memory according to the pulse source address, the pulse neural network parameters are calculated according to the neuron state data, and the neural network parameters are updated based on the pulse neural network parameters. neuron state, and write the updated neuron state back into the state memory;

当神经元为Izhikevich神经元时，基于六级流水线根据脉冲源地址从状态存储器中读取对应的神经元状态数据，根据神经元状态数据计算脉冲神经网络参数，基于脉冲神经网络参数更新神经元状态，以及将更新后的神经元状态写回到状态存储器中。When the neuron is an Izhikevich neuron, read the corresponding neuron state data from the state memory according to the pulse source address based on the six-stage pipeline, calculate the pulse neural network parameters according to the neuron state data, and update the neuron state based on the pulse neural network parameters , and write the updated neuron state back into the state memory.

本申请中，状态存储器包括用于存储神经元的权重的权重存储器和用于存储神经元参数的神经元参数存储单元。神经元计算单元根据脉冲源地址从状态存储器中读取出权重和神经元参数，根据权重和神经元参数进行神经元状态更新，即更新神经元膜电位，需要将更新后的神经元状态重新写回到状态存储器。本申请实施例使用并行结合流水线技术实现神经元状态的更新。采用16个并行工作的神经元计算单元，因此可以同时完成16个神经元状态的更新。对于计算较简单的IF神经元/LIF神经元，如图4所示，其更新方法为四级流水线。每一个神经元更新的时候分为读取(R)、模型参数计算(O)、神经元状态计算(C)和写回(W)，计算的模型参数包括基于一阶欧拉方法计算的LIF神经元的膜电位电压和不应期更新值，神经元状态计算包括膜电位更新以及不应期更新。而对于计算较为复杂的Izhikevich神经元，如图5所示，其更新方法为六级流水线，包括了读取(R)、模型参数计算和神经元状态计算(A、B、C及D)和写回(W)。采用流水线技术加速计算将传统的读写分离方法中的计算部分的组合逻辑更改为流水线处理，不仅可以提高系统电路的工作时钟频率，还可以提高电路调试的简易性。In this application, the state memory includes a weight memory for storing weights of neurons and a neuron parameter storage unit for storing neuron parameters. The neuron calculation unit reads the weight and neuron parameters from the state memory according to the pulse source address, and updates the neuron state according to the weight and neuron parameters, that is, updates the neuron membrane potential, and needs to rewrite the updated neuron state Back to state memory. In this embodiment of the present application, the update of the state of the neuron is realized by using parallel and pipeline technology. 16 neuron computing units working in parallel are used, so 16 neuron states can be updated at the same time. For IF neurons/LIF neurons with relatively simple calculations, as shown in Figure 4, the update method is a four-stage pipeline. When each neuron is updated, it is divided into reading (R), model parameter calculation (O), neuron state calculation (C) and write back (W). The calculated model parameters include LIF calculated based on the first-order Euler method Membrane potential voltage and refractory period update value of neuron, neuron state calculation includes membrane potential update and refractory period update. As for the Izhikevich neurons with complex calculations, as shown in Figure 5, the update method is a six-stage pipeline, including reading (R), model parameter calculation and neuron state calculation (A, B, C and D) and Write back (W). The use of pipeline technology to accelerate calculation changes the combination logic of the calculation part in the traditional read-write separation method to pipeline processing, which can not only increase the working clock frequency of the system circuit, but also improve the simplicity of circuit debugging.

当神经元为LIF神经元时，脉冲神经网络参数包括基于一阶欧拉方法计算的LIF神经元的膜电位电压。LIF神经元以指数过程模拟生物神经元的膜电压泄漏，如下式所示：When the neuron is a LIF neuron, the spiking neural network parameters include the membrane potential voltage of the LIF neuron calculated based on the first-order Euler method. LIF neurons simulate the membrane voltage leakage of biological neurons in an exponential process, as shown in the following formula:

式中，V_m为神经元当前膜电位电压，V_m+1为泄露之后的膜电位电压，Δt为泄露时间差，τ_m为时间常数。当突触前神经元i产生一个输出脉冲时，突触权重经过一个设定的传播延迟后，更新到突触后神经元j的膜电位。根据上述计算公式可知，LIF神经元计算中存在指数运算，采用一阶Euler方法消除求解的指数计算，使用算术移位替代乘法计算，得到优化后的计算公式为：In the formula, V _m is the current membrane potential voltage of the neuron, V _m+1 is the membrane potential voltage after leakage, Δt is the time difference of leakage, and τ _m is the time constant. When presynaptic neuron i generates an output spike, the synaptic weights are updated to the membrane potential of postsynaptic neuron j after a set propagation delay. According to the above calculation formula, it can be seen that there is an exponential operation in the calculation of LIF neurons, and the first-order Euler method is used to eliminate the exponential calculation of the solution, and the arithmetic shift is used to replace the multiplication calculation. The optimized calculation formula is:

β₁＝α·V[n]，

β ₁ =α·V[n],

式中，V[n]为当前时刻的膜状态，V[n+1]为所求的膜电位大小，

将上述部分参数固定，其中，V_reset设置为0，α设置为0.5，

设置为0.125，简化后的计算公式为：Fix some of the above parameters, where V _reset is set to 0, α is set to 0.5,

Set to 0.125, the simplified calculation formula is:

V[n+1]＝V[n]-(y₁+y₂)；V[n+1]=V[n]-(y ₁ +y ₂ );

y₁＝(V[n]-β₁)＞＞4；y ₁ =(V[n]-β ₁ )>>4;

式中，>>4为逻辑左移4位，>>3为逻辑左移3位，y₁、y₂为中间参数。In the formula, >>4 is a logical left shift of 4 bits, >>3 is a logical left shift of 3 bits, and y ₁ and y ₂ are intermediate parameters.

当神经元为Izhikevich神经元时，脉冲神经网络参数包括基于一阶欧拉方法计算的Izhikevich神经元的膜电位电压。相比LIF神经元，Izhikevich神经元具有更高的生物真实性，因此计算复杂度也会相应提升，其计算公式如下式为：When the neuron is an Izhikevich neuron, the spiking neural network parameters include the membrane potential voltage of the Izhikevich neuron calculated based on the first-order Euler method. Compared with LIF neurons, Izhikevich neurons have higher biological authenticity, so the computational complexity will increase accordingly. The calculation formula is as follows:

if V＞V_threhold,SpikeandV＝c,U＝U+d；if V＞V _threshold , Spike and V=c, U=U+d;

式中，V为膜电压，U为膜电位恢复变量，I为神经元的输入电流，a、b、c、d为神经元的模型参数，V_threhold为膜电压阈值；In the formula, V is the membrane voltage, U is the membrane potential recovery variable, I is the input current of the neuron, a, b, c, d are the model parameters of the neuron, and V _threshold is the membrane voltage threshold;

对于Izhikevich神经元其优化过程与优化LIF神经元的过程类似，采用一阶欧拉方法实现公式中的dV/dt和dU/dt，虽然，Izhikevich神经元在优化后仍存在乘法计算，但已经极大的减小了指数计算，优化后的Izhikevich神经元的计算公式为：For Izhikevich neurons, the optimization process is similar to the process of optimizing LIF neurons. The first-order Euler method is used to realize the dV/dt and dU/dt in the formula. Although there are still multiplication calculations in Izhikevich neurons after optimization, they are extremely The index calculation is greatly reduced, and the calculation formula of the optimized Izhikevich neuron is:

V[n+1]＝(0.04V²+5V+140-U+I)·h；V[n+1]=(0.04V ² +5V+140-U+I) h;

U[n+1]＝[a(bV-U)]·h；U[n+1]=[a(bV-U)] h;

本申请实施例中设置a为6，b为2，c为1.0，d为0.01。In the embodiment of the present application, a is set to 6, b is set to 2, c is set to 1.0, and d is set to 0.01.

采用一阶Euler方法消除LIF/IZH神经元求解的微分计算，减少了求解的复杂度，使其硬件友好化。采用了一阶欧拉方法消除LIF神经元和Izhikevich神经元求解的微分计算得到的优化后的神经元，优化了LIF神经元和Izhikevich神经元的计算过程，降低了计算复杂度，使得神经元计算单元更新神经元膜电位的速度加快，占用更少的硬件资源，并且可以支持不同脉冲神经元的脉冲神经网络，改善了现有技术存在神经元模型单一，无法支持多个不同的脉冲神经元搭建的SNN，并且计算单元所占用的硬件资源、计算量和功耗很大的技术问题。The first-order Euler method is used to eliminate the differential calculation of the LIF/IZH neuron solution, which reduces the complexity of the solution and makes it hardware-friendly. Using the first-order Euler method to eliminate the optimized neurons obtained by the differential calculation of LIF neurons and Izhikevich neurons, optimize the calculation process of LIF neurons and Izhikevich neurons, reduce the computational complexity, and make the neuron calculation The unit updates the neuron membrane potential faster, takes up less hardware resources, and can support the spiking neural network of different spiking neurons, which improves the existing technology that has a single neuron model and cannot support the construction of multiple different spiking neurons. The SNN, and the hardware resources occupied by the computing unit, the amount of calculation and the power consumption are very large technical problems.

调度器接收到的脉冲数据为内部脉冲时，即神经元进行状态更新后输出的脉冲数据，神经元计算单元将更新后生成的脉冲数据发送给调度器进行存储，采用16个FIFO阵列对发放脉冲的地址进行缓存，并将FIFO的空信号组合起来传递至仲裁器，可以参考图6。仲裁器会根据设定好的优先级从FIFO取出发放脉冲神经元地址，这样处理的优势主要有：①无需暂停上述计算的流水线，提高计算效率。②可灵活完成十六个并行脉冲的缓存并记录脉冲源地址，无需地址变换即可完成下一层神经元更新，缩短转换时间。When the pulse data received by the scheduler is an internal pulse, that is, the pulse data output by the neuron after the state update, the neuron computing unit sends the updated pulse data to the scheduler for storage, and uses 16 FIFO arrays to send pulses The address of the FIFO is cached, and the empty signal of the FIFO is combined and passed to the arbitrator, as shown in Figure 6. The arbitrator will take out the neuron address of the pulse neuron from the FIFO according to the set priority. The main advantages of this processing are: ① There is no need to suspend the pipeline of the above calculation, which improves the calculation efficiency. ② It can flexibly complete the buffering of sixteen parallel pulses and record the pulse source address, and complete the update of the next layer of neurons without address conversion, shortening the conversion time.

本申请实施例采用两种数据集(MNIST和Fashion-MNIST)对脉冲神经网络加速器进行验证，均证明本申请实施例中的加速器有较好的加速性能。The embodiment of the present application uses two data sets (MNIST and Fashion-MNIST) to verify the spiking neural network accelerator, both of which prove that the accelerator in the embodiment of the present application has better acceleration performance.

本申请实施例中，采用脉冲编码的时间步比时间步生成器产生的时间步延迟一个时间步，使得神经元计算单元在计算t时刻的输入脉冲时，脉冲编码单元在编码t+1时刻的脉冲，从而神经元计算单元在t时刻时便可以直接得到待计算的脉冲，而无需等待编码，提高了计算效率，通过时间步可以确定当前时间下的脉冲是否计算完成，从而改善了现有的加速器在硬件实现上无法确定当前时间下是否剩余未处理的事件，导致加速器的性能较低的技术问题。并且，采用并行和可配置变长流水线架构实现对IF、LIF及IZH神经元膜电位的更新，采用时分复用技术，有效的提高了加速器规模和数据处理的效率。In the embodiment of the present application, the time step using pulse encoding is delayed by one time step than the time step generated by the time step generator, so that when the neuron calculation unit calculates the input pulse at time t, the pulse encoding unit encodes the time step at time t+1 Pulse, so that the neuron calculation unit can directly obtain the pulse to be calculated at time t without waiting for encoding, which improves the calculation efficiency, and can determine whether the calculation of the pulse at the current time is completed through the time step, thereby improving the existing The hardware implementation of the accelerator cannot determine whether there are unprocessed events left at the current time, resulting in a technical problem of low performance of the accelerator. In addition, the parallel and configurable variable-length pipeline architecture is used to update the membrane potential of IF, LIF and IZH neurons, and the time-division multiplexing technology is used to effectively improve the scale of the accelerator and the efficiency of data processing.

本申请的说明书及上述附图中的术语“第一”、“第二”、“第三”、“第四”等(如果存在)是用于区别类似的对象，而不必用于描述特定的顺序或先后次序。应该理解这样使用的数据在适当情况下可以互换，以便这里描述的本申请的实施例例如能够以除了在这里图示或描述的那些以外的顺序实施。此外，术语“包括”和“具有”以及他们的任何变形，意图在于覆盖不排他的包含，例如，包含了一系列步骤或单元的过程、方法、系统、产品或设备不必限于清楚地列出的那些步骤或单元，而是可包括没有清楚地列出的或对于这些过程、方法、产品或设备固有的其它步骤或单元。The terms "first", "second", "third", "fourth", etc. (if any) in the description of the present application and the above drawings are used to distinguish similar objects and not necessarily to describe specific sequence or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances such that the embodiments of the application described herein, for example, can be practiced in sequences other than those illustrated or described herein. Furthermore, the terms "comprising" and "having", as well as any variations thereof, are intended to cover a non-exclusive inclusion, for example, a process, method, system, product or device comprising a sequence of steps or elements is not necessarily limited to the expressly listed instead, may include other steps or elements not explicitly listed or inherent to the process, method, product or apparatus.

应当理解，在本申请中，“至少一个(项)”是指一个或者多个，“多个”是指两个或两个以上。“和/或”，用于描述关联对象的关联关系，表示可以存在三种关系，例如，“A和/或B”可以表示：只存在A，只存在B以及同时存在A和B三种情况，其中A，B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一项(个)”或其类似表达，是指这些项中的任意组合，包括单项(个)或复数项(个)的任意组合。例如，a，b或c中的至少一项(个)，可以表示：a，b，c，“a和b”，“a和c”，“b和c”，或“a和b和c”，其中a，b，c可以是单个，也可以是多个。It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" is used to describe the association relationship of associated objects, indicating that there can be three types of relationships, for example, "A and/or B" can mean: only A exists, only B exists, and A and B exist at the same time , where A and B can be singular or plural. The character "/" generally indicates that the contextual objects are an "or" relationship. "At least one of the following" or similar expressions refer to any combination of these items, including any combination of single or plural items. For example, at least one item (piece) of a, b or c can mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c ", where a, b, c can be single or multiple.

在本申请所提供的几个实施例中，应该理解到，所揭露的装置和方法，可以通过其它的方式实现。例如，以上所描述的装置实施例仅仅是示意性的，例如，所述单元的划分，仅仅为一种逻辑功能划分，实际实现时可以有另外的划分方式，例如多个单元或组件可以结合或者可以集成到另一个系统，或一些特征可以忽略，或不执行。另一点，所显示或讨论的相互之间的耦合或直接耦合或通信连接可以是通过一些接口，装置或单元的间接耦合或通信连接，可以是电性，机械或其它的形式。In the several embodiments provided in this application, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are only illustrative. For example, the division of the units is only a logical function division. In actual implementation, there may be other division methods. For example, multiple units or components can be combined or May be integrated into another system, or some features may be ignored, or not implemented. In another point, the mutual coupling or direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of devices or units may be in electrical, mechanical or other forms.

所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the units can be selected according to actual needs to achieve the purpose of the solution of this embodiment.

另外，在本申请各个实施例中的各功能单元可以集成在一个处理单元中，也可以是各个单元单独物理存在，也可以两个或两个以上单元集成在一个单元中。上述集成的单元既可以采用硬件的形式实现，也可以采用软件功能单元的形式实现。In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit. The above-mentioned integrated units can be implemented in the form of hardware or in the form of software functional units.

所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本申请的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的全部或部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以通过一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本申请各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(英文全称：Read-OnlyMemory，英文缩写：ROM)、随机存取存储器(英文全称：Random Access Memory，英文缩写：RAM)、磁碟或者光盘等各种可以存储程序代码的介质。If the integrated unit is realized in the form of a software function unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application is essentially or part of the contribution to the prior art or all or part of the technical solution can be embodied in the form of a software product, and the computer software product is stored in a storage medium , including several instructions for executing all or part of the steps of the methods described in the various embodiments of the present application through a computer device (which may be a personal computer, a server, or a network device, etc.). The aforementioned storage medium includes: U disk, mobile hard disk, read-only memory (English full name: Read-OnlyMemory, English abbreviation: ROM), random access memory (English full name: Random Access Memory, English abbreviation: RAM), disk Or various media such as CDs that can store program codes.

以上所述，以上实施例仅用以说明本申请的技术方案，而非对其限制；尽管参照前述实施例对本申请进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围。As mentioned above, the above embodiments are only used to illustrate the technical solutions of the present application, and are not intended to limit them; although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that: it can still understand the foregoing The technical solutions described in each embodiment are modified, or some of the technical features are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the various embodiments of the application.

Claims

1. A pulse neural network accelerator, is characterized in that, comprises: pulse encoding unit, controller, time step generator, scheduler, state memory and neuron computing unit;

The controller is configured to drive the time step generator to generate a time step t after receiving the input data, and send the time step t and the input data to the pulse encoder unit;

The pulse encoding unit is configured to encode the input data into the pulse data of the next time step t+1 based on the time step t, and send the pulse data to the controller, and the controller converts the pulse sending data to the scheduler;

The scheduler is used to decode the pulse data to obtain the pulse source address and time step t+1, and store the pulse source address and time step t+1 in the FIFO array;

The controller is further configured to read the pulse source address from the FIFO array based on the current time step, and send the pulse source address to the neuron computing unit;

The neuron calculation unit is configured to read corresponding neuron state data from the state memory according to the pulse source address, update the state of the neurons of the corresponding layer according to the neuron state data, and update the updated state The data is written back into the state memory, and the output pulse data is sent to the scheduler for storage, so that the next layer of neurons can update the state.

2. The pulse neural network accelerator according to claim 1, wherein the pulse encoding unit comprises: a pseudo-random number generator and a Poisson encoder;

The pseudo-random number generator is used to randomly generate a random number between 0 and 1 after receiving the time step t;

The Poisson encoder is configured to perform Poisson encoding on the input data based on the random number to generate pulse data at the next time step t+1.

3. The pulse neural network accelerator according to claim 1, wherein the pulse encoding unit is further configured to trigger the time step generator to generate Next time step t+1.

4. The spiking neural network accelerator according to claim 1, further comprising: a delay storage unit and a synaptic delay calculation unit;

The delay storage unit is used to store the synaptic delay time of the target neuron;

The synaptic delay calculation unit is configured to calculate a target time step according to the synaptic delay of the target neuron and the current time step, and send the target time step of the target neuron to the scheduler, and the scheduler The device stores according to the pulse source address of the target neuron and the target time step.

5. The pulse neural network accelerator according to claim 1, wherein the neuron computing unit is specifically used for:

When the neuron is an IF or LIF neuron, based on the four-stage pipeline, the corresponding neuron state data is read from the state memory according to the pulse source address, and the pulse neural network parameters are calculated according to the neuron state data. The network parameters update the neuron state, and write the updated neuron state back into the state memory;

When the neuron is an Izhikevich neuron, based on the six-stage pipeline, the corresponding neuron state data is read from the state memory according to the pulse source address, and the pulse neural network parameters are calculated according to the neuron state data, based on the pulse neural network parameters updating the neuron state, and writing the updated neuron state back into the state memory.

6. The spiking neural network accelerator according to claim 5, wherein when the neuron is a LIF neuron, the spiking neural network parameters include the membrane potential voltage of the LIF neuron calculated based on the first-order Euler method, The calculation formula is:

β ₁ =α·V[n],

In the formula, V[n] is the membrane state at the current moment, V[n+1] is the required membrane potential,

7. spiking neural network accelerator according to claim 5, is characterized in that, when neuron is Izhikevich neuron, described spiking neural network parameter comprises the membrane potential voltage of Izhikevich neuron calculated based on first-order Euler method, The calculation formula is:

V[n+1]=(0.04V ² +5V+140-U+I) h;

U[n+1]=[a(bV-U)] h;

In the formula, V is the required membrane potential voltage, U is the membrane potential recovery variable, I is the input current of the neuron, h is the time step, a, b, c, d are the model parameters of the neuron, and V _threshold is the membrane voltage threshold.

8. The spiking neural network accelerator according to claim 1 or 5, wherein the state memory comprises a weight memory for storing weights of neurons and a neuron parameter storage unit for storing neuron parameters.

9. The spiking neural network accelerator according to claim 1, wherein the FIFO array in the scheduler is 16 parallel FIFO arrays.