
CN108205706B - Artificial neural network reverse training device and method - Google Patents


Info

Publication number
CN108205706B
CN108205706B (application CN201611180607.8A)
Authority
CN
China
Prior art keywords
learning rate
unit
layer
training
weight
Prior art date
Legal status
Active
Application number
CN201611180607.8A
Other languages
Chinese (zh)
Other versions
CN108205706A (en)
Inventor
陈云霁
郝一帆
刘少礼
陈天石
Current Assignee
Shanghai Cambricon Information Technology Co Ltd
Original Assignee
Shanghai Cambricon Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Shanghai Cambricon Information Technology Co Ltd
Priority to CN201611180607.8A
Publication of CN108205706A
Application granted
Publication of CN108205706B

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/08 Learning methods
                • G06N3/084 Backpropagation, e.g. using gradient descent
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
              • G06N3/06 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons
                • G06N3/063 Physical realisation, i.e. hardware implementation of neural networks, neurons or parts of neurons using electronic means

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Artificial Intelligence (AREA)
  • Neurology (AREA)
  • Feedback Control In General (AREA)
  • Complex Calculations (AREA)

Abstract

The invention provides an artificial neural network reverse training device and method. The device includes a controller unit, a storage unit, a learning rate adjustment unit, and an operation unit. The storage unit stores neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors, and learning rate adjustment data. The controller unit reads instructions from the storage unit and decodes them into microinstructions that control the behavior of the storage unit, the learning rate adjustment unit, and the operation unit. Before each generation of training begins, the learning rate adjustment unit computes the learning rate for the current generation from the learning rate of the previous generation and the learning rate adjustment data. The operation unit calculates the weights of the current generation from the gradient vectors, the current-generation learning rate, the derivatives of the activation functions, and the weights of the previous generation. The device and method of the present invention make the training iteration process more stable, reduce the time required to train the neural network to stability, and improve training efficiency.


Description

Artificial neural network reverse training device and method
Technical Field
The invention relates to an artificial neural network, in particular to an artificial neural network reverse training device and an artificial neural network reverse training method.
Background
Artificial neural networks (ANNs), referred to simply as neural networks (NNs), are algorithmic mathematical models that mimic the behavioral characteristics of animal neural networks and perform distributed parallel information processing. Such a network achieves its information-processing aim by adjusting the interconnection relationships among a large number of internal nodes, depending on the complexity of the system. The core operation used by neural networks is vector multiplication, and sign functions and their various approximations are widely used.
One known method to support multilayer artificial neural network back training is to use a general-purpose processor. One disadvantage of this method is that a single general-purpose processor has low operation performance and cannot meet the performance requirements of common multilayer artificial neural network operations. When multiple general-purpose processors execute in parallel, the communication between them becomes a performance bottleneck. In addition, a general-purpose processor needs to decode the reverse operation of a multilayer artificial neural network into a long sequence of operations and memory-access instructions, and the front-end decoding of the processor brings a large power consumption overhead.
Another known method to support multilayer artificial neural network back training is to use a graphics processor (GPU). The GPU has only a small on-chip cache, so the model data (weights) of a multilayer artificial neural network must be repeatedly transferred from off-chip memory; off-chip bandwidth thus becomes the main performance bottleneck and brings a huge power consumption overhead.
Disclosure of Invention
Technical problem to be solved
The present invention is directed to an apparatus and method for artificial neural network reverse training supporting an adaptive learning rate, which solve at least one of the above technical problems of the prior art.
(II) technical scheme
According to an aspect of the present invention, there is provided an artificial neural network reverse training apparatus including a controller unit, a storage unit, a learning rate adjustment unit, and an arithmetic unit, wherein,
the storage unit is used for storing neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors and learning rate adjustment data;
the controller unit is used for reading the instruction from the storage unit and decoding the instruction into a microinstruction for controlling the behaviors of the storage unit, the learning rate adjusting unit and the arithmetic unit;
the learning rate adjusting unit is used for, before each generation of training, calculating the learning rate for the current generation of training from the learning rate of the previous generation and the learning rate adjustment data;
and the operation unit is used for calculating the weight of the current generation according to the gradient vector, the learning rate of the current generation, the derivative of the activation function and the weight of the previous generation.
Further, the arithmetic unit includes a master arithmetic unit, an interconnection unit, and a plurality of slave arithmetic units, and the gradient vector includes an input gradient vector and an output gradient vector, in which: the master arithmetic unit completes subsequent calculations using the output gradient vector of each layer during that layer's calculation; the interconnection unit, at the stage where reverse training of each layer of the neural network starts calculation, transmits the input gradient vector of the layer to all the slave arithmetic units, and after the calculation of the slave arithmetic units is completed, adds the partial sums of the output gradient vector from all the slave arithmetic units pairwise, stage by stage, to obtain the output gradient vector of the layer; and the plurality of slave arithmetic units calculate the corresponding partial sums of the output gradient vector in parallel, using the same input gradient vector and their respective weight data.
Further, the storage unit is an on-chip cache.
Further, the instruction is a SIMD instruction.
Further, the learning rate adjustment data includes a weight variation and an error function.
According to another aspect of the present invention, there is provided an artificial neural network reverse training method, including the steps of:
s1: before each generation of training begins, the learning rate used for the training of the current generation is calculated according to the learning rate of the previous generation and the learning rate adjustment data;
s2: the training is started, and the weight values are updated layer by layer according to the learning rate of the training of the present generation;
s3: after all the weight values are updated, calculating learning rate adjustment data of the present generation network, and storing the learning rate adjustment data;
s4: and judging whether the neural network converges, if so, finishing the operation, and otherwise, turning to the step S1.
Further, step S2 includes:
s21: for each layer of the network, carrying out weighted summation on input gradient vectors to calculate output gradient vectors of the layer, wherein the weight of the weighted summation is the weight to be updated of the layer;
s22: multiplying the output gradient vector of the current layer by the derivative value of the activation function of the next layer during forward operation to obtain the input gradient vector of the next layer;
s23: multiplying the input gradient vector element-wise by the input neurons saved from the forward operation to obtain the gradient of the weights of the layer;
s24: updating the weights of the layer according to the obtained weight gradient of the layer and the learning rate;
s25: judging whether all layers are updated, if so, entering step S3; otherwise, go to step S21.
Furthermore, in the training of the present generation, the weights may adopt non-uniform (respective) learning rates.
Alternatively, in the training of the present generation, the weights may adopt a uniform learning rate.
(III) advantageous effects
(1) By arranging the learning rate adjusting unit and adopting the adaptive learning rate training network, the weight variation generated in each cycle training is more properly determined, so that the training iteration process is more stable, the time required for the neural network to be trained to be stable is reduced, and the training efficiency is improved;
(2) by adopting the special on-chip cache aiming at the multilayer artificial neural network operation algorithm, the reusability of input neurons and weight data is fully excavated, the data is prevented from being read from the memory repeatedly, the memory access bandwidth is reduced, and the problem that the memory bandwidth becomes the performance bottleneck of the multilayer artificial neural network operation and the training algorithm thereof is avoided.
(3) By adopting the special SIMD instruction and the customized operation unit aiming at the operation of the multilayer artificial neural network, the problems of insufficient operation performance of the CPU and the GPU and high front-end decoding overhead are solved, and the support for the operation algorithm of the multilayer artificial neural network is effectively improved.
Drawings
FIG. 1 is a block diagram illustrating an example of the overall structure of an artificial neural network reverse training apparatus according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the structure of the interconnection unit in the artificial neural network reverse training device in FIG. 1;
FIG. 3 is a schematic diagram of an artificial neural network back-regulation process according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a process for backward regulation using an artificial neural network, according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating an operation of a method for inverse training with an artificial neural network according to an embodiment of the present invention.
FIG. 6 is a flowchart illustrating an operation of a method for inverse training with an artificial neural network according to another embodiment of the present invention.
Detailed Description
The traditional training method adopted for artificial neural networks is the back propagation algorithm, in which the change in a weight between two generations is the gradient of the error function with respect to that weight, multiplied by a constant called the learning rate. The learning rate determines the amount of weight variation generated in each round of training. If its value is too small, the effective update of the weights in each iteration is too small, training takes longer, and convergence is quite slow; if the value is too large, the iterative process may oscillate and even diverge. The artificial neural network reverse training device of the present invention is provided with a learning rate adjusting unit: before each generation of training, the learning rate adjusting unit calculates the learning rate for the current generation from the learning rate of the previous generation and the learning rate adjustment data. The weight variation generated in each cycle of training is thereby determined more appropriately, so that the training iteration process is more stable, the time required to train the neural network to stability is shortened, and training efficiency is improved.
In order that the objects, technical solutions and advantages of the present invention will become more apparent, the present invention will be further described in detail with reference to the accompanying drawings in conjunction with the following specific embodiments.
Fig. 1 is a block diagram illustrating an example of an overall structure of an artificial neural network reverse training apparatus according to an embodiment of the present invention. The embodiment of the invention provides a device for artificial neural network reverse training supporting adaptive learning rate, which comprises:
the storage unit A is used for storing neural network data, including instructions, weights, derivatives of activation functions, learning rates, gradient vectors (which may include input gradient vectors and output gradient vectors), and learning rate adjustment data (which may include the network error value, the weight variation, and the like); the storage unit can be an on-chip cache, which avoids repeatedly reading the data from memory and prevents the memory bandwidth from becoming the performance bottleneck of multilayer artificial neural network operations and their training algorithms.
The controller unit B is used for reading the instruction from the storage unit A and decoding the instruction into a microinstruction for controlling the behaviors of the storage unit, the learning rate adjusting unit and the arithmetic unit;
the instructions accessed and read by the storage unit A and the controller unit B can be SIMD instructions, and the problems of insufficient operation performance and high front-end decoding overhead of the existing CPU and GPU are solved by adopting the special SIMD instruction aiming at the operation of the multilayer artificial neural network.
the learning rate adjusting unit E, before each generation of training, calculates the learning rate for the current generation from the learning rate of the previous generation and the learning rate adjustment data;
and the operation units (D, C, F) calculate the current generation weight according to the gradient vector, the current generation learning rate, the derivative of the activation function and the previous generation weight.
The storage unit A is used for storing neural network data including instructions and storage neuron input, weights, neuron output, learning rates, weight variation, activation function derivatives, gradient vectors of all layers and the like;
for the controller unit B, reading an instruction from the storage unit A and decoding the instruction into a microinstruction for controlling the behavior of each unit;
as for the operation unit, it may include a master operation unit C, an interconnection unit D, and a plurality of slave operation units F.
The interconnection unit D is used for connecting the master operation module and the slave operation module, and may be implemented in different interconnection topologies (e.g., a tree structure, a ring structure, a grid structure, a hierarchical interconnection, a bus structure, etc.).
And the interconnection unit D is used for transmitting the input gradient vector of the layer to all the slave operation units F through the interconnection unit D at the stage of starting calculation of reverse training of each layer of the neural network, and after the calculation process of the slave operation units F is completed, the interconnection unit D gradually adds the output gradient vector parts of the slave operation units F in pairs to obtain the output gradient vector of the layer.
The main operation unit C is used for completing subsequent calculation by utilizing the output gradient vector of the layer in the calculation process of each layer;
a plurality of slave operation units F, which utilize the same input gradient vector and respective weight data to calculate the corresponding output gradient vector partial sum in parallel;
the learning rate adjusting unit E is configured to calculate the learning rate for the current generation of training from information such as the learning rate, weights, network error value, and weight variation of the previous generation (this information is stored in the storage unit in advance and can be called up).
Fig. 2 schematically shows an embodiment of the interconnection unit D: a binary tree interconnection structure. The interconnection unit D constitutes the data path between the master operation unit C and the plurality of slave operation units F. The interconnect comprises a plurality of nodes that form a binary tree path, i.e., each node has one parent node and two child nodes. Each node sends the data received from its parent node to its two downstream child nodes in the same way, and merges the data returned by its two downstream child nodes before returning the result to its upstream parent node.
For example, in the reverse operation of the neural network, the vectors returned by the two downstream nodes are added into one vector at the current node and returned to the upstream node. At the stage where each layer of the artificial neural network starts its calculation, the input gradient held in the master operation unit C is sent to each slave operation unit F through the interconnection unit D; after the calculation of the slave operation units F is completed, the partial sums of the output gradient vector produced by each slave operation unit F are added pairwise in the interconnection unit D, i.e., all the partial sums are summed to form the final output gradient vector.
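The pairwise, stage-by-stage merging performed by the interconnection unit D is a binary tree reduction of the partial sums produced by the slave operation units F. The following Python sketch mimics that dataflow in software; it is an illustration only, and the function name, the zero-padding of odd levels, and the use of NumPy arrays are assumptions rather than details from the patent.

```python
import numpy as np

def tree_reduce(partial_sums):
    """Pairwise binary-tree reduction of per-slave partial sums, as the
    interconnection unit D is described to do. Software illustration only;
    the hardware performs the additions in tree-connected nodes."""
    level = list(partial_sums)
    while len(level) > 1:
        if len(level) % 2 == 1:
            # Pad odd levels with a zero vector so every node has two children
            # (an assumption; real trees can simply pass one vector up unchanged).
            level.append(np.zeros_like(level[0]))
        # Each parent node adds the vectors returned by its two children.
        level = [level[i] + level[i + 1] for i in range(0, len(level), 2)]
    return level[0]

# Example: 4 slave units each hold a partial sum of the layer's output gradient vector.
partials = [np.random.randn(8) for _ in range(4)]
output_gradient = tree_reduce(partials)
assert np.allclose(output_gradient, np.sum(partials, axis=0))
```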
The learning rate adjusting means E performs different calculations on data according to the adaptive learning rate adjusting method.
First, in the standard back propagation algorithm:
w(k+1)=w(k)-ηg(w(k)) (1)
In formula (1), w(k) is the weight at the current training generation (the present-generation weight), w(k+1) is the next-generation weight, η is the fixed learning rate, a predetermined constant, and g(w(k)) is the gradient vector.
Here we allow the learning rate to be updated from generation to generation, as are other network parameters. The method for adjusting the learning rate comprises the following steps: when the training error increases, the learning rate is reduced; when the training error decreases, the learning rate is increased. Several specific examples of adaptive learning rate adjustment rules are given below, but are not limited to these examples.
The method comprises the following steps:
η(k+1) = η(k) + a, if ΔE < 0; η(k+1) = η(k) - b, if ΔE > 0; η(k+1) = η(k), otherwise (2)
In formula (2), η(k) is the present-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and a > 0 and b > 0 are appropriate constants.
The second method comprises the following steps:
η(k+1)=η(k)(1-ΔE) (3)
In formula (3), η(k) is the present-generation learning rate, η(k+1) is the next-generation learning rate, and ΔE = E(k) - E(k-1) is the variation of the error function E.
The third method comprises the following steps:
η(k+1) = a·η(k), if ΔE < -c; η(k+1) = b·η(k), if ΔE > c; η(k+1) = η(k), otherwise (4)
In formula (4), η(k) is the present-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and a > 1, 0 < b < 1, and c > 0 are appropriate constants.
The method four comprises the following steps:
[Formula (5) is rendered only as an image in the source publication and is not reproduced here.]
In formula (5), η(k) is the present-generation learning rate, η(k+1) is the next-generation learning rate, ΔE = E(k) - E(k-1) is the variation of the error function E, and 0 < a < 1, b > 1, and 0 < α < 1 are appropriate constants.
the learning rate η in the above four methods can be common to all weights, that is, the same learning rate is used for each layer of weights during each generation of training, and we remember this method as a uniform adaptive learning rate training method; or not universal, that is, different learning rates are adopted for each weight, and we remember this method as a respective adaptive learning rate training method. The training precision can be further improved and the training time can be reduced by the respective adaptive learning rate training method.
For a clearer comparison, schematic diagrams of the two methods are given: the uniform adaptive learning rate training method corresponds to Fig. 3, and the respective adaptive learning rate training method corresponds to Fig. 4.
In Fig. 3, the connection weights wjp1, wjp2, …, wjpn between the output layer P and the hidden layer J are all adjusted with the single learning rate η during reverse adjustment; in Fig. 4, the connection weights wjp1, wjp2, …, wjpn between the output layer P and the hidden layer J are adjusted with the respective learning rates η1, η2, …, ηn during reverse adjustment. Adjusting different nodes differently in the reverse pass exploits the adaptive capacity of the learning rate to the greatest extent and best meets the changing requirements of the weights during learning.
As for the respective adaptive learning rate adjusting method, after the initial values of the respective learning rates are obtained, each learning rate can still be updated iteratively according to methods one to four above (and is not limited to these four methods), with the learning rate η in each formula taken as the respective learning rate corresponding to each weight.
Based on the same inventive concept, the invention also provides an artificial neural network reverse training method, wherein an operation flow chart is shown in fig. 5, and the method comprises the following steps:
s1: before each generation of training begins, the learning rate used for the training of the current generation is calculated according to the learning rate of the previous generation and the learning rate adjustment data;
s2: the training is started, and the weight values are updated layer by layer according to the learning rate of the training of the present generation;
s3: after all the weight values are updated, calculating learning rate adjustment data of the present generation network, and storing the learning rate adjustment data;
s4: judging whether the neural network is converged, if so, finishing the operation, otherwise, turning to the step S1
For step S1, before each generation of training starts, the learning rate adjustment unit E calls the learning rate adjustment data in the storage unit A to adjust the learning rate, resulting in the learning rate used for the training of the present generation.
For step S2: after the training of this generation, the weight values are updated layer by layer according to the learning rate of the training of this generation. Step S2 may include the following sub-steps (see fig. 6):
step S21, for each layer, first, performing weighted summation on the input gradient vector to calculate the output gradient vector of the layer, where the weight of the weighted summation is the weight to be updated of the layer, and the process is completed by the master operation unit C, the interconnection unit D, and each slave operation unit F together;
step S22, in the main operation unit C, the output gradient vector is multiplied by the derivative value of the activation function of the next layer during forward operation to obtain the input gradient vector of the next layer;
step S23, in the main operation unit C, the input gradient vector is multiplied element-wise by the input neurons saved from the forward operation to obtain the gradient of the weights of the layer;
step S24, finally, in the main operation unit C, the weights of the layer are updated according to the obtained weight gradient of the layer and the learning rate;
step S25: judging whether the weights of all the layers have been updated; if so, proceed to step S3; otherwise, go to step S21.
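For a dense (fully connected) layer, the sub-steps S21 to S24 can be written compactly. The following sketch is a software rendering of what the master and slave operation units compute together; the function signature, the variable names, and the dense-layer assumption are illustrative, not taken from the patent.

```python
import numpy as np

def backward_update_layer(W, grad_in, x_fwd, act_deriv_next, lr):
    """One pass of steps S21-S24 for a dense layer (illustrative sketch).

    W              -- (n_out, n_in) weights of this layer, updated in place
    grad_in        -- (n_out,) input gradient vector arriving at this layer
    x_fwd          -- (n_in,) this layer's input neurons saved from the forward pass
    act_deriv_next -- (n_in,) activation derivative of the next layer (forward pass)
    lr             -- learning rate of the current generation (scalar or per-weight array)
    """
    # S21: weighted summation of the input gradient with the layer's weights
    grad_out = W.T @ grad_in
    # S22: multiply by the activation derivative to form the next layer's input gradient
    grad_next = grad_out * act_deriv_next
    # S23: product of the input gradient and the saved input neurons gives the weight gradient
    grad_W = np.outer(grad_in, x_fwd)
    # S24: update the layer's weights at this generation's learning rate
    W -= lr * grad_W
    return grad_next

# One layer with 3 inputs and 2 outputs, driven by a dummy gradient:
W = np.random.randn(2, 3)
g_next = backward_update_layer(W, grad_in=np.array([0.2, -0.1]),
                               x_fwd=np.ones(3), act_deriv_next=np.ones(3), lr=0.1)
```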
In step S3, after all the weights are updated, the main operation unit C calculates the other data used to adjust the learning rate, such as the network error of this generation, and stores them in the storage unit A; this generation of training then ends.
Step S4: and judging whether the network converges, if so, finishing the operation, and otherwise, turning to the step S1.
The weights may adopt either non-uniform (respective) learning rates or a uniform learning rate; for details, refer to the description above, which is not repeated here.
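Putting steps S1 to S4 together, a generation-level training loop might look like the sketch below. The network object and its error and update_all_layers methods are hypothetical stand-ins for the device's storage and operation units, and the convergence test on the change of the error is likewise an assumption; the patent requires only that some convergence judgment be made.

```python
def train(network, data, lr, max_generations=1000, tol=1e-4):
    """Generation-level loop of steps S1-S4 (sketch with hypothetical interfaces)."""
    E_prev = network.error(data)              # error before training starts
    for _ in range(max_generations):
        # S2: update the weights layer by layer at this generation's learning rate
        network.update_all_layers(lr)         # wraps backward_update_layer per layer
        # S3: compute and store this generation's learning rate adjustment data
        E_curr = network.error(data)
        # S4: judge convergence; finish if converged, otherwise go back to S1
        if abs(E_curr - E_prev) < tol:
            break
        # S1 (next generation): adjust the rate from the stored data, here via formula (3)
        lr = lr * (1.0 - (E_curr - E_prev))
        E_prev = E_curr
    return network
```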
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are only exemplary embodiments of the present invention and are not intended to limit the present invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (15)

1. An artificial neural network reverse training device comprises a storage unit, a learning rate adjusting unit and an arithmetic unit, wherein,
the storage unit is used for storing neural network data, and the neural network data comprises instructions, weights, derivatives of activation functions, learning rates, gradient vectors and learning rate adjustment data;
a learning rate adjusting unit, which adjusts data according to the previous generation learning rate and the learning rate before each generation of training, and obtains the learning rate for the current generation after calculation;
the operation unit is used for calculating the weight of the current generation according to the gradient vector, the learning rate of the current generation, the derivative of the activation function and the weight of the previous generation;
the operation unit comprises a main operation unit, an interconnection unit and a plurality of slave operation units, and the gradient vector comprises an input gradient vector and an output gradient vector, wherein:
the main operation unit is used for completing subsequent calculation by utilizing the output gradient vector of the layer in the calculation process of each layer;
the interconnection unit is used for transmitting the input gradient vector of the layer to all the slave operation units through the interconnection unit at the stage of starting calculation of reverse training of each layer of the neural network, and after the calculation process of the slave operation units is completed, the interconnection unit gradually adds the output gradient vector parts of all the slave operation units pairwise to obtain the output gradient vector of the layer so as to realize interconnection topology;
the plurality of slave arithmetic units calculate corresponding partial sums of output gradient vectors in parallel by using the same input gradient vector and respective weight data.
2. The apparatus of claim 1, further comprising:
and the controller unit is used for reading the instruction from the storage unit and decoding the instruction into a microinstruction for controlling the behaviors of the storage unit, the learning rate adjusting unit and the arithmetic unit.
3. The apparatus of claim 1, wherein the interconnection topology is at least one of:
tree structures, ring structures, mesh structures, hierarchical interconnects, and bus structures.
4. The apparatus of claim 1, wherein the interconnect comprises a plurality of nodes, and the plurality of nodes form a binary tree path, that is, each node has a parent node and 2 child nodes, each node sends upstream data to two child nodes downstream through the parent node, and combines data returned from the two child nodes downstream and returns the data to the parent node upstream.
5. The apparatus of claim 1, wherein the storage unit is an on-chip cache.
6. The apparatus of claim 1, wherein the instruction is a SIMD instruction.
7. The apparatus of claim 1, wherein the learning rate adjustment data comprises a weight variance and an error function.
8. An artificial neural network reverse training method comprises the following steps:
before each generation of training begins, the learning rate adjusting unit calculates the learning rate used for the training of the current generation from the learning rate of the previous generation and the learning rate adjustment data;
the training begins, and the operation unit updates the weight layer by layer according to the learning rate of the training of the present generation;
after all the weights are updated, the learning rate adjusting unit calculates the learning rate adjustment data of the present-generation network, and the storage unit stores the learning rate adjustment data;
the operation unit judges whether the neural network is converged, if so, the operation is finished, otherwise, the steps are continuously executed;
the step of the arithmetic unit executing arithmetic includes:
using a main operation unit to finish subsequent calculation by using the output gradient vector of the layer in the calculation process of each layer;
using an interconnection unit, in the stage of starting calculation by reverse training of each layer of neural network, transmitting the input gradient vector of the layer to all the slave operation units by the main operation unit through the interconnection unit, and after the calculation process of the slave operation units is completed, gradually adding the output gradient vector parts of the slave operation units in pairs by the interconnection unit to obtain the output gradient vector of the layer so as to realize interconnection topology;
using a plurality of slave arithmetic units, corresponding partial sums of output gradient vectors are calculated in parallel using the same input gradient vector and respective weight data.
9. The method of claim 8, wherein the training is started, and the computing unit updates the weights layer by layer according to the learning rate of the training of the present generation, and specifically comprises:
for each layer of the network, carrying out weighted summation on input gradient vectors to calculate output gradient vectors of the layer, wherein the weight of the weighted summation is the weight to be updated of the layer;
multiplying the output gradient vector of the current layer by the derivative value of the activation function of the next layer during forward operation to obtain the input gradient vector of the next layer;
multiplying the input gradient vector by the input neuron counterpoint during forward operation to obtain the gradient of the weight of the layer;
updating the weight of the layer according to the gradient and the learning rate of the obtained weight of the layer;
judging whether all layers are updated, if so, entering the following steps; otherwise, continuing to perform the above steps.
10. The method of claim 9, wherein the interconnection topology is at least one of:
tree structures, ring structures, mesh structures, hierarchical interconnects, and bus structures.
11. The method of claim 9, wherein the interconnect comprises a plurality of nodes, and the plurality of nodes form a binary tree path, that is, each node has a parent node and 2 child nodes, each node sends upstream data to two child nodes downstream through the parent node, and combines data returned from the two child nodes downstream and returns the data to the parent node upstream.
12. The method of claim 8, wherein, in the current generation of training, the weights adopt non-uniform (respective) learning rates.
13. The method of claim 8, wherein, in the current generation of training, the weights are trained according to a uniform learning rate.
14. The method of claim 8, further comprising:
the controller unit is used to read instructions from the memory unit and decode the instructions into microinstructions that control the behavior of the memory unit, the learning rate adjustment unit, and the arithmetic unit.
15. The method of claim 14, wherein the instruction is a SIMD instruction.
CN201611180607.8A 2016-12-19 2016-12-19 Artificial neural network reverse training device and method Active CN108205706B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611180607.8A CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201611180607.8A CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Publications (2)

Publication Number Publication Date
CN108205706A CN108205706A (en) 2018-06-26
CN108205706B true CN108205706B (en) 2021-04-23

Family

ID=62601948

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611180607.8A Active CN108205706B (en) 2016-12-19 2016-12-19 Artificial neural network reverse training device and method

Country Status (1)

Country Link
CN (1) CN108205706B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109445688B (en) * 2018-09-29 2022-04-15 上海百功半导体有限公司 Storage control method, storage controller, storage device and storage system
CN110309918B (en) * 2019-07-05 2020-12-18 安徽寒武纪信息科技有限公司 Neural network online model verification method and device and computer equipment
CN110782017B (en) * 2019-10-25 2022-11-22 北京百度网讯科技有限公司 Method and device for adaptively adjusting learning rate
CN118016218B (en) * 2024-04-09 2024-06-25 山东新旋工业装备科技有限公司 Intelligent analysis method for high-temperature-resistant rotary joint material performance data

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105204333A (en) * 2015-08-26 2015-12-30 东北大学 Energy consumption prediction method for improving energy utilization rate of iron and steel enterprise
CN105468335A (en) * 2015-11-24 2016-04-06 中国科学院计算技术研究所 Pipeline-level operation device, data processing method and network-on-chip chip
CN105512723A (en) * 2016-01-20 2016-04-20 南京艾溪信息科技有限公司 Artificial neural network calculating device and method for sparse connection
CN105654729A (en) * 2016-03-28 2016-06-08 南京邮电大学 Short-term traffic flow prediction method based on convolutional neural network
CN106203627A (en) * 2016-07-08 2016-12-07 中国电子科技集团公司电子科学研究院 A kind of method that network target range is evaluated

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
He Yuyao et al., "A Combined Learning Rate Strategy for Deep Learning Models" (一种组合型的深度学习模型学习率策略), Acta Automatica Sinica (《自动化学报》), Jun. 30, 2016, pp. 953-958. *

Also Published As

Publication number Publication date
CN108205706A (en) 2018-06-26


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant