
CN117725974A - Modularized neural network training platform based on multipath NPU - Google Patents


Info

Publication number: CN117725974A
Application number: CN202311831882.1A
Authority: CN (China)
Inventors: 杜江, 李汪军
Assignee: Chengdu Meishu Technology Co ltd
Legal status: Pending
Classification: Image Analysis (AREA)

Abstract

The invention discloses a modular neural network training platform based on multiple NPUs, relating to the technical field of neural network training. Based on the multiple NPUs, the platform trains the neural network model to be trained with a global-optimum network-parameter guidance algorithm, a parallel data optimization algorithm, and a parallel weight optimization algorithm, which effectively shortens the training time of the neural network and improves its training effect.

Description

A modular neural network training platform based on multi-channel NPU

Technical field

The invention relates to the technical fields of image recognition and neural networks, and in particular to a modular neural network training platform based on multi-channel NPU.

Background

With the emergence of massive data, artificial intelligence technology has developed rapidly. Machine learning is an inevitable product of artificial intelligence reaching a certain stage of development; it is devoted to mining valuable latent information from large amounts of data by computational means.

In the prior art, image recognition is already widely used, for example by building image recognition neural networks for part classification, defect identification, portrait recognition, and so on. Before these neural networks can be used for image recognition, they usually need to be trained. Neural network training refers to training an artificial neural network: enough samples are fed into the network, and an algorithm adjusts the network's structure (mainly its weights) until the network's output matches the expected values. The training process of existing image recognition neural networks mainly performs gradient descent on the network many times to obtain a trained network, but this process is time-consuming and its training effect is poor, which in turn leads to poor image recognition.

Summary of the invention

The purpose of the present invention is to provide a modular neural network training platform based on multi-channel NPU that solves the problems existing in the prior art.

The present invention is realized through the following technical solutions:

A modular neural network training platform based on multi-channel NPU comprises a training data and NPU data acquisition module, a division module, a first core initialization module, a first update module, a second core initialization module, a second update module, a loop module, and an output module.

The training data and NPU data acquisition module is used to obtain the neural network model to be trained and to determine the idle cores among the multiple NPUs, obtaining multiple target cores.

The division module is used to divide the multiple target cores into two parts, obtaining a first core group and a second core group; the first core group performs the first parallel training, and the second core group performs the second parallel training.

The first core initialization module is used to deploy multiple copies of the neural network model to be trained on each first core in the first core group, obtaining first neural network models to be trained; to generate initial network parameter values for each of the first neural network models to be trained, obtaining the first network parameter vector corresponding to each first neural network model to be trained; and to complete the initialization of the first cores.

The first update module is used to run all first cores simultaneously in parallel and, for each first neural network model to be trained on a first core, to apply the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on its first network parameter vector until the number of first updates reaches the upper limit, and then to determine the local optimal network parameter vector of each first core from the first network parameter vectors.

The second core initialization module is used to deploy multiple copies of the neural network model to be trained on the second cores in the second core group, obtaining second neural network models to be trained; to determine the global optimal network parameter vector from the local optimal network parameter vectors of the first cores using the reduction method; and to use that global optimal network parameter vector as the initial network parameters of the second neural network models to be trained on the second cores, obtaining the second network parameter vector corresponding to each second neural network model to be trained and completing the initialization of the second cores.

The second update module is used to run all second cores simultaneously in parallel and, for each second neural network model to be trained on a second core, to apply the parallel weight optimization algorithm to perform the second update on its second network parameter vector until the number of second updates reaches the upper limit or the iteration-end requirement is met, at which point the second update ends.

The loop module is used, when the number of second updates reaches the upper limit, to take the second network parameter vector on the second core as the global optimal network parameter vector and return to the first update step.

The output module is used, when the iteration-end requirement is met, to take the second network parameter vector on the second core as the final network parameters of the neural network to be trained, completing the training of the neural network model to be trained.

In a possible implementation, generating initial network parameter values for the multiple first neural network models to be trained includes: generating the network parameter values of a first neural network model to be trained by randomly generating network parameters between a network parameter upper bound and a network parameter lower bound, or generating them using a chaotic sequence strategy.

In a possible implementation, applying the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on the first network parameter vector of each first neural network model to be trained until the number of first updates reaches the upper limit, and determining the local optimal network parameter vector of each first core from the first network parameter vectors, includes:

A1. In parallel, obtain the first fitness corresponding to the first network parameter vector of each first neural network model to be trained;

A2. Determine the first historical optimum of each first network parameter vector according to the first fitness, obtaining the first historical optimum and the first fitness corresponding to it;

A3. According to the first fitness corresponding to the first historical optimum of each first network parameter vector, determine the local optimal network parameter vector of the first core using the reduction method;

A4. Judge whether the first maximum-update-count threshold has been reached; if so, output the local optimal network parameter vector of the first core; otherwise, update the first network parameter vectors using the global-optimum network-parameter guidance algorithm and return to step A1.

In a possible implementation, determining the local optimal network parameter vector of the first core by the reduction method, according to the first fitness corresponding to the first historical optimum of each first network parameter vector, includes:

obtaining n first fitness values from the first fitness corresponding to the first historical optimum of each first network parameter vector;

comparing the i-th first fitness with the (n−i)-th first fitness and selecting the larger one for the next round of comparison, until only one first fitness remains, obtaining the local optimal network parameter vector of the first core; where, when n is odd, the middle value enters the next round of comparison directly.

In a possible implementation, updating the first network parameter vectors using the global-optimum network-parameter guidance algorithm includes:

performing the first guided update on the i-th first network parameter vector as:

$$X_i(t+1)=\omega(t)X_i(t)+f(t)\bigl(X_{i,best}(t)-X_i(t)\bigr)+f(t)\bigl(\bar{X}_{best}(t)-X_i(t)\bigr)+f(t)\bigl(X_{best}-X_i(t)\bigr)$$

where f(t) denotes a random number in [0,1] generated at the t-th local training, all f(t) being uniformly distributed; Xi(t) denotes the i-th first network parameter vector at the t-th local training, i=1,2,…,L, with L the total number of first network parameter vectors; Xi(t+1) denotes the updated Xi(t); Xi,best(t) denotes the state of greatest fitness of the i-th first network parameter vector over the first t local trainings, i.e. its historical optimum; X̄best(t) denotes the mean of the historical optima of all first network parameter vectors; Xbest denotes the global optimum — during the first cycle the first cores run independently and do not yet know the global optimum, so the local optimal network parameter vector of the first core is taken as the global optimum; and ω(t) denotes the inertia weight at the t-th local training;

dividing all the first network parameter vectors into two parts, K better network parameter vectors and K worse network parameter vectors, the K better network parameter vectors being the half of the first network parameter vectors with the greatest fitness; when the number of first network parameter vectors is odd, the first network parameter vector with the greatest fitness does not participate in the division;

obtaining, from the K better network parameter vectors, the network parameter center vector as:

$$C(t)=\frac{\sum_{k=1}^{K}F\bigl(X_k(t)\bigr)X_k(t)}{\sum_{k=1}^{K}F\bigl(X_k(t)\bigr)}$$

where C(t) denotes the network parameter center vector, Xk(t) denotes the k-th better network parameter vector, and F(Xk(t)) denotes the fitness of the k-th better network parameter vector;

performing, according to the network parameter center vector C(t), the second guided update on each worse network parameter vector as:

$$X_{k'}(t+1)=X_{k'}(t)+rand\cdot\bigl(C(t)-X_{k'}(t)\bigr)$$

where Xk'(t) denotes the k'-th worse network parameter vector, Xk'(t+1) denotes the updated Xk'(t), and rand denotes a random number in (0,1);

judging whether the fitness of a worse network parameter vector after the second guided update is greater than its fitness before the second guided update; if so, accepting the second guided update of the worse network parameter vector Xk'(t), completing the update of the first network parameter vectors; otherwise, rejecting the second guided update of the worse network parameter vector Xk'(t), completing the update of the first network parameter vectors.

In a possible implementation, the inertia weight is:

$$\omega(t)=\omega_{max}-(\omega_{max}-\omega_{min})\frac{t}{iter}$$

where ωmax denotes the maximum inertia weight, ωmin denotes the minimum inertia weight, iter denotes the maximum number of updates of the first network parameter vectors, and t denotes the current number of updates of the first network parameter vectors.

In a possible implementation, applying the parallel weight optimization algorithm to perform the second update on the second network parameter vector of each second neural network model to be trained, until the number of second updates reaches the upper limit or the iteration-end requirement is met, includes:

B1. Construct a task node for each second neural network model to be trained on the second core, construct a main task node, and allocate private memory for each task node as well as one block of common memory;

B2. Initialize the iteration counter t'=1 and the scale matrix Bt'=I;

B3. For all second neural network models to be trained, obtain the error function value E(wt') corresponding to the second neural network model to be trained and place it into the common memory for every task node to call; E(wt') is obtained through the second neural network model to be trained on any task node, and wt' denotes the second network parameter vector applied by the second neural network to be trained at the t'-th training;

B4. Obtain the gradient matrix of the second neural network model to be trained,

$$g_{t'}=\nabla E(w_{t'})$$

and place the scale matrix Bt' and the gradient matrix gt' into the common memory for every task node to call; each column of gt' is the gradient of one network parameter of the second neural network model to be trained;

where I denotes the unit scale matrix, ∇ denotes the gradient operator, E(wt') denotes the error function, wt' denotes the weight vector corresponding to the second neural network model to be trained, Bt' is an M-order square matrix, and M denotes the weight dimension of the second neural network model to be trained;

B5. All task nodes run in parallel; each task node calls the scale matrix Bt' and the gradient matrix gt' from the common memory, and the search direction dm of the second neural network to be trained on the m-th task node is determined as the multiply-accumulate of the m-th row of Bt' with the m-th column of gt'; here m=1,2,…,M, and dm is stored in the private memory of the m-th task node;

B6. All task nodes run in parallel; each task node retrieves its search direction dm from its private memory and, according to dm and using the golden-section method, determines the update step λm corresponding to the m-th network parameter of the second neural network model to be trained on that task node, storing λm in the private memory of the m-th task node;

B7. All task nodes run in parallel; each task node retrieves its update step λm from its private memory and updates the m-th network parameter of the second neural network model to be trained on that task node according to λm; the m-th gradient component is then obtained from the m-th updated network parameter of the second neural network model to be trained on the m-th task node;

B9. All task nodes run in parallel; according to the m-th gradient component, the gradient matrix gm corresponding to the m-th task node is re-determined, and the scale matrix Bt' corresponding to the m-th task node is updated according to gm;

B10. Each task node outputs the m-th column of the scale matrix Bt' and the m-th network parameter of the second neural network model to be trained to the main task node;

B11. The main task node assembles the M network parameters into the updated second network parameter vector and judges whether the fitness corresponding to the updated second network parameter vector is greater than the set threshold; if so, the iteration-end requirement is met and the second update ends; otherwise, go to step B12;

B12. Judge whether the count of the iteration counter t' exceeds the upper limit; if so, determine that the number of second updates has reached the upper limit and end the second update; otherwise, increment t' by one, assemble the m-th columns of the scale matrices Bm output by the task nodes into the scale matrix Bt', take the updated second network parameter vector as the network parameters of the second neural network models to be trained on all task nodes, and return to step B3.

In a possible implementation, the error function value E(wt') corresponding to the second neural network model to be trained is obtained as:

$$E(w_{t'})=\frac{1}{2}\sum_{s=1}^{S}\sum_{n=1}^{N}\left(\frac{y_n(x_s,w_{t'})-d_{sn}}{d_{max,n}-d_{min,n}}\right)^2$$

where s=1,2,…,S, with S the total number of training samples; n=1,2,…,N, with N the total number of output neurons of the second neural network model to be trained; yn(xs,wt') denotes the actual output of the n-th output neuron of the second neural network model to be trained when it takes training sample xs as input and wt' as network parameters; dsn denotes the expected output corresponding to yn(xs,wt'); dmax,n denotes the maximum of the n-th expected outputs over all training samples, each training sample corresponding to one expected output vector; and dmin,n denotes the minimum of the n-th expected outputs over all training samples.

In a possible implementation, determining, according to the search direction dm and using the golden-section method, the update step λm corresponding to the m-th network parameter of the second neural network model to be trained on the task node includes:

constructing from the search direction dm a search matrix D with a single row, in which the m-th element is the search direction dm and the other elements are 0, the total number of elements of the search matrix being equal to the total number of network parameters of the second neural network model to be trained;

constructing a step matrix λ' with a single column, in which the m-th element is the update step λm to be determined and the other elements are 0, the total number of elements of the step matrix being equal to the total number of network parameters of the second neural network model to be trained;

constructing, from the search matrix D and the step matrix λ', the solving condition

$$\min_{\lambda_m} E\bigl(w_{t'}+D^{\mathsf T}\odot\lambda'\bigr)$$

(the element-wise product leaves only the m-th parameter perturbed, by λm·dm) and solving it to determine the update step λm;

updating, according to the update step λm, the m-th network parameter of the second neural network model to be trained on the task node as:

$$w_m'=w_m+\lambda_m d_m$$

where wm denotes the m-th network parameter of the second neural network model to be trained and wm' denotes the updated wm.

In a possible implementation, re-determining, according to the m-th gradient component, the gradient matrix gm corresponding to the m-th task node, and updating the scale matrix Bt' corresponding to the m-th task node according to gm, includes:

taking the gradient matrix gt' as a basis and using the m-th gradient component to update the m-th element of gt', obtaining the gradient matrix gm corresponding to the m-th task node;

updating, according to the gradient matrix gm corresponding to the m-th task node, the scale matrix Bt' corresponding to the m-th task node as:

$$B_{t'}'=B_{t'}+\frac{WW^{\mathsf T}}{W^{\mathsf T}Z}-\frac{B_{t'}ZZ^{\mathsf T}B_{t'}}{Z^{\mathsf T}B_{t'}Z}$$

where Bt'' denotes the updated Bt'; W denotes the weight difference matrix, W=wt''−wt', with wt'' the weight vector of the second neural network model to be trained after the update by step λm and wt' the weight vector before the update; T denotes transposition; and Z=gm−gt' denotes the gradient difference matrix.
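
This quasi-Newton scale-matrix update can be sketched in a few lines of NumPy. The sketch assumes the DFP-style formula reconstructed above; the function name and the column-vector layout are illustrative, not taken from the patent.

```python
import numpy as np

def dfp_update(B, w_old, w_new, g_old, g_new):
    """DFP-style update of the M x M scale matrix B.

    W = w_new - w_old is the weight difference and Z = g_new - g_old the
    gradient difference, both treated as column vectors of length M.
    """
    W = (w_new - w_old).reshape(-1, 1)
    Z = (g_new - g_old).reshape(-1, 1)
    term1 = (W @ W.T) / (W.T @ Z).item()              # rank-one correction
    term2 = (B @ Z @ Z.T @ B) / (Z.T @ B @ Z).item()  # curvature correction
    return B + term1 - term2
```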

The modular neural network training platform based on multi-channel NPU provided by the invention trains the neural network model to be trained on multiple NPUs using the global-optimum network-parameter guidance algorithm, the parallel data optimization algorithm, and the parallel weight optimization algorithm. This improves the training effect of the neural network model and effectively reduces the training time of the neural network, thereby improving the accuracy of image recognition.

Brief description of the drawings

In order to explain the technical solutions of the exemplary embodiments of the present invention more clearly, the drawings required by the embodiments are briefly introduced below. It should be understood that the following drawings show only some embodiments of the present invention and therefore should not be regarded as limiting the scope; those of ordinary skill in the art may obtain other related drawings from them without creative effort.

Figure 1 is a schematic structural diagram of a modular neural network training platform based on multi-channel NPU provided by an embodiment of the present invention.

Reference numerals and corresponding component names in the drawings:

1 - training data and NPU data acquisition module; 2 - division module; 3 - first core initialization module; 4 - first update module; 5 - second core initialization module; 6 - second update module; 7 - loop module; 8 - output module.

Detailed description of the embodiments

In order to make the purpose, technical solutions, and advantages of the present invention clearer, the present invention is further described in detail below with reference to the embodiments and drawings. The exemplary embodiments of the present invention and their descriptions are intended only to explain the present invention, not to limit it.

As shown in Figure 1, a modular neural network training platform based on multi-channel NPU (Neural Processing Unit) comprises a training data and NPU data acquisition module 1, a division module 2, a first core initialization module 3, a first update module 4, a second core initialization module 5, a second update module 6, a loop module 7, and an output module 8.

The training data and NPU data acquisition module 1 is used to obtain the neural network model to be trained and to determine the idle cores among the multiple NPUs, obtaining multiple target cores.

Optionally, the training data and NPU data acquisition module 1 is also used to obtain training data, which is used to train the neural network model to be trained.

The division module 2 is used to divide the multiple target cores into two parts, obtaining a first core group and a second core group; the first core group performs the first parallel training, and the second core group performs the second parallel training.

Since the computation load of the second core group is smaller than that of the first core group, the cores can be divided according to the actual situation; for example, the ratio of the number of first cores in the first core group to the number of second cores in the second core group can be 7:3, 8:2, and so on.

Optionally, when dividing the core groups, it is preferable to avoid placing multiple cores of the same NPU into different core groups, as in the sketch below.
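
One way such a split might look in code is sketched here; the 7:3 ratio and the whole-NPU grouping come from the description above, while the data layout (a list of (npu_id, core_id) pairs) and the function name are assumptions.

```python
def split_target_cores(target_cores, ratio=0.7):
    """Split idle (npu_id, core_id) pairs into a first and a second core
    group, keeping the cores of one NPU in the same group where possible."""
    by_npu = {}
    for npu_id, core_id in target_cores:
        by_npu.setdefault(npu_id, []).append((npu_id, core_id))

    first_group, second_group = [], []
    target_first = int(len(target_cores) * ratio)
    # Assign whole NPUs to the first group until its quota is filled.
    for cores in by_npu.values():
        if len(first_group) < target_first:
            first_group.extend(cores)
        else:
            second_group.extend(cores)
    return first_group, second_group
```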

The first core initialization module 3 is used to deploy multiple copies of the neural network model to be trained on each first core in the first core group, obtaining first neural network models to be trained; to generate initial network parameter values for each of them, obtaining the first network parameter vector corresponding to each first neural network model to be trained; and to complete the initialization of the first cores.

Optionally, an upper network parameter threshold and a lower network parameter threshold can be set, and the first network parameter vector corresponding to a first neural network model to be trained can be generated randomly between these two thresholds; the vector here can be understood as a single-row matrix containing all the network parameters of the first neural network model to be trained. Initialization methods such as chaotic sequences can also be used to initialize the first network parameter vector corresponding to the first neural network model to be trained.
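
Both initialization strategies are easy to sketch. The uniform variant follows directly from the description; for the chaotic variant a logistic map is assumed, since the patent only says a chaotic sequence strategy may be used.

```python
import numpy as np

def init_uniform(dim, lower, upper, rng=None):
    """Random initialization between the parameter lower and upper bounds."""
    rng = rng or np.random.default_rng()
    return rng.uniform(lower, upper, size=dim)

def init_chaotic(dim, lower, upper, x0=0.567, mu=4.0):
    """Chaotic-sequence initialization via the logistic map x <- mu*x*(1-x),
    mapped from (0, 1) onto [lower, upper]."""
    x, values = x0, []
    for _ in range(dim):
        x = mu * x * (1.0 - x)
        values.append(lower + (upper - lower) * x)
    return np.array(values)
```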

The first update module 4 is used to run all first cores simultaneously in parallel and, for each first neural network model to be trained on a first core, to apply the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on its first network parameter vector until the number of first updates reaches the upper limit, and then to determine the local optimal network parameter vector of each first core from the first network parameter vectors.

In the prior art, when training image recognition neural networks (for example, training convolutional neural networks for product classification or portrait recognition), traditional optimization algorithms not only process data serially but also easily fall into local optima, leading to poor training of the neural network and, ultimately, poor image recognition. This embodiment therefore applies the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on the first network parameter vector of each first neural network model to be trained, which not only improves the training effect of the neural network but also effectively improves training efficiency.

The second core initialization module 5 is used to deploy multiple copies of the neural network model to be trained on the second cores in the second core group, obtaining second neural network models to be trained; to determine the global optimal network parameter vector from the local optimal network parameter vectors of the first cores using the reduction method; and to use that vector as the initial network parameters of the second neural network models to be trained on the second cores, obtaining the second network parameter vector corresponding to each second neural network model to be trained and completing the initialization of the second cores.

The local optimal network parameter vectors of the first cores are obtained by running multiple first cores in parallel; they can be regarded as having already undergone many rounds of training without having reached the best position. The best of the local optimal network parameter vectors can therefore be taken as the second network parameter vector, and the parallel weight optimization algorithm applied to perform the second update on the second network parameter vector of each second neural network model to be trained, further optimizing the network parameters to achieve the expected training effect while maintaining high training efficiency.

The second update module 6 is used to run all second cores simultaneously in parallel and, for each second neural network model to be trained on a second core, to apply the parallel weight optimization algorithm to perform the second update on its second network parameter vector until the number of second updates reaches the upper limit or the iteration-end requirement is met, at which point the second update ends.

In the process of training image recognition networks, traditional algorithms often construct kernel functions to compute the data and repeatedly schedule a kernel function serially to perform the computation and update the data. This method is simple and easy to implement but inefficient, so this embodiment applies the parallel weight optimization algorithm to perform the second update on the second network parameter vector of each second neural network model to be trained.

The loop module 7 is used, when the number of second updates reaches the upper limit, to take the second network parameter vector on the second core as the global optimal network parameter vector and return to the first update step.

The output module 8 is used, when the iteration-end requirement is met, to take the second network parameter vector on the second core as the final network parameters of the neural network to be trained, completing the training of the neural network model to be trained.

In a possible implementation, generating initial network parameter values for the multiple first neural network models to be trained includes: generating the network parameter values of a first neural network model to be trained by randomly generating network parameters between a network parameter upper bound and a network parameter lower bound, or generating them using a chaotic sequence strategy.

First, the global-optimum network-parameter guidance algorithm is used for searching; then one local optimal network parameter vector is determined from each first core, and the global optimal network parameter vector of the current overall cycle is determined from all the local optimal network parameter vectors; finally, the parallel weight optimization algorithm performs a fine search around the global optimal network parameter vector so as to obtain network parameters of the target accuracy. If the target accuracy has not been reached after a certain number of fine searches, the next overall cycle begins.

In a possible implementation, applying the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on the first network parameter vector of each first neural network model to be trained until the number of first updates reaches the upper limit, and determining the local optimal network parameter vector of each first core from the first network parameter vectors, includes:

A1. In parallel, obtain the first fitness corresponding to the first network parameter vector of each first neural network model to be trained.

Optionally, the fitness function of a first neural network model to be trained can be the negative of its error function. For example, when the first neural network model to be trained is a BP neural network, its error function can be

$$E=\frac{1}{2}\sum_{p=1}^{P}\sum_{q=1}^{Q}\bigl(\hat{y}_{pq}-y'_{pq}\bigr)^2$$

where E denotes the error; p=1,2,…,P, with P the total number of input data; q=1,2,…,Q, with Q the total number of outputs of the BP neural network; ŷpq denotes the actual output of the q-th output neuron of the BP neural network for the p-th input, and y'pq denotes the corresponding expected output. The fitness function can then be F=−E.

A2. Determine the first historical optimum of each first network parameter vector according to the first fitness, obtaining the first historical optimum and the first fitness corresponding to it.

The first historical optimum also participates in subsequent updates; when a first network parameter vector is updated for the first time, its historical optimum is the vector itself.

A3. According to the first fitness corresponding to the first historical optimum of each first network parameter vector, determine the local optimal network parameter vector of the first core using the reduction method.

A4. Judge whether the first maximum-update-count threshold has been reached; if so, output the local optimal network parameter vector of the first core; otherwise, update the first network parameter vectors using the global-optimum network-parameter guidance algorithm and return to step A1.

In a possible implementation, determining the local optimal network parameter vector of the first core by the reduction method, according to the first fitness corresponding to the first historical optimum of each first network parameter vector, includes:

According to the first fitness corresponding to the first historical optimum of each first network parameter vector, n first fitness values are obtained.

The i-th first fitness is compared with the (n−i)-th first fitness, and the larger one is selected for the next round of comparison, until only one first fitness remains, yielding the local optimal network parameter vector of the first core. When n is odd, the middle value enters the next round of comparison directly.

Determining the local optimal network parameter vector of the first core by the reduction method effectively reduces the number of comparisons and saves computing resources.
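
A sequential sketch of this pairwise tournament reduction follows; on the NPU the rounds would run in parallel, and the function name and fitness representation are illustrative.

```python
def reduce_best(fitnesses, vectors):
    """Return the vector of highest fitness by pairwise reduction: item i is
    paired with item n-1-i each round; with an odd count, the middle item
    advances to the next round directly."""
    idx = list(range(len(fitnesses)))
    while len(idx) > 1:
        n = len(idx)
        nxt = []
        for i in range(n // 2):
            a, b = idx[i], idx[n - 1 - i]
            nxt.append(a if fitnesses[a] >= fitnesses[b] else b)
        if n % 2 == 1:
            nxt.append(idx[n // 2])  # middle element advances directly
        idx = nxt
    return vectors[idx[0]]
```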

In a possible implementation, updating the first network parameter vectors using the global-optimum network-parameter guidance algorithm includes:

Performing the first guided update on the i-th first network parameter vector as:

$$X_i(t+1)=\omega(t)X_i(t)+f(t)\bigl(X_{i,best}(t)-X_i(t)\bigr)+f(t)\bigl(\bar{X}_{best}(t)-X_i(t)\bigr)+f(t)\bigl(X_{best}-X_i(t)\bigr)$$

where f(t) denotes a random number in [0,1] generated at the t-th local training, all f(t) being uniformly distributed; Xi(t) denotes the i-th first network parameter vector at the t-th local training, i=1,2,…,L, with L the total number of first network parameter vectors; and Xi(t+1) denotes the updated Xi(t). Xi,best(t) denotes the state of greatest fitness of the i-th first network parameter vector over the first t local trainings, i.e. its historical optimum. X̄best(t) denotes the mean of the historical optima of all first network parameter vectors. Xbest denotes the global optimum; during the first cycle the first cores run independently and do not yet know the global optimum, so the local optimal network parameter vector of the first core is taken as the global optimum. ω(t) denotes the inertia weight at the t-th local training.

The first guided update provided by this embodiment improves local exploitation in the later iterations and prevents the algorithm from converging prematurely into a local optimum.
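
A sketch of one first guided update under the formula above, using the linearly decreasing inertia weight given later in this embodiment; the ω bounds 0.9/0.4 are common defaults and, like the function names, are assumptions rather than values from the patent.

```python
import numpy as np

rng = np.random.default_rng()

def inertia_weight(t, iter_max, w_max=0.9, w_min=0.4):
    """Linearly decreasing inertia weight between w_max and w_min."""
    return w_max - (w_max - w_min) * t / iter_max

def first_guided_update(X, X_best_hist, X_global_best, t, iter_max):
    """Guided update of the L first network parameter vectors (rows of X)."""
    w = inertia_weight(t, iter_max)
    X_hist_mean = X_best_hist.mean(axis=0)  # mean of all historical optima
    f = rng.random(3)                       # uniform random factors in [0, 1]
    return (w * X
            + f[0] * (X_best_hist - X)
            + f[1] * (X_hist_mean - X)
            + f[2] * (X_global_best - X))
```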

All the first network parameter vectors are divided into two parts: K better network parameter vectors and K worse network parameter vectors. The K better network parameter vectors are the half of the first network parameter vectors with the greatest fitness; when the number of first network parameter vectors is odd, the first network parameter vector with the greatest fitness does not participate in the division.

From the K better network parameter vectors, the network parameter center vector is obtained as:

$$C(t)=\frac{\sum_{k=1}^{K}F\bigl(X_k(t)\bigr)X_k(t)}{\sum_{k=1}^{K}F\bigl(X_k(t)\bigr)}$$

where C(t) denotes the network parameter center vector, Xk(t) denotes the k-th better network parameter vector, and F(Xk(t)) denotes the fitness of the k-th better network parameter vector.

According to the network parameter center vector C(t), the second guided update is performed on each worse network parameter vector as:

$$X_{k'}(t+1)=X_{k'}(t)+rand\cdot\bigl(C(t)-X_{k'}(t)\bigr)$$

where Xk'(t) denotes the k'-th worse network parameter vector, Xk'(t+1) denotes the updated Xk'(t), and rand denotes a random number in (0,1).

Whether the fitness of a worse network parameter vector after the second guided update is greater than its fitness before the second guided update is then judged; if so, the second guided update of the worse network parameter vector Xk'(t) is accepted, completing the update of the first network parameter vectors; otherwise, the second guided update of the worse network parameter vector Xk'(t) is rejected, completing the update of the first network parameter vectors.

A high fitness indicates that a network parameter vector occupies a good position and therefore has reference value for network parameter vectors of low fitness. The K better network parameter vectors can thus serve as references for updating the K worse network parameter vectors, accelerating the update of the whole population of network parameters, while the survival-of-the-fittest acceptance mechanism prevents the worse network parameter vectors from evolving toward worse solutions.

Optionally, after each guided update, out-of-bounds handling can be applied to the network parameters to avoid abnormal data, as in the sketch below.
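
The center vector and the greedy second guided update can be sketched together as follows; fitness is an assumed callable returning positive values (so that the fitness-weighted center is well defined), and the clipping implements the optional out-of-bounds handling.

```python
import numpy as np

rng = np.random.default_rng()

def second_guided_update(X, fitness, lower, upper):
    """Pull the worse half of the vectors toward the fitness-weighted center
    of the better half, accepting a move only if it improves fitness."""
    fit = np.array([fitness(x) for x in X])
    order = np.argsort(fit)                  # ascending: worse vectors first
    K = len(X) // 2
    worse = order[:K]
    # With an odd count, the single best vector does not participate.
    better = order[K:-1] if len(X) % 2 == 1 else order[K:]

    weights = fit[better] / fit[better].sum()
    C = (weights[:, None] * X[better]).sum(axis=0)   # center vector C(t)

    for k in worse:
        cand = X[k] + rng.random() * (C - X[k])
        cand = np.clip(cand, lower, upper)           # out-of-bounds handling
        if fitness(cand) > fit[k]:                   # survival of the fittest
            X[k] = cand
    return X
```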

In a possible implementation, the inertia weight is:

$$\omega(t)=\omega_{max}-(\omega_{max}-\omega_{min})\frac{t}{iter}$$

where ωmax denotes the maximum inertia weight, ωmin denotes the minimum inertia weight, iter denotes the maximum number of updates of the first network parameter vectors, and t denotes the current number of updates of the first network parameter vectors.

Setting an inertia weight tied to the number of updates balances local and global search ability and improves the overall efficiency of the algorithm. Although the first update performs both local and global search, its accuracy may be insufficient; the second update can therefore be performed to further improve search accuracy and guarantee the overall training effect.

In a possible implementation, applying the parallel weight optimization algorithm to perform the second update on the second network parameter vector of each second neural network model to be trained, until the number of second updates reaches the upper limit or the iteration-end requirement is met, includes:

B1. Construct a task node for each second neural network model to be trained on the second core, construct a main task node, and allocate private memory for each task node as well as one block of common memory.

B2. Initialize the iteration counter t'=1 and the scale matrix Bt'=I.

B3. For all second neural network models to be trained, obtain the error function value E(wt') corresponding to the second neural network model to be trained and place it into the common memory for every task node to call. E(wt') is obtained through the second neural network model to be trained on any task node, and wt' denotes the second network parameter vector applied by the second neural network to be trained at the t'-th training.

B4. Obtain the gradient matrix of the second neural network model to be trained,

$$g_{t'}=\nabla E(w_{t'})$$

and place the scale matrix Bt' and the gradient matrix gt' into the common memory for every task node to call. Each column of gt' is the gradient of one network parameter of the second neural network model to be trained.

Here I denotes the unit scale matrix, ∇ denotes the gradient operator, E(wt') denotes the error function, wt' denotes the weight vector corresponding to the second neural network model to be trained, Bt' is an M-order square matrix, and M denotes the weight dimension of the second neural network model to be trained.

B5. All task nodes run in parallel; each task node calls the scale matrix Bt' and the gradient matrix gt' from the common memory, and the search direction dm of the second neural network to be trained on the m-th task node is determined as the multiply-accumulate of the m-th row of Bt' with the m-th column of gt', where m=1,2,…,M and dm is stored in the private memory of the m-th task node.
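
The per-node search direction is then a single row-by-column multiply-accumulate, as in this minimal sketch (names assumed):

```python
import numpy as np

def search_direction(B, g, m):
    """dm: multiply-accumulate of row m of the scale matrix B with column m
    of the gradient matrix g, computed independently on task node m."""
    return float(B[m, :] @ g[:, m])
```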

B6. All task nodes run in parallel; each task node retrieves its search direction dm from its private memory, determines by the golden-section method the update step λm corresponding to the m-th network parameter of the second neural network model to be trained on that task node, and stores λm in the private memory of the m-th task node.
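
A generic golden-section line search over λm is sketched here; the bracketing interval [0, 1] and the tolerance are assumptions, and phi would evaluate the error with only the m-th parameter moved by lam·dm.

```python
import math

def golden_section(phi, a=0.0, b=1.0, tol=1e-5):
    """Minimize the one-dimensional function phi on [a, b] by golden section."""
    r = (math.sqrt(5.0) - 1.0) / 2.0        # inverse golden ratio, ~0.618
    c, d = b - r * (b - a), a + r * (b - a)
    while b - a > tol:
        if phi(c) < phi(d):                 # minimum lies in [a, d]
            b, d = d, c
            c = b - r * (b - a)
        else:                               # minimum lies in [c, b]
            a, c = c, d
            d = a + r * (b - a)
    return (a + b) / 2.0

# Usage on node m (E, w, dm and the with_param helper are assumed):
# lam_m = golden_section(lambda lam: E(with_param(w, m, w[m] + lam * dm)))
```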

B7. All task nodes run in parallel; each task node retrieves its update step λm from its private memory and updates the m-th network parameter of the second neural network model to be trained on that task node according to λm. The m-th gradient component is then obtained from the m-th updated network parameter of the second neural network model to be trained on the m-th task node.

B9. All task nodes run in parallel; according to the m-th gradient component, the gradient matrix gm corresponding to the m-th task node is re-determined, and the scale matrix Bt' corresponding to the m-th task node is updated according to gm.

B10. Each task node outputs the m-th column of the scale matrix Bt' and the m-th network parameter of the second neural network model to be trained to the main task node.

B11. The main task node assembles the M network parameters into the updated second network parameter vector and judges whether the fitness corresponding to the updated second network parameter vector is greater than the set threshold; if so, the iteration-end requirement is met and the second update ends; otherwise, go to step B12.

B12. Judge whether the count of the iteration counter t' exceeds the upper limit; if so, determine that the number of second updates has reached the upper limit and end the second update; otherwise, increment t' by one, assemble the m-th columns of the scale matrices Bm output by the task nodes into the scale matrix Bt', take the updated second network parameter vector as the network parameters of the second neural network models to be trained on all task nodes, and return to step B3.

In a possible implementation, the error function value E(wt') corresponding to the second neural network model to be trained is obtained as:

$$E(w_{t'})=\frac{1}{2}\sum_{s=1}^{S}\sum_{n=1}^{N}\left(\frac{y_n(x_s,w_{t'})-d_{sn}}{d_{max,n}-d_{min,n}}\right)^2$$

where s=1,2,…,S, with S the total number of training samples, and n=1,2,…,N, with N the total number of output neurons of the second neural network model to be trained. yn(xs,wt') denotes the actual output of the n-th output neuron of the second neural network model to be trained when it takes training sample xs as input and wt' as network parameters. dsn denotes the expected output corresponding to yn(xs,wt'). dmax,n denotes the maximum of the n-th expected outputs over all training samples, each training sample corresponding to one expected output vector. dmin,n denotes the minimum of the n-th expected outputs over all training samples.
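
Under the range-normalized form reconstructed above, the error is easy to sketch; the 1/2 factor and the array layout are assumptions.

```python
import numpy as np

def error(Y, D):
    """Range-normalized squared error.

    Y: actual outputs, shape (S, N); D: expected outputs, shape (S, N).
    Each output dimension n is scaled by its expected-output range
    d_max,n - d_min,n over the training set before squaring.
    """
    d_range = D.max(axis=0) - D.min(axis=0)
    return 0.5 * np.sum(((Y - D) / d_range) ** 2)
```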

In a possible implementation, determining, according to the search direction dm and using the golden-section method, the update step λm corresponding to the m-th network parameter of the second neural network model to be trained on the task node includes:

Using the search direction dm, a search matrix D with a single row is constructed, in which the m-th element is the search direction dm and the other elements are 0; the total number of elements of the search matrix equals the total number of network parameters of the second neural network model to be trained.

A step matrix λ' with a single column is constructed, in which the m-th element is the update step λm to be determined and the other elements are 0; the total number of elements of the step matrix equals the total number of network parameters of the second neural network model to be trained.

Construct a solution condition from the search matrix D and the step matrix λ', and solve that condition to determine the update step λ_m.

The m-th network parameter of the second to-be-trained neural network model on the task node is then updated with the step λ_m as:

w_m' = w_m + λ_m · d_m

where w_m denotes the m-th network parameter of the second to-be-trained neural network model and w_m' denotes the updated w_m.
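As a concrete illustration of the line search just described, the sketch below minimizes the error along the single coordinate m by golden-section search. The error callable, the bracket [a, b] and the tolerance are assumptions of this sketch; the patent only fixes that λ_m is found by the golden-section method and applied as w_m' = w_m + λ_m · d_m.

import numpy as np

# Hypothetical sketch: golden-section search for the scalar step lambda_m
# minimizing E(w + lambda * e_m * d_m) along coordinate m. `error` stands in
# for the model's error function; the bracket [a, b] is illustrative.
def golden_section_step(error, w, m, d_m, a=0.0, b=1.0, tol=1e-6):
    phi = (np.sqrt(5.0) - 1.0) / 2.0       # golden-ratio factor, about 0.618
    def f(lam):                            # error after moving only parameter m
        w_trial = w.copy()
        w_trial[m] += lam * d_m
        return error(w_trial)
    x1, x2 = b - phi * (b - a), a + phi * (b - a)
    f1, f2 = f(x1), f(x2)
    while b - a > tol:                     # shrink the bracket each round
        if f1 < f2:                        # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - phi * (b - a)
            f1 = f(x1)
        else:                              # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + phi * (b - a)
            f2 = f(x2)
    return (a + b) / 2.0                   # lambda_m

Because only the m-th coordinate moves, the M task nodes can run such searches simultaneously, each against its own private d_m, without touching one another's private memory.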

In a possible implementation, re-determining the gradient matrix g_m of the m-th task node from the m-th gradient component, and updating the scale matrix B_t' of the m-th task node according to that gradient matrix g_m, includes:

Take the gradient matrix g_t' as the base and replace its m-th element with the m-th gradient component, obtaining the gradient matrix g_m of the m-th task node.

Computing the gradient by the traditional method consumes substantial computing resources and cannot guarantee high precision. For example, to train a neural network on V groups of training data, evaluating the objective function requires V work items; if the network's weights have C elements and the gradient is computed from the traditional gradient definition, the objective function must be called C times to obtain the partial derivative of each weight element, which greatly reduces computational efficiency. This embodiment therefore adopts an alternative gradient-matrix update: parallelism is partitioned by the training data, each task node computes the weight partial derivatives for one group of training data, and the partial derivatives of the objective function are assembled at the end. Suppose there are V training data and the weights of the second to-be-trained network structure have C elements. Once the m-th network parameter of the second to-be-trained neural network model has been updated, the updated parameters are synchronized to all nodes; each copy of the second to-be-trained network then computes the weight partial derivatives from a single training datum, and the per-datum partial derivatives are reduced and summed to obtain the gradient matrix g_m.

The reduction summation works as follows: for the M partial derivatives associated with the same network parameter, the m-th partial derivative is added to the (M-m)-th, and each sum is carried into the next round of addition; this repeats until only one value remains, which is the final partial derivative for that network parameter. When the count in a round is odd, the middle partial derivative passes directly into the next round.
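A compact sketch of the two preceding paragraphs, per-datum partial derivatives computed independently and then combined by the pairwise reduction just described, might look as follows; per_sample_grad is an assumed name standing in for the model's backward pass.

# Hypothetical sketch of the data-parallel gradient: each of V task nodes
# computes the weight partial derivatives for one training datum, and the
# per-datum gradient vectors are combined by pairwise reduction.
def parallel_gradient(per_sample_grad, w, data):
    grads = [per_sample_grad(w, x) for x in data]    # one gradient per datum, run in parallel
    while len(grads) > 1:                            # reduction rounds
        n = len(grads)
        nxt = [grads[i] + grads[n - 1 - i] for i in range(n // 2)]  # pair i with n-1-i
        if n % 2 == 1:                               # odd count: middle passes through
            nxt.append(grads[n // 2])
        grads = nxt
    return grads[0]                                  # summed gradient

With V data the reduction finishes in about log2(V) rounds instead of a serial V-term accumulation, which is where the efficiency gain over repeated serial objective-function calls comes from.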

Whenever data integration or data communication is required, the main task node sends data to and receives data from the other task nodes. Although this application adds one round of data transfer, it eliminates serial function calls and effectively improves training efficiency.

According to the gradient matrix g_m of the m-th task node, the scale matrix B_t' of the m-th task node is updated as:

where B_t'' denotes the updated B_t'; W denotes the weight-difference matrix, W = w_t'' - w_t', with w_t'' the weight vector of the second to-be-trained neural network model after the update with step λ_m and w_t' the weight vector of the second to-be-trained neural network model before the update; T denotes transposition; and Z = g_m - g_t', with Z the gradient-difference matrix.
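The update formula for B_t' likewise did not survive extraction. Since B_t' is initialized to the identity, is multiplied with the gradient to form search directions, and is corrected from the weight difference W and the gradient difference Z, it has the shape of a quasi-Newton inverse-Hessian approximation; a DFP-style correction consistent with these definitions, offered as an assumption rather than the patent's verbatim formula, would be:

B_{t''} = B_{t'} + \frac{W W^{T}}{W^{T} Z} - \frac{B_{t'} Z Z^{T} B_{t'}}{Z^{T} B_{t'} Z}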

The specific embodiments above further describe the purpose, technical solution, and beneficial effects of the present invention in detail. It should be understood that the above are merely specific embodiments of the present invention and are not intended to limit its scope of protection; any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims (10)

1. A modular neural network training platform based on multi-path NPUs, characterized in that it comprises a training-data and NPU-data acquisition module, a partitioning module, a first core-initialization module, a first update module, a second core-initialization module, a second update module, a loop module, and an output module;

the training-data and NPU-data acquisition module is configured to obtain the neural network model to be trained and to identify the idle cores of the multi-path NPUs, obtaining a plurality of target cores;

the partitioning module is configured to divide the plurality of target cores into two parts, obtaining a first core group used for the first parallel training and a second core group used for the second parallel training;

the first core-initialization module is configured to deploy a plurality of the neural network models to be trained on each first core of the first core group, obtaining first to-be-trained neural network models, and to generate initial network parameter values for each of them, obtaining the first network parameter vector of each first to-be-trained neural network model and completing the initialization of the first cores;

the first update module is configured to run all first cores simultaneously in parallel and, for each first to-be-trained neural network model on a first core, to apply the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform a first update on the first network parameter vector of each first to-be-trained neural network model until the number of first updates reaches the upper limit, and to determine the locally optimal network parameter vector of each first core from the first network parameter vectors;

the second core-initialization module is configured to deploy a plurality of the neural network models to be trained on the second cores of the second core group, obtaining second to-be-trained neural network models, to determine the globally optimal network parameter vector from the locally optimal network parameter vectors of the first cores by the reduction method, and to take that globally optimal network parameter vector as the initial network parameters of the second to-be-trained neural network models on the second cores, obtaining the second network parameter vector of each second to-be-trained neural network model and completing the initialization of the second cores;

the second update module is configured to run all second cores simultaneously in parallel and, for each second to-be-trained neural network model on a second core, to apply the parallel weight optimization algorithm to perform a second update on the second network parameter vector of each second to-be-trained neural network model until the number of second updates reaches the upper limit or the iteration-end condition is met, whereupon the second update ends;

the loop module is configured, when the number of second updates reaches the upper limit, to take the second network parameter vector on the second cores as the globally optimal network parameter vector and return to the first update step;

the output module is configured, when the iteration-end condition is met, to take the second network parameter vector on the second cores as the final network parameters of the neural network to be trained, completing the training of the neural network model to be trained.

2. The multi-path-NPU-based modular neural network training platform according to claim 1, characterized in that generating initial network parameter values for the plurality of first to-be-trained neural network models comprises: generating the network parameter values of a first to-be-trained neural network model by drawing them at random between the network-parameter upper bound and the network-parameter lower bound, or generating them with a chaotic-sequence strategy.

3. The multi-path-NPU-based modular neural network training platform according to claim 1, characterized in that applying the global-optimum network-parameter guidance algorithm and the parallel data optimization algorithm to perform the first update on the first network parameter vector of each first to-be-trained neural network model until the number of first updates reaches the upper limit, and determining the locally optimal network parameter vector of each first core from the first network parameter vectors, comprises:

A1. obtaining, in parallel, the first fitness corresponding to the first network parameter vector of each first to-be-trained neural network model;

A2. determining the first historical optimum of each first network parameter vector from the first fitness, obtaining the first historical optima and the first fitnesses corresponding to them;

A3. determining the locally optimal network parameter vector of the first core by the reduction method, from the first fitnesses corresponding to the first historical optima of the first network parameter vectors;

A4. judging whether the first maximum-update-count threshold has been reached; if so, outputting the locally optimal network parameter vector of the first core; otherwise updating the first network parameter vectors with the global-optimum network-parameter guidance algorithm and returning to step A1.

4. The multi-path-NPU-based modular neural network training platform according to claim 3, characterized in that determining the locally optimal network parameter vector of the first core by the reduction method, from the first fitnesses corresponding to the first historical optima of the first network parameter vectors, comprises:

obtaining n first fitnesses from the first fitnesses corresponding to the first historical optima of the first network parameter vectors;

comparing the i-th first fitness with the (n-i)-th first fitness and carrying the larger one into the next round of comparison, until only one first fitness remains, obtaining the locally optimal network parameter vector of the first core; where, when n is odd, the middle value passes directly into the next round of comparison.

5. The multi-path-NPU-based modular neural network training platform according to claim 3, characterized in that updating the first network parameter vectors with the global-optimum network-parameter guidance algorithm comprises:

performing the first guided update on the i-th first network parameter vector as:

where f(t) denotes a random number in [0,1] generated at the t-th local training, all f(t) being uniformly distributed; X_i(t) denotes the i-th first network parameter vector at the t-th local training, i = 1, 2, …, L, with L the total number of first network parameter vectors; X_i(t+1) denotes the updated X_i(t); X_i,best(t) denotes the state of largest fitness of the i-th first network parameter vector over the first t local trainings, i.e., the historical optimum of the i-th first network parameter vector; the update also uses the average of the historical optima of all the first network parameter vectors; X_best denotes the global optimum, and because the first cores run independently during the first loop and the global optimum is not yet known, the locally optimal network parameter vector of the first core serves as the global optimum; ω(t) denotes the inertia weight at the t-th local training;

dividing all the first network parameter vectors into two parts, K better network parameter vectors and K worse network parameter vectors, the K better network parameter vectors being the half with the largest fitness; when the number of first network parameter vectors is odd, the first network parameter vector with the largest fitness does not participate in the division;

obtaining, from the K better network parameter vectors, the network-parameter center vector as:

where C(t) denotes the network-parameter center vector, X_k(t) the k-th better network parameter vector, and F(X_k(t)) the fitness of the k-th better network parameter vector;

performing, according to the network-parameter center vector C(t), the second guided update on the worse network parameter vectors as:

X_k'(t+1) = X_k'(t) + rand · (C(t) - X_k'(t))

where X_k'(t) denotes the k'-th worse network parameter vector, X_k'(t+1) the updated X_k'(t), and rand a random number in (0,1);

judging whether the fitness of a worse network parameter vector after the second guided update is greater than its fitness before the second guided update; if so, accepting the second guided update of X_k'(t), completing the update of the first network parameter vectors; otherwise rejecting the second guided update of X_k'(t), completing the update of the first network parameter vectors.

6. The multi-path-NPU-based modular neural network training platform according to claim 5, characterized in that the inertia weight is:

where ω_max denotes the maximum inertia weight, ω_min the minimum inertia weight, iter the maximum number of updates of the first network parameter vector, and t the current number of updates of the first network parameter vector.

7. The multi-path-NPU-based modular neural network training platform according to claim 5, characterized in that applying the parallel weight optimization algorithm to perform the second update on the second network parameter vector of each second to-be-trained neural network model until the number of second updates reaches the upper limit or the iteration-end condition is met comprises:

B1. constructing one task node for each second to-be-trained neural network model on the second cores, constructing a main task node, and allocating private memory to each task node as well as one block of common memory;

B2. initializing the iteration counter t' = 1 and the scale matrix B_t' = I;

B3. for all second to-be-trained neural network models, obtaining the error function value E(w_t') of the second to-be-trained neural network model and placing it in the common memory for every task node to read; where E(w_t') is obtained through the second to-be-trained neural network model on any task node, and w_t' denotes the second network parameter vector applied by the second to-be-trained neural network at the t'-th training;

B4. obtaining the gradient matrix of the second to-be-trained neural network model and placing the scale matrix B_t' and the gradient matrix g_t' in the common memory for every task node to read; where each column of g_t' is the gradient of one network parameter of the second to-be-trained neural network model; and where I denotes the unit scale matrix, ∇ the gradient operator, E(w_t') the error function, w_t' the weight vector of the second to-be-trained neural network model, B_t' an M-order square matrix, and M the weight dimension of the second to-be-trained neural network model;

B5. running all task nodes in parallel, each task node reading the scale matrix B_t' and the gradient matrix g_t' from the common memory, and determining the search direction d_m of the second to-be-trained neural network on the m-th task node as the multiply-accumulate of the m-th row of B_t' with the m-th column of g_t'; where m = 1, 2, …, M and d_m is stored in the private memory of the m-th task node;

B6. running all task nodes in parallel, each task node reading its search direction d_m from its private memory and determining, from d_m and by the golden-section method, the update step λ_m of the m-th network parameter of the second to-be-trained neural network model on that task node, storing λ_m in the private memory of the m-th task node;

B7. running all task nodes in parallel, each task node reading its update step λ_m from its private memory and updating the m-th network parameter of the second to-be-trained neural network model on that task node according to λ_m; obtaining the m-th gradient component from the m-th updated network parameter of the second to-be-trained neural network model on the m-th task node;

B9. running all task nodes in parallel, re-determining the gradient matrix g_m of the m-th task node from the m-th gradient component, and updating the scale matrix B_t' of the m-th task node according to that gradient matrix g_m;

B10. each task node outputting the m-th column of the scale matrix B_t' and the m-th network parameter of the second to-be-trained neural network model to the main task node;

B11. the main task node assembling the M network parameters into an updated second network parameter vector and judging whether the fitness corresponding to the updated second network parameter vector is greater than the set threshold; if so, the iteration-end condition is met and the second-update procedure ends; otherwise proceeding to step B12;

B12. judging whether the count of the iteration counter t' is greater than the upper limit; if so, determining that the number of second updates has reached the upper limit and ending the second-update procedure; otherwise incrementing t' by one, assembling the m-th columns of the matrices B_m output by the task nodes into the scale matrix B_t', taking the updated second network parameter vector as the network parameters of the second to-be-trained neural network models on all task nodes, and returning to step B3.

8. The multi-path-NPU-based modular neural network training platform according to claim 7, characterized in that the error function value E(w_t') of the second to-be-trained neural network model is obtained as:

where s = 1, 2, …, S, with S the total number of training samples; n = 1, 2, …, N, with N the total number of output neurons of the second to-be-trained neural network model; y_n(x_s, w_t') denotes the actual output of the n-th output neuron of the second to-be-trained neural network model when it takes training sample x_s as input and w_t' as its network parameters; d_sn denotes the expected output corresponding to y_n(x_s, w_t'); d_max,n denotes the maximum of the n-th expected outputs over all training samples, each training sample corresponding to one expected output vector; d_min,n denotes the minimum of the n-th expected outputs over all training samples.

9. The multi-path-NPU-based modular neural network training platform according to claim 7, characterized in that determining, from the search direction d_m and by the golden-section method, the update step λ_m of the m-th network parameter of the second to-be-trained neural network model on the task node comprises:

constructing from the search direction d_m a search matrix D with a single row, whose m-th element is d_m and whose other elements are 0, the total number of elements of the search matrix being equal to the total number of network parameters of the second to-be-trained neural network model;

constructing a step matrix λ' with a single column, whose m-th element is the update step λ_m to be determined and whose other elements are 0, the total number of elements of the step matrix being equal to the total number of network parameters of the second to-be-trained neural network model;

constructing a solution condition from the search matrix D and the step matrix λ', and solving that condition to determine the update step λ_m;

updating the m-th network parameter of the second to-be-trained neural network model on the task node according to λ_m as:

w_m' = w_m + λ_m · d_m

where w_m denotes the m-th network parameter of the second to-be-trained neural network model and w_m' denotes the updated w_m.

10. The multi-path-NPU-based modular neural network training platform according to claim 7, characterized in that re-determining the gradient matrix g_m of the m-th task node from the m-th gradient component and updating the scale matrix B_t' of the m-th task node according to that gradient matrix g_m comprises:

taking the gradient matrix g_t' as the base and replacing its m-th element with the m-th gradient component, obtaining the gradient matrix g_m of the m-th task node;

updating, according to the gradient matrix g_m of the m-th task node, the scale matrix B_t' of the m-th task node as:

where B_t'' denotes the updated B_t'; W denotes the weight-difference matrix, W = w_t'' - w_t', with w_t'' the weight vector of the second to-be-trained neural network model after the update with step λ_m and w_t' the weight vector of the second to-be-trained neural network model before the update; T denotes transposition; Z = g_m - g_t', with Z the gradient-difference matrix.
CN202311831882.1A 2023-12-28 2023-12-28 Modularized neural network training platform based on multipath NPU Pending CN117725974A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311831882.1A CN117725974A (en) 2023-12-28 2023-12-28 Modularized neural network training platform based on multipath NPU


Publications (1)

Publication Number Publication Date
CN117725974A true CN117725974A (en) 2024-03-19

Family

ID=90201547

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311831882.1A Pending CN117725974A (en) 2023-12-28 2023-12-28 Modularized neural network training platform based on multipath NPU

Country Status (1)

Country Link
CN (1) CN117725974A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10379868B1 (en) * 2019-02-04 2019-08-13 Bell Integrator Inc. Optimization method with parallel computations
CN113537095A (en) * 2021-07-21 2021-10-22 北京华文众合科技有限公司 Training method, apparatus, medium and computing device for tracking model
CN116502685A (en) * 2023-04-25 2023-07-28 广东赛昉科技有限公司 NPU power consumption optimization method and system based on neural network structure


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Hanwoong Jung et al.: "Accelerating Deep Neural Networks on Mobile Multicore NPUs", CGO 2023: Proceedings of the 21st ACM/IEEE International Symposium on Code Generation and Optimization, 22 February 2023 *
Tian Shuo et al.: "Research on Automatic Neural Architecture Search and Neural Network Acceleration Techniques", China Doctoral Dissertations Full-text Database, Information Science and Technology, No. 2, 15 February 2023 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination