
CN110110845B - Learning method based on parallel multi-level width neural network

Info

Publication number
CN110110845B
Authority
CN
China
Prior art keywords
neural network
level
test
sample set
width
Prior art date
Legal status
Active
Application number
CN201910331708.8A
Other languages
Chinese (zh)
Other versions
CN110110845A (en)
Inventor
席江波
房建武
吴田军
康梦华
Current Assignee
Chang'an University
Original Assignee
Chang'an University
Priority date
Filing date
Publication date
Application filed by Chang'an University
Priority to CN201910331708.8A
Publication of CN110110845A
Application granted
Publication of CN110110845B

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The invention discloses a learning method based on a parallel multi-level wide neural network, comprising the following steps: obtaining validation sets and constructing base classifiers; training and validating each level of the parallel M-level wide neural network to obtain the trained network and the validation output corresponding to each level; obtaining the decision threshold of each level by statistical calculation; and testing the validated parallel multi-level wide neural network on a test set. The neural network of the invention has a multi-level structure in which each level learns a different part of the data, and both training and testing can be parallelized. Each level uses a wide neural network for feature learning in the width direction; by connecting multiple wide neural networks again in the width direction as base classifiers, classifier ensembling in two width directions is achieved; incremental learning is achieved by adding a new level of wide neural network; and testing can be parallelized.

Description

A learning method based on a parallel multi-level wide neural network

Technical Field

The invention belongs to the technical field of artificial intelligence and machine learning, and in particular relates to a learning method based on a parallel multi-level wide neural network.

Background Art

With the great success of learning models dominated by deep networks in large-scale image processing, machine vision, and related fields, the complexity of these models has grown rapidly. They require large amounts of high-dimensional data for training, which greatly increases the required computing resources and computing time. Moreover, real data are often not homogeneous: some samples are very easy to classify, while many others are difficult. Most classification errors occur on hard inputs, such as samples from imbalanced distributions, abnormally acquired samples, and samples near the decision boundary or that are linearly inseparable.

In existing deep learning models, simple and complex samples are processed in the same way, which lowers the efficiency of computing-resource usage. Moreover, existing deep networks such as convolutional neural networks often have many layers, and every sample must pass through all of them, which makes generalizing or testing the network very time-consuming. Early parallel multi-level self-organizing networks, by contrast, let each level receive only the nonlinearly transformed samples rejected by the previous level; these samples are mapped into other spaces where they are easier to classify and are then classified again. However, the problem of how to adjust and allocate computing resources across high-dimensional data samples of different difficulty, so as to improve the speed and efficiency of learning and classification, has not been well solved.

Summary of the Invention

To address the above defects, the invention provides a learning method based on a parallel multi-level wide neural network. The network has a multi-level structure in which each level learns a different part of the data, and both training and testing can be parallelized. Each level uses a wide neural network for feature learning in the width direction; by connecting multiple wide neural networks again in the width direction as base classifiers, classifier ensembling in two width directions is achieved; incremental learning is achieved by adding a new level of wide neural network; and testing can be parallelized, which greatly shortens the learning and classification time for complex samples and improves the network's operating efficiency.

To achieve the above object, the invention adopts the following technical solutions.

A learning method based on a parallel multi-level wide neural network. The parallel multi-level wide neural network comprises multiple levels of wide neural networks, where each level contains an input layer, a hidden layer, a decision layer, and an output layer connected in sequence; the decision layer determines whether each test sample is output by the current level. The learning method comprises the following steps:

Step 1: obtain the original training sample set and build a parallel M-level wide neural network Net_1, …, Net_m, …, Net_M (m = 1, 2, …, M), with each level's wide neural network serving as the base classifier of that level. Apply M data transformations to the original training sample set to obtain M corresponding validation sets x_v_1, …, x_v_m, …, x_v_M;

where the total number of samples in the original training sample set is N_tr.

Step 2: use the original training sample set and the M validation sets x_v_1, …, x_v_m, …, x_v_M to train and validate each level of the parallel M-level wide neural network, obtaining the trained network and the validation output y_v_m (m = 1, 2, …, M) of each level. Use the minimum-error method to obtain the label y_v_ind_m corresponding to each validation output y_v_m, and from it the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m of each trained level's validation set;

Step 3: perform statistical calculations on the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m of each trained level's validation set to obtain the decision threshold T_m of each trained level. Use T_m as the decision basis of the corresponding level, giving a parallel M-level wide neural network with determined decision thresholds;

Step 4: obtain a test set and feed it in parallel, as input data, to every threshold-determined level for testing, obtaining each level's output. Obtain each level's error vector, judge each level's output against its threshold, and thereby obtain the label y_test_ind_m corresponding to each level's test output.

The features and further improvements of the technical solution of the invention are as follows:

(1) In step 1, the data transformation compresses or deforms the samples of the original sample set by elastic transformation (Elastic), or rotates, flips, enlarges, or shrinks them by affine transformation (Affine).
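As a rough illustration (not part of the patent), the following Python sketch uses NumPy and SciPy to produce an elastically deformed copy and an affine-transformed copy of a 28 × 28 image sample; all function names and parameter values are illustrative assumptions:

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates, rotate

def elastic_transform(image, alpha=34.0, sigma=4.0, rng=None):
    """Elastic deformation: displace each pixel along a smoothed random field."""
    rng = np.random.default_rng() if rng is None else rng
    dx = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, image.shape), sigma) * alpha
    rows, cols = np.meshgrid(np.arange(image.shape[0]),
                             np.arange(image.shape[1]), indexing="ij")
    return map_coordinates(image, [rows + dy, cols + dx],
                           order=1, mode="reflect")

def affine_transform(image, angle=10.0, flip=False):
    """Affine variant: small rotation, optionally followed by a horizontal flip."""
    out = rotate(image, angle, reshape=False, mode="nearest")
    return np.fliplr(out) if flip else out
```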

(2) In step 2, training and validating each level of the parallel M-level wide neural network with the original training sample set and the M validation sets x_v_1, …, x_v_m, …, x_v_M comprises the following sub-steps:

Sub-step 2.1: use the original training sample set as the input samples of the level-1 wide neural network Net_1 and train Net_1 to obtain the trained level-1 wide neural network.

Sub-step 2.2: validate the trained level-1 wide neural network with the first validation set x_v_1 to obtain the misclassified sample set y_vw_1 of the level-1 validation set.

Sub-step 2.3: use the misclassified sample set y_vw_1 of the level-1 network as input samples A_v_1 of the level-2 wide neural network; then randomly draw a training sample set A_v_2 from the original training sample set so that the total input sample set {A_v_1 + A_v_2} has as many samples as the original training sample set, and use {A_v_1 + A_v_2} as the input samples of the level-2 network.

Sub-step 2.4: train the level-2 wide neural network with the total input sample set {A_v_1 + A_v_2}; validate the trained level-2 network with the second validation set x_v_2 to obtain the misclassified sample set y_vw_2 of the level-2 validation set.

Proceeding in the same way, train the level-3 through level-M wide neural networks to obtain the trained parallel M-level wide neural network and the corresponding validation output y_v_m (m = 1, 2, …, M) of each level.
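A minimal sketch of this cascade training loop (sub-steps 2.1 to 2.4), assuming hypothetical helpers `train(X, y)`, which fits and returns one base classifier, and `misclassified(net, Xv, yv)`, which returns the validation samples that the level classifies wrongly:

```python
import numpy as np

def train_cascade(X_train, y_train, val_sets, M, train, misclassified, rng=None):
    """Train M levels: each level is fit on the previous level's rejected
    validation samples plus a random refill from the original training set,
    so every level trains on roughly N_tr samples."""
    rng = np.random.default_rng() if rng is None else rng
    n_tr = len(X_train)
    nets, X_cur, y_cur = [], X_train, y_train
    for m in range(M):
        net = train(X_cur, y_cur)                    # fit level m
        nets.append(net)
        X_hard, y_hard = misclassified(net, *val_sets[m])
        k = max(n_tr - len(X_hard), 0)               # refill to N_tr samples
        idx = rng.choice(n_tr, size=k, replace=False)
        X_cur = np.concatenate([X_hard, X_train[idx]])
        y_cur = np.concatenate([y_hard, y_train[idx]])
    return nets
```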

(3) In step 2, the minimum-error method is:

First, let the total number of classes in the original training sample set be C and construct reference matrices R_j (1 ≤ j ≤ C);

where every element of the j-th row of R_j is 1 and all other elements are 0, and each R_j has dimension C × N_tr.

Next, from the validation output y_v_m of each trained level, obtain the error vector between y_v_m and the reference matrix R_j of the corresponding level:

J_v_mj = ||softmax(y_v_m) − R_j||_2, 1 ≤ j ≤ C;

where J_v_mj has dimension 1 × N_tr and y_v_m has dimension C × N_tr.

Finally, minimize the error vector J_v_mj between the validation output y_v_m and the reference matrix R_j of the corresponding level over j, obtaining the class label y_v_ind_m corresponding to each trained level:

y_v_ind_m = argmin_{1 ≤ j ≤ C} J_v_mj;

where y_v_ind_m has dimension 1 × N_tr.
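The minimum-error method maps directly onto array operations; a sketch in NumPy (outputs are stored one column per sample, so comparing against R_j amounts to comparing each softmax column with the one-hot vector e_j):

```python
import numpy as np

def softmax_cols(Y):
    """Column-wise softmax of a C x N output matrix."""
    E = np.exp(Y - Y.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def min_error_labels(Yv, C):
    """For each sample, pick the class j whose one-hot reference column
    is nearest to the softmax output column in the 2-norm."""
    P = softmax_cols(Yv)                               # C x N
    # J[j, n] = || P[:, n] - e_j ||_2 for class j, sample n
    J = np.stack([np.linalg.norm(P - np.eye(C)[:, [j]], axis=0)
                  for j in range(C)])                  # C x N
    return J.argmin(axis=0), J.min(axis=0)             # labels, minimum errors
```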

(4) In step 3, the statistical calculation comprises the following sub-steps:

Sub-step 3.1: let the correctly classified and misclassified sample sets of the m-th level of the trained parallel M-level network be y_vc_m and y_vw_m, with sample counts N_vc_m and N_vw_m respectively, where N_vc_m + N_vw_m = N_tr. The errors of the two sets are:

e_vc_m = ||softmax(y_vc_m) − t_vc_m||_2;

e_vw_m = ||softmax(y_vw_m) − t_vw_m||_2;

where t_vc_m is the true label of the correctly classified samples y_vc_m and t_vw_m is the true label of the misclassified samples y_vw_m of the level-m wide neural network.

Sub-step 3.2: from the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m, compute the mean and standard deviation μ_c and σ_c of the correctly classified set and the mean and standard deviation μ_w and σ_w of the misclassified set; the Gaussian distributions corresponding to the two sets are

e_vc_m ~ N(μ_c, σ_c²);

e_vw_m ~ N(μ_w, σ_w²);

and the corresponding Gaussian probability density functions are

f_c(x) = (1 / (√(2π) σ_c)) exp(−(x − μ_c)² / (2σ_c²));

f_w(x) = (1 / (√(2π) σ_w)) exp(−(x − μ_w)² / (2σ_w²)).

Sub-step 3.3: from the error e_vw_m and the standard deviation σ_w of the misclassified sample set y_vw_m, obtain the decision threshold of the level-m wide neural network: T_m = min(e_vw_m) − ασ_w;

where α is a constant that provides a margin so that all misclassified samples y_vw_m are rejected at the current level.
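A sketch of the threshold computation under these definitions (`alpha` is the user-chosen margin constant; the true labels are assumed to be stored as one-hot columns, and `softmax_cols` is the helper from the previous sketch):

```python
import numpy as np

def softmax_cols(Y):
    E = np.exp(Y - Y.max(axis=0, keepdims=True))
    return E / E.sum(axis=0, keepdims=True)

def decision_threshold(Y_wrong, T_wrong, alpha=1.0):
    """T_m = min(e_vw_m) - alpha * sigma_w, from the per-sample 2-norm
    errors between softmax outputs (C x N) and true one-hot labels (C x N)."""
    e_vw = np.linalg.norm(softmax_cols(Y_wrong) - T_wrong, axis=0)
    sigma_w = e_vw.std()               # spread of the misclassified errors
    return e_vw.min() - alpha * sigma_w
```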

(5) In step 4, obtaining the test set means: obtain the original test sample set x_test; through M rounds of data expansion, obtain the M corresponding groups of test sample sets x_test_1, …, x_test_m, …, x_test_M, which form the test set.

Further, the data expansion is: apply the data transformation N_testD times to each sample of the original test sample set x_test, obtaining N_testD corresponding test sample sets as the test set x_test_m of the m-th level of the threshold-determined parallel M-level wide neural network;

where the total number of test samples in the original test sample set x_test is N_test_samples.

(6) In step 4, obtaining the error vector of each level's wide neural network comprises the following sub-steps:

Sub-step 4.1: feed the M groups of test sample sets x_test_1, x_test_2, …, x_test_M in parallel to the threshold-determined parallel M-level wide neural network, obtaining the N_testD outputs y_test_m_d (d = 1, 2, …, N_testD) of each threshold-determined level.

Sub-step 4.2: average the N_testD outputs y_test_m_d (d = 1, 2, …, N_testD) of each threshold-determined level to obtain its test output:

y_test_m = (1 / N_testD) Σ_{d = 1}^{N_testD} y_test_m_d.

Sub-step 4.3: let the total number of classes in the test set be C and construct reference matrices R_j (1 ≤ j ≤ C); obtain the error vector between the test output y_test_m and the reference matrix R_j of the corresponding level:

J_test_mj = ||softmax(y_test_m) − R_j||_2, 1 ≤ j ≤ C;

where every element of the j-th row of R_j is 1 and all other elements are 0; each R_j has dimension C × N_test_samples; J_test_mj has dimension 1 × N_test_samples; and y_test_m has dimension C × N_test_samples.

(7) Judging the output of each threshold-determined level of the wide neural network proceeds as follows:

When the minimum error of the current level's wide neural network is less than or equal to the current level's decision threshold, the current level is judged to be the correct classification output level for that sample:

min(J_test_mj) ≤ T_m.

When the minimum error of the current level's wide neural network is greater than the current level's decision threshold, the current level is judged unable to classify that sample correctly, and the sample is passed to the next level's wide neural network for testing; this repeats until the sample finds its correct classification output level:

min(J_test_mj) > T_m.

(8) In step 4, the label y_test_ind_m corresponding to the test output of each threshold-determined level of the wide neural network is:

y_test_ind_m = argmin_{1 ≤ j ≤ C} J_test_mj;

where y_test_ind_m has dimension 1 × N_test_samples.
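Putting (6) through (8) together, a sketch of the routing rule: a sample is emitted by the first level whose minimum reference error does not exceed that level's threshold, and by the last level otherwise. All names are illustrative:

```python
import numpy as np

def route_and_label(J_levels, thresholds):
    """J_levels: list of M arrays, each C x N (the errors J_test_mj per level).
    Returns predicted labels and the level index that emitted each sample."""
    M, N = len(J_levels), J_levels[0].shape[1]
    labels = np.empty(N, dtype=int)
    level_of = np.full(N, M - 1)
    pending = np.ones(N, dtype=bool)
    for m in range(M):
        J = J_levels[m]
        accept = (J.min(axis=0) <= thresholds[m]) & pending
        if m == M - 1:                 # the last level outputs whatever remains
            accept = pending
        labels[accept] = J.argmin(axis=0)[accept]
        level_of[accept] = m
        pending &= ~accept
    return labels, level_of
```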

Compared with the prior art, the beneficial effects of the invention are:

(1) The neural network of the invention has multi-level base classifiers, each level learning a different part of the data set. The structure of the network can be determined adaptively according to the complexity of the problem and the data set, optimizing the use of computing resources.

(2) The neural network of the invention supports incremental learning. When new training data become available, the current network is evaluated to determine whether it can classify the new data correctly; if it cannot, a new wide radial basis function network is added as a new level to learn the new samples, without retraining the entire network.

(3) The neural network of the invention can be tested in parallel: the test data are given to all levels of the network simultaneously, and the decision threshold of each level, obtained during training, determines which level finally outputs each test sample. The parallel testing process greatly reduces waiting time when the network is actually used.

(4) The neural network of the invention can serve as a general learning framework with strong flexibility; each level may use a BP neural network, a convolutional neural network, or another type of classifier according to actual needs.

Brief Description of the Drawings

The invention is described in further detail below with reference to the accompanying drawings and specific embodiments.

Fig. 1 is a schematic diagram of the parallel multi-level neural network of the invention and of its training and testing processes, in which Fig. 1(a) is a schematic diagram of the parallel multi-level wide neural network, Fig. 1(b) is a schematic diagram of its training and validation process, and Fig. 1(c) is a schematic diagram of its testing process.

Fig. 2 is a structural diagram of the parallel multi-level wide neural network of the invention.

Fig. 3(a) shows the error distribution of the validation set on one level of the parallel multi-level wide neural network of the invention; Fig. 3(b) shows the Gaussian probability density functions of the statistical parameters in Fig. 3(a).

Fig. 4 compares the test results of the parallel 26-level wide neural network on the MNIST data set in the embodiment of the invention with the classification results of existing learning models.

Detailed Description

Embodiments of the invention are described in detail below with reference to examples, but those skilled in the art will understand that the following examples only illustrate the invention and should not be regarded as limiting its scope.

The MNIST handwritten digit data set is used. Each image is an 8-bit grayscale image of a handwritten digit 0 to 9, of size 28 × 28, with 10 classes in total; 60,000 images form the original training sample set and 10,000 images form the test set. It is one of the important general-purpose image data sets for training and testing new learning models. For this data set, referring to Fig. 1 and Fig. 2, this embodiment uses a wide radial basis function network as the base classifier, i.e., every level of the parallel multi-level wide neural network is a wide radial basis function network, and the number of parallel levels is 26.

(1) Obtain the validation sets and construct the base classifiers.

First, apply 26 elastic transformations to the image samples of the original training sample set of N_tr = 60,000 images, obtaining M = 26 validation sets x_v_1, x_v_2, …, x_v_26. In this embodiment, to guarantee enough misclassified validation samples, each validation set contains N_val = 10 data sets obtained by transforming the original training set; that is, each validation set has N_val = 10 times as many samples as the original training sample set.

Second, design the parallel multi-level wide neural network with wide radial basis function networks as base classifiers: M = 26 wide radial basis function networks are connected together to form the parallel multi-level wide neural network Net_1, Net_2, …, Net_M; each base classifier forms one level and focuses on a different part of the data set.

Finally, construct the wide radial basis function network. The specific process is as follows:

Construct a radial basis function network with N_0k = 1000 Gaussian basis functions of the form

φ_i(x) = exp(−||x − c_i||² / (2σ²)), 1 ≤ i ≤ N_0k,

whose centers c_i are randomly taken from a subset of the original training sample set and whose standard deviation σ is a constant. Use a sliding window to extract multiple groups of local feature images from each image sample of the original training sample set, thereby obtaining multiple groups of local feature matrices; feeding these local feature matrices to the Gaussian basis functions yields multiple radial basis function networks, which together form the wide radial basis function network.

(2) Train and validate each level of the parallel M-level wide neural network to obtain the trained network and the validation output y_v_m (m = 1, 2, …, M) of each level.

The level-1 wide radial basis function network is trained with the original training sample set; after training, the misclassified training samples are sent to the level-2 wide radial basis function network as part of the second training set to train the level-2 network. The validation set obtained in step (1) validates the current level's trained network and at the same time provides more misclassified samples as part of the next level's training set. As shown in Figs. 1(a) and (b), this comprises the following sub-steps:

Sub-step 2.1: use the original training sample set as the input samples of the level-1 wide neural network Net_1 and train Net_1 to obtain the trained level-1 wide neural network.

Sub-step 2.2: validate the trained level-1 wide neural network with the first validation set x_v_1 to obtain its misclassified sample set y_vw_1.

Sub-step 2.3: use the misclassified sample set y_vw_1 of the level-1 network as input samples A_v_1 of the level-2 wide neural network; randomly draw a training sample set A_v_2 from the original training sample set so that the total input sample set {A_v_1 + A_v_2} has as many samples as the original training sample set, and use {A_v_1 + A_v_2} as the input samples of the level-2 network.

Sub-step 2.4: train the level-2 wide neural network with {A_v_1 + A_v_2}; validate the trained level-2 network with the second validation set x_v_2 to obtain its misclassified sample set y_vw_2.

Repeat sub-steps 2.3 and 2.4 to train the level-3 through level-M wide neural networks, obtaining the trained parallel M-level wide neural network and the corresponding validation output y_v_m (m = 1, 2, …, M) of each level.

The specific training and validation process of the above wide radial basis function network is as follows:

Take the image samples of the original training sample set as input data; the image size is M_1 × M_2 = 28 × 28. The sliding-window size is r = 13 × 13, the window's initial position is the upper-left corner of each image sample, and the sliding step is 1 pixel; the window slides from left to right and from top to bottom. The 3-dimensional image block formed by the 60,000 image samples inside the window is stretched into a matrix x_k ∈ R^(r×N): each local feature image is arranged pixel-wise into its original matrix, the 2nd through last columns of each original matrix are appended in order after the 1st column to form one column vector, and the N column vectors are arranged in order to form the local feature matrix x_k (1 ≤ k ≤ K) of one group of training image samples, each column of x_k representing one sample. The local feature matrix x_k is then input to the radial basis function network of N_0k = 1000 Gaussian basis functions, and its output is recorded as

Φ_k = [φ_1(x_k), φ_2(x_k), …, φ_{N_0k}(x_k)],

where each φ_i(x_k) is a column vector containing N = 60,000 elements.

Each slide of the window corresponds to one radial basis function network; after sliding is complete, K = (M_1 − 13 + 1) × (M_2 − 13 + 1) = (28 − 13 + 1) × (28 − 13 + 1) = 256 radial basis function networks are obtained.
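For concreteness, one sliding-window position can be turned into a local feature matrix as follows (a sketch; the exact flattening order of the patch pixels is immaterial as long as it is consistent):

```python
import numpy as np

def window_matrix(images, i, j, w=13):
    """images: N x H x W array. Flatten the w x w patch at position (i, j)
    of every image into a column, giving x_k of shape (w*w) x N."""
    patch = images[:, i:i + w, j:j + w]            # N x w x w
    return patch.reshape(images.shape[0], -1).T    # (w*w) x N

# K = (28 - 13 + 1) * (28 - 13 + 1) = 256 window positions for 28 x 28 images
```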

For each radial basis function network, sorting and downsampling are introduced on the nonlinearly transformed Gaussian output data Φ_k. Each column of the output data Φ_k of the wide radial basis function network is summed to obtain a row vector whose elements are the summed responses of the local positions over the images to be processed; arranging these sums in descending order gives the descending vector a_k, and an index s_k marks the original position corresponding to each entry of a_k, yielding the sorted output data Φ′_k = sort(Φ_k, s_k).

The sorted output data are downsampled with downsampling interval N_kS = 20, so the number of sampled outputs per window is

N_0k / N_kS = 1000 / 20 = 50,

and the total number of outputs of the wide radial basis function network is

K × (N_0k / N_kS) = 256 × 50 = 12800.

The sampled output is Φ_kS = subsample(Φ′_k, N_kS), and the output of the Gaussian basis layer is Φ = [Φ_1S, Φ_2S, …, Φ_KS].
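The patent states the ordering statistic tersely; the sketch below orders the Gaussian units of one window by their summed response over all samples and keeps every N_kS-th unit, which matches the stated reduction from N_0k = 1000 to 50 outputs per window (the interpretation of the ordering is an assumption):

```python
import numpy as np

def sort_and_subsample(Phi_k, n_sub=20):
    """Phi_k: n_basis x N (one row per Gaussian unit, as in the sketch above).
    Order the units by their summed response over all samples, then keep
    every n_sub-th row, reducing 1000 units to 50 per window."""
    order = np.argsort(Phi_k.sum(axis=1))[::-1]   # descending total response
    return Phi_k[order][::n_sub]
```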

Set the desired output as D = [D_1, D_2, …, D_C]; connect the Gaussian basis function outputs of the wide radial basis function network through a linear layer, whose weights are W = [W_1, W_2, …, W_C];

where C = 10 is the total number of classes of the original samples.

The class output of the wide radial basis function network is Y = [Y_1, Y_2, …, Y_C] = ΦW. Specifically, the least-mean-square estimate Ŵ of the linear-layer weights is computed by minimizing the squared error:

Ŵ = argmin_W ||ΦW − D||².

The least-mean-square estimate Ŵ of the linear-layer weights is computed through the pseudo-inverse of the Gaussian basis function output Φ of the wide radial basis function network:

Ŵ = Φ⁺ D;

where Φ⁺ is the pseudo-inverse of the Gaussian basis function output Φ of the wide radial basis function network.

Finally, the class output of the wide radial basis function network is computed as

Ŷ = Φ Ŵ.
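This readout is a one-liner; a sketch that mirrors Ŵ = Φ⁺D directly (here samples are stored as rows, so Φ is N × F and D is N × C):

```python
import numpy as np

def fit_linear_readout(Phi, D):
    """Least-mean-square weights of the linear output layer: W_hat = pinv(Phi) @ D.
    Phi: N x F matrix of Gaussian-layer outputs, D: N x C desired one-hot outputs."""
    return np.linalg.pinv(Phi) @ D

# usage: Y = Phi @ W_hat gives the C class scores for each of the N samples
```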

The trained wide radial basis function network is thus obtained; each trained level is validated with its corresponding validation set, yielding the validation output y_v_m (m = 1, 2, …, M) of each trained level.

From the obtained validation outputs y_v_m (m = 1, 2, …, M), the class label y_v_ind_m corresponding to each validation output is obtained as follows:

First, let the total number of classes of the original training sample set be C and construct reference matrices R_j (1 ≤ j ≤ C);

where every element of the j-th row of R_j is 1 and all other elements are 0, and each R_j has dimension C × N_tr.

Next, from the validation output y_v_m of each trained level, obtain the error vector between y_v_m and the reference matrix R_j of the corresponding level:

J_v_mj = ||softmax(y_v_m) − R_j||_2, 1 ≤ j ≤ C;

where J_v_mj has dimension 1 × N_tr and y_v_m has dimension C × N_tr.

Finally, minimize the error vector J_v_mj between the validation output y_v_m and the reference matrix R_j of the corresponding level over j, obtaining the class label y_v_ind_m of each trained level:

y_v_ind_m = argmin_{1 ≤ j ≤ C} J_v_mj;

where y_v_ind_m has dimension 1 × N_tr.

Comparing the class label y_v_ind_m of each trained level with the level's validation output y_v_m yields the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m of each level.

(3) Obtain the decision threshold T_m of each level's wide neural network by statistical calculation.

The difficult part of this network is determining the decision threshold of each level, which decides, at test time, which level of the network should output each sample. After training and validation, statistical calculations are performed on the correctly classified and misclassified sample sets separately. Suppose that at level m the correctly classified and misclassified sample sets are y_vc_m and y_vw_m, with sample counts N_vc_m and N_vw_m respectively, where N_vc_m + N_vw_m = N_tr.

In the above validation process, to guarantee enough misclassified samples, each validation set may contain N_val validation sample sets obtained by data transformation of the original training sample set; that is, each validation set has N_val times as many samples as the original training set.

The errors of the two sample sets are calculated as:

e_vc_m = ||softmax(y_vc_m) − t_vc_m||_2;

e_vw_m = ||softmax(y_vw_m) − t_vw_m||_2;

where t_vc_m and t_vw_m are the true labels corresponding to the correctly classified samples y_vc_m and the misclassified samples y_vw_m at level m. Let the means and standard deviations of the error statistics of the correctly classified and misclassified sets be μ_c, μ_w, σ_c, σ_w; the two corresponding Gaussian distributions are

e_vc_m ~ N(μ_c, σ_c²);

e_vw_m ~ N(μ_w, σ_w²);

and their Gaussian probability density functions are

f_c(x) = (1 / (√(2π) σ_c)) exp(−(x − μ_c)² / (2σ_c²));

f_w(x) = (1 / (√(2π) σ_w)) exp(−(x − μ_w)² / (2σ_w²)).

On one level of the parallel multi-level wide neural network, the validation-set error distribution and its probability density functions are shown in Figs. 3(a) and (b); the decision threshold of the level-m wide neural network is then:

T_m = min(e_vw_m) − ασ_w;

where α is a constant that provides a margin so that all misclassified samples y_vw_m are rejected at the current level.

(4) Test the threshold-determined parallel multi-level wide neural network with the test set.

As shown in Fig. 1(c), the specific testing process is:

First, obtain the test set: obtain the original test sample set x_test; through M rounds of data expansion, obtain the M corresponding groups of test sample sets x_test_1, …, x_test_m, …, x_test_M, which form the test set. The total number of test samples in x_test is N_test_samples.

The above data expansion applies the data transformation N_testD times to each sample of x_test, obtaining N_testD corresponding test sample sets as the test set x_test_m of the m-th level of the threshold-determined parallel M-level wide neural network.

This way of obtaining the test set stabilizes the results in the subsequent testing process.

Second, input the M groups of test sample sets x_test_1, …, x_test_m, …, x_test_M in parallel to the threshold-determined parallel M-level wide neural network and test the test set: each group of test sample sets is input to the corresponding threshold-determined level for testing, giving the N_testD test-sample-set outputs of each threshold-determined level; averaging the outputs over the N_testD test sample sets gives the test output of each threshold-determined level:

y_test_m = (1 / N_testD) Σ_{d = 1}^{N_testD} y_test_m_d.

Third, let the total number of classes of the test set be C and construct reference matrices R_j (1 ≤ j ≤ C); obtain the error vector between the test output y_test_m and the reference matrix R_j of the corresponding level:

J_test_mj = ||softmax(y_test_m) − R_j||_2, 1 ≤ j ≤ C;

where every element of the j-th row of R_j is 1 and all other elements are 0; each R_j has dimension C × N_test_samples; J_test_mj has dimension 1 × N_test_samples; and y_test_m has dimension C × N_test_samples.

Finally, judge the output of each threshold-determined level: when the minimum error of the current level is less than or equal to the current level's decision threshold, i.e., min(J_test_mj) ≤ T_m, the current level is judged to be the correct classification output level for that sample.

When the minimum error of the current level is greater than the current level's decision threshold, i.e., min(J_test_mj) > T_m, the current level is judged unable to classify that sample correctly, and the sample is passed to the next level's wide neural network for testing; this repeats until the sample finds its correct classification output level. The label corresponding to the test output of each threshold-determined level is then obtained as

y_test_ind_m = argmin_{1 ≤ j ≤ C} J_test_mj,

where y_test_ind_m has dimension 1 × N_test_samples.

If a test sample cannot be output by any of the first 25 levels, it is output directly at the final, 26th level.

Finally, the output L_test of the test set over the whole network is obtained; both the correctly classified and the misclassified samples can be counted, giving the sample classification accuracy of the parallel multi-level wide neural network of the invention.

Comparative Example

Using the same original training sample set, validation sets, and test set as in the above embodiment, random forest (RF), multilayer perceptron (MP), traditional radial basis function network (RBF), support vector machine (SVM), broad learning system (BLS), conditional deep learning model (CDL), deep belief network (DBL), the convolutional neural network LeNet-5, deep Boltzmann machine (DBM), and the deep forest (gcForest) were used as base classifiers for learning and classification; the resulting classification accuracies of the various learning methods are shown in Fig. 4.

As can be seen from Fig. 4, compared with the current mainstream learning models (random forest (RF), multilayer perceptron (MP), traditional radial basis function network (RBF), support vector machine (SVM), broad learning system (BLS), conditional deep learning model (CDL), deep belief network (DBL), the convolutional neural network LeNet-5, deep Boltzmann machine (DBM), and deep forest (gcForest)), the classification accuracy of the parallel multi-level wide neural network (PMWNN) of the invention is highly competitive; the final classification accuracy of the method is 99.10% (WRBF denotes the wide radial basis function network). Compared with the deep forest learning model, the neural network of the invention has multiple levels of base networks, each learning a different part of the data set, and can adaptively determine its structure according to the complexity of the problem and the data set, optimizing computing resources. At the same time, the network can be tested in parallel: the test data are given to all levels simultaneously, and the decision threshold of each level, obtained during training, determines which level finally outputs each test sample; the parallel testing process greatly reduces waiting time when the network is actually used.

In addition, the parallel multi-level wide neural network of the invention supports incremental learning: when new data arrive, a new wide radial basis function network can be added to learn the new characteristics without retraining the whole parallel multi-level network, meaning the proposed network can learn new knowledge without forgetting old knowledge. The new training data are input to the current M-level network; if any samples are misclassified, they are combined with the data-expanded original training set to build a new training data set, a new wide radial basis function network is trained, a new validation set is used for validation, and the decision threshold is computed, thereby establishing the (M+1)-th level. The new parallel multi-level wide neural network then consists of M+1 levels of wide radial basis function networks. At the same time, the designed network can be tested in parallel: all test samples are sent to the wide radial basis function networks of all levels, and the decision thresholds determine which network each test sample is assigned to. This process does not need to wait for the outputs of other levels, so testing is parallelized and accelerated.
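A sketch of this incremental step, with hypothetical helpers `misclassified` (returns the new samples rejected by all current levels), `train` (fits one new base classifier), and `threshold_of` (computes the new level's decision threshold from a fresh validation set):

```python
import numpy as np

def add_level(nets, thresholds, X_new, y_new, X_train_aug, y_train_aug,
              train, misclassified, threshold_of):
    """If the existing M levels misclassify some of the new data, fit one
    extra level on those samples plus the data-expanded original training
    set, instead of retraining the whole cascade."""
    X_hard, y_hard = misclassified(nets, thresholds, X_new, y_new)
    if len(X_hard):
        net = train(np.concatenate([X_hard, X_train_aug]),
                    np.concatenate([y_hard, y_train_aug]))
        nets.append(net)
        thresholds.append(threshold_of(net))  # from a fresh validation set
    return nets, thresholds
```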

Each level of the parallel multi-level wide neural network of the invention may be a wide radial basis function network, a BP neural network, a convolutional neural network, or another classifier, and the base classifier types of the different levels may differ.

Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. Thus, provided that these changes and modifications fall within the scope of the claims of the invention and their equivalent technologies, the invention is also intended to include them.

Claims (9)

1. A learning method based on a parallel multi-level width neural network, the parallel multi-level width neural network comprising multiple levels of width neural networks, wherein each level of width neural network comprises an input layer, a hidden layer and an output layer connected in sequence, the learning method being characterized by comprising the following steps:
step 1, obtaining an original training sample set, and constructing a parallel M-level width neural network Net_1, …, Net_m, …, Net_M (m = 1, 2, …, M), each level of width neural network being used as a base classifier of the corresponding level; performing M data transformations on the original training sample set to correspondingly obtain M verification sets x_v_1, …, x_v_m, …, x_v_M;
wherein the total number of samples of the original training sample set is N_tr and each training sample is an image sample to be learned; the specific process of constructing the parallel M-level width neural network Net_1, …, Net_m, …, Net_M is as follows:
designing a parallel multi-level width neural network by adopting width radial basis function networks as base classifiers: M width radial basis function networks are connected together to form the parallel multi-level width neural network Net_1, Net_2, …, Net_M; each base classifier serves as one level;
the specific process of constructing the width radial basis function network comprises the following steps:
constructing a radial basis function network comprising N_0k Gaussian basis functions of the form

φ_i(x) = exp(−||x − c_i||² / (2σ²)), 1 ≤ i ≤ N_0k,

wherein the centers of the radial basis function network are a subset randomly taken from the original training sample set and the standard deviation takes a constant value; acquiring multiple groups of local characteristic images of each image sample to be learned in the original training sample set by means of a sliding window so as to obtain multiple groups of local characteristic matrices, and taking the multiple groups of local characteristic matrices as input data of the Gaussian basis functions to obtain multiple radial basis function networks, namely the width radial basis function network;
step 2, adopting the original training sample set and the M verification sets x_v_1, …, x_v_m, …, x_v_M to respectively train and verify each level of the parallel M-level width neural network, obtaining the trained parallel M-level width neural network and the verification output y_v_m (m = 1, 2, …, M) corresponding to each level of width neural network; obtaining the label y_v_ind_m corresponding to each verification output y_v_m by a minimum-error method, and further obtaining the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m of the verification set of each level of the trained parallel M-level width neural network;
wherein adopting the original training sample set and the M verification sets x_v_1, …, x_v_m, …, x_v_M to respectively train and verify each level of the parallel M-level width neural network comprises the following sub-steps:
substep 2.1, using the original training sample set as input samples of the level-1 width neural network Net_1, and training Net_1 to obtain a trained level-1 width neural network;
substep 2.2, using a first verification set xv_1Verifying the trained 1 st-level width neural network to obtain an error classification sample set y of a verification set of the 1 st-level width neural networkvw_1
Substep 2.3, misclassification sample set y of first-level width neural networkvw_1Input samples A as a level 2 wide neural networkv_1(ii) a Then randomly extracting a training sample set A from the original training sample setv_2Let the total input sample set { A }v_1+Av_2The number of samples in (A) is equal to the number of samples in the original training sample set, and the total input sample set (A) isv_1+Av_2Taking the samples as input samples of a 2 nd-level width neural network;
substep 2.4, use the total input sample set { A }v_1+Av_2Training the 2 nd-level width neural network to obtain a trained 2 nd-level width neural network; using a second verification set xv_2Verifying the trained 2 nd-level width neural network to obtain an error classification sample set y of a verification set of the 2 nd-level width neural networkvw_2
And analogizing in sequence, respectively training the neural networks with the widths from the 3 rd level to the M th level to obtain the trained parallel neural networks with the widths of the M levels and the corresponding verification output y of the neural networks with the widths of each levelv_m
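The sub-steps above form a boosting-style cascade: each new level concentrates on what the previous level got wrong. The following Python sketch assumes fit and validate wrap the width-RBF training and verification routines; those names, and the random-refill details, are placeholders rather than the patent's exact procedure.

```python
import numpy as np

def train_cascade(train_set, val_sets, M, fit, validate, rng):
    """Level 1 trains on the full original training set; each later level
    trains on the previous level's misclassified verification samples,
    refilled with random originals so the input keeps the original size."""
    nets, inputs = [], list(train_set)
    n_tr = len(train_set)
    for m in range(M):
        net = fit(inputs)                          # train level m+1
        wrong, right = validate(net, val_sets[m])  # split verification set m+1
        nets.append((net, right, wrong))
        refill = [train_set[i] for i in rng.choice(n_tr, n_tr - len(wrong))]
        inputs = list(wrong) + refill              # input samples of next level
    return nets

# illustrative stand-ins so the sketch runs end to end
rng = np.random.default_rng(0)
fit = lambda samples: None                                           # dummy "training"
validate = lambda net, vs: (vs[:len(vs) // 10], vs[len(vs) // 10:])  # (wrong, right)
nets = train_cascade(list(range(1000)), [list(range(1000))] * 3, 3, fit, validate, rng)
```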
the training process of each level of width neural network comprises the following steps:
(a) using the image samples to be learned in the original training sample set as input data, setting the initial position of a sliding window at the upper-left corner of each image sample, choosing a sliding step of 1 pixel, and sliding the window from left to right and from top to bottom in sequence; the 3-dimensional image blocks of all image samples inside the window are stretched into a matrix: each local feature image forms a corresponding original matrix by pixels, the 2nd to last columns of each original matrix are appended in sequence below the 1st column to form a column vector, and the N column vectors arranged in sequence form the local feature matrix x_k, 1 ≤ k ≤ K, of a group of training image samples, each column of x_k representing one image sample to be learned;
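This window-to-column construction could be coded as follows, assuming an (N, H, W) stack of grayscale samples; local_feature_matrices and win are illustrative names.

```python
import numpy as np

def local_feature_matrices(images, win, stride=1):
    """Slide a win x win window over each image (left to right, top to
    bottom, 1-pixel steps by default) and stretch every window position
    into the column form of step (a).

    images  : (N, H, W) stack of N image samples
    returns : list of K matrices x_k of shape (win*win, N); column n of
              x_k is the column-major flattened window k of image n
    """
    N, H, W = images.shape
    mats = []
    for r in range(0, H - win + 1, stride):          # top to bottom
        for c in range(0, W - win + 1, stride):      # left to right
            block = images[:, r:r + win, c:c + win]  # (N, win, win)
            # append columns 2..last of each window below column 1
            col = block.transpose(0, 2, 1).reshape(N, -1)  # (N, win*win)
            mats.append(col.T)                             # (win*win, N)
    return mats  # K = len(mats) local feature matrices
```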
(b) inputting the local feature matrix x_k to the N_0k Gaussian basis functions
G_i(x) = exp(−||x − c_i||² / (2σ²)), i = 1, 2, …, N_0k;
the output is noted as:
Φ_k = [φ_k,1, φ_k,2, …, φ_k,N_0k],
wherein each φ_k,i = [G_i(x_k(:,1)), G_i(x_k(:,2)), …, G_i(x_k(:,N))]^T is a column vector containing N elements;
each position of the sliding window corresponds to one radial basis function network, so K radial basis function networks are obtained after the sliding is finished;
(c) for each radial basis function network, sorting and downsampling are introduced on the nonlinearly transformed output data Φ_k of the Gaussian basis functions:
each row of the output data Φ_k of the width radial basis function network is summed to obtain a row vector, each element of which is the sum of the pixels at one local specific position over the images to be learned; arranging these sums in descending order gives the descending-order vector a_k;
the index s_k of the descending-order vector a_k marks the original position corresponding to each local specific position of the images to be learned, giving the sorted output data Φ'_k = sort(Φ_k, s_k);
the sorted output data are downsampled: with a downsampling interval N_kS, the sampled output is Φ_kS = subsample(Φ'_k, N_kS), and the output of the Gaussian basis functions is Φ = [Φ_1S, Φ_2S, …, Φ_KS];
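Step (c) reduces to an argsort plus a strided slice, as in the sketch below; it assumes rows of Φ_k index local positions and columns index image samples, and reads the downsampling interval as keeping every N_kS-th row — both expository assumptions.

```python
import numpy as np

def sort_and_subsample(Phi_k, N_kS):
    """Order the rows of Phi_k by descending row sums a_k (index s_k keeps
    the original positions), then keep every N_kS-th sorted row."""
    row_sums = Phi_k.sum(axis=1)      # one sum per local position
    s_k = np.argsort(-row_sums)       # descending-order index s_k
    Phi_sorted = Phi_k[s_k]           # Phi'_k = sort(Phi_k, s_k)
    return Phi_sorted[::N_kS], s_k    # Phi_kS = subsample(Phi'_k, N_kS)
```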
(d) setting the desired output to D = [D_1, D_2, …, D_C]; the output of the Gaussian basis functions of the width radial basis function network is connected by a linear layer whose weights are W = [W_1, W_2, …, W_C],
wherein C is the total number of categories of the original samples;
the class output of the width radial basis function network is obtained as Y = [Y_1, Y_2, …, Y_C] = ΦW; specifically, the least-mean-square estimate Ŵ of the linear layer weights is calculated by minimizing the squared error:
Ŵ = arg min_W ||ΦW − D||²;
the least-mean-square estimate of the linear layer weights is obtained through the pseudo-inverse matrix of the Gaussian basis function output Φ of the width radial basis function network:
Ŵ = Φ⁺D,
wherein Φ⁺ is the pseudo-inverse matrix of the Gaussian basis function output Φ of the width radial basis function network;
finally, the calculated class output of the width radial basis function network is:
Ŷ = ΦŴ = ΦΦ⁺D;
the trained width radial basis function network is thereby obtained, completing the training process of each level of width neural network (a one-line sketch is given below);
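Because the hidden layer is fixed, the whole linear read-out is a single pseudo-inverse, as this minimal sketch shows (illustrative names):

```python
import numpy as np

def linear_layer(Phi, D):
    """Solve min_W ||Phi @ W - D||^2: W_hat = Phi^+ D, Y_hat = Phi Phi^+ D."""
    W_hat = np.linalg.pinv(Phi) @ D   # least-mean-square weight estimate
    Y_hat = Phi @ W_hat               # class output of the width RBF network
    return W_hat, Y_hat
```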
step 3, performing statistical calculation respectively on the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m of the verification set of each level of width neural network of the trained parallel M-level width neural network, correspondingly obtaining the decision threshold T_m of each trained level of width neural network; the decision threshold T_m of each level of width neural network serves as the decision basis of the width neural network of the corresponding level, giving the parallel M-level width neural network determined by the decision thresholds;
step 4, obtaining a test set, taking the test set as input data of the parallel M-level width neural network determined by the decision thresholds and inputting it in parallel to each level of width neural network determined by the decision thresholds for testing, so as to obtain the output of each level of width neural network determined by the decision thresholds; obtaining the error vector of each level of width neural network and judging the output of each level of width neural network determined by the decision thresholds, thereby obtaining the label y_test_ind_m corresponding to the test output of each level of width neural network determined by the decision thresholds.
2. The learning method based on the parallel multi-level width neural network of claim 1, wherein in step 1 the data transformation compresses or deforms the samples in the original sample set by an elastic transformation, or rotates, flips, enlarges or shrinks the samples in the original sample set by an affine transformation.
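One possible form of such transformations is sketched below with SciPy; the specific angle and zoom ranges are assumptions, since the claim leaves them open.

```python
import numpy as np
from scipy.ndimage import rotate, zoom

def augment(sample, rng):
    """Apply one randomly chosen claim-2-style transformation (rotation,
    flip, or zoom) to a 2-D image sample."""
    op = rng.integers(3)
    if op == 0:
        return rotate(sample, angle=rng.uniform(-15, 15), reshape=False)
    if op == 1:
        return np.fliplr(sample)                  # horizontal flip
    out = zoom(sample, rng.uniform(0.9, 1.1))     # enlarge or shrink
    h, w = sample.shape                           # crop/pad back to size
    out = out[:h, :w]
    return np.pad(out, [(0, h - out.shape[0]), (0, w - out.shape[1])])
```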
3. The learning method based on the parallel multi-level width neural network of claim 1, wherein in step 2, the minimum error method is:
firstly, setting the total number of classes of the original training sample set to C and constructing reference matrices R_j, 1 ≤ j ≤ C,
wherein the elements in the j-th row of R_j are 1 and the remaining elements are 0, and each reference matrix R_j has dimension C × N_tr;
secondly, according to the verification output y_v_m of each trained level of width neural network, the error vector between the verification output y_v_m and each reference matrix R_j of that level is obtained:
J_v_mj = ||softmax(y_v_m) − R_j||_2, 1 ≤ j ≤ C,
wherein ||·||_2 denotes the 2-norm of a matrix and softmax() is the normalized exponential function; J_v_mj has dimension 1 × N_tr and y_v_m has dimension C × N_tr;
finally, minimizing over the error vectors J_v_mj between the verification output y_v_m and the reference matrices R_j of that level yields the class labels corresponding to each trained level of width neural network:
y_v_ind_m = arg min_j J_v_mj,
wherein y_v_ind_m has dimension 1 × N_tr.
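The minimum error method of claim 3 can be written in a few lines, assuming SciPy's softmax and one reference matrix per class; min_error_labels is an illustrative name.

```python
import numpy as np
from scipy.special import softmax

def min_error_labels(y, C):
    """y : (C, N) network output. Compare softmax(y) column-wise against
    each reference matrix R_j (row j all ones, rest zeros) and return the
    arg-min-error class label of every sample."""
    p = softmax(y, axis=0)
    # J[j, n] = || softmax(y)_n - R_j(:, n) ||_2
    J = np.stack([np.linalg.norm(p - np.eye(C)[:, [j]], axis=0) for j in range(C)])
    return J.argmin(axis=0)  # y_ind, shape (N,)
```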
4. The learning method based on the parallel multi-level width neural network of claim 1, wherein in step 3, the statistical calculation comprises the following sub-steps:
sub-step 3.1, letting the correctly classified sample set and the misclassified sample set of the m-th level width neural network of the trained parallel M-level width neural network be y_vc_m and y_vw_m, with total sample numbers N_vc_m and N_vw_m respectively, where N_vc_m + N_vw_m = N_tr; the errors of the correctly classified and misclassified sample sets are then:
e_vc_m = ||softmax(y_vc_m) − t_vc_m||_2,
e_vw_m = ||softmax(y_vw_m) − t_vw_m||_2,
wherein t_vc_m is the true label corresponding to the correctly classified samples y_vc_m of the m-th level width neural network, and t_vw_m is the true label corresponding to the misclassified samples y_vw_m of the m-th level width neural network;
sub-step 3.2, for the correctly classified sample set y_vc_m, calculating the mean u_c and standard deviation σ_c of its errors, and for the misclassified sample set y_vw_m, the mean u_w and standard deviation σ_w; the Gaussian distributions corresponding to the correctly classified sample set y_vc_m and the misclassified sample set y_vw_m are then:
e_vc_m ~ N(u_c, σ_c²), e_vw_m ~ N(u_w, σ_w²),
and the corresponding Gaussian probability density functions are:
f_c(e) = (1/(√(2π)·σ_c)) · exp(−(e − u_c)²/(2σ_c²)),
f_w(e) = (1/(√(2π)·σ_w)) · exp(−(e − u_w)²/(2σ_w²));
sub-step 3.3, obtaining the decision threshold of the m-th level width neural network from the error e_vw_m and the standard deviation σ_w of the misclassified sample set y_vw_m:
T_m = min(e_vw_m) − ασ_w,
wherein α is a constant providing a margin so that all misclassified samples y_vw_m are rejected at the current level.
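Sub-steps 3.1-3.3 amount to the short computation below; the one-hot encoding of t_wrong and the population standard deviation are assumptions consistent with the claim.

```python
import numpy as np
from scipy.special import softmax

def decision_threshold(y_wrong, t_wrong, alpha):
    """T_m = min(e_vw_m) - alpha * sigma_w for one level.

    y_wrong : (C, N_w) outputs of the misclassified verification samples
    t_wrong : (C, N_w) one-hot true labels of those samples
    """
    e_w = np.linalg.norm(softmax(y_wrong, axis=0) - t_wrong, axis=0)  # e_vw_m
    sigma_w = e_w.std()                  # spread of the misclassification errors
    return e_w.min() - alpha * sigma_w   # margin so all wrong samples are rejected
```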
5. The learning method based on the parallel multi-level width neural network of claim 2, wherein in step 4 the test set is acquired as follows: obtaining an original test sample set x_test, and correspondingly obtaining M groups of test sample sets x_test_1, …, x_test_m, …, x_test_M, i.e., the test set, through M rounds of data augmentation.
6. The learning method based on the parallel multi-level width neural network of claim 5, wherein the data augmentation is: performing N_testD data transformations on each sample of the original test sample set x_test to obtain N_testD test sample sets, which serve as the test set x_test_m of the m-th level width neural network of the parallel M-level width neural network determined by the decision thresholds,
wherein the total number of test samples in the original test sample set x_test is N_test_samples.
7. The learning method based on the parallel multi-level width neural network of claim 1, wherein in step 4, obtaining the error vector of each level of width neural network comprises the following sub-steps:
sub-step 4.1, inputting the M groups of test sample sets x_test_1, x_test_2, …, x_test_M in parallel to the parallel M-level width neural network determined by the decision thresholds, correspondingly obtaining the N_testD outputs y_test_m_d, d = 1, 2, …, N_testD, of each level of width neural network determined by the decision thresholds;
sub-step 4.2, averaging the N_testD outputs y_test_m_d, d = 1, 2, …, N_testD, of each level of width neural network determined by the decision thresholds to obtain the test output of that level:
y_test_m = (1/N_testD) · Σ_{d=1}^{N_testD} y_test_m_d;
sub-step 4.3, setting the total number of classes of the test set to C and constructing reference matrices R_j, 1 ≤ j ≤ C; the error vector between the test output y_test_m and each reference matrix R_j of that level is obtained:
J_test_mj = ||softmax(y_test_m) − R_j||_2, 1 ≤ j ≤ C,
wherein the elements in the j-th row of R_j are 1 and the remaining elements are 0; each reference matrix R_j has dimension C × N_test_samples; J_test_mj has dimension 1 × N_test_samples and y_test_m has dimension C × N_test_samples.
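Sub-steps 4.1-4.3 can be sketched as follows, assuming outputs stacks the N_testD outputs y_test_m_d of level m; test_error_vectors is an illustrative name.

```python
import numpy as np
from scipy.special import softmax

def test_error_vectors(outputs, C):
    """outputs : (N_testD, C, N) stack of a level's outputs.
    Average over the N_testD outputs, then form J_test_mj against every
    reference matrix R_j."""
    y_test_m = np.mean(outputs, axis=0)   # (C, N) mean test output
    p = softmax(y_test_m, axis=0)
    J = np.stack([np.linalg.norm(p - np.eye(C)[:, [j]], axis=0) for j in range(C)])
    return J                              # row j holds J_test_mj per sample
```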
8. The learning method based on the parallel multi-level width neural network of claim 7, wherein the output of each level of width neural network determined by the decision thresholds is judged as follows:
when the minimum error of the current level width neural network is less than or equal to the decision threshold of the current level, that is
min(J_test_mj) ≤ T_m,
the current level is judged to be the level that correctly classifies the output;
when the minimum error of the current level width neural network is greater than the decision threshold of the current level, that is
min(J_test_mj) > T_m,
the current level is judged unable to classify the output correctly, and the output is passed to the next level width neural network for testing, and so on until the output reaches a level that classifies it correctly.
9. The learning method based on the parallel multi-level width neural network of claim 8, wherein in step 4, the label y_test_ind_m corresponding to the test output of each level of width neural network determined by the decision thresholds is obtained as:
y_test_ind_m = arg min_j J_test_mj,
wherein y_test_ind_m has dimension 1 × N_test_samples.
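Claims 8 and 9 together give a per-sample cascade decision of the following form; the fallback to the last level's label when every level rejects is an assumption, since the claims do not state that case.

```python
import numpy as np

def cascade_decide(J_per_level, thresholds):
    """J_per_level[m] : (C,) error vector of level m for one test sample.
    Accept the first level whose minimum error is within its threshold T_m
    and return that level's arg-min class label y_test_ind_m."""
    for J_m, T_m in zip(J_per_level, thresholds):
        if J_m.min() <= T_m:          # min(J_test_mj) <= T_m: classify here
            return int(J_m.argmin())
    return int(J_per_level[-1].argmin())  # assumed fallback: last level decides
```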
CN201910331708.8A 2019-04-24 2019-04-24 Learning method based on parallel multi-level width neural network Active CN110110845B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910331708.8A CN110110845B (en) 2019-04-24 2019-04-24 Learning method based on parallel multi-level width neural network

Publications (2)

Publication Number Publication Date
CN110110845A (en) 2019-08-09
CN110110845B (en) 2020-09-22

Family

ID=67486407

Country Status (1)

Country Link
CN (1) CN110110845B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111008647B (en) * 2019-11-06 2022-02-08 长安大学 Sample extraction and image classification method based on void convolution and residual linkage
CN111340184B (en) * 2020-02-12 2023-06-02 北京理工大学 Method and device for controlling surface shape of deformable mirror based on radial basis function
CN113449569B (en) * 2020-03-27 2023-04-25 威海北洋电气集团股份有限公司 Mechanical signal health state classification method and system based on distributed deep learning
CN112966761B (en) * 2021-03-16 2024-03-19 长安大学 Extensible self-adaptive width neural network learning method

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107784312A (en) * 2016-08-24 2018-03-09 腾讯征信有限公司 Machine learning model training method and device
CN108351985A (en) * 2015-06-30 2018-07-31 亚利桑那州立大学董事会 Method and apparatus for large-scale machine learning

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9811775B2 (en) * 2012-12-24 2017-11-07 Google Inc. Parallelizing neural networks during training
US10242313B2 (en) * 2014-07-18 2019-03-26 James LaRue Joint proximity association template for neural networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant