CN110738242B

CN110738242B - Bayes structure learning method and device of deep neural network

Info

Publication number: CN110738242B
Application number: CN201910912494.3A
Authority: CN
Inventors: 朱军; 邓志杰; 张钹
Original assignee: Tsinghua University
Current assignee: Tsinghua University
Priority date: 2019-09-25
Filing date: 2019-09-25
Publication date: 2021-08-10
Anticipated expiration: 2039-09-25
Also published as: CN110738242A

Abstract

Embodiments of the present invention provide a Bayesian structure learning method and device for a deep neural network. The method includes constructing a deep neural network including a plurality of learning units with the same internal structure, each learning unit contains several hidden layers, and between the hidden layers includes a plurality of computing units, the network structure is the relative weight of each computing unit, and the parameters are used. The network structure is modeled by changing the variational distribution; the training subset is extracted, and the network structure is sampled by the reparameterization process; . In the embodiment of the present invention, a deep neural network including a plurality of learning units with the same internal structure is constructed, and the relative weights of the calculation units among the hidden layers in the learning unit are trained through a training set, so as to obtain an optimized network structure, This brings a comprehensive improvement to the prediction performance and prediction uncertainty of deep neural networks.

Description

A Bayesian Structure Learning Method and Device for Deep Neural Networks

技术领域technical field

本发明涉及数据处理技术领域，尤其涉及一种深度神经网络的贝叶斯结构学习方法及装置。The present invention relates to the technical field of data processing, in particular to a Bayesian structure learning method and device of a deep neural network.

背景技术Background technique

贝叶斯深度学习旨在为灵活而有效的深度神经网络提供准确可靠的不确定性评估。传统上，贝叶斯网络在网络权重上引入不确定性，这往往可以防止模型容易过拟合的问题，也为模型带来了使用高效的预测不确定性。然而，在网络权重上引入不确定性存在问题。首先，人为设定的权重的先验分布往往不可靠，容易导致过剪枝等问题，使得模型拟合能力大大受限；其次，在权重上引入灵活的变分分布容易带来推理的困难，这是因为变分分布中复杂的依赖关系。最近，基于粒子的变分推理技术也被使用来优化贝叶斯网络，但是它们也存在粒子塌陷和退化等问题。Bayesian deep learning aims to provide accurate and reliable uncertainty assessments for flexible and efficient deep neural networks. Traditionally, Bayesian networks introduce uncertainty in the network weights, which often prevents the model from being prone to overfitting, and also brings uncertainty to the model using efficient predictions. However, there are problems with introducing uncertainty on network weights. First, the prior distribution of the artificially set weights is often unreliable, which can easily lead to problems such as over-pruning, which greatly limits the model fitting ability; This is because of the complex dependencies in the variational distribution. Recently, particle-based variational inference techniques have also been used to optimize Bayesian networks, but they also suffer from problems such as particle collapse and degradation.

因此，现在的贝叶斯网络在实际应用中无法提供准确可靠的预测性能。Therefore, current Bayesian networks cannot provide accurate and reliable prediction performance in practical applications.

发明内容SUMMARY OF THE INVENTION

由于现有方法存在上述问题，本发明实施例提供一种深度神经网络的贝叶斯结构学习方法及装置。Due to the above problems in the existing methods, embodiments of the present invention provide a Bayesian structure learning method and device for a deep neural network.

第一方面，本发明实施例提供了一种深度神经网络的贝叶斯结构学习方法，包括：In a first aspect, an embodiment of the present invention provides a Bayesian structure learning method for a deep neural network, including:

构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元，所述学习单元包括预设层数的隐层，且每两个隐层间包括复数种计算单元，定义网络结构为各计算单元的相对权重，并采用参数化的变分分布来建模所述网络结构；Constructing a deep neural network, the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, and every two hidden layers includes a plurality of computing units, defining the network structure is the relative weight of each computing unit, and adopts a parameterized variational distribution to model the network structure;

从预设的训练集中随机提取训练子集，并采用重参数化过程采样所述学习单元的网络结构；Randomly extract a training subset from a preset training set, and use a reparameterization process to sample the network structure of the learning unit;

根据所述采样的网络结构，计算所述深度神经网络的证据下界ELBO；Calculate the evidence lower bound ELBO of the deep neural network according to the sampled network structure;

若所述证据下界的变化超过预设的损失阈值，则根据预设的优化方法优化所述网络结构和网络权重，并再次从所述训练集中随机提取训练子集以继续对所述学习单元的网络结构进行训练；If the change of the lower bound of the evidence exceeds a preset loss threshold, the network structure and network weight are optimized according to a preset optimization method, and a training subset is randomly extracted from the training set again to continue the optimization of the learning unit. The network structure is trained;

若所述证据下界的变化未超过预设的损失阈值，则判定训练结束。If the change of the lower bound of the evidence does not exceed the preset loss threshold, it is determined that the training ends.

进一步地，所述采用重参数化过程采样所述学习单元的网络结构；具体包括：Further, the re-parameterization process is used to sample the network structure of the learning unit; specifically, it includes:

根据预设的适应性系数，采用重参数化过程采样所述学习单元的网络结构。According to the preset adaptive coefficient, the network structure of the learning unit is sampled using a reparameterization process.

进一步地，所述根据所述采样的网络结构，得到所述深度神经网络的证据下界；具体包括：Further, according to the sampled network structure, the evidence lower bound of the deep neural network is obtained; specifically, it includes:

根据所述采样的网络结构，计算所述训练子集中经过标注的各样本对应的输出结果，并计算所述深度神经网络的误差，以及所述网络变分分布与预设的先验分布中的对数密度差值；According to the sampled network structure, the output results corresponding to the marked samples in the training subset are calculated, and the error of the deep neural network and the difference between the network variational distribution and the preset prior distribution are calculated. log density difference;

将所述深度神经网络的误差和所述对数密度差值进行加权求和，以得到所述深度神经网络的证据下界。A weighted summation of the error of the deep neural network and the log density difference is performed to obtain an evidence lower bound for the deep neural network.

进一步地，所述构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元；具体包括：Further, in the construction of a deep neural network, the deep neural network includes at least one learning unit with the same internal structure; specifically, it includes:

构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元，在预先确定的学习单元间插入预设的下采样层和/或上采样层；其中，所述下采样层包括：批正则化层、线性整流层、卷积层和池化层，所述上采样层由反卷积层构建。Constructing a deep neural network, the deep neural network includes at least one learning unit with the same internal structure, and inserting a preset down-sampling layer and/or an up-sampling layer between predetermined learning units; wherein, the down-sampling layer includes : batch regularization layer, linear rectification layer, convolutional layer and pooling layer, the upsampling layer is constructed by deconvolution layer.

进一步地，所述深度神经网络的输入层为用于预处理的卷积层，输出层为线性全连接层。Further, the input layer of the deep neural network is a convolutional layer used for preprocessing, and the output layer is a linear fully connected layer.

进一步地，所述变分分布为concrete分布。Further, the variational distribution is a concrete distribution.

第二方面，本发明实施例提供了一种深度神经网络的贝叶斯结构学习装置，包括：In a second aspect, an embodiment of the present invention provides a Bayesian structure learning device for a deep neural network, including:

构建单元，用于构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元，所述学习单元包括预设层数的隐层，且每两个隐层间包括复数种计算单元，定义网络结构为各计算单元的相对权重，并采用参数化的变分分布来建模所述网络结构；a construction unit for constructing a deep neural network, the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, and every two hidden layers includes a plurality of computations unit, the network structure is defined as the relative weight of each computing unit, and the parameterized variational distribution is used to model the network structure;

训练单元，用于从预设的训练集中随机提取训练子集，并采用重参数化过程采样所述学习单元的网络结构；A training unit for randomly extracting a training subset from a preset training set, and sampling the network structure of the learning unit by a reparameterization process;

误差计算单元，用于根据所述采样的网络结构，得到所述深度神经网络的证据下界ELBO；an error calculation unit, configured to obtain the evidence lower bound ELBO of the deep neural network according to the sampled network structure;

判断单元，用于若所述证据下界的变化超过预设的损失阈值，则根据预设的优化方法优化所述网络结构和网络权重，并再次从所述训练集中随机提取训练子集以继续对所述学习单元的网络结构进行训练；若所述证据下界的变化未超过预设的损失阈值，则判定训练结束。The judgment unit is configured to optimize the network structure and network weight according to a preset optimization method if the change of the lower bound of the evidence exceeds a preset loss threshold, and randomly extract a training subset from the training set again to continue the analysis. The network structure of the learning unit is trained; if the change of the lower bound of the evidence does not exceed a preset loss threshold, it is determined that the training is over.

进一步地，所述训练单元具体用于，从预设的训练集中随机提取训练子集，并根据预设的适应性系数，采用重参数化过程采样所述学习单元的网络结构。Further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and use a reparameterization process to sample the network structure of the learning unit according to a preset adaptability coefficient.

第三方面，本发明实施例还提供了一种电子设备，包括：In a third aspect, an embodiment of the present invention also provides an electronic device, including:

处理器、存储器、通信接口和通信总线；其中，processors, memories, communication interfaces and communication buses; wherein,

所述处理器、存储器、通信接口通过所述通信总线完成相互间的通信；The processor, the memory, and the communication interface communicate with each other through the communication bus;

所述通信接口用于该电子设备的通信设备之间的信息传输；The communication interface is used for information transmission between communication devices of the electronic device;

所述存储器存储有可被所述处理器执行的计算机程序指令，所述处理器调用所述程序指令能够执行如下方法：The memory stores computer program instructions executable by the processor, and the processor invokes the program instructions to perform the following methods:

若所述证据下界的变化超过预设的损失阈值，则根据预设的优化方法优化所述网络结构的分布和网络权重，并再次从所述训练集中随机提取训练子集以继续对所述学习单元的网络结构进行训练；If the change of the lower bound of the evidence exceeds a preset loss threshold, optimize the distribution and network weight of the network structure according to a preset optimization method, and randomly extract a training subset from the training set again to continue the learning The network structure of the unit is trained;

第四方面，本发明实施例还提供了一种非暂态计算机可读存储介质，其上存储有计算机程序，该计算机程序被处理器执行时实现如下方法：In a fourth aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, and when the computer program is executed by a processor, the following method is implemented:

本发明实施例提供的深度神经网络的贝叶斯结构学习方法及装置，通过构建包括多个相同内部结构的学习单元的深度神经网络，并通过训练集对所述学习单元中各隐层间各计算单元的相对权重进行训练，以得到所述学习单元的优化的网络结构，从而为深度神经网络的预测性能和预测不确定性带来了全面提升。The Bayesian structure learning method and device for a deep neural network provided by the embodiments of the present invention, by constructing a deep neural network including a plurality of learning units with the same internal structure, and using a training set for each hidden layer in the learning unit. The relative weights of the computing units are trained to obtain an optimized network structure of the learning unit, thereby bringing about an overall improvement in the prediction performance and prediction uncertainty of the deep neural network.

附图说明Description of drawings

为了更清楚地说明本发明实施例或现有技术中的技术方案，下面将对实施例或现有技术描述中所需要使用的附图作一简单地介绍，显而易见地，下面描述中的附图是本发明的一些实施例，对于本领域普通技术人员来讲，在不付出创造性劳动的前提下，还可以根据这些附图获得其他的附图。In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following briefly introduces the accompanying drawings that need to be used in the description of the embodiments or the prior art. Obviously, the accompanying drawings in the following description These are some embodiments of the present invention. For those of ordinary skill in the art, other drawings can also be obtained according to these drawings without creative efforts.

图1为本发明实施例的深度神经网络的贝叶斯结构学习方法流程图；1 is a flowchart of a Bayesian structure learning method of a deep neural network according to an embodiment of the present invention;

图2为本发明实施例的深度神经网络的贝叶斯结构学习装置结构示意图；2 is a schematic structural diagram of a Bayesian structure learning device for a deep neural network according to an embodiment of the present invention;

图3示例了一种电子设备的实体结构示意图。FIG. 3 illustrates a schematic diagram of the physical structure of an electronic device.

具体实施方式Detailed ways

为使本发明实施例的目的、技术方案和优点更加清楚，下面将结合本发明实施例中的附图，对本发明实施例中的技术方案进行清楚、完整地描述，显然，所描述的实施例是本发明一部分实施例，而不是全部的实施例。基于本发明中的实施例，本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例，都属于本发明保护的范围。In order to make the purposes, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention. Obviously, the described embodiments These are some embodiments of the present invention, but not all embodiments. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative efforts shall fall within the protection scope of the present invention.

图1为本发明实施例的深度神经网络的贝叶斯结构学习方法流程图，如图1所示，所述方法包括：FIG. 1 is a flowchart of a Bayesian structure learning method for a deep neural network according to an embodiment of the present invention. As shown in FIG. 1 , the method includes:

步骤S01、构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元，所述学习单元包括预设层数的隐层，且每两个隐层间包括复数种计算单元，定义网络结构为各计算单元的相对权重，并采用参数化的变分分布来建模所述网络结构。Step S01, constructing a deep neural network, the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, and between every two hidden layers includes a plurality of computing units, The network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is used to model the network structure.

根据实际的需要构建深度神经网络，所述深度神经网络包括：输入层、输出层，以及位于输入层和输出层之间的至少一个重复堆砌的学习单元，且各学习单元依次串联，具有相同的网络结构。A deep neural network is constructed according to actual needs. The deep neural network includes: an input layer, an output layer, and at least one repeatedly stacked learning unit between the input layer and the output layer, and each learning unit is connected in series in sequence, with the same network structure.

所述学习单元包括预设层数K个隐层，且任意两个隐层间包括多个计算单元，例如，全连接单元、卷积单元和池化单元等。所述学习单元的网络结构α＝{α^(i，j)|1≤i＜j≤K}，其中，α^(i，j)包括第i层隐层与第j层隐层之间与各计算单元对应的相对权重。所述第i层隐层的特征表示到第j个隐层的特征表示的计算为：根据所述α^(i，j)，两者之间各计算单元输出的加权和。The learning unit includes a preset number of K hidden layers, and any two hidden layers include multiple computing units, such as a fully connected unit, a convolution unit, and a pooling unit. The network structure of the learning unit α={α ^{(i, j)} |1≤i<j≤K}, where α ^{(i, j)} includes the distance between the hidden layer of the i-th layer and the hidden layer of the j-th layer and each Calculate the relative weights corresponding to the units. The calculation from the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is: according to the α ^{(i, j)} , the weighted sum of the outputs of the calculation units between the two.

所述与各计算单元对应的相对权重，服从分类分布，为了方便基于优化方法进行训练，可设定为可学习的、连续化、参数化的变分分布，以此构建所述网络结构。The relative weight corresponding to each computing unit obeys the classification distribution. In order to facilitate training based on the optimization method, it can be set as a learnable, continuous and parameterized variational distribution to construct the network structure.

所述变分分布可根据实际的需要来进行设定，本发明实施例仅给出了其中的一种举例说明：concrete分布，但为了简便起见，在下面的实施例中都concrete分布为例进行举例说明。The variational distribution can be set according to actual needs, and the embodiment of the present invention only provides an example of the concrete distribution: concrete distribution, but for the sake of simplicity, the concrete distribution is used as an example in the following embodiments. for example.

步骤S02、从预设的训练集中随机提取训练子集，并采用重参数化过程采样所述学习单元的网络结构。Step S02 , randomly extract a training subset from a preset training set, and use a reparameterization process to sample the network structure of the learning unit.

步骤S03、根据所述采样的网络结构，得到所述深度神经网络的证据下界ELBO。Step S03: Obtain the evidence lower bound ELBO of the deep neural network according to the sampled network structure.

步骤S04、若所述证据下界的变化超过预设的损失阈值，则根据预设的优化方法优化所述网络结构的分布和网络权重，并再次从所述训练集中随机提取训练子集以继续对所述学习单元的网络结构进行训练。Step S04, if the change of the lower bound of the evidence exceeds the preset loss threshold, optimize the distribution of the network structure and the network weight according to the preset optimization method, and randomly extract training subsets from the training set again to continue the analysis. The network structure of the learning unit is trained.

步骤S05、若所述证据下界的变化未超过预设的损失阈值，则判定训练结束。Step S05, if the change of the lower bound of the evidence does not exceed the preset loss threshold, it is determined that the training is over.

预先设定包含N个样本的训练集，每个样本(x_n，y_n)包括输入数据x_n以及对应的预先标注的输出结果y_n。A training set containing N samples is preset, and each sample (x _n , y _n ) includes input data x _n and a corresponding pre-labeled output result _yn .

根据所述训练集，对所述深度神经网络进行训练，并在结束训练后，最终得到所述学习单元的优化的网络结构，所述优化的网络结构包括每两个隐层间各计算单元对应的优化的相对权重。具体的训练过程如下：According to the training set, the deep neural network is trained, and after the training is completed, the optimized network structure of the learning unit is finally obtained, and the optimized network structure includes the corresponding calculation units between each two hidden layers. The relative weight of the optimization. The specific training process is as follows:

1.随机选取训练集中一个大小为k的子集

使用重参数化技术从变分分布中随机采样出所述学习单元的网络结构α，所述深度神经网络中的网络权重和偏置参数表示为w，所述深度神经网络根据所述采样的网络结构得到的预测概率

f是一个以x_n为输入以w为参数、以α为学习单元的网络结构的深度神经网络对应的函数。1. Randomly select a subset of size k in the training set

Using the reparameterization technique to randomly sample the network structure α of the learning unit from the variational distribution, the network weight and bias parameters in the deep neural network are represented as w, and the deep neural network is based on the sampled network. Predicted probabilities obtained by the structure

f is a function corresponding to a deep neural network with x _n as the input, w as the parameter, and α as the network structure of the learning unit.

2.根据所述采样的网络结构，得到所述深度神经网络当前的证据下界(EvidenceLower Bound，ELBO)。2. Obtain the current evidence lower bound (Evidence Lower Bound, ELBO) of the deep neural network according to the sampled network structure.

3.将当前的证据下界与上次训练完成后得到的证据下界进行比较，以得到所述证据下界的变化。若本次训练后得到证据下界的变化超过了预设的损失阈值，则判定训练未完成，进一步根据预设的优化方法，对当前的网络结构和网络权重进行优化。然后，重新从训练集中选取子集以继续执行1-3的训练过程。直到所述证据下界的变化小于等于所述损失阈值，则判定训练结束。3. Compare the current evidence lower bound with the evidence lower bound obtained after the last training is completed to obtain the change of the evidence lower bound. If the change of the lower bound of the evidence obtained after this training exceeds the preset loss threshold, it is determined that the training is not completed, and the current network structure and network weight are further optimized according to the preset optimization method. Then, re-select a subset from the training set to continue the training process of 1-3. Until the change of the lower bound of the evidence is less than or equal to the loss threshold, it is determined that the training is over.

由上述的训练过程可知，本发明实施例的训练过程仅是对学习单元内部各隐层之间的连接关系进行训练，因此对于K层隐层结构，所述学习单元的网络结构α中共有K(K-1)/2个变量，每一个变量表示一对隐层之间的计算单元的相对权重。It can be seen from the above training process that the training process of the embodiment of the present invention is only to train the connection relationship between the hidden layers inside the learning unit. Therefore, for the K-layer hidden layer structure, the network structure α of the learning unit has a total of K (K-1)/2 variables, each of which represents the relative weight of computing units between a pair of hidden layers.

本发明实施例通过构建包括多个相同内部结构的学习单元的深度神经网络，并通过训练集对所述学习单元中各隐层间各计算单元的相对权重进行训练，以得到所述学习单元的优化的网络结构，从而为深度神经网络的预测性能和预测不确定性带来了全面提升。In the embodiment of the present invention, a deep neural network including a plurality of learning units with the same internal structure is constructed, and the relative weights of the calculation units between the hidden layers in the learning unit are trained through a training set, so as to obtain the learning unit's relative weight. The optimized network structure brings a comprehensive improvement to the prediction performance and prediction uncertainty of the deep neural network.

基于上述实施例，进一步地，所述步骤S02中采用重参数化过程采样所述学习单元的网络结构；具体包括：Based on the above embodiment, further, in step S02, a reparameterization process is used to sample the network structure of the learning unit; specifically, it includes:

为了防止在训练过程中出现训练难以收敛的问题，因此，在构建学习单元时，对重参数化过程添加预先设置的适应性系数β＝{β^(i，j)}来调节采样的方差。从而得到的具体的重参数化过程为

其中∈＝{∈^(i，j)}一组服从各个维度独立的Gumbel变量，τ是正实数表示温度。In order to prevent the problem that the training is difficult to converge during the training process, when constructing the learning unit, a preset adaptive coefficient β={β ^{(i, j)} } is added to the reparameterization process to adjust the sampling variance. The resulting specific reparameterization process is

where ∈={∈ ^(i,j) } a set of Gumbel variables obeying the independence of each dimension, and τ is a positive real number representing temperature.

具体过程如下：The specific process is as follows:

a.从Gumbel分布里随机采样出一组独立变量∈；a. Randomly sample a set of independent variables ∈ from the Gumbel distribution;

b.将(a)中得到的变量与适应性系数β相乘得到缩放后的变量；b. Multiply the variable obtained in (a) by the adaptive coefficient β to obtain the scaled variable;

c.将(b)中得到的变量与concrete分布的参数θ相加，然后除以温度系数τ；c. Add the variable obtained in (b) to the parameter θ of the concrete distribution and divide by the temperature coefficient τ;

d.将(c)中得到的结果输入softmax变换，得到采样的网络结构α＝g(θ，β，∈)。d. Input the result obtained in (c) into the softmax transform to obtain the sampled network structure α=g(θ, β, ∈).

本发明实施例通过为所述重参数化过程调协适应性系数，从而防止了训练过程中出现训练难以收敛的问题，从而为深度神经网络的预测性能和预测不确定性带来了全面提升。The embodiment of the present invention avoids the problem of difficulty in training convergence during the training process by adjusting the adaptive coefficient for the reparameterization process, thereby bringing an overall improvement to the prediction performance and prediction uncertainty of the deep neural network.

基于上述实施例，进一步地，所述步骤S03具体包括：Based on the above embodiment, further, the step S03 specifically includes:

步骤S031、根据所述采样的网络结构，计算所述训练子集中经过标注的各样本对应的输出结果，并计算所述深度神经网络的误差，以及所述网络变分分布与预设的先验分布中的对数密度差值。Step S031, according to the sampled network structure, calculate the output results corresponding to the marked samples in the training subset, and calculate the error of the deep neural network, as well as the variational distribution of the network and a preset prior. The log density difference in the distribution.

步骤S032、将所述深度神经网络的误差和所述对数密度差值进行加权求和，以得到所述深度神经网络的证据下界。Step S032 , weighting and summing the error of the deep neural network and the logarithmic density difference to obtain the evidence lower bound of the deep neural network.

在每次根据提取出的训练子集，采用重参数化过程采样得出所述学习单元的采样的网络结构后，根据采样的网络结构得到训练子集中各样本经过当前的深度神经网络得到的预测结果，并计算预测结果的误差，所述误差的计算方法有很多，在此仅以交叉熵为例进行举例说明：交叉熵

同时，基于当前采样后的网络结构α，计算它在变分分布与预设的先验分布中的对数密度的差值，两个分布都是适应性的concrete分布，对数密度为：After each sampled network structure of the learning unit is obtained according to the extracted training subset by using the reparameterization process, the prediction obtained by each sample in the training subset through the current deep neural network is obtained according to the sampled network structure. results, and calculate the error of the prediction result. There are many calculation methods for the error. Here we only take cross-entropy as an example to illustrate: cross-entropy

At the same time, based on the current sampled network structure α, the difference between its logarithmic density in the variational distribution and the preset prior distribution is calculated. Both distributions are adaptive concrete distributions, and the logarithmic density is:

因此，易得差值记为KL；Therefore, the easy-to-obtain difference is recorded as KL;

将所述交叉熵和对数密度的差值进行加权，得到深度神经网络整体的损失为

其中

表示整个训练集对应的KL距离分配到该子集所对应的权重，该损失即为变分推理的近似误差，利用预设优化方法，例如，梯度下降的方法，来解优化问题min_θ，wELBO，以此来进一步优化所述网络结构。Weighting the difference between the cross entropy and the logarithmic density, the overall loss of the deep neural network is obtained as

in

Indicates that the KL distance corresponding to the entire training set is assigned to the weight corresponding to the subset, and the loss is the approximate error of variational inference. Use a preset optimization method, such as gradient descent, to solve the optimization problem min _{θ, w} ELBO to further optimize the network structure.

本发明实施例中在每一次训练中，计算交叉熵和对数密度的差值，相加权后得到所述深度神经网络的证据下界，并根据证据下界的变化对网络结构进行优化，从而为深度神经网络的预测性能和预测不确定性带来了全面提升。In the embodiment of the present invention, in each training, the difference between the cross-entropy and the logarithmic density is calculated, and weighted to obtain the evidence lower bound of the deep neural network, and the network structure is optimized according to the change of the evidence lower bound, so that the depth The prediction performance and prediction uncertainty of the neural network brought an overall improvement.

基于上述实施例，进一步地，所述步骤S01具体包括：Based on the above embodiment, further, the step S01 specifically includes:

本发明实施例所构建的深度神经网络中包括多个串联的学习单元，并且在部分预先确定的学习单元之间根据实际应用的需求插入预设的下采样层和/或上采样层。其中，所述图片分类中，仅需要插入下采样层，而对于语义分割，则均需要插入。其中，所述下采样层一般由批正则化-线性整流-卷积-池化组成，而上采样一般由反卷积构建。The deep neural network constructed in the embodiment of the present invention includes a plurality of learning units connected in series, and a preset down-sampling layer and/or an up-sampling layer is inserted between some predetermined learning units according to actual application requirements. Among them, in the picture classification, only the downsampling layer needs to be inserted, and for semantic segmentation, both need to be inserted. Among them, the downsampling layer is generally composed of batch regularization-linear rectification-convolution-pooling, and the upsampling is generally constructed by deconvolution.

使用的卷积计算单元采用批正则化-线性整流-卷积-批正则化的操作顺序；The convolution computing unit used adopts the operation order of batch regularization-linear rectification-convolution-batch regularization;

将同时发生的、相互独立的、同类别的计算单元整合为一个组操作，提高计算效率，主要是将多个卷积整理成一个组卷积，提高计算效率。Integrate simultaneous, independent, and same-type computing units into a group operation to improve computing efficiency, mainly by organizing multiple convolutions into a group convolution to improve computing efficiency.

本发明实施例通过在预先确定的学习单元之间插入下采样层和/或上采样层，在输入层增加一个预处理的卷积层，在网络最后插入一个线性全连接层，从而增加了深度神经网络的性能。The embodiment of the present invention increases the depth by inserting a downsampling layer and/or an upsampling layer between predetermined learning units, adding a preprocessing convolution layer to the input layer, and inserting a linear fully connected layer at the end of the network. performance of neural networks.

上述实施例构建的深度神经网络可以应用于各个场景，例如：The deep neural network constructed in the above embodiment can be applied to various scenarios, for example:

图片分类：Image classification:

1)使用网络预测结果与标注的交叉熵作为模型训练中的误差进行训练；1) Use the network prediction result and the marked cross entropy as the error in the model training for training;

2)测试时，对于一组测试样例，随机从学习得到的网络结构的分布里采样出100个网络结构，基于这些结构，模型给出100组预测概率；2) During the test, for a set of test samples, 100 network structures are randomly sampled from the distribution of the learned network structure, and based on these structures, the model gives 100 sets of predicted probabilities;

3)将这些预测概率取均值，得到预测概率；3) Take the mean value of these predicted probabilities to obtain the predicted probability;

4)取3)中预测概率最大的类别作为图片的分类。4) Take the category with the largest predicted probability in 3) as the classification of the picture.

语义分割：Semantic segmentation:

1)使用所有像素的预测结果与标注的交叉熵之和作为模型训练中的误差进行训练；1) Use the sum of the prediction results of all pixels and the marked cross-entropy as the error in the model training for training;

2)测试时，对于一组测试样例，随机从学习得到的网络结构的分布里采样出100个网络结构，基于这些结构，模型给出100组像素级的预测概率；2) During the test, for a set of test samples, 100 network structures are randomly sampled from the distribution of the learned network structure. Based on these structures, the model gives 100 groups of pixel-level prediction probabilities;

3)将这些预测概率取均值，得到像素级预测概率；3) Average these predicted probabilities to obtain pixel-level predicted probabilities;

4)取3)中得到的结果在每一个像素上最大的类别作为该像素的分割结果。4) Take the largest category of the result obtained in 3) on each pixel as the segmentation result of the pixel.

检测对抗样本：Detecting adversarial examples:

1)对于一组对抗样本，随机从学习得到的网络结构分布里采样出30个网络结构；1) For a set of adversarial samples, randomly sample 30 network structures from the learned network structure distribution;

2)基于这些结构，模型给出30组预测概率；2) Based on these structures, the model gives 30 groups of predicted probabilities;

3)将这些预测概率取均值，得到模型最终的预测概率；3) Take the average of these predicted probabilities to obtain the final predicted probability of the model;

4)计算3)中得到的预测概率的熵，作为检测的指标；4) Calculate the entropy of the predicted probability obtained in 3) as an index for detection;

5)如果4)中得到的熵明显大于正常样本的预测对应的熵，则说明检测出对抗样本。5) If the entropy obtained in 4) is significantly larger than the predicted corresponding entropy of the normal sample, it means that the adversarial sample is detected.

检测领域迁移：Detection field migration:

1)对于一组与训练数据采样自不同领域的样本，随机从学习得到的结构分布里采样出100个网络结构；1) For a group of samples sampled from different fields with the training data, randomly sample 100 network structures from the learned structure distribution;

2)基于这些结构，模型给出100组预测概率；2) Based on these structures, the model gives 100 groups of predicted probabilities;

5)如果4)中得到的熵明显大于正常样本的预测对应的熵，则说明检测出领域迁移。5) If the entropy obtained in 4) is significantly larger than the predicted corresponding entropy of the normal sample, it means that domain migration is detected.

使用如下述实施例的深度贝叶斯结构网络在自然图片分类数据集CIFAR-10和CIFAR-100上进行测试，训练得到的模型分别达到了4.98％和22.50％的分类错误率，大幅优于最先进的深度神经网络ResNet和DenseNet。将本发明应用到CamVid语义分割任务中，在与强力的FC-DenseNet方法使用相同的训练条件的情况下，可以取得比FC-DenseNet高2.3的平均IoU，同时达到和世界领先水平的方法相媲美的水平。此外，本发明训练得到的模型可以通过预测的不确定性检测对抗样本和领域迁移，在测试中该发明展示出显著优于传统贝叶斯神经网络的预测不确定性。综上所述，本发明通过利用神经网络架构搜索中提出的网络结构学习空间，为深度网络的网络结构引入不确定性，进行贝叶斯建模，使用随机变分推理方法进行学习，缓解拥有权重不确定性的贝叶斯深度网络的难设计先验、难推理后验等问题，为网络模型的预测性能和预测不确定性带来了全面提升，显著提高了贝叶斯神经网络在图片分类、语义分割、检测对抗样本和领域迁移等任务上的表现。Using the deep Bayesian structure network in the following example to test on the natural image classification datasets CIFAR-10 and CIFAR-100, the trained models achieved classification error rates of 4.98% and 22.50%, respectively, which are significantly better than the most advanced models. Advanced deep neural networks ResNet and DenseNet. Applying the present invention to the CamVid semantic segmentation task, under the same training conditions as the powerful FC-DenseNet method, an average IoU that is 2.3 higher than that of FC-DenseNet can be achieved, and at the same time, it is comparable to the world-leading method. s level. In addition, the model trained by the present invention can detect adversarial samples and domain transfer through the uncertainty of prediction, and in the test, the invention shows that the prediction uncertainty is significantly better than that of the traditional Bayesian neural network. To sum up, the present invention introduces uncertainty into the network structure of the deep network by using the network structure learning space proposed in the neural network architecture search, performs Bayesian modeling, and uses the random variational inference method for learning, and alleviates the The problems of difficult design priors and difficult inference posteriors of Bayesian deep networks with weight uncertainty have brought a comprehensive improvement to the prediction performance and prediction uncertainty of the network model, and significantly improved the performance of Bayesian neural networks in pictures. Performance on tasks such as classification, semantic segmentation, detecting adversarial examples, and domain transfer.

图2为本发明实施例的深度神经网络的贝叶斯结构学习装置结构示意图，如图2所示，所述装置包括：构建单元10、训练单元11、误差计算单元12和判断单元13，其中，FIG. 2 is a schematic structural diagram of a Bayesian structure learning device for a deep neural network according to an embodiment of the present invention. As shown in FIG. 2 , the device includes: a construction unit 10 , a training unit 11 , an error calculation unit 12 and a judgment unit 13 , wherein ,

所述构建单元10用于构建深度神经网络，所述深度神经网络包括至少一个具有相同内部结构的学习单元，所述学习单元包括预设层数的隐层，且每两个隐层间包括复数种计算单元，定义网络结构为各计算单元的相对权重，并采用参数化的变分分布来建模所述网络结构；所述训练单元11用于从预设的训练集中随机提取训练子集，并采用重参数化过程采样所述学习单元的网络结构；所述误差计算单元12用于根据所述采样的网络结构，得到所述深度神经网络的证据下界ELBO；所述判断单元13用于若所述证据下界的变化超过预设的损失阈值，则根据预设的优化方法优化所述网络结构和网络权重，并再次从所述训练集中随机提取训练子集以继续对所述学习单元的网络结构进行训练；若所述证据下界的变化未超过预设的损失阈值，则判定训练结束。The construction unit 10 is used to construct a deep neural network, the deep neural network includes at least one learning unit with the same internal structure, the learning unit includes a preset number of hidden layers, and every two hidden layers include complex numbers. A computing unit, the network structure is defined as the relative weight of each computing unit, and a parameterized variational distribution is used to model the network structure; the training unit 11 is used to randomly extract a training subset from a preset training set, And adopt the reparameterization process to sample the network structure of the learning unit; the error calculation unit 12 is used to obtain the evidence lower bound ELBO of the deep neural network according to the sampled network structure; the judgment unit 13 is used for if If the change of the lower bound of the evidence exceeds the preset loss threshold, the network structure and network weight are optimized according to the preset optimization method, and a training subset is randomly extracted from the training set again to continue the network analysis of the learning unit. The structure is trained; if the change of the lower bound of the evidence does not exceed the preset loss threshold, it is determined that the training is over.

构建单元10根据实际的需要构建深度神经网络，所述深度神经网络包括：输入层、输出层，以及位于输入层和输出层之间的至少一个重复堆砌的学习单元，且各学习单元依次串联，具有相同的网络结构。The construction unit 10 constructs a deep neural network according to actual needs, and the deep neural network includes: an input layer, an output layer, and at least one repeatedly stacked learning unit between the input layer and the output layer, and each learning unit is connected in series in sequence, have the same network structure.

所述学习单元包括预设层数K个隐层，且任意两个隐层间包括多个计算单元，例如，全连接单元、卷积单元和池化单元等。所述学习单元的网络结构α＝{α^(i,j)|1≤i<j≤K}，其中，α^(i,j)包括第i层隐层与第j层隐层之间与各计算单元对应的相对权重。所述第i层隐层的特征表示到第j个隐层的特征表示的计算为：根据所述α^(i,j)，两者之间各计算单元输出的加权和。The learning unit includes a preset number of K hidden layers, and any two hidden layers include multiple computing units, such as a fully connected unit, a convolution unit, and a pooling unit. The network structure of the learning unit α={α ^(i,j) |1≤i<j≤K}, where α ^(i,j) includes the distance between the hidden layer of the i-th layer and the hidden layer of the j-th layer and each Calculate the relative weights corresponding to the units. The calculation from the feature representation of the i-th hidden layer to the feature representation of the j-th hidden layer is: according to the α ^(i,j) , the weighted sum of the outputs of the calculation units between the two.

所述与各计算单元对应的相对权重，服从分类分布，为了方便基于优化方法进行训练，可设定为可学习的、连续化、参数化的变分分布。The relative weights corresponding to each computing unit are subject to classification distribution, and can be set as a learnable, continuous, and parameterized variational distribution in order to facilitate training based on the optimization method.

训练单元11预先设定包含N个样本的训练集，每个样本(x_n，y_n)包括输入数据x_n以及对应的预先标注的输出结果y_n。The training unit 11 presets a training set including N samples, and each sample (x _n , y _n ) includes input data x _n and a corresponding pre-labeled output result _yn .

根据所述训练集，对由所述构建单元10构建的所述深度神经网络进行训练，并在结束训练后，最终得到所述学习单元的优化的网络结构，所述优化的网络结构包括每两个隐层间各计算单元对应的优化的相对权重。具体的训练过程如下：According to the training set, the deep neural network constructed by the construction unit 10 is trained, and after the training is completed, the optimized network structure of the learning unit is finally obtained, and the optimized network structure includes every two The relative weight of optimization corresponding to each computing unit between two hidden layers. The specific training process is as follows:

1.由训练单元11随机选取训练集中一个大小为k的子集

f是一个以x_n为输入以w为参数、以α为学习单元的网络结构的深度神经网络对应的函数。1. The training unit 11 randomly selects a subset of size k in the training set

2.误差计算单元12根据所述采样的网络结构，得到所述深度神经网络当前的证据下界(Evidence Lower Bound，ELBO)。2. The error calculation unit 12 obtains the current evidence lower bound (Evidence Lower Bound, ELBO) of the deep neural network according to the sampled network structure.

3.判断单元13将当前的证据下界与上次训练完成后得到的证据下界进行比较，以得到所述证据下界的变化。若本次训练后得到证据下界的变化超过了预设的损失阈值，则判定训练未完成，进一步根据预设的优化方法，对所述采样的网络结构和网络参数进行优化。然后，重新从训练集中选取子集以继续执行1-3的训练过程。直到所述证据下界的变化小于等于所述损失阈值，则判定训练结束。3. The judging unit 13 compares the current lower bound of evidence with the lower bound of evidence obtained after the last training, so as to obtain the change of the lower bound of evidence. If the change of the lower bound of the evidence obtained after this training exceeds the preset loss threshold, it is determined that the training is not completed, and the sampled network structure and network parameters are further optimized according to the preset optimization method. Then, re-select a subset from the training set to continue the training process of 1-3. Until the change of the lower bound of the evidence is less than or equal to the loss threshold, it is determined that the training is over.

本发明实施例提供的装置用于执行上述方法，其功能具体参考上述方法实施例，其具体方法流程在此处不再赘述。The apparatus provided in the embodiment of the present invention is used to execute the foregoing method, and its function refers to the foregoing method embodiment for details, and the specific method flow is not repeated here.

基于上述实施例，进一步地，所述训练单元具体用于，从预设的训练集中随机提取训练子集，并根据预设的适应性系数，采用重参数化过程采样所述学习单元的网络结构。Based on the above embodiment, further, the training unit is specifically configured to randomly extract a training subset from a preset training set, and use a reparameterization process to sample the network structure of the learning unit according to a preset adaptability coefficient .

为了防止在训练过程中出现训练难以收敛的问题，因此，在构建学习单元时，所述构建单元对重参数化过程添加预先设置的适应性系数β＝{β^(i，j)}来调节采样的方差。从而所述训练单元得到的具体的重参数化过程为

其中∈＝{∈^(i，j)}一组服从各个维度独立的Gumbel变量，τ是正实数表示温度。In order to prevent the problem that the training is difficult to converge during the training process, when constructing the learning unit, the constructing unit adds a preset adaptive coefficient β={β ^{(i, j)} } to the reparameterization process to adjust the sampling Variance. Therefore, the specific reparameterization process obtained by the training unit is:

具体过程如下：The specific process is as follows:

图3示例了一种电子设备的实体结构示意图，如图3所示，该电子设备可以包括：处理器(processor)301、通信接口(Communications Interface)303、存储器(memory)302和通信总线304，其中，处理器301，通信接口303，存储器302通过通信总线304完成相互间的通信。处理器301可以调用存储器302中的逻辑指令，以执行上述方法。FIG. 3 illustrates a schematic diagram of the physical structure of an electronic device. As shown in FIG. 3 , the electronic device may include: a processor (processor) 301, a communication interface (Communications Interface) 303, a memory (memory) 302, and a communication bus 304, The processor 301 , the communication interface 303 , and the memory 302 communicate with each other through the communication bus 304 . The processor 301 may invoke logic instructions in the memory 302 to perform the above-described method.

进一步地，本发明实施例公开一种计算机程序产品，所述计算机程序产品包括存储在非暂态计算机可读存储介质上的计算机程序，所述计算机程序包括程序指令，当所述程序指令被计算机执行时，计算机能够执行上述各方法实施例所提供的方法。Further, an embodiment of the present invention discloses a computer program product, the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer During execution, the computer can execute the methods provided by the foregoing method embodiments.

进一步地，本发明实施例提供一种非暂态计算机可读存储介质，所述非暂态计算机可读存储介质存储计算机指令，所述计算机指令使所述计算机执行上述各方法实施例所提供的方法。Further, an embodiment of the present invention provides a non-transitory computer-readable storage medium, where the non-transitory computer-readable storage medium stores computer instructions, and the computer instructions cause the computer to execute the methods provided by the foregoing method embodiments. method.

本领域普通技术人员可以理解：此外，上述的存储器302中的逻辑指令可以通过软件功能单元的形式实现并作为独立的产品销售或使用时，可以存储在一个计算机可读取存储介质中。基于这样的理解，本发明的技术方案本质上或者说对现有技术做出贡献的部分或者该技术方案的部分可以以软件产品的形式体现出来，该计算机软件产品存储在一个存储介质中，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行本发明各个实施例所述方法的全部或部分步骤。而前述的存储介质包括：U盘、移动硬盘、只读存储器(ROM，Read-Only Memory)、随机存取存储器(RAM，Random AccessMemory)、磁碟或者光盘等各种可以存储程序代码的介质。Those skilled in the art can understand that: in addition, the above-mentioned logic instructions in the memory 302 can be implemented in the form of software functional units and can be stored in a computer-readable storage medium when sold or used as an independent product. Based on this understanding, the technical solution of the present invention can be embodied in the form of a software product in essence, or the part that contributes to the prior art or the part of the technical solution. The computer software product is stored in a storage medium, including Several instructions are used to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes: U disk, removable hard disk, Read-Only Memory (ROM, Read-Only Memory), Random Access Memory (RAM, Random Access Memory), magnetic disk or optical disk and other media that can store program codes.

以上所描述的装置实施例仅仅是示意性的，其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的，作为单元显示的部件可以是或者也可以不是物理单元，即可以位于一个地方，或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。本领域普通技术人员在不付出创造性的劳动的情况下，即可以理解并实施。The device embodiments described above are only illustrative, wherein the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in One place, or it can be distributed over multiple network elements. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this embodiment. Those of ordinary skill in the art can understand and implement it without creative effort.

通过以上的实施方式的描述，本领域的技术人员可以清楚地了解到各实施方式可借助软件加必需的通用硬件平台的方式来实现，当然也可以通过硬件。基于这样的理解，上述技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来，该计算机软件产品可以存储在计算机可读存储介质中，如ROM/RAM、磁碟、光盘等，包括若干指令用以使得一台计算机设备(可以是个人计算机，服务器，或者网络设备等)执行各个实施例或者实施例的某些部分所述的方法。From the description of the above embodiments, those skilled in the art can clearly understand that each embodiment can be implemented by means of software plus a necessary general hardware platform, and certainly can also be implemented by hardware. Based on this understanding, the above-mentioned technical solutions can be embodied in the form of software products in essence or the parts that make contributions to the prior art, and the computer software products can be stored in computer-readable storage media, such as ROM/RAM, magnetic A disc, an optical disc, etc., includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to perform the methods described in various embodiments or some parts of the embodiments.

最后应说明的是：以上实施例仅用以说明本发明的技术方案，而非对其限制；尽管参照前述实施例对本发明进行了详细的说明，本领域的普通技术人员应当理解：其依然可以对前述各实施例所记载的技术方案进行修改，或者对其中部分技术特征进行等同替换；而这些修改或者替换，并不使相应技术方案的本质脱离本发明各实施例技术方案的精神和范围。Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present invention, but not to limit them; although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that it can still be The technical solutions described in the foregoing embodiments are modified, or some technical features thereof are equivalently replaced; and these modifications or replacements do not make the essence of the corresponding technical solutions deviate from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims

1. A Bayesian structure learning method for a deep neural network is characterized by comprising the following steps:

constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, a network structure is defined as the relative weight of each computing unit, and the network structure is modeled by adopting parameterized variational distribution;

randomly extracting a training subset from a preset training set, and sampling a network structure of the learning unit by adopting a re-parameterization process;

according to the sampled network structure, calculating an evidence lower bound ELBO of the deep neural network;

if the change of the evidence lower bound exceeds a preset loss threshold value, optimizing the network structure and the network weight according to a preset optimization method, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit;

if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;

when the deep neural network is used for picture classification, then:

training by using the cross entropy of the network prediction result and the label as an error in model training;

during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of prediction probabilities by a model;

averaging the prediction probabilities to obtain the prediction probability;

taking the category with the maximum prediction probability as the classification of the pictures;

when the deep neural network is used for semantic segmentation, then:

training by using the sum of the prediction results of all pixels and the cross entropy of the labels as an error in model training;

during testing, for a group of test samples, randomly sampling 100 network structures from the network structure distribution obtained by learning, and based on the structures, giving 100 groups of pixel-level prediction probabilities by a model;

averaging the prediction probabilities to obtain pixel-level prediction probabilities;

taking the class with the maximum pixel-level prediction probability on each pixel as the segmentation result of the pixel;

when the deep neural network is used to detect challenge samples, then:

for a group of confrontation samples, randomly sampling 30 network structures from the network structure distribution obtained by learning;

based on these structures, the model gives 30 sets of prediction probabilities;

averaging the prediction probabilities to obtain the final prediction probability of the model;

calculating the entropy of the final prediction probability of the obtained model as a detection index;

if the obtained entropy is obviously larger than the predicted corresponding entropy of the normal sample, the confrontation sample is detected;

when the deep neural network is used to detect a domain migration, then:

for a group of samples which are sampled from different fields from the training data, randomly sampling 100 network structures from the structure distribution obtained by learning;

based on these structures, the model gives 100 sets of prediction probabilities;

calculating the entropy of the obtained prediction probability as a detection index;

if the entropy obtained in the step (b) is obviously larger than the entropy corresponding to the prediction of the normal sample, the field migration is detected;

the network structure for sampling the learning unit by adopting the reparameterization process specifically comprises the following steps:

sampling the network structure of the learning unit by adopting a re-parameterization process according to a preset adaptability coefficient;

when a learning unit is constructed, a preset adaptability coefficient beta is added to a heavy parameterization process^(i，j)Adjusting the variance of the samples; the specific reparameterization process thus obtained is

Wherein ∈ { [ epsilon ]^(i，j)A set of Gumbel variables subject to dimensional independence, where τ is a positive real number representing temperature;

the specific process is as follows:

a. randomly sampling a group of independent variables belonging to the same group from Gumbel distribution;

b. multiplying the variable obtained in (a) by an adaptive coefficient beta to obtain a scaled variable;

c. adding the variable obtained in (b) to a parameter θ of the constret distribution, and then dividing by the temperature coefficient τ;

d. inputting the result obtained in the step (c) into softmax transformation, and obtaining a sampled network structure alpha which is g (theta, beta and epsilon);

the calculating of the evidence lower bound ELBO of the deep neural network according to the sampled network structure specifically includes:

according to the sampled network structure, calculating output results corresponding to the labeled samples in the training subset, and calculating the error of the deep neural network and the difference value of the logarithmic density in the network variation distribution and the preset prior distribution;

and carrying out weighted summation on the error of the deep neural network and the logarithmic density difference value to obtain the evidence lower bound of the deep neural network.

2. The Bayesian structure learning method for the deep neural network as recited in claim 1, wherein the deep neural network is constructed and comprises at least one learning unit with the same internal structure; the method specifically comprises the following steps:

constructing a deep neural network, wherein the deep neural network comprises at least one learning unit with the same internal structure, and a preset down-sampling layer and/or an up-sampling layer are/is inserted between predetermined learning units; wherein the downsampling layer comprises: the device comprises a batch regularization layer, a linear rectification layer, a convolution layer and a pooling layer, wherein the up-sampling layer is constructed by a deconvolution layer.

3. The Bayesian structure learning method for the deep neural network as recited in claim 2, wherein an input layer of the deep neural network is a convolutional layer for preprocessing, and an output layer is a linear fully-connected layer.

4. The Bayesian structure learning method for a deep neural network as recited in claim 3, wherein the variation distribution is a concoret distribution.

5. A Bayesian structure learning device for a deep neural network, comprising:

the deep neural network comprises at least one learning unit with the same internal structure, the learning unit comprises a preset number of hidden layers, a plurality of computing units are arranged between every two hidden layers, the network structure is defined as the relative weight of each computing unit, and the parameterized variational distribution is adopted to model the network structure;

the training unit is used for randomly extracting a training subset from a preset training set and sampling the network structure of the learning unit by adopting a re-parameterization process;

the error calculation unit is used for obtaining an evidence lower bound ELBO of the deep neural network according to the sampled network structure;

the judging unit is used for optimizing the distribution and the network weight of the network structure according to a preset optimization method if the change of the evidence lower bound exceeds a preset loss threshold value, and randomly extracting a training subset from the training set again to continue training the network structure of the learning unit; if the change of the evidence lower bound does not exceed a preset loss threshold, judging that the training is finished;

when the deep neural network is used for picture classification, then:

averaging the prediction probabilities to obtain the prediction probability;

when the deep neural network is used for semantic segmentation, then:

when the deep neural network is used to detect challenge samples, then:

based on these structures, the model gives 30 sets of prediction probabilities;

when the deep neural network is used to detect a domain migration, then:

if the obtained entropy is obviously larger than the entropy corresponding to the prediction of the normal sample, the field migration is detected;

the specific process is as follows:

e. randomly sampling a group of independent variables belonging to the same group from Gumbel distribution;

f. multiplying the variable obtained in (a) by an adaptive coefficient beta to obtain a scaled variable;

g. adding the variable obtained in (b) to a parameter θ of the constret distribution, and then dividing by the temperature coefficient τ;

h. inputting the result obtained in the step (c) into softmax transformation, and obtaining a sampled network structure alpha which is g (theta, beta and epsilon);

6. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the bayesian structure learning method of the deep neural network of any one of claims 1 to 4.

7. A non-transitory computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the bayesian structure learning method of a deep neural network according to any one of claims 1 to 4.