CN107958216A

CN107958216A - Semi-supervised multimodal deep learning classification method

Info

Publication number: CN107958216A
Application number: CN201711202305.0A
Authority: CN
Inventors: 李照奎; 黄林; 刘翠微; 王天宁; 张德园; 赵亮; 石祥滨; 王岩; 吴昊
Original assignee: Shenyang Aerospace University
Current assignee: Shenyang Aerospace University
Priority date: 2017-11-27
Filing date: 2017-11-27
Publication date: 2018-04-24

Abstract

The present invention proposes a deep learning classification method that takes into account both the rich multimodal information of the samples and the differences in each modality's contribution to classification, and that uses a semi-supervised approach to address the shortage of samples. The data of the different modalities are fed separately into deep neural networks, and a semi-supervised method is used to exploit a large number of unlabeled samples. An autoencoder-based deep neural network is used for feature learning, with all labeled and unlabeled data fed into the autoencoder deep network. Similar networks are designed for the different modalities, their respective initialization parameters are obtained through autoencoder reconstruction, and the latent attribute classes of the labeled samples are obtained through clustering. For unlabeled data, deep features are first computed by the multi-objective deep network, similar labeled samples are then found based on the cluster labels, and finally the labels of the unlabeled samples are predicted from the label information of the labeled samples.

Description

Semi-supervised multimodal deep learning classification method

Technical field

The present invention relates to a semi-supervised multimodal deep learning classification method which, while using deep learning for classification, takes into account the rich multimodal information of the samples and the differences in each modality's contribution to classification, and uses a semi-supervised approach to address the shortage of samples.

Background

Hyperspectral remote sensing images carry information in several modalities, such as texture, spatial correlation, and spectrum. Fusing features from all these aspects allows hyperspectral remote sensing to detect more information about ground objects, which greatly improves our ability to understand the objective world.

At present, most researchers use deep models directly to fuse part of the modal information, and few construct deep network architectures that account for the differences in the modalities' contributions to classification. In addition, image classification performance depends mainly on a large number of training samples, whereas in practice sufficient labeled samples are often lacking. There has been relatively little research on how to preserve the generalization ability of deep learning when only a few labeled samples are available.

In the prior art there are many deep models, such as SAE (stacked autoencoder), DBN (deep belief network), and DCNN (deep convolutional neural network). In 2015, Li et al. proposed an SAE-based deep architecture to combine low-level features. Chen et al. proposed using another deep model, the DBN, for classification; the DBN model can extract more robust features and achieves better classification accuracy than other models.

As for the DCNN, its local connections allow it to extract spatial information effectively, and weight sharing reduces the number of parameters to train. Zhao et al. first compressed the spectral features with a local discriminant embedding method and learned spatial features with a CNN, then fed the processed spectral and spatial features into a multi-class classifier. Chen et al. proposed an end-to-end DCNN framework for feature learning and hyperspectral image classification. A. Romero et al. proposed analyzing remote sensing images with an unsupervised convolutional neural network, building a DCNN model through unsupervised greedy layer-wise pre-training. To address the overfitting caused by few training samples and high dimensionality, Chen et al. used L2 regularization and dropout. Li et al. proposed classifying pixel-pair features with a CNN to compensate for the shortage of data.

Summary of the invention

In summary, existing research mostly uses deep models directly to fuse part of the modal information, and few researchers construct deep network architectures that account for the differences in the modalities' contributions to classification. A drawback of the prior art is that many deep models, such as SAE and DBN, are fully connected between layers and therefore require a large number of parameters to be trained, whereas in practice a large supply of labeled training samples is often unavailable. In addition, during training SAE and DBN represent spatial information only as vectors, so they cannot extract spatial information as effectively.

As for the DCNN model, its local connections do allow it to extract spatial information effectively, and weight sharing reduces the number of parameters. However, it needs many hidden layers to learn more complex data features and may still have a very large number of parameters to learn, which often leads to overfitting, especially when training samples are insufficient.

Overall, the prior art does not fully consider how to preserve the generalization ability of deep learning with few labeled samples, nor how to combine the multimodal information of the samples with a small number of labeled samples to construct a deep learning architecture.

The present invention is proposed to overcome these defects of the prior art. It combines the multimodal information of a sample, namely its spatial-domain features, texture features, and spectral features, and uses the differences in the modalities' contributions to classification to construct the deep network architecture. It also addresses the problem of having few labeled samples by using a semi-supervised method that predicts a large number of unlabeled samples from a small number of labeled ones.

The cost is the classification accuracy.

The technical solution adopted by the present invention is a semi-supervised multimodal deep learning classification method comprising the following steps:

(1) Divide the hyperspectral image into a spectral modality, a spatial texture modality, and a spatial correlation modality, and design a corresponding deep convolutional neural network for each modality;

(2) Design a fusion classifier F that fuses the three modalities according to the differences in their classification contributions. Each modality has its own latent attributes, so the data of each modality can be clustered to obtain its intrinsic attribute classes. Each modality has a corresponding latent attribute classifier: the spectral latent attribute classifier S, the texture latent attribute classifier T, and the correlation latent attribute classifier R;

(3) For all labeled samples, use a clustering method to obtain their latent attribute classes;

(4) After clustering, each sample has two labels, from which the correspondence between the two kinds of classes can be obtained;

(5) Perform cluster analysis on the unlabeled samples: first compute their deep features with the multi-objective deep network, then find similar labeled samples based on the cluster labels, and finally predict the labels of the unlabeled samples from the label information of the labeled samples.

Step (5) requires a confidence criterion for the predicted labels; if a sample falls outside the criterion, no prediction is made for it. Once a certain batch of unlabeled samples has been assigned predicted labels, the classification contribution of each modality is adjusted adaptively.
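Steps (4) and (5) and the confidence criterion can be illustrated with a minimal Python sketch. It is not the patented procedure: it assumes, for simplicity, that "similar labeled samples" are simply the labeled samples sharing the same cluster label c', and that confidence is the fraction of those samples agreeing on the majority class. The function names and the `min_confidence` threshold are hypothetical.

```python
from collections import Counter


def cluster_to_class_table(class_labels, cluster_labels):
    """Step (4), simplified: for each cluster label c', count the class
    labels c observed among the labeled samples in that cluster."""
    table = {}
    for c, c_prime in zip(class_labels, cluster_labels):
        table.setdefault(c_prime, Counter())[c] += 1
    return table


def predict_label(cluster_label, table, min_confidence=0.8):
    """Step (5), simplified: predict the majority class of the cluster,
    or return None when the (assumed) confidence criterion is not met."""
    counts = table.get(cluster_label)
    if not counts:
        return None  # no labeled sample shares this cluster label
    label, votes = counts.most_common(1)[0]
    confidence = votes / sum(counts.values())
    return label if confidence >= min_confidence else None
```

For instance, if a cluster contains three labeled samples of class 1 and one of class 0, the confidence is 0.75, so no label is predicted under the default threshold of 0.8.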

Beneficial effects of the present invention:

The present invention relates to a semi-supervised multimodal deep learning classification method. It takes comprehensive account of the samples' multimodal information and of the differences in each modality's contribution to classification, and combines these with a semi-supervised method to address the shortage of labeled samples, thereby improving the generalization ability of the deep model.

This is a semi-supervised multimodal deep learning classification method that, while classifying with deep learning, takes into account the rich multimodal information of the samples and the differences in each modality's classification contribution, and uses a semi-supervised approach to address the shortage of samples. The method considers the fusion of the multimodal information carried by each pixel, namely its texture, spatial, and spectral features, and also considers the differences in each modality's contribution to the deep network architecture. It combines these contribution differences with the prediction of unlabeled samples to improve the generalization ability of the deep model and, ultimately, the accuracy of hyperspectral image classification.

Description of the drawings

The present invention is further described below with reference to the accompanying drawings.

The above objects, advantages, and features of the present invention will become more apparent from the following detailed description of the embodiments taken in conjunction with the accompanying drawings, in which:

Fig. 1 is a schematic diagram of the overall process structure of the present invention;

Fig. 2 shows the specific implementation steps of the semi-supervised multimodal deep learning framework of an embodiment of the present invention;

Fig. 3 shows the specific implementation steps of the semi-supervised multimodal deep learning framework of an embodiment of the present invention;

Fig. 4 is a detailed flowchart of the autoencoder-based initialization of the network structure parameters in an embodiment of the present invention;

Fig. 5 is a detailed flowchart of the multi-objective collaborative prediction method for unlabeled samples in an embodiment of the present invention.

Detailed description

Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.

Fig. 1 shows the system structure of an embodiment of the present invention. 102 is the input of the hyperspectral image samples and label information; 103 is the preprocessing of the received samples; 104 is the grouping of the preprocessed samples; at 105 the grouped samples are sent to the respective semi-supervised multimodal deep learning frameworks for learning; 106 combines the results of the deep learning frameworks in a decision step; and 107 is the final classification result.

Figs. 2 and 3 show the specific implementation steps of the semi-supervised multimodal deep learning framework of an embodiment of the present invention. In this scheme, the data of the different modalities of the hyperspectral image are fed into separate deep neural networks, and a semi-supervised method is used to exploit a large number of unlabeled samples. 201 to 203 denote the division of the hyperspectral image into a spectral modality, a spatial texture modality, and a spatial correlation modality. A corresponding deep convolutional neural network, 204 to 206, is designed for each modality. These networks use the AlexNet architecture, which has five convolutional layers and two fully connected layers. The differences in the modalities' classification contributions are shown at 304, and 306 is the fusion classifier F designed to fuse the three modalities. Each modality has its own latent attributes (for example, samples with similar texture can be grouped into one class), so the data of each modality can be clustered to obtain its intrinsic attribute classes. Each modality has a corresponding latent attribute classifier: 305 is the spectral latent attribute classifier S, 307 is the texture latent attribute classifier T, and 308 is the correlation latent attribute classifier R.

The deep neural network learns abstract features with two classification targets, the sample class and the latent attribute class. The multi-target learning strategy (shown at 207) is detailed in Fig. 3, which takes the spectral modality 301 as an example. The fusion classifier F (306) takes the sample class as its target, while the spectral latent attribute classifier S (305) takes the spectral latent attribute as its target. By combining the two classification targets, latent attribute classes grounded in the sample classes can be learned, as shown at 309.
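The text does not spell out how the fusion classifier F combines the three modalities. As one possible reading, the sketch below assumes F is a weighted average of per-modality class probabilities, with each weight standing in for that modality's classification contribution. The function names, the dictionary layout, and the normalization of the weights are all assumptions made for illustration.

```python
import numpy as np


def softmax(logits):
    """Row-wise softmax with the usual max-subtraction for stability."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def fuse(modality_logits, contributions):
    """Hypothetical fusion classifier F.

    modality_logits: dict mapping a modality name to an array of shape
        (n_samples, n_classes) holding that modality network's class scores.
    contributions: dict mapping the same names to non-negative weights
        (the assumed 'classification contribution' of each modality).
    Returns fused class probabilities of shape (n_samples, n_classes).
    """
    total = sum(contributions[m] for m in modality_logits)
    return sum(
        (contributions[m] / total) * softmax(modality_logits[m])
        for m in modality_logits
    )
```

With this form, raising the weight of one modality moves the fused prediction toward that modality's own prediction, which is the behaviour a contribution-based fusion would be expected to have.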

Fig. 4 shows the specific implementation steps of the autoencoder-based initialization of the network structure parameters in an embodiment of the present invention. Hyperspectral remote sensing images usually come with only a small amount of high-quality label information, so it is difficult to learn suitable initial network parameters with a supervised method. To learn reasonably good initial parameters, an autoencoder-based deep neural network is used for feature learning. At 402, all labeled and unlabeled data are fed into the autoencoder deep network for learning; this network architecture comprises five convolutional layers and four encoding and autoencoding layers. At 403, similar networks are designed for the different modalities, and at 404 their respective initialization parameters are obtained through autoencoder reconstruction. For data v of a given modality of the hyperspectral image, the network produces a corresponding output. At 405, the network is trained by minimizing the mean-square reconstruction error, where M and N are the numbers of labeled and unlabeled samples, respectively, and r denotes the modality. At 406, the reconstruction network is optimized with the standard backpropagation algorithm based on stochastic gradient descent. At 407, once the network of each modality has converged, the convolutional layer parameters of the autoencoder-based network are used as the convolutional layer parameters of the corresponding modality in the semi-supervised deep network.
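The formula for the reconstruction error is not reproduced in the text. A common form that is consistent with the definitions of M, N, and r given here is sketched below; it is an assumption, not the patent's own equation. The symbols $v_i^r$ (the i-th sample of modality r), $\hat{v}_i^r$ (its reconstruction), and $\theta^r$ (the network parameters for modality r) are hypothetical notation.

```latex
L_{\mathrm{rec}}^{r}(\theta^{r})
  = \frac{1}{M+N} \sum_{i=1}^{M+N}
    \left\lVert v_{i}^{r} - \hat{v}_{i}^{r} \right\rVert_{2}^{2}
```

The sum runs over all M labeled and N unlabeled samples because the reconstruction target needs no labels, which is what lets the unlabeled data contribute to the initialization.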

Fig. 5 shows the specific implementation steps of the multi-objective collaborative prediction method for unlabeled samples in an embodiment of the present invention. For the data of a given hyperspectral modality, at 503 a clustering method is applied to all labeled samples to obtain their latent attribute classes. After clustering, each sample has two labels: the original class label c and the latent attribute class label c' obtained by clustering. At 504, the correspondence between these two kinds of classes is obtained for the labeled samples. For unlabeled samples, the purpose of clustering is this: given the clustering result of an unlabeled sample (its c label is unknown, but its c' label is obtained), the c' label is used to look up similar c labels among the labeled samples. To balance the class label c and the latent attribute class label c' better, at 505 the deep model is trained with both as targets. To take the multimodal data into account, the classification target for the class label c uses multimodal fusion, with the classification contribution differences described above as the fusion strategy. In the objective function for multi-objective deep model learning, r denotes the modality and w^r denotes the classification contribution of modality r.

After network training with the multi-objective loss function is complete, at 506 the deep features of the unlabeled data are computed by the multi-objective deep network; at 507 similar labeled samples are found based on the cluster labels; and at 508 the labels of the unlabeled samples are predicted from the label information of the labeled samples. A confidence criterion must be set for the predicted labels; if a sample falls outside the criterion, no prediction is made. At 509, once a certain batch of unlabeled samples has been assigned predicted labels, the classification contribution of each modality can be adjusted adaptively. At 510, when network training has stabilized and no new samples meet the prediction criterion, training can stop. After each deep model has produced its classification result, the final classification result is determined by decision combination.
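The objective function itself is not reproduced in the text. Purely as an illustration of how a loss with two targets and per-modality contributions $w^r$ might be written, one possible form is given below. It is an assumption, not the patent's equation: $p^r$ and $q^r$ (the class and latent attribute predictions of modality r), the loss terms $\ell_c$ and $\ell_{c'}$, and the balancing factor $\lambda$ are hypothetical.

```latex
L = \ell_{c}\!\Big(c,\ \sum_{r} w^{r}\, p^{r}(x^{r})\Big)
  + \lambda \sum_{r} \ell_{c'}\!\big(c'^{\,r},\ q^{r}(x^{r})\big)
```

The first term scores the fused prediction against the class label c, and the second scores each modality's latent attribute prediction against its own cluster label c'.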

Although the present invention has been described above in conjunction with its preferred embodiments, those skilled in the art will understand that various modifications, substitutions, and changes may be made to it without departing from its spirit and scope. Accordingly, the invention should be defined not by the above embodiments but by the appended claims and their equivalents.

Claims

1. A semi-supervised multimodal deep learning classification method, characterized in that it comprises the following steps:

(1) Divide the hyperspectral image into a spectral modality, a spatial texture modality, and a spatial correlation modality, and design a corresponding deep convolutional neural network for each modality;

(2) Design a fusion classifier F that fuses the three modalities according to the differences in their classification contributions. Each modality has its own latent attributes, so the data of each modality can be clustered to obtain its intrinsic attribute classes. Each modality has a corresponding latent attribute classifier: the spectral latent attribute classifier S, the texture latent attribute classifier T, and the correlation latent attribute classifier R;

(3) For all labeled samples, use a clustering method to obtain their latent attribute classes;

(4) After clustering, each sample has two labels, from which the correspondence between the two kinds of classes can be obtained;

(5) Perform cluster analysis on the unlabeled samples: first compute their deep features with the multi-objective deep network, then find similar labeled samples based on the cluster labels, and finally predict the labels of the unlabeled samples from the label information of the labeled samples.

2. The semi-supervised multimodal deep learning classification method according to claim 1, characterized in that step (5) requires a confidence criterion to be set for the predicted labels; if a sample falls outside the criterion, no prediction is made; and once a certain batch of unlabeled samples has been assigned predicted labels, the classification contribution of each modality is adjusted adaptively.