CN110096948B - Remote sensing image identification method based on characteristic aggregation convolutional network - Google Patents
- Publication number
- CN110096948B (application CN201910198418.0A)
- Authority
- CN
- China
- Prior art keywords
- convolutional
- remote sensing
- feature
- features
- convolution
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G06V20/13—Satellite images
Abstract
The invention relates to a remote sensing image recognition method based on a feature aggregation convolutional network, comprising the following steps: 1) extracting features with the VGG-16 convolutional neural network; 2) encoding the convolutional features; 3) constructing the remote sensing scene representation; 4) training the feature aggregation convolutional network; 5) predicting the scene category of a remote sensing image. The invention establishes a remote sensing scene recognition method based on a feature aggregation convolutional network that learns scene representations directly from a remote sensing scene database, improves scene recognition accuracy, and can be applied in fields such as forest fire monitoring and urban planning.
Description
Technical Field

The invention relates to the technical field of remote sensing image processing, and in particular to a remote sensing image recognition method.
Background Art

Remote sensing images are earth-observation image data captured by remote sensing imaging systems; they contain detailed information on the composition and distribution of ground objects. However, because remote sensing scenes have complex compositions, small ground objects, and cluttered spatial layouts, images of the same scene class can differ greatly while images of different scene classes can look similar. Accurately inferring the scene category from the ground-object content of a remote sensing scene therefore remains a very challenging task.

In recent years many remote sensing scene recognition methods have been proposed. According to the feature operators they employ, existing methods fall into two categories: methods based on handcrafted features and methods based on convolutional neural networks. Handcrafted-feature methods construct remote sensing scene representations with carefully designed feature descriptors. For example, Wang et al. (Y. Wang et al., "Learning a Discriminative Distance Metric With Label Consistency for Scene Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 8, pp. 4427–4440, 2017) extract scene representations with classical scale-invariant features and sparse coding, and then classify scenes with a label-constrained distance metric. However, handcrafted feature operators depend strongly on image prior knowledge and cannot extract the high-level semantic information of remote sensing scenes. Convolutional-neural-network methods use deep convolutional networks to extract this high-level semantic information. For example, He et al. (N. He, L. Fang, S. Li, A. Plaza and J. Plaza, "Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–12, 2018) build scene representations containing high-level semantics with a convolutional neural network and multilayer stacked covariance pooling, and then recognize scenes with a support vector machine. However, after extracting the high-level semantics with a convolutional neural network, current methods still rely on traditional unsupervised feature aggregation to construct the scene representation, and they do not fully exploit label information to supervise the aggregation process.

In summary, existing remote sensing scene recognition methods rely on traditional handcrafted features or traditional feature aggregation, and they struggle to achieve satisfactory recognition accuracy.
Summary of the Invention

The purpose of the invention is to solve the problem that the training sets of existing remote sensing scene databases are too small to train a convolutional neural network. To this end, the invention proposes a remote sensing image recognition method based on a feature aggregation convolutional network, which improves the network's ability to represent remote sensing scenes.
The technical scheme of the present invention is as follows:

A remote sensing image recognition method based on a feature aggregation convolutional network comprises the following steps:

I. Training stage:

1) Extract features with the VGG-16 convolutional neural network.

Build the VGG-16 convolutional neural network from convolutional layers, downsampling layers, fully connected layers, and activation layers, and extract convolutional features of the remote sensing image from its different convolutional layers. The shallower convolutional layers of VGG-16 better capture the spatial information of remote sensing images, while the deeper layers better capture their semantic information.

2) Encode the convolutional features.

Construct a deep convolutional feature encoding module that can be embedded in the VGG-16 network; it fuses the spatial and semantic information carried by features from different convolutional layers and encodes them into a convolutional representation.

3) Construct the remote sensing scene representation.

Fuse the convolutional representation with the fully-connected-layer features of the VGG-16 network to obtain a discriminative remote sensing scene representation.

4) Train the feature aggregation convolutional network.

Integrate the VGG-16 network, the convolutional feature encoding module, and the scene representation module into a single convolutional network structure, forming an end-to-end feature aggregation convolutional network, and train it jointly end to end with stochastic gradient descent.

II. Testing stage: predict the scene category.

Input the remote sensing image to be tested into the trained feature aggregation convolutional network, which outputs the category of the image.
Step 1) may specifically be:

Build the VGG-16 convolutional neural network from convolutional layers, downsampling layers, and activation layers, input a 256×256 remote sensing scene image into the network, and extract its multi-layer convolutional features through the following series of nonlinear convolution operations:

x_i = σ(w_i ∗ x_{i−1} + b_i)

where x_i denotes the convolutional feature of the remote sensing scene image at the i-th convolutional layer (i = 1, 2, 3, …), x_0 denotes the initial 256×256 remote sensing scene image, w_i and b_i denote the weights and bias of the i-th convolutional layer, '∗' denotes the convolution operation, and σ(x) = max(0, x) is the nonlinear activation function.
Step 2) may specifically be:

2.1) Construct a scale unification module for the convolutional features. Use the 3rd, 4th, and 5th downsampling layers of VGG-16 to extract convolutional features of the remote sensing scene image, denoted x_1, x_2, and x_3. Because each downsampling layer reduces the feature resolution, and because different convolutional layers use different numbers of kernels, the sizes of x_1, x_2, and x_3 are 28×28×256, 14×14×512, and 7×7×512 respectively. Then apply 4×4 downsampling with stride 4 to x_1 and 2×2 downsampling with stride 2 to x_2, so that the height and width of all convolutional features are unified to 7×7.

2.2) Construct an intra-channel feature normalization module that uses L2 normalization to normalize the scale-unified x_1, x_2, and x_3 to [0, 1] along the channel dimension:

x̂_i^(h,w,c) = x_i^(h,w,c) / ( √( Σ_{c′} ( x_i^(h,w,c′) )² ) + ε )

where x_i^(h,w,c) is the value of the convolutional feature x_i at position (h, w, c), x̂_i has the same size as x_i, and ε = 10⁻⁸ avoids division by zero.

2.3) Construct the convolutional feature encoding module, which encodes the normalized features x̂_1, x̂_2, and x̂_3 into a convolutional representation. First, to avoid introducing additional trainable parameters, concatenate x̂_1, x̂_2, and x̂_3 directly along the channel dimension; the aggregated feature has size 7×7×1280. Then a convolutional layer with 1×1 kernels followed by the ReLU nonlinear activation encodes the aggregated features under the supervision of semantic label information, yielding a convolutional representation that fuses the spatial and semantic information of the different convolutional layers.
Step 3) is specifically:

Construct the remote sensing scene representation module. The hierarchical structure of a convolutional neural network means that features from different layers carry different spatial and semantic information, and the convolutional representation produced by the encoding module fuses both. Convolutional features are nevertheless local features of the scene, so the convolutional representation still cannot capture the global information of the remote sensing scene, whereas the fully-connected-layer features of the VGG-16 network do contain this global information. The invention therefore proposes a progressive feature aggregation strategy that aggregates features step by step according to their spatial structure: first the convolutional features are aggregated into the convolutional representation by the encoding module, and then the convolutional representation is fused with the fully-connected-layer features of VGG-16 to obtain the final discriminative remote sensing scene representation.
Step 4) is specifically:

Construct the end-to-end feature aggregation convolutional network by integrating the VGG-16 network, the convolutional feature encoding module, and the scene representation module into one network. The resulting network unifies feature learning, feature aggregation, and the classifier, and can be trained end to end under the guidance of semantic label information. It is then trained jointly end to end on the training set with stochastic gradient descent, with the following objective function:

J = −(1/M) · Σ_{m=1}^{M} Σ_{k=1}^{K} 1{y_m = k} · log( exp(x_m^(k)) / Σ_{j=1}^{K} exp(x_m^(j)) )

where x and y denote the scene representation and semantic label of a remote sensing scene image I, x_m^(k) is the k-th class score produced by the network for the m-th of the M remote sensing scene images, K is the number of scene categories, and 1{·} is the indicator function. Because existing remote sensing scene recognition databases contain too little training data to train a convolutional neural network from scratch, the invention adopts fine-tuning to alleviate this problem: the feature aggregation convolutional network is first initialized with weights pre-trained on the object recognition database ImageNet, and the weight parameters of all its layers are then updated with stochastic gradient descent until the maximum number of iterations is reached.

When predicting the category of each remote sensing scene under test, the 256×256 remote sensing scene image is input directly into the trained feature aggregation convolutional network, which yields the category of that scene.
Compared with the prior art, the present invention has the following technical effects:

1. The invention proposes a supervised feature encoding module that encodes features from different convolutional layers of a convolutional neural network under the supervision of label information. In addition, to exploit the complementarity of different features, the invention proposes a progressive aggregation strategy that aggregates features step by step according to their spatial structure.

2. The invention proposes an end-to-end feature aggregation convolutional network that learns remote sensing scene representations directly from a remote sensing scene database; the proposed method integrates feature learning, feature aggregation, and the classifier into one unified framework for joint training.

3. The invention learns remote sensing scene representations directly from the remote sensing scene database, improves scene recognition accuracy, and can be widely applied in fields such as forest fire monitoring and urban planning.
Brief Description of the Drawings

Fig. 1 is a flowchart of the remote sensing scene recognition method based on a feature aggregation convolutional network according to the present invention.

Detailed Description
As shown in Fig. 1, the remote sensing scene recognition method based on a feature aggregation convolutional network provided by the present invention mainly comprises the following steps:

1) Extract features with the VGG-16 convolutional neural network.

Build the VGG-16 convolutional neural network framework from convolutional layers, downsampling layers, fully connected layers, and activation layers, and extract convolutional features of the remote sensing image from its different convolutional layers.

2) Encode the convolutional features.

Construct a deep convolutional feature encoding module that can be embedded in the VGG-16 network; it fuses the spatial and semantic information of features from different convolutional layers and encodes them into a convolutional representation.

3) Construct the remote sensing scene representation.

Fuse the convolutional representation with the fully-connected-layer features of the VGG-16 network to obtain a discriminative remote sensing scene representation.

4) Train the feature aggregation convolutional network.

Integrate the VGG-16 network, the convolutional feature encoding module, and the scene representation module into a single convolutional network structure, forming an end-to-end feature aggregation convolutional network, and train it jointly end to end with stochastic gradient descent.

5) Predict the scene category of the remote sensing image.

Input the remote sensing image to be tested into the feature aggregation convolutional network to obtain the category of the remote sensing scene directly.

The steps of the present invention are described in further detail below with reference to Fig. 1.
Step 1: extract features with the VGG-16 convolutional neural network.

Build the VGG-16 convolutional neural network from convolutional layers, downsampling layers, and activation layers, input a 256×256 remote sensing scene image into the network, and extract its multi-layer convolutional features through the following series of nonlinear convolution operations:

x_i = σ(w_i ∗ x_{i−1} + b_i)

where x_i denotes the convolutional feature of the remote sensing scene image at the i-th convolutional layer (i = 1, 2, 3, …), x_0 denotes the initial 256×256 remote sensing scene image, w_i and b_i denote the weights and bias of the i-th convolutional layer, '∗' denotes the convolution operation, and σ(x) = max(0, x) is the nonlinear activation function.
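For illustration only (this sketch is not part of the patent text), the multi-layer feature extraction above could be written in PyTorch as follows; the use of torchvision's pretrained VGG-16 and the pooling-layer indices are our assumptions about one concrete realization:

```python
import torch
import torchvision.models as models

# Build VGG-16; the patent initializes it with ImageNet pre-trained weights.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
conv_stack = vgg.features.eval()

# Indices of the 3rd, 4th and 5th downsampling (max-pooling) layers in
# torchvision's vgg16().features layout.
POOL_LAYERS = {16: "x1", 23: "x2", 30: "x3"}

def extract_multilayer_features(image: torch.Tensor) -> dict:
    """Apply x_i = sigma(w_i * x_{i-1} + b_i) layer by layer and keep the
    outputs of the 3rd-5th pooling layers (256/512/512 channels; spatial
    size depends on the input resolution)."""
    out, feats = image, {}
    with torch.no_grad():
        for idx, layer in enumerate(conv_stack):
            out = layer(out)
            if idx in POOL_LAYERS:
                feats[POOL_LAYERS[idx]] = out
    return feats

feats = extract_multilayer_features(torch.randn(1, 3, 256, 256))
```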
Step 2: encode the convolutional features.

2.1) Construct a scale unification module for the convolutional features. Use the 3rd, 4th, and 5th downsampling layers of VGG-16 to extract convolutional features of the remote sensing scene image, denoted x_1, x_2, and x_3. Because each downsampling layer reduces the feature resolution, and because different convolutional layers use different numbers of kernels, the sizes of x_1, x_2, and x_3 are 28×28×256, 14×14×512, and 7×7×512 respectively. Then apply 4×4 downsampling with stride 4 to x_1 and 2×2 downsampling with stride 2 to x_2, so that the height and width of all convolutional features are unified to 7×7.
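Continuing the sketch above, the scale unification of step 2.1 might look like this; max pooling is our assumption, since the text only specifies the window sizes and strides:

```python
import torch.nn.functional as F

# Unify spatial sizes so x1 and x2 match x3: 4x4 pooling with stride 4 on x1
# and 2x2 pooling with stride 2 on x2.
x1u = F.max_pool2d(feats["x1"], kernel_size=4, stride=4)
x2u = F.max_pool2d(feats["x2"], kernel_size=2, stride=2)
x3u = feats["x3"]
```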
2.2) Construct an intra-channel feature normalization module that uses L2 normalization to normalize the scale-unified x_1, x_2, and x_3 to [0, 1] along the channel dimension:

x̂_i^(h,w,c) = x_i^(h,w,c) / ( √( Σ_{c′} ( x_i^(h,w,c′) )² ) + ε )

where x_i^(h,w,c) is the value of the convolutional feature x_i at position (h, w, c), x̂_i has the same size as x_i, and ε = 10⁻⁸ avoids division by zero.
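A minimal sketch of the channel-dimension L2 normalization of step 2.2, following the formula above and continuing the previous sketches:

```python
def channel_l2_normalize(x, eps=1e-8):
    """Divide each spatial position's channel vector x[n, :, h, w] by its
    Euclidean norm, so every entry lies in [0, 1]; eps avoids division by zero."""
    norm = x.pow(2).sum(dim=1, keepdim=True).sqrt()  # shape (N, 1, H, W)
    return x / (norm + eps)

x1n, x2n, x3n = (channel_l2_normalize(t) for t in (x1u, x2u, x3u))
```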
2.3) Construct the convolutional feature encoding module, which encodes the normalized features x̂_1, x̂_2, and x̂_3 into a convolutional representation. First, to avoid introducing additional trainable parameters, concatenate x̂_1, x̂_2, and x̂_3 directly along the channel dimension; the aggregated feature has size 7×7×1280. Then a convolutional layer with 1×1 kernels followed by the ReLU nonlinear activation encodes the aggregated features under the supervision of semantic label information, yielding a convolutional representation that fuses the spatial and semantic information of the different convolutional layers.
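The concatenation and supervised 1×1-convolution encoding of step 2.3 could then be sketched as a module; the output channel width D is a design choice the patent text does not fix, so 512 here is our assumption:

```python
import torch
import torch.nn as nn

class ConvFeatureEncoder(nn.Module):
    """Concatenate the normalized features along channels (256+512+512 = 1280)
    and encode them with a 1x1 convolution + ReLU; the layer is trained under
    label supervision as part of the whole network."""
    def __init__(self, in_channels=1280, out_channels=512):
        super().__init__()
        self.encode = nn.Sequential(
            nn.Conv2d(in_channels, out_channels, kernel_size=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, x1n, x2n, x3n):
        z = torch.cat([x1n, x2n, x3n], dim=1)  # 7x7x1280 aggregated feature
        return self.encode(z)                  # 7x7xD convolutional representation
```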
Step 3: construct the remote sensing scene representation.

Construct the remote sensing scene representation module. The hierarchical structure of a convolutional neural network means that features from different layers carry different spatial and semantic information, and the convolutional representation produced by the encoding module fuses both. Convolutional features are nevertheless local features of the scene, so the convolutional representation still cannot capture the global information of the remote sensing scene, whereas the fully-connected-layer features of the VGG-16 network do contain this global information. The invention therefore proposes a progressive feature aggregation strategy that aggregates features step by step according to their spatial structure: first the convolutional features are aggregated into the convolutional representation by the encoding module, and then the convolutional representation is fused with the fully-connected-layer features of VGG-16 to obtain the final discriminative remote sensing scene representation.
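One way to realize the progressive aggregation of step 3 is sketched below; fusing by concatenation and the 4096-dimensional fully-connected feature are our assumptions, since the text only states that the two are fused:

```python
class SceneRepresentation(nn.Module):
    """Fuse the flattened convolutional representation with a VGG-16
    fully-connected feature and map the result to class scores."""
    def __init__(self, conv_dim=512 * 7 * 7, fc_dim=4096, num_classes=30):
        super().__init__()
        self.classifier = nn.Linear(conv_dim + fc_dim, num_classes)

    def forward(self, conv_repr, fc_feat):
        fused = torch.cat([conv_repr.flatten(1), fc_feat], dim=1)
        return self.classifier(fused)  # scores fed to the softmax objective
```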
Step 4: train the feature aggregation convolutional network.

Construct the end-to-end feature aggregation convolutional network by integrating the VGG-16 network, the convolutional feature encoding module, and the scene representation module into one network. The resulting network unifies feature learning, feature aggregation, and the classifier, and can be trained end to end under the guidance of semantic label information. It is then trained jointly end to end on the training set with stochastic gradient descent, with the following objective function:

J = −(1/M) · Σ_{m=1}^{M} Σ_{k=1}^{K} 1{y_m = k} · log( exp(x_m^(k)) / Σ_{j=1}^{K} exp(x_m^(j)) )

where x and y denote the scene representation and semantic label of a remote sensing scene image I, x_m^(k) is the k-th class score produced by the network for the m-th of the M remote sensing scene images, K is the number of scene categories, and 1{·} is the indicator function. Because existing remote sensing scene recognition databases contain too little training data to train a convolutional neural network from scratch, the invention adopts fine-tuning to alleviate this problem: the feature aggregation convolutional network is first initialized with weights pre-trained on the object recognition database ImageNet, and the weight parameters of all its layers are then updated with stochastic gradient descent until the maximum number of iterations is reached.
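An end-to-end fine-tuning loop matching step 4 could look like the following; `FeatureAggregationNet` is a hypothetical wrapper composing the VGG-16 backbone with the two modules sketched above, and the hyperparameters are illustrative, not taken from the patent:

```python
import torch.optim as optim

model = FeatureAggregationNet(num_classes=30)  # hypothetical composite network,
                                               # initialized from ImageNet weights
criterion = nn.CrossEntropyLoss()              # the softmax objective J above
optimizer = optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)

max_epochs = 100  # illustrative; the patent only says "until the maximum iterations"
for epoch in range(max_epochs):
    for images, labels in train_loader:        # assumed DataLoader over the training set
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()                        # gradients flow through all modules jointly
        optimizer.step()
```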
Step 5: predict the scene category of the remote sensing image.

To predict the category of each remote sensing scene under test, the 256×256 remote sensing scene image is input directly into the trained feature aggregation convolutional network to obtain the scene category; the scene recognition accuracy is then computed.
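Prediction then reduces to a single forward pass; a sketch, assuming `test_image` is a preprocessed tensor:

```python
model.eval()
with torch.no_grad():
    scores = model(test_image)              # test_image: a (1, 3, 256, 256) tensor
    category = scores.argmax(dim=1).item()  # predicted scene class index
```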
The effect of the present invention can be further illustrated by the following experiments.

1. Simulation conditions

The simulations were run in MATLAB on a machine with an Intel(R) Core i7-5930K 3.50 GHz CPU, a GeForce GTX Titan X GPU, 64 GB of memory, and the Linux operating system.

2. Simulation content
To verify the performance of the proposed feature aggregation convolutional network, experiments were conducted on two public remote sensing scene recognition databases: the AID database and the UC-Merced database. The AID database consists of 30 remote sensing scene categories, each with 200 to 400 images of size 600×600; for each category, 50% of the images were randomly selected as the training set and the rest as the test set. The UC-Merced database contains 21 remote sensing scene categories, each with 100 images of size 256×256; for each category, 80 images were randomly selected as the training set and the remaining images were used as the test set.

Considering both the novelty and the popularity of the algorithms, four comparison methods were selected for each database to verify the recognition performance of the proposed method. The comparison algorithms come from the following references: 1. G.-S. Xia, J. Hu, F. Hu, B. Shi, X. Bai, Y. Zhong, L. Zhang, and X. Lu, "AID: A Benchmark Data Set for Performance Evaluation of Aerial Scene Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 7, pp. 3965–3981, 2017. 2. S. Chaib, H. Liu, Y. Gu, and H. Yao, "Deep Feature Fusion for VHR Remote Sensing Scene Classification," IEEE Transactions on Geoscience and Remote Sensing, vol. 55, no. 8, pp. 4775–4784, 2017. 3. N. He, L. Fang, S. Li, A. Plaza, and J. Plaza, "Remote Sensing Scene Classification Using Multilayer Stacked Covariance Pooling," IEEE Transactions on Geoscience and Remote Sensing, pp. 1–12, 2018. 4. Y. Yu and F. Liu, "Aerial Scene Classification via Multilevel Fusion Based on Deep Convolutional Neural Networks," IEEE Geoscience and Remote Sensing Letters, vol. 15, no. 2, pp. 287–291, 2018. 5. K. Qi, C. Yang, Q. Guan, H. Wu, and J. Gong, "A Multiscale Deeply Described Correlatons-Based Model for Land-Use Scene Classification," Remote Sensing, vol. 9, no. 9, pp. 917–933, Sep. 2017.

Recognition accuracy was used to quantify the performance of the remote sensing scene recognition methods. The experiments were repeated five times on each of the AID and UC-Merced databases and the average recognition accuracy was computed; the results are shown in Tables 1 and 2.
Table 1. Accuracy of different remote sensing scene recognition methods on the AID database

Table 2. Accuracy of different remote sensing scene recognition methods on the UC-Merced database
As shown in Tables 1 and 2, the present invention achieves the highest remote sensing scene recognition accuracy. This is because the invention makes full use of label information to supervise both the feature learning and the feature aggregation process, so the scene representation produced by the feature aggregation convolutional network better captures the ground-object distribution characteristics of remote sensing scenes. The method is therefore more effective than the alternatives, and the experimental results further verify the effectiveness of the present invention.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910198418.0A CN110096948B (en) | 2019-03-15 | 2019-03-15 | Remote sensing image identification method based on characteristic aggregation convolutional network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910198418.0A CN110096948B (en) | 2019-03-15 | 2019-03-15 | Remote sensing image identification method based on characteristic aggregation convolutional network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110096948A CN110096948A (en) | 2019-08-06 |
CN110096948B true CN110096948B (en) | 2020-11-17 |
Family
ID=67443267
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910198418.0A Active CN110096948B (en) | 2019-03-15 | 2019-03-15 | Remote sensing image identification method based on characteristic aggregation convolutional network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110096948B (en) |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111325165B (en) * | 2020-02-26 | 2023-05-05 | 中南大学 | Scene Classification Method of Urban Remote Sensing Image Considering Spatial Relationship Information |
CN111723685B (en) * | 2020-06-01 | 2022-07-26 | 齐齐哈尔大学 | Remote sensing scene classification method based on branch characteristic fusion convolution network |
CN111860293B (en) * | 2020-07-16 | 2023-12-22 | 中南民族大学 | Remote sensing scene classification method, device, terminal equipment and storage medium |
CN112232297B (en) * | 2020-11-09 | 2023-08-22 | 北京理工大学 | Remote Sensing Image Scene Classification Method Based on Deep Joint Convolution Activation |
CN112861978B (en) * | 2021-02-20 | 2022-09-02 | 齐齐哈尔大学 | Multi-branch feature fusion remote sensing scene image classification method based on attention mechanism |
CN114155448A (en) * | 2021-12-15 | 2022-03-08 | 山西大学 | Scene classification method and device for high-resolution remote sensing image |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930269A (en) * | 2012-09-17 | 2013-02-13 | 中国科学院软件研究所 | Method for modulating multiscale semanteme of remote-sensing image |
CN107807986A (en) * | 2017-10-31 | 2018-03-16 | 中南大学 | A kind of method for the remote sensing image intelligent Understanding for describing atural object spatial relation semantics |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB2559566B (en) * | 2017-02-08 | 2022-01-12 | Ordnance Survey Ltd | Topographic data machine learning method and system |
CN106845471A (en) * | 2017-02-20 | 2017-06-13 | 深圳市唯特视科技有限公司 | A kind of vision significance Forecasting Methodology based on generation confrontation network |
CN109409240B (en) * | 2018-09-28 | 2022-02-11 | 北京航空航天大学 | SegNet remote sensing image semantic segmentation method combined with random walk |
- 2019-03-15: Application CN201910198418.0A filed in China; patent CN110096948B granted, status active.
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102930269A (en) * | 2012-09-17 | 2013-02-13 | 中国科学院软件研究所 | Method for modulating multiscale semanteme of remote-sensing image |
CN107807986A (en) * | 2017-10-31 | 2018-03-16 | 中南大学 | A kind of method for the remote sensing image intelligent Understanding for describing atural object spatial relation semantics |
Also Published As
Publication number | Publication date |
---|---|
CN110096948A (en) | 2019-08-06 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110096948B (en) | Remote sensing image identification method based on characteristic aggregation convolutional network | |
CN110321963B (en) | Hyperspectral image classification method based on fusion of multi-scale and multi-dimensional spatial spectral features | |
CN110334765B (en) | Remote sensing image classification method based on attention mechanism multi-scale deep learning | |
CN111783576B (en) | Pedestrian re-identification method based on improved YOLOv3 network and feature fusion | |
CN110348399B (en) | Hyperspectral intelligent classification method based on prototype learning mechanism and multidimensional residual error network | |
CN104992191B (en) | Image Classification Method Based on Deep Learning Features and Maximum Confidence Path | |
CN106022229B (en) | Abnormal Behavior Recognition Method Based on Video Motion Information Feature Extraction and Adaptive Enhancement Algorithm Error Backpropagation Network | |
CN111832608B (en) | A multi-wear particle identification method in ferrogram images based on the single-stage detection model yolov3 | |
CN110263845B (en) | SAR Image Change Detection Method Based on Semi-supervised Adversarial Deep Network | |
CN106778595B (en) | Method for detecting abnormal behaviors in crowd based on Gaussian mixture model | |
CN101894276B (en) | Training method of human action recognition and recognition method | |
CN111259850A (en) | A Person Re-Identification Method Fusing Random Batch Mask and Multiscale Representation Learning | |
CN111460980B (en) | Multi-scale detection method for small-target pedestrian based on multi-semantic feature fusion | |
CN111709311A (en) | A pedestrian re-identification method based on multi-scale convolutional feature fusion | |
CN109828251A (en) | Radar target identification method based on feature pyramid light weight convolutional neural networks | |
CN107316013A (en) | Hyperspectral image classification method with DCNN is converted based on NSCT | |
CN104298977B (en) | A kind of low-rank representation Human bodys' response method constrained based on irrelevance | |
CN103440471B (en) | The Human bodys' response method represented based on low-rank | |
CN109543546B (en) | Gait age estimation method based on depth sequence distribution regression | |
CN111783688B (en) | A classification method of remote sensing image scene based on convolutional neural network | |
CN112818790A (en) | Pedestrian re-identification method based on attention mechanism and space geometric constraint | |
Luo et al. | Traffic analytics with low-frame-rate videos | |
CN112597324A (en) | Image hash index construction method, system and equipment based on correlation filtering | |
CN105631469A (en) | Bird image recognition method by multilayer sparse coding features | |
CN110210321B (en) | Under-sample face recognition method based on multi-dimensional scaling network and block weighting method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |