CN102867195B - Method for detecting and identifying a plurality of types of objects in remote sensing image
- Publication number: CN102867195B (application CN201210300645.8A)
- Authority: CN (China)
- Legal status: Active
Abstract
The invention relates to a method for detecting and recognizing multiple classes of targets in remote sensing images based on sparse-representation dictionary learning. Its technical features are as follows: first, a dictionary is trained on preprocessed training data with a sparse-representation dictionary-learning method; then each sub-image block of the test image is sparsely coded over the trained dictionary, its sparse representation coefficients are computed, and from them the reconstruction error of the block is obtained; thresholding the reconstruction error determines the candidate target regions; finally, post-processing yields accurate detection and recognition of multiple classes of targets in the remote sensing image. With the method of the invention, targets of multiple types can be detected and recognized in remote sensing images with complex backgrounds. The invention achieves high detection and recognition accuracy with a low false-alarm rate.
Description
Technical Field
The invention relates to a method for detecting and recognizing multiple classes of targets in remote sensing images, applicable to the detection and recognition of multiple target types in remote sensing images with complex backgrounds.
Background
As an application of remote sensing image processing, target detection and recognition in remote sensing images with complex backgrounds is a key technology for military reconnaissance, precision strike, and related fields. It has long been both a research focus and a difficult problem in the field; its considerable military and civilian value has attracted growing attention.
At present there are two main approaches to target detection in remote sensing images. The first detects targets by certain shape and geometric features they possess; however, because remote sensing backgrounds are complex and contain many shapes and geometric structures similar to those of the targets, relying on such features alone produces large numbers of missed and false detections. The second is based on classification, most commonly the Bag-of-Words (BoW) method: SIFT features are extracted from the image and clustered, the cluster centers serve as a set of standard bases (standard image regions) of the image space, images are then represented as vectors over these bases, and the resulting vectors are classified with an SVM and thresholded to obtain detections. Although the extracted SIFT features are scale- and rotation-invariant, BoW uses only the statistics of the feature regions and discards their spatial information, so its detection rate is low and its false-alarm rate high. Another classification method, Linear Spatial Pyramid Matching Using Sparse Coding (ScSPM), does account for the spatial information of the feature regions, but the resulting classification vectors have excessively high dimension and the computational cost is too large. Moreover, most current classification-based detection methods handle only a single target class and cannot detect and recognize multiple targets simultaneously.
Summary of the Invention
Technical Problem to Be Solved
To overcome the shortcomings of the prior art, the present invention proposes a method for detecting and recognizing multiple classes of targets in remote sensing images based on sparse-representation dictionary learning. The method automatically detects and recognizes targets of different types in remote sensing images with complex backgrounds, with high detection accuracy and a low false-alarm rate.
Technical Solution
A method for detecting and recognizing multiple classes of targets in remote sensing images, characterized by the following steps:
Step 1: Train a dictionary with a sparse-representation dictionary-learning method, as follows:
Step a1, training-image preparation: first align all targets of the same class in the original images to a common principal direction; then rotate each direction-normalized image from 0° to 360° in steps of φ, producing ⌊360°/φ⌋ images in different orientations. Processing the original images of all target classes in this way yields c classes of training images, where p is the number of target classes to be detected, φ is the rotation step, c is the total number of orientation-specific classes over all targets, and ⌊·⌋ denotes rounding down.
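A minimal sketch of this rotation augmentation, assuming square RGB chips held as numpy arrays; scipy's `rotate` stands in for whatever rotation routine the original implementation used:

```python
from scipy.ndimage import rotate

def rotate_augment(image, phi_deg=10):
    """Rotate a direction-normalized target chip from 0 to 360 degrees in
    steps of phi_deg, yielding floor(360/phi_deg) oriented copies."""
    n_dirs = int(360 // phi_deg)
    return [rotate(image, angle=g * phi_deg, reshape=False, mode='nearest')
            for g in range(n_dirs)]
```

With phi_deg=10 this yields the 36 orientation classes per target used in the embodiment below.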
Step b1, data preprocessing: convert each of the c classes of training images to grayscale by taking a weighted average of the R, G and B components (the weighted-average method); downsample the grayscale image to size n×n; energy-normalize the n×n image to obtain a normalized image; convert the normalized image into an n²×1 column vector and use it as one column of the training data, giving the preprocessed training data set U=[U1,U2,…,Uc], where Ui is the sub-data set of U corresponding to class i, i=1,2,…,c.
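As a concrete illustration of step b1, a minimal numpy sketch of the preprocessing chain follows; the block-mean downsampling and the helper names are assumptions, since the patent does not prescribe a particular resampling scheme:

```python
import numpy as np

def to_gray(rgb):
    # Weighted-average grayscale conversion used throughout the patent.
    return 0.3 * rgb[..., 0] + 0.59 * rgb[..., 1] + 0.11 * rgb[..., 2]

def preprocess(rgb, n=15):
    gray = to_gray(rgb.astype(np.float64))
    h, w = gray.shape
    # Crude block-mean downsampling to n x n (an assumption, not prescribed).
    small = gray[:h - h % n, :w - w % n].reshape(n, h // n, n, w // n).mean(axis=(1, 3))
    # Energy normalization: divide by the square root of the total energy.
    small /= np.sqrt((small ** 2).sum())
    return small.reshape(-1, 1)  # n^2 x 1 column vector
```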
Step c1, dictionary training: train on the known training data set U=[U1,U2,…,Uc] with the FDDL software package released with Fisher Discrimination Dictionary Learning for Sparse Representation, obtaining the dictionary D=[D1,D2,…,Dc], where Di is the sub-dictionary corresponding to class i.
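For orientation, the FDDL objective minimized by that package has roughly the following form; this is a sketch following the cited Yang et al. ICCV 2011 paper, not anything stated in this patent:

$$J_{(D,X)}=\arg\min_{(D,X)}\Big\{\,r(U,D,X)+\lambda_1\|X\|_1+\lambda_2 f(X)\,\Big\}$$

where r(U,D,X) is a discriminative fidelity term (each class sub-set Ui should be reconstructed well by its own sub-dictionary Di and poorly by the others) and f(X) is a Fisher discrimination term on the coding coefficients that shrinks within-class scatter and grows between-class scatter; λ1 and λ2 are the package parameters whose ranges are given below.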
Step 2, sparse coding: using the trained dictionary D=[D1,D2,…,Dc], sparsely code every sub-image block of the test image and compute its sparse coefficients, as follows:
Step a2, test-image preprocessing: first convert the test image to a test grayscale image with the weighted-average method of step b1; then slide an S×S window over the test grayscale image with step b to obtain sub-image blocks. Downsample each sub-image block to size n×n, energy-normalize it, and convert the result into an n²×1 column vector β; β represents the pixel gray-value information of the sub-image block obtained through the sliding window.
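A sketch of the sliding-window scan; each window would then be pushed through the same preprocessing as the training chips (the generator below is an illustration, not the patent's code):

```python
def sliding_windows(gray, S, b):
    """Yield the top-left corner and pixels of every S x S window taken
    every b pixels over the test grayscale image (step a2)."""
    H, W = gray.shape
    for s in range(0, H - S + 1, b):
        for t in range(0, W - S + 1, b):
            yield (s, t), gray[s:s + S, t:t + S]
```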
Step b2, sparse coding: for each sub-image block, solve the optimization model

$$\hat{x}=\arg\min_{x}\|x\|_1\quad\text{subject to}\quad\|\beta-Dx\|_2\le\varepsilon$$

to obtain the sparse coding coefficients $\hat{x}=[\hat{x}_1;\hat{x}_2;\dots;\hat{x}_c]$ of each sub-image block, where $\hat{x}_i$ is the coefficient vector corresponding to sub-dictionary Di, ε>0 is the allowable error, ||·||1 is the l1 norm, and ||·||2 is the l2 norm.
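The patent does not fix a particular l1 solver for this model; as an illustration, the following numpy sketch solves the equivalent Lagrangian form with ISTA, where the regularization weight lam and the iteration count are assumptions:

```python
import numpy as np

def sparse_code_ista(D, beta, lam=0.1, n_iter=200):
    """Solve min_x 0.5*||beta - D x||_2^2 + lam*||x||_1 by ISTA; for a
    suitable lam this matches the error-constrained model above."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    x = np.zeros((D.shape[1], 1))
    for _ in range(n_iter):
        grad = D.T @ (D @ x - beta)
        z = x - grad / L
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft threshold
    return x
```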
Step c2, reconstruction error: from the sparse coding coefficients, compute the reconstruction error ei of the sub-image block with respect to every class; take e=min{ei} as the reconstruction error of the block and record the corresponding class C=arg mini{ei}. Then compare the reconstruction error e with a preset threshold τ to decide whether the block contains a target: if e<τ, the block contains a target; otherwise the block is background.
Step 3, target detection and recognition:
Step a3: collect the reconstruction errors e of all sub-image blocks judged in step c2 to contain a target into a reconstruction error matrix E=(est)P×Q, of the same size as the test grayscale image, representing the candidate target regions; est is the value of the reconstruction error matrix at coordinate (s,t), s=1,2,…,P, t=1,2,…,Q.
Likewise, collect the class C of each such sub-image block into a class matrix L=(Cst)P×Q, of the same size as the test grayscale image, representing the candidate target classes; Cst is the value of the class matrix at coordinate (s,t).
Step b3: change the size S×S of the sliding window G times and repeat step 2 through step a3 G times, obtaining G reconstruction error matrices and G class matrices; G ranges from 5 to 10. Stack the G reconstruction error matrices into a multi-scale reconstruction error matrix MAP=(estg)P×Q×G, where estg, an element of MAP, is the value est of the reconstruction error matrix obtained at the g-th change of window size; P×Q×G is the size of the multi-scale reconstruction error matrix, g=1,2,…,G.
Stack the G class matrices into a multi-scale class matrix CLASS=(Cstg)P×Q×G, where Cstg, an element of CLASS, is the value Cst of the class matrix obtained at the g-th change of window size. From MAP compute the minimum reconstruction error matrix (map(s,t))P×Q, whose value at coordinate (s,t) is map(s,t)=min over g of {estg}.
Then compute the minimum class matrix (class(s,t))P×Q corresponding to the minimum reconstruction error matrix, whose value at coordinate (s,t), class(s,t), is the class Cstg at the scale index g that attains map(s,t).
From MAP also compute the scale matrix scale=(scale(s,t))P×Q, whose value at coordinate (s,t), scale(s,t), is the scale index attaining the minimum, i.e. scale(s,t)=arg min over g of {estg}.
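Steps a3 to b3 amount to per-pixel minimization over the scale axis; a numpy sketch, assuming MAP and CLASS are stored as P×Q×G arrays with large error values at background positions:

```python
import numpy as np

def fuse_scales(MAP, CLASS):
    """Collapse the P x Q x G volumes of step b3 to per-pixel minima."""
    g_star = MAP.argmin(axis=2)              # scale index attaining the minimum
    rows, cols = np.indices(g_star.shape)
    map_min = MAP[rows, cols, g_star]        # map(s, t)
    class_min = CLASS[rows, cols, g_star]    # class(s, t)
    return map_min, class_min, g_star        # g_star plays the role of scale(s, t)
```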
Step c3: take the local neighborhood minima of the minimum reconstruction error matrix (map(s,t))P×Q as the detected target responses; the coordinates of a local minimum in (map(s,t))P×Q give the center position of a target, and the values at the corresponding positions of (class(s,t))P×Q and (scale(s,t))P×Q give the target's class and scale.
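A sketch of the local-minimum extraction of step c3, using scipy's minimum_filter; the neighborhood size is an assumption, as the patent does not fix it:

```python
import numpy as np
from scipy.ndimage import minimum_filter

def detect_centers(map_min, tau=0.3, neighborhood=15):
    """A pixel is reported as a target center if it equals the minimum of
    its local neighborhood and its error beats the threshold tau."""
    is_min = map_min == minimum_filter(map_min, size=neighborhood)
    ys, xs = np.nonzero(is_min & (map_min < tau))
    return list(zip(ys, xs))
```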
The weighted-average method is computed as f(x,y)=0.3R(x,y)+0.59G(x,y)+0.11B(x,y), where f(x,y) is the gray value of the resulting grayscale image at pixel (x,y), and R(x,y), G(x,y) and B(x,y) are the R, G and B components of the input training image at pixel (x,y).
The energy normalization is computed as

$$f_{\text{norm}}(x,y)=\frac{f(x,y)}{\sqrt{\sum_{x=1}^{u}\sum_{y=1}^{v}f(x,y)^2}}$$

where fnorm(x,y) is the energy-normalized gray value of f(x,y), and u and v are the numbers of rows and columns of the grayscale image.
The l1 norm is computed as

$$\|z\|_1=\sum_{k=1}^{M}|\xi_k|$$

where z is a vector of size M×1 and ξk, k=1,2,…,M, are its elements.
The l2 norm is computed as

$$\|z\|_2=\Big(\sum_{k=1}^{M}\xi_k^2\Big)^{1/2}$$

where z is a vector of size M×1 and ξk, k=1,2,…,M, are its elements.
The reconstruction error ei is computed as

$$e_i=\|\beta-D_i\hat{x}_i\|_2^2+\gamma\,\|\hat{x}-m_i\|_2^2$$

where γ is a preset weight with range 0 to 1, and mi is the mean vector obtained by averaging the elements of each row of Yi, with Yi the optimal coding coefficients of Ui sparsely coded over the dictionary D.
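Putting step c2 and this formula together, a hypothetical helper follows; the names D_list, x_parts and m_list are illustrative, not from the patent:

```python
import numpy as np

def classify_block(D_list, x_parts, x_hat, beta, m_list, gamma=0.5, tau=0.3):
    """D_list[i] is sub-dictionary D_i, x_parts[i] the slice of the sparse
    code x_hat that belongs to D_i, m_list[i] the class-i mean vector m_i."""
    errs = [float(np.linalg.norm(beta - D_list[i] @ x_parts[i]) ** 2
                  + gamma * np.linalg.norm(x_hat - m_list[i]) ** 2)
            for i in range(len(D_list))]
    e = min(errs)
    C = int(np.argmin(errs))
    return (e, C) if e < tau else (e, None)  # None marks a background block
```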
The rotation step φ ranges from 0° to 90°.
The FDDL package parameter λ1 ranges from 0.001 to 0.01, and λ2 from 0.01 to 0.1.
S is an integer between 40 and 90, and b is an integer between 1 and 15.
The threshold τ ranges from 0 to 1.
Beneficial Effects
In the proposed method for detecting and recognizing multiple classes of targets in remote sensing images based on sparse-representation dictionary learning, a redundant dictionary is first trained on the preprocessed training data; each sub-image block of the test image is then sparsely coded over the trained dictionary, its sparse representation coefficients are computed, and from them the reconstruction error of the block is obtained and thresholded to determine the candidate target regions; finally, post-processing yields accurate detection and recognition of multiple classes of targets in the remote sensing image.
The invention automatically detects and recognizes targets of multiple classes in remote sensing images with complex backgrounds. Experiments show that the method attains high detection and recognition accuracy with a low false-alarm rate.
Brief Description of the Drawings
Figure 1: basic flowchart of the method of the invention
Figure 2: training data used in the method of the invention
Figure 3: sample detection results of the method of the invention
(a) aircraft detection results (red boxes mark aircraft targets, yellow boxes false alarms)
(b) ship detection results (white boxes mark ship targets)
(c) oil-depot detection results (blue boxes mark oil-depot targets)
(d) aircraft and ship detection results
(e) aircraft and oil-depot detection results
(f) ship and oil-depot detection results
Detailed Description of the Embodiments
The invention is further described below with reference to an implementation example and the drawings.
The hardware environment is a computer with an Intel Pentium 2.93 GHz CPU and 2.0 GB of memory; the software environment is Matlab R2011a on Windows XP. One hundred remote sensing images obtained from Google Earth were selected for the multi-class detection experiments, covering three target classes (aircraft, ships and oil depots), with 200 aircraft targets, 120 ship targets and 420 oil-depot targets in total.
The invention is implemented as follows:
1. Train the redundant dictionary with the sparse-representation dictionary-learning method, as follows:
(1.1) Training-image preparation: first align all targets of the same class in the original images to a common principal direction, then rotate each direction-normalized image from 0° to 360° in 10° steps, giving 36 classes of training data per target; processing the original images of all target classes in this way finally yields 55 classes of training images, i.e. c=55, comprising 36 aircraft classes, 18 ship classes and 1 oil-depot class.
(1.2) Data preprocessing: convert the 55 classes of training images to grayscale by the weighted-average method over the R, G and B components, downsample each grayscale image to 15×15, energy-normalize the 15×15 image, convert the normalized image into a 225×1 column vector and use it as one column of the training data, giving the preprocessed training data set U=[U1,U2,…,Uc], where Ui is the sub-data set of U corresponding to class i, i=1,2,…,c.
(1.3) Train on the known training data set U=[U1,U2,…,Uc] with the FDDL software package released by Lei Zhang, obtaining the dictionary D=[D1,D2,…,Dc], where Di is the sub-dictionary corresponding to class i; the package parameters are λ1=0.005 and λ2=0.05.
Lei Zhang's FDDL package is described in: Meng Yang, Lei Zhang, Xiangchu Feng, David Zhang. Fisher Discrimination Dictionary Learning for Sparse Representation [C]. ICCV, 2011.
2. Sparse coding: using the trained dictionary D=[D1,D2,…,Dc], sparsely code every sub-image block of the test image and compute its sparse coefficients, as follows:
(2.1) Test-image preprocessing: first convert the test image to a test grayscale image with the weighted-average method of (1.2); then slide an S×S window over the test grayscale image with a step of 5 pixels, the initial value of S being 90, to obtain sub-image blocks. Downsample each sub-image block to 15×15, energy-normalize it, and convert the result into a 225×1 column vector β; β represents the pixel gray-value information of the sub-image block obtained through the sliding window.
(2.2) Sparse coding: for each sub-image block, solve the optimization model

$$\hat{x}=\arg\min_{x}\|x\|_1\quad\text{subject to}\quad\|\beta-Dx\|_2\le\varepsilon$$

to obtain the sparse coding coefficient vector $\hat{x}=[\hat{x}_1;\hat{x}_2;\dots;\hat{x}_c]$ of each sub-image block, where $\hat{x}_i$ is the coefficient vector corresponding to sub-dictionary Di; the allowable error is ε=0.15, ||·||1 is the l1 norm, and ||·||2 is the l2 norm.
(2.3) Reconstruction error: from the sparse coding coefficients, compute the reconstruction error ei of the sub-image block with respect to every class, with weight γ=0.5; take e=min{ei} as the reconstruction error of the block and record the corresponding class C=arg mini{ei}. Then compare e with the preset threshold τ=0.3 to decide whether the block contains a target: if e<τ, the block contains a target; otherwise the block is background.
3. Target detection and recognition:
(3.1) Collect the reconstruction errors e of all sub-image blocks judged in (2.3) to contain a target into a reconstruction error matrix E=(est)P×Q, of the same size as the test grayscale image, representing the candidate target regions; est is the value of the reconstruction error matrix at coordinate (s,t), s=1,2,…,P, t=1,2,…,Q. Likewise collect the class C of each such block into a class matrix L=(Cst)P×Q.
(3.2) Change the size S×S of the sliding window, setting S=90-10×j, j=1,2,…,G, where G is the number of changes; repeat step 2 and step (3.1) G times, obtaining G reconstruction error matrices and G class matrices. Stack the G reconstruction error matrices into a multi-scale reconstruction error matrix MAP=(estg)P×Q×G, where estg, an element of MAP, is the value est of the reconstruction error matrix obtained at the g-th change of window size, P×Q×G is the size of the multi-scale matrix, g=1,2,…,G. Stack the G class matrices into a multi-scale class matrix CLASS=(Cstg)P×Q×G, where Cstg, an element of CLASS, is the value Cst of the class matrix obtained at the g-th change of window size. From MAP compute the minimum reconstruction error matrix (map(s,t))P×Q with map(s,t)=min over g of {estg}; then compute the corresponding minimum class matrix (class(s,t))P×Q, where class(s,t) is the class Cstg at the scale index g that attains map(s,t); and from MAP compute the scale matrix scale=(scale(s,t))P×Q with scale(s,t)=arg min over g of {estg}.
(3.3) Take the local neighborhood minima of the minimum reconstruction error matrix (map(s,t))P×Q as the detected target responses; the coordinates of a local minimum in (map(s,t))P×Q give the center position of a target, from which the target's class and scale are read off at the corresponding positions of (class(s,t))P×Q and (scale(s,t))P×Q.
The weighted-average method is computed as

f(x,y)=0.3R(x,y)+0.59G(x,y)+0.11B(x,y)

where f(x,y) is the gray value of the resulting grayscale image at pixel (x,y), and R(x,y), G(x,y) and B(x,y) are the R, G and B components of the input image at pixel (x,y).
The energy normalization is computed as

$$f_{\text{norm}}(x,y)=\frac{f(x,y)}{\sqrt{\sum_{x=1}^{u}\sum_{y=1}^{v}f(x,y)^2}}$$

where fnorm(x,y) is the energy-normalized gray value of f(x,y), and u and v are the numbers of rows and columns of the grayscale image, u=15, v=15.
The l1 norm is computed as

$$\|z\|_1=\sum_{k=1}^{M}|\xi_k|$$

where z is a vector of size M×1 and ξk, k=1,2,…,M, are its elements.
The l2 norm is computed as

$$\|z\|_2=\Big(\sum_{k=1}^{M}\xi_k^2\Big)^{1/2}$$

where z is a vector of size M×1 and ξk, k=1,2,…,M, are its elements.
The reconstruction error ei is computed as

$$e_i=\|\beta-D_i\hat{x}_i\|_2^2+\gamma\,\|\hat{x}-m_i\|_2^2$$

where γ is the preset weight, γ=0.5, and mi is the mean vector obtained by averaging the elements of each row of Yi, with Yi the optimal coding coefficients of Ui sparsely coded over the dictionary D.
The effectiveness of the invention is evaluated by the correct detection rate and the false-alarm rate. The correct detection rate is defined as the ratio of the number of correctly detected targets to the total number of targets; the false-alarm rate is defined as the ratio of the number of false alarms to the sum of the number of correctly detected targets and the number of false alarms. The detection results of the invention were also compared with a BoW-based multi-class target detection algorithm; the comparison is shown in Table 1. Both the correct detection rate and the false-alarm rate demonstrate the effectiveness of the method.
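The two measures, written out (a trivial sketch of the definitions above):

```python
def correct_detection_rate(n_correct, n_total):
    """Correctly detected targets over all targets."""
    return n_correct / n_total

def false_alarm_rate(n_false, n_correct):
    """False alarms over (correct detections + false alarms)."""
    return n_false / (n_correct + n_false)
```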
Table 1. Evaluation of the detection results
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210300645.8A CN102867195B (en) | 2012-08-22 | 2012-08-22 | Method for detecting and identifying a plurality of types of objects in remote sensing image |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201210300645.8A CN102867195B (en) | 2012-08-22 | 2012-08-22 | Method for detecting and identifying a plurality of types of objects in remote sensing image |
Publications (2)
Publication Number | Publication Date |
---|---|
CN102867195A CN102867195A (en) | 2013-01-09 |
CN102867195B true CN102867195B (en) | 2014-11-26 |
Family
ID=47446059
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201210300645.8A Active CN102867195B (en) | 2012-08-22 | 2012-08-22 | Method for detecting and identifying a plurality of types of objects in remote sensing image |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102867195B (en) |
Families Citing this family (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103258210B (en) * | 2013-05-27 | 2016-09-14 | 中山大学 | A kind of high-definition image classification method based on dictionary learning |
CN103632164B (en) * | 2013-11-25 | 2017-03-01 | 西北工业大学 | The volume firm state classification recognition methodss of the KNN coil image data based on KAP sample optimization |
CN104517121A (en) * | 2014-12-10 | 2015-04-15 | 中国科学院遥感与数字地球研究所 | Spatial big data dictionary learning method based on particle swarm optimization |
CN105740422B (en) * | 2016-01-29 | 2019-10-29 | 北京大学 | Pedestrian retrieval method and device |
CN106067041B (en) * | 2016-06-03 | 2019-05-31 | 河海大学 | A kind of improved multi-target detection method based on rarefaction representation |
CN107451595A (en) * | 2017-08-04 | 2017-12-08 | 河海大学 | Infrared image salient region detection method based on hybrid algorithm |
CN109190457B (en) * | 2018-07-19 | 2021-12-03 | 北京市遥感信息研究所 | Oil depot cluster target rapid detection method based on large-format remote sensing image |
CN109946076B (en) * | 2019-01-25 | 2020-04-28 | 西安交通大学 | A weighted multi-scale dictionary learning framework for planetary bearing fault identification |
CN110189328B (en) * | 2019-06-11 | 2021-02-23 | 北华航天工业学院 | Satellite remote sensing image processing system and processing method thereof |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102129573A (en) * | 2011-03-10 | 2011-07-20 | 西安电子科技大学 | SAR (Synthetic Aperture Radar) image segmentation method based on dictionary learning and sparse representation |
CN102324047A (en) * | 2011-09-05 | 2012-01-18 | 西安电子科技大学 | Hyperspectral Image Object Recognition Method Based on Sparse Kernel Coding SKR |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8374442B2 (en) * | 2008-11-19 | 2013-02-12 | Nec Laboratories America, Inc. | Linear spatial pyramid matching using sparse coding |
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102129573A (en) * | 2011-03-10 | 2011-07-20 | 西安电子科技大学 | SAR (Synthetic Aperture Radar) image segmentation method based on dictionary learning and sparse representation |
CN102324047A (en) * | 2011-09-05 | 2012-01-18 | 西安电子科技大学 | Hyperspectral Image Object Recognition Method Based on Sparse Kernel Coding SKR |
Non-Patent Citations (1)
Title |
---|
Liang Tianyi et al. An image semantic classifier model based on sparse coding. Journal of East China University of Science and Technology (Natural Science Edition), 2007, vol. 33, no. 6, pp. 827-892. *
Also Published As
Publication number | Publication date |
---|---|
CN102867195A (en) | 2013-01-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102867195B (en) | Method for detecting and identifying a plurality of types of objects in remote sensing image | |
CN107346436B (en) | Visual saliency detection method fusing image classification | |
CN103310195B (en) | Based on LLC feature the Weakly supervised recognition methods of vehicle high score remote sensing images | |
CN105389550B (en) | It is a kind of based on sparse guide and the remote sensing target detection method that significantly drives | |
CN102722712B (en) | Multiple-scale high-resolution image object detection method based on continuity | |
CN101980250B (en) | Method for identifying target based on dimension reduction local feature descriptor and hidden conditional random field | |
CN106845341B (en) | Unlicensed vehicle identification method based on virtual number plate | |
CN106650731B (en) | A Robust License Plate and Vehicle Logo Recognition Method | |
CN105528595A (en) | Method for identifying and positioning power transmission line insulators in unmanned aerial vehicle aerial images | |
CN104657717B (en) | A kind of pedestrian detection method based on layering nuclear sparse expression | |
CN105956560A (en) | Vehicle model identification method based on pooling multi-scale depth convolution characteristics | |
Beyan et al. | Detecting abnormal fish trajectories using clustered and labeled data | |
CN110610165A (en) | A Ship Behavior Analysis Method Based on YOLO Model | |
CN104778457A (en) | Video face identification algorithm on basis of multi-instance learning | |
Yao et al. | R²IPoints: Pursuing Rotation-Insensitive Point Representation for Aerial Object Detection | |
CN104751475B (en) | A kind of characteristic point Optimum Matching method towards still image Object identifying | |
CN105224937A (en) | Based on the semantic color pedestrian of the fine granularity heavily recognition methods of human part position constraint | |
Su et al. | FSRDD: An efficient few-shot detector for rare city road damage detection | |
CN105574489A (en) | Layered stack based violent group behavior detection method | |
CN109002463A (en) | A kind of Method for text detection based on depth measure model | |
CN103617413A (en) | Method for identifying object in image | |
CN104732248A (en) | Human body target detection method based on Omega shape features | |
CN112784722A (en) | Behavior identification method based on YOLOv3 and bag-of-words model | |
CN103218823B (en) | Based on the method for detecting change of remote sensing image that core is propagated | |
CN102609715B (en) | Object type identification method combining plurality of interest point testers |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant |