Detailed Description
The present invention will be described in further detail with reference to specific examples, but embodiments of the present invention are not limited thereto.
Example 1
Referring to fig. 1, which is a flowchart of a SAR target recognition method based on fusion of ASC features and multi-scale depth features provided in an embodiment of the present invention, the method includes:
Step 1, acquiring original SAR complex images of an observation target, and extracting the attribute scattering centers corresponding to each SAR complex image using an improved image-domain sparse-representation ASC parameter estimation algorithm.
The flow of the existing fast attribute scattering center extraction algorithm based on image-domain sparse representation is shown in fig. 2. When estimating the parameters of the target attribute scattering centers, this algorithm performs an artificial zeroing operation: after one attribute scattering center is extracted, all pixels in the region covered by that scattering center are set to zero so that no repeated extraction can occur at the same position. This operation makes the attribute scattering center parameter estimates inaccurate. To address this problem, the present embodiment improves the fast attribute scattering center extraction algorithm based on image-domain sparse representation by removing the artificial zeroing operation that causes the inaccurate parameter estimation results.
In this embodiment, the improved image-domain sparse-representation ASC parameter estimation algorithm is used to extract the scattering centers of a SAR image S. The number of extracted scattering center points Q is set to 25, and each scattering center point corresponds to a feature vector; the feature vector of the i-th scattering center can be expressed as

$$\theta_i = [A_i, \alpha_i, x_i, y_i, L_i, \bar{\varphi}_i, \gamma_i]$$

where $A_i$ denotes the complex amplitude, $\alpha_i$ the frequency-dependence factor, $x_i$ and $y_i$ the position coordinates in the range and azimuth directions respectively, $L_i$ the length of the scattering center, and $\bar{\varphi}_i$ and $\gamma_i$ the orientation angle and azimuth-dependence factor of the scattering center, respectively.
Specifically, referring to fig. 3, which is a flowchart of the improved fast attribute scattering center extraction algorithm based on image-domain sparse representation according to an embodiment of the present invention, the algorithm includes:
11) First, the SAR echo signals are converted into the image domain. The problem to be solved by the improved fast attribute scattering center extraction algorithm based on image-domain sparse representation is still to estimate the attribute scattering center parameters of the target from the target's backscattered echoes, namely the number Q of attribute scattering centers forming the complex target and the parameter sets $\theta_m$, $m = 1, 2, \ldots, Q$, of those attribute scattering centers. This problem can be described as follows:
$$\min_{\{\sigma_q,\,\theta_q\}} Q \quad \text{s.t.} \quad \left\| E(f_k,\bar{\varphi}_h) - \sum_{q=1}^{Q} \sigma_q\, E(f_k,\bar{\varphi}_h;\theta_q) \right\|_2 \le \varepsilon \tag{1}$$

where f denotes the radar operating frequency, $\bar{\varphi}$ the aspect-angle range of the synthetic aperture, $f_k$ and $\bar{\varphi}_h$ the discretized frequency and aspect-angle samples, K and H the numbers of discrete points in the frequency and azimuth directions, x and y the pixel coordinates after conversion to the image domain, $f_0$ the center frequency, c the speed of light, $E(f,\bar{\varphi})$ the backscattered echo data of the target, $E(f,\bar{\varphi};\theta_q)$ the echo data of the q-th attribute scattering center, Q the total number of attribute scattering centers, $\sigma_q$ a sparse coefficient, $\theta_q$ the parameter set of the q-th scattering center, and $\varepsilon > 0$ an error coefficient; $E(f_k,\bar{\varphi}_h)$ and $E(f_k,\bar{\varphi}_h;\theta_q)$ denote the discretized versions of $E(f,\bar{\varphi})$ and $E(f,\bar{\varphi};\theta_q)$.

Equation (1) remains a sparse representation problem. By applying the same linear imaging operator $\beta\{\cdot\}$ to $E(f,\bar{\varphi})$ and $E(f,\bar{\varphi};\theta_q)$, as expressed in equation (2), equation (1) can be converted into its image-domain form, as expressed in equation (3):

$$S(x,y) = \beta\{E(f,\bar{\varphi})\}, \qquad D(x,y;\theta_q) = \beta\{E(f,\bar{\varphi};\theta_q)\} \tag{2}$$

$$\min_{\{\sigma_q,\,\theta_q\}} Q \quad \text{s.t.} \quad \left\| S(x,y) - \sum_{q=1}^{Q} \sigma_q\, D(x,y;\theta_q) \right\|_2 \le \varepsilon \tag{3}$$

where $S(x,y)$ and $D(x,y;\theta_q)$ denote the image-domain representations corresponding to $E(f,\bar{\varphi})$ and $E(f,\bar{\varphi};\theta_q)$, respectively.
12) The NOMP algorithm is used to solve for the attribute scattering center parameters of S(x,y), with a parameter fine-correction process added to the solving procedure, yielding a number of attribute scattering centers.
First, an initial dictionary is established and the residual image R(x,y) is initialized as R(x,y) = S(x,y). Then the optimized NOMP algorithm is used to extract the attribute scattering centers. The extraction of the attribute scattering centers of the observed target proceeds in four steps: atom selection, fine estimation of atom parameters, least-squares solution, and residual computation.
Specifically, step 12) includes:
a) An initial dictionary Φ is established, with the expression:

$$\Phi = \left\{ \bar{D}(x,y;\theta) : \theta \in \Theta_{loc} \cup \Theta_{dis} \right\}$$

where

$$\Theta_{loc} = \Theta_A \times \Theta_\alpha^{loc} \times \Theta_L^{loc} \times \Theta_{\bar{\varphi}}^{loc} \times \Theta_\gamma^{loc} \times \Theta_x \times \Theta_y$$
$$\Theta_{dis} = \Theta_A \times \Theta_\alpha^{dis} \times \Theta_L^{dis} \times \Theta_{\bar{\varphi}}^{dis} \times \Theta_\gamma^{dis} \times \Theta_x \times \Theta_y$$

Here $\bar{D}(x,y;\theta)$ denotes a normalized attribute scattering center image, and $\Theta_{loc}$ and $\Theta_{dis}$ denote the parameter sets corresponding to localized and distributed attribute scattering centers, respectively. The component sets correspond to the parameters A, α, L, $\bar{\varphi}$, γ, x, y, where A denotes the complex amplitude, α the frequency-dependence factor, x and y the position coordinates in the range and azimuth directions respectively, L the length of the scattering center, and $\bar{\varphi}$ and γ the orientation angle and azimuth-dependence factor of the scattering center respectively; "×" denotes the Cartesian product.
When the initial dictionary Φ is built, let $\Theta_A = \{1\}$, $\Theta_\alpha^{loc} = \{0\}$, $\Theta_\alpha^{dis} = \{0\}$, $\Theta_L^{loc} = \{0\}$, $\Theta_L^{dis} = \{2\Delta L, 4\Delta L, \ldots, 2N_L\Delta L\}$, $\Theta_\gamma^{loc} = \{0\}$ and $\Theta_\gamma^{dis} = \{0\}$. This is because the values of the frequency-dependence factor α and the azimuth-dependence factor γ have little influence on the attribute scattering center echo signal and can temporarily be ignored, so $\Theta_\alpha^{loc}$, $\Theta_\alpha^{dis}$, $\Theta_\gamma^{loc}$ and $\Theta_\gamma^{dis}$ are set to {0}. The initial dictionary Φ is thus built.
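For concreteness, the following sketch illustrates how such a discretized dictionary could be assembled as a Cartesian product of parameter grids. It is a minimal illustration under stated assumptions, not the patent's implementation: the grid values (N_L, ΔL, the orientation-angle samples) and all helper names are assumptions.

```python
# Minimal sketch of the dictionary initialization in step a); all grid values and
# names are illustrative assumptions, not the patent's actual settings.
import itertools
import numpy as np

N_L, dL = 4, 0.3          # assumed number of length samples and length step (m)
Nx, Ny = 128, 128         # image grid matching the 128x128 SAR chips

Theta_A       = [1.0]
Theta_a_loc   = [0.0]; Theta_a_dis = [0.0]   # frequency-dependence factor alpha ~ 0
Theta_L_loc   = [0.0]                        # localized ASCs have zero length
Theta_L_dis   = [2 * n * dL for n in range(1, N_L + 1)]  # {2dL, 4dL, ..., 2*N_L*dL}
Theta_g_loc   = [0.0]; Theta_g_dis = [0.0]   # azimuth-dependence factor gamma ~ 0
Theta_phi_dis = list(np.deg2rad(np.arange(-45, 46, 5)))  # assumed orientation grid
Theta_x, Theta_y = range(Nx), range(Ny)

def build_param_sets():
    """Theta_loc and Theta_dis as Cartesian products (A, alpha, L, phi_bar, gamma, x, y)."""
    loc = itertools.product(Theta_A, Theta_a_loc, Theta_L_loc, [0.0],
                            Theta_g_loc, Theta_x, Theta_y)
    dis = itertools.product(Theta_A, Theta_a_dis, Theta_L_dis, Theta_phi_dis,
                            Theta_g_dis, Theta_x, Theta_y)
    return list(loc) + list(dis)   # materialize so the set can be scanned repeatedly
```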
b) Atom selection is performed on the initial dictionary Φ: the atom $\theta_{i\_chose}$ having the greatest similarity to the current residual image R(x,y) is selected from Φ as the coarse estimate of the current i-th ASC parameter set.
Specifically, for a given initial dictionary Φ, the first step of the NOMP algorithm is to select from Φ the atom that best matches the residual image R(x,y), i.e., has the largest inner product with it, as shown in the following formula:

$$\theta_i = \arg\max_{\theta \in \Phi} \left| \sum_{x,y} R(x,y)\, \bar{D}^{*}(x,y;\theta) \right|$$

where $(\cdot)^{*}$ denotes the conjugate operation.
c) Judge whether the position parameters of the currently selected atom $\theta_{i\_chose}$ are the same as those of the atom selected in the previous round; if so, discard $\theta_{i\_chose}$ and, after removing it, reselect the atom with the greatest similarity as this round's coarse estimate; otherwise, execute step d).
Specifically, this embodiment optimizes the atom selection method. Denote the currently selected dictionary atom by $\theta_{i\_chose}$, i.e., $\theta_{i\_chose} = \theta_i$, and denote the dictionary atom selected in the previous execution of step b) by $\theta_{i\_last}$. Compare whether $\theta_{i\_chose}$ and $\theta_{i\_last}$ are the same atom: if $\theta_{i\_last} \neq \theta_{i\_chose}$, continue with the following steps for parameter estimation; if $\theta_{i\_last} = \theta_{i\_chose}$, give up the atom with the greatest current similarity and, after removing $\theta_{i\_last}$, select the atom with the greatest similarity as this round's $\theta_{i\_chose}$.
d) Taking the coarse estimate $\theta_{i\_chose}$ as the initial point, perform fine estimation of the i-th ASC parameters to obtain the fine estimate $\theta_{i,opt}$, and put the corresponding atom into the selected atom set $\Phi_{Gen}$.
Specifically, since most attribute scattering center parameters take continuous values, the parameters obtained in step b) are inaccurate and require further refined estimation. Therefore, taking the coarse estimate $\theta_{i\_chose}$ of the current i-th ASC parameters obtained in step b) as the initial point, the following equation is solved by Newton's method:

$$\theta_{i,opt} = \arg\max_{\theta \setminus A} \left| \sum_{x,y} R(x,y)\, \bar{D}^{*}(x,y;\theta) \right|$$

where $\theta \setminus A$ denotes the set of remaining parameters after removing the amplitude parameter A.
Denote by $\Phi_{Gen}$ the set of attribute scattering center images corresponding to the refined parameters $\theta_{i,opt}$, initialized as $\Phi_{Gen} = \varnothing$; after each fine estimation, the selected atom set is updated as $\Phi_{Gen} = \Phi_{Gen} \cup \{\bar{D}(x,y;\theta_{i,opt})\}$.
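A sketch of how the fine estimation of step d) might look in practice is given below, assuming a hypothetical renderer `render(theta)` that produces the normalized image-domain atom $\bar{D}(x,y;\theta)$; a derivative-free optimizer stands in for the Newton iteration named in the text.

```python
# Sketch of step d): refine the coarse atom parameters by maximizing |<R, D(theta)>|.
# `render` is an assumed helper producing the normalized image-domain atom; a
# derivative-free optimizer stands in for the Newton iteration named in the text.
import numpy as np
from scipy.optimize import minimize

def refine_atom(R, theta0, render):
    """theta0 excludes the amplitude A, which is recovered later by least squares."""
    def neg_corr(t):
        D = render(t)                    # normalized ASC image for parameters t
        return -abs(np.vdot(D, R))       # negative |inner product| with the residual
    res = minimize(neg_corr, theta0, method="Nelder-Mead")
    return res.x
```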
e) The input image S(x,y) is approximated by the atoms in $\Phi_{Gen}$, and the sparse coefficients are solved by the least-squares method.
Specifically, using least-squares estimation, the input S(x,y) is approximated by the atoms in $\Phi_{Gen}$, as shown in the following equation:

$$\sigma_{opt} = \arg\min_{\sigma} \left\| S(x,y) - \sum_{j=1}^{i} \sigma_j\, \bar{D}(x,y;\theta_{j,opt}) \right\|_2^2$$

where $\sigma = \{\sigma_1, \ldots, \sigma_i\}$ denotes the set of coefficients corresponding to each atom, and $\sigma_{opt}$ denotes the optimal coefficients for approximating the input S(x,y) using $\Phi_{Gen}$.
f) The residual image is updated according to the sparse coefficients.
Specifically, the update formula is:

$$R(x,y) = S(x,y) - \sum_{j=1}^{i} \sigma_{j,opt}\, \bar{D}(x,y;\theta_{j,opt})$$

Note that the residual is always recomputed from the original image S(x,y) rather than by zeroing pixels, which is what removes the artificial zeroing operation of the original algorithm.
g) Repeat steps b) through f) until no valid attribute scattering center can be extracted from the current residual image R(x,y), then exit the loop.
At this point, the extraction of the attribute scattering centers of the observed target using the improved algorithm is complete.
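Putting steps a) through g) together, a condensed sketch of the improved extraction loop might look as follows; `dictionary`, `render` and `refine` (e.g., the refine_atom sketch above) are assumed helpers, and the stopping rule on residual energy is an illustrative choice.

```python
# Condensed sketch of the improved NOMP loop (steps b)-g)); names are illustrative.
import numpy as np

def improved_nomp(S, dictionary, render, refine, q_max=25, tol=1e-3):
    R = S.copy()                              # residual initialized to the input image
    chosen, atoms, t_last, sigma = [], [], None, None
    for _ in range(q_max):
        # b) coarse selection: atom with the largest |inner product| with the residual
        scores = sorted(((abs(np.vdot(render(t), R)), t) for t in dictionary),
                        reverse=True, key=lambda s: s[0])
        t_chose = scores[0][1]
        # c) if the position (x, y) repeats the previous pick, take the runner-up
        if t_last is not None and t_chose[-2:] == t_last[-2:]:
            t_chose = scores[1][1]
        t_last = t_chose
        # d) fine estimation from the coarse initial point, then grow Phi_Gen
        t_opt = refine(R, np.asarray(t_chose, dtype=float), render)
        chosen.append(t_opt)
        atoms.append(render(t_opt).ravel())
        # e) least-squares coefficients over ALL atoms selected so far
        Phi = np.stack(atoms, axis=1)
        sigma, *_ = np.linalg.lstsq(Phi, S.ravel(), rcond=None)
        # f) residual recomputed from the ORIGINAL image -- no pixel zeroing
        R = (S.ravel() - Phi @ sigma).reshape(S.shape)
        # g) stop once no significant energy remains in the residual
        if np.linalg.norm(R) < tol * np.linalg.norm(S):
            break
    return chosen, sigma
```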
Preferably, in the present embodiment, 25 attribute scattering centers are extracted for each SAR complex image using the above method.
The method improves and optimizes on the shortcomings of the recently proposed fast image-domain ASC extraction algorithm: by changing how the residual image is processed when solving for the sparse coefficients and how the initial dictionary atoms are selected, the artificial zeroing operation is removed from the target ASC extraction process, so that the target's ASC parameter estimates are more accurate and the target reconstruction error is smaller.
Step 2, performing global and local image reconstruction from the attribute scattering centers, and obtaining global and local reconstructed images of different scales by downsampling.
21) The attribute scattering center parameters of the observed target obtained in step 1 are substituted into the defining expression of the attribute scattering center model to obtain the overall backscattered echo data of the target and the backscattered echo data of each individual attribute scattering center.
Specifically, according to the definition of the attribute scattering center model, the backscattered echo of a target in the high-frequency region can be regarded as the superposition of the echoes of many independent scattering points, in the following specific form:

$$E(f,\bar{\varphi}) = \sum_{q=1}^{Q} E(f,\bar{\varphi};\theta_q) + n(f,\bar{\varphi}) \tag{12}$$

where $E(f,\bar{\varphi})$ denotes the frequency-domain backscattered echo signal of the observed target, Q the total number of attribute scattering centers constituting the current complex target, and $\theta_q$ the parameter set of the q-th attribute scattering center, whose elements from left to right are the backscattering coefficient, frequency-dependence factor, length, orientation angle, azimuth-dependence factor, range coordinate and azimuth coordinate; $E(f,\bar{\varphi};\theta_q)$ denotes the echo signal of the q-th individual attribute scattering center, and $n(f,\bar{\varphi})$ denotes additive white Gaussian noise. The specific form of $E(f,\bar{\varphi};\theta_q)$ is:

$$E(f,\bar{\varphi};\theta_q) = A_q \left(j\frac{f}{f_0}\right)^{\alpha_q} \exp\!\left(-j\frac{4\pi f}{c}\left(x_q\cos\bar{\varphi} + y_q\sin\bar{\varphi}\right)\right) \mathrm{sinc}\!\left(\frac{2\pi f}{c}L_q\sin(\bar{\varphi}-\bar{\varphi}_q)\right) \exp\!\left(-2\pi f \gamma_q \sin\bar{\varphi}\right) \tag{13}$$

where $\mathrm{sinc}(\cdot) = \sin(\cdot)/(\cdot)$, $f_0$ denotes the radar operating center frequency, c the speed of light, $A_q$ the backscattering coefficient, $\alpha_q$ the frequency-dependence factor, $x_q$ and $y_q$ the position coordinates in the range and azimuth directions respectively, and $L_q$, $\bar{\varphi}_q$ and $\gamma_q$ the length, orientation angle and azimuth-dependence factor of the attribute scattering center, respectively.
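As a reference for the model itself, the following sketch transcribes equation (13) directly; the default center frequency is an assumed X-band, MSTAR-like value, and the function name is illustrative.

```python
# Direct transcription of equation (13), vectorized over a frequency/aspect grid;
# the default center frequency f0 is an assumption (X-band, MSTAR-like).
import numpy as np

def asc_echo(f, phi, A, alpha, x, y, L, phi_bar, gamma, f0=9.6e9, c=3e8):
    """Backscattered echo E(f, phi; theta) of one attributed scattering center.
    f: frequency grid (Hz); phi: aspect-angle grid (rad)."""
    term_freq = (1j * f / f0) ** alpha                                # (j f/f0)^alpha
    term_pos  = np.exp(-1j * 4 * np.pi * f / c
                       * (x * np.cos(phi) + y * np.sin(phi)))         # position phase
    # np.sinc(u) = sin(pi u)/(pi u), so passing u = 2 f L sin(.)/c realizes
    # sinc(2 pi f L sin(.)/c) with sinc(v) = sin(v)/v as in equation (13)
    term_len  = np.sinc(2 * f * L / c * np.sin(phi - phi_bar))
    term_azi  = np.exp(-2 * np.pi * f * gamma * np.sin(phi))          # azimuth dependence
    return A * term_freq * term_pos * term_len * term_azi

# Superposition per equation (12): E_total = sum over q of asc_echo(...) + noise
```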
Substituting the estimated scattering center parameters of the observed target into equation (13) yields the backscattered echo $E(f,\bar{\varphi};\theta_q)$ of each single scattering point; recombining them according to equation (12) yields the target backscattered echo reconstructed from the extracted attribute scattering center results; and applying the linear imaging operator $\beta\{\cdot\}$ to the frequency-domain echo yields the reconstructed image S(x,y).
22) The linear imaging operator is respectively applied to the overall backscattered echo data of the target and to the backscattered echo data of each individual attribute scattering center, correspondingly obtaining a global reconstruction map and local reconstruction maps of the individual scattering points;
Both the global reconstruction and the local reconstructions have a size of 128×128, identical to the original SAR complex image size, and 25 local reconstructions are obtained because 25 attribute scattering centers are extracted in this embodiment.
23) The global reconstruction of size 128×128 is downsampled to obtain a global reconstruction S_recon_all_64 of size 64×64 and a global reconstruction S_recon_all_32 of size 32×32, and the local reconstructions of size 128×128 are downsampled to obtain local reconstructions S_recon_single_64 of size 64×64.
It can be understood that, in this embodiment, corresponding thresholds are further set for the three kinds of images obtained after the downsampling operation, so as to obtain three kinds of binarized images, specifically as follows:
For the global reconstruction S_recon_all_64, a threshold t = 0.01 is set; all pixel values of S_recon_all_64 are compared with the threshold t, points with pixel values greater than t are set to 255, and points with pixel values less than t are set to 0, yielding a binary image B_recon_all_64.

For the 25 local reconstructions S_recon_single_64, a threshold t = 0.01 is set and each of the 25 local reconstructions is binarized separately: all pixel values of each S_recon_single_64 are compared with the threshold t, points with pixel values greater than t are set to 255, and points with pixel values less than t are set to 0, yielding 25 binary images B_recon_single_64.

For the global reconstruction S_recon_all_32, a threshold t = 0.01 is set; all pixel values of S_recon_all_32 are compared with the threshold t, points with pixel values greater than t are set to 255, and points with pixel values less than t are set to 0, yielding a binary image B_recon_all_32.
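The downsampling and thresholding above can be illustrated with a short sketch; since the text does not specify the downsampling method, 2×2 block averaging is used here as an assumed stand-in, and the placeholder input is purely illustrative.

```python
# Sketch of step 23) plus the thresholding; 2x2 block averaging is an assumed
# stand-in for the unspecified downsampling method.
import numpy as np

def downsample2(img):
    """Halve each spatial dimension by 2x2 block averaging."""
    h, w = img.shape
    return img.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))

def binarize(img, t=0.01):
    """Pixels with value > t become 255, the rest 0, as described above."""
    return np.where(img > t, 255, 0).astype(np.uint8)

S_recon_all_128 = np.abs(np.random.rand(128, 128))  # placeholder 128x128 reconstruction
S_recon_all_64  = downsample2(S_recon_all_128)      # 64 x 64
S_recon_all_32  = downsample2(S_recon_all_64)       # 32 x 32
B_recon_all_64  = binarize(S_recon_all_64)
B_recon_all_32  = binarize(S_recon_all_32)
```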
Step 3, constructing a deep neural network comprising a feature extraction module and a feature fusion module, wherein,
The feature extraction module is used for carrying out multi-scale feature extraction on the amplitude image corresponding to the original complex SAR image to obtain a multi-scale depth feature map;
the feature fusion module is used to perform feature fusion at different layers between the extracted multi-scale depth feature maps and the binary images corresponding to the reconstructed images.
First, a feature extraction module is constructed.
In this embodiment, the constructed feature extraction module comprises 12 convolution layers and 3 max-pooling layers. Its structure is, in order: a first convolution layer L_C1, a second convolution layer L_C2, a third-layer max-pooling layer L_p3, a fourth convolution layer L_C4, a fifth convolution layer L_C5, a sixth-layer max-pooling layer L_p6, a seventh convolution layer L_C7, an eighth convolution layer L_C8, a ninth convolution layer L_C9, a tenth-layer max-pooling layer L_p10, an eleventh convolution layer L_C11, a twelfth convolution layer L_C12, a thirteenth convolution layer L_C13, a fourteenth convolution layer L_C14 and a fifteenth convolution layer L_C15, wherein the network structure of the first 13 layers is the same as that of the first 13 layers of the VGG16Net network.
The depth global feature output by the fifth convolution layer L_C5 is used as the first-scale depth feature to be fused extracted by the feature extraction module;

the depth global feature output by the ninth convolution layer L_C9 is used as the second-scale depth feature to be fused extracted by the feature extraction module;

the depth global feature output by the fifteenth convolution layer L_C15 is used as the third-scale depth feature to be fused extracted by the feature extraction module.
Specifically, the parameters of each layer are set as follows: the numbers of convolution kernels of the 12 convolution layers are set, in order, to 64, 64, 128, 128, 256, 256, 256, 512, 512, 512, 256 and 32; the convolution kernel sizes are all set to 3×3, the convolution strides are all set to 1, and ReLU is used as the activation function; the kernel sizes of the 3 max-pooling layers are all 2×2 with stride 2.
Because the first 13 layers of the feature extraction module constructed in this embodiment are identical in structure to the first 13 layers of the VGG16Net network, when the network is subsequently trained, the parameters trained on the ImageNet dataset can be used as the initial parameters of the first 13 layers of the feature extraction network in this embodiment.
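A PyTorch sketch of this module is given below, assuming the layer grouping described above. The single-channel amplitude input is an assumption made for illustration (seeding from ImageNet weights would require adapting the 3-channel first layer, e.g., by replicating the amplitude image).

```python
# PyTorch sketch of the 15-layer feature extraction module; the first 13 layers
# mirror VGG16's first 13 layers. The single-channel input is an assumption.
import torch
import torch.nn as nn

def conv(cin, cout):
    return nn.Sequential(nn.Conv2d(cin, cout, kernel_size=3, stride=1, padding=1),
                         nn.ReLU(inplace=True))

class FeatureExtractor(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(conv(1, 64), conv(64, 64))        # L_C1, L_C2
        self.pool3  = nn.MaxPool2d(2, 2)                              # L_p3
        self.stage2 = nn.Sequential(conv(64, 128), conv(128, 128))    # L_C4, L_C5
        self.pool6  = nn.MaxPool2d(2, 2)                              # L_p6
        self.stage3 = nn.Sequential(conv(128, 256), conv(256, 256),
                                    conv(256, 256))                   # L_C7..L_C9
        self.pool10 = nn.MaxPool2d(2, 2)                              # L_p10
        self.stage4 = nn.Sequential(conv(256, 512), conv(512, 512), conv(512, 512),
                                    conv(512, 256), conv(256, 32))    # L_C11..L_C15

    def forward(self, x):                    # x: (N, 1, 128, 128) amplitude image
        f5  = self.stage2(self.pool3(self.stage1(x)))  # (N, 128, 64, 64) scale 1
        f9  = self.stage3(self.pool6(f5))              # (N, 256, 32, 32) scale 2
        f15 = self.stage4(self.pool10(f9))             # (N, 32, 16, 16)  scale 3
        return f5, f9, f15
```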
Then, a feature fusion module is constructed.
In this embodiment, the feature fusion module comprises a local feature map layer fusion unit and an overall feature map layer fusion unit, wherein,
the local feature map layer fusion unit fuses the first-scale and second-scale depth features to be fused, extracted by the feature extraction module, with the binary images corresponding to the three different reconstructed images in multiple fusion operations, correspondingly obtaining three fusion feature maps;
the overall feature map layer fusion unit is used to fuse the three fusion feature maps with the third-scale depth feature to be fused extracted by the feature extraction module, obtaining the final fusion feature.
Specifically, as shown in fig. 4, local feature fusion is performed three times in this embodiment: the depth global feature of layer 5 is fused with the 25 binary images B_recon_single_64 reconstructed from single scattering points; the depth global feature of layer 5 is fused with the binary image B_recon_all_64 of the overall reconstruction; and the depth global feature of layer 9 is fused with the binary image B_recon_all_32 of the overall reconstruction. The specific operations are as follows:
Step 1, first fusion:

The binary images B_recon_single_64 reconstructed from single scattering points have a total size of 64×64×25. They are multiplied along the channel dimension with the layer-5 depth global feature of the feature extraction module, of size 64×64×128, to obtain a fusion feature of size 64×64×25×128, and a global average pooling (Global Average Pooling, GAP) operation is performed on this fusion feature to obtain a feature of size 25×128.
In order to compress these 25 component features into one vector, statistical functions can be applied at corresponding positions of the 25 vectors. Considering representativeness and computational cost, this embodiment uses the max(·) and mean(·) statistical functions to complete the vector compression, as shown in equation (14):
C(·)=max(·)+mean(·) (14)
where C(·) denotes the fusion mode used: max(·) and mean(·) are applied at the corresponding positions of the 25 local component feature vectors, and the results are added, finally yielding a feature of size 1×128, namely the first fusion feature.
Step 2, second fusion:

The overall-reconstruction binary image B_recon_all_64 of size 64×64 is multiplied along the channel dimension with the layer-5 depth global feature of size 64×64×128 to obtain a 64×64×128 fusion feature, and a GAP operation is then performed on it to obtain a 1×128-dimensional feature, namely the second fusion feature.
Step 3, third fusion:

The overall-reconstruction binary image B_recon_all_32 of size 32×32 is multiplied along the channel dimension with the layer-9 depth global feature of size 32×32×256 to obtain a 32×32×256 fusion feature, and a GAP operation is then performed on it to obtain a 1×256-dimensional feature, namely the third fusion feature.
At the overall feature map layer, four features are fused in this embodiment: the final depth feature output by the feature extraction module, of size 16×16×32, which after a GAP operation gives a global network feature of size 1×32 (i.e., the third-scale depth feature to be fused); the 1×128-dimensional first fusion feature generated by the first local fusion; the 1×128-dimensional second fusion feature generated by the second local fusion; and the 1×256-dimensional third fusion feature generated by the third local fusion. These four features are concatenated along the feature dimension to yield an overall fusion feature of size 1×544.
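The three local fusions and the overall concatenation can be sketched as follows, following the shapes stated above; the binary maps are assumed rescaled to {0, 1} before multiplication, and f5/f9/f15 refer to the FeatureExtractor sketch above. All names are illustrative.

```python
# Sketch of the fusion module following the shapes above; binary maps are assumed
# rescaled to {0, 1}. f5/f9/f15 come from the FeatureExtractor sketch above.
import torch

def fuse(f5, f9, f15, B_single_64, B_all_64, B_all_32):
    # first fusion: (N,25,64,64) x (N,128,64,64) -> (N,25,128,64,64), GAP -> (N,25,128)
    v = (B_single_64.unsqueeze(2) * f5.unsqueeze(1)).mean(dim=(-2, -1))
    fus1 = v.max(dim=1).values + v.mean(dim=1)        # C(.) = max(.) + mean(.) -> (N,128)
    # second fusion: global 64x64 binary map x layer-5 features, then GAP -> (N,128)
    fus2 = (B_all_64.unsqueeze(1) * f5).mean(dim=(-2, -1))
    # third fusion: global 32x32 binary map x layer-9 features, then GAP -> (N,256)
    fus3 = (B_all_32.unsqueeze(1) * f9).mean(dim=(-2, -1))
    g = f15.mean(dim=(-2, -1))                        # GAP on the final feature -> (N,32)
    return torch.cat([g, fus1, fus2, fus3], dim=1)    # (N, 544)
```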
And finally, constructing a fully-connected network to classify the final fusion characteristics to obtain a target classification result.
The fully connected network FC comprises two fully connected layers, an activation layer, a Dropout layer and a classifier layer. Its structure is, in order: a first fully connected layer L_F1, a second-layer activation layer L_F2, a third-layer Dropout layer L_d3, a fourth fully connected layer L_F4 and a fifth classification layer L_F5. The input of the network is the fused 544-dimensional feature vector, and the output is a 3-dimensional category prediction vector $\hat{y}$.
The parameters of each layer are set as follows: the dimensions of the two fully connected layers are 544×512 and 512×3, the activation layer uses the ReLU activation function, the drop probability of the Dropout layer is 0.5, and the classifier layer uses a softmax classifier.
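A corresponding sketch of this classification head is shown below; folding the softmax of L_F5 into the training loss is a conventional implementation choice, not something the text mandates.

```python
# Sketch of the FC head; the softmax of L_F5 is folded into the cross-entropy loss.
import torch.nn as nn

classifier = nn.Sequential(
    nn.Linear(544, 512),   # L_F1
    nn.ReLU(),             # L_F2
    nn.Dropout(p=0.5),     # L_d3
    nn.Linear(512, 3),     # L_F4 (logits; softmax applied in the loss / at inference)
)
```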
The above parts are combined together in the order shown in fig. 4 to obtain the deep neural network ψ.
It can be appreciated that after the deep neural network is built, the network needs to be trained, and then the trained network is used for target recognition.
In this embodiment, the deep neural network is trained in the following manner:
Extracting attribute scattering centers from the actual measured SAR complex images with the labels, performing global and local reconstruction, and obtaining global and local reconstruction images with different scales in a downsampling mode;
Inputting the labeled SAR complex images and the binary images corresponding to the global and local reconstruction images with different scales into a constructed deep neural network for forward propagation as shown in fig. 4;
Calculating the classification loss, and updating the network parameters by back propagation to obtain the trained network, where the classification loss uses the cross-entropy loss function shown below:

$$L = -\frac{1}{n} \sum_{i=1}^{n} y_i^{T} \log \hat{y}_i$$

where n denotes the number of training samples, $y_i$ the one-hot-encoded category label of the i-th input image, and $\hat{y}_i$ the corresponding predicted category vector.
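A minimal training-loop sketch under this loss is shown below; the optimizer, learning rate, and the `net`/`loader` objects are assumptions, with `net` standing for the combined extractor, fusion and classifier sketched above.

```python
# Minimal training sketch; `net` (extractor + fusion + classifier) and `loader`
# are assumed objects, and the optimizer/learning rate are illustrative choices.
import torch
import torch.nn.functional as F

optimizer = torch.optim.Adam(net.parameters(), lr=1e-4)
for images, b_single, b_all64, b_all32, labels in loader:
    logits = net(images, b_single, b_all64, b_all32)   # forward propagation (fig. 4)
    loss = F.cross_entropy(logits, labels)             # cross-entropy classification loss
    optimizer.zero_grad()
    loss.backward()                                    # back propagation
    optimizer.step()
```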
According to the method, the physical properties of the target reflected by the ASC model are fully utilized: the accurately estimated ASC parameter sets are used to perform multiple types of reconstruction of the observed target in the image domain and are fused with depth features of different scales, providing more information to the network and thereby improving target recognition performance.
Step 4, inputting the original complex SAR images of the test data and the binary images corresponding to their reconstructed images into the trained deep neural network for processing, and outputting the target recognition results.
In the deep-network training process, the invention uses parameter migration: the parameters of the first 13 layers of a VGG16Net fully trained on the ImageNet dataset are used as the initial parameters of the deep network, so that the network can achieve good recognition performance with fewer training iterations, which also improves training efficiency.
Example 2
The method provided by the invention is simulated taking a specific scene as an example, in order to verify the effectiveness of the attribute scattering center extraction algorithm of the invention.
Specifically, in this experiment, the T72 original SAR image shown in fig. 5 is taken as an extraction object, and the method of the present invention is used to extract the attribute scattering center and reconstruct the image, and the results are shown in fig. 6 and 7.
Fig. 6 shows the results of attribute scattering center extraction and image reconstruction performed on the T72 original SAR image using the algorithm of the present invention, wherein (a) is the global reconstruction of size 128×128 pixels reconstructed using the 25 extracted scattering center points, (b) is the global reconstruction of size 64×64 pixels obtained by downsampling, (c) is a local reconstruction of size 64×64 pixels obtained by downsampling, and (d) is the global reconstruction of size 32×32 pixels obtained by downsampling.
Fig. 7 shows the binarized images corresponding to the reconstruction results in fig. 6, wherein (a) is the global-reconstruction binarized image of size 64×64 pixels obtained by downsampling, (b) is the local-reconstruction binarized image of size 64×64 pixels obtained by downsampling, and (c) is the global-reconstruction binarized image of size 32×32 pixels obtained by downsampling.
In order to further verify the SAR target recognition method based on fusion of ASC features and multi-scale depth features, this embodiment also evaluates the method on the public MSTAR moving and stationary target dataset.
The MSTAR dataset used in the experiment consists of complex images with a resolution of 0.3 m × 0.3 m, each of size 128×128 pixels. This experiment uses the MSTAR three-category recognition scenario, with the three target categories being T72, BMP2 and BTR70. BMP2 and T72 each contain three different serial numbers; the training data for each class contains only one serial number of each type, while the test data contains all serial numbers of each type. The experimental data are detailed in table 1.
TABLE 1 MSTAR class 3 target recognition scenario
Table 2 below shows the recognition results of the method of the present invention on the MSTAR three-category recognition data shown in table 1 above, compared with two existing recognition methods: the SAR ATR method combining attributed scattering centers with a convolutional neural network (ACNNC for short, from the article "A Convolutional Neural Network Combined with Attributed Scattering Centers for SAR ATR", IEEE Transactions on Geoscience and Remote Sensing, Zhou Y, 2021) and the multiscale SAR ATR convolutional neural network based on component analysis (CA-MCNN for short, from the article "Multiscale CNN Based on Component Analysis for SAR ATR", IEEE Transactions on Geoscience and Remote Sensing, Li Y, 2021).
TABLE 2 Detailed recognition results of different recognition methods on MSTAR three-category target data

| Method | Recognition accuracy |
| The method provided by the invention | 0.9890 |
| ACNNC | 0.9795 |
| CA-MCNN | 0.9861 |
Because SAR imagery suffers from insufficient training data, the small-sample problem is very prominent in SAR image recognition. To further verify the effectiveness of the invention, small-sample experiments were carried out on the MSTAR three-category target data shown in table 1: small-sample conditions were simulated by randomly selecting a given proportion of the training samples, and the average of 10 experiments was taken as the recognition result. The method is compared with several existing small-sample SAR target recognition methods: the angular rotation generative network for limited training data (ARGN for short, from the article "SAR Target Recognition With Limited Training Data Based on Angular Rotation Generative Network", IEEE Geoscience and Remote Sensing Letters, Sun Y, 2019), the improved polar mapping classifier (M-PMC for short, from the article "Modified Polar Mapping Classifier for SAR Automatic Target Recognition", IEEE Transactions on Aerospace and Electronic Systems, Park J, 2014), the convolutional neural network with data augmentation (DA-CNN for short, from the article "Convolutional Neural Network with Data Augmentation for SAR Target Recognition", IEEE Geoscience and Remote Sensing Letters, Ding J, 2016), and the all-convolutional network (A-ConvNet for short, from the article "Target Classification Using the Deep Convolutional Networks for SAR Images", IEEE Transactions on Geoscience and Remote Sensing, Chen S, 2016). The recognition results of these methods and the proposed method in the small-sample environment are shown in table 3.
TABLE 3 comparison of the identification performance of the inventive method with some prior methods in a small sample environment
The results in table 3 above were all obtained by randomly selecting training samples in the corresponding proportions, where the sample ratio denotes the ratio of the number of randomly selected samples to the number of all training samples. For each sample ratio, 10 experiments were performed and their average was taken as the final recognition result. It can be seen that when less than half of the training samples are used, the recognition accuracy of the method exceeds that of the other comparison methods, which effectively demonstrates the effectiveness of the model when training data are insufficient; when the sample ratio is 0.1, i.e., each class of training data has only 22 samples, the average recognition accuracy over the 1365 test samples reaches 89.46%.
The above experimental results show that, in the recognition experiments on the MSTAR three-category target data, the method obtains better recognition results both under the full-training-sample condition and under small-sample conditions with less than 50% of the training data. This verifies the validity of the method and shows that it can better exploit global and local information to obtain an effective and stable target feature representation, demonstrating its effectiveness and feasibility.
The foregoing is a further detailed description of the invention in connection with the preferred embodiments, and it is not intended that the invention be limited to the specific embodiments described. It will be apparent to those skilled in the art that several simple deductions or substitutions may be made without departing from the spirit of the invention, and these should be considered to be within the scope of the invention.