1. Introduction
Radar indication technology is necessary for detecting ground/sea and low-altitude moving targets due to its all-day and all-weather capability. Since ground-based radars are susceptible to occlusion effects and low-altitude blind spots, airborne radar has significant advantages for detecting such targets. Moving target indication (MTI) is one of the most critical tasks in airborne radar. MTI determines the presence or absence of a moving target with a certain relative velocity in a scene of interest, also referred to as the cell under test (CUT). However, the target is difficult to detect because of severe ground and sea clutter when the airborne radar operates in a downward-looking mode. Moreover, one-dimensional filtering techniques based on conventional moving target indication and moving target detection (MTD) often suppress clutter ineffectively, especially in non-homogeneous environments. Therefore, an efficient method for clutter suppression and target indication is needed.
To suppress the clutter and detect the moving target effectively, space–time adaptive processing (STAP) was proposed. STAP uses two-dimensional joint adaptive filtering in the spatial and temporal domains to achieve effective clutter suppression, and it has been widely used in airborne radar systems [1,2,3]. In general, the optimal filters for MTI and STAP require a known clutter and noise covariance matrix (CNCM) of the CUT. Since the clutter covariance matrix (CCM) of the CUT is unknown in practice, Reed et al. [4] proposed an adaptive STAP filter that uses the sample covariance matrix (SCM) in place of the true CCM, which is called sample matrix inversion (SMI). To obtain good adaptive clutter suppression performance, the training samples and the CUT need to have the same clutter statistical characteristics, that is, to satisfy the independent and identically distributed (IID) condition. However, in non-homogeneous environments, SMI faces two main challenges in practice. First, the samples of the range cells near the CUT may not satisfy the IID condition, so simple training sample selection methods suffer a large performance loss. Second, the number of IID samples among all available training samples is limited and is less than twice the system’s degrees of freedom. These problems degrade the adaptive clutter suppression performance, which in turn causes a loss of target detection performance. Therefore, it is of great theoretical significance and practical value to study adaptive clutter suppression and target indication techniques in non-homogeneous environments.
To address the problem of insufficient training samples, researchers have proposed techniques such as training sample selection and single-sample processing, which mitigate or overcome its impact. Such methods are collectively referred to as non-homogeneous STAP and include classic methods such as Doppler compensation (DC) [5], angle-Doppler compensation (ADC) [6], and adaptive ADC (A2DC) [7]. Although these algorithms can improve clutter suppression performance in non-homogeneous environments, they also have shortcomings. For example, DC, ADC and A2DC all use a single point as the reference for mainlobe center compensation and therefore cannot compensate the clutter spectrum in all directions simultaneously, which degrades their MTI performance. At present, advanced STAP techniques based on knowledge-aided (KA) processing [8,9,10,11] and sparse recovery (SR) [12,13,14,15,16] are also applied in airborne radar MTI and can reduce the negative effects of clutter non-homogeneity to a certain extent. KA-STAP techniques aim to improve the performance of conventional STAP algorithms through prior knowledge of various forms and properties. However, the exact form of the prior knowledge is difficult to obtain, resulting in poor real-time performance. Although SR-STAP can effectively reduce the demand for IID training samples, it comes with a heavy computational load and suffers from grid mismatch. Therefore, the existing STAP techniques have limited clutter suppression ability in practice because of insufficient training samples, which reduces the detection performance. As a result, the emphasis of STAP-MTI is mainly on overcoming the limited number of IID samples available for CCM estimation.
For image processing in the MTI, different approaches have been investigated [17,18,19]. For signal data processing, STAP adaptively filters the space–time observation (STO) echo data, while the subsequent constant-false-alarm-rate (CFAR) detector can be considered a two-class classifier in STAP-MTI, whose two classes represent the target-present and target-absent cases. In addition to STAP-MTI, researchers have recently proposed alternative MTI methods. MTI methods based on pattern recognition transform the traditional filtering problem into a pattern classification problem. Khatib et al. [20] proposed a STAP method based on least squares for moving target indication (LI-MTI). The method avoids CCM estimation and constructs a classifier to process the radar space–time echo data. To reduce the moving target energy required by LI-MTI, Khatib et al. [21] constructed a polynomial classifier for target indication (POLY-MTI). However, because of their limited fitting and feature extraction abilities, these methods need further improvement in non-homogeneous clutter environments and at low signal-to-clutter ratio (SCR).
In recent years, deep learning technologies, represented by the convolutional neural network (CNN), have developed rapidly and have gained extensive attention and great success in the field of computer vision [22,23,24]. Deep learning automatically learns to extract hierarchical and expressive features directly from the input data. It provides new ideas for problems such as radar image processing [25,26,27,28] and radar signal processing [29,30,31]. Recently, deep learning techniques have been applied to clutter suppression in airborne radar. CNN-STAP [32] uses the low-resolution clutter angle-Doppler spectrum to reconstruct the high-resolution clutter angle-Doppler spectrum and then calculates the CCM to derive the STAP weight vector. However, this method uses the CNN only for clutter suppression. For target indication in airborne radar, CNN-MTI [31] uses AlexNet to construct a classifier. However, the CNN-MTI method suffers from a large number of network parameters and a low detection accuracy.
Despite its widespread applications and great advantages, deep learning has rarely been applied to angle-Doppler domain estimation tasks in the field of MTI. We propose an end-to-end moving target indication (ETE-MTI) method based on the D2CNN to improve the target detection capability with a few training samples. First, a training dataset is established that covers various realistic situations in non-homogeneous clutter environments, such as aircraft crabbing, array errors, and internal clutter motion (ICM). Then, a five-layer D2CNN is built and its parameters are trained. Finally, the high-resolution target spectrum predicted by the trained network is used to obtain the velocity and spatial information of the target. To the best of our knowledge, this paper is the first work to apply deep learning techniques to angle-Doppler spectrum estimation for target indication in non-homogeneous clutter environments.
The main contributions of this paper are as follows:
- (1) The proposed method obtains higher detection accuracy using only a few samples, which addresses the problem of insufficient samples in non-homogeneous clutter environments. The simulations demonstrate that the proposed ETE-MTI has a much lower computational load and a higher detection accuracy in non-homogeneous and low-SCR environments than the existing CNN-MTI [31] method;
- (2) A five-layer D2CNN is constructed to meet the high-resolution requirement and achieves end-to-end target indication with improved detection accuracy. The D2CNN’s input is the low-resolution clutter-plus-target angle-Doppler spectrum estimated from a few samples. The label is the high-resolution target angle-Doppler spectrum constructed from the exact angle and Doppler. Once trained, the D2CNN can predict the target with high resolution from a few samples in near real-time. We also take into account the spatial–temporal sparsity of the clutter and target, which helps network design and training.
The rest of the paper is organized as follows. In Section 2, the space–time signal model is introduced. In Section 3, the deep learning framework and the principle of the proposed ETE-MTI method are presented. In Section 4, the simulation results and discussion are provided to demonstrate the proposed method’s computational efficiency and target detection performance. The conclusions are presented in Section 5.
Notation: Boldface lowercase letters denote vectors and boldface uppercase letters denote matrices. The transposition and conjugate transposition operations are denoted by superscripts $T$ and $H$, respectively. The symbols $\otimes$, $\odot$ and $*$ represent the Kronecker product, Hadamard product and convolution, respectively. $E[\cdot]$ is the notation of the expectation operation. $\|\cdot\|_F$ denotes the Frobenius norm.
2. Signal Model
Assume that the antenna array of the airborne pulsed phased-array radar system is a uniform linear array (ULA) consisting of $N$ elements, moving with constant velocity $v$ at altitude $H$. The spacing between adjacent array elements is $d = \lambda/2$, where $\lambda$ is the wavelength. Figure 1 shows the geometry between the ULA and the ground. The pulse repetition frequency (PRF) is $f_r$, and $M$ pulses are transmitted at a constant PRF during each coherent processing interval (CPI). Set $O\text{-}XYZ$ as the carrier coordinate system, where the ULA is placed parallel to the $Y$-axis, and the angle between $v$ and the $Y$-axis is $\theta_p$. $P$ is a clutter patch of a certain range cell on the ground plane. The angle of the clutter patch relative to the antenna array is $\psi$, and the azimuth and elevation angles relative to the antenna axis are $\theta$ and $\varphi$, respectively.
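The space–time snapshot vector $\mathbf{x} \in \mathbb{C}^{NM \times 1}$ can be expressed as:
$$\mathbf{x} = \mathbf{x}_t + \mathbf{x}_c + \mathbf{n}$$
where $\mathbf{x}_t$ is the target space–time snapshot vector, $\mathbf{x}_c$ is the clutter space–time snapshot vector, and $\mathbf{n}$ is the complex Gaussian white noise vector.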
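In the ULA radar system, let the target velocity relative to the airborne radar platform be $v_t$. The spatial steering vector $\mathbf{s}_s(f_{s,t})$ and the temporal steering vector $\mathbf{s}_t(f_{d,t})$ can then be written as:
$$\mathbf{s}_s(f_{s,t}) = \left[1, e^{j2\pi f_{s,t}}, \ldots, e^{j2\pi (N-1) f_{s,t}}\right]^T$$
$$\mathbf{s}_t(f_{d,t}) = \left[1, e^{j2\pi f_{d,t}}, \ldots, e^{j2\pi (M-1) f_{d,t}}\right]^T$$
where $f_{s,t}$ and $f_{d,t}$ are the normalized spatial frequency (NSF) and normalized Doppler frequency (NDF) of the target, respectively.
The space–time snapshot vector of a single-point target $\mathbf{x}_t$ can be expressed as the product of the complex amplitude $\alpha_t$ and the corresponding space–time steering vector $\mathbf{s}(f_{s,t}, f_{d,t})$ of the target:
$$\mathbf{x}_t = \alpha_t\, \mathbf{s}(f_{s,t}, f_{d,t})$$
where
$$\mathbf{s}(f_{s,t}, f_{d,t}) = \mathbf{s}_t(f_{d,t}) \otimes \mathbf{s}_s(f_{s,t})$$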
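For the clutter scattering point $P$ of a certain range gate, its spatial steering vector $\mathbf{s}_s(f_{s,c})$ and temporal steering vector $\mathbf{s}_t(f_{d,c})$ can be written as:
$$\mathbf{s}_s(f_{s,c}) = \left[1, e^{j2\pi f_{s,c}}, \ldots, e^{j2\pi (N-1) f_{s,c}}\right]^T$$
$$\mathbf{s}_t(f_{d,c}) = \left[1, e^{j2\pi f_{d,c}}, \ldots, e^{j2\pi (M-1) f_{d,c}}\right]^T$$
where $f_{s,c}$ and $f_{d,c}$ are the clutter patch’s NSF and NDF.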
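As a minimal sketch of the steering vectors described above, the following NumPy snippet builds a spatial and a temporal steering vector from a normalized frequency and combines them into a space–time steering vector with a Kronecker product. The function names, the temporal-then-spatial Kronecker ordering, and the numeric values are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def steering_vector(freq, n):
    """Return [1, e^{j2*pi*f}, ..., e^{j2*pi*(n-1)*f}] for a normalized frequency f."""
    return np.exp(2j * np.pi * freq * np.arange(n))

def space_time_steering(f_s, f_d, n_elems, n_pulses):
    """Space-time steering vector, assumed here to be temporal (x) spatial."""
    return np.kron(steering_vector(f_d, n_pulses), steering_vector(f_s, n_elems))

# Illustrative sizes and frequencies (hypothetical, not from Table 1).
N, M = 8, 8
s = space_time_steering(f_s=0.0, f_d=0.25, n_elems=N, n_pulses=M)

# Single-point target snapshot: complex amplitude times the steering vector.
alpha_t = 0.5 + 0.5j
x_t = alpha_t * s
```

The same `steering_vector` helper serves both the target and the clutter patches, since the two differ only in the NSF and NDF passed in.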
Considering the non-ideal factors in non-homogeneous clutter environments with array errors and internal clutter motion (ICM), the clutter space–time snapshot vector of each range cell is the accumulation of the echo signals of the clutter patches at different ambiguous ranges. Assuming that each clutter scattering point is statistically independent, the clutter space–time snapshot is defined as:
$$\mathbf{x}_c = \sum_{i=1}^{N_a} \sum_{k=1}^{N_c} \alpha_{i,k} \left[\mathbf{t}_t \odot \mathbf{s}_t\!\left(f_{d,c}^{i,k}\right)\right] \otimes \left[\mathbf{t}_s \odot \mathbf{s}_s\!\left(f_{s,c}^{i,k}\right)\right]$$
where $N_a$, $N_c$, $\alpha_{i,k}$ denote the number of ambiguous range rings, the number of clutter scattering points on a single range ring, and the complex scattering amplitude of the $k$th clutter scattering point on the $i$th ambiguous range ring, respectively; $\mathbf{t}_t$ represents the real temporal weight vector brought by ICM; $\mathbf{t}_s$ represents the real spatial weight vector caused by the array errors. $\alpha_{i,k}$ obeys the complex Gaussian distribution with zero mean and variance $\sigma_{i,k}^2$.
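Since each clutter block is statistically independent and $\alpha_{i,k}$ is a Gaussian random variable with zero mean and variance $\sigma_{i,k}^2$, the corresponding CCM of this clutter data is defined as:
$$\mathbf{R}_c = E\!\left[\mathbf{x}_c \mathbf{x}_c^H\right] = \sum_{i=1}^{N_a} \sum_{k=1}^{N_c} \sigma_{i,k}^2 \left(\mathbf{s}_{i,k} \mathbf{s}_{i,k}^H\right) \odot \left(\mathbf{T}_t \otimes \mathbf{T}_s\right)$$
where $\mathbf{s}_{i,k} = \mathbf{s}_t\!\left(f_{d,c}^{i,k}\right) \otimes \mathbf{s}_s\!\left(f_{s,c}^{i,k}\right)$ denotes the clutter space–time steering vector; the time autocorrelation matrix is $\mathbf{T}_t = \mathrm{Toeplitz}\left(\rho(0), \rho(1), \ldots, \rho(M-1)\right)$ due to the ICM, where $\rho(m) = \exp\!\left(-8\pi^2 \sigma_v^2 m^2 / (\lambda^2 f_r^2)\right)$; $\sigma_v^2$ represents the variance of the spreading of the clutter spectrum caused by the wind speed and $\lambda$ denotes the wavelength; $\mathbf{T}_s$ denotes the spatial autocorrelation matrix caused by the array errors.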
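The effect of ICM on the covariance can be illustrated with a covariance-taper sketch. The snippet below assumes a commonly used Gaussian ICM model, in which the correlation at pulse lag $m$ is $\exp(-8\pi^2\sigma_v^2 m^2/(\lambda^2 f_r^2))$; this specific form, the function names, and all numeric values are assumptions for illustration. Array errors are switched off by using an all-ones spatial matrix.

```python
import numpy as np

def icm_temporal_taper(n_pulses, sigma_v, wavelength, prf):
    """Toeplitz temporal autocorrelation matrix under an assumed Gaussian ICM model."""
    lags = np.arange(n_pulses)
    rho = np.exp(-8.0 * np.pi**2 * sigma_v**2 * lags**2 / (wavelength**2 * prf**2))
    # Entry (p, q) depends only on |p - q|, which makes the matrix Toeplitz.
    idx = np.abs(lags[:, None] - lags[None, :])
    return rho[idx]

def apply_taper(R_ideal, T_t, T_s):
    """Hadamard product of an ideal CCM with the space-time taper T_t (x) T_s."""
    return R_ideal * np.kron(T_t, T_s)

# Hypothetical parameters: 4 pulses, 3 elements, 0.5 m/s spread, 0.3 m wavelength, 2 kHz PRF.
M, N = 4, 3
T_t = icm_temporal_taper(M, sigma_v=0.5, wavelength=0.3, prf=2000.0)
T_s = np.ones((N, N))  # no array errors in this illustration

# Rank-one "ideal" clutter covariance from one patch at NSF = 0.1, NDF = 0.2.
s = np.kron(np.exp(2j * np.pi * 0.2 * np.arange(M)),
            np.exp(2j * np.pi * 0.1 * np.arange(N)))
R_ideal = np.outer(s, s.conj())
R_tapered = apply_taper(R_ideal, T_t, T_s)
```

Because the taper has a unit diagonal, it leaves the per-channel clutter power unchanged and only reduces the pulse-to-pulse correlation, which is what broadens the clutter spectrum along the Doppler axis.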
In general, the CCM is unknown, so it is usually obtained by maximum likelihood estimation (MLE) using the datasets adjacent to the CUT as training samples. Hence, the corresponding covariance matrix can be estimated as:
$$\hat{\mathbf{R}} = \frac{1}{L} \sum_{l=1}^{L} \mathbf{x}_l \mathbf{x}_l^H$$
where $L$ is the number of training samples and $\mathbf{x}_l$ represents the STO data of the $l$th training sample.
According to the Reed–Mallett–Brennan (RMB) rule, the number of training samples must be at least twice the number of the system degrees of freedom to keep the loss of SNR within 3 dB. After obtaining the CCM, the space–time adaptive optimal weight vector can be obtained:
$$\mathbf{w} = \mu \hat{\mathbf{R}}^{-1} \mathbf{s}$$
where $\mu$ is the normalization constant and $\mathbf{s}$ is the target space–time steering vector. It can be seen that if the estimated CCM is inaccurate, the clutter suppression performance of the calculated space–time adaptive filter weight vector falls well short of that of the theoretical optimal STAP filter weight vector, which affects the performance of subsequent target detection.
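The sample covariance estimate and the adaptive weight vector can be sketched as follows. The normalization used here, $\mu = 1/(\mathbf{s}^H \hat{\mathbf{R}}^{-1} \mathbf{s})$, is one common choice and is an assumption, as are the function names, the optional diagonal loading term, and the random training data.

```python
import numpy as np

def sample_covariance(X, loading=0.0):
    """Sample covariance of the columns of X (shape: NM x L), with optional diagonal loading."""
    dim, n_samples = X.shape
    R_hat = X @ X.conj().T / n_samples
    return R_hat + loading * np.eye(dim)

def smi_weights(R_hat, s):
    """SMI weight vector w = mu * R^{-1} s, with mu chosen so that w^H s = 1 (assumed)."""
    r_inv_s = np.linalg.solve(R_hat, s)   # avoids forming the explicit inverse
    return r_inv_s / (s.conj() @ r_inv_s)

# Hypothetical example: 16 degrees of freedom and L = 2 * 16 training snapshots (RMB rule).
rng = np.random.default_rng(0)
dim = 16
L = 2 * dim
X = (rng.standard_normal((dim, L)) + 1j * rng.standard_normal((dim, L))) / np.sqrt(2)
s = np.exp(2j * np.pi * 0.2 * np.arange(dim))

R_hat = sample_covariance(X)
w = smi_weights(R_hat, s)
```

When fewer than `dim` snapshots are available, `R_hat` is singular and the solve step fails unless `loading` is set to a positive value, which is one way to see why the few-sample regime is difficult for SMI.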
Due to the severe clutter, noise and jamming, the moving target is always buried in the interference. The goal of MTI is to detect the moving target’s Doppler frequency and spatial frequency from the STO. In this paper, we make use of the D2CNN to learn the distribution characteristics of the clutter and the target. The D2CNN extracts information about the target directly from the clutter-plus-target spectrum. Hence, the proposed method avoids reconstructing the clutter spectrum to achieve the end-to-end target indication for airborne radar.
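To make the idea of a clutter-plus-target spectrum estimated from a few samples concrete, the sketch below computes a Fourier angle-Doppler power spectrum $P(f_s, f_d) = \mathbf{s}^H \hat{\mathbf{R}} \mathbf{s}$ on a grid that is oversampled by a discretization factor in each dimension. The estimator, the grid definition, the snapshot ordering, and the function name are assumptions for illustration; the spectrum construction actually used for the network input is not reproduced in this section.

```python
import numpy as np

def fourier_spectrum(X, n_elems, n_pulses, rho_s=6, rho_d=6):
    """Fourier angle-Doppler power spectrum averaged over the columns of X.

    X has shape (N*M, L) with pulse-major ordering (index = m*N + n).
    Returns the power on an (rho_d*M, rho_s*N) grid and the two frequency axes.
    """
    n_fs, n_fd = rho_s * n_elems, rho_d * n_pulses
    f_s = np.arange(n_fs) / n_fs - 0.5
    f_d = np.arange(n_fd) / n_fd - 0.5
    A_s = np.exp(2j * np.pi * np.outer(np.arange(n_elems), f_s))   # (N, n_fs)
    A_t = np.exp(2j * np.pi * np.outer(np.arange(n_pulses), f_d))  # (M, n_fd)
    power = np.zeros((n_fd, n_fs))
    for l in range(X.shape[1]):
        snap = X[:, l].reshape(n_pulses, n_elems)    # rows: pulses, columns: elements
        proj = A_t.conj().T @ snap @ A_s.conj()      # s^H x for every grid point
        power += np.abs(proj) ** 2
    return power / X.shape[1], f_s, f_d

# Hypothetical noise-free check: one target at NSF = 0, NDF = 0.25.
N, M = 8, 8
x = np.kron(np.exp(2j * np.pi * 0.25 * np.arange(M)), np.ones(N))
power, f_s, f_d = fourier_spectrum(x[:, None], N, M)
peak = tuple(int(i) for i in np.unravel_index(np.argmax(power), power.shape))
```

In this check the peak falls on the grid cell whose Doppler and spatial frequencies match the injected target, so `f_d[peak[0]]` and `f_s[peak[1]]` recover the NDF and NSF.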
4. Results and Discussion
In this section, simulation experiments were used to verify the effectiveness of the proposed method. The simulation parameters are listed in
Table 1. The number of used training samples was 4. The angle frequency discretization factor
was 6 and the Doppler frequency discretization factor
was 6. The network parameters were given as: the number of channels
is 1 and
are set to
,
,
,
, respectively. Meanwhile, the learning rate was set to
. Moreover, the pairs dataset was used for training with a batch size of 64. Furthermore, the experiments were run on an AMD Ryzen 7 5700G with Radeon Graphics CPU.
4.1. Convergence Analysis
This subsection analyzes each network’s overall training and validation MSEs with respect to the number of iterations. Figure 4 presents the variation of the training and validation MSEs with the training iterations in the ideal and non-ideal cases. The two networks were trained for 350 and 400 iterations, respectively. The training MSE in both cases decreases rapidly in the early training period and essentially converges by the 300th iteration, with only minor changes afterwards. In addition, the network converges faster in the ideal case than in the non-ideal case. Since the training dataset in the ideal case contains no non-ideal factors, the clutter distribution is relatively simple, and ETE-MTI can quickly learn the distribution characteristics of the clutter and the target. In contrast, in the non-ideal case, the clutter-plus-target data contain various non-ideal factors, so the clutter spectrum distribution is more complicated and ETE-MTI needs a longer period to learn it. Moreover, the validation curves level off after about 150 iterations and remain roughly constant thereafter, which indicates that neither network overfits.
4.2. Visualization of Prediction Results
This subsection analyzes the prediction performance of ETE-MTI. For simplicity, the target’s NSF was set to 0. To consider the case in which the clutter and the target are easily distinguishable on the space–time spectrum, the target’s NDF was set to 0.556. Figure 5 shows the predicted target angle-Doppler results. Figure 5a,b show the clutter-plus-target spectrum and the target spectrum estimated by the proposed method in the ideal case. ETE-MTI predicts the target position well with no residual clutter, realizing end-to-end target indication. The prediction performance in the presence of aircraft crabbing is shown in Figure 5c,d. In Figure 5c, the clutter spectrum is bent by the aircraft crabbing and partly overlaps the target. Nonetheless, Figure 5d shows that the expected target can be detected after the CNN, although a small amount of residual clutter remains at the zero Doppler position. As shown in Figure 5e, in the presence of array errors, the energy of the clutter spectrum leaks along the angle direction and undergoes spectral broadening. The predicted result in the case of array errors is shown in Figure 5f. Although the target can be indicated, relatively more clutter remains at zero Doppler along the angle direction. Figure 5g,h show the clutter-plus-target Fourier spectrum and the target spectrum estimated by the proposed method in the case of ICM. As shown in Figure 5g, the clutter spectrum is broadened by the wind speed. The predicted result in Figure 5h shows that the target can still be indicated at NDF = 0.556 by the deep learning network.
In the following, we discuss the performance when the target is close to the mainlobe of the clutter. The target’s NDF was set to
.
Figure 6 shows the predicted target angle-Doppler results. Figure 6a,b show the clutter-plus-target Fourier spectrum and the predicted target spectrum in the ideal case. Although the target is buried in the high-power clutter, the trained network can still predict the target, but the predicted target power is weakened. In the non-ideal case, the factor parameters were set to the array error
, the ICM
, and the aircraft crabbing angle
.
Figure 6c,d show the clutter-plus-target Fourier spectrum and the predicted target in the non-ideal case. As shown in Figure 6c, the clutter spectrum is completely mixed with the target because of the bending, energy leakage and spectral broadening caused by the aircraft crabbing, array errors and ICM. Nevertheless, Figure 6d shows that the expected target can still be indicated by the deep learning network.
As a result, when the target is buried in high-power clutter or moves at low speed, ETE-MTI can learn the spatial–temporal distribution characteristics of the clutter and the target through the neural network and extract the target information, realizing end-to-end target indication.
4.3. Probability of Detection under Different SCR Scenarios
In this subsection, we evaluate the target detection performance for different NDFs using probability of detection (PD) versus SCR curves. There are 31 artificially generated test datasets with different SCRs, produced by varying the target power under the same clutter power of 50 dB. Across the test datasets, the target power varied from 20 dB to 60 dB at equal intervals. In each dataset, 1000 test samples were generated by adding target signals with the same power and the candidate NDFs to the clutter. The samples from each test dataset were fed into the trained D2CNN. The detection performance was evaluated by the PD, which was obtained using the adaptive matched filter (AMF) detector. The PD is the average percentage of correctly classified test samples for each target in the test dataset.
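The evaluation protocol can be sketched in a few lines. The snippet below lays out the 31 target power levels and the resulting SCR values for a 50 dB clutter power, computes the standard AMF test statistic $|\mathbf{s}^H \hat{\mathbf{R}}^{-1} \mathbf{x}|^2 / (\mathbf{s}^H \hat{\mathbf{R}}^{-1} \mathbf{s})$, and estimates the PD as the fraction of trials whose statistic exceeds a threshold. How the AMF is coupled to the network output is not specified here, so the function names, the threshold, and the toy inputs are assumptions.

```python
import numpy as np

def amf_statistic(x, s, R_hat):
    """Adaptive matched filter statistic |s^H R^{-1} x|^2 / (s^H R^{-1} s)."""
    r_inv_s = np.linalg.solve(R_hat, s)
    numerator = np.abs(r_inv_s.conj() @ x) ** 2   # (R^{-1} s)^H x = s^H R^{-1} x for Hermitian R
    denominator = np.real(s.conj() @ r_inv_s)
    return numerator / denominator

def probability_of_detection(statistics, threshold):
    """Fraction of Monte Carlo trials whose statistic exceeds the threshold."""
    return float(np.mean(np.asarray(statistics) > threshold))

# 31 target power levels from 20 dB to 60 dB against a fixed 50 dB clutter power.
target_power_db = np.linspace(20.0, 60.0, 31)
scr_db = target_power_db - 50.0

# Toy check with an identity covariance (hypothetical values).
dim = 16
s = np.exp(2j * np.pi * 0.2 * np.arange(dim))
stat_matched = amf_statistic(s, s, np.eye(dim))
pd_example = probability_of_detection([1.0, 2.0, 3.0, 4.0], threshold=2.5)
```

With 31 equally spaced levels the step is 40/30 dB, so the SCR axis runs from −30 dB to 10 dB.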
Figure 7 shows the effect of non-homogeneous clutter on detection performance. Two cases are also considered in
Figure 7. In the non-ideal case, the non-ideal factors were set to the array error
, the ICM
, and the aircraft crabbing angle
. The target’s NSF was fixed to 0, while the NDF took three values: 0.167, 0.367 and 0.5.
As depicted in Figure 7a,b, the detection performance of the ETE-MTI method improves as the SCR increases. The three curves indicate that the ETE-MTI method has superior target detection performance at high SCR, whether in the mainlobe region (NDF = 0.167) or in the sidelobe region (NDF = 0.367 or NDF = 0.5). The PD approximately approaches
in the sidelobe region (NDF = 0.367 or NDF = 0.5) at an SCR of −15 dB. As the target’s NDF increases, the proposed method’s detection performance improves, so the detection performance in the sidelobe region is better than that in the mainlobe region. Non-ideal factors in the clutter degrade the target detection performance. Comparing the PD curves in Figure 7a,b, the PD in the non-ideal case is slightly lower than in the ideal case. Even so, in non-homogeneous clutter environments, the PD can remain above
in the sidelobe region (NDF = 0.367 or NDF = 0.5) at −10 dB SCR. Thus, the results demonstrate that the ETE-MTI method has good detection performance in non-homogeneous clutter environments and under low SCR conditions.
4.4. Comparison of Computational Complexity
The computational burden mainly comes from the convolution operations during the D2CNN’s training and testing. For the D2CNN, the computational complexity is given by [34]:
$$O\left(\sum_{l=1}^{C} n_{l-1}\, s_l^2\, n_l\, m_l^2\right) \tag{22}$$
where $l$ is the index of a convolutional layer and $C$ is the depth (the number of convolutional layers); $n_l$ is the number of filters in the $l$-th layer; $n_{l-1}$ represents the number of input channels of the $l$-th layer; $s_l$ is the spatial size (length) of the filter; and $m_l$ is the spatial size of the output feature map. The computational complexity of the ETE-MTI method is obtained by substituting the network parameters set in this paper into Equation (22). According to Table 1 and Figure 3, the computational complexity of the proposed ETE-MTI is on the order of $O(MN)$. However, the computational complexity of the CNN-MTI method is
, which is one order of magnitude higher than that of the method proposed in this paper.
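As a worked illustration of the layer-wise complexity count, the sketch below sums $n_{l-1} s_l^2 n_l m_l^2$ over the convolutional layers of a network. The five-layer configuration (channel counts, 3×3 kernels, 48×48 feature maps) is purely hypothetical and is not the configuration of the D2CNN; only the form of the sum follows the text.

```python
def conv_complexity(layers):
    """Sum n_in * kernel^2 * n_out * out_size^2 over convolutional layers.

    Each layer is a tuple (n_in, n_out, kernel, out_size).
    """
    total = 0
    for n_in, n_out, kernel, out_size in layers:
        total += n_in * kernel**2 * n_out * out_size**2
    return total

# Hypothetical five-layer network: 1 -> 16 -> 16 -> 16 -> 16 -> 1 channels.
hypothetical_layers = [
    (1, 16, 3, 48),
    (16, 16, 3, 48),
    (16, 16, 3, 48),
    (16, 16, 3, 48),
    (16, 1, 3, 48),
]
total_macs = conv_complexity(hypothetical_layers)
```

For a fixed set of channel counts and kernel sizes, the total scales with the feature-map area $m_l^2$, which is proportional to $MN$ when the spectrum grid is a fixed multiple of the array and pulse dimensions.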
4.5. Comparison of Probability of Detection
In this subsection, the PD is used to compare the detection performance of different methods. First, we evaluated the PD of the proposed ETE-MTI method in different Doppler channels and compared it with that of other methods. With all other conditions the same, the target’s power was set to 30 dB and the target’s velocity in the different test datasets varied from −150 m/s to 150 m/s. Fifteen test datasets with different target velocities were generated, corresponding to 15 Doppler channels. The traditional optimal method (OPT-STAP-MTI) and the CNN-MTI method in [31] were used for comparison to verify the ETE-MTI method’s accuracy and effectiveness. ETE-MTI used four IID data range cells, CNN-MTI used 105 IID data range cells around the CUT, and OPT-STAP-MTI used all range cells. The PDs of the three methods in different Doppler channels are compared in Figure 8. ETE-MTI had its lowest PD of
in the zero Doppler channel, since the clutter entirely buried the target. As the target velocity increased, the target moved further away from the mainlobe of the clutter spectrum. Therefore, the distinguishability between the target and the clutter increased and ETE-MTI could detect the target more accurately, with a PD of up to
.
It can be observed that the detection performance of CNN-MTI is poor, and its PD is lower than that of the other two methods in the zero Doppler channel. The detection performance of all three methods improves as the Doppler channel index increases, and ETE-MTI and OPT-STAP-MTI can detect the target with a PD of 100% in multiple Doppler channels. Moreover, ETE-MTI and OPT-STAP-MTI can detect the target when the target’s velocity is low. The detection performance of the three methods improves because, as the Doppler channel index increases, the target velocity increases relative to the stationary clutter. Hence, the clutter and the target can be distinguished in the spectrum, making it easier to detect the target.
The comparison shows that the average PD of ETE-MTI exceeds that of CNN-MTI. Moreover, the PD curve of ETE-MTI is very close to that of OPT-STAP-MTI. Thus, the results demonstrate that ETE-MTI achieves excellent performance across Doppler channels and excels in detecting low-speed targets. Furthermore, the ETE-MTI method is expected to outperform the traditional STAP method when the training samples are limited.
In addition, we compared the detection performance of the proposed ETE-MTI and CNN-MTI at different SCRs. The target’s NDF was set to 0.367 and the number of test samples was 1000. The test samples were generated in the same way as in Section 4.3. The performance comparison is shown in Figure 9. The PD of both methods gradually improves as the SCR increases. The highest PD of ETE-MTI can reach
, while the highest PD of CNN-MTI is close to
. The detection performance of the proposed ETE-MTI is better than that of CNN-MTI at low SCRs. Therefore, the result demonstrates that the proposed ETE-MTI has a much lower computational load and a higher detection accuracy than the existing CNN-MTI method with a few samples in non-homogeneous and low-SCR environments.
The two methods, ETE-MTI and CNN-MTI, also differ in the form of their input data. The proposed method’s input is the power spectrum amplitude data of the clutter-plus-target, whereas the input of CNN-MTI is the space–time observation data. Furthermore, the five-layer D2CNN built in this paper takes the target’s high-resolution requirement into account, which allows our method to detect the target more easily in non-homogeneous clutter and low-SCR environments. The results indicate that the proposed simpler D2CNN, with less computation, learns the power spectrum amplitude data more efficiently and therefore has better detection performance.
5. Conclusions
This paper proposes an end-to-end moving target indication method for airborne radar based on deep learning. First, we constructed a training dataset that includes non-ideal factors in non-homogeneous clutter environments. In the dataset, the low-resolution clutter-plus-target spectrum, estimated from a few samples to address the problem of insufficient samples, is taken as the D2CNN’s input, and the high-resolution target spectrum is taken as the D2CNN’s label. Second, the proposed five-layer D2CNN is established to extract the features of the input. Finally, once the clutter and target distribution characteristics are learned, the D2CNN can predict the target space–time information from the high-resolution output spectrum, realizing end-to-end moving target indication. The five-layer D2CNN is designed with the high-resolution requirement in mind, which improves target detection. Furthermore, unlike traditional STAP technologies, the proposed method uses the D2CNN’s mapping characteristics to filter the clutter and indicate the target directly. The results demonstrate that the proposed ETE-MTI with a few samples has a much lower computational load and a higher detection accuracy in non-homogeneous and low-SCR environments than the existing CNN-MTI [31] method.
The limitation of the proposed method is that, for the time being, only the target indication performance in non-homogeneous environments has been studied. Target indication in heterogeneous environments is the next research goal. In future research, more realistic physical effects, such as heterogeneous clutter environments, should also be considered to validate the robustness of the method.