Gated Convolutional Neural Network for Semantic Segmentation in High-Resolution Images
Figure 1. The strong relationship between the segmentation error map and the entropy heat map. (a) Input image; (b) Segmentation reference map; (c) Predicted label map; (d) Error map, with white pixels indicating wrongly classified pixels; (e) Corresponding entropy heat map.
Figure 2. Overview of our gated segmentation network. In the encoder, ResNet-101 is used as the feature extractor. In the decoder, the Entropy Control Module (ECM) is proposed for feature fusion. In addition, the Residual Convolution Module (RCM) is designed as a basic processing unit. The details of the RCM and ECM are shown in the dashed boxes.
Figure 3. Overview of the ISPRS 2D Vaihingen Labeling dataset. There are 33 tiles. The numbers in the figure identify the individual tiles.
Figure 4. Model visualization. We show the error maps, entropy heat maps, and predictions at different iterations of the training procedure. The four rows in each iteration block correspond to the four ECMs, which merge five kinds of feature maps with different resolutions.
Figure 5. Visual comparisons between GSN and other related methods on the ISPRS test set. Images come from the website of the ISPRS 2D Semantic Labeling Contest.
Figure 6. Three failed module designs. (a) Placing a gate on the output of the lower layer; (b) Placing gates on the outputs of both the lower and upper layers; (c) Creating the gate on the output of the lower layer from the combined outputs of the lower and upper layers.
1. Introduction
- A gated network architecture is proposed for adaptive information propagation among feature maps at different levels. With this architecture, convolution layers propagate only the selected information into the final features, so that local and contextual features complement each other and improve segmentation accuracy.
- An entropy control layer is introduced to implement the gate. It is based on the observation that the information entropy of the feature maps before the classifier is closely related to the label-error map of the segmentation, as shown in Figure 1.
- A new deep learning pipeline for semantic segmentation is proposed. It effectively integrates local details and contextual information and can be trained end to end.
- The proposed method achieves state-of-the-art performance among all published methods on the ISPRS 2D semantic labeling benchmark. Specifically, it achieves a mean score of 88.7% over five categories (ranking 1st) and an overall accuracy of 90.3% (ranking 1st). These results are obtained using only RGB images and a single model, without a Digital Surface Model (DSM) or a model ensemble strategy.
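The observation behind the entropy control layer can be illustrated with a small sketch. It assumes the entropy is the Shannon entropy of a per-pixel softmax over class scores; the function names (`softmax`, `pixel_entropy`, `entropy_map`) and the toy scores are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    # Subtract the maximum score for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def pixel_entropy(scores):
    """Shannon entropy (in nats) of the softmax distribution at one pixel."""
    return -sum(p * math.log(p) for p in softmax(scores) if p > 0.0)

def entropy_map(score_map):
    """score_map[h][w] is a list of C class scores; returns an H x W entropy map."""
    return [[pixel_entropy(px) for px in row] for row in score_map]

# Hypothetical 1 x 2 image with 5 classes:
# the first pixel is confidently class 0, the second is undecided.
score_map = [[[10.0, 0.0, 0.0, 0.0, 0.0], [0.0, 0.0, 0.0, 0.0, 0.0]]]
heat = entropy_map(score_map)
print(heat)  # low entropy for the confident pixel, ln(5) for the undecided one
```

A confident pixel yields an entropy near zero, while an undecided pixel reaches the maximum of ln(C). High-entropy regions are the ones that, according to the observation above, tend to coincide with wrongly classified pixels.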
2. Related Work
2.1. Deep Learning
2.2. Semantic Segmentation in Remote Sensing
2.3. Gate in Neural Networks
3. Method
3.1. Important Observation
3.2. Gated Segmentation Network
3.2.1. Entropy Control Module
3.2.2. Residual Convolution Module
3.2.3. Model Optimization
Algorithm 1 The training algorithm for the proposed GSN.
Input: Training data x, maximum iteration T.
Initialize the parameters θ in the convolutional layers, the learning rate α_t, and the learning rate policy (poly). Set the iteration counter t ← 0.
Output: The learned parameters θ.
1: while t < T do
2: t ← t + 1.
3: Call network forward to compute the output and the loss L.
4: Call network backward to compute the gradients ∂L/∂θ.
5: Update the parameters θ by θ ← θ − α_t · ∂L/∂θ.
6: Update α_t according to the learning rate policy.
7: end while
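The loop in Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes plain gradient descent and the common "poly" policy `base_lr * (1 - t / T) ** power`; the toy quadratic loss, the helper names `poly_lr` and `train`, and the values `base_lr=0.1` and `power=0.9` are all assumptions made for the example.

```python
def poly_lr(base_lr, t, max_iter, power=0.9):
    """'poly' learning rate policy: base_lr * (1 - t / max_iter) ** power."""
    return base_lr * (1.0 - t / max_iter) ** power

def train(theta, grad_fn, base_lr, max_iter, power=0.9):
    """Gradient descent with a poly learning rate schedule."""
    t = 0
    lr = base_lr
    while t < max_iter:
        t += 1
        grad = grad_fn(theta)       # stands in for the forward and backward passes
        theta = theta - lr * grad   # parameter update
        lr = poly_lr(base_lr, t, max_iter, power)  # learning rate update
    return theta

# Toy loss L(theta) = (theta - 3)^2, whose gradient is 2 * (theta - 3).
theta = train(0.0, lambda th: 2.0 * (th - 3.0), base_lr=0.1, max_iter=200)
print(theta)  # approaches the minimizer at 3
```

In a real segmentation network, `grad_fn` would be replaced by the framework's forward and backward calls, and `theta` would hold all convolutional weights.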
3.3. Implementation Details
4. Experiments
4.1. Dataset
4.2. Model Analysis
4.3. Comparisons with Related Methods
4.4. Model Visualization
4.5. ISPRS Benchmark Testing Results
4.6. Failed Attempts
5. Conclusions
Author Contributions
Conflicts of Interest
Method | Imp Surf | Building | Low_veg | Tree | Car | Overall Accuracy | Mean Score |
baseline | 87.6% | 93.2% | 73.3% | 86.9% | 54.1% | 86.1% | 79.0% |
GSN | 89.2% | 94.5% | 74.9% | 87.5% | 79.8% | 87.9% | 85.2% |
GSN_noL | 89.1% | 94.3% | 74.7% | 87.4% | 78.7% | 87.8% | 84.8% |
GSN_w | 89.5% | 94.4% | 75.9% | 87.8% | 80.9% | 88.3% | 85.7% |
GSN_w_mc | 90.2% | 94.8% | 76.9% | 88.3% | 82.3% | 88.9% | 86.5% |
Method | Imp Surf | Building | Low_veg | Tree | Car | Overall Accuracy | Mean Score |
FCN-8s [12] | 87.1% | 91.8% | 75.2% | 86.1% | 63.8% | 85.9% | 80.8% |
SegNet [14] | 82.7% | 89.1% | 66.3% | 83.9% | 55.7% | 82.1% | 75.5% |
Deeplab-v2 [21] | 88.5% | 93.5% | 73.9% | 86.9% | 84.7% | 86.9% | 83.5% |
RefineNet [15] | 88.1% | 93.3% | 74.0% | 87.1% | 65.1% | 86.7% | 81.5% |
GSN | 89.2% | 94.5% | 74.9% | 87.5% | 79.8% | 87.9% | 85.2% |
Method | Imp Surf | Building | Low_veg | Tree | Car | Overall Accuracy | Mean Score |
UPB [43] | 87.5% | 89.3% | 77.3% | 85.8% | 77.1% | 85.1% | 83.4% |
ETH_C [44] | 87.2% | 92.0% | 77.5% | 87.1% | 54.5% | 85.9% | 79.7% |
UOA [45] | 89.8% | 92.1% | 80.4% | 88.2% | 82.0% | 87.6% | 86.5% |
ADL_3 [26] | 89.5% | 93.2% | 82.3% | 88.2% | 63.3% | 88.0% | 83.3% |
RIT_2 [46] | 90.0% | 92.6% | 81.4% | 88.4% | 61.1% | 88.0% | 82.7% |
DST_2 [8] | 90.5% | 93.7% | 83.4% | 89.2% | 72.6% | 89.1% | 85.9% |
ONE_7 [47] | 91.0% | 94.5% | 84.4% | 89.9% | 77.8% | 89.8% | 87.5% |
DLR_9 [28] | 92.4% | 95.2% | 83.9% | 89.9% | 81.2% | 90.3% | 88.5% |
GSN | 92.2% | 95.1% | 83.7% | 89.9% | 82.4% | 90.3% | 88.7% |
Metric | Model_1 | Model_2 | Model_3 | GSN |
overall accuracy | 83.4% | 60.0% | 82.2% | 86.1% |
mean score | 75.3% | 57.3% | 74.8% | 79.0% |
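The Mean Score column in the first results table (baseline through GSN_w_mc) appears to be the unweighted mean of the five per-class scores. That reading is an assumption, since the column is not defined in this excerpt, but it can be checked against the reported numbers with a short script. The helper name `mean_score` is hypothetical.

```python
# Per-class scores (Imp Surf, Building, Low_veg, Tree, Car) and the
# reported Mean Score, copied from the first results table.
rows = {
    "baseline": ([87.6, 93.2, 73.3, 86.9, 54.1], 79.0),
    "GSN":      ([89.2, 94.5, 74.9, 87.5, 79.8], 85.2),
    "GSN_noL":  ([89.1, 94.3, 74.7, 87.4, 78.7], 84.8),
    "GSN_w":    ([89.5, 94.4, 75.9, 87.8, 80.9], 85.7),
    "GSN_w_mc": ([90.2, 94.8, 76.9, 88.3, 82.3], 86.5),
}

def mean_score(per_class):
    """Unweighted mean of the per-class scores."""
    return sum(per_class) / len(per_class)

for name, (scores, reported) in rows.items():
    print(f"{name}: computed {mean_score(scores):.2f}, reported {reported}")
```

Each computed mean agrees with the reported value to within rounding at one decimal place, which supports the unweighted-mean reading for this table.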
