Enlighten-GAN for Super Resolution Reconstruction in Mid-Resolution Remote Sensing Images
Figure 1. The super-resolution reconstruction (SRR) results of SRCNN, ESRGAN, and our proposed Enlighten-GAN, together with the ground truth. We crop and zoom a representative area so that the details can be observed. The SRCNN result, representative of pixel-loss-based methods, is blurred owing to their conservative strategy, while the ESRGAN result suffers from unpleasant artifacts. By comparison, our result is clear and reliable.
Figure 2. Overview of our GAN structure. Given an LR image, the generator (G) synthesizes an SR result, which is compared with the real HR image. The discriminator (D) is responsible for distinguishing the fake image from the real one, and thus provides the adversarial loss for training G.
Figure 3. The architecture of the generator. The bottom of the network extracts feature maps through recursive learning and residual learning, while the top exploits these feature maps to predict multi-level HR images. "Conv" refers to a convolutional layer with 3 × 3 kernels, and RRDB is short for Residual-in-Residual Dense Block. The β in the RRDB is the residual scaling parameter, set to 0.2.
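As a concrete illustration of the residual scaling mentioned in the caption, the update inside a residual block can be sketched as below; `residual_scale` and the toy arrays are illustrative names and values, not code from the paper:

```python
import numpy as np

def residual_scale(x, block_out, beta=0.2):
    # Residual scaling: damp the block's output by beta before adding it
    # back onto the identity path, which stabilizes very deep networks.
    return x + beta * block_out

# A block output of 0.5 nudges each input value by only 0.2 * 0.5 = 0.1.
x = np.ones((4, 4))
y = residual_scale(x, np.full((4, 4), 0.5))  # each element becomes 1.1
```

Keeping β well below 1 means each RRDB makes a small, easily trainable correction rather than replacing its input outright.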
Figure 4. The architecture of the discriminator. It is responsible for encouraging the generator to produce images similar enough to real-world HR data. "BN" is short for batch normalization, "Conv" refers to a convolutional layer, and "FC{N}" represents a fully connected layer outputting an N-element array.
Figure 5. The architecture of our autoencoder for extracting feature maps. It consists of an encoder module and a decoder module. The layers marked in green are those whose feature maps we adopt.
Figure 6. The pipeline of the clipping-and-merging method. The input image is cropped into four overlapping patches. Each patch is upsampled 4× by our network, to 384 × 384 pixels. Half of the overlap in each patch is then clipped, leaving 336 × 336 pixel patches, each a quarter of the result. The four patches predicted in this way compose the whole upsampled result.
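The patch arithmetic described in the caption can be sketched in a few lines of NumPy; `clip_and_merge` is a hypothetical helper, assuming a 2 × 2 grid of 384 × 384 SR patches laid out with a 288-pixel stride (96-pixel overlap) in SR space:

```python
import numpy as np

def clip_and_merge(patches_sr, keep=336, patch_sr=384):
    """Merge a 2x2 grid of overlapping SR patches by clipping half of
    each overlap (384 - 336 = 48 px) from the interior edges, so the
    four 336x336 crops tile the 672x672 result exactly."""
    margin = patch_sr - keep  # 48 px: half of the 96 px SR-space overlap
    out = np.zeros((2 * keep, 2 * keep), dtype=patches_sr[0][0].dtype)
    for i in range(2):
        for j in range(2):
            # keep the corner of the patch that touches the image border;
            # clip the margin only on interior edges
            r0 = 0 if i == 0 else margin
            c0 = 0 if j == 0 else margin
            out[i * keep:(i + 1) * keep, j * keep:(j + 1) * keep] = \
                patches_sr[i][j][r0:r0 + keep, c0:c0 + keep]
    return out

# Round-trip check: tiling a 672 x 672 image into overlapping 384 x 384
# patches (stride 288) and merging them back reproduces the image.
full = np.arange(672 * 672, dtype=float).reshape(672, 672)
patches = [[full[i * 288:i * 288 + 384, j * 288:j * 288 + 384]
            for j in range(2)] for i in range(2)]
merged = clip_and_merge(patches)
```

Clipping the interior margins, rather than averaging the overlapping predictions, avoids blending two slightly different predictions of the same region, which is what produces a visible seam in naive merging.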
Figure 7. The flaws of PSNR. The second and third patches are two SR results, while the first is the ground truth. Notably, the second patch retains the basic shape but, owing to information loss, introduces geometric errors and swaps the white and black areas in its prediction, thus obtaining a lower PSNR than the third. The GSM metric, in contrast, evaluates the results reasonably.
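The behavior the caption describes can be reproduced with the standard PSNR definition; the toy black-and-white patch, its inverted counterpart, and the flat gray image below are illustrative, not the paper's data:

```python
import numpy as np

def psnr(gt, pred, peak=255.0):
    # PSNR is purely pixel-wise: 10 * log10(peak^2 / MSE). It carries no
    # notion of structure such as edges or gradients.
    mse = np.mean((gt.astype(float) - pred.astype(float)) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(peak ** 2 / mse)

gt = np.zeros((8, 8)); gt[:, 4:] = 255   # black/white halves
inverted = 255 - gt                      # shape preserved, colors swapped
flat = np.full((8, 8), 127.5)            # structureless gray
# The structure-preserving but inverted prediction scores 0 dB, below the
# structureless gray image at about 6 dB.
```

A gradient-based metric such as GSM compares local edge structure instead of raw intensities, so the inverted patch, whose edges coincide with the ground truth's, is not penalized in this degenerate way.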
Figure 8. Qualitative results. We conduct experiments with various methods; our proposed Enlighten-GAN outperforms the others.
Figure 9. The merged image and detail views. The left image is the average-merging result of the ESRGAN output, and the right is the result of the clipping-and-merging method. The two green dashed lines mark the positions where the patches merge. In our result, one can hardly find the seam between patches without the aid of the dashed lines, while a clear seam line is visible in the other.
Figure 10. Comparison of the ground truth and SR images in a town scene. The remote sensing image of the town area is full of varied objects in different colors, and the networks cannot distinguish one building from another against the complicated background.
Abstract
1. Introduction
- We design a novel Enlighten-GAN with an enlighten block. The enlighten block benefits the network by setting an easier target that ensures it receives effective gradients. Owing to its multi-scale reconstruction results, the enlighten block also gains higher generalization ability. Our proposed Enlighten-GAN proves itself in comparison experiments on Sentinel-2 images, exceeding state-of-the-art methods.
- We introduce and employ a Self-Supervised Hierarchical Perceptual Loss for training in place of the conventional perceptual loss defined on VGGNet [15]; the new loss is better suited to SRR-like tasks. We conduct ablation experiments to verify its effectiveness.
- To address the merging issue, we propose a clipping-and-merging method with a learning-based batch internal inconsistency loss, by which the seam lines in predicted large-scale remote sensing images are eliminated.
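As a rough sketch of what a hierarchical perceptual loss computes, assuming feature maps already extracted from several levels of an encoder; `hierarchical_perceptual_loss` and the equal level weights are hypothetical, not the paper's exact formulation:

```python
import numpy as np

def hierarchical_perceptual_loss(feats_sr, feats_hr, weights=None):
    # Weighted sum of mean absolute differences between corresponding
    # feature maps drawn from several encoder levels.
    if weights is None:
        weights = [1.0] * len(feats_sr)
    return sum(w * np.mean(np.abs(a - b))
               for w, a, b in zip(weights, feats_sr, feats_hr))

# Identical feature pyramids yield zero loss.
pyramid = [np.random.rand(8, 8), np.random.rand(4, 4)]
loss_zero = hierarchical_perceptual_loss(pyramid, pyramid)
```

Because the encoder here is a self-supervised autoencoder trained on the imagery itself, the compared features reflect the data distribution of the task rather than ImageNet classification features.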
2. Related Work
2.1. Generative Adversarial Network
2.2. Super Resolution Reconstruction
3. Methodology
3.1. Enlighten-GAN
3.2. Model Optimization
3.3. Patches Clipping-and-Merging Method
4. Experiment
4.1. Implementation Details
4.2. Image Quality Assessment
4.3. Results of Evaluation
4.4. Ablation Study
5. Discussion
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Ledig, C.; Theis, L.; Huszár, F.; Caballero, J.; Cunningham, A.; Acosta, A.; Aitken, A.; Tejani, A.; Totz, J.; Wang, Z.; et al. Photo-realistic single image super-resolution using a generative adversarial network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4681–4690.
- Stark, H.; Oskoui, P. High-resolution image recovery from image-plane arrays, using convex projections. J. Opt. Soc. Am. A 1989, 6, 1715.
- Yang, J.; Wright, J.; Huang, T.; Ma, Y. Image super-resolution as sparse representation of raw image patches. In Proceedings of the 2008 IEEE Conference on Computer Vision and Pattern Recognition, Anchorage, AK, USA, 23–28 June 2008; pp. 1–8.
- Yang, J.; Wright, J.; Huang, T.S.; Ma, Y. Image super-resolution via sparse representation. IEEE Trans. Image Process. 2010, 19, 2861–2873.
- Dong, C.; Loy, C.C.; He, K.; Tang, X. Learning a deep convolutional network for image super-resolution. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2014; pp. 184–199.
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
- Kim, J.; Kwon Lee, J.; Mu Lee, K. Accurate image super-resolution using very deep convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 1646–1654.
- Zhang, Y.; Li, K.; Li, K.; Wang, L.; Zhong, B.; Fu, Y. Image super-resolution using very deep residual channel attention networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 286–301.
- Tai, Y.; Yang, J.; Liu, X. Image super-resolution via deep recursive residual network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 3147–3155.
- Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Tong, T.; Li, G.; Liu, X.; Gao, Q. Image super-resolution using dense skip connections. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4799–4807.
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141.
- Johnson, J.; Alahi, A.; Fei-Fei, L. Perceptual losses for real-time style transfer and super-resolution. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 694–711.
- Sajjadi, M.S.; Scholkopf, B.; Hirsch, M. EnhanceNet: Single image super-resolution through automated texture synthesis. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4491–4500.
- Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
- Wang, X.; Yu, K.; Wu, S.; Gu, J.; Liu, Y.; Dong, C.; Qiao, Y.; Change Loy, C. ESRGAN: Enhanced super-resolution generative adversarial networks. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
- Lei, S.; Shi, Z.; Zou, Z. Super-resolution for remote sensing images via local–global combined network. IEEE Geosci. Remote Sens. Lett. 2017, 14, 1243–1247.
- Jiang, K.; Wang, Z.; Yi, P.; Wang, G.; Lu, T.; Jiang, J. Edge-enhanced GAN for remote sensing image superresolution. IEEE Trans. Geosci. Remote Sens. 2019, 57, 5799–5812.
- Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27, 2672–2680.
- Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein GAN. arXiv 2017, arXiv:1701.07875.
- Mirza, M.; Osindero, S. Conditional generative adversarial nets. arXiv 2014, arXiv:1411.1784.
- Nowozin, S.; Cseke, B.; Tomioka, R. f-GAN: Training generative neural samplers using variational divergence minimization. In Proceedings of the 30th International Conference on Neural Information Processing Systems; Curran Associates Inc.: Red Hook, NY, USA, 2016; pp. 271–279.
- Miyato, T.; Kataoka, T.; Koyama, M.; Yoshida, Y. Spectral normalization for generative adversarial networks. arXiv 2018, arXiv:1802.05957.
- Jolicoeur-Martineau, A. The relativistic discriminator: A key element missing from standard GAN. arXiv 2018, arXiv:1807.00734.
- Dong, C.; Loy, C.C.; Tang, X. Accelerating the super-resolution convolutional neural network. In European Conference on Computer Vision; Springer: Berlin/Heidelberg, Germany, 2016; pp. 391–407.
- Lim, B.; Son, S.; Kim, H.; Nah, S.; Mu Lee, K. Enhanced deep residual networks for single image super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Honolulu, HI, USA, 21–26 July 2017; pp. 136–144.
- Lai, W.S.; Huang, J.B.; Ahuja, N.; Yang, M.H. Deep Laplacian pyramid networks for fast and accurate super-resolution. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 624–632.
- Shocher, A.; Cohen, N.; Irani, M. “Zero-shot” super-resolution using deep internal learning. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 3118–3126.
- Jiang, K.; Wang, Z.; Yi, P.; Jiang, J.; Xiao, J.; Yao, Y. Deep distillation recursive network for remote sensing imagery super-resolution. Remote Sens. 2018, 10, 1700.
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255.
- Hinton, G.E.; Salakhutdinov, R.R. Reducing the dimensionality of data with neural networks. Science 2006, 313, 504–507.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Blau, Y.; Mechrez, R.; Timofte, R.; Michaeli, T.; Zelnik-Manor, L. The 2018 PIRM challenge on perceptual image super-resolution. In Computer Vision–ECCV 2018 Workshops; Lecture Notes in Computer Science; Leal-Taixé, L., Roth, S., Eds.; Springer: Cham, Switzerland, 2018; Volume 11133.
- Ma, C.; Yang, C.Y.; Yang, X.; Yang, M.H. Learning a no-reference quality metric for single-image super-resolution. Comput. Vis. Image Underst. 2017, 158, 1–16.
- Mittal, A.; Soundararajan, R.; Bovik, A.C. Making a “completely blind” image quality analyzer. IEEE Signal Process. Lett. 2012, 20, 209–212.
- Blau, Y.; Michaeli, T. The perception-distortion tradeoff. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6228–6237.
- Liu, A.; Lin, W.; Narwaria, M. Image quality assessment based on gradient similarity. IEEE Trans. Image Process. 2012, 21, 1500–1512.
- Ponomarenko, N.; Jin, L.; Ieremeiev, O.; Lukin, V.; Egiazarian, K.; Astola, J.; Vozel, B.; Chehdi, K.; Carli, M.; Battisti, F.; et al. Image database TID2013: Peculiarities, results and perspectives. Signal Process. Image Commun. 2015, 30, 57–77.
- Zhang, L.; Shen, Y.; Li, H. VSI: A visual saliency-induced index for perceptual image quality assessment. IEEE Trans. Image Process. 2014, 23, 4270–4281.
- Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595.
| Method | GSM (avg/std) | LPIPS (avg/std) | PI (avg/std) | PSNR (avg/std) |
|---|---|---|---|---|
| ground-truth | 1.000000/0.000000 | 0.000/0.000 | 2.493/0.501 | ∞/- |
| bicubic upsampling | 0.874414/0.031840 | 0.557/0.065 | 7.077/0.323 | 23.881/2.737 |
| SRCNN [5] | 0.949582/0.017110 | 0.464/0.068 | 6.736/0.534 | 24.576/2.758 |
| SRGAN [1] | 0.973279/0.007973 | 0.293/0.038 | 3.659/0.595 | 19.413/2.812 |
| ESRGAN [16] | 0.995795/0.002947 | 0.189/0.027 | 2.391/0.488 | 22.368/2.744 |
| EEGAN [18] | 0.998480/0.001953 | 0.508/0.114 | 3.439/0.707 | 19.929/3.067 |
| Enlighten-GAN | 0.999336/0.000750 | 0.182/0.027 | 2.509/0.546 | 22.834/2.851 |
| Perceptual Loss | GSM (avg/std) | LPIPS (avg/std) | PI (avg/std) | PSNR (avg/std) |
|---|---|---|---|---|
| ground-truth | 1.000000/0.000000 | 0.000/0.000 | 2.493/0.501 | ∞/- |
| our Perceptual | 0.999336/0.000750 | 0.182/0.027 | 2.509/0.546 | 22.834/2.851 |
| without Perceptual | 0.999204/0.001265 | 0.228/0.028 | 2.611/0.491 | 22.291/2.655 |
| VGG-Perceptual | 0.999112/0.000834 | 0.169/0.027 | 2.396/0.571 | 22.485/2.850 |
| GAN | GSM (avg/std) | LPIPS (avg/std) | PI (avg/std) | PSNR (avg/std) |
|---|---|---|---|---|
| ground-truth | 1.000000/0.000000 | 0.000/0.000 | 2.493/0.501 | ∞/- |
| WGAN | 0.999336/0.000750 | 0.182/0.027 | 2.509/0.546 | 22.834/2.851 |
| Standard GAN | 0.994932/0.003888 | 0.174/0.025 | 2.338/0.523 | 22.760/2.709 |
| RaGAN | 0.998035/0.003349 | 0.237/0.031 | 2.784/0.494 | 21.758/2.616 |
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
Share and Cite
Gong, Y.; Liao, P.; Zhang, X.; Zhang, L.; Chen, G.; Zhu, K.; Tan, X.; Lv, Z. Enlighten-GAN for Super Resolution Reconstruction in Mid-Resolution Remote Sensing Images. Remote Sens. 2021, 13, 1104. https://doi.org/10.3390/rs13061104
Gong Y, Liao P, Zhang X, Zhang L, Chen G, Zhu K, Tan X, Lv Z. Enlighten-GAN for Super Resolution Reconstruction in Mid-Resolution Remote Sensing Images. Remote Sensing. 2021; 13(6):1104. https://doi.org/10.3390/rs13061104
Chicago/Turabian Style: Gong, Yuanfu, Puyun Liao, Xiaodong Zhang, Lifei Zhang, Guanzhou Chen, Kun Zhu, Xiaoliang Tan, and Zhiyong Lv. 2021. "Enlighten-GAN for Super Resolution Reconstruction in Mid-Resolution Remote Sensing Images" Remote Sensing 13, no. 6: 1104. https://doi.org/10.3390/rs13061104