Spectral normalization and dual contrastive regularization for image-to-image translation

  • Original article
  • Published in: The Visual Computer

Abstract

Existing image-to-image (I2I) translation methods achieve state-of-the-art performance by incorporating patch-wise contrastive learning into generative adversarial networks. However, patch-wise contrastive learning focuses only on local content similarity and neglects global structure constraints, which degrades the quality of the generated images. In this paper, we propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization, termed SN-DCR. To maintain consistency of both global structure and texture, we design a dual contrastive regularization that operates in two different deep feature spaces. To improve the global structure of the generated images, we formulate a semantic contrastive loss that pulls the global semantic structure of a generated image toward that of real images from the target domain in a semantic feature space. Similarly, to improve the global texture of the generated images, we design a style contrastive loss, using Gram matrices to extract the texture style of images. Moreover, to enhance training stability, we employ spectrally normalized convolutional layers in the design of our generator. We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves state-of-the-art performance on multiple tasks. The code and pretrained models are available at https://github.com/zhihefang/SN-DCR.
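The abstract describes three reusable building blocks: a Gram-matrix style descriptor for global texture, contrastive losses that pull generated images toward target-domain statistics while pushing them away from source images, and spectral normalization to stabilize the generator. The snippet below is a minimal PyTorch sketch of these pieces, not the authors' implementation (see the linked repository for that); the function names, tensor shapes, and the InfoNCE temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map (B, C, H, W) -> (B, C, C),
    serving as a global texture/style descriptor."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def info_nce(query, positive, negatives, tau: float = 0.07):
    """InfoNCE loss: pull each query toward its positive and away
    from K negatives. query/positive: (B, D); negatives: (B, K, D)."""
    q = F.normalize(query, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    l_pos = (q * p).sum(dim=-1, keepdim=True)        # (B, 1)
    l_neg = torch.einsum("bd,bkd->bk", q, n)         # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)           # positive sits at index 0

# Spectral normalization constrains a layer's Lipschitz constant; this is the
# standard PyTorch mechanism for the stabilized convolutions the paper employs.
sn_conv = nn.utils.spectral_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

if __name__ == "__main__":
    # Hypothetical encoder features for a generated image, a real
    # target-domain image (positive), and K = 4 source-domain negatives.
    feat_fake = torch.randn(2, 64, 32, 32)
    feat_real = torch.randn(2, 64, 32, 32)
    feat_src = torch.randn(2, 4, 64, 32, 32)

    g_q = gram_matrix(feat_fake).flatten(1)
    g_p = gram_matrix(feat_real).flatten(1)
    g_n = gram_matrix(feat_src.flatten(0, 1)).flatten(1).view(2, 4, -1)
    print("style contrastive loss:", info_nce(g_q, g_p, g_n).item())
```

Under the paper's framing, the semantic contrastive loss would apply the same InfoNCE pattern to semantic feature vectors rather than flattened Gram matrices, so the two regularizers differ only in the feature space being compared.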


Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62276138 and 61876087.

Author information


Corresponding author

Correspondence to Wei-Ling Cai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, C., Cai, WL. & Yuan, Z. Spectral normalization and dual contrastive regularization for image-to-image translation. Vis Comput 41, 129–140 (2025). https://doi.org/10.1007/s00371-024-03314-5


  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-024-03314-5
