Spectral normalization and dual contrastive regularization for image-to-image translation

  • Original article
  • Published in: The Visual Computer

Abstract

Existing image-to-image (I2I) translation methods achieve state-of-the-art performance by incorporating patch-wise contrastive learning into generative adversarial networks. However, patch-wise contrastive learning focuses only on local content similarity and neglects global structure constraints, which degrades the quality of the generated images. In this paper, we propose a new unpaired I2I translation framework based on dual contrastive regularization and spectral normalization, termed SN-DCR. To maintain consistency of both global structure and texture, we design a dual contrastive regularization that operates in two different deep feature spaces. To improve the global structure of the generated images, we formulate a semantic contrastive loss that pulls the global semantic structure of a generated image toward that of real images from the target domain in a semantic feature space. Similarly, to improve the global texture of the generated images, we design a style contrastive loss, using Gram matrices to extract the texture style of images. Moreover, to enhance training stability, we employ spectrally normalized convolutional layers in the design of our generator. We conduct comprehensive experiments to evaluate the effectiveness of SN-DCR, and the results show that our method achieves state-of-the-art performance on multiple tasks. The code and pretrained models are available at https://github.com/zhihefang/SN-DCR.
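The abstract describes three reusable building blocks: a Gram-matrix style descriptor for global texture, contrastive losses that pull generated images toward target-domain statistics while pushing them away from source images, and spectral normalization to stabilize the generator. The snippet below is a minimal PyTorch sketch of these pieces, not the authors' implementation (see the linked repository for that); the function names, tensor shapes, and the InfoNCE temperature are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gram_matrix(feat: torch.Tensor) -> torch.Tensor:
    """Gram matrix of a feature map (B, C, H, W) -> (B, C, C),
    serving as a global texture/style descriptor."""
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return torch.bmm(f, f.transpose(1, 2)) / (c * h * w)

def info_nce(query, positive, negatives, tau: float = 0.07):
    """InfoNCE loss: pull each query toward its positive and away
    from K negatives. query/positive: (B, D); negatives: (B, K, D)."""
    q = F.normalize(query, dim=-1)
    p = F.normalize(positive, dim=-1)
    n = F.normalize(negatives, dim=-1)
    l_pos = (q * p).sum(dim=-1, keepdim=True)        # (B, 1)
    l_neg = torch.einsum("bd,bkd->bk", q, n)         # (B, K)
    logits = torch.cat([l_pos, l_neg], dim=1) / tau
    labels = torch.zeros(q.size(0), dtype=torch.long, device=q.device)
    return F.cross_entropy(logits, labels)           # positive sits at index 0

# Spectral normalization constrains a layer's Lipschitz constant; this is the
# standard PyTorch mechanism for the stabilized convolutions the paper employs.
sn_conv = nn.utils.spectral_norm(nn.Conv2d(64, 64, kernel_size=3, padding=1))

if __name__ == "__main__":
    # Hypothetical encoder features for a generated image, a real
    # target-domain image (positive), and K = 4 source-domain negatives.
    feat_fake = torch.randn(2, 64, 32, 32)
    feat_real = torch.randn(2, 64, 32, 32)
    feat_src = torch.randn(2, 4, 64, 32, 32)

    g_q = gram_matrix(feat_fake).flatten(1)
    g_p = gram_matrix(feat_real).flatten(1)
    g_n = gram_matrix(feat_src.flatten(0, 1)).flatten(1).view(2, 4, -1)
    print("style contrastive loss:", info_nce(g_q, g_p, g_n).item())
```

Under the paper's framing, the semantic contrastive loss would apply the same InfoNCE pattern to semantic feature vectors rather than flattened Gram matrices, so the two regularizers differ only in the feature space being compared.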


Data Availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant Nos. 62276138 and 61876087.

Author information


Corresponding author

Correspondence to Wei-Ling Cai.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhao, C., Cai, WL. & Yuan, Z. Spectral normalization and dual contrastive regularization for image-to-image translation. Vis Comput 41, 129–140 (2025). https://doi.org/10.1007/s00371-024-03314-5


  • Accepted:

  • Published:

  • Issue Date:

  • DOI: https://doi.org/10.1007/s00371-024-03314-5
