Instance-dimension dual contrastive learning of visual representations

  • Original Paper
  • Machine Vision and Applications

Abstract

Existing contrastive methods usually learn visual representations either by maximizing instance contrast or by minimizing dimension redundancy, but not both, and therefore fail to make full use of the information in the data. In this paper, we propose an instance-dimension dual contrastive method, named IDDCLR, to thoroughly mine the intrinsic knowledge underlying the data. It jointly optimizes instance contrast and dimension redundancy to learn better visual representations. Specifically, we employ the normalized temperature-scaled cross-entropy (NT-Xent) to formulate the instance contrast loss, and propose a dimension contrast loss that also takes the form of NT-Xent, which gives the overall loss a symmetric form. Minimizing this loss has a twofold effect. On the one hand, it learns effective visual representations in the latent space, where the agreement between differently augmented views of the same instance is maximized. On the other hand, it minimizes the redundancy among feature dimensions and can thereby avoid trivial embeddings. Experimental results show that IDDCLR outperforms state-of-the-art self-supervised methods on classification tasks and performs comparably on transfer learning tasks.
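
The symmetry described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract only says that both losses take the NT-Xent form, so applying the same NT-Xent routine to the transposed embedding matrices (treating each feature dimension as a "sample") is an assumption made here, as are the function names `nt_xent` and `dual_contrastive_loss`, the temperatures, and the `weight` parameter.

```python
import numpy as np


def _l2_normalize(x, eps=1e-12):
    """Scale each row of x to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def nt_xent(a, b, temperature=0.5):
    """NT-Xent over paired rows: a[i] and b[i] are positives,
    every other row of a and b serves as a negative."""
    n = a.shape[0]
    z = _l2_normalize(np.concatenate([a, b], axis=0))  # shape (2n, d)
    sim = z @ z.T / temperature                        # scaled cosine similarity
    np.fill_diagonal(sim, -np.inf)                     # drop self-similarity
    # Row i is paired with row i + n, and row i + n with row i.
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-sum-exp for the softmax denominator.
    m = sim.max(axis=1, keepdims=True)
    log_denom = m[:, 0] + np.log(np.exp(sim - m).sum(axis=1))
    return float(np.mean(log_denom - sim[np.arange(2 * n), pos]))


def dual_contrastive_loss(z1, z2, t_inst=0.5, t_dim=0.5, weight=1.0):
    """Instance term on the rows plus a dimension term on the columns.
    ASSUMPTION: the dimension term reuses NT-Xent on the transposed
    embeddings; the paper's exact formulation may differ."""
    instance_term = nt_xent(z1, z2, t_inst)
    dimension_term = nt_xent(z1.T, z2.T, t_dim)
    return instance_term + weight * dimension_term


# Usage: embeddings of two augmented views, batch of 8, 16 dimensions.
rng = np.random.default_rng(0)
view1 = rng.normal(size=(8, 16))
view2 = view1 + 0.1 * rng.normal(size=(8, 16))
loss = dual_contrastive_loss(view1, view2)
```

Under this reading, the instance term pulls the two views of each image together while pushing other images apart, and the dimension term does the same for matching feature dimensions across views, which discourages different dimensions from encoding the same information.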


Acknowledgements

This work was supported in part by the Fundamental Research Funds for the Central Universities under Grant B230201025, in part by the Natural Science Foundation of Jiangsu Province under Grant BK20201160, and in part by the Key Research and Development Program of Changzhou (Social Development) under Grant CE20225042.

Author information

Corresponding author

Correspondence to Liantao Wang.

About this article

Cite this article

Liu, Q., Wang, L., Wang, Q. et al. Instance-dimension dual contrastive learning of visual representations. Machine Vision and Applications 34, 89 (2023). https://doi.org/10.1007/s00138-023-01440-z

  • DOI: https://doi.org/10.1007/s00138-023-01440-z