Concept-guided multi-level attention network for image emotion recognition

  • Original Paper
  • Published in: Signal, Image and Video Processing

Abstract

Image emotion recognition aims to predict people’s emotional response toward visual stimuli. Recently, emotional region discovery has become a hot topic in this field because it brings significant improvement to the task. Existing studies mainly discover emotional regions through sophisticated analysis of the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts to guide the mining of emotional regions, CMANet is designed as a multi-level architecture: an attended semantic feature is first computed under the guidance of the feature from the holistic branch; then, with this attended semantic feature, the emotional regions of the feature map in the local branch are attended to. An adaptive fusion method is further proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable weight cross-entropy loss, which enables the model to focus on hard samples, is proposed to improve performance. Experiments on several affective image datasets demonstrate that the proposed method is effective and superior to state-of-the-art methods.
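The abstract does not spell out the exact form of the variable weight cross-entropy loss, so the sketch below only illustrates the general idea it describes: each sample's cross-entropy term is reweighted by how poorly the model currently classifies it, so hard (easily confused) samples contribute more to the gradient. The class name VariableWeightCE and the exponent gamma are hypothetical choices for illustration, and the weighting here follows a focal-loss-style scheme; the paper's actual formulation may differ.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class VariableWeightCE(nn.Module):
    """Illustrative cross-entropy with per-sample variable weights.

    Samples whose predicted probability for the true class is low
    (hard, easily confused examples) receive a larger weight, while
    well-classified samples are down-weighted. This is a plausible
    sketch of the idea, not the exact loss proposed in the paper.
    """

    def __init__(self, gamma: float = 1.0):
        super().__init__()
        self.gamma = gamma  # controls how strongly hard samples are emphasized

    def forward(self, logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
        # Per-sample cross-entropy, kept unreduced so it can be reweighted.
        ce = F.cross_entropy(logits, targets, reduction="none")
        # Probability assigned to the correct class for each sample.
        p_true = F.softmax(logits, dim=1).gather(1, targets.unsqueeze(1)).squeeze(1)
        # Variable weight: grows as the model becomes less confident.
        weight = (1.0 - p_true) ** self.gamma
        return (weight * ce).mean()

# Usage sketch:
# criterion = VariableWeightCE(gamma=2.0)
# loss = criterion(model(images), labels)
```

Setting gamma to 0 recovers the standard cross-entropy, while larger values put progressively more emphasis on misclassified, easily confused samples.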



Data availability

The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.


Funding

This work was supported by the National Natural Science Foundation of China (62071384) and the Key Research and Development Project of Shaanxi Province of China (2023-YBGY-239).

Author information

Authors and Affiliations

Authors

Contributions

Conceptualization: HY, YF, ZG; methodology: HY; formal analysis and investigation: GL; writing, original draft preparation: HY; writing, review and editing: YF, ZG; funding acquisition: ZG; supervision: SL.

Corresponding author

Correspondence to Guoyun Lv.

Ethics declarations

Conflict of interest

The authors have no competing interests to declare that are relevant to the content of this article.

Ethical approval

This research does not involve human participants or animals.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Yang, H., Fan, Y., Lv, G. et al. Concept-guided multi-level attention network for image emotion recognition. SIViP 18, 4313–4326 (2024). https://doi.org/10.1007/s11760-024-03074-8
