Abstract
Image emotion recognition aims to predict people's emotional responses to visual stimuli. Recently, emotional region discovery has become a hot topic in this field because it brings significant improvements to the task. Existing studies mainly discover emotional regions through sophisticated analysis from the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts to guide the mining of emotional regions, CMANet adopts a multi-level architecture in which the attended semantic feature is first computed under the guidance of the feature from the holistic branch. With this attended semantic feature, the emotional regions of the feature map in the local branch can then be attended to. An adaptive fusion method is further proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable-weight cross-entropy loss that enables the model to focus on hard samples is proposed to improve performance. Experiments on several affective image datasets show that the proposed method is effective and superior to state-of-the-art methods.
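The multi-level attention described above can be illustrated with a minimal sketch: a holistic feature guides attention over concept embeddings, the resulting semantic feature guides spatial attention over the local feature map, and a gate fuses the two attended features. All function and variable names here are illustrative, and the sigmoid gate merely stands in for whatever learned fusion the full model uses.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def concept_guided_attention(holistic, concepts, local_map):
    """Illustrative sketch of multi-level, concept-guided attention.

    holistic:  (d,) feature from the holistic branch
    concepts:  (k, d) embeddings of attribute-aspect concepts
    local_map: (n, d) flattened spatial features from the local branch
    """
    # Level 1: attend over concepts, guided by the holistic feature.
    alpha = softmax(concepts @ holistic)      # (k,) concept weights
    semantic = alpha @ concepts               # attended semantic feature (d,)

    # Level 2: attend over spatial positions, guided by the semantic feature,
    # so emotional regions of the local feature map receive higher weight.
    beta = softmax(local_map @ semantic)      # (n,) spatial weights
    visual = beta @ local_map                 # attended visual feature (d,)

    # Adaptive fusion: a scalar gate balances the two attended features
    # (a learned gate in the full model; a sigmoid of their similarity here).
    g = 1.0 / (1.0 + np.exp(-visual @ semantic))
    return g * visual + (1.0 - g) * semantic

rng = np.random.default_rng(0)
fused = concept_guided_attention(rng.normal(size=8),
                                 rng.normal(size=(5, 8)),
                                 rng.normal(size=(49, 8)))
```

In a trained network the projections producing `holistic`, `concepts`, and `local_map` would be learned; the sketch only shows how guidance flows from the holistic branch through the semantic features to the local branch.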
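The idea of a variable-weight cross-entropy that emphasizes hard, easily-confused samples can be sketched as below. The focal-style weight `(1 - p_t)**gamma` is one plausible instantiation chosen for illustration; the exact weighting scheme in the paper may differ.

```python
import math

def variable_weight_ce(probs, target, gamma=2.0):
    """Illustrative variable-weight cross-entropy: the weight (1 - p_t)**gamma
    grows as the predicted probability p_t on the true class shrinks, so hard
    samples contribute much more to the loss than confidently-correct ones."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confidently-correct prediction incurs almost no loss...
easy = variable_weight_ce([0.05, 0.90, 0.05], target=1)
# ...while an easily-confused sample keeps a large weight and dominates training.
hard = variable_weight_ce([0.45, 0.35, 0.20], target=1)
```

Compared with plain cross-entropy, the down-weighting of easy samples shifts the gradient budget toward the confusable emotion categories the abstract mentions.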
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China (62071384) and the Key Research and Development Project of Shaanxi Province of China (2023-YBGY-239).
Author information
Authors and Affiliations
Contributions
Conceptualization contributed by HY, YF, ZG; methodology contributed by HY; formal analysis and investigation contributed by GL; writing—original draft preparation contributed by HY; writing—review and editing contributed by YF, ZG; funding acquisition contributed by ZG; supervision contributed by SL.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
This research does not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, H., Fan, Y., Lv, G. et al. Concept-guided multi-level attention network for image emotion recognition. SIViP 18, 4313–4326 (2024). https://doi.org/10.1007/s11760-024-03074-8