Abstract
Image emotion recognition aims to predict people's emotional responses to visual stimuli. Recently, emotional region discovery has become a hot topic in this field because it brings significant improvements to the task. Existing studies mainly discover emotional regions through sophisticated analysis from the object aspect, which is less discriminative for emotion. In this paper, we propose a Concept-guided Multi-level Attention Network (CMANet) that makes full use of attribute-aspect concepts to enhance image emotion recognition. To leverage multiple concepts to guide the mining of emotional regions, CMANet adopts a multi-level architecture in which the attended semantic feature is first computed under the guidance of the feature from the holistic branch. With this attended semantic feature, the emotional regions of the feature map in the local branch can then be attended to. An adaptive fusion method is further proposed so that the attended visual and semantic features complement each other. Notably, for emotion categories that are easily confused, a novel variable-weight cross-entropy loss that enables the model to focus on hard samples is proposed to improve performance. Experiments on several affective image datasets show that the proposed method is effective and superior to state-of-the-art methods.
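The multi-level attention described above can be illustrated with a minimal sketch: a holistic feature guides attention over concept embeddings, the resulting semantic feature guides spatial attention over the local feature map, and a gate fuses the two attended features. All function and variable names here are illustrative, and the sigmoid gate merely stands in for whatever learned fusion the full model uses.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def concept_guided_attention(holistic, concepts, local_map):
    """Illustrative sketch of multi-level, concept-guided attention.

    holistic:  (d,) feature from the holistic branch
    concepts:  (k, d) embeddings of attribute-aspect concepts
    local_map: (n, d) flattened spatial features from the local branch
    """
    # Level 1: attend over concepts, guided by the holistic feature.
    alpha = softmax(concepts @ holistic)      # (k,) concept weights
    semantic = alpha @ concepts               # attended semantic feature (d,)

    # Level 2: attend over spatial positions, guided by the semantic feature,
    # so emotional regions of the local feature map receive higher weight.
    beta = softmax(local_map @ semantic)      # (n,) spatial weights
    visual = beta @ local_map                 # attended visual feature (d,)

    # Adaptive fusion: a scalar gate balances the two attended features
    # (a learned gate in the full model; a sigmoid of their similarity here).
    g = 1.0 / (1.0 + np.exp(-visual @ semantic))
    return g * visual + (1.0 - g) * semantic

rng = np.random.default_rng(0)
fused = concept_guided_attention(rng.normal(size=8),
                                 rng.normal(size=(5, 8)),
                                 rng.normal(size=(49, 8)))
```

In a trained network the projections producing `holistic`, `concepts`, and `local_map` would be learned; the sketch only shows how guidance flows from the holistic branch through the semantic features to the local branch.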
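The idea of a variable-weight cross-entropy that emphasizes hard, easily-confused samples can be sketched as below. The focal-style weight `(1 - p_t)**gamma` is one plausible instantiation chosen for illustration; the exact weighting scheme in the paper may differ.

```python
import math

def variable_weight_ce(probs, target, gamma=2.0):
    """Illustrative variable-weight cross-entropy: the weight (1 - p_t)**gamma
    grows as the predicted probability p_t on the true class shrinks, so hard
    samples contribute much more to the loss than confidently-correct ones."""
    p_t = probs[target]
    return -((1.0 - p_t) ** gamma) * math.log(p_t)

# A confidently-correct prediction incurs almost no loss...
easy = variable_weight_ce([0.05, 0.90, 0.05], target=1)
# ...while an easily-confused sample keeps a large weight and dominates training.
hard = variable_weight_ce([0.45, 0.35, 0.20], target=1)
```

Compared with plain cross-entropy, the down-weighting of easy samples shifts the gradient budget toward the confusable emotion categories the abstract mentions.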
Data availability
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
Funding
This work was supported by the National Natural Science Foundation of China (62071384) and the Key Research and Development Project of Shaanxi Province of China (2023-YBGY-239).
Author information
Authors and Affiliations
Contributions
Conceptualization contributed by HY, YF, ZG; methodology contributed by HY; formal analysis and investigation contributed by GL; writing—original draft preparation contributed by HY; writing—review and editing contributed by YF, ZG; funding acquisition contributed by ZG; supervision contributed by SL.
Corresponding author
Ethics declarations
Conflict of interest
The authors have no competing interests to declare that are relevant to the content of this article.
Ethical approval
This research does not involve human participants or animals.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Yang, H., Fan, Y., Lv, G. et al. Concept-guided multi-level attention network for image emotion recognition. SIViP 18, 4313–4326 (2024). https://doi.org/10.1007/s11760-024-03074-8