
A deep learning architecture of RA-DLNet for visual sentiment analysis

  • Regular Paper
  • Published in: Multimedia Systems

Abstract

Visual media has become one of the most potent means of conveying opinions and sentiments on the web, with millions of photos uploaded to popular social networking sites as a means of self-expression. Visual sentiment analysis is inherently subjective, owing to the strong bias in human perception of sentiment. This work proposes a residual attention-based deep learning network (RA-DLNet) for visual sentiment analysis. We aim to learn the spatial hierarchies of image features using a CNN. Since local regions in an image convey significant sentiment, we apply a residual attention model that focuses on crucial, sentiment-rich local regions. A further contribution of this work is an exhaustive analysis of seven popular CNN architectures: VGG-16, VGG-19, Inception-ResNet-V2, Inception-V3, ResNet-50, Xception, and NASNet. The impact of fine-tuning these CNN variants is demonstrated in the visual sentiment analysis domain. Extensive experiments are conducted on eight popular benchmark data sets, with performance measured in terms of accuracy. Comparison with similar state-of-the-art methods demonstrates the superiority of the proposed work.
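The abstract names two ingredients: a pretrained CNN backbone that is fine-tuned for sentiment, and a residual attention module that re-weights sentiment-rich local regions. Below is a minimal PyTorch sketch of that combination, not the authors' actual RA-DLNet: the choice of ResNet-50 as backbone, the single attention block, its layer widths, and the binary positive/negative head are all illustrative assumptions, since the paper's exact architecture is not given in this excerpt.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision import models


class ResidualAttentionBlock(nn.Module):
    """Soft-mask residual attention in the style of Wang et al. (CVPR 2017):
    the mask branch yields per-pixel weights in (0, 1) and the output is
    (1 + mask) * trunk, so attention can amplify sentiment-rich regions
    without ever zeroing out the trunk features."""

    def __init__(self, channels: int):
        super().__init__()
        # Trunk branch: ordinary feature processing.
        self.trunk = nn.Sequential(
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(inplace=True),
        )
        # Mask branch: downsample to gather spatial context,
        # then a 1x1 conv + sigmoid to produce soft weights.
        self.mask = nn.Sequential(
            nn.MaxPool2d(2),
            nn.Conv2d(channels, channels, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, 1),
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        t = self.trunk(x)
        # Resize the mask back to the trunk's spatial size before gating.
        m = F.interpolate(self.mask(x), size=t.shape[-2:],
                          mode="bilinear", align_corners=False)
        return (1.0 + m) * t  # residual attention: identity plus masked trunk


class SentimentNet(nn.Module):
    """Hypothetical end-to-end sketch: frozen ImageNet ResNet-50 backbone,
    one residual attention block, and a binary sentiment head."""

    def __init__(self, num_classes: int = 2):
        super().__init__()
        backbone = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V1)
        # Keep everything up to the final residual stage (2048-channel maps).
        self.features = nn.Sequential(*list(backbone.children())[:-2])
        for p in self.features.parameters():
            p.requires_grad = False  # fine-tune only the new layers at first
        self.attention = ResidualAttentionBlock(2048)
        self.head = nn.Linear(2048, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        f = self.features(x)  # (B, 2048, 7, 7) for a 224x224 input
        a = self.attention(f)
        return self.head(F.adaptive_avg_pool2d(a, 1).flatten(1))


if __name__ == "__main__":
    net = SentimentNet()
    logits = net(torch.randn(1, 3, 224, 224))
    print(logits.shape)  # torch.Size([1, 2])
```

The (1 + mask) * trunk form comes from the residual attention network the paper builds on; the identity term keeps gradients flowing through the trunk even where the soft mask is near zero, which is what makes stacking such blocks trainable.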





Author information


Correspondence to Dinesh Kumar Vishwakarma.

Additional information

Communicated by Y. Zhang.



About this article


Cite this article

Yadav, A., Vishwakarma, D.K. A deep learning architecture of RA-DLNet for visual sentiment analysis. Multimedia Systems 26, 431–451 (2020). https://doi.org/10.1007/s00530-020-00656-7
