Video Rumor Classification Based on Multi-modal Theme and Keyframe Fusion

Jinpeng You, Yanghao Lin, Dazhen Lin & Donglin Cao

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1681))

Included in the following conference series:

CCF Conference on Computer Supported Cooperative Work and Social Computing

Abstract

In recent years, short video platforms have become the main source of online rumors. According to 2021 statistics from the Shanghai online rumor refutation platform, the number of short video rumors was about five times that of 2020, which makes detecting rumors in short videos necessary. At present, short video rumor detection faces the problem of multi-modal information fusion. Traditional multi-modal fusion uses deep learning to obtain the underlying features of each modality and then aggregates them into cross-modal features. However, rumor videos contain distorted themes and tampered key-frames, so short video rumor detection needs to learn features from the perspective of theme and key-frame. To address the multi-modal information fusion problem for short video rumors, this paper proposes a short video rumor detection model (TKCM) based on theme and key-frame. It uses an aggregation network to obtain the theme feature of a video and an attention network to obtain the key-frame feature, and it fuses the modalities through a modal adjustment mechanism for short video rumor detection. Experimental results show that the proposed method improves the F1 score on the short video rumor dataset by 2%–5% compared with some state-of-the-art video classification models.
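
The abstract names three components (an aggregation network for the theme feature, an attention network for the key-frame feature, and a modal adjustment mechanism for fusion) but does not specify them here. The following is a minimal sketch of the general idea only, not the authors' TKCM implementation. It assumes mean pooling as a stand-in for aggregation, dot-product softmax attention as a stand-in for key-frame selection, and a softmax gate as a stand-in for modal adjustment. The function names `mean_pool`, `attention_pool`, and `gated_fusion`, as well as the `query` and `gate_logits` parameters, are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_pool(frames):
    """Assumed stand-in for a theme feature: average of all frame vectors."""
    n = len(frames)
    return [sum(col) / n for col in zip(*frames)]


def attention_pool(frames, query):
    """Assumed stand-in for a key-frame feature.

    Scores each frame by its dot product with `query`, then returns the
    softmax-weighted sum of frames together with the attention weights.
    """
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    dim = len(query)
    pooled = [sum(w * frame[d] for w, frame in zip(weights, frames))
              for d in range(dim)]
    return pooled, weights


def gated_fusion(features, gate_logits):
    """Assumed stand-in for modal adjustment: softmax-gated sum of vectors."""
    gates = softmax(gate_logits)
    dim = len(features[0])
    return [sum(g * feat[d] for g, feat in zip(gates, features))
            for d in range(dim)]


# Toy input: three 2-dimensional frame vectors (illustrative values only).
frames = [[1.0, 0.0], [0.0, 1.0], [4.0, 4.0]]
theme = mean_pool(frames)
key_frame, weights = attention_pool(frames, query=[1.0, 1.0])
fused = gated_fusion([theme, key_frame], gate_logits=[0.0, 0.0])
```

With equal gate logits the fused vector is simply the average of the two features. In a trained model the query and gate values would be learned parameters rather than constants.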

Acknowledgements

This work is supported by the National Natural Science Foundation of China (No. 62076210), the Natural Science Foundation of Xiamen (No. 3502Z20227188), and the Open Project Program of The Key Laboratory of Cognitive Computing and Intelligent Information Processing of Fujian Education Institutions, Wuyi University (No. KLCCIIP2020203).

Author information

Authors and Affiliations

Artificial Intelligence Department, Xiamen University, Xiamen, 361005, China
Jinpeng You, Yanghao Lin, Dazhen Lin & Donglin Cao
The Key Laboratory of Cognitive Computing and Intelligent Information Processing of Fujian Education Institutions, Wuyi University, Wuyishan, 354300, China
Donglin Cao

Corresponding author

Correspondence to Dazhen Lin.

Editor information

Editors and Affiliations

Shandong University, Jinan, China
Yuqing Sun
Fudan University, Shanghai, China
Tun Lu
Taiyuan University of Science and Technology, Taiyuan, China
Yinzhang Guo
Shanxi Datong University, Datong, China
Xiaoxia Song
Tongji University, Shanghai, China
Hongfei Fan
Guangdong University of Technology, Guangzhou, China
Dongning Liu
University of Shanghai for Science and Technology, Shanghai, China
Liping Gao
Tongji University, Shanghai, China
Bowen Du

About this paper

Cite this paper

You, J., Lin, Y., Lin, D., Cao, D. (2023). Video Rumor Classification Based on Multi-modal Theme and Keyframe Fusion. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1681. Springer, Singapore. https://doi.org/10.1007/978-981-99-2356-4_5

DOI: https://doi.org/10.1007/978-981-99-2356-4_5
Published: 13 May 2023
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2355-7
Online ISBN: 978-981-99-2356-4
eBook Packages: Computer Science, Computer Science (R0)