Abstract
In recent years, short video platforms have become the main source of online rumors. According to statistics from the Shanghai online rumor refutation platform, the number of short video rumors in 2021 was about five times that in 2020, which makes it necessary to detect rumors in short videos. A central difficulty in short video rumor detection is multi-modal information fusion. Traditional multi-modal fusion uses deep learning to extract the low-level features of each modality and then aggregates them into cross-modal features. However, rumor videos often distort the theme and tamper with key frames, so a detector needs to learn features from the perspective of theme and key frames. To address the problem of multi-modal information fusion for short video rumors, this paper proposes a short video rumor detection model (TKCM) based on theme and key frames. It uses an aggregation network to obtain the theme feature of a video and an attention network to obtain the key-frame feature, and it fuses the modalities through a modal adjustment mechanism for short video rumor detection. Experimental results show that, on the short video rumor dataset, the F1 score of the proposed method is 2%–5% higher than that of some state-of-the-art video classification models.
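The abstract names three components (an aggregation network for the theme feature, an attention network for the key-frame feature, and a modal adjustment mechanism for fusion) without giving their internals. The following is a minimal sketch of that pipeline shape only, not the TKCM implementation. Everything in it is an assumption made for illustration: mean pooling stands in for the aggregation network, dot-product softmax attention stands in for the attention network, a single scalar sigmoid gate stands in for the modal adjustment mechanism, and all function and parameter names are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def theme_feature(frames):
    """Stand-in for the aggregation network (assumption): mean-pool all
    frame vectors into one video-level vector."""
    n = len(frames)
    return [sum(column) / n for column in zip(*frames)]


def keyframe_feature(frames, query):
    """Stand-in for the attention network (assumption): score each frame
    by its dot product with a query vector, then take the
    softmax-weighted sum of the frames."""
    scores = [sum(f * q for f, q in zip(frame, query)) for frame in frames]
    weights = softmax(scores)
    pooled = [
        sum(w * frame[d] for w, frame in zip(weights, frames))
        for d in range(len(query))
    ]
    return pooled, weights


def gated_fusion(visual, text, gate_logit):
    """Stand-in for the modal adjustment mechanism (assumption): a scalar
    sigmoid gate that balances the visual and text vectors."""
    g = 1.0 / (1.0 + math.exp(-gate_logit))
    return [g * v + (1.0 - g) * t for v, t in zip(visual, text)]


if __name__ == "__main__":
    # Three toy 2-dimensional frame vectors and a toy text vector.
    frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    text = [0.0, 2.0]
    theme = theme_feature(frames)
    key, weights = keyframe_feature(frames, query=[1.0, 1.0])
    # Concatenate theme and key-frame features on the visual side, and
    # repeat the text vector so both sides have the same length.
    fused = gated_fusion(theme + key, text + text, gate_logit=0.0)
    print(weights, fused)
```

In a trained model the query vector and the gate would be learned parameters rather than constants, and the frame and text vectors would come from pretrained encoders. The sketch fixes them by hand only to keep the data flow visible.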
Acknowledgements
This work is supported by the National Natural Science Foundation of China (No. 62076210), the Natural Science Foundation of Xiamen (No. 3502Z20227188), and the Open Project Program of The Key Laboratory of Cognitive Computing and Intelligent Information Processing of Fujian Education Institutions, Wuyi University (No. KLCCIIP2020203).
Copyright information
© 2023 The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
You, J., Lin, Y., Lin, D., Cao, D. (2023). Video Rumor Classification Based on Multi-modal Theme and Keyframe Fusion. In: Sun, Y., et al. Computer Supported Cooperative Work and Social Computing. ChineseCSCW 2022. Communications in Computer and Information Science, vol 1681. Springer, Singapore. https://doi.org/10.1007/978-981-99-2356-4_5
DOI: https://doi.org/10.1007/978-981-99-2356-4_5
Publisher Name: Springer, Singapore
Print ISBN: 978-981-99-2355-7
Online ISBN: 978-981-99-2356-4
eBook Packages: Computer Science, Computer Science (R0)