Abstract
In this paper, we present a methodology for the early detection of fake news on emerging topics through the application of weak supervision. Traditional fake news detection techniques often rely on fact-checkers or on supervised learning with labeled data, which are not readily available for emerging topics. To address this, we introduce the Weakly Supervised Text Classification framework (WeSTeC), an end-to-end solution designed to programmatically label large-scale text datasets within specific domains and to train supervised text classifiers using the assigned labels. The proposed framework automatically generates labeling functions through multiple weak labeling strategies and eliminates underperforming ones. The labels assigned by the generated labeling functions are then used to fine-tune a pre-trained RoBERTa classifier for fake news detection. Because the weakly labeled dataset contains fake news related to the emerging topic, the trained fake news detection model becomes specialized for the topic under consideration. We explore both semi-supervision and domain adaptation setups, which use small amounts of labeled data and labeled data from other domains, respectively. The fake news classification model generated by the proposed framework outperforms all baselines in both setups. In addition, our fake news detection model trained on weak labels achieves accuracy within 1% of its fully supervised counterpart, underscoring the robustness of the proposed framework’s weak labeling capabilities.
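The abstract outlines a pipeline in which labeling functions are generated automatically, underperforming ones are discarded, and the surviving votes become training labels. The following is a minimal sketch of that general weak supervision pattern under stated assumptions; it is not WeSTeC's implementation. Keyword-matching functions stand in for the framework's weak labeling strategies, a small labeled development set (as in the semi-supervision setup) is used to score them, the coverage and accuracy thresholds are arbitrary illustrative values, and a simple majority vote replaces whatever label aggregation the framework actually uses (a learned label model such as Snorkel's is a common choice). The names make_keyword_lf, evaluate_lf, filter_lfs, and majority_vote, and all of the toy data, are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

ABSTAIN = -1          # a labeling function may decline to vote
REAL, FAKE = 0, 1     # illustrative class encoding (assumption)

LabelingFunction = Callable[[str], int]


def make_keyword_lf(keyword: str, label: int) -> LabelingFunction:
    """Hypothetical LF generator: vote `label` if `keyword` occurs, else abstain."""
    def lf(text: str) -> int:
        return label if keyword in text.lower() else ABSTAIN
    lf.__name__ = f"kw[{keyword}]->{label}"
    return lf


def evaluate_lf(lf: LabelingFunction, texts: Sequence[str],
                gold: Sequence[int]) -> Tuple[float, float]:
    """Coverage and accuracy of one LF on a small labeled development set."""
    votes = [lf(t) for t in texts]
    fired = [(v, y) for v, y in zip(votes, gold) if v != ABSTAIN]
    coverage = len(fired) / len(texts) if texts else 0.0
    accuracy = sum(v == y for v, y in fired) / len(fired) if fired else 0.0
    return coverage, accuracy


def filter_lfs(lfs: Sequence[LabelingFunction], texts: Sequence[str],
               gold: Sequence[int], min_coverage: float = 0.05,
               min_accuracy: float = 0.7) -> List[LabelingFunction]:
    """Discard underperforming LFs; both thresholds are assumptions."""
    kept = []
    for lf in lfs:
        coverage, accuracy = evaluate_lf(lf, texts, gold)
        if coverage >= min_coverage and accuracy >= min_accuracy:
            kept.append(lf)
    return kept


def majority_vote(lfs: Sequence[LabelingFunction], text: str) -> int:
    """Stand-in for a learned label model: plurality of non-abstaining votes,
    abstaining when no LF fires or the top two classes tie."""
    votes = [v for v in (lf(text) for lf in lfs) if v != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]


# Toy usage (illustrative data, not from the paper).
candidates = [
    make_keyword_lf("shocking", FAKE),
    make_keyword_lf("miracle", FAKE),
    make_keyword_lf("officials said", REAL),
    make_keyword_lf("study", REAL),   # noisy candidate, should be filtered out
]
dev_texts = [
    "shocking miracle cure hidden by doctors",
    "officials said the vaccine trial continues",
    "shocking truth they do not want you to know",
    "a new study shows miracle results, share now",
    "officials said a study of cases is under way",
]
dev_gold = [FAKE, REAL, FAKE, FAKE, REAL]
unlabeled = [
    "shocking miracle diet doctors hate",
    "officials said roads will reopen",
    "weather is mild today",
    "officials said the shocking claim is false",
]

kept = filter_lfs(candidates, dev_texts, dev_gold)
weak_labels = [majority_vote(kept, t) for t in unlabeled]
train_set = [(t, y) for t, y in zip(unlabeled, weak_labels) if y != ABSTAIN]
print([lf.__name__ for lf in kept])  # ['kw[shocking]->1', 'kw[miracle]->1', 'kw[officials said]->0']
print(weak_labels)                   # [1, 0, -1, -1]
```

Texts on which every retained function abstains, or on which the votes tie, are left out of train_set; the remaining (text, label) pairs are the kind of weakly labeled data that would then be used to fine-tune a classifier such as RoBERTa, a step omitted here.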



Availability of supporting data
The supporting data and associated programs for this journal submission are available in a dedicated GitHub repository. Access is granted upon request to the corresponding author of this submission.
Acknowledgements
Not applicable
Funding
Not applicable
Author information
Contributions
S.A. and N.K.C. jointly contributed to this manuscript. S.A. played a key role in the conception, design, data analysis, and software development of the study. N.K.C. revised the work for intellectual content and approved the final version for publication. Both authors reviewed and finalized the manuscript.
Ethics declarations
Competing interests
The authors declare that they have no conflicts of interest.
Ethical approval
Not applicable
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Akdag, S.H., Cicekli, N.K. Early detection of fake news on emerging topics through weak supervision. J Intell Inf Syst 62, 1263–1284 (2024). https://doi.org/10.1007/s10844-024-00852-1
DOI: https://doi.org/10.1007/s10844-024-00852-1