International Journal of Scientific Research in Computer Science, Engineering and Information Technology
ISSN : 2456-3307 (www.ijsrcseit.com)
doi : https://doi.org/10.32628/IJSRCSEIT

Detecting and Mitigating the Dissemination of Fake News : Challenges and Future Research Opportunities

1 Bathini Pravalika, 2 Amati Sanghavi, 3 Yasmeen
*1 Assistant Professor, Department of CSE, Bhoj Reddy Engineering College for Women, Hyderabad, Telangana, India
*2,3 Students, Department of CSE, Bhoj Reddy Engineering College for Women, Hyderabad, Telangana, India

ARTICLE INFO
Article History: Accepted: 01 June 2023; Published: 05 June 2023
Publication Issue: Volume 9, Issue 3, May-June-2023
Page Number: 361-367

ABSTRACT
With the increasing popularity of social media for news consumption, the widespread dissemination of fake news has the potential to adversely affect individuals as well as society as a whole. Even in the midst of the current Covid-19 pandemic, false information shared on platforms such as WhatsApp, Twitter, and Facebook has the potential to cause panic and shock a large number of people in various parts of the world. These misconceptions obscure healthier habits and encourage incorrect procedures, which aid in the transmission of the virus and, as a result, lead to poor physical and psychological health results for individuals. Therefore, it is a research challenge to validate the source, content, and publisher of a news article in order to classify it as genuine or fake. The existing systems and techniques are not efficient enough to accurately classify a given news item based on its statistical rating. Machine learning plays an imperative part in categorizing news data and information, despite some limitations. Our project aims not only at fake news detection but also at the generation of real news once fake news is detected. We propose a user-friendly webpage on which the user enters the news article statement.
It is then tested by our machine learning algorithm, which classifies it as genuine or fake. The important words are then extracted from the statement and used to scrape the corresponding genuine news from trusted sources, which is shown to the user. We compared two machine learning algorithms: the Passive Aggressive Classifier and the Naïve Bayes algorithm. We obtained an accuracy of about 93.5% with the Passive Aggressive Classifier and about 83.5% with the Naïve Bayes algorithm.

Keywords : Social Media, News Consumption, Fake News, Machine Learning Algorithms.

Copyright: © the author(s), publisher and licensee Technoscience Academy. This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License, which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

B Pravallika et al Int. J. Sci. Res. Comput. Sci. Eng. Inf. Technol., May-June-2023, 9 (3) : 361-367
Volume 9, Issue 3, May-June-2023 | http://ijsrcseit.com

I. INTRODUCTION

In today's society, most news consumption takes place through social media platforms, since they are the easiest and most convenient way of sharing news. But with this comes the risk of widespread dissemination of fake news. Fake news adversely affects not just individuals but society as a whole. Today our world is fighting against Covid-19. This pandemic has destroyed not just the livelihoods of many people but also many families. Amidst these problems, fake news only adds fuel to the fire. Such misinformation conceals healthy behavior and encourages erroneous activities, which aid the spread of the virus and lead to poor mental and physical health outcomes in people. Thus, it is very important to stop the chain of fake news at the root. This can be done only if we have proof of whether a given news item is real or fake, along with the source of the real news. This is where our project will be beneficial.

With the rapid rise of social media and the protean technological advancements of recent years, we have progressed from accessing news through traditional, conventional means such as radio, newspapers, and TV to more ubiquitous, dynamic sources, a shift that can be credited to the evolution of the internet. Thus, we are living in a period in which there is easy access to information that is growing exponentially. However, the conveniences brought on by a whole host of social media networks have also added multiple layers of intricacy and complexity, which have made it difficult for a news consumer to differentiate between genuine and fake news. Such dissemination of news, followed by the sharing and forwarding of news articles without healthy cross-verification, has contributed to a rise in the falsification of news, which can not only have grave consequences for real-world events but also risks the credibility of social media.

While the existence of fake news is not itself new, since different civilizations, organizations, and nations have manipulated the news media to sway public opinion in their favor or for propaganda, the prominence of social media has augmented the power that fake news can have over an individual and a society. Taking into consideration the impact that the consumption of fake news can have on the fragility of the ways societies function, we have proposed a system which can not only detect fake news by cross-verifying a news article with various trustworthy news sources but also generate real news for users to consume.

Social media have enhanced the experience of news consumption because they are cost-effective, easily accessible, and widely distributable. However, they have made the average internet user vulnerable to consuming news that is intentionally or unintentionally distorted, which can have drastic consequences and puts individuals and society at risk. Therefore, detecting fake news, especially on social media, poses a relatively new and unique problem that opens a wide range of research opportunities.

One such challenge is the variety of ways in which news is falsified. Fake news can vary greatly, from satirical, inflated news articles that are misinterpreted as genuine to articles that use sensationalist, clickbait headlines to grasp the attention of users. News articles can even be fabricated and manipulated with the intention to deceive, harm, or influence public opinion, which may result in confirmation bias or political polarization. Since fake news usually emerges out of developing, critical real-time events, it is difficult to properly check and verify the quality of the data itself. Since fake news is riddled with factual inaccuracies, it can mitigate the influence of real news by competing with it.

In this project, we propose a system that makes use of machine learning algorithms and various feature extraction methods to detect fake news by cross-verifying it against various other trusted news sites, while also generating and displaying real news from trusted sources in the form of a website. Through this project, we aim to obtain maximum accuracy in fake news detection and real news generation.

II. RELATED WORK

Shuo Yang et al. [1] investigate the unsupervised detection of fake news on social media using the users' social media engagement details. They treat the truth of news and the users' credibility as latent random variables, and they use the users' social media engagements to recognise their views on the validity of the news. They suggest a method for unsupervised learning. The system employs a probabilistic graphical model to capture the truth of news and, as a result, the users' credibility. To solve the inference problem, an effective Gibbs sampling technique is proposed. Their experimental results show that the proposed algorithm outperforms the unsupervised baselines.

Kai Shu et al. [2] examine two facets of the problem of fake news detection:
a) Characterization: This aspect introduces the fundamental concepts of fake news in both traditional and social media.
b) Detection: The current detection methods, including feature extraction and model construction, are examined from a data mining perspective.
They describe fake news and characterize it by evaluating various theories and properties in both traditional and social media. They then systematically describe the problem of detecting fake news and summarize the strategies for doing so. They also discuss the datasets and evaluation criteria currently used by existing methods.

Yuta Yanagi et al. [3] propose a fake news detector that can generate fake social contexts (comments), with the aim of detecting fake news early in its spread, when few social contexts are available. It is trained on a series of news articles and their social contexts. They also trained a classification model using news posts, real posted comments, and generated comments. To determine the detector's effectiveness, they compared the quality of the comments generated for articles with actual comments and with those generated by the classification model.
Limitation: According to their study, tokens such as "!", "?", "false," and "breaking" are essential signals of fake news.

J. Zhang et al. [4] formulate fake news identification as a credibility inference problem, in which genuine news has higher credibility than fake news. A deep diffusive network model is proposed based on the interrelationships between news stories, publishers, and their topics. They also introduce a new diffusive unit model called GDU, which accepts multiple inputs from various sources simultaneously and then fuses them to generate the output using content "forget" and "adjust" gates. Extensive experiments with this model on a real-world fake news repository, PolitiFact, have yielded remarkable results in identifying fake news stories, publishers, and topics in the network, demonstrating the proposed model's effectiveness.

A. Thota et al. [5] note that the dissemination and consumption of fake news has become a matter of major concern due to its potential to destabilize governments, which poses a grave threat to society and its individuals. An alternative way in which fake news can be detected is through stance detection, which works by automatically detecting the interrelation between different news articles and their contents. This study thus surveys different ways to predict this relationship, given a news article and headline pair. Based on the similarity between the news article and the headline, the stance can be categorized as 'unrelated', 'discuss', 'agree', or 'disagree'. Such an approach has been implemented with various traditional machine learning models to set a baseline against which modern, sophisticated deep neural networks can be compared. Deep neural networks are used to define the relationship between the news story and the headline.

Kai Shu et al. [6] propose FakeNewsNet, a comprehensive repository that contains data on a variety of features that were otherwise scarce, such as spatiotemporal information, social context, and news content. The repository implements an approach to fetch relevant information from eclectic sources. Furthermore, a preliminary exploratory analysis has been conducted on FakeNewsNet across a variety of features to demonstrate its efficiency and utility in fake news detection tasks.
Limitations: Apart from being time consuming, another limitation of this study is the presence of a significant amount of noise in the selection strategy used for web search results during the process of fetching data.

Adrian Groza [7] observes that although the consumption and sharing of false, unverified news and information in the health and medical domain is an old practice, a plethora of challenges still exist and need to be tackled in order to save people from falling prey to medical myths. The study aims to identify fake news related to Covid-19 by integrating natural language processing with ontology reasoning. It looks into the way in which reasoning in Description Logics (DLs) can identify inconsistencies between information from trusted medical sources and information that is not verified and is presented in natural language.
Limitation: System assessments and verbalizing explanations for each conflicting piece of information are some limitations of this paper.

Subhadra Gurav et al. [8] state that the current techniques and systems are ineffective in providing an accurate statistical rating for any news. Moreover, limitations in news categorization and feedback make the systems less diverse. In this study, an innovative system for detecting false news using machine learning algorithms is proposed. Based on Twitter feedback and the application of classification algorithms to identify such news events, the model takes news events as input and calculates the percentage of news that is real or false.
Limitations: Some of the major limitations of this paper are the accuracy of the model and the limited information that the model can fetch from different sources.

Kai-Chou Yang et al. [9] treat the problem as a natural language inference (NLI) task in which the sentences are classified as "premise" (P) and "hypothesis" (H). For such a task, NLI models tend to be more reliable and accurate. The combined use of gradient boosting and fine-tuning with noisy labels proved significant for the model.
Limitations: The performance of the model was not satisfactory, and the research was adversely affected by time constraints.

Limeng Cui et al. [10] propose a COVID-19 misinformation dataset known as CoAID, which includes news articles, posts on social media platforms, and the user interaction that pertains to such misinformation. In addition to describing the datasets fetched for the study, the authors conduct data analysis to illustrate the distinctive characteristics of fake and factual information, as well as to demonstrate potential future research opportunities that can be addressed through such methods with the implementation of modern techniques.
Limitations: A major limitation of this paper is the difficulty of establishing the authenticity of a news item or piece of information, since the study addresses a fairly recent and ongoing issue, which adds to the complexity of the problem, as does the process of fetching datasets, which is dynamic and frequently changing.

III. PROPOSED SYSTEM

Thanks to the rapid rise of social media and the protean technological advances of recent years, we have moved from receiving news through old, traditional means such as radio, newspapers, and TV to more widespread, dynamic outlets, which can be attributed to the growth of the internet. Thus, we are living in a time when knowledge is readily available and increasing exponentially. However, the conveniences brought on by a whole host of social media networks have also added multiple layers of intricacy and complexity, which have made it more complicated for a news consumer to differentiate between genuine and fake news. Such dissemination of news, followed by the sharing and forwarding of news articles without cross-verification, has contributed to a rise in the falsification of news, which can not only have grave consequences for real-world events but also risks the credibility of social media.

We suggest a model in this project that makes use of machine learning algorithms and various feature extraction methods to identify fake news by cross-referencing it with other reliable news sources, as well as producing and displaying real news from reliable sources in the form of a website. We strive for maximum accuracy in fake news detection and real news generation in this project.

These are the steps followed:
- [...] stance of information or news article is true or false. Basically, the title, content, and domain name are checked.
- [...] like Passive Aggressive Classifier, Naïve Bayes algorithm, and keyword search algorithm.
- [...] real, it will give genuine news from trusted sites so that the dissemination of false information can be stopped.

Fig 1. Architecture Diagram

DATA COLLECTION:
In the proposed system, the data is collected with the current Covid situation in mind. We collected datasets that were publicly available on Kaggle. We went through various datasets and finally chose the one with the maximum number of records.

PRE-PROCESSING:
In the pre-processing step, the data is cleaned so that unwanted and unnecessary information is removed and only the relevant details are kept. In this project we have used stemming and stopword removal. Different methods are used in pre-processing; some of them are described below.
- The method of reducing various words to their root or base word is known as stemming. For example, words like 'retrieval', 'retrieves', and 'retrieved' are reduced to their root form, which is 'retrieve'. Stemming is an important part of natural language processing and is widely used. In domain analysis, stemming is used to evaluate the main vocabularies.
- Stopwords are the common words present in a text, such as 'a', 'an', and 'the'. In pre-processing, these are the words that are filtered out as unnecessary. They add very little meaning to a sentence in any language and can be overlooked without jeopardizing the sentence's purpose. Removing the stopwords also decreases the dataset size, which speeds up processing and enhances performance.
- Tokenization refers to the splitting of text into small tokens. For example, in a paragraph, a line is a token. Similarly, in a line, a word is a token. Tokenization is important because, by studying the words in a document, the meaning of the text can be easily deduced. There are different types of tokenization, such as word tokenization, line tokenization, and regular expression tokenization.

FEATURE EXTRACTION
In feature extraction, after the key features of the document are identified, the data is reduced so that it can be cleaned and then tested on various machine learning algorithms. There are various feature extraction methods. In this project, we have used the TFIDF vectorizer.

TFIDF vectorizer
TFIDF is an abbreviation for Term Frequency and Inverse Document Frequency. It measures how significant a word is to a document within the whole collection.
The term frequency determines how often a term appears in the text.
The inverse document frequency determines whether a word is uncommon or common across the documents.
TFIDF does not itself check authenticity; it assigns each word a weight. Words that occur frequently in many documents, such as 'what' and 'if', receive a low weight because they say little about any one document, while words that appear often in one text but not in the others receive a high weight. These weights are the features passed to the classifier, which makes the genuine-or-fake decision.

IV. RESULTS AND DISCUSSION

Fig 2. Fake news detected on website
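The detection pipeline described in Section III (pre-processing, TFIDF feature extraction, and a passive-aggressive classifier) can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the implementation used in this work: the stopword list is a tiny illustrative one, `naive_stem` is a crude suffix stripper standing in for a real stemmer, the IDF uses a common smoothed variant, the classifier uses the PA-I update rule of the passive-aggressive family, and the four training sentences are made up for illustration. It therefore does not reproduce the accuracies reported above.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "it", "if", "what"}

def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer (hypothetical helper)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(text):
    """Tokenize into lowercase words, drop stopwords, then stem."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [naive_stem(t) for t in tokens if t not in STOPWORDS]

def fit_idf(tokenized_docs):
    """Smoothed inverse document frequency: ln((1 + N) / (1 + df)) + 1."""
    n_docs = len(tokenized_docs)
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    return {t: math.log((1 + n_docs) / (1 + c)) + 1.0 for t, c in df.items()}

def tfidf_vector(tokens, idf):
    """L2-normalised sparse TF-IDF vector; terms unseen in training are ignored."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def dot(weights, vec):
    return sum(weights.get(t, 0.0) * v for t, v in vec.items())

def train_passive_aggressive(vectors, labels, C=1.0, epochs=10):
    """PA-I updates for labels in {-1, +1}: -1 = fake, +1 = genuine."""
    weights = {}
    for _ in range(epochs):
        for vec, y in zip(vectors, labels):
            sq_norm = sum(v * v for v in vec.values())
            if sq_norm == 0.0:
                continue
            loss = max(0.0, 1.0 - y * dot(weights, vec))  # hinge loss
            if loss > 0.0:
                # "Aggressive": move just far enough to fix the margin, capped by C.
                tau = min(C, loss / sq_norm)
                for t, v in vec.items():
                    weights[t] = weights.get(t, 0.0) + tau * y * v
            # Otherwise "passive": the margin is satisfied, so nothing changes.
    return weights

def predict(weights, vec):
    return 1 if dot(weights, vec) >= 0.0 else -1

# Made-up training sentences, for illustration only.
TRAIN = [
    ("Drinking hot water cures the virus instantly", -1),
    ("Miracle garlic remedy cures the virus overnight", -1),
    ("Health ministry reports new vaccination centres opened", 1),
    ("Ministry publishes official vaccination schedule", 1),
]
docs = [preprocess(text) for text, _ in TRAIN]
labels = [y for _, y in TRAIN]
idf = fit_idf(docs)
vectors = [tfidf_vector(d, idf) for d in docs]
weights = train_passive_aggressive(vectors, labels)

claim = tfidf_vector(preprocess("Garlic cures the virus"), idf)
print("fake" if predict(weights, claim) == -1 else "genuine")
```

Note that the TFIDF step only produces weighted features; the genuine-or-fake decision comes from the learned weights. A Naïve Bayes model could be trained on the same vectors to make the comparison reported in the abstract.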
Fig 3. Real news detected on website

V. CONCLUSION AND FUTURE WORK

With the increased use and prevalence of social media for news consumption, the widespread distribution of false news has the potential to harm both individuals and society as a whole. Even in the midst of the current Covid-19 pandemic, false information on platforms like WhatsApp, Twitter, and Facebook can cause panic and have a shocking impact not just on an individual but on society as a whole. The objective is to detect fake news using the latest technologies and algorithms, such as the Passive Aggressive Classifier. In our fake news detection, the user enters the text, the text goes through our various models, and at last a prediction is given of whether it is true or false. Further, our real news generation checks and validates the news and gives us some news from trusted sites.

Our proposed model consists of two components, one where detection takes place and one where correction takes place: if the news is found to be false, the corresponding correct news is given as output. We determined the accuracy of these models and discussed their limitations. In our project, the user can enter the text. Various machine learning algorithms were applied, and we found that the Passive Aggressive Classifier gives better accuracy than Naïve Bayes. Further, the data is extracted, and real news generation is then done using the keyword extraction algorithm. On the basis of our analysis, we can successfully remove the fake news, if any.

Future work would extend this to implement a false news detection system in a social media scenario, where we can predict whether a news item is reliable the moment it is posted on the web, delete it completely from the platform, and prevent its transmission across the internet. Several other features may include identifying false URLs, or finding out whether videos are morphed and trying to remove them from the media. For better accuracy results, we can plan to store the results in the cloud. Storing the data in the cloud can make it possible to run the classification algorithm at a faster rate.

VI. REFERENCES

[1]. Yang, S., Shu, K., Wang, S., Gu, R., Wu, F. and Liu, H., 2019, July. Unsupervised fake news detection on social media: A generative approach. In Proceedings of the AAAI conference on artificial intelligence (Vol. 33, No. 01, pp. 5644-5651).
[2]. Shu, K., Sliva, A., Wang, S., Tang, J. and Liu, H., 2017, September. Fake News Detection on Social Media: A Data Mining Perspective.
[3]. Yanagi, Y., Orihara, R., Sei, Y., Tahara, Y. and Ohsuga, A., 2020, July. Fake News Detection with Generated Comments for News Articles. In 2020 IEEE 24th International Conference on Intelligent Engineering Systems (INES) (pp. 85-90). IEEE.
[4]. Zhang, J., Dong, B. and Philip, S.Y., 2020, April. Fakedetector: Effective fake news detection with deep diffusive neural network. In 2020 IEEE 36th International Conference on Data Engineering (ICDE) (pp. 1826-1829). IEEE.
[5]. Thota, A., Tilak, P., Ahluwalia, S. and Lohia, N., 2018. Fake news detection: A deep learning approach. SMU Data Science Review, 1(3), p.10.
[6]. Shu, K., Mahudeswaran, D., Wang, S., Lee, D. and Liu, H., 2020. FakeNewsNet: A Data Repository with News Content, Social Context, and Spatiotemporal Information for Studying Fake News on Social Media. Big Data, 8(3), pp.171-188.
[7]. Groza, A., 2020. Detecting fake news for the new coronavirus by reasoning on the Covid-19 ontology. arXiv preprint arXiv:2004.12330.
[8]. Gurav, S., Sase, S., Shinde, S., Wabale, P. and Hirve, S., 2019. Survey on Automated System for Fake News Detection using NLP & Machine Learning Approach. International Research Journal of Engineering and Technology (IRJET), 6(01), pp.308-309.
[9]. Yang, K.C., Niven, T. and Kao, H.Y., 2019. Fake news detection as natural language inference. arXiv preprint arXiv:1907.07347.
[10]. Cui, L. and Lee, D., 2020. Coaid: Covid-19 healthcare misinformation dataset. arXiv preprint arXiv:2006.00885.
[11]. Qi, P., Cao, J., Yang, T., Guo, J. and Li, J., 2019, November. Exploiting multi-domain visual information for fake news detection. In 2019 IEEE International Conference on Data Mining (ICDM) (pp. 518-527). IEEE.
[12]. Srivastava, A., Kannan, R., Chelmis, C. and Prasanna, V.K., 2019, December. RecANt: Network-based Recruitment for Active Fake News Correction. In 2019 IEEE International Conference on Big Data (Big Data) (pp. 940-949). IEEE.
[13]. Long, Y., 2017. Fake news detection through multi-perspective speaker profiles. Association for Computational Linguistics.
[14]. Wang, W.Y., 2017. "Liar, liar pants on fire": A new benchmark dataset for fake news detection. arXiv preprint arXiv:1705.00648.
[15]. Jin, Z., Cao, J., Zhang, Y. and Luo, J., 2016. News verification by exploiting conflicting social viewpoints in microblogs. In AAAI, 2972-2978.

Cite this article as : B Pravallika, Sanghavi Amati, Yasmeen, "Detecting and Mitigating the Dissemination of Fake News : Challenges and Future Research Opportunities", International Journal of Scientific Research in Computer Science, Engineering and Information Technology (IJSRCSEIT), ISSN : 2456-3307, Volume 9, Issue 3, pp.361-367, May-June-2023.