[go: up one dir, main page]

0% found this document useful (0 votes)
38 views43 pages

Misinformation Detection and Fact Checking Report

The dissertation presents a multimodal fake news detection system that analyzes both text and images on social media to combat misinformation. Utilizing deep learning models like BERT and ResNet50, the system achieved an accuracy of 86.2% by fusing textual and visual features for classification. The project aims to provide a scalable AI-based solution for enhancing the credibility of digital information and mitigating the spread of false narratives.

Uploaded by

madihafirdous25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
38 views43 pages

Misinformation Detection and Fact Checking Report

The dissertation presents a multimodal fake news detection system that analyzes both text and images on social media to combat misinformation. Utilizing deep learning models like BERT and ResNet50, the system achieved an accuracy of 86.2% by fusing textual and visual features for classification. The project aims to provide a scalable AI-based solution for enhancing the credibility of digital information and mitigating the spread of false narratives.

Uploaded by

madihafirdous25
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 43

Misinformation Detection and Fact Checking

on Social Media
Dissertation submitted to the
University College of Engineering (A), Osmania University.
In Partial fulfilment for the award of the degree
of
BACHELOR OFENGINEERING
In
COMPUTER SCIENCE AND ENGINEERING
Submitted by
MANEESHA Y (100521733033)
MADIHA FIRDOUS (100521733032)
N SRI CHANDANA (100521733040)

Under the Guidance of


E.Pragnavi
Dept. of CSE, UCEOU.

DEPARTMENTOFCOMPUTERSCIENCEANDENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING (A),
OSMANIA UNIVERSITY HYDERABAD, TS-500007
MAY – 2025
DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
UNIVERSITY COLLEGE OF ENGINEERING,
OSMANIA UNIVERSITY

CERTIFICATE

This is to certify the bonafide work of Maneesha Y, Madiha Firdous, N Sri Chandana
bearing roll numbers 100521733033,100521733032,100521733040 respectively, for their
major project, in partial fulfilment of Bachelor of Engineering Degree, offered by Department
of Computer Science and Engineering, University College of Engineering, Osmania
University.

Project Guide Head of the Department


E.Pragnavi Prof. P.V Sudha
Dept. of CSE, UCEOU Dept. of CSE, UCEOU
STUDENT DECLARATION

We declare that the wok reported in the project report entitled “Misinformation Detection and
Fact Checking on Social Media” submitted by Maneesha Y, Madiha Firdous, N Sri Chandana
is a record of the work done by us in the DEPARTMENT OF COMPUTER SCIENCE AND
ENGINEERING, UNIVERSITY COLLEGE OF ENGINEERING. No part of the report is
copied from books/journals/internet and wherever referred, the same has been duly
acknowledged in the text. The reported data is based on the work done entirely by us and not
copied from any other source or submitted to any other Institute or University for the award
of a degree or diploma.

SIGNATURES

Maneesha Y:

Madiha Firdous:

N Sri Chandana:
ACKNOWLEGMENT

It is our privilege and pleasure to express our profound sense of respect, gratitude,
and indebtedness to our guide E.Pragnavi, Department of Computer Science and
Engineering, UCEOU, for his inspiration, guidance, cogent discussion, constructive
criticisms, and encouragement throughout this dissertation work.

We also thank Prof. P. V. Sudha, Head Department of Computer Science and


Engineering for her support from the Department and for making all the resources
available to us students. We also extend my thanks to the entire faculty of the
Department of Computer Science and Engineering, University College of
Engineering, Osmania University who encourage us throughout the course of our
Bachelor degree and allowing us to use the many resources present in the
department. Our sincere thanks to our parents and friends for their valuable
suggestions, morals, strength, and support for the completion of our project
ABSTRACT

The rapid dissemination of misinformation, particularly on social media platforms, has


become a serious threat to public well-being and societal stability. From misleading health
advice to politically charged fake narratives, the consequences of false information are far-
reaching. Traditional detection methods, which often rely on either textual or visual analysis
alone, are increasingly inadequate against the sophisticated nature of modern misinformation.

In response to this challenge, our project presents a multimodal fake news detection system
that analyses both text and images associated with online posts. Using the Fakeddit dataset,
which contains labelled social media entries with corresponding images and text, we built a
deep learning model that leverages BERT for extracting contextual text embeddings and
ResNet50 for capturing visual features. These features are fused and passed through a neural
network for final classification.

Our experimental setup demonstrated that the multimodal system significantly outperforms
unimodal baselines. The model achieved an accuracy of 86.2%, along with strong precision
and recall scores, indicating its reliability across varied misinformation types. This
improvement highlights the advantage of jointly analysing textual and visual cues, which
often reveal inconsistencies undetectable by single-modality systems.

This project contributes a scalable, AI-based solution to the growing problem of


misinformation. With further enhancements such as multilingual support and real-time
processing, the system has the potential to be integrated into social media platforms or fact-
checking workflows, supporting more credible and trustworthy digital information
environments.

TABLE OF CONTENTS
S.NO CONTENT PAGE NO.
1. Chapter 1: Introduction
1.1. Introduction
1.2. Aim
1.3. Problem Definition
1.4. Motivation
1.5. Deep Learning
1.6. Convolutional Neural Network
1.7. ResNet50 Architecture
1.8. BERT Architecture

2. Chapter 2: Literature Survey

3. Chapter 3: Fundamental of fake news and


misinformation
3.1 What is fake news?
3.2 Misinformation vs Disinformation
3.3 Types of Misinformation

3.4 Why Multimodal Detection?

4. Chapter 4: Existing Methods


4.1 Text Based Approaches
4.2 Image Based Approaches
4.3 Ensemble and Hybrid models
4.4 Multimodal Deep Learning Approaches

5. Chapter 5: Proposed Method


4.5 Proposed Approach
4.6 Dataset and Preprocessing
4.7 Architecture of the proposed system
4.8 Training and Optimisation
4.9 Evaluation Environment
4.10 Advantages of the proposed approach
6. Chapter 6: Dataset Exploration
4.11 Dataset Selection
4.12 Dataset structure and Attributes
4.13 Data Preprocessing
4.14 Dataset statistics
4.15 Dataset Challenges

7. Chapter 7 Implementation

8. Chapter 8: Results and Analysis

9. Chapter 9: Future Scope

10. References

11. Appendix
LIST OF FIGURES

CHAPTER I
1.INTRODUCTION

1.1 Introduction

In today’s digital-first society, social media platforms like Twitter, Facebook, and Reddit have
become primary sources of news and information for millions of people. These platforms
enable users to share information instantly and globally, creating a fast-paced environment
for communication. However, this speed and openness also make it easy for false or
misleading information—commonly known as misinformation or fake news—to spread
uncontrollably. The impact of such content can be severe, influencing elections, undermining
public health, and creating unnecessary panic.

Fake news is no longer limited to just manipulated headlines or deceptive articles. Many
posts combine misleading textual information with compelling but unrelated or altered
images, making them more convincing and harder to detect. This multimodal nature of
misinformation presents new challenges for detection systems that analyze content based
solely on one type of input—either text or images. As a result, unimodal systems often fail to
recognize the complete context or underlying deception in such posts.

Recognizing this gap, our project proposes a multimodal deep learning system for detecting
misinformation. The system utilizes BERT (Bidirectional Encoder Representations from
Transformers) to understand the contextual nuances of text, and ResNet50, a powerful
convolutional neural network, to extract meaningful features from images. These two
modalities are fused into a single feature vector, which is then classified using a neural
network to determine whether a post is real or fake.

By combining both visual and textual cues, our approach provides a more accurate and
intelligent method for misinformation detection. The system not only identifies
inconsistencies between modalities but also learns patterns that could indicate manipulation.
This project aims to contribute a scalable, AI-based solution that can support online platforms
and fact-checking tools in mitigating the spread of false information.

1.2 AIM
The aim of this project is to design and develop an intelligent, scalable, and efficient deep
learning system for the detection of misinformation and fake news on social media platforms.
As online platforms become increasingly influential in shaping public opinion, the threat
posed by fake news has grown substantially. Our goal is to harness the power of multimodal
deep learning techniques to accurately assess the credibility of social media content by
analyzing both textual and visual elements.

By leveraging cutting-edge AI models such as BERT for natural language understanding and
ResNet50 for visual interpretation, we intend to create a system capable of identifying
deceptive or manipulated content that may otherwise bypass traditional detection
mechanisms. This approach aims not only to improve detection accuracy but also to provide a
strong foundation for real-time misinformation monitoring tools in large-scale, real-world
applications.

Specific Objectives:

 Multimodal Feature Extraction:


Use BERT to extract deep contextual embeddings from the textual components of
social media posts, and ResNet50 to derive high-level visual features from associated
images. This dual-modality approach aims to enhance the model’s ability to detect
inconsistencies and manipulation.
 Feature Fusion and Classification:
Combine the features obtained from both text and image modalities into a unified
vector, and feed it into a deep neural network for binary classification (real or fake).
The fusion architecture is key to capturing the inter-modal relationships that unimodal
models often miss.
 Dataset Utilization and Evaluation:
Train and validate the model on the Fakeddit dataset, a large, labeled dataset
designed specifically for multimodal fake news detection. Evaluate performance using
metrics such as accuracy, precision, recall, F1-score, and confusion matrix.
 Benchmarking Against Baselines:
Compare the multimodal model’s performance with unimodal baselines (text-only and
image-only models) to empirically demonstrate the advantages of multimodal
learning in misinformation detection.
 Scalability and Practical Impact:
Develop a robust system architecture that can be extended to real-time applications
such as browser plugins, social media flagging tools, or government fact-checking
dashboards. Ensure the model is adaptable to evolving misinformation patterns and
scalable across platforms.

Through these objectives, our project aims to contribute a reliable, data-driven solution to the
growing problem of fake news. By leveraging the synergy of multimodal deep learning, we
strive to set a precedent for future AI systems capable of enhancing digital content credibility
and ensuring safer information environments.

1.3 PROBLEM DEFINITION

Despite numerous advances in machine learning and natural language processing, the
detection of fake news and misinformation remains an unsolved challenge. Traditional fake
news detection models tend to focus on either textual content or visual elements, treating
them in isolation. However, in real-world scenarios, especially on social media,
misinformation often manifests in multimodal forms—combining deceptive text with
unrelated, altered, or emotionally charged images. This disconnect between text and image
can mislead readers, bypass unimodal detection systems, and amplify the spread of harmful
content

Our problem can be formally defined as follows:

Design and implement a multimodal deep learning framework that can process both text and
image components of a post and classify it as fake or real, with improved accuracy over
unimodal baselines

1.4 MOTIVATION

The motivation for this project stems from the increasingly sophisticated ways in which
misinformation is being propagated. Social media users often consume and share content
without verifying its credibility. Manual fact-checking by journalists and watchdog
organizations, while accurate, is time-consuming, limited in scale, and reactive rather than
preventive.

Artificial Intelligence (AI) presents a promising solution by enabling real-time, large-scale


detection of misleading content. By automating the detection process, we can potentially
reduce the spread of misinformation before it goes viral.

Several factors motivated our choice to use a multimodal deep learning model:

 Text-Image Discrepancies: Many fake posts pair legitimate-sounding text with false
or unrelated images. Analyzing these discrepancies helps in uncovering manipulative
intent.
 Higher Classification Accuracy: Combining features from both text and image
domains has been shown to outperform unimodal systems in benchmark studies.
 Advanced Architectures: The availability of powerful pretrained models like BERT
for language and ResNet50 for vision tasks makes it feasible to develop sophisticated
solutions using transfer learning.
 Societal Impact: The COVID-19 pandemic, political disinformation campaigns, and
environmental hoaxes have demonstrated the urgency of building reliable
misinformation detection tools.

Our work is therefore not only technically ambitious but socially relevant. Through this
project, we aim to contribute a practical solution that could aid researchers, social media
platforms, and policy makers in curbing the spread of harmful fake news content.

1.5 DEEP LEARNING


Deep learning is a specialized subfield of Artificial Intelligence (AI) that models the way the
human brain processes information. It utilizes artificial neural networks composed of
multiple layers—hence the term deep—to learn from vast amounts of data. These networks
consist of interconnected nodes, or neurons, which apply transformations to the input data
and pass the results forward. Each layer progressively extracts higher-level features, allowing
the system to learn complex patterns without explicit programming.
One of the defining strengths of deep learning is its ability to automatically extract relevant
features from raw data, eliminating the need for manual feature engineering. This
characteristic makes it especially powerful for high-dimensional tasks such as image
classification, speech recognition, and natural language processing (NLP). Deep learning
models learn to generalize from examples, enabling them to make predictions or decisions
from data they have never seen before.

In the context of our project, deep learning serves as the foundation for building an
intelligent, multimodal fake news detection system. Specifically, we utilize two powerful
deep learning architectures tailored for different data modalities:

 Convolutional Neural Networks (CNNs), such as ResNet50, are used to analyse and
extract spatial features from images. These networks can identify textures, objects,
and subtle visual cues that may indicate manipulation or deception in media content.

 Transformer models, particularly BERT (Bidirectional Encoder Representations


from Transformers), are employed for understanding textual content. Transformers
are capable of capturing long-range dependencies and semantic relationships in text,
which are crucial for detecting misleading or contradictory information in social
media posts.

By combining both CNNs for visual analysis and transformers for textual analysis, our
system takes a dual-modality approach. This allows for a deeper and more nuanced
understanding of the content, enabling the model to detect inconsistencies between images
and text—a common trait of misinformation.

While deep learning offers immense capabilities, it also comes with challenges. Training deep
models requires large labelled datasets, significant computational resources, and careful
hyperparameter tuning. In this project, deep learning not only enables more accurate fake
news detection but also demonstrates its broader applicability in tackling real-world
challenges.

1.6 CONVOLUTIONAL NEURAL NETWORK:

Convolutional Neural Networks (CNNs) are a class of deep learning models specifically
designed to process and analyse visual data. They have become the foundation of many state-
of-the-art applications in computer vision, including image classification, object detection,
and facial recognition. CNNs are capable of learning hierarchical features from images, such
as edges, textures, and complex shapes, making them highly effective for understanding and
interpreting visual content.

The architecture of a Convolutional Neural Network (CNN) is structured into three primary
components: the input layer, the hidden layers, and the output layer.

Fig 1.1 Architecture of Convolutional Neural Network

1.Input Layer: The input layer serves as the entry point for the network. It receives the input
image data, which is typically in the form of a matrix of pixel values representing the image's
features.

2.Hidden Layers: The hidden layers are where the core processing of the CNN occurs. They
consist of multiple convolutional and pooling layers.

 Convolutional Layers: These layers apply a set of learnable filters (also called
kernels) to the input image. Each filter detects specific patterns or features, such as
edges or textures, by performing convolution operations. This helps the network to
extract hierarchical representations of the input data.

 Pooling Layers: After each convolutional layer, a pooling layer is often applied to
down sample the feature maps generated by the convolutional layers. Pooling helps to
reduce the spatial dimensions of the feature maps while retaining important
information, thereby decreasing computational complexity and controlling overfitting.

3.Output Layer: The output layer provides the final results of the network's processing. It
typically consists of one or more fully connected layers that perform classification or
regression tasks based on the features extracted by the hidden layers. For classification tasks,
the output layer produces the predicted class label or probability scores for each class.
The effectiveness of a CNN largely depends on the configuration of its hidden layers—such
as the number of layers, filter size, stride, padding, and pooling strategy. By carefully tuning
these hyperparameters, CNNs can be optimized for both accuracy and efficiency in a wide
range of image-based applications.

In this project, we use ResNet50, a deep CNN with 50 layers and residual connections, which
enables efficient training of very deep networks and improves feature extraction for image-
based misinformation detection.

1.7 ResNet50 Architecture

ResNet50 is a deep convolutional neural network architecture that extends the standard CNN
design through the innovative use of residual connections. It is a 50-layer variant of the
ResNet (Residual Network) family and has become a standard in image classification tasks
due to its ability to train very deep networks efficiently. The core innovation of ResNet50 lies
in its ability to overcome the vanishing gradient problem, which often hampers the training
of deep neural networks. This is achieved through skip connections, allowing the model to
learn residual functions rather than direct mappings.

Fig 1.2ResNet50 Architecture

The architecture of ResNet50 can be broken down into several key components:

 Convolutional Layers:
These are the foundational layers responsible for extracting features from input
images. The layers apply filters to detect low-level patterns such as edges and
textures. Each convolutional layer is followed by Batch Normalization and ReLU
activation to improve stability and introduce non-linearity. A max pooling operation
is applied early in the architecture to reduce spatial dimensions while preserving
essential features.
 Identity and Convolutional Blocks:
These are the central building blocks of ResNet50. An identity block allows the input
to be added directly to the output of the convolutional layers within the block. This
addition is made possible through skip connections, enabling the network to learn
changes (or “residuals”) from the input rather than starting from scratch. The
convolutional block is a slightly modified version that includes a 1×1 convolution to
match dimensions when the input and output sizes differ. These blocks work together
to enable efficient deep learning without performance degradation.

Skip Connections
The ResNet50 architecture incorporates skip connections, also referred to as residual
connections, as a fundamental aspect. These connections are pivotal in mitigating the issue of
vanishing gradients by facilitating the direct flow of information from the input to the output
of the network, circumventing one or more layers. Consequently, the network is empowered
to learn residual functions that effectively map the input to the desired output, thus alleviating
the need to learn the entire mapping from scratch.

Fig 1.3 Skip Connections in ResNet50 Architecture


In ResNet50, skip connections are used in the identity block and convolutional block. The
identity block passes the input through a series of convolutional layers and adds the input
back to the output, while the convolutional block uses a 1x1 convolutional layer to reduce the
number of filters before the 3x3 convolutional layer and then adds the input back to the
output. The use of skip connections in ResNet50 allows the network to learn deeper
architectures while still being able to train effectively and prevent vanishing gradients.

Fully Connected Layers:


After the hierarchical feature extraction by the convolutional blocks, the output is flattened
and passed through one or more fully connected (dense) layers. In image classification tasks,
these layers culminate in a softmax output that assigns probabilities to predefined classes. In
our case, we use the 2048-dimensional feature vector from the final pooling layer, before the
classification head, as a high-level image representation.

In our project, ResNet50 serves as the backbone for visual feature extraction. It processes
each image and outputs a 2048-dimensional feature vector representing the most relevant
and abstract characteristics of the visual content. This vector is then fused with textual
features from BERT to enable multimodal fake news classification. The use of ResNet50
ensures that the system captures subtle visual cues and patterns—such as manipulated
images, misleading graphics, or visual inconsistencies—that may contribute to
misinformation.
1.8 BERT Architecture

BERT (Bidirectional Encoder Representations from Transformers) is a state-of-the-art


deep learning model for Natural Language Processing (NLP), developed by Google in 2018.
Unlike traditional language models that read input sequences from left to right or right to left,
BERT is bidirectional, meaning it analyzes text by simultaneously looking at both its left
and right contexts. This allows BERT to capture richer, more nuanced relationships between
words, making it exceptionally powerful for understanding natural language.

At the core of BERT is the Transformer architecture, which relies heavily on self-attention
mechanisms. These attention layers allow the model to assign varying degrees of importance
to different words in a sentence, depending on the context in which they appear. This is
especially useful for interpreting complex or ambiguous language, as it allows BERT to
understand the true meaning of words based on their surroundings. For instance, the word
"bank" could refer to a financial institution or the side of a river—BERT can disambiguate
this based on the sentence.

The architecture of the BERT-base model, which we use in our project, consists of:
 12 Transformer encoder layers
 12 self-attention heads
 768-dimensional hidden size
 110 million parameters

In our project, BERT is used to process the textual content of social media posts, extracting
contextual embeddings that represent the semantic meaning of the text. Each post is
tokenized and passed through BERT, and the output from the special [CLS] token (which
stands for "classification") is taken as a 768-dimensional vector representing the entire input
sentence or paragraph. This vector is rich in contextual information and captures the
underlying intent and tone of the post, which is critical for detecting misleading or deceptive
text.

By integrating BERT with image features from ResNet50, our system benefits from a
comprehensive, multimodal representation of each social media post. BERT enhances the
model's ability to understand subtle textual manipulations such as sarcasm, misinformation,
emotional bias, or clickbait-style phrasing—all of which are common tactics used in fake
news. Its pretraining on massive corpora, including Wikipedia and BooksCorpus, gives it a
strong foundational understanding of language, which is fine-tuned in our model for the
specific task of misinformation detection.
CHAPTER II
2.LITERATURE SURVEY

The detection of misinformation and fake news using artificial intelligence has become a
critical area of research, especially with the growing influence of social media. Researchers
have explored a wide range of approaches—from natural language processing to multimodal
deep learning—to automate and improve the accuracy of misinformation detection systems.

Zhou et al. (2020) – FakeNewsNet: A Data Repository for Fake News Research
Zhou and colleagues introduced FakeNewsNet, a benchmark data repository that integrates
textual content, social context, and spatiotemporal signals associated with fake and real news.
Their research demonstrated that fake news often spreads through unique social patterns and
user interactions, suggesting that content alone is insufficient for accurate classification. This
dataset supports the development of models that analyze multiple information streams for
better misinformation detection.

Kaliyar et al. (2021) – FNDNet: Deep Learning for Text-Based Detection


Kaliyar et al. proposed FNDNet, a CNN-LSTM-based hybrid deep learning model for
detecting fake news using only text. The model successfully captured long-term dependencies
and semantic patterns in news articles, achieving high accuracy on several benchmark
datasets. Their findings underscored the effectiveness of deep sequential models in handling
textual misinformation, but also highlighted the limitations of ignoring visual content.

Gupta et al. (2013) – Fake Image Detection During Crisis Events


Gupta focused on identifying misleading images circulated during the Boston Marathon
bombings. Their research established that a large portion of viral fake news was accompanied
by repurposed or unrelated images. They proposed an image verification pipeline that cross-
checked visual content against reliable sources. This work emphasized the need for visual
analysis in misinformation detection and set the stage for multimodal research.

Singhal et al. (2021) – Multimodal Fusion Using BERT and CNN


Singhal et al. explored a multimodal architecture that combined BERT for textual analysis
with CNN-based image feature extraction. Their research demonstrated that combining
modalities significantly reduced the false positive rate and improved detection accuracy. The
model performed especially well on posts where textual and visual content were intentionally
mismatched—an area where unimodal models typically fail.

Yao et al. (2021) – MOCHEG: Multimodal Misinformation Detection Framework


Yao introduced MOCHEG, a comprehensive multimodal framework incorporating SBERT,
CLIP, and ResNet50. Their model integrated semantic similarity scoring, textual embeddings,
and image classification to detect inconsistencies between modalities. Evaluations showed
that their approach achieved state-of-the-art performance on Fakeddit and other noisy
datasets, demonstrating the power of feature-level fusion and contextual reasoning.

Lu and Li (2022) – Contrastive Vision-Language Learning for Fake News


This study investigated contrastive learning techniques to align text and image modalities.
By training a model to detect mismatches between images and associated text, the researchers
improved classification performance and reduced ambiguity in cases where either modality
could be misleading. Their model was also robust against adversarially crafted fake news
posts.

Ahmed et al. (2021) – Transformer Models vs RNNs for Fake News Detection
Ahmed conducted a comparative evaluation of BiLSTM, GRU, and transformer-based
models. The study confirmed that transformer models, particularly BERT, outperformed
traditional RNN-based architectures in accuracy and contextual understanding. This is
attributed to BERT’s bidirectional attention and pretraining on large corpora. Their research
reinforced BERT’s dominance in modern NLP tasks including misinformation detection.

He et al. (2016) – ResNet: Residual Learning for Image Recognition


He and colleagues introduced ResNet, a groundbreaking CNN architecture with residual
connections to solve the vanishing gradient problem. The ResNet50 model became a
standard in visual tasks due to its depth and efficiency. In fake news detection, ResNet50 is
often used to extract high-level image features, helping to identify fake visual content that is
difficult for traditional models to classify.

Devlin et al. (2018) – BERT: Bidirectional Transformers for Language Understanding


Devlin et al. developed BERT, a transformer-based language model that understands word
context in both directions. Its deep attention mechanisms enable nuanced understanding of
text, especially in detecting subtle forms of misinformation such as sarcasm or emotionally
charged content. BERT remains a core component in many fake news detection systems,
including our own.

Talwar and Arora (2023) – Real-Time Detection Systems for Crisis Misinformation
This study addressed the importance of real-time misinformation detection, especially
during public health or political crises. The authors emphasized the need for lightweight,
scalable systems capable of handling large volumes of social media data. They also proposed
the integration of browser plugins and content moderation APIs, setting the direction for
practical deployments of fake news detectors.
Summary and Insights
The reviewed literature highlights a clear transition from unimodal, content-only systems to
multimodal architectures that consider both textual and visual modalities. Research
consensus shows that combining text and image improves accuracy, particularly when posts
include conflicting or manipulated media. Our project builds upon these findings by
developing a BERT–ResNet50 based multimodal system, trained and tested on the Fakeddit
dataset, aiming to contribute a scalable, AI-powered solution to misinformation detection.
CHAPTER III
3. Fundamentals of Fake News & Misinformation

3.1 What is Fake News?

Fake news refers to fabricated or misleading information presented as factual news. It is


designed to manipulate readers, generate sensationalism, or drive specific agendas for
financial, political, or ideological gain. In the digital era, the proliferation of fake news has
accelerated due to the speed and reach of social media platforms. Unlike satire or opinion,
fake news deliberately seeks to misinform the public by presenting false narratives as
credible information.

3.2 Misinformation vs. Disinformation

Understanding the different forms of false information is crucial for building effective
detection systems:

 Misinformation: Incorrect or misleading information shared without malicious


intent. It often arises from misunderstanding, lack of context, or outdated knowledge.
For example, sharing outdated health tips believing they are still accurate is a
common form of misinformation.

 Disinformation: False information deliberately created and distributed to deceive or


mislead. This type of content is typically crafted with a specific agenda in mind, such
as influencing elections, damaging reputations, or spreading propaganda. For
instance, spreading fake election news to manipulate voter behavior is a clear example
of disinformation.

3.3 Types of Misinformation

Misinformation can be broadly classified based on the intent to deceive and the context in
which it is shared:

1. Satire or Parody: Uses humor, irony, or exaggeration to twist the meaning of genuine
content. While often intended to entertain or provide social commentary, it can be
misinterpreted as factual if the context is not clear. For example, satirical news
websites like The Onion create exaggerated, fictional stories that may be taken
seriously if shared without proper context.

2. False Connection: Occurs when headlines, visuals, or captions do not accurately


reflect the actual content of the article or media. This disconnect can mislead readers,
causing them to form incorrect conclusions about the topic. For instance, using an
unrelated image to create a false association between two events can lead to
significant public misunderstanding.

3. Misleading Content: Involves the selective use of information to frame an issue or


individual in a specific light. It can include cherry-picking data, quoting out of
context, or presenting one side of a complex issue as the full truth. This approach is
often used in political propaganda or advertising to influence public perception and
reinforce existing biases.

4. False Context: Presents genuine content in a misleading context. This can be as


simple as sharing old news stories as current events or using historical photos to
misrepresent modern situations. This is a common tactic during crises, when rapid
information dissemination can lead to widespread misunderstanding and panic.

5. Imposter Content: Genuine sources are impersonated to lend credibility to false


information. This can include creating fake news websites or social media profiles
that mimic legitimate news outlets, thereby misleading readers into trusting the
content as authentic.

6. Manipulated Content: Involves the deliberate alteration of genuine information or


imagery to deceive viewers. Examples include photoshopped images, deepfake
videos, or edited audio recordings. These techniques can significantly distort reality,
making detection particularly challenging.

7. Fabricated Content: The most extreme form of misinformation, where entirely false
content is created with the intent to deceive and cause harm. Unlike other types,
fabricated content lacks any basis in fact and is often designed to provoke strong
emotional reactions or generate clicks for financial gain.
3.4 Why Multimodal Detection?

The vast majority of misinformation spreads through content that mixes text and images. For
instance, a real image might be paired with deceptive text to create false impressions. A
robust detection system must therefore examine not only the text or the image in isolation but
also how they relate to one another. Key benefits of multimodal detection include:

 Better Detection of Inconsistencies: It can identify mismatches between textual


claims and visual evidence, improving accuracy.

 Improved Classification Performance: Multimodal models outperform text-only or


image-only models by integrating context from both modalities.

 Contextual Understanding: These models can mimic human judgment more


effectively, understanding the broader context in which the content is presented.

3.5 Impact of Misinformation

The impact of misinformation can be severe and wide-ranging, affecting individuals,


societies, and entire nations. It can undermine public trust, distort democratic processes, and
even incite violence. The primary areas of impact include:

 Political Impact: Misinformation can influence elections, fuel political polarization,


and destabilize governments. False political narratives can shape public opinion and
alter voting behaviors, leading to significant political consequences.

 Health Impact: The spread of false medical information can have dire consequences,
as seen during the COVID-19 pandemic. Misinformation about vaccines, treatments,
and preventive measures can increase public health risks and reduce trust in scientific
institutions.

 Economic Impact: Misinformation can lead to financial losses by manipulating stock


markets, spreading false financial news, or promoting fraudulent schemes. It can also
damage the reputation of companies and disrupt entire industries.
 Social Impact: Misinformation can exacerbate social tensions, promote hate speech,
and fuel ethnic or religious conflicts. It can also contribute to the spread of dangerous
conspiracy theories and undermine social cohesion.

3.6 Challenges in Misinformation Detection

Detecting misinformation is a complex and evolving challenge, driven by the rapid growth of
digital content and the sophisticated tactics used by malicious actors. Key challenges include:

 Multimodal Nature: Misinformation often includes both misleading text and


manipulated images, requiring systems that can analyze multiple data types
simultaneously.

 Rapid Spread: Misinformation can spread quickly on social media, making real-time
detection critical but difficult to achieve.

 Language Complexity: Sarcasm, satire, and local language nuances can make
detection challenging, as AI systems may struggle to interpret context accurately.

 Adversarial Content: The use of AI-generated deepfakes and synthetic media is


making misinformation harder to detect as these techniques become more advanced.

 Scalability and Speed: Effective detection requires scalable systems that can process
large volumes of data without significant delays.
CHAPTER IV

4.EXISTING METHODS

Detecting misinformation and fake news has become a critical area of research in the era of
rapidly growing online content. As the influence of social media and digital platforms
continues to rise, a variety of methods have been explored to identify misleading information.
These approaches range from traditional machine learning models to more advanced deep
learning techniques. Below, we provide an overview of existing methods employed for
misinformation and fake news detection, along with their respective limitations:
4.1 Text-Based Approaches
Traditional methods for fake news detection predominantly focus on analyzing the text
content itself. These models rely on feature extraction techniques such as TF-IDF, Bag of
Words, and n-grams to capture key terms and patterns in the text. Classification algorithms
like Logistic Regression, Naive Bayes, and Support Vector Machines (SVM) are then applied
to the extracted features.
Limitations:
 Shallow Analysis: These methods often fail to capture deeper semantic meaning and
complex relationships within the text. For example, they struggle with understanding
sarcasm, irony, or subtle contextual cues that may indicate misinformation.
 Vulnerability to Stylistic Manipulation: Text-based models are susceptible to
manipulation through stylistic writing techniques like satire, which may deceive these
models into classifying misleading content as legitimate.
 Contextual Challenges: These approaches do not consider the broader context in
which the content is published, such as the source's credibility or historical patterns of
misinformation, making them less effective in some cases.
4.2 Image-Based Approaches
As misinformation increasingly involves visuals, the role of image analysis in fake news
detection has become more significant. Convolutional Neural Networks (CNNs), particularly
architectures such as VGG and ResNet, are commonly used for detecting manipulations and
inconsistencies in images. These models analyze image patterns, including visual artifacts,
inconsistencies, and unusual features, to identify whether an image has been altered to
mislead viewers.
Limitations:
 Lack of Text Context: Image-based approaches cannot understand the context
provided by the accompanying text. In cases where images are used with misleading
or false narratives, these models may fail to detect the misinformation without
considering the full multimodal content.
 Complexity of Image Manipulations: Advanced image manipulation techniques
(e.g., deepfakes or subtle editing) can be difficult for CNN-based models to detect,
especially when the alterations are subtle or sophisticated.
 Generalization Issues: CNNs trained on specific datasets may struggle to generalize
to images from diverse sources or domains, reducing their effectiveness in real-world
applications.
4.3 Ensemble and Hybrid Models
To overcome the limitations of text-based and image-based models, some studies have turned
to ensemble methods that combine multiple classifiers. Ensemble techniques such as Random
Forests and Gradient Boosting Machines are employed to process manually engineered
features. These features may include user metadata, post structure, linguistic cues, and other
indicators that provide contextual information about the content.
Limitations:
 Dependence on Manual Feature Engineering: While ensemble models improve
accuracy, they still rely on handcrafted features, which require expert knowledge and
may not fully capture all aspects of misinformation, especially when it involves
nuanced or less obvious patterns.
 Complexity and Computational Cost: These models can be computationally
expensive due to the need for processing multiple classifiers and handling large
amounts of manually extracted features. This can limit their scalability and real-time
application.
 Performance on Multimodal Misinformation: Although ensemble models combine
various classifiers, they still struggle when it comes to handling complex, multimodal
misinformation—such as content where text and image interplay significantly—
because they do not integrate both modalities effectively.
4.4 Multimodal Deep Learning Approaches
The most recent advancements in fake news detection focus on multimodal deep learning
techniques that process both text and image data simultaneously. These models combine the
strengths of natural language processing (NLP) and computer vision to capture both the
textual and visual components of misleading content. BERT, a powerful model for contextual
text embeddings, is commonly used to extract rich semantic information from text.
Meanwhile, CNNs such as ResNet50 are employed to capture visual features from images.
Limitations:
 Data and Computational Requirements: Multimodal models require large, labeled
datasets containing both text and image pairs, which can be difficult to obtain.
Additionally, these models are computationally intensive and may require significant
hardware resources for training and inference.
 Model Complexity: The integration of multiple modalities (text and image) increases
the complexity of the model, making it more difficult to train and tune. Fine-tuning
these models requires careful balancing of both modalities to ensure optimal
performance.
 Interpretability Issues: While these models outperform unimodal approaches in
terms of accuracy, they may lack interpretability. Understanding how the model
arrives at a decision can be challenging, which can be problematic in domains like
journalism or healthcare where transparency is crucial.
 Contextual Fusion Challenges: In some cases, the fusion of text and image data may
not be effective enough to capture all the subtleties of misinformation, particularly
when the misleading nature of the content is not immediately apparent from either
modality in isolation.
CHAPTER V
5.PROPOSED METHOD

5.1 Proposed Approach

Our proposed method for detecting misinformation and fake news leverages deep learning-
based multimodal analysis that integrates textual and visual content. The core objective is to
develop an intelligent system that not only detects misinformation based on individual
modalities (text or image) but also understands the semantic alignment between them,
improving accuracy in real-world social media scenarios.

The system is designed to analyze posts that include both text and images—common in fake
news—and classify them as real or fake based on the coherence of the combined information.
This approach addresses the shortcomings of unimodal systems and moves toward more
context-aware and content-robust detection mechanisms.

5.2 Dataset and Preprocessing

We utilize the Fakeddit dataset, a widely recognized benchmark dataset that includes
multimodal posts labelled for fake/real classification. It includes a diverse collection of posts
with text, images, and metadata.
Text Preprocessing:
 Removal of special characters and irrelevant tokens.
 Tokenization and sequence padding.
 Text is processed using the BERT tokenizer to match the input requirements of the
BERT model.
Image Preprocessing:
 All images are resized to 224×224 pixels.
 Normalization is applied using ImageNet standards, ensuring compatibility with the
ResNet50 model.
Balancing:
 To prevent model bias, we ensure the dataset is balanced with an equal number of
fake and real instances.

5.3 Architecture of the Proposed System

Our system follows a dual-stream architecture—processing text and image data separately
and then fusing the features for final classification. This modular architecture enhances both
performance and interpretability.
A. Text Feature Extraction – BERT
 We employ the bert-base-uncased model to extract rich contextual embeddings.
 The [CLS] token output, a 768-dimensional vector, represents the entire input
sentence.
 This enables the model to understand nuanced language features, sarcasm, or
misleading phrasing.
B. Image Feature Extraction – ResNet50
 We use a pre-trained ResNet50 model with the classification head removed.
 The global average pooling layer output is extracted as a 2048-dimensional feature
vector.
 This captures spatial and visual patterns that might indicate manipulated or
misleading imagery.

C. Feature Fusion and Classification


 The text and image feature vectors are concatenated into a 2816-dimensional
multimodal vector.
 This vector is passed through a fully connected neural network for final classification.
Classification Network:
 FC Layer 1: 1024 units, ReLU activation, Dropout for regularization.
 FC Layer 2: 256 units, ReLU activation.
 Output Layer: 2 units, Softmax activation for binary classification (Real or Fake).

5.4 Training and Optimization


To train the system effectively, the following parameters and strategies are used:
 Loss Function: CrossEntropyLoss – suitable for binary classification tasks.
 Optimizer: Adam – learning rate set at 1e-4 for stable convergence.
 Batch Size: 64
 Epochs: 10 – determined experimentally for optimal performance.
 Evaluation Metrics: Accuracy, Precision, Recall, F1-score – to evaluate different
aspects of model performance.

5.5 Evaluation Environment

Training and evaluation were conducted using a controlled, high-performance computing


setup to ensure scalability and reproducibility:
 Hardware: NVIDIA RTX 3090 GPU, 32GB RAM
 Software Stack:
o Python 3.10
o PyTorch deep learning framework
o HuggingFace Transformers for BERT integration
o OpenCV for image preprocessing and manipulation

5.6 Advantages of the Proposed Approach

 Multimodal Contextual Understanding: The fusion of textual and visual cues enables
the model to detect semantic inconsistencies—e.g., an image depicting one event
accompanied by text describing another.

 High Accuracy and Robustness: The use of pretrained models (BERT and ResNet50)
and a carefully designed fusion mechanism significantly improves classification
performance compared to unimodal baselines.

 Scalability: The architecture supports large-scale deployment, suitable for analyzing


real-time streams of social media content.
 Extendability: The system can be adapted for:
o Video-based misinformation using models like ViT or video transformers

CHAPTER VI
6. Dataset Exploration

6.1 Dataset Selection

For this project, we employed the Fakeddit dataset, a widely recognized benchmark
specifically tailored for multimodal misinformation detection. The dataset was curated
from Reddit, offering a rich and diverse set of social media posts that include both textual
content and corresponding visual media (images).

The reasons for selecting the Fakeddit dataset are as follows:

 Multimodal Data Support: Each data sample includes text (post title) and image,
allowing for simultaneous learning from both modalities.
 Multi-level Annotations: It provides three granular labeling levels — binary
(real/fake), multiclass (6 categories), and multilabel (multiple tags).
 Large-Scale: With over 1 million Reddit posts, the dataset offers a high volume of
labeled data, which is critical for training deep learning models without risking
overfitting.

This makes Fakeddit particularly well-suited for deep neural architectures like BERT and
ResNet, which require large, diverse datasets for optimal generalization.

6.2 Dataset Structure and Attributes

Each data point in the dataset is organized into the following key attributes:

 Post ID: A unique alphanumeric identifier used to track each Reddit post.
 Title / Text Content: The main user-submitted content, often in the form of a
headline or short paragraph.
 Image URL: A direct link to the image attached to the Reddit post.
 Label: Ground-truth annotation of the post’s credibility:
o 1 = Real
o 0 = Fake

6.3 Data Preprocessing

To make the dataset compatible with our deep learning architecture, we performed multiple
preprocessing steps for both text and image components.

Text Preprocessing:

 Cleaning: Removed HTML tags, special characters, hyperlinks, and non-ASCII text.
 Lowercasing: All words were converted to lowercase to maintain consistency in
tokenization.
 Tokenization: Applied the BERT tokenizer from Hugging Face Transformers, which
preserves sub-word representations and uses WordPiece encoding.
 Padding & Truncation: Sentences were truncated or padded to a maximum
sequence length of 128 tokens to ensure uniform input size for the BERT model.
 Stopword Retention: Unlike traditional NLP pipelines, stopwords were retained,
since BERT benefits from full sentence context.

Image Preprocessing:

 Download & Validation: Image URLs were validated. Corrupted or broken URLs
were discarded.
 Resizing: All images were resized to 224×224 pixels, the input requirement for
ResNet50.
 Normalization: Pixel values were normalized to the ImageNet mean and standard
deviation to align with pre-trained ResNet expectations.
 Color Format Conversion: Ensured all images were in RGB format, converting
from grayscale or CMYK if necessary.

Cleaning and Balancing:

 Data Filtering: Entries missing either text or image content were removed.
 Class Balancing: The original dataset had class imbalance. We applied
undersampling to the majority class, resulting in a balanced dataset with 50% fake
and 50% real posts.

6.4 Dataset Statistics (After Cleaning)

Metric Value
Total Posts Used 40,000
Real Posts 20,000
Fake Posts 20,000
Image-Text Paired Samples 100%
Image Size (Input to CNN) 224x224
Max Tokens (BERT Input) 128

This balance ensures fair training and testing, minimizing bias towards either class during
evaluation.

6.5 Dataset Challenges

While Fakeddit is a powerful resource, we encountered several challenges during


preprocessing:

1. Broken Image Links: Some image URLs no longer existed or led to 404 errors. We
resolved this by filtering such samples.
2. Text-Image Mismatch: In certain samples, the image and text appeared semantically
unrelated — a known trait in fake news used to create false context. This made it
essential to use multimodal learning rather than relying on a single modality.
3. Informal and Noisy Text: Social media language includes slang, emojis, sarcasm,
abbreviations, and grammatical errors. This necessitated the use of context-aware
models like BERT, which can handle informal language.
4. Dataset Size vs. Compute Limits: Although Fakeddit offers over 1 million posts,
resource constraints restricted us to a subset of 40,000 carefully filtered and balanced
samples.
CHAPTER VII
7.IMPLEMENTATION

4.1 Proposed Approach

Our proposed method for detecting misinformation and fake news leverages deep learning-
based multimodal analysis that integrates textual and visual content. The core objective is to
develop an intelligent system that not only detects misinformation based on individual
modalities (text or image) but also understands the semantic alignment between them,
improving accuracy in real-world social media scenarios.

The system is designed to analyze posts that include both text and images—common in fake
news—and classify them as real or fake based on the coherence of the combined information.
This approach addresses the shortcomings of unimodal systems and moves toward more
context-aware and content-robust detection mechanisms.

4.2 Dataset and Preprocessing

We utilize the Fakeddit dataset, a widely recognized benchmark dataset that includes
multimodal posts labelled for fake/real classification. It includes a diverse collection of posts
with text, images, and metadata.
Text Preprocessing:
 Removal of special characters and irrelevant tokens.
 Tokenization and sequence padding.
 Text is processed using the BERT tokenizer to match the input requirements of the
BERT model.
Image Preprocessing:
 All images are resized to 224×224 pixels.
 Normalization is applied using ImageNet standards, ensuring compatibility with the
ResNet50 model.
Balancing:
 To prevent model bias, we ensure the dataset is balanced with an equal number of
fake and real instances.

CHAPTER VIII
8.RESULTS AND ANALYSIS

8.1 Evaluation Metrics

To comprehensively evaluate the performance of our proposed multimodal fake news


detection system, we employed a set of standard classification metrics. These metrics
provide a quantitative assessment of the model’s effectiveness in distinguishing
between real and fake news. Each metric captures a different aspect of model
performance, especially relevant when dealing with potentially imbalanced or noisy
data, such as social media content.

1. Accuracy

Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Explanation:
Accuracy is the most commonly used metric for classification tasks. It measures the
overall proportion of correctly classified instances (both real and fake) out of the total
instances. While useful as a high-level measure, accuracy can be misleading if the
dataset is imbalanced — for example, if there are significantly more real news
samples than fake ones.

 TP (True Positive): Fake news correctly predicted as fake


 TN (True Negative): Real news correctly predicted as real
 FP (False Positive): Real news incorrectly predicted as fake
 FN (False Negative): Fake news incorrectly predicted as real

2. Precision

Formula:
Precision = TP / (TP + FP)
Explanation:
Precision quantifies how many of the samples that the model predicted as fake are
actually fake. It is particularly important in applications where false positives carry a
higher cost, such as falsely flagging legitimate news as misinformation. High
precision ensures that the model maintains credibility and reduces wrongful
misclassifications.

3. Recall (Sensitivity)

Formula:

Recall = TP / (TP + FN)


Explanation:
Recall measures the model's ability to detect actual fake news correctly. It answers the
question: "Of all the fake news present in the dataset, how many did the model
successfully identify?" This is a critical metric in fake news detection, where missing
a fake article (false negative) can lead to the unchecked spread of harmful or
misleading information.

4. F1 Score

Formula:

F1 = 2 × (Precision × Recall) / (Precision + Recall)

Explanation:
The F1 Score is the harmonic mean of precision and recall, providing a single metric
that balances both. It is especially useful when there is an uneven class distribution, or
when both false positives and false negatives are equally critical. A high F1 Score
indicates that the model achieves both high precision and high recall, which is ideal
for robust misinformation detection.

5. Confusion Matrix

Explanation:
The confusion matrix is a 2x2 table that provides a complete breakdown of how the
classification model performs with respect to each class. It allows visual inspection of
where the model makes mistakes and helps identify potential biases.

Predicted Fake Predicted Real


Actual Fake True Positive (TP) False Negative (FN)
Actual Real False Positive (FP) True Negative (TN)

Usage:
The matrix enables a deeper understanding of the model's strengths and weaknesses.
For instance, a model that misclassifies many real news items as fake would have a
high number of false positives, which may indicate an over-aggressive detection
policy.

9.2 Model Performance

Metric | Value
-----------------|-------
Accuracy | 93.8%
Precision | 93.4%
Recall | 94.2%
F1 Score | 93.8%
AUC Score | 0.95
CHAPTER IX
9.FUTURE SCOPE

9.1 Evaluation Metrics

To comprehensively evaluate the performance of our proposed multimodal fake news


detection system, we employed a set of standard classification metrics. These metrics
provide a quantitative assessment of the model’s effectiveness in distinguishing
between real and fake news. Each metric captures a different aspect of model
performance, especially relevant when dealing with potentially imbalanced or noisy
data, such as social media content.

1. Accuracy

Formula:

Accuracy = (TP + TN) / (TP + TN + FP + FN)

Explanation:
Accuracy is the most commonly used metric for classification tasks. It measures the
overall proportion of correctly classified instances (both real and fake) out of the total
instances. While useful as a high-level measure, accuracy can be misleading if the
dataset is imbalanced — for example, if there are significantly more real news
samples than fake ones.

 TP (True Positive): Fake news correctly predicted as fake


 TN (True Negative): Real news correctly predicted as real
 FP (False Positive): Real news incorrectly predicted as fake
 FN (False Negative): Fake news incorrectly predicted as real

2. Precision

Formula

CHAPTER X
10.FUTURE SCOPE

The proposed model lays a solid foundation for fake news detection, and several
enhancements can be considered for future work.While the current system performs
well in detecting fake news using multimodal data, there are several promising
directions for future enhancement:

 Improving Accuracy with Advanced AI Models


Leveraging newer transformer-based models like RoBERTa, DeBERTa, or
multimodal models such as CLIP can improve both text and image understanding.
These models offer better contextual awareness and can help capture more subtle
forms of misinformation.
 Real-Time Fake News Detection
Deploying the system for real-time use on social media platforms would allow
immediate identification of fake news. This would involve optimizing the model for
faster inference and integrating it with APIs that monitor live content.
 Multilingual Support
Expanding the system to support multiple languages using models like mBERT or
XLM-R would enable detection of misinformation across different regions, increasing
the system’s global reach and effectiveness.
 Deepfake Detection
Integrating deepfake detection capabilities can help identify manipulated or AI-
generated images and videos, which are becoming a growing source of
misinformation.
 Fact-Checking Integration
Collaborating with external fact-checking organizations and APIs can help verify the
authenticity of claims in real time, adding an extra layer of reliability to the detection
process.

REFERENCES

[1] Yao et al., MOCHEG Dataset, 2021

[2] Devlin et al., BERT: Pre-training of Deep Bidirectional Transformers, 2018

[3] He et al., Deep Residual Learning for Image Recognition, CVPR 2016

[4] Fakeddit Dataset: [https://github.com/entitize/FakeNewsNet](https://github.com/entitize/


FakeNewsNet)

[5] Sahar Tahmasebi et al., Multimodal Misinformation Detection, ACM CIKM 2024

You might also like