In an era where misinformation spreads rapidly, the ability to detect fake news is crucial. This project provides an end-to-end machine learning pipeline for classifying news articles as real or fake, covering data preprocessing, model training, evaluation, and deployment.
## Project Structure

The repository follows a modular, organized structure inspired by best practices in machine learning project development:

```
Fake-News-Detection/
├── .dvc/                  # DVC configuration files
├── .github/workflows/     # GitHub Actions workflows for CI/CD
├── docs/                  # Documentation files
├── flask_app/             # Flask application for deployment
├── models/                # Serialized models and model checkpoints
├── notebooks/             # Jupyter notebooks for exploration and experimentation
├── references/            # Data dictionaries, manuals, and all other explanatory materials
├── reports/               # Generated analysis as HTML, PDF, LaTeX, etc.
├── scripts/               # Utility scripts for various tasks
├── src/                   # Source code for use in this project
│   ├── data/              # Scripts to download or generate data
│   ├── features/          # Scripts to turn raw data into features for modeling
│   ├── models/            # Scripts to train models and then use trained models to make predictions
│   └── visualization/     # Scripts to create exploratory and results-oriented visualizations
├── tests/                 # Unit tests and test datasets
├── .dvcignore             # DVC ignore file
├── .gitignore             # Git ignore file
├── Dockerfile             # Dockerfile for containerization
├── LICENSE                # License file
├── Makefile               # Makefile with commands like `make data` or `make train`
├── README.md              # Project README
├── deployment.yaml        # Kubernetes deployment configuration
├── dvc.lock               # DVC lock file
├── dvc.yaml               # DVC pipeline configuration
├── params.yaml            # Parameters for experiments
├── requirements.txt       # Python dependencies
├── setup.py               # Makes the project pip-installable
├── test_environment.py    # Script to test the environment setup
└── tox.ini                # Tox configuration file
```
## Getting Started

**Clone the repository**

```bash
git clone https://github.com/udaygupta8899/Fake-News-Detection.git
cd Fake-News-Detection
```
**Create and activate a virtual environment**

```bash
conda create -n atlas python=3.10
conda activate atlas
```
**Install dependencies**

```bash
pip install -r requirements.txt
```
**Install the Cookiecutter template**

```bash
pip install cookiecutter
cookiecutter -c v1 https://github.com/drivendata/cookiecutter-data-science
```
**Initialize DVC**

```bash
dvc init
```
**Set up MLflow with DagsHub**

- Create a new repository on DagsHub.
- Connect your GitHub repository to DagsHub.
- Copy the MLflow tracking URI provided by DagsHub.
**Configure MLflow**

```bash
pip install mlflow dagshub
export MLFLOW_TRACKING_URI=<your_dagshub_tracking_uri>
```
**Run experiments**

Execute your Jupyter notebooks or scripts to log experiments to MLflow.
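With `MLFLOW_TRACKING_URI` exported as above, a logged run might look like the following minimal sketch (the experiment name, parameters, and metric values here are illustrative, not the project's actual settings):

```python
import mlflow

# MLFLOW_TRACKING_URI is read from the environment (exported above),
# so runs are recorded on the DagsHub MLflow server.
mlflow.set_experiment("fake-news-detection")  # experiment name is illustrative

with mlflow.start_run():
    mlflow.log_param("vectorizer", "tfidf")           # hyperparameters for this run
    mlflow.log_param("model", "logistic_regression")
    mlflow.log_metric("accuracy", 0.93)               # illustrative metric values
    mlflow.log_metric("f1_score", 0.92)
```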
**Track data and models with DVC**

```bash
dvc add data/raw
dvc add models/
git add data/raw.dvc models.dvc .gitignore
git commit -m "Add raw data and models to DVC"
```
## Pipeline Stages

**Data ingestion**

Scripts in `src/data/` handle data loading and initial preprocessing.
**Data preprocessing**

Further cleaning and preparation are performed by scripts in `src/features/`.

**Feature engineering**

Feature extraction and selection methods are implemented in `src/features/`.
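The concrete implementations live in `src/features/`; as an illustrative sketch of a typical cleaning and TF-IDF pipeline (the CSV path, column names, and vectorizer settings are assumptions, not taken from the repo):

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer


def clean_text(text: str) -> str:
    """Lowercase the text, strip URLs, and keep letters only."""
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)  # drop URLs
    text = re.sub(r"[^a-z\s]", " ", text)      # drop digits and punctuation
    return re.sub(r"\s+", " ", text).strip()   # collapse whitespace


# The CSV path and column names are assumptions for illustration.
df = pd.read_csv("data/raw/news.csv")
df["clean_text"] = df["text"].apply(clean_text)
y = df["label"].values  # assumed binary label: 1 = fake, 0 = real

# Sparse TF-IDF features for the models downstream.
vectorizer = TfidfVectorizer(max_features=5000, stop_words="english")
X = vectorizer.fit_transform(df["clean_text"])
```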
**Model training**

Training scripts utilizing various algorithms are found in `src/models/`.

**Model evaluation**

Evaluation metrics and validation procedures live in `src/models/`.
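A hedged sketch of what a train-and-evaluate step might look like (logistic regression is used purely as an example algorithm; `X` and `y` are the features and labels from the sketch above):

```python
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import train_test_split

# Hold out a stratified test split for evaluation.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)

# Evaluate on the held-out split.
y_pred = model.predict(X_test)
print("accuracy:", accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))
```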
**Model registration**

Trained models are registered and versioned using MLflow.
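With the DagsHub tracking URI configured, registration can be as simple as logging the fitted model under a registered name (the model name below is a placeholder):

```python
import mlflow
import mlflow.sklearn

with mlflow.start_run():
    # Logs the fitted model as an artifact and registers a new version
    # under the given name ("fake-news-classifier" is a placeholder).
    mlflow.sklearn.log_model(
        model,
        artifact_path="model",
        registered_model_name="fake-news-classifier",
    )
```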
**Pipeline execution**

The entire pipeline can be executed using DVC:

```bash
dvc repro
```
## Running the Flask App

**Navigate to the Flask app directory**

```bash
cd flask_app
```
**Install Flask**

```bash
pip install flask
```
**Run the application**

```bash
flask run
```
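The actual application code lives in `flask_app/`; the minimal sketch below only illustrates the general shape such an app might take (the route name, artifact paths, and request format are assumptions for illustration):

```python
import pickle

from flask import Flask, jsonify, request

app = Flask(__name__)

# Artifact paths are assumptions; load whatever the training step produces.
with open("models/model.pkl", "rb") as f:
    model = pickle.load(f)
with open("models/vectorizer.pkl", "rb") as f:
    vectorizer = pickle.load(f)


@app.route("/predict", methods=["POST"])
def predict():
    # Expects JSON like {"text": "<article body>"}.
    text = request.get_json(force=True)["text"]
    features = vectorizer.transform([text])
    prediction = int(model.predict(features)[0])
    return jsonify({"fake": bool(prediction)})


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=5000)
```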
## Docker

**Build the Docker image**

```bash
docker build -t fake-news-detection:latest .
```
**Run the Docker container**

```bash
docker run -p 5000:5000 fake-news-detection:latest
```
## Deployment on AWS EKS

**Create an EKS cluster**

```bash
eksctl create cluster --name fake-news-cluster --region us-east-1 --nodegroup-name standard-workers --node-type t3.medium --nodes 3
```
**Deploy the application**

```bash
kubectl apply -f deployment.yaml
```
**Access the application**

Retrieve the external IP of the service:

```bash
kubectl get svc fake-news-service
```

Then open `http://<external-ip>:5000` in a browser.
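If the service exposes a JSON `/predict` route like the Flask sketch earlier, a quick smoke test from Python might look like this (the endpoint and payload are assumptions):

```python
import requests

# Substitute the external IP reported by `kubectl get svc`.
resp = requests.post(
    "http://<external-ip>:5000/predict",
    json={"text": "Example article text to classify."},
    timeout=10,
)
print(resp.status_code, resp.json())
```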
## Monitoring

**Prometheus setup**

- Launch an EC2 instance.
- Install Prometheus and configure it to scrape metrics from your application (see the instrumentation sketch below).
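On the application side, the app needs to expose a metrics endpoint for Prometheus to scrape. A minimal sketch using the `prometheus_client` library (`app` here is the Flask app from `flask_app`, as in the earlier sketch):

```python
from prometheus_client import CONTENT_TYPE_LATEST, Counter, generate_latest

# Incremented inside the predict route to count prediction requests.
REQUEST_COUNT = Counter("prediction_requests_total", "Total prediction requests")


@app.route("/metrics")
def metrics():
    # Prometheus scrapes this endpoint at its configured interval.
    return generate_latest(), 200, {"Content-Type": CONTENT_TYPE_LATEST}
```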
**Grafana setup**

- Launch another EC2 instance.
- Install Grafana and configure it to use Prometheus as a data source.
**Create dashboards**

Set up dashboards in Grafana to monitor application metrics and performance.
## Testing

Unit tests are located in the `tests/` directory. Run them using:

```bash
pytest tests/
```
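A representative unit test might look like this (the module path and import are hypothetical, echoing the `clean_text` helper sketched earlier):

```python
# tests/test_features.py -- module path and import below are hypothetical.
from src.features.build_features import clean_text


def test_clean_text_strips_urls_and_punctuation():
    raw = "BREAKING!!! Read more at https://example.com"
    assert clean_text(raw) == "breaking read more at"
```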
## Continuous Integration

GitHub Actions workflows defined in `.github/workflows/ci.yaml` automate the testing and deployment processes.
## License

This project is licensed under the MIT License. See the LICENSE file for details.