This is an industry-style, end-to-end machine learning project to predict customer churn using structured data from a telecom company.
Built with:
- ✅ Clean Python code
- ✅ Modular scripts
- ✅ Reproducible pipeline
- ✅ Logistic regression model
- ✅ Confusion matrix + ROC AUC evaluation
```
churn_prediction_project/
├── data/               # Raw dataset (CSV)
├── scripts/            # Python modules
│   ├── eda.py          # EDA & visualizations
│   ├── preprocess.py   # Data cleaning, encoding, scaling
│   ├── train_model.py  # Model training & saving
│   └── evaluate.py     # Evaluation & metrics
├── models/             # Saved model (pkl file)
├── main.py             # Full pipeline runner
└── README.md           # You're here!
```
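The layout above splits cleaning, training, and saving across separate modules. As a rough illustration of how those stages might fit together, here is a minimal sketch assuming scikit-learn's `Pipeline` and `ColumnTransformer`. The column names, the tiny inline DataFrame, the `build_pipeline` helper, and the `churn_model.pkl` file name are hypothetical and are not taken from the repository's scripts.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data standing in for the Telco CSV (column names are assumptions).
df = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 8, 22, 10, 62],
    "monthly_charges": [29.85, 56.95, 53.85, 42.30, 99.65, 89.10, 29.75, 56.15],
    "contract": ["Month-to-month", "One year", "Month-to-month", "One year",
                 "Month-to-month", "Month-to-month", "Month-to-month", "Two year"],
    "churn": [1, 0, 1, 0, 1, 0, 1, 0],
})


def build_pipeline(numeric_cols, categorical_cols):
    """Scale numeric columns, one-hot encode categorical ones, then fit a logistic regression."""
    preprocess = ColumnTransformer([
        ("num", StandardScaler(), numeric_cols),
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ])
    return Pipeline([
        ("preprocess", preprocess),
        ("model", LogisticRegression(max_iter=1000)),
    ])


X = df.drop(columns="churn")
y = df["churn"]

pipeline = build_pipeline(["tenure", "monthly_charges"], ["contract"])
pipeline.fit(X, y)

# Persist the fitted pipeline with joblib; a temp directory stands in for models/.
model_path = os.path.join(tempfile.mkdtemp(), "churn_model.pkl")
joblib.dump(pipeline, model_path)

# Reload and predict to confirm the saved artifact works end to end.
reloaded = joblib.load(model_path)
predictions = reloaded.predict(X)
print(predictions)
```

Keeping preprocessing and the model in one saved object means the same encoding and scaling are applied at prediction time, which is one common way to keep a pipeline reproducible.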
- Python 🐍
- Pandas, NumPy
- Scikit-learn
- Matplotlib, Seaborn
- Joblib
- VS Code
- GitHub
- Clone this repo.
- Create a virtual environment and activate it:
  ```bash
  python -m venv venv
  venv\Scripts\activate
  ```
  The activation command above is for Windows; on macOS or Linux, run `source venv/bin/activate` instead.
- Install dependencies:
  ```bash
  pip install -r requirements.txt
  ```
- Download the dataset from Kaggle Telco Churn and place it in the `data/` folder as `telco_churn.csv`.
- Run the full pipeline:
  ```bash
  python main.py
  ```
Model Accuracy: ~78%
Includes confusion matrix, ROC-AUC curve, and full classification report.
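For readers unfamiliar with these metrics, the following is a minimal sketch of how a confusion matrix, ROC AUC, and classification report can be computed with scikit-learn. It uses synthetic data from `make_classification` with an arbitrarily chosen class balance, not the Telco dataset, so its numbers say nothing about this project's results, and it is not necessarily how `evaluate.py` is written.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced binary data (assumption: stands in for churn / no-churn labels).
X, y = make_classification(
    n_samples=500, n_features=8, weights=[0.73, 0.27], random_state=42
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Hard class predictions feed the confusion matrix and classification report.
y_pred = model.predict(X_test)
# ROC AUC needs a score for the positive class, not the hard labels.
y_score = model.predict_proba(X_test)[:, 1]

cm = confusion_matrix(y_test, y_pred, labels=[0, 1])
auc = roc_auc_score(y_test, y_score)
report = classification_report(y_test, y_pred)

print(cm)
print(f"ROC AUC: {auc:.3f}")
print(report)
```

Reporting these alongside accuracy is useful because, on imbalanced labels, a high accuracy can hide poor recall on the minority class.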
- Add a Streamlit-based frontend
- Save predictions to a dashboard
- Try other models (XGBoost, SVM, RandomForest)
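As a starting point for the model comparison mentioned above, here is a hedged sketch that scores a few scikit-learn classifiers with 5-fold cross-validation. It runs on synthetic data, the `candidates` dictionary and its keys are illustrative names, and XGBoost is left out only because it requires a separate package.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic binary classification data (assumption: a placeholder for the churn features).
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# Scale inputs for the linear model and the SVM; tree ensembles do not need scaling.
candidates = {
    "logistic_regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "svm": make_pipeline(StandardScaler(), SVC()),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

# Mean ROC AUC across 5 folds for each candidate.
scores = {
    name: cross_val_score(estimator, X, y, cv=5, scoring="roc_auc").mean()
    for name, estimator in candidates.items()
}

for name, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(f"{name}: {score:.3f}")
```

Using the same folds and the same metric for every candidate keeps the comparison fair; which model wins on the real dataset would need to be measured rather than assumed.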
Krishna Prasad
ML Learner • Python Enthusiast • Open to Opportunities
GitHub
This project is open-source and free to use under the MIT License.