Department of Computer Science and Engineering, AISC Lab [CS3131]
Project Report
Suhani Talreja (229301425)
Predict Customer Ad Clicks Using Logistic Regression and Gradient Boosting
Overview:
This project focuses on predicting whether a customer will click on an advertisement using
historical data. Leveraging machine learning techniques like Logistic Regression and Gradient
Boosting, the project aims to identify patterns in customer behavior and provide actionable insights
for marketing strategies.
Objectives:
- To analyse customer data and identify the factors influencing ad clicks.
- To build machine learning models (Logistic Regression and Gradient Boosting) for click prediction.
- To evaluate model performance using appropriate metrics and visualizations.
- To visualize decision boundaries for better interpretability of the models.
Dataset Description:
The dataset consists of customer information and their interaction with ads. Key attributes include:
- Time Spent on Site: Time (in minutes) the user spent on the advertiser’s website.
- Estimated Salary: Customer’s estimated income.
- Clicked: Target variable indicating whether the customer clicked on the ad (1 for clicked, 0 for not clicked).
During preprocessing, the non-essential columns Names, emails, and Country were removed, leaving only the features needed for modelling.
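A quick sanity check of the raw file can confirm the available columns and the class balance before modelling. The snippet below is a minimal sketch; the file name clicks_dataset.csv, the ISO-8859-1 encoding, and the Clicked column follow the source code later in this report.

import pandas as pd

# Load the raw advertising dataset
data = pd.read_csv('clicks_dataset.csv', encoding='ISO-8859-1')

# Inspect the first few rows and the column names
print(data.head())
print(data.columns.tolist())

# Check how balanced the target variable is
print(data['Clicked'].value_counts(normalize=True))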
Methodology:
The project followed a systematic approach:
1. Data Loading and Preprocessing:
   - Data was loaded using Pandas and cleaned by removing irrelevant columns.
   - Features were scaled using StandardScaler to standardize values and improve model performance.
2. Exploratory Data Analysis (EDA):
   - Scatter plots, histograms, and box plots were used to visualize relationships between the features and the target variable (a minimal EDA sketch is given after this list).
   - Key finding: a significant correlation exists between time spent on the site and ad clicks.
3. Model Building:
   - Logistic Regression: a linear model to predict the binary outcome (clicked or not clicked).
   - Gradient Boosting Classifier: an ensemble model to enhance prediction accuracy.
4. Model Evaluation:
   - Metrics such as the confusion matrix, accuracy, precision, recall, and F1-score were used for evaluation.
   - Decision boundaries were plotted to visualize the models' predictions.
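The exploratory plots described in step 2 are not part of the source code listing below. The following is a minimal sketch of how they could be produced, assuming the feature columns in the CSV are named 'Time Spent on Site' and 'Estimated Salary' (as suggested by the axis labels used later in the code).

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

data = pd.read_csv('clicks_dataset.csv', encoding='ISO-8859-1')

# Scatter plot of the two features, coloured by click outcome
sns.scatterplot(data=data, x='Time Spent on Site', y='Estimated Salary', hue='Clicked')
plt.show()

# Distribution of time spent on site for clickers vs. non-clickers
sns.histplot(data=data, x='Time Spent on Site', hue='Clicked', kde=True)
plt.show()

# Box plot of time spent on site by click outcome
sns.boxplot(data=data, x='Clicked', y='Time Spent on Site')
plt.show()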
Source Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from matplotlib.colors import ListedColormap
# Setting up theme for plots (optional; requires the jupyterthemes package)
def set_plot_theme():
    try:
        from jupyterthemes import jtplot
        jtplot.style(theme='monokai', context='notebook', ticks=True, grid=False)
    except ImportError:
        pass  # keep matplotlib's default style if jupyterthemes is not installed
# Load data
def load_data(filepath):
    return pd.read_csv(filepath, encoding='ISO-8859-1')

# Preprocess data
def preprocess_data(data):
    data.drop(['Names', 'emails', 'Country'], axis=1, inplace=True)
    X = data.drop('Clicked', axis=1).values
    y = data['Clicked'].values
    return X, y
# Scale features
def scale_features(X):
    scaler = StandardScaler()
    return scaler.fit_transform(X)
# Split dataset
def split_data(X, y, test_size=0.2):
    return train_test_split(X, y, test_size=test_size, random_state=42)
# Train logistic regression model
def train_logistic_regression(X_train, y_train):
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model
# Train Gradient Boosting model
def train_gradient_boosting(X_train, y_train):
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
    model.fit(X_train, y_train)
    return model
# Evaluate model
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    print("Confusion Matrix:")
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
    plt.show()
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))
# Visualize decision boundary
def visualize_boundary(X, y, model, title):
    X1, X2 = np.meshgrid(np.arange(start=X[:, 0].min() - 1, stop=X[:, 0].max() + 1, step=0.01),
                         np.arange(start=X[:, 1].min() - 1, stop=X[:, 1].max() + 1, step=0.01))
    plt.contourf(X1, X2, model.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('magenta', 'blue')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    for i, j in enumerate(np.unique(y)):
        plt.scatter(X[y == j, 0], X[y == j, 1],
                    c=ListedColormap(('magenta', 'blue'))(i), label=j)
    plt.title(title)
    plt.xlabel('Time Spent on Site')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()
# Main pipeline
def main():
    set_plot_theme()
    # Load dataset
    data_path = r'AD_CLICKS_Project\clicks_dataset.csv'
    data = load_data(data_path)
    # Preprocess data
    X, y = preprocess_data(data)
    # Scale features
    X = scale_features(X)
    # Split dataset
    X_train, X_test, y_train, y_test = split_data(X, y)
    # Train logistic regression
    print("Training Logistic Regression...")
    lr_model = train_logistic_regression(X_train, y_train)
    print("Logistic Regression Evaluation:")
    evaluate_model(lr_model, X_test, y_test)
    # Visualize decision boundaries
    visualize_boundary(X_train, y_train, lr_model, "Logistic Regression (Training Set)")
    visualize_boundary(X_test, y_test, lr_model, "Logistic Regression (Test Set)")
    # Train Gradient Boosting
    print("\nTraining Gradient Boosting...")
    gb_model = train_gradient_boosting(X_train, y_train)
    print("Gradient Boosting Evaluation:")
    evaluate_model(gb_model, X_test, y_test)
    # Visualize decision boundaries
    visualize_boundary(X_train, y_train, gb_model, "Gradient Boosting (Training Set)")
    visualize_boundary(X_test, y_test, gb_model, "Gradient Boosting (Test Set)")
if __name__ == "__main__":
    main()
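For reference, the classification report printed by evaluate_model summarizes the standard metrics below, where TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives:

\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \quad
\text{Precision} = \frac{TP}{TP + FP}, \quad
\text{Recall} = \frac{TP}{TP + FN}, \quad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}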
Output:
(Figures omitted here: confusion-matrix heatmaps, classification reports, and decision-boundary plots for the Logistic Regression and Gradient Boosting models on the training and test sets.)