Perceptron Regression

Uploaded by

Suhani Talreja

Department of Computer Science and Engineering AISC Lab[CS3131]

Project Report
Predict Customer Ad Clicks Using Logistic Regression and Gradient Boosting
Overview:
This project focuses on predicting whether a customer will click on an advertisement using
historical data. Leveraging machine learning techniques like Logistic Regression and Gradient
Boosting, the project aims to identify patterns in customer behavior and provide actionable insights
for marketing strategies.
Objectives:
• To analyse customer data and identify the factors influencing ad clicks.
• To build machine learning models (Logistic Regression and Gradient Boosting) for click
prediction.
• To evaluate model performance using appropriate metrics and visualizations.
• To visualize decision boundaries for better interpretability of the models.
Dataset Description:
The dataset consists of customer information and their interaction with ads. Key attributes include:
• Time Spent on Site: Time (in minutes) the user spent on the advertiser’s website.
• Estimated Salary: Customer’s estimated income.
• Clicked: Target variable indicating whether the customer clicked on the ad (1 for clicked, 0
for not clicked).
During preprocessing, the non-essential columns (Names, emails, and Country) were removed, leaving
only the features needed for modelling.
Methodology:
The project followed a systematic approach:
1. Data Loading and Preprocessing:
o Data was loaded using Pandas and cleaned by removing irrelevant columns.
o Features were standardized using StandardScaler (zero mean, unit variance) so that both
features are on a comparable scale.
2. Exploratory Data Analysis (EDA):
o Scatter plots, histograms, and box plots were used to visualize relationships between
features and the target variable.
o Key findings:
• A significant correlation exists between time spent on the site and ad clicks.
3. Model Building:

Suhani Talreja 229301425


o Logistic Regression: A linear model to predict the binary outcome (clicked or not
clicked).
o Gradient Boosting Classifier: An ensemble of shallow decision trees built sequentially, each
one correcting the errors of the trees before it, to enhance prediction accuracy.
4. Model Evaluation:
o Metrics like confusion matrix, accuracy, precision, recall, and F1-score were used for
evaluation.
o Decision boundaries were plotted for visualizing model predictions.
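
The model-building and evaluation steps above can be illustrated with a small standard-library sketch. It is not the project's code: the weights, bias, samples, and labels are made-up values chosen only to show how a linear score becomes a click prediction and how accuracy, precision, recall, and F1-score follow from the confusion-matrix counts.

```python
import math


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_click(x, weights, bias, threshold=0.5):
    """Linear score -> probability -> hard 0/1 label.

    With threshold 0.5 the decision boundary is the line where
    weights . x + bias == 0.
    """
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if sigmoid(score) >= threshold else 0


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary 0/1 labels."""
    tn = fp = fn = tp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1 and pred == 1:
            tp += 1
        elif truth == 0 and pred == 1:
            fp += 1
        elif truth == 1 and pred == 0:
            fn += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def summarize(tn, fp, fn, tp):
    """Derive the usual metrics from the four confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    return {
        "accuracy": (tp + tn) / (tn + fp + fn + tp),
        "precision": precision,
        "recall": recall,
        "f1": 2 * precision * recall / denom if denom else 0.0,
    }


# Hypothetical model parameters and standardized samples (illustration only).
WEIGHTS = [1.5, 1.0]
BIAS = 0.0
samples = [[-1.0, 0.5], [0.5, -0.5], [1.2, 0.3], [-0.3, -1.0],
           [0.2, 1.0], [-0.8, 0.2], [0.1, -0.4], [0.9, -0.2]]
labels = [0, 0, 1, 0, 1, 1, 1, 1]

predictions = [predict_click(x, WEIGHTS, BIAS) for x in samples]
counts = confusion_counts(labels, predictions)
metrics = summarize(*counts)
print(counts, metrics)
```

In the project itself these quantities come from scikit-learn's confusion_matrix and classification_report; the sketch only spells out the arithmetic behind them.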

Source Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import classification_report, confusion_matrix
from matplotlib.colors import ListedColormap

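# Set up the theme for plots (jupyterthemes is an optional third-party package)
def set_plot_theme():
    try:
        from jupyterthemes import jtplot
    except ImportError:
        return  # package not installed: keep the default matplotlib style
    jtplot.style(theme='monokai', context='notebook', ticks=True, grid=False)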

# Load data
def load_data(filepath):
    return pd.read_csv(filepath, encoding='ISO-8859-1')

# Preprocess data: drop identifier columns, then separate features from the target
def preprocess_data(data):
    # Work on a new frame rather than mutating the caller's DataFrame in place
    data = data.drop(['Names', 'emails', 'Country'], axis=1)
    X = data.drop('Clicked', axis=1).values
    y = data['Clicked'].values
    return X, y

# Scale features to zero mean and unit variance
def scale_features(X):
    scaler = StandardScaler()
    return scaler.fit_transform(X)

# Split dataset (20% held out for testing by default; fixed seed for reproducibility)
def split_data(X, y, test_size=0.2):
    return train_test_split(X, y, test_size=test_size, random_state=42)

# Train logistic regression model
def train_logistic_regression(X_train, y_train):
    model = LogisticRegression()
    model.fit(X_train, y_train)
    return model

# Train Gradient Boosting model (100 trees of depth 1, i.e. decision stumps)
def train_gradient_boosting(X_train, y_train):
    model = GradientBoostingClassifier(n_estimators=100, learning_rate=1.0, max_depth=1, random_state=0)
    model.fit(X_train, y_train)
    return model

# Evaluate model: confusion-matrix heatmap plus precision, recall and F1 per class
def evaluate_model(model, X_test, y_test):
    y_pred = model.predict(X_test)
    print("Confusion Matrix:")
    cm = confusion_matrix(y_test, y_pred)
    sns.heatmap(cm, annot=True, fmt="d", cmap="Blues")
    plt.show()
    print("\nClassification Report:")
    print(classification_report(y_test, y_pred))

# Visualize decision boundary (assumes exactly two feature columns)
def visualize_boundary(X, y, model, title):
    # Build a dense grid over the scaled feature plane, padded by one unit per side
    X1, X2 = np.meshgrid(np.arange(start=X[:, 0].min() - 1, stop=X[:, 0].max() + 1, step=0.01),
                         np.arange(start=X[:, 1].min() - 1, stop=X[:, 1].max() + 1, step=0.01))
    # Colour every grid point by the class the model predicts for it
    plt.contourf(X1, X2, model.predict(np.array([X1.ravel(), X2.ravel()]).T).reshape(X1.shape),
                 alpha=0.75, cmap=ListedColormap(('magenta', 'blue')))
    plt.xlim(X1.min(), X1.max())
    plt.ylim(X2.min(), X2.max())
    # Overlay the actual samples, one colour per class
    for i, j in enumerate(np.unique(y)):
        # color= (not c=) avoids matplotlib's warning about a single RGBA tuple
        plt.scatter(X[y == j, 0], X[y == j, 1],
                    color=ListedColormap(('magenta', 'blue'))(i), label=j)
    plt.title(title)
    plt.xlabel('Time Spent on Site')
    plt.ylabel('Estimated Salary')
    plt.legend()
    plt.show()

# Main pipeline
def main():
    set_plot_theme()  # the call parentheses were missing, so the theme was never applied

    # Load dataset (forward slash avoids the invalid "\c" escape and works on all platforms)
    data_path = 'AD_CLICKS_Project/clicks_dataset.csv'
    data = load_data(data_path)

    # Preprocess data
    X, y = preprocess_data(data)

    # Scale features
    X = scale_features(X)

    # Split dataset
    X_train, X_test, y_train, y_test = split_data(X, y)

    # Train logistic regression
    print("Training Logistic Regression...")
    lr_model = train_logistic_regression(X_train, y_train)
    print("Logistic Regression Evaluation:")
    evaluate_model(lr_model, X_test, y_test)

    # Visualize decision boundary
    visualize_boundary(X_train, y_train, lr_model, "Logistic Regression (Training Set)")
    visualize_boundary(X_test, y_test, lr_model, "Logistic Regression (Test Set)")

    # Train Gradient Boosting
    print("\nTraining Gradient Boosting...")
    gb_model = train_gradient_boosting(X_train, y_train)
    print("Gradient Boosting Evaluation:")
    evaluate_model(gb_model, X_test, y_test)

    # Visualize decision boundary
    visualize_boundary(X_train, y_train, gb_model, "Gradient Boosting (Training Set)")
    visualize_boundary(X_test, y_test, gb_model, "Gradient Boosting (Test Set)")

if __name__ == "__main__":
    main()
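
One point about the listing: main() standardizes the full feature matrix before calling split_data, so the held-out rows contribute to the mean and standard deviation used for scaling. A common alternative is to learn those statistics from the training rows only and reuse them for the test rows. The sketch below shows that idea with the standard library; the helper names and the sample rows are hypothetical and are not taken from the project's dataset.

```python
from statistics import fmean, pstdev


def fit_standardizer(rows):
    """Learn per-column mean and standard deviation from training rows only."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # pstdev is the population standard deviation (divides by n), which is
    # what scikit-learn's StandardScaler uses. A constant column has
    # deviation 0, so fall back to 1.0 and leave it unscaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return means, stds


def apply_standardizer(rows, means, stds):
    """Scale rows with previously learned statistics."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


# Made-up rows: [time spent on site in minutes, estimated salary]
train_rows = [[20.0, 40000.0], [30.0, 60000.0], [40.0, 80000.0]]
test_rows = [[35.0, 50000.0]]

means, stds = fit_standardizer(train_rows)              # statistics from training data
train_scaled = apply_standardizer(train_rows, means, stds)
test_scaled = apply_standardizer(test_rows, means, stds)  # reused, not re-fitted
print(means, stds)
print(test_scaled)
```

With scikit-learn the same effect is obtained by calling fit (or fit_transform) on the training split and transform on the test split, or by placing StandardScaler and the classifier in a Pipeline.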

Output:
