Assignment 03 - Report

Uploaded by Learn Easy

Statistical Thinking for Data Science (36103)

Assessment-3 (Data analysis project for marketing campaigns)

Table of Contents
Executive Summary
Introduction
Problem Statement
Project Aim & Objectives
Methodology
Methodology Overview
Essential Insights from EDA
Data Cleaning & Preprocessing
Statistical Models: Parametric & Non-Parametric Models
Estimation Method- Bayesian Estimation
Evaluating & Comparing Results
Insights gathered
Conclusion
References

Executive Summary
Building on our previous data exploration, this project develops data science models to answer key business questions, such as which customer segments are likely to respond to a campaign and how to design effective marketing strategies. Using statistical modeling, we have pinpointed highly receptive customer groups and extracted insights to guide our approach. We applied both a parametric model (logistic regression) and a non-parametric model (support vector machine) to keep the analysis thorough and balanced. So far, our models predict conversions with about 86% accuracy. To keep them effective, we are continually refining the models and enriching our data, which we see as essential for long-term success.

Introduction
Problem Statement
Our current focus is on using data science models to answer important business questions about how customers respond to marketing campaigns, and to support strategic decisions. Specifically, we aim to identify which customers are most likely to engage, so that marketing can be targeted more precisely. To this end, we aim to:
• Create predictive models that help anticipate the success of marketing campaigns at the
individual customer level, enabling more precise and targeted marketing efforts.
• Uncover at least three actionable insights from the data to guide strategic decisions, supporting
more effective marketing strategies and optimizing campaign performance.

Project Aim & Objectives

Aim
Our goal is to harness data science models to gain meaningful insights into customer responsiveness,
helping a telecommunications company refine and optimize its marketing strategies for better
engagement and results.
Objectives
• Predictive Modeling: Build statistical models to predict how individual customers will
respond to marketing campaigns, helping target the right audience with greater precision.
• Insight Extraction: Identify and draw out at least three actionable insights from the data,
enabling the company to make more informed decisions about campaign strategies.
• Strategic Recommendations: Offer strategic recommendations aimed at boosting campaign
effectiveness and maximizing ROI.

Methodology
Methodology Overview
This section details how we use data science models and statistical techniques to uncover valuable
insights from the analyzed marketing campaign data.
The project flow is visually summarized in the chart below.
Figure-1: Flow Process for Statistical Modelling

Data Quality Issue

The dataset is of high quality overall, but it shows a significant class imbalance (Figure-2) that must be addressed to obtain accurate modeling results. We apply appropriate techniques to manage this imbalance and improve model performance.

Figure-2: Significant Class Imbalance

Essential Insights from EDA
Job Type vs. Campaign Response:

Figure-3: Job Type vs Campaign Response

Key Insights drawn:

• Looking at the response rates across different job types, certain patterns emerge. The
largest groups—administrative, blue-collar, and technician jobs—contain a high number
of non-subscribers.
• However, students and retirees show a notably higher subscription rate compared to other
groups.
• This suggests that job type does have an impact on campaign success, and roles such as
student and retired could be key target groups for future marketing efforts, as they seem
more receptive to the subscription offer.
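Because the largest job groups also contribute the most non-subscribers in absolute terms, it is the subscription rate within each group, not the raw count, that singles out students and retirees. A minimal plain-Python sketch of that calculation follows; it assumes each record is a dict with a "job" field and a target field "y" holding "yes"/"no". These field names and the toy rows are illustrative assumptions, not the report's actual data or code.

```python
from collections import defaultdict


def subscription_rate_by_group(records, group_key, target_key="y"):
    """Return {group: share of records in that group whose target is "yes"}."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in records:
        group = row[group_key]
        totals[group] += 1
        if row[target_key] == "yes":
            positives[group] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Hypothetical toy records; field names "job" and "y" are assumptions.
rows = [
    {"job": "student", "y": "yes"},
    {"job": "student", "y": "no"},
    {"job": "blue-collar", "y": "no"},
    {"job": "blue-collar", "y": "no"},
    {"job": "blue-collar", "y": "yes"},
    {"job": "retired", "y": "yes"},
]

rates = subscription_rate_by_group(rows, "job")
# Print groups from most to least receptive.
for job, rate in sorted(rates.items(), key=lambda item: item[1], reverse=True):
    print(f"{job:12s} {rate:.2f}")
```

The same function works for any categorical column (for example "marital" or "education") by changing group_key.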
Job Type vs. Duration:

Figure-4: Job Type of Consumers who took Subscription

Key Insights drawn:

• Call duration appears to be strongly associated with subscription across job types.
• Customers who ended up subscribing generally had longer calls, suggesting that extended conversations may be more effective in convincing customers to sign up. For instance, job categories such as management and retired show higher subscription rates when calls were longer.
• This suggests that investing more time in calls could lead to more subscriptions, as prolonged engagement seems to increase interest and the likelihood of subscribing, although an association alone does not show that longer calls cause customers to subscribe.
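A comparison like Figure-4 can be summarized numerically by the median call duration for each job and outcome; the median is less sensitive than the mean to a few very long calls. The sketch below assumes dict records with "job", "y" and "duration" fields (duration assumed to be in seconds); the records are hypothetical. Call duration is only known once a call has ended, so it describes engagement during the call rather than something available before deciding whom to call.

```python
from collections import defaultdict
from statistics import median


def median_duration_by_job_and_outcome(records):
    """Median call duration for each (job, outcome) pair."""
    durations = defaultdict(list)
    for row in records:
        durations[(row["job"], row["y"])].append(row["duration"])
    return {key: median(values) for key, values in durations.items()}


# Hypothetical toy records; "duration" is assumed to be in seconds.
rows = [
    {"job": "management", "y": "yes", "duration": 540},
    {"job": "management", "y": "yes", "duration": 410},
    {"job": "management", "y": "no", "duration": 150},
    {"job": "retired", "y": "yes", "duration": 380},
    {"job": "retired", "y": "no", "duration": 120},
    {"job": "retired", "y": "no", "duration": 200},
]

medians = median_duration_by_job_and_outcome(rows)
for (job, outcome), value in sorted(medians.items()):
    print(f"{job:12s} {outcome:3s} {value:6.1f}")
```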

Duration of Call vs Education Status of Consumers:

Figure-5: Duration of Call vs Education Status of Consumers

Key Insights drawn:

• Consumers with "University Degree" and "High School" education levels are the most
likely to take up the subscription.
• On the other hand, the subscription rate is lowest among those who are "Illiterate."
• There's a clear trend that as education level goes up, so does the likelihood of subscribing.
In other words, people with higher levels of education are generally more inclined to take
up the subscription.
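"Most likely to subscribe" can mean either the largest number of subscribers or the highest subscription rate, and small groups can look extreme on rate alone. A sketch that reports both side by side, in an assumed lowest-to-highest ordering of education levels, is shown below. The level labels follow the raw dataset's coding as we assume it ("illiterate", "high.school", "university.degree", and so on), and the records are hypothetical.

```python
from collections import Counter

# Assumed ordering of education levels (lowest to highest); labels are assumptions.
EDUCATION_ORDER = ["illiterate", "basic.4y", "basic.6y", "basic.9y",
                   "high.school", "professional.course", "university.degree"]


def education_summary(records):
    """(level, subscribers, total, rate) for each level present, in EDUCATION_ORDER."""
    totals = Counter(r["education"] for r in records)
    subscribers = Counter(r["education"] for r in records if r["y"] == "yes")
    summary = []
    for level in EDUCATION_ORDER:
        n = totals[level]
        if n:
            summary.append((level, subscribers[level], n, subscribers[level] / n))
    return summary


# Hypothetical toy records.
rows = (
    [{"education": "university.degree", "y": "yes"}] * 3
    + [{"education": "university.degree", "y": "no"}] * 5
    + [{"education": "high.school", "y": "yes"}] * 2
    + [{"education": "high.school", "y": "no"}] * 6
    + [{"education": "illiterate", "y": "no"}]
)

summary = education_summary(rows)
for level, subs, n, rate in summary:
    print(f"{level:18s} {subs:3d}/{n:<3d} {rate:.2f}")
```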

Age Distribution of Consumers:

Figure-6: Age Distribution of Consumers

Key Insights drawn:

• The age distribution between customers who subscribed and those who didn’t shows some
minor differences.
• Customers who subscribed tend to have a slightly higher median age compared to those
who didn’t.
• However, most customers fall in a similar age range, generally between 32 and 47, whether
they subscribed or not.
• This suggests that while age may play a small role, it doesn’t seem to be a significant factor
in determining who subscribes.
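The age comparison rests on medians and on a typical range per group. Assuming the "32 to 47" range refers to the interquartile range (an assumption; the report does not say how it was computed), both can be obtained with Python's statistics module as sketched below. The ages are hypothetical.

```python
from statistics import median, quantiles


def age_summary(ages_by_outcome):
    """Median and quartiles (Q1, Q3) of age for each outcome group."""
    summary = {}
    for outcome, ages in ages_by_outcome.items():
        q1, _, q3 = quantiles(ages, n=4)  # default "exclusive" method
        summary[outcome] = {"median": median(ages), "q1": q1, "q3": q3}
    return summary


# Hypothetical toy ages for subscribers ("yes") and non-subscribers ("no").
ages = {"yes": [29, 35, 41, 52, 60], "no": [28, 33, 36, 40, 45, 49]}
result = age_summary(ages)
for outcome, stats in result.items():
    print(outcome, stats)
```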

Data Cleaning & Preprocessing

The following steps were carried out to clean the dataset and prepare it for the statistical models.
| Data Cleaning / Preprocessing Step | Steps Taken |
|---|---|
| Dropping Irrelevant Columns & Unknown Values | Dropped rows with unknown values in the columns "housing", "loan", "education", "job" and "marital". Dropped the "default" column because it had the most unknown values. Converted "999" in "pdays" to 0 for consistency and clarity. |
| Checking for Missing & Duplicate Values | Checked for missing values: none found. Checked for duplicates: found 12 duplicate rows and removed them. |
| Encoding Categorical Variables | Applied one-hot encoding to categorical variables for model compatibility and improved predictive accuracy. |
| Feature Engineering | Used StandardScaler to standardize numerical features (mean 0, unit variance), ensuring consistent scales for improved model performance. |

Table-1: Data Cleaning & Preprocessing
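The Table-1 steps can be sketched in plain Python on dict records, as below. This is an illustration under assumptions, not the report's code: the record layout and toy values are hypothetical. In a pandas/scikit-learn workflow the usual equivalents would be DataFrame.drop_duplicates, pandas.get_dummies and sklearn.preprocessing.StandardScaler; the standardize helper mirrors StandardScaler by using the population standard deviation.

```python
from statistics import mean, pstdev

# Columns in which rows with "unknown" were dropped (from Table-1).
UNKNOWN_COLUMNS = ["housing", "loan", "education", "job", "marital"]


def clean(records):
    """Apply the Table-1 cleaning steps to a list of dict records."""
    cleaned, seen = [], set()
    for record in records:
        # Drop rows with "unknown" in any of the listed columns.
        if any(record[col] == "unknown" for col in UNKNOWN_COLUMNS):
            continue
        # Drop the "default" column (building a new dict leaves the input untouched).
        row = {key: value for key, value in record.items() if key != "default"}
        # 999 in "pdays" means the client was not previously contacted.
        if row["pdays"] == 999:
            row["pdays"] = 0
        # Remove exact duplicate rows, keeping the first occurrence.
        signature = tuple(sorted(row.items()))
        if signature in seen:
            continue
        seen.add(signature)
        cleaned.append(row)
    return cleaned


def one_hot(records, column):
    """Replace a categorical column with one 0/1 indicator column per level."""
    levels = sorted({record[column] for record in records})
    encoded = []
    for record in records:
        row = {key: value for key, value in record.items() if key != column}
        for level in levels:
            row[f"{column}_{level}"] = int(record[column] == level)
        encoded.append(row)
    return encoded


def standardize(records, column):
    """Rescale a numeric column to mean 0 and population standard deviation 1."""
    values = [record[column] for record in records]
    mu, sigma = mean(values), pstdev(values)
    return [
        {**record, column: (record[column] - mu) / sigma if sigma else 0.0}
        for record in records
    ]


# Hypothetical toy records.
base = {"job": "admin.", "marital": "married", "education": "high.school",
        "housing": "yes", "loan": "no", "default": "no", "pdays": 999, "age": 30}
rows = [
    base,
    dict(base),                  # exact duplicate, removed
    {**base, "job": "unknown"},  # dropped: unknown job
    {**base, "job": "technician", "default": "unknown", "pdays": 6, "age": 50},
]
prepared = standardize(one_hot(clean(rows), "job"), "age")
print(prepared)
```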

Addressing the Class Imbalance Issue

The dataset shows a considerable imbalance: only 8.1% of leads made deposits, while 91.9% did not. This imbalance must be addressed, otherwise the model becomes biased towards the majority class and makes inaccurate predictions for the minority class. To tackle this issue, we used SMOTE (Synthetic Minority Over-sampling Technique) to balance the dataset.

Figure-7: Target Variable Distribution (Before & After SMOTE)
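SMOTE balances classes by creating new minority-class points on the line segments between existing minority points and their nearest minority neighbours, rather than copying rows. The report does not name its library; imbalanced-learn's SMOTE class (used through fit_resample) is a common choice. The from-scratch version below is a simplified sketch of the idea on already-scaled numeric features, with hypothetical data.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Simplified SMOTE: create synthetic minority-class feature vectors.

    Each synthetic point lies on the line segment between a randomly chosen
    minority sample and one of its k nearest minority-class neighbours.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))
        base = minority[i]
        # k nearest neighbours of the base point among the other minority samples.
        neighbours = sorted(
            (point for j, point in enumerate(minority) if j != i),
            key=lambda point: math.dist(base, point),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # uniform in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical, already-scaled minority-class feature vectors.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority, n_synthetic=6, k=2)
print(new_points)
```

To fully balance, for example, 1,000 training records with 81 subscribers, n_synthetic would be 919 - 81 = 838.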

Statistical Models: Parametric & Non-Parametric Models

For the parametric approach, we used logistic regression. For the non-parametric approach, we used a Support Vector Machine (SVM), chosen for its ability to discern patterns in classification tasks.

| Model | Hyperparameters Applicable | Rationale |
|---|---|---|
| Logistic Regression (Parametric Model) | C, max_iter, penalty, solver | Logistic regression's probabilistic approach assigns class probabilities, allowing adaptable decision thresholds for handling class imbalance. Techniques such as class weights or resampling can further improve its performance on skewed class distributions. |
| Support Vector Machine (Non-Parametric Model) | C, gamma, kernel | SVMs handle non-linear data robustly through kernel functions. They can also address class-imbalanced datasets by incorporating class weights or cost-sensitive learning to manage misclassification costs. |

Table-2: Statistical Model Deployed
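Table-2 mentions class weights and adjustable decision thresholds. The sketch below computes weights with the formula scikit-learn documents for class_weight="balanced", n_samples / (n_classes * class_count), and shows how lowering the probability threshold turns more leads into predicted subscribers. The labels reuse the report's 8.1% positive share; the probabilities are hypothetical, and the report does not state which of these options it applied.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Class weights inversely proportional to class frequency.

    Uses n_samples / (n_classes * class_count), the formula scikit-learn
    documents for class_weight="balanced".
    """
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {cls: n_samples / (n_classes * count) for cls, count in counts.items()}


def predict_with_threshold(probabilities, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels; a lower threshold favours recall."""
    return [int(p >= threshold) for p in probabilities]


# Hypothetical labels with the report's 8.1% positive share.
labels = [1] * 81 + [0] * 919
weights = balanced_class_weights(labels)
print({cls: round(w, 3) for cls, w in weights.items()})

# Hypothetical predicted probabilities of subscribing.
probs = [0.2, 0.35, 0.6, 0.9]
print(predict_with_threshold(probs), predict_with_threshold(probs, threshold=0.3))
```

With these weights, each class contributes the same total weight (81 x 6.173 is about 919 x 0.544, both about 500), so errors on the rare subscribers count roughly eleven times as much per record.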

Steps Taken Prior to Deploying the Models

To evaluate how well our models perform on new data, we used the following approach:

• First, we split the dataset into training and testing sets, and applied SMOTE only to the training set to handle the class imbalance.

• Then, for each of the two models, we ran 5-fold cross-validation to obtain a reliable estimate of its accuracy. The results are summarized in the table below.

| Model Used | Cross-Validation Score (Mean Accuracy) |
|---|---|
| Logistic Regression (Parametric Model) | 0.88307 |
| Support Vector Machine (Non-Parametric Model) | 0.90972 |

Table-3: Mean Accuracy of Models (CV)

The box plot below visualizes the cross-validation accuracy of both models:

Figure-6: Mean Accuracy of Models (CV)- Box Plot

In cross-validation, Logistic Regression performed comparatively worse, with a mean accuracy of 0.88, while SVM was the top-performing model with a mean accuracy of 0.91.
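With only 8.1% positives, a plain random split into five folds can leave some folds with noticeably fewer subscribers than others. A stratified split keeps the class mix similar in every fold, and any resampling has to happen inside the loop on the training indices only, matching the rule above of applying SMOTE to training data only. The report does not say whether its cross-validation was stratified, so the sketch below (the idea behind scikit-learn's StratifiedKFold) is an assumption, run on hypothetical labels.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, k=5, seed=0):
    """Yield (train_idx, test_idx) pairs whose test folds keep the class mix.

    Indices of each class are shuffled and dealt round-robin into k folds, so
    every fold receives roughly the same share of the minority class.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, index in enumerate(indices):
            folds[position % k].append(index)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical labels with the report's 8.1% positive share.
labels = [1] * 81 + [0] * 919
for fold, (train_idx, test_idx) in enumerate(stratified_kfold(labels, k=5)):
    # Any resampling (e.g. SMOTE) would be fitted on train_idx rows only,
    # so the held-out fold keeps the original class distribution.
    positives = sum(labels[i] for i in test_idx)
    print(f"fold {fold}: {len(train_idx)} train, {len(test_idx)} test, "
          f"{positives} positives in test")
```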

Estimation Method- Bayesian Estimation

Bayesian estimation, our chosen method for this project, treats unknown quantities as random variables with probability distributions. We applied this idea through Bayesian optimization to tune the hyperparameters of both models: the cross-validation score is treated as an unknown function of the hyperparameters, a probabilistic model of that function is updated after each evaluation, and the next hyperparameter setting to try is chosen where an improvement looks most promising. The optimal hyperparameters obtained after Bayesian optimization are presented in Table-4; they were used to fine-tune logistic regression and SVM for improved classification accuracy.
| Model Deployed | Optimal Hyperparameters Obtained after Running Bayesian Optimization |
|---|---|
| Logistic Regression (Parametric Model) | C: 5.4341; penalty: l2 |
| Support Vector Machine (Non-Parametric Model) | C: 100.0; gamma: 0.1; kernel: rbf |

Table-4: Optimal Hyperparameters for Selected Models
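The step "choose the next setting where an improvement looks most promising" is usually implemented with an acquisition function. The report does not name its library or acquisition function; expected improvement (EI) is one common choice, sketched below. The surrogate model that supplies each candidate's predicted mean and standard deviation (often a Gaussian process) is not shown, and the candidate settings and numbers are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def expected_improvement(mu, sigma, best, xi=0.01):
    """Expected improvement (for maximisation) of one candidate setting.

    mu, sigma: the surrogate model's predicted mean and standard deviation of
    the cross-validation score at this hyperparameter setting.
    best: best cross-validation score observed so far.
    xi: small margin that nudges the search towards exploration.
    """
    improvement = mu - best - xi
    if sigma <= 0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * STANDARD_NORMAL.cdf(z) + sigma * STANDARD_NORMAL.pdf(z)


# Hypothetical surrogate predictions (mean, std) for two SVM settings.
candidates = {"C=1": (0.890, 0.005), "C=100": (0.880, 0.030)}
best_so_far = 0.885
scores = {name: expected_improvement(mu, sd, best_so_far)
          for name, (mu, sd) in candidates.items()}
next_choice = max(scores, key=scores.get)
print(scores, next_choice)
```

Here the less certain setting is picked even though its predicted mean is lower, which is the exploration behaviour the acquisition function is meant to provide.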

Evaluating & Comparing Results

Because of the significant imbalance in our target class (only 8.1% of records belong to the minority class), accuracy alone is not an appropriate metric: a model can achieve high accuracy by mostly predicting the majority class, which does not reflect its true performance.
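The accuracy trap can be made concrete with a worked example: on 1,000 hypothetical leads with the report's 8.1% subscription rate, a "model" that always predicts "no subscription" is 91.9% accurate yet finds no subscribers at all. The helper below is an illustrative sketch, not the report's evaluation code.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = subscribed)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Precision is undefined when nothing is predicted positive; report 0.0 then.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall}


# 1,000 hypothetical leads with the report's 8.1% subscription rate.
y_true = [1] * 81 + [0] * 919
always_no = [0] * len(y_true)  # a "model" that never predicts a subscription
metrics = classification_metrics(y_true, always_no)
print(metrics)
```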
The results of the deployed models are summarized in the table below (Table-5).

| Evaluation Metric | Reason for Choosing | Logistic Regression | Support Vector Machine |
|---|---|---|---|
| Recall | High recall ensures that most potential customers are identified, minimizing the risk of missing out on valuable marketing opportunities. | 0.89 | 0.87 |
| Accuracy | Accuracy provides a general measure of the model's overall correctness in identifying both potential customers and uninterested individuals. | 0.85 | 0.86 |
| Precision | High precision ensures that most of the identified potential customers are genuinely interested, minimizing marketing resources wasted on uninterested individuals. | 0.35 | 0.36 |
| F1 Score | The F1 score balances correctly identifying potential customers against minimizing false positives in marketing campaigns. | 0.50 | 0.50 |
| ROC AUC | The ROC curve shows the trade-off between true positive and false positive rates; the area under it (AUC) summarizes this trade-off, helping to optimize customer targeting. | 0.93 | 0.86 |

Table-5: Results of Models deployed

• Recall: Both models have high recall, but Logistic Regression is slightly better at 0.89 compared to SVM's 0.87, meaning it identifies a few more potential customers.
• Accuracy: SVM has a marginally higher accuracy at 0.86 compared to Logistic Regression's 0.85, so it is slightly more reliable overall at correctly classifying both potential and uninterested customers.
• Precision: Both models have low precision (SVM 0.36, Logistic Regression 0.35): roughly two out of three customers flagged as likely subscribers do not subscribe, leaving room to improve the targeting of genuinely interested customers.
• F1-Score: Both models have an F1 score of 0.50, reflecting the combination of high recall and low precision and indicating moderate overall performance.
• ROC AUC: Logistic Regression has a notably higher ROC AUC of 0.93 compared to SVM's 0.86, making it better at ranking likely subscribers above non-subscribers across decision thresholds, which supports effective customer targeting.

Figure-7: Comparison of Models
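F1 is the harmonic mean of precision and recall, so it is pulled towards the smaller of the two; this is why high recall combined with low precision yields only 0.50. The check below uses the rounded Logistic Regression values from Table-5 and also converts precision into false positives per true positive, a more tangible measure of wasted contacts.

```python
def f1_score(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Rounded Logistic Regression values from Table-5.
precision, recall = 0.35, 0.89
f1 = f1_score(precision, recall)
false_per_true = (1 - precision) / precision  # false positives per true positive
print(f"F1 = {f1:.2f}, false positives per true positive = {false_per_true:.2f}")
```

A precision of 0.35 means that roughly 1.9 uninterested leads are contacted for every lead who subscribes.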

Best Model for Analysis: SVM achieved the highest cross-validation accuracy (0.91) and marginally higher accuracy and precision, but Logistic Regression has higher recall (0.89 vs 0.87) and a clearly higher ROC AUC (0.93 vs 0.86), the metrics that matter most given the class imbalance. We therefore use Logistic Regression for the analysis that follows.
The model's performance is shown through its confusion matrix and ROC curve. The model correctly classifies a large number of true positives and true negatives, although its precision of 0.35 means that a substantial share of the customers it predicts will subscribe are false positives.

Figure-8: Confusion Matrix & ROC Curve (Logistic Regression)
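The ROC AUC has a direct interpretation: it is the probability that a randomly chosen subscriber receives a higher predicted score than a randomly chosen non-subscriber (ties counting as half). An AUC of 0.93 therefore means the model ranks the subscriber higher in about 93% of such pairs. A small pairwise sketch on hypothetical scores:

```python
def roc_auc(y_true, scores):
    """ROC AUC as the probability that a random positive outranks a random negative.

    Ties count as half a win. O(n_pos * n_neg), fine for illustration only.
    """
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and predicted probabilities of subscribing.
y_true = [1, 1, 0, 0, 0]
scores = [0.9, 0.4, 0.5, 0.2, 0.1]
auc = roc_auc(y_true, scores)
print(f"AUC = {auc:.3f}")
```

Here 5 of the 6 subscriber/non-subscriber pairs are ranked correctly, giving an AUC of about 0.833.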

Insights gathered
Based on the logistic regression model and ROC curve analysis, we gathered the following insights:
| Insight | Insights Drawn |
|---|---|
| Call Duration and Deposit | There appears to be a correlation between call duration and deposit activity, with shorter calls often associated with leads who have not made deposits. This suggests that brief calls may not be effective in converting these leads, so the company might benefit from spending more time engaging with non-depositing leads. |
| Leverage Call Duration | Longer conversations with leads who have a high school or university education tend to be more successful, but calls should still be kept to a reasonable length to avoid overextending resources. |
| Align Campaigns with Economic Trends | Aligning campaign objectives with national economic conditions could also be valuable. For example, adjusting strategies based on economic indicators such as interest rates can help campaigns reflect broader spending behavior and inflation trends. |
| Strategic Timing | While deposits tend to peak from May to July, running campaigns in the period after December, following the financial year-end, may capture bonuses and other incentives. In addition, analyzing the higher conversion rates seen from October to December could offer insights for optimizing campaign timing. |

These insights suggest opportunities for the company to optimize marketing campaigns by refining
communication strategies, targeting demographics effectively, and allocating resources strategically
based on lead behavior.

Conclusion
In conclusion, this project has made solid progress in using data science to improve customer engagement and fine-tune marketing strategies for the telecommunications company. By examining essential business questions and applying statistical modeling, we have identified customer segments that are more likely to respond positively to our campaigns, and our models predict conversions with about 86% accuracy.
However, our analysis also pointed out some challenges, particularly the class imbalance and gaps in the data, especially for January and February. While our sample size of 41,000 is fairly robust, we believe that incorporating newer data and clearer historical information from past campaigns could further improve model performance.
Throughout our work, we relied on a variety of metrics (accuracy, F1 score, precision, recall, and ROC curve analysis) to evaluate how well our models perform. Looking ahead, we recommend more extensive hyperparameter tuning, enhanced feature engineering, and ensemble methods to boost the models' effectiveness.
Moreover, we suggest creating detailed customer profiles and developing tailored campaigns, which would provide a solid foundation for informed marketing strategies. The insights gained from logistic regression will be particularly valuable in understanding economic trends and consumer behavior. By continually refining our models and enriching our data, we aim to make marketing decisions that resonate with our audience and maximize return on investment.
References
Satpathy, S. (2023, November 17). SMOTE for Imbalanced classification with Python. Analytics Vidhya. https://www.analyticsvidhya.com/blog/2020/10/overcoming-class-imbalance-using-smote-techniques/

Wang, W. (2022, March 22). Bayesian optimization concept explained in Layman terms. Medium. https://towardsdatascience.com/bayesian-optimization-concept-explained-in-layman-terms-1d2bcdeaf12f

Bank marketing campaigns dataset | Opening deposit. (n.d.). Kaggle: Your Machine Learning and Data Science Community. https://www.kaggle.com/datasets/volodymyrgavrysh/bank-marketing-campaigns-dataset
