Links For Datasets

The document outlines a final year project focused on customer churn prediction, providing various resources and datasets for analysis. It includes a series of questions aimed at uncovering insights related to customer demographics, usage patterns, and churn behavior. Additionally, it describes methods for analyzing correlations between features and the response variable using visualizations such as heatmaps.

My Final Year Project Colab:

https://colab.research.google.com/drive/1oBFKigw8OVusXc8hLP8_Xyc5Jb-HQMfK?authuser=1#scrollTo=yzJpofrja8YU&uniqifier=2

********** Customer churn prediction **********


1. https://www.kaggle.com/code/abdalrhamnhebishy/churn-prediction
2. https://www.kaggle.com/code/mostafahabibi1994/customer-churn-eda-part-1
3. https://www.kaggle.com/code/abhashrai/customer-retention-analysis-prediction
4. https://github.com/faisalghifari17/customer-churn-prediction/blob/main/Customer_Churn_Prediction_Script.ipynb

https://github.com/ugis22/churn_model/blob/master/churn_analysis.ipynb

https://www.kaggle.com/code/ssyyhh/eda-of-insurance-with-prediction-of-response#Submission-Data-information

5. https://www.kaggle.com/code/akathuria07/underwriter-for-demo
6. https://github.com/Abdullahw72/Prudential-Life-Insurance-Assessment/blob/master/Prudential%20Life%20Insurance%20Prediction.ipynb
7. https://www.kaggle.com/code/ithesisart/prudential-life-insurance-assessment-edaml#Data-Ingestion

https://www.diva-portal.org/smash/get/diva2:1784294/FULLTEXT02

Let's dive into some questions to uncover insights from this dataset:
1. Age Distribution and Churn Rate:

What is the distribution of ages among your customers? Is there a relationship
between age and churn rate?

2. Gender Analysis:

What is the gender distribution of your customers? Is there any noticeable
difference in churn rates between genders?

3. Tenure and Churn:

How long, on average, have your customers been with your service (tenure)? Is there
any pattern between tenure and churn?

4. Usage Frequency:

How frequently do customers use your service, on average? Does usage frequency
affect churn rates?

5. Support Calls and Churn:

What is the average number of support calls made by customers? Is there any
correlation between support calls and churn?

6. Payment Delay:

What is the typical payment delay among customers? Does payment delay influence
churn behavior?

7. Subscription Type and Contract Length:

What are the different subscription types and their proportions? Do customers with
different subscription types have different churn rates? How does contract length
relate to churn?

8. Total Spend and Churn:

What is the average total spend of customers? Is there any correlation between
total spend and churn?

9. Last Interaction:

How recently did customers interact with your service? Is there any connection
between the recency of the last interaction and churn?

10. Churn Analysis:

What is the overall churn rate in your dataset? Are there specific patterns or
trends that stand out in the churned customers?

11. Correlations:

Are there any notable correlations between different features and churn? Could
multicollinearity between features affect your analysis?
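Most of the questions above reduce to the same computation: the churn rate overall, per category, or per band of a numeric feature. The following is a minimal sketch of that pattern on toy data. The column names "Age", "Gender" and "Churn", the 0/1 encoding of churn, and the age bin edges are all assumptions for illustration and may not match the actual dataset.

```python
import pandas as pd

# Hypothetical toy data; column names and values are assumptions, not the real schema.
df = pd.DataFrame({
    "Age": [22, 35, 47, 58, 63, 29, 41, 52],
    "Gender": ["F", "M", "F", "M", "M", "F", "F", "M"],
    "Churn": [0, 0, 1, 1, 1, 0, 0, 1],
})

# Overall churn rate: the mean of a 0/1 column is the share of churned customers.
overall_rate = df["Churn"].mean()

# Churn rate per category (the gender question; works the same for subscription type).
rate_by_gender = df.groupby("Gender")["Churn"].mean()

# Churn rate per band of a numeric feature (the age question; the bin edges are arbitrary).
age_band = pd.cut(df["Age"], bins=[0, 30, 50, 100], labels=["<=30", "31-50", ">50"])
rate_by_age = df.groupby(age_band, observed=True)["Churn"].mean()

print(overall_rate)
print(rate_by_gender)
print(rate_by_age)
```

The same groupby can be reused for tenure, support calls, payment delay, total spend and last interaction by swapping the column and choosing suitable bins.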

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# df is assumed to be the already-loaded dataset (a pandas DataFrame)
numeric_features = df.select_dtypes(include=[np.number])
print(len(numeric_features.columns))
# Compute the correlation matrix for numeric features
correlation = numeric_features.corr()

# Plot the full correlation heatmap
f, ax = plt.subplots(figsize=(40, 30))
plt.title("Correlation of numeric features", y=1, size=16)
sns.heatmap(correlation, square=True, vmax=0.8)

# Heatmap of the k features most positively correlated with 'Response'
k = 20
cols = correlation.nlargest(k, 'Response')['Response'].index
print(cols)
# Note: np.corrcoef returns NaN for any column that contains missing values
cm = np.corrcoef(df[cols].values.T)
f, ax = plt.subplots(figsize=(16, 14))
sns.heatmap(cm, vmax=.8, linewidths=0.01, square=True, annot=True, cmap='viridis',
            linecolor="white", fmt=".2f", annot_kws={'size': 12},
            xticklabels=cols.values, yticklabels=cols.values)

# Compute the correlation matrix for numeric features
numeric_features = df.select_dtypes(include=[np.number])
print(len(numeric_features.columns))
corr_matrix = numeric_features.corr()

# Calculate the average absolute correlation of each feature with all features
# (the mean includes the feature's self-correlation of 1 on the diagonal)
avg_corr = corr_matrix.abs().mean().sort_values(ascending=False)
top_features = avg_corr.head(20).index

# Subset the correlation matrix with the top features
subset_corr = corr_matrix.loc[top_features, top_features]

plt.figure(figsize=(14, 10))
sns.heatmap(subset_corr, cmap='coolwarm', annot=True, fmt=".2f", vmin=-1, vmax=1,
            linewidths=.5)
plt.title('Correlation heat map')
plt.show()
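A heatmap shows multicollinearity only visually. As a complement, the strongly correlated feature pairs can be listed explicitly. The sketch below uses synthetic data with hypothetical column names "a", "b" and "c"; the 0.8 cutoff is an arbitrary assumption, not a value taken from the project.

```python
import numpy as np
import pandas as pd

# Synthetic data (an assumption for illustration): "b" is nearly a linear copy of "a".
rng = np.random.default_rng(0)
x = rng.normal(size=200)
toy = pd.DataFrame({
    "a": x,
    "b": 2 * x + rng.normal(scale=0.1, size=200),
    "c": rng.normal(size=200),  # independent noise
})

corr = toy.corr().abs()

# Keep only the strict upper triangle so each pair appears once
# and the diagonal (self-correlation) is excluded.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = corr.where(mask).stack().dropna().sort_values(ascending=False)

# Pairs above the (arbitrary) threshold are candidates for dropping or merging.
high_pairs = pairs[pairs > 0.8]
print(high_pairs)
```

On the real data, `toy` would be replaced by the numeric feature frame; one feature from each flagged pair can then be reviewed before modelling.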
# Calculate correlations with the 'Response' column
if 'Response' in numeric_features.columns:
    correlations = numeric_features.corr()['Response'].dropna().sort_values(ascending=False)

    # Extract the 10 strongest positive and 10 strongest negative correlations
    # (head(11) because 'Response' itself ranks first with a correlation of 1)
    top_correlations = pd.concat([correlations.head(11), correlations.tail(10)])
    # With fewer than 21 numeric columns, head and tail overlap; keep each feature once
    top_correlations = top_correlations[~top_correlations.index.duplicated()]

    # Plotting the correlations
    plt.figure(figsize=(14, 8))
    top_features = top_correlations.drop('Response')  # Drop 'Response' from the plot
    # Different colors for positive/negative correlations
    colors = ['green' if value > 0 else 'red' for value in top_features]
    top_features.plot(kind='bar', color=colors)
    plt.title('Top Correlations with Response Variable', fontsize=16)
    plt.xlabel('Features', fontsize=14)
    plt.ylabel('Correlation Coefficient', fontsize=14)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    # Annotate each bar with its correlation value
    for index, value in enumerate(top_features):
        plt.text(index, value, f"{value:.2f}", ha='center',
                 va='bottom' if value > 0 else 'top', fontsize=10)
    plt.xticks(rotation=45, ha='right', fontsize=12)
    plt.show()
else:
    print("The 'Response' column is missing from the DataFrame or is not numeric.")
