My final-year project Colab:
https://colab.research.google.com/drive/1oBFKigw8OVusXc8hLP8_Xyc5Jb-HQMfK?authuser=1#scrollTo=yzJpofrja8YU&uniqifier=2
**Customer churn prediction**
1. https://www.kaggle.com/code/abdalrhamnhebishy/churn-prediction
2. https://www.kaggle.com/code/mostafahabibi1994/customer-churn-eda-part-1
3. https://www.kaggle.com/code/abhashrai/customer-retention-analysis-prediction
4. https://github.com/faisalghifari17/customer-churn-prediction/blob/main/Customer_Churn_Prediction_Script.ipynb
https://github.com/ugis22/churn_model/blob/master/churn_analysis.ipynb
https://www.kaggle.com/code/ssyyhh/eda-of-insurance-with-prediction-of-response#Submission-Data-information
5. https://www.kaggle.com/code/akathuria07/underwriter-for-demo
6. https://github.com/Abdullahw72/Prudential-Life-Insurance-Assessment/blob/master/Prudential%20Life%20Insurance%20Prediction.ipynb
7. https://www.kaggle.com/code/ithesisart/prudential-life-insurance-assessment-edaml#Data-Ingestion
https://www.diva-portal.org/smash/get/diva2:1784294/FULLTEXT02
Let's dive into some questions to uncover insights from this dataset:
1. Age Distribution and Churn Rate:
What is the distribution of ages among your customers? Is there a relationship
between age and churn rate?
2. Gender Analysis:
What is the gender distribution of your customers? Is there any noticeable
difference in churn rates between genders?
3. Tenure and Churn:
How long, on average, have your customers been with your service (tenure)? Is there
any pattern between tenure and churn?
4. Usage Frequency:
How frequently do customers use your service, on average? Does usage frequency
affect churn rates?
5. Support Calls and Churn:
What is the average number of support calls made by customers? Is there any
correlation between support calls and churn?
6. Payment Delay:
What is the typical payment delay among customers? Does payment delay influence
churn behavior?
7. Subscription Type and Contract Length:
What are the different subscription types and their proportions? Do customers with
different subscription types have different churn rates? How does contract length
relate to churn?
8. Total Spend and Churn:
What is the average total spend of customers? Is there any correlation between
total spend and churn?
9. Last Interaction:
How recently did customers interact with your service? Is there any connection
between the recency of the last interaction and churn?
10. Churn Analysis:
What is the overall churn rate in your dataset? Are there specific patterns or
trends that stand out in the churned customers?
11. Correlations:
Are there any notable correlations between different features and churn? Could
multicollinearity between features affect your analysis?
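Most of the questions above come down to the same operation: group customers by one feature and compare the churn rate per group. The following is a minimal sketch of that idea, not the notebook's actual code. The helper name `churn_rate_by`, the column names (`Churn`, `Gender`, `Age`), the bin edges, and the toy data are all assumptions made for illustration; it also assumes the churn column is coded 0/1.

```python
import pandas as pd

def churn_rate_by(df, feature, target="Churn", bins=None):
    """Return churn rate and customer count per group of `feature`.

    Hypothetical helper: numeric features can be bucketed first by
    passing `bins` (an int or a list of bin edges) to pandas.cut.
    """
    groups = df[feature]
    if bins is not None:
        groups = pd.cut(groups, bins=bins)
    # Mean of a 0/1 target is the churn rate; count is the group size
    return (
        df.groupby(groups, observed=True)[target]
        .agg(churn_rate="mean", customers="count")
    )

# Toy data for illustration only (not taken from the real dataset)
toy = pd.DataFrame({
    "Gender": ["F", "M", "F", "M", "F", "M"],
    "Age": [22, 35, 47, 51, 63, 29],
    "Churn": [1, 0, 1, 0, 1, 0],
})

by_gender = churn_rate_by(toy, "Gender")
by_age = churn_rate_by(toy, "Age", bins=[18, 30, 50, 70])
print(by_gender)
print(by_age)
```

The same call should work for tenure, support calls, payment delay, or subscription type, provided the column names match the dataset in use. Reporting the group size next to the rate helps avoid over-reading a churn rate that comes from only a handful of customers.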
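import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# `df` is assumed to be the already-loaded dataset (a pandas DataFrame)
# Keep only the numeric columns
numeric_features = df.select_dtypes(include=[np.number])
print(len(numeric_features.columns))
# Compute the correlation matrix for numeric features
correlation = numeric_features.corr()
# Plot the full correlation heatmap
f, ax = plt.subplots(figsize=(40, 30))
plt.title("Correlation of numeric features", y=1, size=16)
sns.heatmap(correlation, square=True, vmax=0.8)
plt.show()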
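# Take the k features most positively correlated with 'Response'
k = 20
cols = correlation.nlargest(k, 'Response')['Response'].index
print(cols)
# Correlation matrix restricted to those features
# (np.corrcoef yields NaN for any column that contains missing values)
cm = np.corrcoef(df[cols].values.T)
f, ax = plt.subplots(figsize=(16, 14))
sns.heatmap(cm, vmax=.8, linewidths=0.01, square=True, annot=True, cmap='viridis',
            linecolor="white", fmt=".2f", annot_kws={'size': 12},
            xticklabels=cols.values, yticklabels=cols.values)
plt.show()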
# Select the numeric columns
numeric_features = df.select_dtypes(include=[np.number])
print(len(numeric_features.columns))
# Compute the correlation matrix for numeric features
corr_matrix = numeric_features.corr()
# Rank features by mean absolute correlation across all features
# (the mean includes each feature's self-correlation of 1)
avg_corr = corr_matrix.abs().mean().sort_values(ascending=False)
top_features = avg_corr.head(20).index
# Subset the correlation matrix to the top 20 features
subset_corr = corr_matrix.loc[top_features, top_features]
plt.figure(figsize=(14, 10))
sns.heatmap(subset_corr, cmap='coolwarm', annot=True, fmt=".2f", vmin=-1, vmax=1,
            linewidths=.5)
plt.title('Correlation heat map')
plt.show()
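Question 11 asks whether multicollinearity could affect the analysis. A heatmap shows pairwise correlation only, so one common complement is the variance inflation factor (VIF), which for each feature equals the corresponding diagonal entry of the inverse correlation matrix. The sketch below is an illustration under assumptions: the helper name `vif_from_corr` and the toy columns are hypothetical, and it assumes the features have no missing values, no constant columns, and no exactly duplicated columns (otherwise the matrix cannot be inverted).

```python
import numpy as np
import pandas as pd

def vif_from_corr(frame):
    """Variance inflation factor per column (hypothetical helper).

    Uses the identity VIF_j = [R^-1]_jj, where R is the correlation
    matrix of the features.
    """
    corr = frame.corr().to_numpy()
    inv = np.linalg.inv(corr)
    return pd.Series(np.diag(inv), index=frame.columns, name="VIF")

# Toy data for illustration only: 'a_plus_noise' is nearly a copy of 'a'
rng = np.random.default_rng(0)
a = rng.normal(size=500)
b = rng.normal(size=500)
toy = pd.DataFrame({
    "a": a,
    "b": b,
    "a_plus_noise": a + 0.1 * rng.normal(size=500),
})

vif = vif_from_corr(toy)
print(vif.sort_values(ascending=False))
```

In this toy example the two nearly identical columns get a large VIF while the independent column stays close to 1. A frequently quoted rule of thumb treats values above roughly 5 to 10 as a sign of problematic collinearity, though the threshold is a convention rather than a hard rule.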
# Calculate correlations with the 'Response' column
if 'Response' in df.columns:
    # Drop NaN correlations (e.g. from constant columns) so they cannot end up in the tail
    correlations = numeric_features.corr()['Response'].dropna().sort_values(ascending=False)
    # Extract the top 10 features with the highest positive and negative correlations
    # (head(11) because 'Response' itself ranks first with a correlation of 1)
    top_correlations = pd.concat([correlations.head(11), correlations.tail(10)])
    # Plot the correlations
    plt.figure(figsize=(14, 8))
    top_features = top_correlations.drop('Response')  # Drop 'Response' from the plot
    # Different colors for positive/negative correlations
    colors = ['green' if x > 0 else 'red' for x in top_features]
    top_features.plot(kind='bar', color=colors)
    plt.title('Top Correlations with Response Variable', fontsize=16)
    plt.xlabel('Features', fontsize=14)
    plt.ylabel('Correlation Coefficient', fontsize=14)
    plt.grid(axis='y', linestyle='--', alpha=0.7)
    # Label each bar with its correlation value
    for index, value in enumerate(top_features):
        plt.text(index, value, f"{value:.2f}", ha='center',
                 va='bottom' if value > 0 else 'top', fontsize=10)
    plt.xticks(rotation=45, ha='right', fontsize=12)
    plt.show()
else:
    print("The 'Response' column is not present in the DataFrame.")