7/26/25, 10:44 AM LTI CheckList assignment 1.ipynb - Colab
import pandas as pd
# Load the dataset
df = pd.read_csv("Customer_Data - Customer_Data.csv")
# Quick overview
print(df.head())
print(df.info())  # df.info() prints its summary itself and returns None, so print() adds a trailing "None"
   CustomerID  Gender   Age  Country Subscribed  MonthlyIncome      Education  \
0           1    Male  25.0    India        Yes        50000.0       Graduate
1           2  Female  30.0      USA         No        60000.0  Post-Graduate
2           3  Female  22.0       UK        Yes            NaN  Undergraduate
3           4    Male  45.0    India         No        45000.0       Graduate
4           5  Female   NaN  Germany        Yes        70000.0            NaN
   LoyaltyScore PreferredDevice  TotalPurchases
0           7.0          Mobile              12
1           8.0          Laptop              15
2           6.0          Tablet               8
3           9.0             NaN              20
4           5.0          Laptop              10
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 18 non-null object
2 Age 17 non-null float64
3 Country 20 non-null object
4 Subscribed 20 non-null object
5 MonthlyIncome 18 non-null float64
6 Education 18 non-null object
7 LoyaltyScore 18 non-null float64
8 PreferredDevice 18 non-null object
9 TotalPurchases 20 non-null int64
dtypes: float64(3), int64(2), object(5)
memory usage: 1.7+ KB
https://colab.research.google.com/drive/1wuNW7PYu7fCu7ar7eUze2tJ5_Z4oLQOW#scrollTo=GndyhzlfJmGd&printMode=true 1/13
None
import pandas as pd
# Load the dataset
df = pd.read_csv("Customer_Data - Customer_Data.csv")
# Mean imputation for numerical columns.
# Assign the result back to the column: calling fillna(..., inplace=True) on a
# column selection raises a chained-assignment FutureWarning in pandas 2.x and
# no longer updates df in pandas 3.0.
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['MonthlyIncome'] = df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean())
df['LoyaltyScore'] = df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean())
# Mode imputation for categorical columns
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['PreferredDevice'] = df['PreferredDevice'].fillna(df['PreferredDevice'].mode()[0])
df['Education'] = df['Education'].fillna(df['Education'].mode()[0])
# Optional: Check if all missing values are filled
print("Missing values after imputation:\n", df.isnull().sum())
# Preview the first 5 rows
print("\nCleaned Dataset Preview:")
print(df.head())
Missing values after imputation:
CustomerID 0
Gender 0
Age 0
Country 0
Subscribed 0
MonthlyIncome 0
Education 0
LoyaltyScore 0
PreferredDevice 0
TotalPurchases 0
dtype: int64
Cleaned Dataset Preview:
   CustomerID  Gender        Age  Country Subscribed  MonthlyIncome  \
0           1    Male  25.000000    India        Yes   50000.000000
1           2  Female  30.000000      USA         No   60000.000000
2           3  Female  22.000000       UK        Yes   56666.666667
3           4    Male  45.000000    India         No   45000.000000
4           5  Female  33.352941  Germany        Yes   70000.000000
       Education  LoyaltyScore PreferredDevice  TotalPurchases
0       Graduate           7.0          Mobile              12
1  Post-Graduate           8.0          Laptop              15
2  Undergraduate           6.0          Tablet               8
3       Graduate           9.0          Mobile              20
4       Graduate           5.0          Laptop              10
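The column-by-column imputation in this cell can also be written as one `DataFrame.fillna` call that takes a column-to-value mapping. A minimal sketch on a small made-up frame (`toy`, `fill_values` and the sample values are illustrative assumptions, not the assignment data):

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy = pd.DataFrame({
    "Age": [25.0, None, 35.0],
    "Gender": ["Male", None, "Male"],
})

# Map each column to its fill value: mean for numeric, mode for categorical
fill_values = {
    "Age": toy["Age"].mean(),
    "Gender": toy["Gender"].mode()[0],
}

# fillna returns a new frame; toy itself is left unchanged
toy_filled = toy.fillna(fill_values)
print(toy_filled)
```

Computing all fill values before filling keeps each statistic based on the original, unfilled column.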
# Label Encoding for 'Subscribed' column
df['Subscribed'] = df['Subscribed'].map({'Yes': 1, 'No': 0})
from sklearn.preprocessing import LabelEncoder
# List of categorical columns to encode
categorical_cols = ['Gender', 'Education', 'PreferredDevice', 'Country']
# Apply Label Encoding
le = LabelEncoder()
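for col in categorical_cols:
    # le is refit on every column, so only the last column's mapping remains in le.classes_
    df[col] = le.fit_transform(df[col])
# One-Hot Encoding for Country, PreferredDevice, and Education.
# These columns already hold integer codes from the label encoding above,
# so the dummy columns are named Country_0, Country_1, ... instead of by category.
df = pd.get_dummies(df, columns=['Country', 'PreferredDevice', 'Education'])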
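Because `Country`, `PreferredDevice` and `Education` are label-encoded before `pd.get_dummies` runs, the resulting dummy columns carry integer codes in their names rather than the original categories. A minimal sketch on made-up rows contrasts the two orders. The integer codes are produced here with `pd.factorize(..., sort=True)` as a stand-in for `LabelEncoder`, an assumption made only to keep the example pandas-only; `toy`, `named` and `numbered` are illustrative names:

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy = pd.DataFrame({"Country": ["India", "USA", "UK", "India"]})

# One-hot encode the raw labels: column names keep the category text
named = pd.get_dummies(toy, columns=["Country"])

# Integer-encode first, then one-hot: column names carry only the codes
coded = toy.copy()
coded["Country"] = pd.factorize(coded["Country"], sort=True)[0]
numbered = pd.get_dummies(coded, columns=["Country"])

print(list(named.columns))
print(list(numbered.columns))
```

The indicator values are identical in both frames; only the column labels differ, which is why one-hot encoding the raw labels is usually easier to read.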
# Calculate percentage of missing data in each column
missing_percentage = df.isnull().mean() * 100
# Display only columns with missing values
missing_percentage = missing_percentage[missing_percentage > 0]
print("Percentage of Missing Data in Each Column:\n")
print(missing_percentage)
Percentage of Missing Data in Each Column:
Series([], dtype: float64)
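By this point the frame has already been imputed, so the check returns an empty Series. As a worked example, the same formula applied to the non-null counts reported by the first `df.info()` call (20 rows) gives the pre-imputation percentages. The names `non_null` and `n_rows` are introduced here for illustration, and the counts are re-entered by hand from that output:

```python
import pandas as pd

# Non-null counts copied from the df.info() output of the raw data
non_null = pd.Series({
    "CustomerID": 20, "Gender": 18, "Age": 17, "Country": 20,
    "Subscribed": 20, "MonthlyIncome": 18, "Education": 18,
    "LoyaltyScore": 18, "PreferredDevice": 18, "TotalPurchases": 20,
})
n_rows = 20

# Percentage of missing entries per column, keeping only affected columns
missing_pct = (n_rows - non_null) * 100 / n_rows
missing_pct = missing_pct[missing_pct > 0]
print(missing_pct)
```

Running the notebook's own check right after `pd.read_csv`, before any `fillna`, should report the same six columns.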
df.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 20 non-null int64
2 Age 20 non-null float64
3 Subscribed 20 non-null int64
4 MonthlyIncome 20 non-null float64
5 LoyaltyScore 20 non-null float64
6 TotalPurchases 20 non-null int64
7 Country_0 20 non-null bool
8 Country_1 20 non-null bool
9 Country_2 20 non-null bool
10 Country_3 20 non-null bool
11 PreferredDevice_0 20 non-null bool
12 PreferredDevice_1 20 non-null bool
13 PreferredDevice_2 20 non-null bool
14 Education_0 20 non-null bool
15 Education_1 20 non-null bool
16 Education_2 20 non-null bool
dtypes: bool(10), float64(3), int64(4)
memory usage: 1.4 KB
from sklearn.preprocessing import LabelEncoder
df_label = df.copy()
for col in ['Gender', 'Subscribed']:  # Country, PreferredDevice, Education are already one-hot encoded
    df_label[col] = LabelEncoder().fit_transform(df_label[col])
df_label.info(memory_usage='deep')
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 20 non-null int64
2 Age 20 non-null float64
3 Subscribed 20 non-null int64
4 MonthlyIncome 20 non-null float64
5 LoyaltyScore 20 non-null float64
6 TotalPurchases 20 non-null int64
7 Country_0 20 non-null bool
8 Country_1 20 non-null bool
9 Country_2 20 non-null bool
10 Country_3 20 non-null bool
11 PreferredDevice_0 20 non-null bool
12 PreferredDevice_1 20 non-null bool
13 PreferredDevice_2 20 non-null bool
14 Education_0 20 non-null bool
15 Education_1 20 non-null bool
16 Education_2 20 non-null bool
dtypes: bool(10), float64(3), int64(4)
memory usage: 1.4 KB
df = pd.read_csv("Customer_Data - Customer_Data.csv")
# Fill missing values (assign back instead of using inplace=True on a column selection)
df['Age'] = df['Age'].fillna(df['Age'].mean())
df['MonthlyIncome'] = df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean())
df['LoyaltyScore'] = df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean())
df['Gender'] = df['Gender'].fillna(df['Gender'].mode()[0])
df['PreferredDevice'] = df['PreferredDevice'].fillna(df['PreferredDevice'].mode()[0])
df['Education'] = df['Education'].fillna(df['Education'].mode()[0])
# Now safe to apply one-hot encoding
df_ohe = pd.get_dummies(df, columns=['Country', 'PreferredDevice', 'Education'])
df_ohe.info(memory_usage='deep')
8 Country_India 20 non-null bool
9 Country_UK 20 non-null bool
10 Country_USA 20 non-null bool
11 PreferredDevice_Laptop 20 non-null bool
12 PreferredDevice_Mobile 20 non-null bool
13 PreferredDevice_Tablet 20 non-null bool
14 Education_Graduate 20 non-null bool
15 Education_Post-Graduate 20 non-null bool
16 Education_Undergraduate 20 non-null bool
dtypes: bool(10), float64(3), int64(2), object(2)
memory usage: 3.5 KB
# Count unique customer profiles
profile_counts = df.groupby(['Gender', 'Country', 'PreferredDevice']).size()
# Display the result
print(profile_counts)
# Optional: Count total number of unique combinations
print(f"\nTotal unique customer profiles: {profile_counts.shape[0]}")
Gender  Country  PreferredDevice
Female  Germany  Laptop             2
                 Tablet             1
        India    Laptop             1
                 Mobile             2
                 Tablet             2
        UK       Mobile             1
                 Tablet             1
        USA      Laptop             1
                 Tablet             1
Male    Germany  Laptop             1
        India    Mobile             2
        UK       Mobile             2
        USA      Laptop             1
                 Mobile             2
dtype: int64
Total unique customer profiles: 14
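The number of distinct profiles can be obtained in several equivalent ways. A small sketch on made-up rows (`toy` and `keys` are illustrative names, not from the notebook) compares three of them. Note that `groupby` drops rows with missing keys by default while `drop_duplicates` keeps them, so the counts agree only when the key columns have no missing values, as is the case after imputation:

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy = pd.DataFrame({
    "Gender": ["Female", "Female", "Male", "Male"],
    "Country": ["India", "India", "UK", "USA"],
    "PreferredDevice": ["Mobile", "Mobile", "Mobile", "Laptop"],
})
keys = ["Gender", "Country", "PreferredDevice"]

# 1) Length of the size() result, as in the cell above
n_by_size = toy.groupby(keys).size().shape[0]
# 2) Number of groups reported by the GroupBy object
n_by_ngroups = toy.groupby(keys).ngroups
# 3) Number of distinct rows over the key columns
n_by_dedup = len(toy[keys].drop_duplicates())

print(n_by_size, n_by_ngroups, n_by_dedup)
```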
# Behavioral analysis: Mean income and purchases grouped by subscription status
avg_behavior = df.groupby('Subscribed')[['MonthlyIncome', 'TotalPurchases']].mean()
print("Average MonthlyIncome and TotalPurchases by Subscription Status:")
print(avg_behavior)
Average MonthlyIncome and TotalPurchases by Subscription Status:
            MonthlyIncome  TotalPurchases
Subscribed
No           57777.777778       14.888889
Yes          55757.575758       12.636364
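Group means are easier to interpret next to the group sizes. A sketch using named aggregation on made-up rows (all values and the output column names `customers`, `avg_income` and `avg_purchases` are assumptions for illustration, not results from the assignment data):

```python
import pandas as pd

# Hypothetical toy data, not the assignment dataset
toy = pd.DataFrame({
    "Subscribed": ["Yes", "No", "Yes", "No"],
    "MonthlyIncome": [50000.0, 60000.0, 70000.0, 40000.0],
    "TotalPurchases": [12, 15, 8, 5],
})

# One row per subscription status: group size plus the two means
summary = toy.groupby("Subscribed").agg(
    customers=("TotalPurchases", "size"),
    avg_income=("MonthlyIncome", "mean"),
    avg_purchases=("TotalPurchases", "mean"),
)
print(summary)
```

With only 20 customers in the real data, reporting the group size alongside each mean helps avoid over-reading small differences between the groups.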
# Filter users aged below 30
young_users = df[df['Age'] < 30]
# Count device preferences
device_trend = young_users['PreferredDevice'].value_counts()
print("Preferred Devices Among Users Aged Below 30:")
print(device_trend)
Preferred Devices Among Users Aged Below 30:
PreferredDevice
Mobile 3
Tablet 3
Laptop 1
Name: count, dtype: int64
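The seven under-30 users counted above can be expressed as percentage shares. The sketch below re-enters the printed counts by hand; `device_share` is a name introduced here for illustration. Rows whose Age was missing were filled with the mean (33.352941 in the cleaned preview), so they do not pass the `Age < 30` filter:

```python
import pandas as pd

# Counts copied from the printed output above
device_trend = pd.Series({"Mobile": 3, "Tablet": 3, "Laptop": 1}, name="count")

# Share of each device among the under-30 users, in percent
device_share = (device_trend / device_trend.sum() * 100).round(1)
print(device_share)
```

On the original column, `young_users['PreferredDevice'].value_counts(normalize=True)` returns the same shares as fractions.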
import seaborn as sns
import matplotlib.pyplot as plt
# Boxplot of LoyaltyScore by Gender
sns.boxplot(x='Gender', y='LoyaltyScore', data=df)
plt.title('Loyalty Score Distribution by Gender')
plt.ylabel('Loyalty Score')
plt.xlabel('Gender')
plt.show()
# Group by Education level and compute mean MonthlyIncome and LoyaltyScore
edu_insight = df.groupby('Education')[['MonthlyIncome', 'LoyaltyScore']].mean()
print("Average MonthlyIncome and LoyaltyScore by Education Level:")
print(edu_insight)
# Optional: sort by highest income or loyalty
edu_sorted_income = edu_insight.sort_values(by='MonthlyIncome', ascending=False)
edu_sorted_loyalty = edu_insight.sort_values(by='LoyaltyScore', ascending=False)
print("\nSorted by Income:")
print(edu_sorted_income)
print("\nSorted by Loyalty:")
print(edu_sorted_loyalty)
Average MonthlyIncome and LoyaltyScore by Education Level:
               MonthlyIncome  LoyaltyScore
Education
Graduate        55000.000000      6.929293
Post-Graduate   58200.000000      7.600000
Undergraduate   59333.333333      7.000000
Sorted by Income:
               MonthlyIncome  LoyaltyScore
Education
Undergraduate   59333.333333      7.000000
Post-Graduate   58200.000000      7.600000
Graduate        55000.000000      6.929293
Sorted by Loyalty:
               MonthlyIncome  LoyaltyScore
Education
Post-Graduate   58200.000000      7.600000
Undergraduate   59333.333333      7.000000
Graduate        55000.000000      6.929293
# Aggregate total purchases and average income by country
country_stats = df.groupby('Country').agg({
    'TotalPurchases': 'sum',
    'MonthlyIncome': 'mean'
})
# Sort and get top 2 countries for each metric
top_purchases = country_stats.sort_values(by='TotalPurchases', ascending=False).head(2)
top_income = country_stats.sort_values(by='MonthlyIncome', ascending=False).head(2)
print("🔝 Top 2 Countries by Total Purchases:\n", top_purchases)
print("\n💰 Top 2 Countries by Average Monthly Income:\n", top_income)
🔝 Top 2 Countries by Total Purchases:
         TotalPurchases  MonthlyIncome
Country
India                93   50428.571429
USA                  74   60533.333333
💰 Top 2 Countries by Average Monthly Income:
         TotalPurchases  MonthlyIncome
Country
Germany              58   65250.000000
USA                  74   60533.333333
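`DataFrame.nlargest` is an equivalent way to pick the top rows without a full sort followed by `head`. The sketch below rebuilds a small `country_stats` frame by hand from the three countries visible in the printed output. Any country that reached neither top 2 is not shown there and is left out, so this frame is an assumption for illustration rather than the full aggregate:

```python
import pandas as pd

# Values copied from the printed output above; countries not shown there are omitted
country_stats = pd.DataFrame(
    {
        "TotalPurchases": [93, 74, 58],
        "MonthlyIncome": [50428.571429, 60533.333333, 65250.0],
    },
    index=pd.Index(["India", "USA", "Germany"], name="Country"),
)

# Top 2 rows by each metric, already ordered from largest to smallest
top_purchases = country_stats.nlargest(2, "TotalPurchases")
top_income = country_stats.nlargest(2, "MonthlyIncome")

print(top_purchases)
print(top_income)
```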