LTI CheckList Assignment 1.ipynb - Colab

The document details a data preprocessing workflow for a customer dataset using pandas in Python. It includes loading the dataset, handling missing values through mean and mode imputation, and applying encoding techniques such as label encoding and one-hot encoding. The final output shows a cleaned dataset with no missing values and transformed categorical variables ready for analysis.

Uploaded by Shubham Bandopadhyay


7/26/25, 10:44 AM LTI CheckList assignment 1.ipynb - Colab

import pandas as pd

# Load the dataset

df = pd.read_csv("Customer_Data - Customer_Data.csv")

# Quick overview
print(df.head())
print(df.info())

   CustomerID  Gender   Age  Country Subscribed  MonthlyIncome      Education  \
0           1    Male  25.0    India        Yes        50000.0       Graduate
1           2  Female  30.0      USA         No        60000.0  Post-Graduate
2           3  Female  22.0       UK        Yes            NaN  Undergraduate
3           4    Male  45.0    India         No        45000.0       Graduate
4           5  Female   NaN  Germany        Yes        70000.0            NaN

   LoyaltyScore PreferredDevice  TotalPurchases
0           7.0          Mobile              12
1           8.0          Laptop              15
2           6.0          Tablet               8
3           9.0             NaN              20
4           5.0          Laptop              10
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 10 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 18 non-null object
2 Age 17 non-null float64
3 Country 20 non-null object
4 Subscribed 20 non-null object
5 MonthlyIncome 18 non-null float64
6 Education 18 non-null object
7 LoyaltyScore 18 non-null float64
8 PreferredDevice 18 non-null object
9 TotalPurchases 20 non-null int64
dtypes: float64(3), int64(2), object(5)
memory usage: 1.7+ KB
https://colab.research.google.com/drive/1wuNW7PYu7fCu7ar7eUze2tJ5_Z4oLQOW#scrollTo=GndyhzlfJmGd&printMode=true 1/13
None

import pandas as pd

# Load the dataset

df = pd.read_csv("Customer_Data - Customer_Data.csv")

# Mean imputation for numerical columns

df['Age'].fillna(df['Age'].mean(), inplace=True)
df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean(), inplace=True)
df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean(), inplace=True)

# Mode imputation for categorical columns

df['Gender'].fillna(df['Gender'].mode()[0], inplace=True)
df['PreferredDevice'].fillna(df['PreferredDevice'].mode()[0], inplace=True)
df['Education'].fillna(df['Education'].mode()[0], inplace=True)

# Optional: Check if all missing values are filled

print("Missing values after imputation:\n", df.isnull().sum())

# Preview the first 5 rows

print("\nCleaned Dataset Preview:")
print(df.head())

Missing values after imputation:

CustomerID 0
Gender 0
Age 0
Country 0
Subscribed 0
MonthlyIncome 0
Education 0
LoyaltyScore 0
PreferredDevice 0
TotalPurchases 0
dtype: int64

Cleaned Dataset Preview:

   CustomerID  Gender        Age  Country Subscribed  MonthlyIncome  \
0           1    Male  25.000000    India        Yes   50000.000000
1           2  Female  30.000000      USA         No   60000.000000
2           3  Female  22.000000       UK        Yes   56666.666667
3           4    Male  45.000000    India         No   45000.000000
4           5  Female  33.352941  Germany        Yes   70000.000000

       Education  LoyaltyScore PreferredDevice  TotalPurchases
0       Graduate           7.0          Mobile              12
1  Post-Graduate           8.0          Laptop              15
2  Undergraduate           6.0          Tablet               8
3       Graduate           9.0          Mobile              20
4       Graduate           5.0          Laptop              10
/tmp/ipython-input-9-1909683163.py:7: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['Age'].fillna(df['Age'].mean(), inplace=True)
/tmp/ipython-input-9-1909683163.py:8: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean(), inplace=True)
/tmp/ipython-input-9-1909683163.py:9: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean(), inplace=True)
/tmp/ipython-input-9-1909683163.py:12: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through c
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['Gender'].fillna(df['Gender'].mode()[0], inplace=True)
/tmp/ipython-input-9-1909683163.py:13: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through c
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se
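The FutureWarning messages above come from calling fillna(..., inplace=True) on a column selected with df[col]. A minimal sketch of the assignment form the warning itself suggests is shown below. The frame `demo` and its values are hypothetical stand-ins, not the customer dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical four-row frame standing in for the customer data.
demo = pd.DataFrame({
    "Age": [25.0, 30.0, np.nan, 45.0],
    "Gender": ["Male", "Female", "Female", None],
})

# Assign the filled column back to the frame instead of calling
# fillna(..., inplace=True) on the intermediate object demo["Age"].
demo["Age"] = demo["Age"].fillna(demo["Age"].mean())
demo["Gender"] = demo["Gender"].fillna(demo["Gender"].mode()[0])

# Alternative form: one call on the frame with a column -> value mapping.
# demo = demo.fillna({"Age": demo["Age"].mean()})

print(demo.isnull().sum())
```

Both forms fill the same values as the original cell; they only avoid modifying a temporary object, which is what the warning is about.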

# Label Encoding for 'Subscribed' column

df['Subscribed'] = df['Subscribed'].map({'Yes': 1, 'No': 0})

from sklearn.preprocessing import LabelEncoder

# List of categorical columns to encode

categorical_cols = ['Gender', 'Education', 'PreferredDevice', 'Country']

# Apply Label Encoding

le = LabelEncoder()
for col in categorical_cols:
    # The same encoder is refit on each column in turn
    df[col] = le.fit_transform(df[col])

# One-Hot Encoding for Country, PreferredDevice, and Education

df = pd.get_dummies(df, columns=['Country', 'PreferredDevice', 'Education'])

# Calculate percentage of missing data in each column

missing_percentage = df.isnull().mean() * 100

# Display only columns with missing values

missing_percentage = missing_percentage[missing_percentage > 0]

print("Percentage of Missing Data in Each Column:\n")

print(missing_percentage)

Percentage of Missing Data in Each Column:

Series([], dtype: float64)
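The result is empty because this check runs after imputation, when no missing values remain. To see how much data was missing originally, the same expression can be applied to the freshly loaded frame. For example, info() reported 17 of 20 non-null values for Age, which corresponds to 15% missing. A small sketch on a hypothetical frame (`raw` and its values are made up for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical raw frame, before any imputation.
raw = pd.DataFrame({
    "CustomerID": [1, 2, 3, 4],
    "Age": [25.0, np.nan, 22.0, 45.0],
    "Education": ["Graduate", None, None, "Graduate"],
})

# Share of missing entries per column, as a percentage.
missing_pct = raw.isnull().mean() * 100

# Keep only columns that actually have gaps, largest first.
missing_pct = missing_pct[missing_pct > 0].sort_values(ascending=False)
print(missing_pct)
```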

df.info(memory_usage='deep')


<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 20 non-null int64
2 Age 20 non-null float64
3 Subscribed 20 non-null int64
4 MonthlyIncome 20 non-null float64
5 LoyaltyScore 20 non-null float64
6 TotalPurchases 20 non-null int64
7 Country_0 20 non-null bool
8 Country_1 20 non-null bool
9 Country_2 20 non-null bool
10 Country_3 20 non-null bool
11 PreferredDevice_0 20 non-null bool
12 PreferredDevice_1 20 non-null bool
13 PreferredDevice_2 20 non-null bool
14 Education_0 20 non-null bool
15 Education_1 20 non-null bool
16 Education_2 20 non-null bool
dtypes: bool(10), float64(3), int64(4)
memory usage: 1.4 KB
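Column names such as Country_0 appear because Country, PreferredDevice, and Education were converted to integer codes before pd.get_dummies was applied. The sketch below contrasts the two orders on a hypothetical frame. It uses pandas category codes as a stand-in for the label-encoding step (an assumption made to keep the example dependency-free), and only the country names are taken from the preview above.

```python
import pandas as pd

# Hypothetical sample using country names seen in the dataset preview.
demo = pd.DataFrame({"Country": ["India", "USA", "UK", "India", "Germany"]})

# One-hot encoding the original strings keeps readable column names.
named = pd.get_dummies(demo, columns=["Country"])
print(list(named.columns))

# Converting to integer codes first (codes follow the sorted labels)
# yields anonymous names such as Country_0.
codes = demo["Country"].astype("category").cat.codes
anonymous = pd.get_dummies(demo.assign(Country=codes), columns=["Country"])
print(list(anonymous.columns))
```

With the codes assigned in sorted label order, Country_0 would correspond to Germany and Country_3 to USA in this sample.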

from sklearn.preprocessing import LabelEncoder

df_label = df.copy()
for col in ['Gender', 'Subscribed']:  # Assume Country, PreferredDevice, Education already one-hot encoded
    df_label[col] = LabelEncoder().fit_transform(df_label[col])

df_label.info(memory_usage='deep')

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 20 entries, 0 to 19
Data columns (total 17 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 CustomerID 20 non-null int64
1 Gender 20 non-null int64
2 Age 20 non-null float64
3 Subscribed 20 non-null int64
4 MonthlyIncome 20 non-null float64
5 LoyaltyScore 20 non-null float64
6 TotalPurchases 20 non-null int64
7 Country_0 20 non-null bool
8 Country_1 20 non-null bool
9 Country_2 20 non-null bool
10 Country_3 20 non-null bool
11 PreferredDevice_0 20 non-null bool
12 PreferredDevice_1 20 non-null bool
13 PreferredDevice_2 20 non-null bool
14 Education_0 20 non-null bool
15 Education_1 20 non-null bool
16 Education_2 20 non-null bool
dtypes: bool(10), float64(3), int64(4)
memory usage: 1.4 KB

df = pd.read_csv("Customer_Data - Customer_Data.csv")

# Fill missing values

df['Age'].fillna(df['Age'].mean(), inplace=True)
df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean(), inplace=True)
df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean(), inplace=True)

df['Gender'].fillna(df['Gender'].mode()[0], inplace=True)
df['PreferredDevice'].fillna(df['PreferredDevice'].mode()[0], inplace=True)
df['Education'].fillna(df['Education'].mode()[0], inplace=True)

# Now safe to apply one-hot encoding

df_ohe = pd.get_dummies(df, columns=['Country', 'PreferredDevice', 'Education'])
df_ohe.info(memory_usage='deep')

8 Country_India 20 non-null bool
9 Country_UK 20 non-null bool
10 Country_USA 20 non-null bool
11 PreferredDevice_Laptop 20 non-null bool
12 PreferredDevice_Mobile 20 non-null bool
13 PreferredDevice_Tablet 20 non-null bool
14 Education_Graduate 20 non-null bool
15 Education_Post-Graduate 20 non-null bool
16 Education_Undergraduate 20 non-null bool
dtypes: bool(10), float64(3), int64(2), object(2)
memory usage: 3.5 KB
/tmp/ipython-input-20-417497002.py:2: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['Age'].fillna(df['Age'].mean(), inplace=True)
/tmp/ipython-input-20-417497002.py:3: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['MonthlyIncome'].fillna(df['MonthlyIncome'].mean(), inplace=True)
/tmp/ipython-input-20-417497002.py:4: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['LoyaltyScore'].fillna(df['LoyaltyScore'].mean(), inplace=True)


For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['PreferredDevice'].fillna(df['PreferredDevice'].mode()[0], inplace=True)
/tmp/ipython-input-20-417497002.py:8: FutureWarning: A value is trying to be set on a copy of a DataFrame or Series through ch
The behavior will change in pandas 3.0. This inplace method will never work because the intermediate object on which we are se

For example, when doing 'df[col].method(value, inplace=True)', try using 'df.method({col: value}, inplace=True)' or df[col] =

df['Education'].fillna(df['Education'].mode()[0], inplace=True)

# Count unique customer profiles

profile_counts = df.groupby(['Gender', 'Country', 'PreferredDevice']).size()

# Display the result

print(profile_counts)

# Optional: Count total number of unique combinations

print(f"\nTotal unique customer profiles: {profile_counts.shape[0]}")

Gender  Country  PreferredDevice
Female  Germany  Laptop             2
                 Tablet             1
        India    Laptop             1
                 Mobile             2
                 Tablet             2
        UK       Mobile             1
                 Tablet             1
        USA      Laptop             1
                 Tablet             1
Male    Germany  Laptop             1
        India    Mobile             2
        UK       Mobile             2
        USA      Laptop             1
                 Mobile             2
dtype: int64

Total unique customer profiles: 14
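The number of unique profiles can also be read off without materialising the per-group counts. A small sketch, assuming a hypothetical five-row frame `demo` with the same three columns:

```python
import pandas as pd

# Hypothetical sample with the three profile columns.
demo = pd.DataFrame({
    "Gender": ["Male", "Female", "Female", "Male", "Female"],
    "Country": ["India", "USA", "USA", "India", "UK"],
    "PreferredDevice": ["Mobile", "Laptop", "Laptop", "Mobile", "Tablet"],
})
keys = ["Gender", "Country", "PreferredDevice"]

# Number of distinct key combinations according to groupby.
n_groupby = demo.groupby(keys).ngroups

# Same count obtained by dropping duplicate rows over the key columns.
n_dedup = len(demo[keys].drop_duplicates())

print(n_groupby, n_dedup)
```

The two counts agree when the key columns contain no missing values. By default groupby drops rows whose keys are missing, whereas drop_duplicates keeps them, so the counts can differ on un-imputed data.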


# Behavioral analysis: Mean income and purchases grouped by subscription status

avg_behavior = df.groupby('Subscribed')[['MonthlyIncome', 'TotalPurchases']].mean()

print("Average MonthlyIncome and TotalPurchases by Subscription Status:")

print(avg_behavior)

Average MonthlyIncome and TotalPurchases by Subscription Status:

            MonthlyIncome  TotalPurchases
Subscribed
No           57777.777778       14.888889
Yes          55757.575758       12.636364
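With only 20 rows, it may help to report the group sizes next to the averages so the means can be read in context. A sketch using named aggregation on a hypothetical frame (the names `demo`, `customers`, `avg_income`, and `avg_purchases` are illustrative, not from the notebook):

```python
import pandas as pd

# Hypothetical sample with the columns used in the cell above.
demo = pd.DataFrame({
    "Subscribed": ["Yes", "No", "Yes", "No", "Yes"],
    "MonthlyIncome": [50000.0, 60000.0, 70000.0, 40000.0, 60000.0],
    "TotalPurchases": [12, 15, 8, 20, 10],
})

# Named aggregation: each output column is (source column, function).
summary = demo.groupby("Subscribed").agg(
    customers=("MonthlyIncome", "size"),
    avg_income=("MonthlyIncome", "mean"),
    avg_purchases=("TotalPurchases", "mean"),
)
print(summary)
```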

# Filter users aged below 30

young_users = df[df['Age'] < 30]

# Count device preferences

device_trend = young_users['PreferredDevice'].value_counts()

print("Preferred Devices Among Users Aged Below 30:")

print(device_trend)

Preferred Devices Among Users Aged Below 30:

PreferredDevice
Mobile 3
Tablet 3
Laptop 1
Name: count, dtype: int64

import seaborn as sns

import matplotlib.pyplot as plt

# Boxplot of LoyaltyScore by Gender

sns.boxplot(x='Gender', y='LoyaltyScore', data=df)
plt.title('Loyalty Score Distribution by Gender')
plt.ylabel('Loyalty Score')


plt.xlabel('Gender')
plt.show()

# Group by Education level and compute mean MonthlyIncome and LoyaltyScore

edu_insight = df.groupby('Education')[['MonthlyIncome', 'LoyaltyScore']].mean()

print("Average MonthlyIncome and LoyaltyScore by Education Level:")

print(edu_insight)

# Optional: sort by highest income or loyalty

edu_sorted_income = edu_insight.sort_values(by='MonthlyIncome', ascending=False)
edu_sorted_loyalty = edu_insight.sort_values(by='LoyaltyScore', ascending=False)

print("\nSorted by Income:")
print(edu_sorted_income)

print("\nSorted by Loyalty:")
print(edu_sorted_loyalty)

Average MonthlyIncome and LoyaltyScore by Education Level:

               MonthlyIncome  LoyaltyScore
Education
Graduate        55000.000000      6.929293
Post-Graduate   58200.000000      7.600000
Undergraduate   59333.333333      7.000000

Sorted by Income:
               MonthlyIncome  LoyaltyScore
Education
Undergraduate   59333.333333      7.000000
Post-Graduate   58200.000000      7.600000
Graduate        55000.000000      6.929293

Sorted by Loyalty:
               MonthlyIncome  LoyaltyScore
Education
Post-Graduate   58200.000000      7.600000
Undergraduate   59333.333333      7.000000
Graduate        55000.000000      6.929293

# Aggregate total purchases and average income by country

country_stats = df.groupby('Country').agg({
    'TotalPurchases': 'sum',
    'MonthlyIncome': 'mean'
})

# Sort and get top 2 countries for each metric

top_purchases = country_stats.sort_values(by='TotalPurchases', ascending=False).head(2)
top_income = country_stats.sort_values(by='MonthlyIncome', ascending=False).head(2)


print("🔝 Top 2 Countries by Total Purchases:\n", top_purchases)

print("\n💰 Top 2 Countries by Average Monthly Income:\n", top_income)

🔝 Top 2 Countries by Total Purchases:

         TotalPurchases  MonthlyIncome
Country
India                93   50428.571429
USA                  74   60533.333333

💰 Top 2 Countries by Average Monthly Income:

         TotalPurchases  MonthlyIncome
Country
Germany              58   65250.000000
USA                  74   60533.333333
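The sort-then-head pattern used for the top-2 selection can also be written with DataFrame.nlargest. A sketch on a hypothetical frame (the values in `demo` are made up and do not reproduce the figures above):

```python
import pandas as pd

# Hypothetical sample with the columns used in the aggregation.
demo = pd.DataFrame({
    "Country": ["India", "USA", "UK", "India", "Germany", "USA"],
    "TotalPurchases": [12, 15, 8, 20, 10, 5],
    "MonthlyIncome": [50000.0, 60000.0, 55000.0, 45000.0, 70000.0, 62000.0],
})

# Total purchases and average income per country.
country_stats = demo.groupby("Country").agg(
    {"TotalPurchases": "sum", "MonthlyIncome": "mean"}
)

# nlargest(n, column) returns the n rows with the largest values,
# ordered from largest to smallest.
top_purchases = country_stats.nlargest(2, "TotalPurchases")
top_income = country_stats.nlargest(2, "MonthlyIncome")

print(top_purchases)
print(top_income)
```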

