Walmart - Ipynb - Colaboratory

The document is a Jupyter notebook analyzing Walmart purchase data, focusing on customer demographics such as gender and age. It includes data loading, descriptive statistics, visualizations, and confidence interval calculations for average spending by gender and age groups. Key findings indicate that male customers spend more on average than female customers, and various age groups have different spending patterns.

Uploaded by

vpalak93

8/15/23, 11:02 PM Walmart.ipynb - Colaboratory

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

data=pd.read_csv("https://d2beiqkhq929f0.cloudfront.net/public_assets/assets/000/001/293/original/walmart_data.csv?1641285094")

data.head(5)

   User_ID Product_ID Gender   Age  Occupation City_Category  Stay_In_Current_City_Years  Marital_Status  Produc
0  1000001  P00069042      F  0-17          10             A                           2               0
1  1000001  P00248942      F  0-17          10             A                           2               0
2  1000001  P00087842      F  0-17          10             A                           2               0
3  1000001  P00085442      F  0-17          10             A                           2               0

data.columns

Index(['User_ID', 'Product_ID', 'Gender', 'Age', 'Occupation', 'City_Category',
       'Stay_In_Current_City_Years', 'Marital_Status', 'Product_Category',
       'Purchase'],
      dtype='object')

data.shape

(550068, 10)

data.describe()

            User_ID     Occupation  Marital_Status  Product_Category       Purchase
count  5.500680e+05  550068.000000   550068.000000     550068.000000  550068.000000
mean   1.003029e+06       8.076707        0.409653          5.404270    9263.968713
std    1.727592e+03       6.522660        0.491770          3.936211    5023.065394
min    1.000001e+06       0.000000        0.000000          1.000000      12.000000
25%    1.001516e+06       2.000000        0.000000          1.000000    5823.000000
50%    1.003077e+06       7.000000        0.000000          5.000000    8047.000000
75%    1.004478e+06      14.000000        1.000000          8.000000   12054.000000
max    1.006040e+06      20.000000        1.000000         20.000000   23961.000000

sns.boxplot(x=data["Gender"],y=data["Purchase"])

https://colab.research.google.com/drive/18DgebZPGy-tMWk9OnkIC91-iKYDk6ymX#scrollTo=d6vbi-rQOC15&printMode=true 1/6

<Axes: xlabel='Gender', ylabel='Purchase'>

data["Gender"].value_counts()

M 414259
F 135809
Name: Gender, dtype: int64

data["Marital_Status"].value_counts()

0 324731
1 225337
Name: Marital_Status, dtype: int64

pd.crosstab(data["Gender"],data["Purchase"],margins=True)

Purchase 12 13 14 24 25 26 36 37 38 48 ... 23952 23953 23954 23955 23956 23958 23959 239

Gender

F 27 25 30 28 30 27 36 31 34 33 ... 0 0 0 1 0 0 1

M 74 81 65 90 83 85 71 79 80 75 ... 1 2 2 2 1 4 1

All 101 106 95 118 113 112 107 110 114 108 ... 1 2 2 3 1 4 2

3 rows × 18106 columns

## Women are not spending more than men
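
The crosstab above has one column per distinct purchase amount (18106 columns), which makes the gender comparison hard to read. A per-gender summary is a more compact way to check the claim. The sketch below runs on a small made-up frame (`toy` is hypothetical, not the Walmart data); the same `groupby` call could be applied to `data`.

```python
import pandas as pd

# Toy stand-in for the Walmart frame; these six rows are invented for illustration.
toy = pd.DataFrame({
    "Gender": ["F", "F", "M", "M", "M", "F"],
    "Purchase": [8000, 9000, 9500, 10000, 9000, 7000],
})

# One row per gender instead of one column per distinct purchase amount.
summary = toy.groupby("Gender")["Purchase"].agg(["count", "mean", "median"])
print(summary)
```

On the real data, `data.groupby("Gender")["Purchase"].agg(["count", "mean", "median"])` would give the per-transaction comparison directly.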

## Univariate Analysis

plt.figure(figsize=(10, 6))
sns.histplot(data=data, x='Purchase', kde=True)
plt.show()

sns.boxplot(data=data, x='Purchase', orient='h')

plt.show()
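
A boxplot's whiskers conventionally reach at most 1.5 × IQR beyond the quartiles (this is the matplotlib/seaborn default, assuming `whis` is left unchanged). Using the quartiles, minimum and maximum printed by `data.describe()`, the fences can be worked out by hand. The variable names below are illustrative only.

```python
# Quartiles, min and max of Purchase as printed by data.describe()
q1, q3 = 5823.0, 12054.0
purchase_min, purchase_max = 12.0, 23961.0

iqr = q3 - q1                     # interquartile range
lower_fence = q1 - 1.5 * iqr      # points below this are drawn as outliers
upper_fence = q3 + 1.5 * iqr      # points above this are drawn as outliers

has_low_outliers = purchase_min < lower_fence
has_high_outliers = purchase_max > upper_fence

print(iqr, lower_fence, upper_fence)
print(has_low_outliers, has_high_outliers)
```

Since the maximum purchase (23961) lies above the upper fence while the minimum (12) lies above the lower fence, outliers are expected only on the high side of the plot.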


amt_df = data.groupby(['User_ID', 'Gender'])[['Purchase']].sum()

amt_df = amt_df.reset_index()
amt_df

User_ID Gender Purchase

0 1000001 F 334093

1 1000002 M 810472

2 1000003 M 341635

3 1000004 M 206468

4 1000005 M 821001

... ... ... ...

5886 1006036 F 4116058

5887 1006037 F 1119538

5888 1006038 F 90034

5889 1006039 F 590319

5890 1006040 M 1653299

5891 rows × 3 columns

amt_df[amt_df['Gender']=='M']['Purchase'].hist(bins=35)
plt.show()

amt_df[amt_df['Gender']=='F']['Purchase'].hist(bins=35)
plt.show()


male_avg = amt_df[amt_df['Gender']=='M']['Purchase'].mean()
female_avg = amt_df[amt_df['Gender']=='F']['Purchase'].mean()

print("Average amount spent by Male customers: {:.2f}".format(male_avg))
print("Average amount spent by Female customers: {:.2f}".format(female_avg))

Average amount spent by Male customers: 925344.40
Average amount spent by Female customers: 712024.39

male_df = amt_df[amt_df['Gender']=='M']
female_df = amt_df[amt_df['Gender']=='F']

genders = ["M", "F"]

male_sample_size = 3000
female_sample_size = 1500
num_repetitions = 1000
male_means = []
female_means = []

# Bootstrap: resample users with replacement and record the mean of each resample
for _ in range(num_repetitions):
    male_mean = male_df.sample(male_sample_size, replace=True)['Purchase'].mean()
    female_mean = female_df.sample(female_sample_size, replace=True)['Purchase'].mean()

    male_means.append(male_mean)
    female_means.append(female_mean)

fig, axis = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))

axis[0].hist(male_means, bins=35)
axis[1].hist(female_means, bins=35)
axis[0].set_title("Male - Distribution of means, Sample size: 3000")
axis[1].set_title("Female - Distribution of means, Sample size: 1500")

plt.show()

print("Population mean - Mean of sample means of amount spent for Male: {:.2f}".format(np.mean(male_means)))
print("Population mean - Mean of sample means of amount spent for Female: {:.2f}".format(np.mean(female_means)))

print("\nMale - Sample mean: {:.2f} Sample std: {:.2f}".format(male_df['Purchase'].mean(), male_df['Purchase'].std()))
print("Female - Sample mean: {:.2f} Sample std: {:.2f}".format(female_df['Purchase'].mean(), female_df['Purchase'].std()))

Population mean - Mean of sample means of amount spent for Male: 925203.96
Population mean - Mean of sample means of amount spent for Female: 712059.99

Male - Sample mean: 925344.40 Sample std: 985830.10
Female - Sample mean: 712024.39 Sample std: 807370.73
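
By the central limit theorem, the spread of the resampled means should be close to the standard error, std / sqrt(n). With the printed standard deviations this is roughly 985830.10 / sqrt(3000) ≈ 18,000 for male customers and 807370.73 / sqrt(1500) ≈ 20,800 for female customers. A minimal check of that relationship on synthetic data (an exponential sample standing in for per-user totals, not the Walmart data; all names are hypothetical) might look like this:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, right-skewed "per-user totals"; an assumption for illustration only.
population = rng.exponential(scale=900_000, size=4000)

sample_size = 3000
num_repetitions = 1000

# Draw all bootstrap resamples at once: one row per repetition.
resamples = rng.choice(population, size=(num_repetitions, sample_size), replace=True)
sample_means = resamples.mean(axis=1)

observed_se = sample_means.std()                        # spread of the resampled means
predicted_se = population.std() / np.sqrt(sample_size)  # std / sqrt(n)

print("observed: {:.2f}  predicted: {:.2f}".format(observed_se, predicted_se))
```

The two numbers should agree to within a few percent, which is why the 1.96 * std / sqrt(n) formula can replace the resampling loop when only the interval is needed.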


male_margin_of_error_clt = 1.96*male_df['Purchase'].std()/np.sqrt(len(male_df))
male_sample_mean = male_df['Purchase'].mean()
male_lower_lim = male_sample_mean - male_margin_of_error_clt
male_upper_lim = male_sample_mean + male_margin_of_error_clt

female_margin_of_error_clt = 1.96*female_df['Purchase'].std()/np.sqrt(len(female_df))
female_sample_mean = female_df['Purchase'].mean()
female_lower_lim = female_sample_mean - female_margin_of_error_clt
female_upper_lim = female_sample_mean + female_margin_of_error_clt

print("Male confidence interval of means: ({:.2f}, {:.2f})".format(male_lower_lim, male_upper_lim))

print("Female confidence interval of means: ({:.2f}, {:.2f})".format(female_lower_lim, female_upper_lim))

Male confidence interval of means: (895617.83, 955070.97)

Female confidence interval of means: (673254.77, 750794.02)
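
The constant 1.96 in the cells above is the two-sided 95% critical value of the standard normal distribution. As a sketch, the same calculation can be wrapped in a helper that derives the critical value for any confidence level, with a second helper that checks whether two intervals overlap. The names `clt_interval` and `intervals_overlap` are hypothetical, not part of the notebook.

```python
from statistics import NormalDist

import numpy as np


def clt_interval(values, confidence=0.95):
    """Normal-approximation confidence interval for the mean of `values`."""
    values = np.asarray(values, dtype=float)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # about 1.96 for 95%
    # ddof=1 matches the default of pandas' Series.std()
    margin = z * values.std(ddof=1) / np.sqrt(len(values))
    mean = values.mean()
    return mean - margin, mean + margin


def intervals_overlap(a, b):
    """True if the closed intervals a=(lo, hi) and b=(lo, hi) share any point."""
    return a[0] <= b[1] and b[0] <= a[1]


# Intervals as printed above
male_ci = (895617.83, 955070.97)
female_ci = (673254.77, 750794.02)
print(intervals_overlap(male_ci, female_ci))
```

The printed male and female intervals do not overlap, which is consistent with the notebook's reading that male customers spend more per user on average.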

amt_df = data.groupby(['User_ID', 'Age'])[['Purchase']].sum()

amt_df = amt_df.reset_index()
amt_df

User_ID Age Purchase

0 1000001 0-17 334093

1 1000002 55+ 810472

2 1000003 26-35 341635

3 1000004 46-50 206468

4 1000005 26-35 821001

... ... ... ...

5886 1006036 26-35 4116058

5887 1006037 46-50 1119538

5888 1006038 55+ 90034

5889 1006039 46-50 590319

5890 1006040 26-35 1653299

5891 rows × 3 columns

amt_df['Age'].value_counts()

26-35 2053
36-45 1167
18-25 1069
46-50 531
51-55 481
55+ 372
0-17 218
Name: Age, dtype: int64

sample_size = 200
num_repetitions = 1000

all_means = {}

age_intervals = ['26-35', '36-45', '18-25', '46-50', '51-55', '55+', '0-17']

for age_interval in age_intervals:
    all_means[age_interval] = []

# Bootstrap: 1000 resampled means (n=200, with replacement) for each age group
for age_interval in age_intervals:
    for _ in range(num_repetitions):
        mean = amt_df[amt_df['Age']==age_interval].sample(sample_size, replace=True)['Purchase'].mean()
        all_means[age_interval].append(mean)
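
The `all_means` dictionary is not used again in the notebook; the intervals that follow come from the closed-form formula instead. One way the resampled means could be used is a percentile bootstrap interval, sketched below. The helper `percentile_interval` and the stand-in dictionary `all_means_demo` (normally distributed fake means, not real results) are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical stand-in for all_means: 1000 fake bootstrap means per age group.
all_means_demo = {
    "26-35": rng.normal(990_000, 70_000, size=1000),
    "55+": rng.normal(540_000, 65_000, size=1000),
}


def percentile_interval(means, confidence=0.95):
    """Percentile bootstrap interval: cut off equal tails of the resampled means."""
    tail = (1 - confidence) / 2 * 100
    lower, upper = np.percentile(means, [tail, 100 - tail])
    return lower, upper


for age_interval, means in all_means_demo.items():
    lower, upper = percentile_interval(means)
    print("For age {} --> percentile interval: ({:.2f}, {:.2f})".format(age_interval, lower, upper))
```

Note that an interval built this way reflects the resample size (`sample_size = 200` in the notebook), so it would be wider than one computed from the full group size.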

# 95% confidence interval of the mean per age group, using the CLT formula
for val in ['26-35', '36-45', '18-25', '46-50', '51-55', '55+', '0-17']:
    new_df = amt_df[amt_df['Age']==val]

    margin_of_error_clt = 1.96*new_df['Purchase'].std()/np.sqrt(len(new_df))
    sample_mean = new_df['Purchase'].mean()
    lower_lim = sample_mean - margin_of_error_clt
    upper_lim = sample_mean + margin_of_error_clt

    print("For age {} --> confidence interval of means: ({:.2f}, {:.2f})".format(val, lower_lim, upper_lim))


For age 26-35 --> confidence interval of means: (945034.42, 1034284.21)
For age 36-45 --> confidence interval of means: (823347.80, 935983.62)
For age 18-25 --> confidence interval of means: (801632.78, 908093.46)
For age 46-50 --> confidence interval of means: (713505.63, 871591.93)
For age 51-55 --> confidence interval of means: (692392.43, 834009.42)
For age 55+ --> confidence interval of means: (476948.26, 602446.23)
For age 0-17 --> confidence interval of means: (527662.46, 710073.17)
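
The per-group loop can also be written as a single `groupby` aggregation, which returns all intervals in one table. This is a sketch on a small invented frame (`toy` is hypothetical); on the notebook's data the same lines would be applied to `amt_df`.

```python
import numpy as np
import pandas as pd

# Invented per-user totals for two age groups, for illustration only.
toy = pd.DataFrame({
    "Age": ["0-17"] * 4 + ["26-35"] * 4,
    "Purchase": [100, 200, 300, 400, 500, 700, 900, 1100],
})

# Mean, sample standard deviation (ddof=1) and group size per age group.
stats = toy.groupby("Age")["Purchase"].agg(["mean", "std", "count"])

# Same CLT margin of error as in the loop: 1.96 * std / sqrt(n)
margin = 1.96 * stats["std"] / np.sqrt(stats["count"])
stats["lower"] = stats["mean"] - margin
stats["upper"] = stats["mean"] + margin

print(stats)
```

Sorting the resulting table by `mean` makes it easier to see which age groups have overlapping intervals.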
