Project

Uploaded by

niharikars.bsc23

ANALYSIS REPORT OF CUSTOMERS
Neha Dhami - SOCSE

Dataset:
The dataset consists of various statistics of
customers across shops. The parameters
include customer id, age, gender, category and
so on. It also records the season in which each
customer shopped and the clothing items they
bought. It consists of 1700 rows and 18
columns of data.

Some columns had more than 60 percent null
values, so these columns (promo code used,
previous purchases and preferred payment
method) were dropped completely using drop().
Columns with less than 40% null values were
filled with the mean, median or mode, depending
on the data type, using fillna(). There were 146
duplicate rows, which were also removed using
drop_duplicates().
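
The cleaning steps above can be sketched in pandas as follows. This is only an illustrative sketch, not the code used for the report: the function name clean, the toy columns and the choice of median for numeric columns and mode for the others are assumptions, and the sketch fills every column that survives the 60% cut because the report does not say how columns between 40% and 60% nulls were handled.

```python
import numpy as np
import pandas as pd


def clean(df: pd.DataFrame, drop_above: float = 0.60) -> pd.DataFrame:
    """Drop mostly-empty columns, fill remaining gaps, remove duplicate rows."""
    # Fraction of null values in each column.
    null_share = df.isna().mean()
    # drop(): remove columns whose null share exceeds the threshold.
    df = df.drop(columns=null_share[null_share > drop_above].index)
    for col in df.columns:
        if not df[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(df[col]):
            fill = df[col].median()        # assumption: median for numeric columns
        else:
            fill = df[col].mode().iloc[0]  # assumption: mode for the other columns
        # fillna(): replace the remaining nulls with the chosen value.
        df[col] = df[col].fillna(fill)
    # drop_duplicates(): keep the first copy of each repeated row.
    return df.drop_duplicates().reset_index(drop=True)


# Toy data (made up for illustration, not taken from the real dataset).
toy = pd.DataFrame({
    "Age": [25, np.nan, 40, 40, 31],
    "Category": ["Clothing", "Footwear", None, None, "Clothing"],
    "Promo Code Used": [np.nan, np.nan, np.nan, "Yes", np.nan],
})
cleaned = clean(toy)
print(cleaned)
```

In the toy data, "Promo Code Used" is 80% null and is dropped, the gaps in the other two columns are filled, and the two rows that become identical are reduced to one.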

Analysis with full dataset:

A bar plot was made comparing Item Price and
Item Purchased, and a scatter plot was made
comparing Age and the amount spent. The
graphs are explained in the PPT.
[Figures: bar plot of Item Price by Item
Purchased; scatter plot of Age against amount
spent]
Grouping and Aggregate:
After finding the mean, median and mode of
each column, the following observations were
made:
● The most common customer ages were 43
and 64.
● The highest amount spent was 100 USD.
● The most purchased item was Sweater.

Then the mean and median of age and amount
spent by customers were found.
● The average age of the customers was 44.
● The average amount spent by a customer
was 59 USD.
● The average review rating was 3.7.
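
The summary statistics above can be reproduced with Python's standard statistics module. The sketch below uses a made-up list of ages, not the real column; multimode() is used because a column can have more than one most common value, as with the two ages reported above.

```python
from statistics import mean, median, multimode

# Made-up ages for illustration only.
ages = [43, 64, 43, 64, 30, 25, 51]

summary = {
    "mean": mean(ages),         # arithmetic average
    "median": median(ages),     # middle value of the sorted data
    "modes": multimode(ages),   # every value tied for most frequent
}
print(summary)
```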
Percentile:
To compare customers' statistics, the 90th
percentile of various columns was found.

● The 90th percentile of Age was found to
be 64. This means only about 10% of the
customers were older than 64.

● The 90th percentile of Review Rating was
found to be 4.7. This means only about 10% of
the ratings were above 4.7, so very few
products were good enough to get a 5-star
rating.
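
As a rough illustration of what a 90th percentile measures, the sketch below computes it with the standard library on made-up ratings. The helper name percentile_90 is hypothetical, and the "inclusive" interpolation method is an assumption; the report does not state which method was used, and other methods can give slightly different values.

```python
from statistics import quantiles


def percentile_90(values):
    """Return the 90th percentile using linear interpolation."""
    # quantiles(n=10) returns the nine decile cut points;
    # the last cut point is the 90th percentile.
    return quantiles(values, n=10, method="inclusive")[-1]


# Made-up review ratings for illustration only.
ratings = [2.5, 3.0, 3.2, 3.5, 3.7, 3.9, 4.0, 4.2, 4.5, 4.7, 5.0]
p90 = percentile_90(ratings)
print(p90)
```

About 90% of the values lie at or below the returned number, so it marks where the top 10% of the column begins.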

Correlation:
Then the correlation heatmap was made; its
observations are explained in the PPT.
(Since the dataset is more categorical than
numerical, the correlation between the columns
is very low: whatever their age and purchase
amount, the customers bought items according
to their own personal preferences. Hence, the
correlation is not very informative.)
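
Each cell of such a heatmap is a correlation coefficient between two numerical columns. The sketch below assumes the Pearson coefficient, which is a common default but is not confirmed by the report, and uses made-up numbers to show how a value near 0 arises when spending does not rise or fall with age.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    # Sum of products of deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Square roots of the summed squared deviations.
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


# Made-up values for illustration only.
ages = [20, 30, 40, 50, 60]
spend = [50, 80, 50, 80, 50]          # no linear trend with age
doubled = [2 * a for a in ages]        # perfectly linear in age

r_unrelated = pearson(ages, spend)
r_linear = pearson(ages, doubled)
print(r_unrelated, r_linear)
```

Categorical columns such as item or colour would first have to be turned into numbers before a coefficient like this could be computed for them.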
Skewness and Kurtosis:

[Figure: histplot of purchase amount and age]

[Figure: histplot of Overall Rating]
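
Besides reading the shape from a histogram, skewness and kurtosis can be computed as numbers. The sketch below is an assumption about how this could be done, not the report's own code: it uses the simple moment-based (population) formulas and reports excess kurtosis, so a normal distribution would score 0. Library functions that apply a bias correction can return slightly different values.

```python
def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from central moments."""
    n = len(values)
    mu = sum(values) / n
    # Second, third and fourth central moments.
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    skew = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3
    return skew, excess_kurtosis


# Made-up samples for illustration only.
symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]

sym_skew, sym_kurt = skewness_and_kurtosis(symmetric)
tail_skew, tail_kurt = skewness_and_kurtosis(right_tailed)
print(sym_skew, sym_kurt, tail_skew, tail_kurt)
```

A skewness near 0 indicates a symmetric histogram, a positive value indicates a longer right tail, and a negative excess kurtosis indicates a flatter shape than a normal curve.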

Further observations from the dataset:

● The most bought items were dresses and
pants.
● Spring was the season in which most people
shopped, followed by summer.
● Clothing was the most bought category.
● Most customers paid by debit card.
● Green was the most bought colour in clothing.
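
Observations like these come from counting how often each value appears in a categorical column. A minimal sketch with the standard library is given below; the helper name top_values and the season list are made up for illustration.

```python
from collections import Counter


def top_values(values, k=2):
    """Return the k most frequent values with their counts."""
    return Counter(values).most_common(k)


# Made-up season column for illustration only.
seasons = ["Spring", "Summer", "Spring", "Fall", "Summer", "Spring", "Winter"]
top_seasons = top_values(seasons)
print(top_seasons)
```

The same helper can be applied to any categorical column, such as item, category, payment method or colour.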
