CIA 01-Data Visualization 2228328
CIA-1
By
LIKHITHA S
2228328
MBA PROGRAMME
SCHOOL OF BUSINESS AND MANAGEMENT
CHRIST (DEEMED TO BE UNIVERSITY), BANGALORE
Table of Contents
1 Introduction
2 About the data
3 Levels of Measurement for Dataset
4 Data Size
5 Changes in the Dataset pre-analysis
6 Analysis
7 Deep Dive
8 Variable Relationships
9 Top Brands Overall
10 Conclusion
11 Reference
Introduction:
Data are the facts and figures collected, analyzed, and summarized for presentation and interpretation.
All the data collected in a particular study are referred to as the data set for that study.
In this report, I have used a dataset of about 30,000 women’s fashion products from Kaggle. I have
identified the variables, classified each by its level of measurement, and analyzed, visualized, and
interpreted the dataset to draw relevant business insights, which are presented here in a systematic
format.
I have used histograms, pie charts, bar graphs, scatterplots, and tables to support my analysis
and insights. The exercise also underscores the importance of data analysis in today's data-driven world.
Nominal: - A nominal scale deals with non-numeric variables, or with numbers used only as labels
that carry no quantitative value. On a nominal scale the data can be classified, but cannot be
ordered, added, subtracted, multiplied, or divided.
Ordinal: - An ordinal scale indicates the order and ranking of data without specifying the degree
of difference between ranks. Like nominal data, ordinal data is categorical: grouping, naming, and
ordering are possible.
Interval: - The interval scale covers values measured in equal intervals, for example, calendar
time or temperature in degrees Celsius. It shows the order of values with a meaningful difference
between them, but it has no true zero, so ratios are not meaningful.
Ratio: - The ratio scale is the most comprehensive of the four. It has all the properties of the
three scales above and, in addition, a true origin or zero point, so ratios between values are
meaningful.
The 'Discount' variable is stored as text and is therefore categorical. To treat it as a continuous
variable, we keep only the numeric part of each entry.
Ex: - '50% off' becomes 50, '20% off' becomes 20, etc.
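The conversion described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name parse_discount is hypothetical, and the report does not say which tool or pattern was actually used to extract the number.

```python
import re
from typing import Optional


def parse_discount(text: Optional[str]) -> Optional[float]:
    """Return the percentage found in a label such as '50% off', or None."""
    if not text:
        return None
    # Assumption: the discount is written as a number followed by '%'.
    match = re.search(r"(\d+(?:\.\d+)?)\s*%", text)
    return float(match.group(1)) if match else None


# Illustrative labels only; the last two show entries with no usable number.
labels = ["50% off", "20% off", None, "no offer"]
discounts = [parse_discount(label) for label in labels]
print(discounts)
```

Entries without a recognizable percentage come back as None, so they can be counted or dropped in a later cleaning step.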
Data Size
Total records: 30758
NaN (blank) records: 1183
Records with no MRP or Discount: 7025
Records remaining after removal: 22550
MRP converted to a number (level of measurement changed to continuous ratio)
Discount percentages converted to a number (level of measurement changed to continuous ratio, as a
percentage has a true zero)
Removed blank records: 1183 records
Removed records with missing MRP, as they are not useful for analysis: 7025 records
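A minimal sketch of these cleaning steps, using a few made-up records, is given here. The field names (brand, mrp, discount) and the toy rows are assumptions for illustration and are not taken from the Kaggle file. The final assertion only re-checks the arithmetic of the counts reported in this report.

```python
# Toy records standing in for the real dataset; field names are assumptions.
records = [
    {"brand": "A", "mrp": "1999", "discount": "50% off"},
    {"brand": None, "mrp": None, "discount": None},   # blank record
    {"brand": "B", "mrp": None, "discount": None},    # missing MRP
    {"brand": "C", "mrp": "799", "discount": "20% off"},
]


def is_blank(row: dict) -> bool:
    """A record is blank when every field is missing."""
    return all(value is None for value in row.values())


# Step 1: drop blank records.
non_blank = [row for row in records if not is_blank(row)]

# Step 2: drop records with no MRP and convert the rest to numbers.
usable = [
    {**row, "mrp": float(row["mrp"])}
    for row in non_blank
    if row["mrp"] is not None
]

removed_blank = len(records) - len(non_blank)
removed_missing = len(non_blank) - len(usable)
print(len(records), removed_blank, removed_missing, len(usable))

# The counts reported in this report follow the same bookkeeping.
assert 30758 - 1183 - 7025 == 22550
```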
Analysis
Additional calculated fields:
There are multiple cuts and slices that we can use on this dataset. We are focusing on Categories and
Brands here. Total sales in the data amount to 3.32 crores across 22.5k transactions and more than
170 brands.
Deep Dive
Brands can be grouped into seven categories. Indian wear and Western wear are the major
contributors, accounting for 74% of total sales, whereas Watches and Fragrance have the highest Sell
Price per transaction.
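The category figures above (share of total sales and Sell Price per transaction) can be computed along the following lines. This is a sketch under assumptions: the report does not define how Sell Price was calculated, so the code assumes it is the MRP net of the percentage discount, and the rows are invented for illustration.

```python
from collections import defaultdict

# Hypothetical cleaned rows: (category, mrp, discount_percent).
rows = [
    ("Indian wear", 2000.0, 50.0),
    ("Western wear", 1500.0, 40.0),
    ("Watches", 5000.0, 10.0),
    ("Indian wear", 1000.0, 20.0),
]


def sell_price(mrp: float, discount_percent: float) -> float:
    # Assumption: Sell Price is the MRP net of the percentage discount.
    return mrp * (1 - discount_percent / 100)


totals = defaultdict(float)   # total sales value per category
counts = defaultdict(int)     # number of transactions per category
for category, mrp, discount in rows:
    totals[category] += sell_price(mrp, discount)
    counts[category] += 1

grand_total = sum(totals.values())
summary = {
    category: {
        "share_of_sales": totals[category] / grand_total,
        "avg_sell_price": totals[category] / counts[category],
    }
    for category in totals
}

for category, stats in summary.items():
    print(category, round(stats["share_of_sales"], 3), round(stats["avg_sell_price"], 2))
```

A category can have a small share of sales and still show the highest average Sell Price per transaction, which is the pattern reported for Watches and Fragrance.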
By Sell Price, 69% of the transactions fall below 1500.
Top brands in each category can be seen in the following chart. ‘Casio’ has the highest average Sell
Price per transaction, whereas ‘Vastranand’ contributes the highest sales value.
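The two measures used here, the share of transactions below a price cut-off and the ranking of brands by total sales versus average Sell Price, can be sketched as follows. The brand names and prices are placeholders, not figures from the dataset.

```python
from collections import defaultdict

# Hypothetical transactions: (brand, sell_price). Values are illustrative.
transactions = [
    ("Brand A", 1400.0),
    ("Brand A", 1300.0),
    ("Brand A", 1200.0),
    ("Brand A", 900.0),
    ("Brand B", 4500.0),
    ("Brand C", 700.0),
]

THRESHOLD = 1500  # price cut-off used in the text

# Share of transactions priced below the cut-off.
below = sum(1 for _, price in transactions if price < THRESHOLD)
share_below = below / len(transactions)

# Group prices by brand, then rank brands on two different measures.
brand_prices = defaultdict(list)
for brand, price in transactions:
    brand_prices[brand].append(price)

total_sales = {brand: sum(prices) for brand, prices in brand_prices.items()}
avg_price = {brand: sum(prices) / len(prices) for brand, prices in brand_prices.items()}

top_by_sales = max(total_sales, key=total_sales.get)
top_by_avg_price = max(avg_price, key=avg_price.get)
print(round(share_below, 3), top_by_sales, top_by_avg_price)
```

In this toy data the brand with the most sales is not the brand with the highest average price, which mirrors the distinction drawn between the two leading brands in the text.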
In the next section, I have used the original variables to check whether there is any relationship
between them.
Variable Relationships:
Left: Original data – to check if there is any trend seen (Correlation = 0.955)
Right: After removing the outlier data point – to check if the trend still holds (Correlation = 0.950)
For Western wear, the number of transactions and the discount are strongly positively correlated: the
number of transactions increases as the discount increases. The Western wear category thus shows high
dependence on the discounts given.
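The comparison of the correlation before and after removing an outlier can be sketched with a plain Pearson coefficient. The data points are invented, and the rule used to pick the outlier (the point with the largest transaction count) is an assumption, since the report does not state how the outlier was identified.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Hypothetical (discount %, number of transactions) pairs.
points = [(10, 40), (20, 65), (30, 110), (40, 150), (50, 190), (70, 900)]

# Assumption: the outlier is the point with the largest transaction count.
outlier = max(points, key=lambda p: p[1])
trimmed = [p for p in points if p != outlier]

r_all = pearson([x for x, _ in points], [y for _, y in points])
r_trimmed = pearson([x for x, _ in trimmed], [y for _, y in trimmed])
print(round(r_all, 3), round(r_trimmed, 3))
```

If the coefficient stays high after the outlier is dropped, the trend is not driven by that single point, which is the check the left and right panels are meant to support.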
Left: Original data – to check if there is any trend seen (Correlation = 0.931)
Right: After removing the outlier data point – to check if the trend still holds (Correlation = 0.892)
The Indian wear and Lingerie & Nightwear categories also show dependence on the discounts given: the
number of transactions is seen to increase with increasing discount value.
The other categories (Watches, Jewelry, and Fragrances) show very little dependence on the discount
value.
Conclusion
1. 31% of total sales and 34% of total transactions are contributed by the top 10 brands, while the
top 2 categories (Indian and Western wear) contribute 74% of total sales.
2. Garments (Indian wear, Western wear, Lingerie & Nightwear) are seen to depend more on
discounts to increase the number of transactions.
3. Vastranand dominates the Indian wear category and also has the highest sales overall.
4. 69% of the transactions have a Sell Price of less than 1500.
5. Watches and Fragrance have the highest Sell Price per transaction.
Data visualization offers valuable insight into key business aspects. It can enable businesses to
uncover their strengths and weaknesses and, in turn, help them grow stronger in today's data-driven
landscape.