Step 1: Identify at least 10 major KPIs that would be useful for the business
Based on the dataset, I have identified the following 10 major KPIs that would be useful for the
business (a short Pandas sketch computing a few of them follows the list):
Sales Revenue: Total sales revenue generated by the supermarket chain
Customer Count: Number of unique customers who have made purchases
Average Order Value (AOV): Average amount spent by customers in a single transaction
Customer Retention Rate: Percentage of customers who have made repeat purchases
Product Category Sales: Sales revenue generated by each product category (e.g. dairy,
bakery, etc.)
Top-Selling Products: Products that have generated the highest sales revenue
Region-wise Sales: Sales revenue generated by each region (e.g. Chennai, Coimbatore, etc.)
State-wise Sales: Sales revenue generated by each state (e.g. Tamil Nadu, Karnataka, etc.)
Gross Margin: Difference between revenue and cost of goods sold
Inventory Turnover: Number of times inventory is sold and replaced within a given period
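Several of these KPIs can be computed directly with Pandas. The sketch below assumes columns named 'Sales', 'Customer Name', 'Order ID', and 'Region' exist in the file; rename them to match the actual header.
import pandas as pd

df = pd.read_csv('Supermart Grocery Sales - Retail Analytics Dataset.csv')

# Sales Revenue: total revenue across all rows
total_revenue = df['Sales'].sum()

# Customer Count: number of unique customers
customer_count = df['Customer Name'].nunique()

# Average Order Value: mean revenue per order
aov = df.groupby('Order ID')['Sales'].sum().mean()

# Region-wise Sales: revenue by region, highest first
region_sales = df.groupby('Region')['Sales'].sum().sort_values(ascending=False)

print(total_revenue, customer_count, round(aov, 2))
print(region_sales)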
Step 2: Load the dataset and perform Data Preprocessing, Outlier Detection, and Exploratory Data
Analysis
To perform data preprocessing, outlier detection, and exploratory data analysis, I will use Python
with the Pandas, NumPy, SciPy, and Matplotlib libraries.
import pandas as pd
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt

# Load the dataset
df = pd.read_csv('Supermart Grocery Sales - Retail Analytics Dataset.csv')

# Data Preprocessing
# Check for missing values
print(df.isnull().sum())

# Handle missing values: impute numeric columns with the column mean
# (df.mean() on a mixed-type frame raises an error in recent pandas,
# so restrict it to numeric columns)
df.fillna(df.mean(numeric_only=True), inplace=True)

# Outlier Detection
# Apply the Z-score method to the numeric columns only
numeric_cols = df.select_dtypes(include=np.number)
z_scores = np.abs(stats.zscore(numeric_cols))
print(z_scores)

# Exploratory Data Analysis
# Summary statistics
print(df.describe())

# Visualize sales revenue by product category
# (df.plot(kind='bar') on the raw frame draws one bar per row, which is
# unreadable; aggregate first -- column names assumed, adjust as needed)
df.groupby('Item Category')['Sales'].sum().plot(kind='bar')
plt.ylabel('Sales Revenue')
plt.show()
Output:
Summary statistics of the dataset
Bar chart showing the distribution of sales revenue by product category
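The Z-scores above are printed but not acted on. A common follow-up, sketched here under the assumption that rows with any |z| > 3 should be treated as outliers (3 is a conventional cutoff, not dictated by the dataset), is to drop the flagged rows:
# Keep only rows whose numeric values all fall within 3 standard deviations
mask = (z_scores < 3).all(axis=1)
df_clean = df[mask]
print(f'Removed {len(df) - len(df_clean)} outlier rows')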
Step 3: Use Association Rule Mining technique to identify the items frequently bought together
and their demands
To perform association rule mining, I will use the Apriori algorithm implemented in the Python
library mlxtend. Its apriori function expects a one-hot encoded DataFrame of transactions, so the
rows are first grouped into per-order baskets and then encoded.
from mlxtend.preprocessing import TransactionEncoder
from mlxtend.frequent_patterns import apriori, association_rules

# Convert the dataset to a transactional format: group items by order so
# each transaction is the list of items bought together
# (assumes an 'Order ID' column identifies a transaction; adjust if needed)
transactions = df.groupby('Order ID')['Item Name'].apply(list).tolist()

# One-hot encode the transactions, since apriori requires a boolean DataFrame
te = TransactionEncoder()
basket = pd.DataFrame(te.fit(transactions).transform(transactions),
                      columns=te.columns_)

# Perform association rule mining
frequent_itemsets = apriori(basket, min_support=0.01, use_colnames=True)
rules = association_rules(frequent_itemsets, metric='confidence', min_threshold=0.5)

# Print the top 10 rules
print(rules.head(10))
Output:
Top 10 association rules showing the items frequently bought together, with their support and confidence values
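Confidence alone can favor rules whose consequents are simply popular overall. Sorting by lift (a standard column in the mlxtend rules output) surfaces the pairings bought together more often than chance would predict:
# Rules with lift > 1 indicate items bought together more than by chance
top_rules = rules.sort_values('lift', ascending=False)
print(top_rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']].head(10))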
Step 4: Use Classification techniques to develop a model and predict the item categories and sub-
categories that would provide the highest sales and profit region-wise/state-wise
To perform classification, I will use the Scikit-learn library in Python.
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Prepare the dataset for classification: drop the targets, then one-hot
# encode the remaining categorical columns, since scikit-learn estimators
# require numeric features (identifier-like columns such as order IDs or
# dates would normally be dropped first as well)
X = pd.get_dummies(df.drop(['Item Category', 'Item Sub-Category'], axis=1))
y = df['Item Category']

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Train a random forest classifier
rfc = RandomForestClassifier(n_estimators=100, random_state=42)
rfc.fit(X_train, y_train)

# Make predictions on the testing set
y_pred = rfc.predict(X_test)

# Evaluate the model
print('Accuracy:', accuracy_score(y_test, y_pred))
print('Classification Report:')
print(classification_report(y_test, y_pred))
Output:
Accuracy and classification report of the random forest classifier
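The classifier predicts a category from its features, but the region-wise ranking this step asks for can also be read off directly with a groupby. A sketch, assuming 'Region', 'Sales', and 'Profit' columns exist in the dataset:
# Aggregate sales and profit per region and category, then rank within
# each region to find the categories that perform best there
region_perf = (df.groupby(['Region', 'Item Category'])[['Sales', 'Profit']]
                 .sum()
                 .sort_values(['Region', 'Sales'], ascending=[True, False]))
print(region_perf.groupby(level='Region').head(3))  # top 3 categories per region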
Step 5: Modify the dataset to incorporate the Non-Volatile feature of data warehouse
Non-volatility means that data, once loaded into the warehouse, is never overwritten or deleted;
changes are captured by appending new records. To approximate this, I will add a Version column
so every row carries an explicit version number and later corrections can be appended as new
versions rather than edited in place.
# Create a new column 'Version' to track changes
df['Version'] = 1
# Save the modified dataset to a new CSV file
df.to_csv('Supermart Grocery Sales - Retail Analytics Dataset_Modified.csv', index=False)
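To honor non-volatility when a record later changes, the correction is appended as a new row with an incremented version instead of overwriting the original. A minimal sketch, assuming rows are identified by an 'Order ID' column; apply_correction is a hypothetical helper written for illustration:
def apply_correction(df, order_id, updates):
    """Append a corrected copy of a row as a new version; never edit in place."""
    current = df[df['Order ID'] == order_id]
    latest = current.loc[current['Version'].idxmax()].copy()
    for col, value in updates.items():
        latest[col] = value
    latest['Version'] = current['Version'].max() + 1
    # Append the new version; the old rows remain untouched (non-volatile)
    return pd.concat([df, latest.to_frame().T], ignore_index=True)

# Example (hypothetical order ID): correct a sales figure without losing history
# df = apply_correction(df, order_id='OD1001', updates={'Sales': 450.0})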