8000 GitHub - Nishan-k/E-commerce-SQL-Analytics: E-commerce customer and revenue analysis (Revenue Leakage, RFM, and Growth Trend Over Time) using SQL, Python, Plotly, and Jupyter Notebook with clear separation of data querying, metric calculations, analysis, visualizations, and insights. Β· GitHub
[go: up one dir, main page]

Skip to content

Nishan-k/E-commerce-SQL-Analytics

Folders and files

NameName
Last commit message
Last commit date

Latest commit

Β 

History

90 Commits
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 
Β 

Repository files navigation

E-commerce Customer & Revenue Analysis Using SQL and Python

Analyzing customer behavior and revenue patterns to uncover business insights and growth opportunities in a Brazilian e-commerce marketplace.

e-commerce image

This project analyzes customer behavior and revenue patterns for a Brazilian e-commerce marketplace using historical order data. The goal is to identify key customer segments, understand purchase frequency and recency, and evaluate the drivers of revenue and regional growth. Insights from this analysis highlight opportunities for improving customer retention and long-term business value.

Image Source: Google Images
Date Source: Kaggle

Table Of Contents

  1. Project Overview
  2. Dataset
  3. Business Questions Covered
    • 3.1 Revenue Leakage:
      1. What is the company’s current baseline revenue?
      2. What is the current level of revenue leakage?
      3. What is the current distribution of order statuses across all orders?
      4. Are there specific product categories with a high frequency of order cancellations?
      5. Is there a relationship between a high number of cancellations and increased revenue loss?
    • 3.2 Customer Segmentation(RFM Analysis):
      1. What is the current customer recency profile?
      2. What are the purchasing frequency patterns among customers?
      3. Is revenue primarily driven by order volume or by high-value customers?
    • 3.3 Regional Performance With Growth Trends:
      1. How is revenue distributed across states?
      2. How is order volume distributed across states?
      3. Are high-revenue states driven by higher order volume or higher average order value (AOV)?
      4. What is the trend between revenue and order volume over time for top-performing states?
  4. Tech Stack
  5. Project Structure

1. Project Overview

This project analyzes the revenue leakage, customer segmentation (RFM) and regional performance with growth trends for a Brazilian e-commerce marketplace using historical order data. The goal is to identify the source of revenue leakage, key customer segments, understand purchase frequency and recency, and evaluate the drivers of revenue and regional growth. Insights from this analysis highlight opportunities for improving customer retention and long-term business value.

2. Dataset

The analysis uses the Olist e-commerce dataset, which includes historical information about customers, orders, payments, products, geo-location, and sellers across Brazil. Only delivered orders were considered for revenue-related analyses to ensure accurate evaluation of realized revenue.

The image below represents the schema for the database: database relation diagram

Image Source: Google Images

3. Business Questions Covered

NOTE:

Charts are built using Plotly and GitHub doesn't render interactive charts Each section includes a link, click the link stated as Open NotebookπŸ”— to view the full notebook with an interactive charts in a new tab with an option below:

(Ctrl+Click / Cmd+Click)

3.1 Revenue Leakage:

Open NotebookπŸ”—

a. What is the company’s current baseline revenue?

  • Total market opportunity: R$15.84M.
  • Total revenue loss from order cancellation: R$0.11M.
  • Total revenue realized: R$15.42M

b. What is the current level of revenue leakage?

- The current revenue leakage is below 1% (0.69%) suggesting an effective order fulfillment and a low order cancellation impact.

c. What is the current distribution of order statuses across all orders?

- A total count of 8 order status. They are:
  • delivered: 96,478
  • shipped: 1,107
  • canceled: 625
  • unavailable: 609
  • invoiced: 314
  • processing: 301
  • created: 5
  • approved: 2

There are two layers for the canceled orders, with-items and without-items.

  1. Canceled orders with-items: 461
  2. Canceled orders without-items: 164

Out of 625 canceled orders, 461 had items (hard revenue loss), and 164 were canceled before item-level data was recorded (soft revenue loss).


d. Are there specific product categories with a high frequency of order cancellations?

- These are the top 10 product categories in descending order with the total amount lost and their revenue leakage in % whose order gets canceled too often:

Top 10 canceled products


e. Is there a relationship between a high number of cancellations and increased revenue loss?

  • All the product categories follows a linear relationship where higher cancellations lead to higher revenue loss with just one exception, i.e. cool_stuff category, it has the highest revenue loss R$ 15,153.48 despite low cancellations (16), sports_leisure faces the highest number of order cancellation of 51 and has a revenue loss of R$ 9,411.85.

3.2 Customer Segmentation (RFM Analysis):

Open NotebookπŸ”—

a. What is the current customer recency profile?

  • On average, customers take approximately 248 days to place their second purchase, indicating a long repurchase cycle.

  • Out of 96,478 customers, only 18,741 return within 1–100 days, while the majority (45,240 customers) make a second purchase between 100–300 days.

  • A significant portion of customers show very long gaps between purchases, with 25,406 customers returning after 300–500 days, and 7,091 customers taking more than 500 days.

  • Overall, repeat purchasing behavior is heavily concentrated in the 100–300 day window, suggesting low short-term retention and a slow repeat-purchase cycle.


b. What are the purchasing frequency patterns among customers?

  • The analysis is done by segmenting the customers into 5 different segements:
  1. One-Time Buyer: 1 order
  2. Returning Customer: 2 orders
  3. Loyal Customer: 3 orders
  4. Very Loyal Customer: 4 orders
  5. VIP Customers: 5 or more orders
  • Where, the One-Time Buyer domniates the customer base accounting for 96.947%, a total of 90,557 customers.
  • Returning (2.756% or 2,573 customers) and Loyal (0.194% or 181 customers) form a much smaller but strategically important segment.
  • Very Loyal (0.030% or 28 customers) and VIP (0.020% or 19) customers represent a very small fraction but are likely to contribute disproportionately higher lifetime value.

c. Is revenue primarily driven by order volume or by high-value customers?

  • Order volume is the KEY revenue driver. The companys revenue heavily relies on the One-Time Buyer, with a total of 90,557 customers, a total revenue of R$ 14.55 M, with an average revenue of R$ 160.73 per customer, and a revenue contribution of 97.00 %.
  • The value of the customer increases as the customer counts for their second purchase starts decreasing, although, revenue contribution (%) for the rest of the segments except One-Time Buyer is < 2.7%, value wise, the company earns on an average of R$ 291.02 to R$ 787.14 per customer in other remaining 4 segments while the average revenue per customer for One-Time Buyer is just R$ 160.73.
  • A table below summarizes the Volume Vs Value for the company:

RFM


3.3 Regional Performance With Growth Trends:

Open NotebookπŸ”—

a. How is revenue distributed across states?

  • Out of 27 states, SΓ£o Paulo is the highest revenue-generating state, contributing R$ 5,773,869.02 (37.44% of total revenue).

  • Roraima is the lowest revenue-generating state, contributing R$ 9,039.52 (0.06% of total revenue).


b. How is order volume distributed across states?

  • SΓ£o Paulo has the highest volume (40,519 orders or 42.0%) for orders and is also the highest revenue generating state.

  • Roraima has the lowest volume (41 orders or 0.04%) for orders and is also the lowest revenue generating state.

The Volume of Order and Revenue Generation has a linear relationship for majority of the states.


c. Are high-revenue states driven by higher order volume or higher average order value (AOV)?

  • ParaΓ­ba state has the highest AOV of 1.68x but is the lowest revenue generating state, contributing 0.89% in the total revenue, with the lowest number of orders contribution 0.53% in the total order-volume.
  • SΓ£o Paulo state has the lowest AOV of 0.89x but is the highest revenue generating state, with a contribution of 37.44%, in the total revenue, with the highest number of orders contribution 42.00%, in the total order-volume.
  • AOV and Order volume shows an inverse relationship, several states exhibit high AOV but low order volume, indicating premium but niche markets.

d. What is the trend between revenue and order volume over time for the top-performing states?

  • Revenue and order volume exhibit nearly identical growth patterns, indicating that revenue growth is primarily driven by increasing order volume rather than price effects.
  • SΓ£o Paulo consistently leads across time with the highest order volume of 46,448.00 and shows the steepest growth trajectory, significantly outperforming other states.
  • SΓ£o Paulo is also the highest revenue-generating state of $R5,769,703.15, reinforcing its role as the primary growth engine for the business.

4. Tech Stack

  • Programming and analysis:

    • Python (Pandas, NumPy)
    • SQL (JOINS, AGGREGATION, WINDOW FUNCTIONS)
  • Data visualization:

    • Plotly (interactive charts)
    • Matplotlib
  • Environment and tools:

    • Anaconda (virtual environment)
    • Jupyter Notebook
    • GitHub (version control and project hosting)

5. Project Structure


β”œβ”€β”€ data/  
β”‚   └── raw/                  # Raw CSV files used to create the e-commerce database  
β”œβ”€β”€ docs/                     # Exported HTML notebooks for GitHub viewing  
β”œβ”€β”€ notebooks/                # Jupyter notebooks with analysis, interactive visuals, and insights  
β”œβ”€β”€ plots/                    # Reusable Python functions for Plotly visualizations  
β”œβ”€β”€ plot_html/                # Interactive Plotly charts saved as HTML  
β”œβ”€β”€ sql/                      # SQL queries used for metric calculations  
β”‚   β”œβ”€β”€ 02_Revenue_Leakage_Analysis/  
β”‚   β”œβ”€β”€ 03_Customer_Segmentation_(RFM_Analysis)/  
β”‚   └── 04_Regional_Performance_with_Growth_Trends/  
β”œβ”€β”€ src/                      # Python modules executing SQL and computing metrics  
β”‚   β”œβ”€β”€ revenue_leakage/  
β”‚   β”œβ”€β”€ rfm_analysis/  
β”‚   └── regional_performance/  
└── utils/                    # Helper and utility functions

About

E-commerce customer and revenue analysis (Revenue Leakage, RFM, and Growth Trend Over Time) using SQL, Python, Plotly, and Jupyter Notebook with clear separation of data querying, metric calculations, analysis, visualizations, and insights.

Topics

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

0