Analyzing customer behavior and revenue patterns to uncover business insights and growth opportunities in a Brazilian e-commerce marketplace.
This project analyzes customer behavior and revenue patterns for a Brazilian e-commerce marketplace using historical order data. The goal is to identify key customer segments, understand purchase frequency and recency, and evaluate the drivers of revenue and regional growth. Insights from this analysis highlight opportunities for improving customer retention and long-term business value.
Image Source: Google Images
Date Source: Kaggle
- Project Overview
- Dataset
- Business Questions Covered
-
3.1 Revenue Leakage:
- What is the companyβs current baseline revenue?
- What is the current level of revenue leakage?
- What is the current distribution of order statuses across all orders?
- Are there specific product categories with a high frequency of order cancellations?
- Is there a relationship between a high number of cancellations and increased revenue loss?
-
3.2 Customer Segmentation(RFM Analysis):
- What is the current customer recency profile?
- What are the purchasing frequency patterns among customers?
- Is revenue primarily driven by order volume or by high-value customers?
-
3.3 Regional Performance With Growth Trends:
- How is revenue distributed across states?
- How is order volume distributed across states?
- Are high-revenue states driven by higher order volume or higher average order value (AOV)?
- What is the trend between revenue and order volume over time for top-performing states?
- Tech Stack
- Project Structure
This project analyzes the revenue leakage, customer segmentation (RFM) and regional performance with growth trends for a Brazilian e-commerce marketplace using historical order data. The goal is to identify the source of revenue leakage, key customer segments, understand purchase frequency and recency, and evaluate the drivers of revenue and regional growth. Insights from this analysis highlight opportunities for improving customer retention and long-term business value.
The analysis uses the Olist e-commerce dataset, which includes historical information about customers, orders, payments, products, geo-location, and sellers across Brazil. Only delivered orders were considered for revenue-related analyses to ensure accurate evaluation of realized revenue.The image below represents the schema for the database:

Image Source: Google Images
Charts are built using Plotly and GitHub doesn't render interactive charts Each section includes a link, click the link stated as Open Notebookπ to view the full notebook with an interactive charts in a new tab with an option below:(Ctrl+Click / Cmd+Click)
- Total market opportunity: R$15.84M.
- Total revenue loss from order cancellation: R$0.11M.
- Total revenue realized: R$15.42M
- The current revenue leakage is below 1% (0.69%) suggesting an effective order fulfillment and a low order cancellation impact.
- A total count of 8 order status. They are:
delivered: 96,478shipped: 1,107canceled: 625unavailable: 609invoiced: 314processing: 301created: 5approved: 2
There are two layers for the canceled orders, with-items and without-items.
- Canceled orders with-items: 461
- Canceled orders without-items: 164
Out of 625 canceled orders, 461 had items (hard revenue loss), and 164 were canceled before item-level data was recorded (soft revenue loss).
- These are the top 10 product categories in descending order with the total amount lost and their revenue leakage in % whose order gets canceled too often:
- All the
product categoriesfollows a linear relationship where higher cancellations lead to higher revenue loss with just one exception, i.e.cool_stuffcategory, it has the highest revenue loss R$ 15,153.48 despite low cancellations (16),sports_leisurefaces the highest number of order cancellation of 51 and has a revenue loss of R$ 9,411.85.
Open Notebookπ
-
On average, customers take approximately 248 days to place their
second purchase, indicating along repurchase cycle. -
Out of 96,478 customers, only 18,741 return within 1β100 days, while the majority (45,240 customers) make a
second purchase between100β300 days. -
A significant portion of customers show very long gaps between purchases, with 25,406 customers returning after 300β500 days, and 7,091 customers taking more than 500 days.
-
Overall, repeat purchasing behavior is
heavily concentratedin the 100β300 day window, suggestinglow short-term retention and a slow repeat-purchase cycle.
- The analysis is done by segmenting the customers into
5 different segements:
One-Time Buyer:1 orderReturning Customer:2 ordersLoyal Customer:3 ordersVery Loyal Customer:4 ordersVIP Customers: 5 or more orders
- Where, the
One-Time Buyerdomniates the customer base accounting for 96.947%, a total of 90,557 customers. Returning(2.756% or 2,573 customers) andLoyal(0.194% or 181 customers) form a much smaller but strategically important segment.Very Loyal(0.030% or 28 customers) andVIP(0.020% or 19) customers represent a very small fraction but are likely to contribute disproportionately higher lifetime value.
- Order volume is the KEY revenue driver. The companys revenue
heavily relies on the One-Time Buyer, with a total of 90,557 customers,a total revenueof R$ 14.55 M, withan average revenueof R$ 160.73 per customer, and arevenue contributionof 97.00 %. - The
value of the customer increases as the customer counts for their second purchase starts decreasing, although, revenue contribution (%) for the rest of the segments exceptOne-Time Buyeris < 2.7%, value wise, the company earns on an average of R$ 291.02 to R$ 787.14 per customer in other remaining4 segmentswhile the average revenue per customer forOne-Time Buyeris just R$ 160.73. - A table below summarizes the Volume Vs Value for the company:
Open Notebookπ
-
Out of 27 states,
SΓ£o Paulois the highest revenue-generating state, contributing R$ 5,773,869.02 (37.44% of total revenue). -
Roraimais the lowest revenue-generating state, contributing R$ 9,039.52 (0.06% of total revenue).
-
SΓ£o Paulohas the highest volume (40,519 orders or 42.0%) for orders and is also the highest revenue generating state. -
Roraimahas the lowest volume (41 orders or 0.04%) for orders and is also the lowest revenue generating state.
The Volume of Order and Revenue Generation has a linear relationship for majority of the states.
ParaΓbastate has the highestAOVof 1.68x but is the lowest revenue generating state, contributing 0.89% in thetotal revenue, with the lowest number of orders contribution 0.53% in thetotal order-volume.SΓ£o Paulostate has the lowestAOVof 0.89x but is thehighest revenue generating state, with a contribution of 37.44%, inthe total revenue, with the highest number of orders contribution 42.00%, inthe total order-volume.AOV and Order volumeshowsan inverse relationship, several states exhibit high AOV but low order volume,indicating premium but niche markets.
- Revenue and order volume exhibit nearly identical growth patterns, indicating that
revenue growth is primarily driven by increasing order volume rather than price effects. SΓ£o Pauloconsistently leads across time withthe highest order volumeof 46,448.00 andshows the steepest growth trajectory, significantly outperforming other states.SΓ£o Paulois also thehighest revenue-generating stateof $R5,769,703.15, reinforcing its role as the primary growth engine for the business.
-
Programming and analysis:
- Python (Pandas, NumPy)
- SQL (JOINS, AGGREGATION, WINDOW FUNCTIONS)
-
Data visualization:
- Plotly (interactive charts)
- Matplotlib
-
Environment and tools:
- Anaconda (virtual environment)
- Jupyter Notebook
- GitHub (version control and project hosting)
βββ data/
β βββ raw/ # Raw CSV files used to create the e-commerce database
βββ docs/ # Exported HTML notebooks for GitHub viewing
βββ notebooks/ # Jupyter notebooks with analysis, interactive visuals, and insights
βββ plots/ # Reusable Python functions for Plotly visualizations
βββ plot_html/ # Interactive Plotly charts saved as HTML
βββ sql/ # SQL queries used for metric calculations
β βββ 02_Revenue_Leakage_Analysis/
β βββ 03_Customer_Segmentation_(RFM_Analysis)/
β βββ 04_Regional_Performance_with_Growth_Trends/
βββ src/ # Python modules executing SQL and computing metrics
β βββ revenue_leakage/
β βββ rfm_analysis/
β βββ regional_performance/
βββ utils/ # Helper and utility functions


