Data Understanding
1. Exploring the Dataset
The dataset used for this project is titled "E-commerce behavior data from a multi-
category store", sourced from Kaggle. It records user activity throughout November
2019 at the product-interaction level. Each row represents a specific user event such
as viewing, adding to cart, or purchasing an item.
Key features of the dataset include:
Feature Description
event_time The timestamp of the user interaction (originally in UNIX format)
user_id Unique identifier for the user
product_id Unique identifier for the product
category_code Product category in a hierarchical format (e.g., electronics.smartphone)
brand Product brand name
price Price of the product at the time of interaction
event_type Behavior type: view, cart, or purchase
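As a minimal sketch of how rows with these features could be parsed, the snippet below reads a few made-up records and converts event_time from UNIX seconds to a UTC datetime. The sample values, the column order, and the helper name parse_events are assumptions for illustration only; the real file may differ and is far too large for this in-memory approach.

```python
import csv
import io
from datetime import datetime, timezone

# Hypothetical sample rows; all values are invented for illustration.
SAMPLE_CSV = """event_time,event_type,product_id,category_code,brand,price,user_id
1572566400,view,1001,electronics.smartphone,apple,999.00,501
1572570000,cart,1001,electronics.smartphone,apple,999.00,501
1573430400,purchase,1002,computers.notebook,hp,650.50,502
"""


def parse_events(text):
    """Parse CSV text into a list of dicts with typed fields."""
    events = []
    for row in csv.DictReader(io.StringIO(text)):
        events.append({
            # Assumes event_time holds UNIX epoch seconds.
            "event_time": datetime.fromtimestamp(int(row["event_time"]), tz=timezone.utc),
            "event_type": row["event_type"],
            "product_id": row["product_id"],
            # Empty strings are treated as missing values.
            "category_code": row["category_code"] or None,
            "brand": row["brand"] or None,
            "price": float(row["price"]),
            "user_id": row["user_id"],
        })
    return events


events = parse_events(SAMPLE_CSV)
for event in events:
    print(event["event_time"].date(), event["event_type"], event["price"])
```

Keeping the timestamp timezone-aware avoids day boundaries shifting silently when events are later grouped by date.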
2. Key Data Points
User ID & Product ID: Central for tracking unique behaviors and identifying
trends in individual engagement or item popularity.
Event Type: Vital for building behavior funnels and understanding customer
journeys.
Timestamp: Enables time-series analysis, such as detecting seasonal spikes or
evaluating campaign effects.
3. Observed Patterns and Trends
📈 Daily Visitor Activity
Figure 1 – Daily number of interactions
User activity spiked sharply around November 11 (Singles’ Day), reaching nearly
500,000 interactions, more than double the regular daily volume. This suggests
strong seasonal influence, possibly amplified by marketing campaigns aligned with
major shopping events.
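One way to make "more than double the regular volume" checkable is to count events per day and flag days that exceed twice the median daily count. The sketch below does this on toy dates; the function names, the use of the median as the baseline, and the factor of 2.0 are illustrative assumptions, not the method used to produce Figure 1.

```python
from collections import Counter
from datetime import date
from statistics import median


def daily_counts(event_dates):
    """Count interactions per calendar day."""
    return Counter(event_dates)


def spike_days(counts, factor=2.0):
    """Return days whose count exceeds factor times the median daily count."""
    baseline = median(counts.values())
    return sorted(day for day, n in counts.items() if n > factor * baseline)


# Toy data: one date per event, invented for illustration.
event_dates = (
    [date(2019, 11, 10)] * 3
    + [date(2019, 11, 11)] * 8
    + [date(2019, 11, 12)] * 3
    + [date(2019, 11, 13)] * 4
)

counts = daily_counts(event_dates)
spikes = spike_days(counts)
print(spikes)
```

The median is used rather than the mean so that the spike itself does not inflate the baseline it is compared against.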
💰 Daily Average Price Trends
Figure 2 – Average daily product price
Average prices remained relatively stable with minor fluctuations, apart from a
noticeable peak around November 14–15, possibly tied to high-demand events or
promotions. After the peak, prices declined gradually through the rest of the
month.
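A daily average price of this kind can be computed by grouping interaction prices by day and taking the mean, as in the sketch below. The rows and the helper name daily_average_price are hypothetical. Note that such an average is weighted by interactions, so it can move when users shift toward more or less expensive categories even if no individual product price changes.

```python
from collections import defaultdict
from statistics import mean


def daily_average_price(rows):
    """Average the price of all interactions on each day.

    rows is an iterable of (day, price) pairs, one per interaction.
    """
    by_day = defaultdict(list)
    for day, price in rows:
        by_day[day].append(price)
    return {day: round(mean(prices), 2) for day, prices in sorted(by_day.items())}


# Toy rows invented for illustration.
rows = [
    ("2019-11-13", 100.0),
    ("2019-11-13", 300.0),
    ("2019-11-14", 500.0),
    ("2019-11-14", 700.0),
    ("2019-11-15", 200.0),
]

averages = daily_average_price(rows)
print(averages)
```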
📦 Product Categories by Interaction
Figure 3 – Distribution of interactions by product category
The most popular category was electronics.smartphone, which dominates the
dataset. Other categories with frequent interactions include
appliances.kitchen.refrigerators, computers.notebook, and apparel.shoes. This
points to a largely tech-oriented product focus on the platform.
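Because category_code is hierarchical, interaction shares can be computed either on the full code or on a truncated prefix such as the top-level department. The following sketch illustrates both on toy data; the function name category_shares and its level parameter are assumptions made for this example, and rows with a missing category are simply skipped.

```python
from collections import Counter


def category_shares(codes, level=None):
    """Return (category, share) pairs sorted by share, descending.

    level truncates the dotted hierarchy, e.g. level=1 keeps only
    "electronics" from "electronics.smartphone". Missing codes are skipped.
    """
    def truncate(code):
        return code if level is None else ".".join(code.split(".")[:level])

    counts = Counter(truncate(code) for code in codes if code)
    total = sum(counts.values())
    return [(category, n / total) for category, n in counts.most_common()]


# Toy category codes invented for illustration; None marks a missing value.
codes = (
    ["electronics.smartphone"] * 5
    + ["appliances.kitchen.refrigerators"] * 2
    + ["computers.notebook"] * 2
    + ["apparel.shoes"]
    + [None] * 2
)

full_shares = category_shares(codes)
top_level_shares = category_shares(codes, level=1)
print(full_shares[0])
print(top_level_shares[0])
```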
Top Brands by Engagement
Figure 4 – Distribution of interactions by brand
Brand interaction is led by Apple, Samsung, and Xiaomi. These top brands align with
the popularity of the smartphone category. Secondary brands include Huawei, LG,
Sony, and HP, which are also strong in electronics and computing.
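To check whether the leading brands really owe their position to the smartphone category, brand counts can be restricted to a single category and compared with the overall ranking. The sketch below assumes events are available as (category_code, brand) pairs; the data and the helper name top_brands are hypothetical.

```python
from collections import Counter


def top_brands(events, category=None, n=3):
    """Return the n most frequent brands, optionally within one category.

    events is an iterable of (category_code, brand) pairs.
    Rows with a missing brand are skipped.
    """
    counts = Counter(
        brand
        for cat, brand in events
        if brand and (category is None or cat == category)
    )
    return counts.most_common(n)


# Toy events invented for illustration.
events = (
    [("electronics.smartphone", "apple")] * 3
    + [("electronics.smartphone", "samsung")] * 2
    + [("electronics.smartphone", "xiaomi")]
    + [("computers.notebook", "hp")] * 2
    + [("computers.notebook", "apple")]
    + [("computers.notebook", None)]
)

overall = top_brands(events)
smartphones = top_brands(events, category="electronics.smartphone")
print(overall)
print(smartphones)
```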
🛒 Event Type Breakdown
Figure 5 – Event type distribution (view, cart, purchase)
94.15% of events were product views.
4.49% were items added to the cart.
Only 1.36% of interactions led to a purchase.
This highlights the natural conversion gap typical in e-commerce and reveals an
opportunity to focus predictive modeling on identifying which views and cart actions
will lead to purchases.
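The three shares above also give rough step-to-step ratios for the funnel. The sketch below divides each stage's share by the previous one; the function name step_ratios is an assumption. These are event-level ratios, not user-level conversion rates, since one user can generate many views or cart events.

```python
# Event-type shares in percent, taken from the breakdown above.
shares = {"view": 94.15, "cart": 4.49, "purchase": 1.36}


def step_ratios(shares, order=("view", "cart", "purchase")):
    """Ratio of each funnel stage's event share to the previous stage's."""
    return {
        f"{prev}->{curr}": shares[curr] / shares[prev]
        for prev, curr in zip(order, order[1:])
    }


ratios = step_ratios(shares)
for step, ratio in ratios.items():
    print(f"{step}: {ratio:.3f}")
```

On these figures there are roughly 4.8 cart events per 100 views and about 30 purchases per 100 cart events, so most of the drop-off happens between viewing and adding to cart.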
4. External Influences
Singles’ Day (11/11) and Black Friday (end of November) likely contributed to
the spikes in both interaction volume and pricing.
Promotional events and marketing campaigns are not explicitly recorded in
the data, but the observed trends are consistent with their influence.
Price dynamics also show potential signs of discounting or high-demand
inflation around mid-November.
✅ Summary
The dataset provides a rich behavioral snapshot of user interactions in a high-traffic
e-commerce environment. The clear patterns in user activity, product popularity, and
seasonal impact offer strong potential for predictive modeling, such as:
Forecasting purchase likelihood
Detecting high-converting product segments
Analyzing campaign or price sensitivity
By combining structured event logs with temporal and categorical breakdowns, this
dataset lays a solid foundation for further AI-driven business insights.
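As a starting point for the first modeling goal listed above, the event log could be aggregated into one record per user and product, with interaction counts as features and a binary purchase label. This is only a sketch under simplifying assumptions: the helper name build_labels and the toy events are hypothetical, and the counts ignore event order, so a real pipeline would need to restrict features to events that happened before the purchase to avoid leakage.

```python
from collections import defaultdict


def build_labels(events):
    """Aggregate events into per (user_id, product_id) counts and a purchase label.

    events is an iterable of (user_id, product_id, event_type) tuples.
    """
    pairs = defaultdict(lambda: {"views": 0, "carts": 0, "purchased": 0})
    for user_id, product_id, event_type in events:
        record = pairs[(user_id, product_id)]
        if event_type == "view":
            record["views"] += 1
        elif event_type == "cart":
            record["carts"] += 1
        elif event_type == "purchase":
            record["purchased"] = 1
    return dict(pairs)


# Toy events invented for illustration.
events = [
    ("u1", "p1", "view"),
    ("u1", "p1", "view"),
    ("u1", "p1", "cart"),
    ("u1", "p1", "purchase"),
    ("u2", "p1", "view"),
    ("u2", "p2", "view"),
    ("u2", "p2", "cart"),
]

labels = build_labels(events)
for pair, record in labels.items():
    print(pair, record)
```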