Case Study: Online Retail Sales Data Preprocessing
Dataset:
The dataset (online_retail.csv) has the following columns:
InvoiceNo: Invoice number.
StockCode: Product code.
Description: Product description.
Quantity: Number of units of the product sold.
InvoiceDate: Date and time of the invoice.
UnitPrice: Price per unit.
CustomerID: Customer ID.
Country: Country where the sale occurred.
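Below is a minimal sketch of loading a file with these columns using pandas. The sample rows are hypothetical. Reading the identifier columns (InvoiceNo, StockCode, CustomerID) as strings is also an assumption. These columns are labels rather than quantities, so treating them as text prevents accidental arithmetic and preserves any letters or leading zeros. `load_retail`, `ID_DTYPES` and `SAMPLE_CSV` are illustrative names.

```python
import io

import pandas as pd

# Hypothetical rows with the same columns as online_retail.csv.
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
10001,P100,Red mug,6,2010-12-01 08:26,2.50,17850,United Kingdom
10002,P200,Blue plate,2,2010-12-01 09:10,4.00,,France
"""

# Assumption: identifier columns are labels, so read them as strings.
ID_DTYPES = {"InvoiceNo": str, "StockCode": str, "CustomerID": str}


def load_retail(source) -> pd.DataFrame:
    """Read the retail CSV from a file path or a file-like object."""
    return pd.read_csv(source, dtype=ID_DTYPES)


# With the real file this would be load_retail("online_retail.csv").
df = load_retail(io.StringIO(SAMPLE_CSV))
print(df.dtypes)
```

Quantity and UnitPrice are still inferred as numeric. InvoiceDate stays as plain text until it is converted explicitly, and an empty CustomerID is read as a missing value (NaN).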
Aim:
Prepare the dataset for analysis by performing the following preprocessing tasks:
Data Cleaning:
Handle missing values.
Remove duplicate rows.
Check for and handle outliers.
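The three cleaning steps could be chained as shown below. This is a sketch built on stated assumptions, not a prescribed method. Rows without a CustomerID are dropped, which is reasonable if the analysis is per customer. Missing descriptions are filled with the placeholder "Unknown". Exact duplicate rows are then removed. Finally, outliers in Quantity and UnitPrice are detected with the interquartile-range (IQR) rule, also known as Tukey's fences, using the common multiplier k = 1.5. The helper names and the sample rows are hypothetical.

```python
import pandas as pd


def handle_missing(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows without a CustomerID and label missing descriptions."""
    out = df.dropna(subset=["CustomerID"])
    return out.assign(Description=out["Description"].fillna("Unknown"))


def remove_duplicates(df: pd.DataFrame) -> pd.DataFrame:
    """Drop rows that exactly repeat an earlier row, keeping the first."""
    return df.drop_duplicates(keep="first")


def remove_outliers_iqr(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Keep rows whose values fall within [Q1 - k*IQR, Q3 + k*IQR] in every column."""
    mask = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile([0.25, 0.75])
        iqr = q3 - q1
        mask &= df[col].between(q1 - k * iqr, q3 + k * iqr)
    return df[mask]


# Hypothetical rows: row 1 duplicates row 0, row 3 has no Description,
# row 4 has no CustomerID, and row 7 has an extreme Quantity.
raw = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2", "3", "4", "5", "6", "7"],
    "StockCode": ["A", "A", "B", "C", "D", "E", "F", "G"],
    "Description": ["Mug", "Mug", "Plate", None, "Bowl", "Cup", "Jug", "Tray"],
    "Quantity": [2, 2, 3, 4, 5, 3, 4, 500],
    "UnitPrice": [2.5, 2.5, 4.0, 1.0, 3.0, 2.0, 5.0, 3.5],
    "CustomerID": ["100", "100", "101", "102", None, "103", "104", "105"],
    "Country": ["UK", "UK", "France", "UK", "Germany", "UK", "France", "UK"],
})

clean = remove_outliers_iqr(
    remove_duplicates(handle_missing(raw)), ["Quantity", "UnitPrice"]
)
print(clean)
```

Removing rows is only one option. Clipping values to the fences with `Series.clip` keeps the rows while limiting their influence. Large but genuine wholesale orders can also fall outside the fences, so flagged rows are worth inspecting before they are discarded.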
Feature Engineering:
Create a new column for the total amount spent (TotalAmount).
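This example assumes that TotalAmount means the amount for a single invoice line. Under that assumption, it is the product of Quantity and UnitPrice, which pandas computes element-wise across the two columns. The per-invoice roll-up with `groupby` is an optional extra, and the sample rows are hypothetical.

```python
import pandas as pd

# Hypothetical invoice lines.
df = pd.DataFrame({
    "InvoiceNo": ["1", "1", "2"],
    "Quantity": [2, 3, 4],
    "UnitPrice": [2.5, 4.0, 1.25],
})

# Line-level amount: units sold times price per unit.
df["TotalAmount"] = df["Quantity"] * df["UnitPrice"]

# Optional: one total per invoice.
invoice_totals = df.groupby("InvoiceNo")["TotalAmount"].sum()
print(df)
print(invoice_totals)
```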
Data Transformation:
Convert the InvoiceDate column to datetime format.
Create new columns for the month and year of the invoice.
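The conversion can be sketched with `pd.to_datetime`. The format string "%Y-%m-%d %H:%M" is an assumption and should be adjusted to match how InvoiceDate is actually written in the file. Setting `errors="coerce"` turns unparseable entries into NaT instead of raising an error. The column names InvoiceMonth and InvoiceYear are hypothetical.

```python
import pandas as pd

# Hypothetical timestamps, including one invalid entry.
df = pd.DataFrame({"InvoiceDate": ["2010-12-01 08:26", "2011-01-15 14:05", "not a date"]})

# Assumed format; unparseable values become NaT.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], format="%Y-%m-%d %H:%M", errors="coerce")

# Calendar parts taken from the parsed timestamps.
df["InvoiceMonth"] = df["InvoiceDate"].dt.month
df["InvoiceYear"] = df["InvoiceDate"].dt.year
print(df)
```

Rows that became NaT get NaN for month and year. As a result, those two columns are stored as floats until the invalid rows are dropped or corrected.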
Categorical Variable Encoding:
Encode categorical variables, such as Country.
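The sketch below shows two common ways to encode Country with pandas, using hypothetical rows. One-hot encoding with `pd.get_dummies` adds one 0/1 column per country, which can mean many columns when there are many countries. Integer codes from the `category` dtype are more compact, but they impose an arbitrary order (alphabetical here) that some models may treat as meaningful. The column name CountryCode is hypothetical.

```python
import pandas as pd

# Hypothetical rows.
df = pd.DataFrame({
    "InvoiceNo": ["1", "2", "3", "4"],
    "Country": ["United Kingdom", "France", "United Kingdom", "Germany"],
})

# One-hot encoding: Country is replaced by one 0/1 column per country.
one_hot = pd.get_dummies(df, columns=["Country"], prefix="Country", dtype=int)

# Integer codes: categories are sorted, so France=0, Germany=1, United Kingdom=2.
df["CountryCode"] = df["Country"].astype("category").cat.codes
print(one_hot)
print(df)
```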