

Case Study - Cleaning The Data Set Using Pandas

The document outlines a case study focused on preprocessing an online retail sales dataset. Key tasks include data cleaning (handling missing values, removing duplicates, and addressing outliers), feature engineering (creating a total amount spent column), and data transformation (converting dates and encoding categorical variables). The dataset contains essential columns such as InvoiceNo, StockCode, Quantity, and CustomerID.

Uploaded by

skmosid05

Case Study: Online Retail Sales Data Preprocessing

Dataset:

The dataset (online_retail.csv) has the following columns:

InvoiceNo: Invoice number.

StockCode: Product code.

Description: Product description.

Quantity: Quantity of products sold.

InvoiceDate: Date and time of the invoice.

UnitPrice: Price per unit.

CustomerID: Customer ID.

Country: Country where the sale occurred.
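
As a starting point, the columns listed above could be loaded and inspected roughly as follows. This is only a sketch: the two sample rows are invented so the example runs without the real file, and `load_retail` is a hypothetical helper name. Reading the identifier columns as text is an assumption made here so that codes are not silently converted to numbers.

```python
import io
import pandas as pd

# Hypothetical two-row sample standing in for online_retail.csv
# (values are made up for illustration only).
SAMPLE_CSV = """InvoiceNo,StockCode,Description,Quantity,InvoiceDate,UnitPrice,CustomerID,Country
A0001,S100,Sample mug,6,2020-01-15 08:26:00,2.50,1001,United Kingdom
A0002,S200,Sample lamp,2,2020-02-03 10:05:00,7.25,,France
"""

def load_retail(source):
    """Read the retail data, keeping identifier columns as text."""
    return pd.read_csv(
        source,
        dtype={"InvoiceNo": str, "StockCode": str, "CustomerID": str},
    )

# With the real file this would be load_retail("online_retail.csv").
df = load_retail(io.StringIO(SAMPLE_CSV))

print(df.dtypes)        # column types as pandas inferred them
print(df.isna().sum())  # number of missing values per column
```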

Aim:

Prepare the dataset for analysis by performing the following preprocessing tasks:

Data Cleaning:

Handle missing values.

Remove duplicate rows.

Check for and handle outliers.
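
The three cleaning tasks above might be sketched as below. The case study does not prescribe specific rules, so the choices here are assumptions: rows without a CustomerID are dropped, a missing Description is filled with "Unknown", and outliers in Quantity are filtered with the common 1.5 × IQR rule. `clean_retail` and the sample rows are hypothetical.

```python
import pandas as pd

def clean_retail(df: pd.DataFrame) -> pd.DataFrame:
    """Apply one possible set of cleaning rules (all are assumptions)."""
    # Missing values: drop rows with no customer, fill missing descriptions.
    out = df.dropna(subset=["CustomerID"]).copy()
    out["Description"] = out["Description"].fillna("Unknown")

    # Duplicates: keep the first copy of each fully identical row.
    out = out.drop_duplicates()

    # Outliers: keep Quantity values within 1.5 * IQR of the quartiles.
    q1 = out["Quantity"].quantile(0.25)
    q3 = out["Quantity"].quantile(0.75)
    iqr = q3 - q1
    in_range = out["Quantity"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
    return out[in_range].reset_index(drop=True)

# Made-up rows: one duplicate, one missing CustomerID,
# one missing Description, and one extreme Quantity.
raw = pd.DataFrame({
    "InvoiceNo": ["A1", "A1", "A2", "A3", "A4", "A5", "A6"],
    "Description": ["mug", "mug", None, "lamp", "pen", "cup", "bag"],
    "Quantity": [6, 6, 4, 5, 7, 5000, 3],
    "CustomerID": ["1001", "1001", "1002", None, "1003", "1004", "1005"],
})

clean = clean_retail(raw)
print(clean)
```

Whether dropping or imputing is the better way to handle missing values depends on the analysis; the same applies to removing outliers versus capping them.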

Feature Engineering:

Create a new column for the total amount spent (TotalAmount).
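
A minimal sketch of this step, assuming the total amount for a row is Quantity multiplied by UnitPrice (the case study names the column but does not spell out the formula). The sample values are invented.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Quantity": [6, 2],
    "UnitPrice": [2.50, 7.25],
})

# Assumed definition: amount spent on a line = units sold * price per unit.
df["TotalAmount"] = df["Quantity"] * df["UnitPrice"]

print(df)
```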

Data Transformation:

Convert the InvoiceDate column to datetime format.

Create new columns for the month and year of the invoice.
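
The date steps above could look like the following sketch. The column names `InvoiceMonth` and `InvoiceYear` are assumptions, as is the use of `errors="coerce"`, which turns unparseable values into `NaT` instead of raising an error. The sample values are invented.

```python
import pandas as pd

# Made-up values, including one that cannot be parsed as a date.
df = pd.DataFrame({
    "InvoiceDate": ["2020-01-15 08:26:00", "2020-12-03 10:05:00", "not a date"],
})

# Convert text to datetime; invalid entries become NaT.
df["InvoiceDate"] = pd.to_datetime(df["InvoiceDate"], errors="coerce")

# Derive month and year; rows with NaT get NaN in both columns.
df["InvoiceMonth"] = df["InvoiceDate"].dt.month
df["InvoiceYear"] = df["InvoiceDate"].dt.year

print(df)
```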

Categorical Variable Encoding:


Encode categorical variables, such as Country.
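
Two common ways to encode Country are sketched below: integer codes and one-hot columns. The case study does not say which encoding is expected, so both are shown as options; the `CountryCode` column name and the sample values are assumptions.

```python
import pandas as pd

# Made-up rows for illustration.
df = pd.DataFrame({
    "Country": ["United Kingdom", "France", "United Kingdom", "Germany"],
})

# Option 1: integer codes. Categories are ordered alphabetically here,
# so the numbers carry no meaning beyond identifying the country.
country_codes = df["Country"].astype("category").cat.codes

# Option 2: one-hot encoding, one 0/1 column per country.
one_hot = pd.get_dummies(df, columns=["Country"], dtype=int)

df["CountryCode"] = country_codes
print(df)
print(one_hot)
```

One-hot columns avoid implying an order between countries, at the cost of one extra column per distinct value.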
