Data Preprocessing and
Feature Engineering
Techniques
@ AIMS Cameroon
25 Sept - 14 Oct, 2023
Rockefeller,
Stellenbosch University, South Africa
Who am I?
• My name is Rockefeller.
• You can call me Tonton Rock if you like.
• I was born in Douala, Cameroon.
• Data Scientist Consultant and Trainer
• PhD Candidate in A.I., Stellenbosch University, South Africa.
• Research focuses on Deep Learning methods applied to Dynamical Systems.
rockefeller@aims.ac.za
FACTS
• Fitting models on raw data is
(often) a sure way to build
biased models.
• Data literacy on the
African continent is still quite low.
Data Science Project Life Cycle
Simple! Right?
1. Problem Statement
2. Data Collection
3. Data Preprocessing
4. Modeling
5. Evaluation
6. Deployment
Data Science Project Life Cycle
Well, life is not that simple!
1. Problem Statement
2. Data Collection
3. Data Preprocessing
4. Modeling
5. Evaluation
6. Deployment
7. Feedback (back to Problem Statement)
Data reading, data visualization, data cleaning, and data normalization on:
• i.i.d. data
• Time series
• Image data
• Text data
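As a minimal sketch of three of these steps (reading, cleaning, normalization) on i.i.d. tabular data, the example below uses only the Python standard library. The sample values, column names, and function names are hypothetical and invented for illustration; the course itself may use other tools.

```python
import csv
import io

# Hypothetical raw data, invented for illustration only.
RAW_CSV = """height_cm,weight_kg
170,65
180,
160,55
190,85
"""

def read_rows(text):
    """Data reading: parse CSV text into a list of dicts (one per observation)."""
    return list(csv.DictReader(io.StringIO(text)))

def clean_rows(rows):
    """Data cleaning: drop observations with a missing field, convert the rest to float."""
    cleaned = []
    for row in rows:
        if all(value and value.strip() for value in row.values()):
            cleaned.append({key: float(value) for key, value in row.items()})
    return cleaned

def min_max_normalize(rows, column):
    """Data normalization: rescale one feature to the range [0, 1]."""
    values = [row[column] for row in rows]
    low, high = min(values), max(values)
    span = high - low
    return [(v - low) / span if span else 0.0 for v in values]

rows = read_rows(RAW_CSV)
cleaned = clean_rows(rows)
normalized_height = min_max_normalize(cleaned, "height_cm")
print(len(rows), len(cleaned))
print(normalized_height)
```

Dropping incomplete observations is only one possible cleaning choice; filling in missing values is another, and which is appropriate depends on the data.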
Part 0 : The Data Science Ecosystem
1. The Data Science Ecosystem
2. Getting started with Jupyter and Colab
3. Introduction to Python for Data Science
Part 1 : Dealing with i.i.d. data
1. Working with Series and DataFrames
2. Data Reading Methods
3. Introducing Features and Observations
4. Handling Text Data
5. Grouping the Data
6. Basic Data Explorations
7. Data Organization Methods
8. Customizing Functions
Part 2 : Dealing with Time Series
1. Working with Time Data
2. Basic Data Manipulation on Time Series
3. Advanced Manipulation on Time Series
4. Framing Time Series for Machine Learning
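One common way to frame a time series for machine learning is a sliding window, where the previous few values become the features and the next value becomes the target. The sketch below assumes that approach; the function name `frame_series`, the parameter `n_lags`, and the sample values are hypothetical and not taken from the course material.

```python
def frame_series(series, n_lags):
    """Turn a sequence into (features, target) pairs using a sliding window.

    Each feature row holds the n_lags values that precede the target value.
    """
    features, targets = [], []
    for i in range(n_lags, len(series)):
        features.append(series[i - n_lags:i])
        targets.append(series[i])
    return features, targets

# Hypothetical series, invented for illustration only.
series = [10, 20, 30, 40, 50]
X, y = frame_series(series, n_lags=2)
print(X)
print(y)
```

A series of length n yields n - n_lags supervised examples, so larger windows leave fewer rows to train on.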
Part 3 : Dealing with Image Data
1. Introduction to Image Data
2. Image Pre-processing Operations
3. Advanced Image Pre-processing Operations
4. Feature Extraction from Images
5. Preparing Image Data for Model Training
Part 4 : Dealing with Text Data
1. Introduction to Text Data
2. Text Mining Operations
3. Feature Extraction from Text Data
4. Word Embeddings
Some tips!!!
1. This is a practical data analysis course, not a
programming course!!!
2. Focus on building your data literacy, not on
copy-pasting code.
3. Do not code while I am teaching; you will have
plenty of time for that.
Tips for success
1. Ask Questions
2. Ask Questions again
3. Ask Questions again and again
Course Outline
Tuesday Wednesday Thursday Friday Saturday
Lectures Lectures • Quiz 1 Lectures
• Lectures Lectures
• Assignment 1
(release)
Lectures • Lectures Lectures • Quiz 3
• Quiz 2 • Lectures • Lectures
• Assignment 2
(release)
• Lectures Lectures • Lectures Lectures Group
• Group Assignment • Quiz 4 Presentations
(release)