Intro Slides

This document outlines a three-week course on data preprocessing and feature engineering techniques. The course covers working with different data types such as time series, images, and text, along with data cleaning, normalization, feature extraction, and preparing data for modeling. The schedule combines lectures with quizzes and assignments, with the goal of improving students' data literacy. Students are encouraged to ask questions throughout the sessions.

Data Preprocessing and Feature Engineering Techniques
@ AIMS Cameroon
25 Sept - 14 Oct, 2023

Rockefeller,
Stellenbosch University, South Africa
Who am I?

• My name is Rockefeller.
• You can call me Tonton Rock if you like.
• I was born in Douala, Cameroon.
• Data Scientist Consultant and Trainer.
• PhD Candidate in A.I., Stellenbosch University, South Africa.
• Research focuses on Deep Learning methods applied to Dynamical Systems.

rockefeller@aims.ac.za
FACTS

• Fitting models on raw data is (often) a guarantee of building biased models.
• Data literacy on the African continent is still quite low.
Data Science Project Life Cycle

Simple! Right?

Problem Statement → Data Collection → Data preprocessing → Modeling → Evaluation → Deployment


Data Science Project Life Cycle

Well, life is not that simple!

Problem Statement → Data Collection → Data preprocessing → Modeling → Evaluation → Deployment → Feedback → back to Problem Statement


Data reading, data visualization, data cleaning, and data normalization on:

• i.i.d. data
• Time series
• Image data
• Text data
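
To make the steps listed above concrete, here is a minimal sketch of reading, cleaning, and normalizing a small i.i.d. table. It is an illustration only, not the course material: the sample data, the column names, and the helper functions `read_rows`, `clean`, and `min_max_normalize` are all hypothetical, the cleaning rule (drop incomplete rows) and the normalization rule (min-max scaling) are assumed choices among several, and the visualization step is left out. It uses only the Python standard library so that it runs without extra installs.

```python
import csv
import io

# Hypothetical raw data: a tiny CSV with one missing value (illustrative only).
RAW_CSV = """height_cm,weight_kg
150,50
160,
170,70
180,90
"""

COLUMNS = ["height_cm", "weight_kg"]


def read_rows(text):
    """Data reading: parse CSV text into a list of dicts with string values."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, columns):
    """Data cleaning (assumed rule): drop rows with a missing value, cast to float."""
    cleaned = []
    for row in rows:
        if all(row[c] not in ("", None) for c in columns):
            cleaned.append({c: float(row[c]) for c in columns})
    return cleaned


def min_max_normalize(rows, columns):
    """Data normalization (assumed rule): rescale each column to [0, 1]."""
    out = [dict(r) for r in rows]
    for c in columns:
        values = [r[c] for r in rows]
        lo, hi = min(values), max(values)
        span = hi - lo
        for r in out:
            # A constant column has no spread, so map it to 0.0.
            r[c] = (r[c] - lo) / span if span else 0.0
    return out


rows = read_rows(RAW_CSV)
cleaned = clean(rows, COLUMNS)
normalized = min_max_normalize(cleaned, COLUMNS)

for row in normalized:
    print(row)
```

Dropping incomplete rows is the simplest cleaning rule but discards data; imputing the missing value is a common alternative, and which one is appropriate depends on the dataset.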
Part 0 : The Data Science Ecosystem

1. The Data Science Ecosystem

2. Getting started with Jupyter and Colab

3. Introduction to Python for Data Science


Part 1 : Dealing with i.i.d. Data
1. Working with Series and DataFrames

2. Data Reading Methods

3. Introducing Features and Observations

4. Handling Text Data

5. Grouping the Data

6. Basic Data Explorations

7. Data Organization Methods

8. Customizing Functions
Part 2 : Dealing with Time Series

1. Working with Time Data

2. Basic Data Manipulation on Time Series

3. Advanced Manipulation on Time Series

4. Framing Time Series for Machine Learning


Part 3 : Dealing with Image Data

1. Introduction to Image Data

2. Image Pre-processing operations

3. Advanced Image Pre-processing operations

4. Feature Extraction from Images

5. Preparing Image Data for Model Training


Part 4 : Dealing with Text Data

1. Introduction to Text Data

2. Text Mining Operations

3. Feature Extraction from Text Data

4. Word Embeddings
Some tips!!!

1. It is a practical data analysis course, not a programming course!!!

2. Focus on building your data literacy, not on copy-pasting code.

3. Do not code while I am teaching; you will have plenty of time for that.
Tips for success

1. Ask Questions

2. Ask Questions again

3. Ask Questions again and again


Course Outline

Sessions run Tuesday to Saturday, over three weeks.

Week 1
• Lectures every day
• Quiz 1
• Assignment 1 (release)

Week 2
• Lectures every day
• Quiz 2
• Quiz 3
• Assignment 2 (release)

Week 3
• Lectures Tuesday to Friday
• Group Assignment (release)
• Quiz 4
• Group Presentations (Saturday)
