01 Overview of Machine Learning

General Overview of Machine Learning process using diagrams and a full explanation from source to final results

Uploaded by

leocortes

Machine Learning

Overview
Machine Learning

[Diagram: Real World -> Collect & Store Data -> Clean & Organize Data -> Exploratory Data Analysis -> Machine Learning Models]
    Supervised Learning: Predict an Outcome
    Unsupervised Learning: Discover Patterns in Data
Machine Learning
● Our main goals in ML Overview section:
○ Problems solved by Machine Learning

○ Types of Machine Learning


■ Supervised Learning
■ Unsupervised Learning
■ Reinforcement Learning

○ ML Process for Supervised Learning


Regression
Machine Learning

● Machine Learning Sections

○ Linear Regression
■ Intuition and Mathematical Theory
■ Simple Linear Regression
■ Scikit-learn and Linear Regression
■ Regularization
Machine Learning

● Machine Learning Sections

○ Additional ML topics

■ Performance Metrics
■ Feature Engineering
■ Data Preparation
■ Cross-validation
Machine Learning

● Machine Learning Sections

○ Different Types of Algorithms

■ Intuition and Mathematical Theory
■ Example code applying the algorithm
■ Project Exercise & Project Solution
Machine Learning

● Machine Learning Sections

○ Revisit Linear Regression

■ Combine previous ML topics for Project Exercise
Why Machine Learning?
Machine Learning

● Machine learning is, in general, the study of statistical computer algorithms that improve automatically through data.

● Unlike typical computer algorithms, which rely on human input to decide what approach to take, ML algorithms infer the best approach from the data itself.
Machine Learning

● Machine learning is a subset of Artificial Intelligence.

● ML algorithms are not explicitly programmed with which decisions to make; instead, the algorithm is designed to infer the best choices from the data.
Machine Learning
● What kinds of problems can ML solve?

○ Credit Scoring
○ Insurance Risk
○ Price Forecasting
○ Spam Filtering
○ Customer Segmentation
○ Recommender Systems
○ Text Sentiment Analysis
Machine Learning

● Structure of ML problem framing:

○ Given features from a dataset, obtain a desired label.
○ ML algorithms are often called “estimators” since they estimate the desired label or output.
Machine Learning

● How can ML be so robust in solving all sorts of problems?

● Machine learning algorithms rely on data and a set of statistical methods to learn which features in the data are important.
Machine Learning

● Simple Example:

○ Predict the price a house should sell at, given its current features (Area, Bedrooms, Bathrooms, etc.)
Machine Learning

● House Price Prediction

○ Typical Algorithm
■ A human user defines an algorithm and manually sets the importance value of each feature.
Machine Learning

● House Price Prediction

○ ML Algorithm
■ The algorithm automatically determines the importance of each feature from existing data.
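
The contrast between the two approaches can be sketched in plain Python. This is only a minimal illustration, not the course's own code: the five sales records, the hand-picked weight inside `manual_price`, and the single-feature least-squares fit in `fit_line` are all assumptions made for this sketch.

```python
# Five example sales: (area in m², sale price in dollars).
# Illustrative data, assumed for this sketch.
sales = [(200, 500_000), (190, 450_000), (230, 650_000),
         (180, 400_000), (210, 550_000)]


def manual_price(area):
    """Typical algorithm: a human picks the weight per square metre by hand."""
    price_per_m2 = 2_500  # hypothetical hand-set importance value
    return price_per_m2 * area


def fit_line(points):
    """ML-style approach: estimate slope and intercept from the data
    with ordinary least squares on a single feature."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(sales)


def learned_price(area):
    """Prediction that uses the weight learned from the data."""
    return slope * area + intercept


print(manual_price(220))   # guess based on the hand-set weight
print(learned_price(220))  # estimate based on the fitted line
```

The only difference between the two functions is where the weight comes from: a person chose it in `manual_price`, while `fit_line` derived it from past sales.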
Machine Learning

● Why machine learning?

○ Many complex problems are, in practice, only solvable with machine learning techniques.
○ Problems such as spam email filtering or handwriting identification require ML for an effective solution.
Machine Learning

● Why not just use machine learning for everything?
○ The major caveat to effective ML is the need for good data.
○ The majority of development time is spent cleaning and organizing data, not implementing ML algorithms.
Machine Learning

● Do we develop our own ML algorithms?

○ It is rare to need to manually develop and implement a new ML algorithm, since these techniques are well documented and developed.
Types of Machine Learning
Machine Learning

● There are two main types of Machine Learning:

○ Supervised Learning

○ Unsupervised Learning
Machine Learning

● Supervised Learning
○ Using historical and labeled data, the
machine learning model predicts a value.

● Unsupervised Learning
○ Applied to unlabeled data, the machine
learning model discovers possible patterns
in the data.
Machine Learning

● Supervised Learning
○ Requires historical labeled data:
■ Historical
● Known results and data from
the past.
■ Labeled
● The desired output is known.
Machine Learning

● Supervised Learning
○ Two main label types
■ Categorical Value to Predict
● Classification Task
■ Continuous Value to Predict
● Regression Task
Machine Learning

● Supervised Learning
○ Classification Tasks
■ Predict an assigned category
● Cancerous vs. Benign Tumor
● Fulfillment vs. Credit Default
● Assigning Image Category
○ Handwriting Recognition
○ Cats vs Dogs vs Dolphins
Machine Learning

● Supervised Learning
○ Regression Tasks
■ Predict a continuous value
● Future prices
● Electricity loads
● Test scores
Machine Learning

● Unsupervised Learning
○ Group and interpret data without a
label.
○ Example:
■ Clustering customers into separate groups based on their behavior features.
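
As a rough illustration of grouping customers without labels, the sketch below runs a very small k-means loop in plain Python. The slides do not name a specific clustering algorithm, so k-means is an assumption here, as are the customer values, the two behavior features, and the choice of starting centroids.

```python
import math

# Hypothetical customers: (visits per month, average spend in dollars).
customers = [(1.0, 20.0), (2.0, 25.0), (1.5, 22.0),
             (10.0, 200.0), (12.0, 220.0), (11.0, 210.0)]


def kmeans(points, k, max_iter=100):
    """Very small k-means: returns one cluster index per point."""
    centroids = list(points[:k])  # simple deterministic start (assumption)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(v) / len(members) for v in zip(*members))
    return labels


print(kmeans(customers, 2))  # [0, 0, 0, 1, 1, 1]
```

No "correct" group is supplied anywhere in the input; the groups come only from how close the customers are to each other.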
Machine Learning

● Unsupervised Learning
○ A major downside is that, because there is no historical “correct” label, it is much harder to evaluate the performance of an unsupervised learning algorithm.
Machine Learning

● Machine Learning Sections


○ We first focus on supervised learning to build an understanding of machine learning capabilities, since it is the easiest to understand.
Machine Learning

● Finally, let’s take a deep dive into the entire Supervised Machine Learning process!
Supervised Machine Learning Process
Machine Learning

● Machine Learning Pathway

[Diagram: Real World -> Collect & Store Data -> Clean & Organize Data -> Exploratory Data Analysis -> Machine Learning Models]
    Supervised Learning: Predict an Outcome
    Unsupervised Learning: Discover Patterns in Data
    Tools shown: Jupyter, NumPy, Pandas, Matplotlib, Seaborn, Scikit-learn
Machine Learning

● ML Process: Supervised Learning Tasks

[Diagram: ML pathway ending in Machine Learning Models, Supervised Learning: Predict an Outcome]
Machine Learning

● Predict the price a house should sell at.

[Diagram: ML pathway ending in Machine Learning Models, Supervised Learning: Predict an Outcome]
Machine Learning
● Supervised Machine Learning Process
● Start by collecting and organizing a data set based on history:

Area (m²)   Bedrooms   Bathrooms   Price
200         3          2           $500,000
190         2          1           $450,000
230         3          3           $650,000
180         1          1           $400,000
210         2          2           $550,000
Machine Learning

● Historical labeled data on previously sold houses.
Machine Learning
● If a new house comes on the market with a known Area, Bedrooms, and Bathrooms: predict what price it should sell at.
Machine Learning
● Data Product:
○ Input house features
○ Output predicted selling price (label)
Machine Learning

● Using historical, labeled data, predict a future outcome or result.
Machine Learning

● Predict the price a house should sell at.

[Diagram: ML pathway, zooming in on Machine Learning Models, Supervised Learning: Predict an Outcome]
Machine Learning

● Supervised Machine Learning Process

Data
X: Features
y: Label

X: Features                          y: Label
Area (m²)   Bedrooms   Bathrooms     Price
200         3          2             $500,000
190         2          1             $450,000
230         3          3             $650,000
180         1          1             $400,000
210         2          2             $550,000
Machine Learning

● Label is what we are trying to predict

Machine Learning

● Features are known characteristics or components in the data
Machine Learning

● Features and Label are identified according to the problem being solved.
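
A minimal sketch of separating features from the label for the house table, using plain Python lists. The variable names `houses`, `X`, and `y` are choices made for this illustration; only the numbers come from the slides.

```python
# Historical, labeled records: (area_m2, bedrooms, bathrooms, price).
houses = [
    (200, 3, 2, 500_000),
    (190, 2, 1, 450_000),
    (230, 3, 3, 650_000),
    (180, 1, 1, 400_000),
    (210, 2, 2, 550_000),
]

# X: one row of features per house. y: the matching label (sale price).
X = [list(row[:3]) for row in houses]
y = [row[3] for row in houses]

print(X[0], y[0])  # [200, 3, 2] 500000
```

If the problem were instead to predict the number of bedrooms, the same table would be cut differently: Bedrooms would become y and Price would move into X.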
Supervised Machine Learning Process

● Split data into training set and test set

[Diagram: Data (X: Features, y: Label) is split into a Training Data Set and a Test Data Set]
Supervised Machine Learning Process

● Why perform this split? How to split?
Supervised Machine Learning Process

● Why perform this split? How to split?

Supervised Machine Learning Process

● How would you judge a human realtor’s performance?
Supervised Machine Learning Process

● Ask a human realtor to take a look at historical data...
Supervised Machine Learning Process

● Then give the human realtor the features of a house and ask her to predict a selling price.
Supervised Machine Learning Process

● But how would you measure how accurate her prediction is? What house should you choose to test on?
Supervised Machine Learning Process

● You can’t judge based on a new house that hasn’t sold yet; you don’t know its true selling price!
Supervised Machine Learning Process

● You shouldn’t judge the human realtor on data she’s already seen; she could have memorized it!
Supervised Machine Learning Process

● Thus the need for a Train/Test split of the data.
Supervised Machine Learning Process
● We already organized the data into
Features (X) and a Label (y)
Supervised Machine Learning Process

● Now we will split this into a training set and a test set:

        Area (m²)   Bedrooms   Bathrooms   Price
TRAIN   200         3          2           $500,000
TRAIN   190         2          1           $450,000
TRAIN   230         3          3           $650,000
TEST    180         1          1           $400,000
TEST    210         2          2           $550,000
Supervised Machine Learning Process

● Notice how we have 4 components

          X (features)                        y (label)
          Area (m²)   Bedrooms   Bathrooms    Price
X TRAIN   200         3          2            $500,000   Y TRAIN
X TRAIN   190         2          1            $450,000   Y TRAIN
X TRAIN   230         3          3            $650,000   Y TRAIN
X TEST    180         1          1            $400,000   Y TEST
X TEST    210         2          2            $550,000   Y TEST
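
The four components can be produced with simple slicing, as in this minimal sketch. It assumes the same row order as the slide, with the first three houses used for training and the last two held out for testing. In practice rows are usually shuffled before splitting; scikit-learn, which this deck lists among its tools, provides `train_test_split` in `sklearn.model_selection` for that purpose.

```python
# Features (area_m2, bedrooms, bathrooms) and labels (price) from the table.
X = [[200, 3, 2], [190, 2, 1], [230, 3, 3], [180, 1, 1], [210, 2, 2]]
y = [500_000, 450_000, 650_000, 400_000, 550_000]

# Keep the first three rows for training and hold out the last two,
# mirroring the TRAIN / TEST rows on the slide (no shuffling here).
n_train = 3
X_train, X_test = X[:n_train], X[n_train:]
y_train, y_test = y[:n_train], y[n_train:]

print(X_train, y_train)
print(X_test, y_test)
```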
Supervised Machine Learning Process

● Let’s go back to fairly testing our human realtor.
Supervised Machine Learning Process

● Let her study and learn on the training set, getting access to both X and y.

        Area (m²)   Bedrooms   Bathrooms   Price
TRAIN   200         3          2           $500,000
TRAIN   190         2          1           $450,000
TRAIN   230         3          3           $650,000
Supervised Machine Learning Process

● After she has “learned” about the data, we can test her skill on the test set.

        Area (m²)   Bedrooms   Bathrooms
TEST    180         1          1
TEST    210         2          2
Supervised Machine Learning Process

● Provide only the X test data and ask for her predictions of the selling price.
Supervised Machine Learning Process

● This is new data she has never seen before! She has also never seen the real sold price.
Supervised Machine Learning Process

● Ask for predictions per data point.

        Predictions   Area (m²)   Bedrooms   Bathrooms
TEST    $410,000      180         1          1
TEST    $540,000      210         2          2
Supervised Machine Learning Process

● Then bring back the original prices.

        Predictions   Area (m²)   Bedrooms   Bathrooms   Price
TEST    $410,000      180         1          1           $400,000
TEST    $540,000      210         2          2           $550,000
Supervised Machine Learning Process

● Finally compare predictions against true test price.

Predictions   Price
$410,000      $400,000
$540,000      $550,000
Supervised Machine Learning Process

● This is often labeled as ŷ compared against y

ŷ             y
Predictions   Price
$410,000      $400,000
$540,000      $550,000
Supervised Machine Learning Process

● Later on we will discuss the many methods of evaluating this performance!

Predictions Price

$410,000 $400,000

$540,000 $550,000
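
As one possible way to compare ŷ against y, the sketch below computes the per-house error and the mean absolute error for the two test predictions. The slides defer the choice of metric to a later section, so mean absolute error is used here only as an assumed example.

```python
y_true = [400_000, 550_000]  # actual sale prices of the test houses (y)
y_pred = [410_000, 540_000]  # the realtor's predictions (ŷ)

# Signed error per house: positive means the prediction was too high.
errors = [pred - true for pred, true in zip(y_pred, y_true)]

# Mean absolute error: average size of the miss, ignoring direction.
mae = sum(abs(e) for e in errors) / len(errors)

print(errors)  # [10000, -10000]
print(mae)     # 10000.0
```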
Supervised Machine Learning Process

● Split Data

[Diagram: Data (X: Features, y: Label) is split into a Training Data Set and a Test Data Set]
Supervised Machine Learning Process

● Split Data, Fit on Train Data

[Diagram: as before, with the Training Data Set feeding Fit/Train Model]
Supervised Machine Learning Process

● Split Data, Fit on Train Data, Evaluate Model

[Diagram: as before, with the Test Data Set feeding Evaluate Performance]
Supervised Machine Learning Process

● What happens if performance isn’t great?
Supervised Machine Learning Process

● We can adjust model hyperparameters
Supervised Machine Learning Process

● Many algorithms have adjustable values

[Diagram: Fit/Train Model -> Adjust Model -> Adjusted Model]
Supervised Machine Learning Process

● Evaluate adjusted model

[Diagram: Adjusted Model -> Evaluate Performance on the Test Data Set]
Supervised Machine Learning Process

● Can repeat this process as necessary
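
The adjust-and-re-evaluate loop can be sketched as follows. The slides do not name a particular model or hyperparameter, so this example assumes a tiny k-nearest-neighbours regressor whose hyperparameter is `k`, and it scores each setting with mean absolute error; the helper names `knn_predict` and `evaluate` are hypothetical.

```python
import math

# Train/test split of the house table, as on the earlier slides.
X_train = [[200, 3, 2], [190, 2, 1], [230, 3, 3]]
y_train = [500_000, 450_000, 650_000]
X_test = [[180, 1, 1], [210, 2, 2]]
y_test = [400_000, 550_000]


def knn_predict(x, k):
    """Average the prices of the k training houses nearest to x."""
    order = sorted(range(len(X_train)), key=lambda i: math.dist(x, X_train[i]))
    nearest = order[:k]
    return sum(y_train[i] for i in nearest) / k


def evaluate(k):
    """Mean absolute error on the test set for hyperparameter k."""
    preds = [knn_predict(x, k) for x in X_test]
    return sum(abs(p - t) for p, t in zip(preds, y_test)) / len(y_test)


# Adjust the hyperparameter, re-evaluate, and keep the best setting.
scores = {k: evaluate(k) for k in (1, 2, 3)}
best_k = min(scores, key=scores.get)
print(scores)
print(best_k)
```

This follows the slides in scoring each setting on the test set. Tuning repeatedly against the same test set can make the final score look better than it really is, which is one reason cross-validation, listed among the additional ML topics, is commonly used.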
Supervised Machine Learning Process

● Full and Simplified Process


Supervised Machine Learning Process

● Get X and y data
Supervised Machine Learning Process

● Split data for evaluation purposes
Supervised Machine Learning Process

● Fit ML Model on Training Data Set
Supervised Machine Learning Process

● Evaluate Model Performance
Supervised Machine Learning Process

● Adjust model hyperparameters as needed
Supervised Machine Learning Process

● Deploy model to real world

[Diagram: X and y Data -> split into Training Data Set and Test Data Set -> Fit/Train Model -> Evaluate Performance -> Adjust as Needed -> Deploy Model]
Machine Learning

● ML Process: Supervised Learning Tasks

[Diagram: ML pathway ending in Machine Learning Models, Supervised Learning: Predict an Outcome]
ML Pathway

[Diagram: Real World -> Collect & Store Data -> Clean & Organize Data -> Exploratory Data Analysis -> Machine Learning Models]
    Data Product: Service, Dashboard, Application
    Predict Future Outcomes
