01 Overview of Machine Learning
Overview
Machine Learning
● Our main goals in the ML Overview section:
○ Problems solved by Machine Learning
○ Linear Regression
■ Intuition and Mathematical Theory
■ Simple Linear Regression
■ Scikit-learn and Linear Regression
■ Regularization
○ Additional ML topics
■ Performance Metrics
■ Feature Engineering
■ Data Preparation
■ Cross-validation
● Problems solved by Machine Learning:
○ Credit Scoring
○ Insurance Risk
○ Price Forecasting
○ Spam Filtering
○ Customer Segmentation
○ Recommender Systems
○ Text Sentiment Analysis
● Simple Example:
○ Typical Algorithm
■ Human user defines an algorithm to manually set values of importance for each feature.
○ ML Algorithm
■ Algorithm automatically determines importance of each feature from existing data.
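The contrast between the two approaches can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it uses only the Area and Price columns of the house table that appears later in this deck, the hand-picked value of 2,500 per m2 is hypothetical, and the helper names (manual_price, fit_line, learned_price, mean_abs_error) are made up for this sketch. The "ML" side is an ordinary least-squares line fit.

```python
# (area in m2, price in dollars), taken from the house table in this deck.
houses = [(200, 500_000), (190, 450_000), (230, 650_000),
          (180, 400_000), (210, 550_000)]


def manual_price(area):
    """Typical algorithm: a human hard-codes the importance of the feature."""
    price_per_m2 = 2_500  # hypothetical hand-picked value
    return price_per_m2 * area


def fit_line(data):
    """ML algorithm: derive slope and intercept from existing data (least squares)."""
    n = len(data)
    mean_x = sum(x for x, _ in data) / n
    mean_y = sum(y for _, y in data) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in data)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in data)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = fit_line(houses)


def learned_price(area):
    """Prediction using the weights that were learned from the data."""
    return slope * area + intercept


def mean_abs_error(predict, data):
    """Average absolute difference between predicted and known prices."""
    return sum(abs(predict(x) - y) for x, y in data) / len(data)


print("manual rule error: ", mean_abs_error(manual_price, houses))
print("learned rule error:", mean_abs_error(learned_price, houses))
```

The point of the sketch is only that the learned rule gets its numbers from the data, while the manual rule depends on whoever chose 2,500.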
● Types of ML algorithms covered here:
○ Supervised Learning
○ Unsupervised Learning
● Supervised Learning
○ Using historical and labeled data, the machine learning model predicts a value.
● Unsupervised Learning
○ Applied to unlabeled data, the machine learning model discovers possible patterns in the data.
● Supervised Learning
○ Requires historical labeled data:
■ Historical
● Known results and data from the past.
■ Labeled
● The desired output is known.
○ Two main label types
■ Categorical Value to Predict
● Classification Task
■ Continuous Value to Predict
● Regression Task
○ Classification Tasks
■ Predict an assigned category
● Cancerous vs. Benign Tumor
● Fulfillment vs. Credit Default
● Assigning Image Category
○ Handwriting Recognition
○ Cats vs Dogs vs Dolphins
○ Regression Tasks
■ Predict a continuous value
● Future prices
● Electricity loads
● Test scores
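To make the two label types concrete, here is a small sketch in which the same features are paired once with a categorical label and once with a continuous label. It uses k-nearest neighbours, chosen here only because it is short; the deck does not prescribe it. All data (hours studied, pass/fail, scores) is hypothetical, and the function names are made up for this sketch.

```python
from collections import Counter


def nearest(train_X, query, k):
    """Return indices of the k training rows closest to query (squared Euclidean distance)."""
    dists = [(sum((a - b) ** 2 for a, b in zip(row, query)), i)
             for i, row in enumerate(train_X)]
    return [i for _, i in sorted(dists)[:k]]


def knn_classify(train_X, labels, query, k=3):
    """Categorical label: return the most common category among the neighbours."""
    votes = Counter(labels[i] for i in nearest(train_X, query, k))
    return votes.most_common(1)[0][0]


def knn_regress(train_X, values, query, k=3):
    """Continuous label: return the mean value of the neighbours."""
    idx = nearest(train_X, query, k)
    return sum(values[i] for i in idx) / len(idx)


# Hypothetical toy data: one feature (hours studied) per row.
hours = [[1], [2], [3], [6], [7], [8]]
passed = ["fail", "fail", "fail", "pass", "pass", "pass"]  # categorical label
scores = [40.0, 45.0, 50.0, 75.0, 80.0, 85.0]              # continuous label

print(knn_classify(hours, passed, [7]))  # classification task
print(knn_regress(hours, scores, [7]))   # regression task
```

The features and the neighbour search are identical in both calls; only the type of label, and therefore the way neighbours are combined, changes.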
● Unsupervised Learning
○ Group and interpret data without a label.
○ Example:
■ Clustering customers into separate groups based on their behavior features.
○ Major downside: because there is no historical “correct” label, it is much harder to evaluate the performance of an unsupervised learning algorithm.
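The customer clustering example can be sketched with a bare-bones k-means loop. This is an illustrative assumption, since the deck does not name a clustering algorithm. The customer data (visits per month, average spend) is hypothetical, the initial centroids are naively taken as the first k points, and the features are left unscaled for brevity.

```python
def kmeans(points, k, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then recompute centroids."""
    centroids = [points[i] for i in range(k)]  # naive init: first k points
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: index of the closest centroid for every point.
        for i, p in enumerate(points):
            assignments[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignments, centroids


# Hypothetical behavior features: (visits per month, average spend).
customers = [(2, 20.0), (25, 200.0), (3, 25.0),
             (1, 15.0), (30, 220.0), (28, 210.0)]

assignments, centroids = kmeans(customers, k=2)
print(assignments)
print(centroids)
```

Note that nothing in the output says whether the grouping is "right": there is no label column to compare against, which is the evaluation difficulty described above.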
● If a new house comes on the market with a known Area, Bedrooms, and Bathrooms, predict the price it should sell at.
Area (m²)  Bedrooms  Bathrooms  Price
200        3         2          $500,000
190        2         1          $450,000
230        3         3          $650,000
180        1         1          $400,000
210        2         2          $550,000
● Data Product:
○ Input house features
○ Output predicted selling price (label)
Supervised Learning:
Predict an Outcome
● Data
○ X: Data Features (Area, Bedrooms, Bathrooms)
○ y: Label (Price)
Supervised Machine Learning Process
● We already organized the data into Features (X) and a Label (y):
X: Features                      y: Label
Area (m²)  Bedrooms  Bathrooms   Price
200        3         2           $500,000
190        2         1           $450,000
230        3         3           $650,000
180        1         1           $400,000
210        2         2           $550,000
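In code, organizing the table into X and y amounts to separating the feature columns from the label column. A minimal sketch in plain Python follows; the variable names rows, X, and y are conventions assumed here, not something the deck specifies.

```python
# Rows from the house table: (area_m2, bedrooms, bathrooms, price).
rows = [
    (200, 3, 2, 500_000),
    (190, 2, 1, 450_000),
    (230, 3, 3, 650_000),
    (180, 1, 1, 400_000),
    (210, 2, 2, 550_000),
]

# X holds the feature columns; y holds the label column (price).
X = [list(row[:-1]) for row in rows]
y = [row[-1] for row in rows]

print(X[0], y[0])
```

Each row of X lines up with the entry of y at the same position, which is what lets a model pair features with their known outcome.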
● Split the data into a training set and a test set:
Split  Area (m²)  Bedrooms  Bathrooms  Price
TRAIN  200        3         2          $500,000
TRAIN  190        2         1          $450,000
TRAIN  230        3         3          $650,000
TEST   180        1         1          $400,000
TEST   210        2         2          $550,000
○ The TRAIN rows supply X TRAIN (features) and Y TRAIN (labels).
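The split shown on the slides (first three rows for training, last two for testing) can be sketched as a small helper. The function split_rows and its parameters are hypothetical names introduced for this sketch. Scikit-learn offers train_test_split in sklearn.model_selection for the same purpose, which shuffles rows by default, whereas this sketch keeps the slide's row order unless shuffle=True is passed.

```python
import random


def split_rows(X, y, n_test, shuffle=False, seed=0):
    """Hold out n_test rows as the test set; the rest become the training set."""
    if not 0 < n_test < len(X):
        raise ValueError("n_test must be between 1 and len(X) - 1")
    indices = list(range(len(X)))
    if shuffle:
        random.Random(seed).shuffle(indices)  # seeded for reproducibility
    train_idx, test_idx = indices[:-n_test], indices[-n_test:]
    X_train = [X[i] for i in train_idx]
    y_train = [y[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


# Features (area_m2, bedrooms, bathrooms) and labels (price) from the house table.
X = [[200, 3, 2], [190, 2, 1], [230, 3, 3], [180, 1, 1], [210, 2, 2]]
y = [500_000, 450_000, 650_000, 400_000, 550_000]

X_train, X_test, y_train, y_test = split_rows(X, y, n_test=2)
print(X_train, y_train)
print(X_test, y_test)
```

The test rows are set aside before any fitting so that they can later stand in for houses the model has never seen.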
● The model predicts a price for each TEST row from its features alone, and the predictions are then compared with the known prices:
Area (m²)  Bedrooms  Bathrooms  Prediction  Price
180        1         1          $410,000    $400,000
210        2         2          $540,000    $550,000
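Comparing the predictions with the known prices can be reduced to a single number. The sketch below uses the two prediction and price pairs from the slides and computes mean absolute error and root mean squared error. The choice of these two metrics is an assumption made for illustration, since the deck defers performance metrics to a later section.

```python
import math

predictions = [410_000, 540_000]  # model output for the two TEST rows
actual = [400_000, 550_000]       # known prices for the same rows

# Signed error per row: positive means the model predicted too high.
errors = [p - a for p, a in zip(predictions, actual)]

mae = sum(abs(e) for e in errors) / len(errors)               # mean absolute error
rmse = math.sqrt(sum(e ** 2 for e in errors) / len(errors))   # root mean squared error

print(errors)
print(mae, rmse)
```

Both metrics are expressed in dollars, so they can be read directly as a typical size of the pricing error on unseen houses.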
● Split Data
○ X and y Data (X: Data Features, y: Label)
○ Training Data Set
○ Test Data Set
● Evaluate Performance on the Test Data Set
● Data Product in the Real World: Service, Dashboard, or Application