AI VIETNAM
All-in-One Course
Machine Learning
K-NEAREST NEIGHBORS
DECISION TREE
Nguyen Quoc Thai
1
Year 2023
CONTENT
(1) – K-Nearest Neighbors (KNN)
(2) – KNN Applications
(3) – Decision Tree
(4) – Summary
2
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Ø KNN is one of the simplest supervised machine learning algorithms
Ø Lazy Learning:
q Does not “LEARN” until the test example is given
q A new data point is predicted based on its K nearest neighbors from the training data
3
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Training Phase: Memory-based Learning (store the Training Data)
Testing Phase: Search the training data for the K-Nearest Neighbors of the Test Data using a Similarity Measure
4
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Training Phase: Memory-based Learning (store the Training Data)
Testing Phase: Search for the K-Nearest Neighbors of the Test Data using a Similarity Measure, then Predict
5
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Training Phase: Memory-based Learning, Non-Parametric (store the Training Data)
Testing Phase: Search for the K-Nearest Neighbors of the Test Data using a Similarity Measure, then Predict
6
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Training Phase: Memory-based Learning (store the Training Data)
Solving: Regression and Classification
Testing Phase: Search for the K-Nearest Neighbors of the Test Data, then Predict
7
1 – K Nearest Neighbors
! What is the KNN Algorithm?
Regression
Ø Predict a continuous value based on the input variables
(What will be the temperature tomorrow?)
Classification
Ø Classify input variables to identify discrete output variables (labels, categories)
(Will it be hot or cold tomorrow?)
8
1 – K Nearest Neighbors
! KNN: Classification Approach
Classification
Ø Classify input variables to identify discrete output variables (labels, categories)
(Will it be hot or cold tomorrow?)
[Figure] Training Data: a table of Features with a Label column; Test Data: a new feature row whose label (?) is to be predicted
9
1 – K Nearest Neighbors
! KNN: Classification Approach
Step 1: Look at the data
Training Data
Petal_Length Petal_Width Label
1.4 0.2 0
1.3 0.4 0
4 1 1
4.7 1.4 1
Test Data (new data)
2.4 0.8
10
1 – K Nearest Neighbors
! KNN: Classification Approach
Step 2: Calculate distances using the Euclidean Distance
d(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}
Training Data
Petal_Length Petal_Width Label Distance
1.4 0.2 0 1.166
1.3 0.4 0 1.170
4 1 1 1.612
4.7 1.4 1 2.376
Test Data
2.4 0.8
11
1 – K Nearest Neighbors
! KNN: Classification Approach
Step 3: Find neighbours
Training Data
Petal_Length Petal_Width Label Distance Rank
1.4 0.2 0 1.166 1st
1.3 0.4 0 1.170 2nd
4 1 1 1.612 3rd
4.7 1.4 1 2.376 4th
Test Data Find the nearest neighbours by ranking
2.4 0.8 points by increasing distance
12
1 – K Nearest Neighbors
! KNN: Classification Approach
Step 4: Vote on labels
Training Data
Petal_Length Petal_Width Label Distance Rank
1.4 0.2 0 1.166 1st
1.3 0.4 0 1.170 2nd
4 1 1 1.612 3rd
4.7 1.4 1 2.376 4th
K=3 nearest neighbours → votes: Label 0 gets 2 votes, Label 1 gets 1 vote
Vote on the predicted class label based on the classes of the K nearest neighbors
Test Data
2.4 0.8 → Predicted label: 0
13
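To make the four steps concrete, here is a minimal Python/NumPy sketch of the same walkthrough; the array values are the toy data from the slides and the variable names are illustrative.

```python
import numpy as np

# Step 1: look at the data (Petal_Length, Petal_Width) -> Label
X_train = np.array([[1.4, 0.2], [1.3, 0.4], [4.0, 1.0], [4.7, 1.4]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([2.4, 0.8])

# Step 2: Euclidean distance from the test point to every training point
distances = np.sqrt(((X_train - x_test) ** 2).sum(axis=1))

# Step 3: rank the points by increasing distance and keep the K nearest
K = 3
nearest = np.argsort(distances)[:K]

# Step 4: majority vote over the labels of the K nearest neighbours
prediction = np.bincount(y_train[nearest]).argmax()
print(distances.round(3))   # approx. [1.166 1.170 1.612 2.377]
print(prediction)           # 0
```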
1 – K Nearest Neighbors
! KNN: Regression Approach
Step 1: Look at the data
Training Data
Length Width Price
2.0 2.0 2.0
3.0 3.0 2.5
4.0 3.0 3.5
5.0 2.0 5.0
Test Data (new data)
3.0 2.0
14
1 – K Nearest Neighbors
! KNN: Regression Approach
Step 2: Calculate distances using the Euclidean Distance
d(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}
Training Data
Length Width Price Distance
2.0 2.0 2.0 1.0
3.0 3.0 2.5 1.0
4.0 3.0 3.5 1.4
5.0 2.0 5.0 2.0
Test Data
3.0 2.0
15
1 – K Nearest Neighbors
! KNN: Regression Approach
Step 3: Find neighbours
Training Data
Length Width Price Distance Rank
2.0 2.0 2.0 1.0 1st
3.0 3.0 2.5 1.0 2nd
4.0 3.0 3.5 1.4 3rd
5.0 2.0 5.0 2.0 4th
Test Data Find the nearest neighbours by ranking
3.0 2.0 points by increasing distance
16
1 – K Nearest Neighbors
! KNN: Regression Approach
Step 4: Average the values of the neighbours
Y_{pred} = \frac{1}{k} \sum_{i \in N_k} y_i
Training Data
Length Width Price Distance Rank
2.0 2.0 2.0 1.0 1st
3.0 3.0 2.5 1.0 2nd
4.0 3.0 3.5 1.4 3rd
5.0 2.0 5.0 2.0 4th
K=4 nearest neighbours: Y_pred = (1/4)·(2.0 + 2.5 + 3.5 + 5.0) = 3.25
Compute the mean value of the K nearest neighbors
Test Data
3.0 2.0 → Predicted price: 3.25
17
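The regression walkthrough can be sketched the same way; only the last step changes from voting to averaging (again plain NumPy, toy data from the slides).

```python
import numpy as np

# Toy training data: (Length, Width) -> Price
X_train = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 3.0], [5.0, 2.0]])
y_train = np.array([2.0, 2.5, 3.5, 5.0])
x_test = np.array([3.0, 2.0])

# Distances, ranking, then the mean of the K nearest target values
K = 4
distances = np.linalg.norm(X_train - x_test, axis=1)
nearest = np.argsort(distances)[:K]
y_pred = y_train[nearest].mean()
print(y_pred)   # 3.25
```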
1 – K Nearest Neighbors
! KNN: Summary
Ø Classification: majority vote over the labels of the K nearest neighbors
Ø Regression: mean of the values of the K nearest neighbors
18
1 – K Nearest Neighbors
! Geometry Distance Functions
Ø Euclidean (p=2): d(x, y) = \sqrt{\sum_{i=1}^{D} (x_i - y_i)^2}
Ø Manhattan (p=1): d(x, y) = \sum_{i=1}^{D} |x_i - y_i|
Ø Minkowski (p-norm): d(x, y) = \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p}
Ø Chebyshev (p=∞): d(x, y) = \lim_{p \to \infty} \left( \sum_{i=1}^{D} |x_i - y_i|^p \right)^{1/p} = \max_i |x_i - y_i|
19
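A small sketch of these distance functions (NumPy assumed); the Minkowski form covers Euclidean and Manhattan as special cases, and Chebyshev is its p → ∞ limit.

```python
import numpy as np

def minkowski(x, y, p):
    # General p-norm distance: p=1 is Manhattan, p=2 is Euclidean
    return np.sum(np.abs(x - y) ** p) ** (1.0 / p)

x = np.array([2.4, 0.8])
y = np.array([1.4, 0.2])

print(minkowski(x, y, 1))        # Manhattan
print(minkowski(x, y, 2))        # Euclidean
print(np.max(np.abs(x - y)))     # Chebyshev (the p -> infinity limit)
```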
1 – K Nearest Neighbors
! Feature Scaling (Normalization)
Ø Standardize the range of independent variables (feature of data)
Strong Influence: without scaling, Petal_Length dominates the distance because its range is much larger than Petal_Width's
Training Data
Petal_Length L_Distance Petal_Width W_Distance Label Distance Rank
1.4 1.8 0.2 0.4 0 1.844 3
1.3 1.9 0.4 0.2 0 1.910 4
4 0.8 1 0.4 1 0.894 1
4.7 1.3 1.4 0.8 1 1.526 2
Test Data
3.2 0.6
20
1 – K Nearest Neighbors
! Feature Scaling (Normalization)
Ø Standardize the range of independent variables (feature of data)
Training Data (after MinMaxScaler normalization)
Petal_Length L_Distance Petal_Width W_Distance Label Distance Rank
0.03 0.53 0.00 0.33 0 0.624 3
0.00 0.56 0.17 0.16 0 0.582 2
0.79 0.23 0.66 0.33 1 0.402 1
1.00 0.44 1.00 0.67 1 0.801 4
Test Data
0.56 0.33
21
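A minimal sketch of this normalization step, assuming scikit-learn's MinMaxScaler and the toy values from the table above; note the scaler is fit on the training data only and then reused for the test point.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

X_train = np.array([[1.4, 0.2], [1.3, 0.4], [4.0, 1.0], [4.7, 1.4]])
x_test = np.array([[3.2, 0.6]])

scaler = MinMaxScaler()                      # rescales each feature to [0, 1]
X_train_scaled = scaler.fit_transform(X_train)
x_test_scaled = scaler.transform(x_test)     # reuse the training min/max
print(X_train_scaled.round(2))
print(x_test_scaled.round(2))                # approx. [[0.56 0.33]]
```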
1 – K Nearest Neighbors
! Searching in KNN
Ø Training dataset: N samples in D dimensions
Ø Brute Force: naïve neighbor search – O[D·N²]
Ø KD Tree – O[D·N·log(N)]
Ø Ball Tree
Source: https://varshasaini.in/kd-tree-and-ball-tree-knn-algorithm/
22
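A sketch of switching between these search structures, assuming scikit-learn's NearestNeighbors, whose algorithm parameter accepts 'brute', 'kd_tree', 'ball_tree', or 'auto'; the data here is random and only for illustration.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(0)
X_train = rng.random((1000, 3))        # N samples in D dimensions
x_test = rng.random((1, 3))

# Build the index with a KD Tree (try 'ball_tree' or 'brute' for comparison)
nn = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(X_train)
distances, indices = nn.kneighbors(x_test)
print(indices, distances)
```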
1 – K Nearest Neighbors
! How to find the optimal value of K in KNN?
Ø Choose K based on evaluation on a validation set (Accuracy, Error, F-Score, …)
[Figure] Training error and validation error plotted against the K value; choose the K where the validation error is lowest
23
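A minimal sketch of this selection procedure, assuming scikit-learn and a simple hold-out validation split; the synthetic dataset and the candidate range of K are illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data stands in for a real training/validation split
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=0)

# Evaluate each candidate K on the validation set and keep the best one
scores = {k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)
print(best_k, round(scores[best_k], 3))
```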
2 – KNN Applications
! KNN for Regression: “Diabetes” Dataset
Ø Samples: 442
Ø Features: 10
Ø Target: continuous values in the range 25 – 346
Ø Metric: R²-Score (Validation)
24
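A minimal sketch of this experiment, assuming scikit-learn's bundled Diabetes dataset, a hold-out validation split, and K=5; the exact score depends on these choices.

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)          # 442 samples, 10 features
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=42)

# Scale the features, then fit a KNN regressor
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_val, y_val))               # R2-score on the validation set
```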
2 – KNN Applications
! KNN for Classification: “Iris” Dataset
Ø Samples: 150
Ø Features: 4
Ø Classes: 3 (50 per class)
Ø Metric: Accuracy (Validation)
25
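A minimal sketch for the Iris experiment, again assuming scikit-learn and a hold-out validation split; K=5 is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)              # 150 samples, 4 features, 3 classes
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(knn.score(X_val, y_val))                 # validation accuracy
```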
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Ø Samples: 50,000 movie reviews
Ø Classes: 2 – Positive and Negative (25,000 per class)
Ø Metrics: Accuracy, F1-Score
26
https://paperswithcode.com/sota/text-classification-on-imdb
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Text → Label
I like this movie → 1
So bad → 0
Boring → 0
Love it so much → 1
Pipeline: Convert text to features (BoW) → KNN Classifier
27
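A minimal sketch of this pipeline on the four toy reviews above, assuming scikit-learn's CountVectorizer as the BoW step; on the real IMDB data the texts and labels would be loaded from the dataset instead, and K would be tuned on a validation set.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

texts = ["I like this movie", "So bad", "Boring", "Love it so much"]
labels = [1, 0, 0, 1]

# BoW features -> KNN classifier
model = make_pipeline(CountVectorizer(), KNeighborsClassifier(n_neighbors=3))
model.fit(texts, labels)
print(model.predict(["so boring and bad"]))    # likely [0]
```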
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Ø Bag of Words (BoW)
Ø Document-Level: Consider text as a bag (collection) of words
Ø Each document is represented by a V-dimensional vector (V = vocabulary size), using the number of occurrences of each word in the document
Corpus: [dog, bites, man], [man, bites, dog], [dog, eats, meat], [man, eats, food]
Vocabulary
IDX 0 1 2 3 4 5
Token bites dog eats food man meat
Counter
[dog, bites, man] 1 1 0 0 1 0
28
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Ø Bag of Words (BoW)
Ø Document-Level: Consider text as a bag (collection) of words
Ø Each document is represented by a V-dimensional count vector
[dog, bites, man] 1 1 0 0 1 0
[man, bites, dog] 1 1 0 0 1 0
[dog, eats, meat] 0 1 1 0 0 1
[man, eats, food] 0 0 1 1 1 0
29
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Ø Bag of Words (BoW)
Ø Out-of-vocabulary (OOV) words: tokens not in the vocabulary (and, are, friends) are simply ignored
Vocabulary
IDX 0 1 2 3 4 5
Token bites dog eats food man meat
Counter
[dog, bites, man] 1 1 0 0 1 0
[dog, and, dog, are, friends] 0 2 0 0 0 0
30
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset
Ø Bag of Words (BoW)
Ø Representation without considering frequency (Binary Representation)
Vocabulary
IDX 0 1 2 3 4 5
Token bites dog eats food man meat
Counter
[dog, and, dog, are, friends] 0 2 0 0 0 0
Convert to binary representation
[dog, and, dog, are, friends] 0 1 0 0 0 0
31
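A short sketch of count vs. binary BoW and the OOV behaviour, assuming scikit-learn's CountVectorizer (binary=True keeps only presence/absence; tokens outside the fitted vocabulary are dropped).

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

count_vec = CountVectorizer().fit(corpus)
binary_vec = CountVectorizer(binary=True).fit(corpus)

doc = ["dog and dog are friends"]              # 'and', 'are', 'friends' are OOV
print(count_vec.get_feature_names_out())       # ['bites' 'dog' 'eats' 'food' 'man' 'meat']
print(count_vec.transform(doc).toarray())      # [[0 2 0 0 0 0]]
print(binary_vec.transform(doc).toarray())     # [[0 1 0 0 0 0]]
```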
3 – Decision Tree
! Decision Tree
Ø A possible decision tree for the data
Ø Described by a set of IF-ELSE rules
[Tree] Feature 1: = Value 11 → Feature 2 (= Value 21 → No, = Value 22 → Yes); = Value 12 → Yes; = Value 13 → Feature 3 (= Value 31 → No, = Value 32 → Yes)
32
3 – Decision Tree
! Decision Tree
Ø Each internal node: tests an attribute X (feature)
Ø Each branch from a node: a value of X
Ø Each leaf node: a predicted value
[Tree] Feature 1: = Value 11 → Feature 2 (= Value 21 → No, = Value 22 → Yes); = Value 12 → Yes; = Value 13 → Feature 3 (= Value 31 → No, = Value 32 → Yes)
33
3 – Decision Tree
! Decision Tree
Ø Example: Play Tennis dataset
34
3 – Decision Tree
! Decision Tree
Ø Example: Play Tennis dataset
[Tree] Outlook: = Sunny → Humidity (= High → No, = Normal → Yes); = Overcast → Yes; = Rain → Wind (= Strong → No, = Weak → Yes)
What prediction would we make for
<outlook=Sunny, temperature=Hot, humidity=High, wind=Weak> ?
35
3 – Decision Tree
! Constructing Decision Tree: ID3
Why does it make more sense to test Feature 1 first?
[Tree] Feature 1: = Value 11 → Feature 2 (= Value 21 → No, = Value 22 → Yes); = Value 12 → Yes; = Value 13 → Feature 3 (= Value 31 → No, = Value 32 → Yes)
36
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Iterative Dichotomiser 3 (ID3): builds the tree Top-Down using Entropy and Information Gain
[Tree] Feature 1: = Value 11 → Feature 2 (= Value 21 → No, = Value 22 → Yes); = Value 12 → Yes; = Value 13 → Feature 3 (= Value 31 → No, = Value 32 → Yes)
37
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Entropy is a measure of randomness/uncertainty of a set
Ø Data: a set S of examples with C many classes
Ø Probability vector p = [p_1, p_2, …, p_C] is the class distribution of the set S
Ø Entropy of the set S:
E(S) = -\sum_{c \in C} p_c \log_2 p_c
Ø If a set S of examples has
q Some dominant classes => small entropy of the class distribution
q Equiprobable classes => high entropy of the class distribution
38
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Entropy of the set S:
E(S) = -\sum_{c \in C} p_c \log_2 p_c
Ø Example:
q S: 14 examples (9 in class c1, 5 in class c2)
q => E(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.94
39
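A quick sketch of this entropy computation in Python (NumPy assumed); the helper takes raw class counts.

```python
import numpy as np

def entropy(counts):
    # Entropy of a class distribution given raw class counts
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                    # convention: 0 * log2(0) = 0
    return -(p * np.log2(p)).sum()

print(entropy([9, 5]))    # ~0.940  (the example above)
print(entropy([7, 7]))    # 1.0     (equiprobable classes)
print(entropy([14, 0]))   # 0.0     (a single class)
```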
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Entropy = 0 => all samples belong to class 1 (or class 2)
Ø Entropy = 1 => number of samples in class 1 = number in class 2
Ø Entropy ∈ (0, 1) => number of samples in class 1 ≠ number in class 2
[Figure] E(S) plotted against the class-1 proportion p ∈ [0, 1]
40
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG) on knowing the value of the feature F in S:
Gain(S, F) = E(S) - \sum_{f \in values(F)} \frac{|S_f|}{|S|} E(S_f)
Ø S_f denotes the subset of elements of S for which feature F has value f
41
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG)
Ø Gain(S, Wind) with Wind: Weak or Strong
q S = {9: Yes, 5: No}
q S_weak = {6: Yes, 2: No}
q S_strong = {3: Yes, 3: No}
=> Gain(S, Wind) = E(S) - (8/14)·E(S_weak) - (6/14)·E(S_strong)
= 0.94 - (8/14)·0.81 - (6/14)·1 = 0.048
42
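The same Gain(S, Wind) computation as a small sketch, reusing an entropy helper over raw class counts (NumPy assumed).

```python
import numpy as np

def entropy(counts):
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def gain(parent_counts, subset_counts):
    # Information gain of splitting the parent set into the given subsets
    n = sum(sum(c) for c in subset_counts)
    weighted = sum(sum(c) / n * entropy(c) for c in subset_counts)
    return entropy(parent_counts) - weighted

# S = {9 Yes, 5 No}, S_weak = {6, 2}, S_strong = {3, 3}
print(round(gain([9, 5], [[6, 2], [3, 3]]), 3))   # ~0.048
```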
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG)
Ø Gain(S, Outlook) = 0.246
Ø Gain(S, Temperature) = 0.029
Ø Gain(S, Humidity) = 0.151
Ø Gain(S, Wind) = 0.048
=> Choose Outlook, the feature with the highest Gain score
[Tree] Outlook, S = {9+, 5-}: = Sunny → Node 1 (Ssunny = {2+, 3-}); = Overcast → Yes (Sovercast = {4+, 0-}); = Rain → Node 2 (Srain = {3+, 2-})
43
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG)
Refine “Node 1”: which of {Temperature, Humidity, Wind} should be tested?
Gain(Ssunny, Temperature) = 0.571
Gain(Ssunny, Humidity) = 0.971
Gain(Ssunny, Wind) = 0.020
[Tree] Outlook, S = {9+, 5-}: = Sunny → Node 1 (Ssunny = {2+, 3-}); = Overcast → Yes (Sovercast = {4+, 0-}); = Rain → Node 2 (Srain = {3+, 2-})
44
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG)
Refine “Node 1”: which of {Temperature, Humidity, Wind} should be tested?
Gain(Ssunny, Temperature) = 0.571
Gain(Ssunny, Humidity) = 0.971 => choose Humidity
Gain(Ssunny, Wind) = 0.020
[Tree] Outlook: = Sunny → Humidity (= High → No {0+, 3-}, = Normal → Yes {2+, 0-}); = Overcast → Yes (Sovercast = {4+, 0-}); = Rain → Node 2 (Srain = {3+, 2-})
45
3 – Decision Tree
! Constructing Decision Tree: ID3
Ø Information Gain (IG): the completed decision tree
[Tree] Outlook: = Sunny → Humidity (= High → No, = Normal → Yes); = Overcast → Yes; = Rain → Wind (= Strong → No, = Weak → Yes)
Test = <outlook=Sunny, temperature=Hot, humidity=High, wind=Weak> => No
46
3 – Decision Tree
! Overfitting in Decision Trees
Ø Desired: a DT that is not too big in size, yet fits the training data reasonably
47
3 – Decision Tree
! Overfitting in Decision Trees
Ø Mainly two approaches
q Prune while building the tree (Stopping Early)
q Prune after building the tree (Post-Pruning)
Ø Criterion for judging which nodes could potentially be pruned: evaluate on a validation set (see the sketch below)
48
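A minimal sketch of both approaches with scikit-learn's DecisionTreeClassifier (which implements CART rather than ID3, so this only illustrates the pruning idea); max_depth and min_samples_leaf act as early stopping, ccp_alpha performs cost-complexity post-pruning, and the validation score judges the choices, as noted above.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# Prune while building (stopping early): limit depth and leaf size
pre = DecisionTreeClassifier(max_depth=3, min_samples_leaf=5).fit(X_train, y_train)

# Prune after building (post-pruning): cost-complexity pruning
post = DecisionTreeClassifier(ccp_alpha=0.02, random_state=42).fit(X_train, y_train)

# Judge the pruning choices on the validation set
print(pre.score(X_val, y_val), post.score(X_val, y_val))
```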
3 – Decision Tree
! Decision Tree for Iris Dataset
49
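A minimal sketch of fitting a decision tree on the Iris dataset and printing its IF-ELSE rules, assuming scikit-learn (whose DecisionTreeClassifier uses CART); the depth limit is illustrative.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# Print the learned hierarchy of IF-ELSE rules as text
print(export_text(tree, feature_names=list(iris.feature_names)))
```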
SUMMARY
KNN
Ø Predicts based on the K nearest neighbors from the training data, using geometric distance functions
Decision Tree
Ø Builds a decision tree (ID3)
Ø Defined by a hierarchy of rules (in the form of a tree)
50
Thanks!
Any questions?
51