
AI VIETNAM

All-in-One Course

Machine Learning

K-NEAREST NEIGHBORS
DECISION TREE
Nguyen Quoc Thai

1
Year 2023
CONTENT

(1) – K-Nearest Neighbors (KNN)


(2) – KNN Applications
(3) – Decision Tree
(4) – Summary

2
1 – K Nearest Neighbors
! What is the KNN Algorithm?

Ø KNN is one of the simplest supervised machine learning algorithms


Ø Lazy Learning:
q Does not “LEARN” until the test example is given
q A new data point is predicted based on its K-Nearest Neighbors from the training data

3
1 – K Nearest Neighbors
! What is the KNN Algorithm?

Ø Training Phase – Memory-based Learning: the training data is simply stored
Ø Testing Phase: a similarity measure is used to search the stored training data for the K-Nearest Neighbors of the test data, which are then used to predict
Ø Non-Parametric
Ø Solves both Regression and Classification problems

7
1 – K Nearest Neighbors
! What is the KNN Algorithm?

Regression
Ø Predict a continuous value based on the input variables
   e.g. What will be the temperature tomorrow?

Classification
Ø Classify input variables to identify discrete output variables (labels, categories)
   e.g. Will it be hot or cold tomorrow?
8
1 – K Nearest Neighbors
! KNN: Classification Approach

Classification
Ø Classify input variables to identify discrete output variables (labels, categories)
   e.g. Will it be hot or cold tomorrow?

Training Data: (Feature, Label) pairs
Test Data: features only – the label “?” is to be predicted
9
1 – K Nearest Neighbors
! KNN: Classification Approach

Step 1: Look at the data

Training Data
Petal_Length Petal_Width Label
1.4 0.2 0
1.3 0.4 0
4 1 1
4.7 1.4 1

Test Data (new data)
2.4 0.8

10
1 – K Nearest Neighbors
! KNN: Classification Approach

Step 2: Calculate distances

Euclidean Distance: d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Training Data
Petal_Length Petal_Width Label Distance
1.4 0.2 0 1.166
1.3 0.4 0 1.170
4 1 1 1.612
4.7 1.4 1 2.376

Test Data
2.4 0.8
11
1 – K Nearest Neighbors
! KNN: Classification Approach

Step 3: Find neighbours

Training Data
Petal_Length Petal_Width Label Distance Rank
1.4 0.2 0 1.166 1st
1.3 0.4 0 1.170 2nd
4 1 1 1.612 3rd
4.7 1.4 1 2.376 4th

Test Data
2.4 0.8

Find the nearest neighbours by ranking points by increasing distance
12
1 – K Nearest Neighbors
! KNN: Classification Approach

Step 4: Vote on labels

Training Data (K = 3 nearest neighbours)
Petal_Length Petal_Width Label Distance Rank Votes
1.4 0.2 0 1.166 1st class 0: 2 votes
1.3 0.4 0 1.170 2nd
4 1 1 1.612 3rd class 1: 1 vote
4.7 1.4 1 2.376 –

Test Data → predicted label
2.4 0.8 → 0

Vote on the predicted class label based on the classes of the K nearest neighbors
13
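A minimal from-scratch sketch of the four classification steps above, using the toy Petal_Length/Petal_Width data from the slides (the function and variable names are my own):

```python
import numpy as np
from collections import Counter

# Toy training data from the slides: [Petal_Length, Petal_Width] and labels
X_train = np.array([[1.4, 0.2], [1.3, 0.4], [4.0, 1.0], [4.7, 1.4]])
y_train = np.array([0, 0, 1, 1])
x_test = np.array([2.4, 0.8])

def knn_classify(X, y, x, k=3):
    # Step 2: Euclidean distance from x to every training point
    distances = np.sqrt(((X - x) ** 2).sum(axis=1))
    # Step 3: rank points by increasing distance, keep the k nearest
    nearest = np.argsort(distances)[:k]
    # Step 4: majority vote on the labels of the k nearest neighbours
    return Counter(y[nearest]).most_common(1)[0][0]

print(knn_classify(X_train, y_train, x_test, k=3))  # -> 0, as on the slide
```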
1 – K Nearest Neighbors
! KNN: Regression Approach

Step 1: Look at the data

Training Data
Length Width Price
2.0 2.0 2.0
3.0 3.0 2.5
4.0 3.0 3.5
5.0 2.0 5.0
Test Data (new data)
3.0 2.0
14
1 – K Nearest Neighbors
! KNN: Regression Approach

Step 2: Calculate distances

Euclidean Distance: d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Training Data
Length Width Price Distance
2.0 2.0 2.0 1.0
3.0 3.0 2.5 1.0
4.0 3.0 3.5 1.4
5.0 2.0 5.0 2.0

Test Data
3.0 2.0
15
1 – K Nearest Neighbors
! KNN: Regression Approach

Step 3: Find neighbours

Training Data
Length Width Price Distance Rank
2.0 2.0 2.0 1.0 1st
3.0 3.0 2.5 1.0 2nd
4.0 3.0 3.5 1.4 3rd
5.0 2.0 5.0 2.0 4th

Test Data
3.0 2.0

Find the nearest neighbours by ranking points by increasing distance
16
1 – K Nearest Neighbors
! KNN: Regression Approach

Step 4: Average the neighbours’ values

Y_pred = \frac{1}{k} \sum_{i \in N_k} y_i

Training Data (K = 4 nearest neighbours)
Length Width Price Distance Rank
2.0 2.0 2.0 1.0 1st
3.0 3.0 2.5 1.0 2nd
4.0 3.0 3.5 1.4 3rd
5.0 2.0 5.0 2.0 4th

Y_pred = (2.0 + 2.5 + 3.5 + 5.0) / 4 = 3.25

Test Data → predicted price
3.0 2.0 → 3.25

Compute the mean value of the k nearest neighbors
17
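A minimal from-scratch sketch of KNN regression on the toy Length/Width/Price data above (names are my own):

```python
import numpy as np

# Toy training data from the slides: [Length, Width] and Price
X_train = np.array([[2.0, 2.0], [3.0, 3.0], [4.0, 3.0], [5.0, 2.0]])
y_train = np.array([2.0, 2.5, 3.5, 5.0])
x_test = np.array([3.0, 2.0])

def knn_regress(X, y, x, k=4):
    # Euclidean distances, rank by increasing distance, average the k nearest prices
    distances = np.sqrt(((X - x) ** 2).sum(axis=1))
    nearest = np.argsort(distances)[:k]
    return y[nearest].mean()

print(knn_regress(X_train, y_train, x_test, k=4))  # -> 3.25, as on the slide
```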
1 – K Nearest Neighbors
! KNN: Summary
Classification
Regression

18
1 – K Nearest Neighbors
! Geometry Distance Functions

Ø Euclidean (p=2): d(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}

Ø Manhattan (p=1): d(x, y) = \sum_{i=1}^{n} |x_i - y_i|

Ø Minkowski (p-norm): d(x, y) = \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p}

Ø Chebyshev (p=∞): d(x, y) = \lim_{p \to \infty} \left( \sum_{i=1}^{n} |x_i - y_i|^p \right)^{1/p} = \max_i |x_i - y_i|
19
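A small sketch of the distance functions above (a hand-rolled version, not a library call):

```python
import numpy as np

def minkowski(x, y, p):
    # p-norm distance: p=1 gives Manhattan, p=2 gives Euclidean
    return float((np.abs(x - y) ** p).sum() ** (1.0 / p))

def chebyshev(x, y):
    # Limit of the Minkowski distance as p -> infinity
    return float(np.abs(x - y).max())

x, y = np.array([2.4, 0.8]), np.array([1.4, 0.2])
print(minkowski(x, y, p=2))  # Euclidean, ~1.166 as on slide 11
print(minkowski(x, y, p=1))  # Manhattan, 1.6
print(chebyshev(x, y))       # Chebyshev, 1.0
```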
1 – K Nearest Neighbors
! Feature Scaling (Normalization)

Ø Standardize the range of independent variables (the features of the data)
Ø Without scaling, the feature with the larger range (Petal_Length) has a strong influence on the distance

Training Data
Petal_Length L_Distance Petal_Width W_Distance Label Distance Rank
1.4 1.8 0.2 0.4 0 1.844 3
1.3 1.9 0.4 0.2 0 1.910 4
4 0.8 1 0.4 1 0.894 1
4.7 1.5 1.4 0.8 1 1.700 2

Test Data
3.2 0.6
20
1 – K Nearest Neighbors
! Feature Scaling (Normalization)

Ø Standardize the range of independent variables (the features of the data)

Training Data after MinMaxScaler Normalization

Petal_Length L_Distance Petal_Width W_Distance Label Distance Rank


0.03 0.53 0.00 0.33 0 0.624 3
0.00 0.56 0.17 0.16 0 0.582 2
0.79 0.23 0.66 0.33 1 0.402 1
1.00 0.44 1.00 0.67 1 0.801 4
Test Data
0.56 0.33
21
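A minimal sketch of the min-max normalization used above, written out by hand (scikit-learn's MinMaxScaler, named on the slide, gives the same result):

```python
import numpy as np

X_train = np.array([[1.4, 0.2], [1.3, 0.4], [4.0, 1.0], [4.7, 1.4]])
x_test = np.array([3.2, 0.6])

# Min-max scaling: x' = (x - min) / (max - min), with min/max taken from the training data
x_min, x_max = X_train.min(axis=0), X_train.max(axis=0)
X_train_scaled = (X_train - x_min) / (x_max - x_min)
x_test_scaled = (x_test - x_min) / (x_max - x_min)

print(X_train_scaled.round(2))  # matches the scaled Training Data table above
print(x_test_scaled.round(2))   # -> [0.56 0.33]
```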
1 – K Nearest Neighbors
! Searching in KNN

Ø Training dataset: N samples in D dimensions
Ø Brute Force: naïve neighbor search – O[D·N²]

(1) KD Tree – O[D·N·log(N)]    (2) Ball Tree
Source: https://varshasaini.in/kd-tree-and-ball-tree-knn-algorithm/
22
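A short sketch of switching the neighbor-search structure with scikit-learn's NearestNeighbors (scikit-learn is an assumption here; the slide only names the data structures):

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.RandomState(0).rand(1000, 10)   # N = 1000 samples in D = 10 dimensions

# algorithm can be 'brute', 'kd_tree' or 'ball_tree' ('auto' picks one heuristically)
nn = NearestNeighbors(n_neighbors=3, algorithm="kd_tree").fit(X)
distances, indices = nn.kneighbors(X[:1])     # 3 nearest neighbours of the first sample
print(indices, distances)
```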
1 – K Nearest Neighbors
! How to find the optimal value of K in KNN?

Ø Choose K based on the evaluation on the validation set (Accuracy, Error, F-Score,…)

[Figure] Training Error and Validation Error plotted against the K Value (K=3 and K=5 marked); pick the K that minimizes the validation error.
23
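A minimal sketch of choosing K on a validation set, using scikit-learn and the Iris data from the next section (an assumption, since the slide only shows the error curves):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Evaluate several values of K on the held-out validation set and keep the best one
scores = {k: KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).score(X_val, y_val)
          for k in range(1, 16)}
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```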
2 – KNN Applications
! KNN for Regression: “Diabetes” Dataset

Ø Samples: 442
Ø Features: 10
Ø Target range: 25 – 346
Ø R2-Score (Validation)

24
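A minimal sketch of the Diabetes experiment, assuming the scikit-learn dataset and KNeighborsRegressor (the slide does not show the code):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_diabetes(return_X_y=True)         # 442 samples, 10 features, target 25–346
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Scale the features, then fit KNN regression; .score() returns the R2-score
model = make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5))
model.fit(X_train, y_train)
print(model.score(X_val, y_val))              # validation R2-score
```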
2 – KNN Applications
! KNN for Classification: “Iris” Dataset

Ø Samples: 150
Ø Features: 4
Ø Classes: 3 (50 per class)
Ø Accuracy (Validation)

25
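The corresponding classification sketch for Iris, again assuming scikit-learn:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)             # 150 samples, 4 features, 3 classes
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y)

model = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
print(model.score(X_val, y_val))              # validation accuracy
```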
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Ø Samples: 50,000 movie reviews
Ø Classes: 2 – Positive and Negative (25,000 per class)
Ø Metrics: Accuracy, F1-Score

26
https://paperswithcode.com/sota/text-classification-on-imdb
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Text → Label
“I like this movie” → 1
“So bad” → 0
“Boring” → 0
“Love it so much” → 1

Pipeline: Convert text to features (BoW) → KNN Classifier
27
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Ø Bag of Words (BoW)
Ø Document-Level: consider a text as a bag (collection) of words
Ø Represented by a V-dimensional vector (V = vocabulary size)
Ø Each entry is the number of occurrences of the word in the document

Corpus: [dog, bites, man], [man, bites, dog], [dog, eats, meat], [man, eats, food]

Vocabulary
IDX    0     1    2    3    4    5
Token  bites dog  eats food man  meat

Counter
[dog, bites, man] → 1 1 0 0 1 0
28
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Ø Bag of Words (BoW)
Ø Document-Level: consider a text as a bag (collection) of words
Ø Represented by a V-dimensional count vector

[dog, bites, man] 1 1 0 0 1 0

[man, bites, dog] 1 1 0 0 1 0

[dog, eats, meat] 0 1 1 0 0 1

[man, eats, food] 0 0 0 1 1 0

29
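A minimal sketch of building these count vectors with scikit-learn's CountVectorizer (one possible implementation; the slides only describe BoW itself):

```python
from sklearn.feature_extraction.text import CountVectorizer

docs = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]

vectorizer = CountVectorizer()               # word counts per document
X = vectorizer.fit_transform(docs)           # shape: (4 documents, 6 vocabulary tokens)
print(vectorizer.get_feature_names_out())    # ['bites' 'dog' 'eats' 'food' 'man' 'meat']
print(X.toarray())                           # matches the count matrix above
```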
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Ø Bag of Words (BoW)
Ø Out of vocabulary (OOV): words not in the vocabulary (here “and”, “are”, “friends”) are simply ignored

Vocabulary
IDX 0 1 2 3 4 5
Token bites dog eats food man meat

Counter

[dog, bites, man] 1 1 0 0 1 0

[dog, and, dog, are, friends] 0 2 0 0 0 0

30
2 – KNN Applications
! KNN for Text Classification: “IMDB” Dataset

Ø Bag of Words (BoW)


Ø Representation without considering frequency (Binary Representation)
Vocabulary
IDX 0 1 2 3 4 5
Token bites dog eats food man meat

Counter

[dog, and, dog, are, friends] 0 2 0 0 0 0

Convert to binary representation

[dog, and, dog, are, friends] 0 1 0 0 0 0


31
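The same example with the binary representation, using CountVectorizer's binary option (again an assumed implementation):

```python
from sklearn.feature_extraction.text import CountVectorizer

corpus = ["dog bites man", "man bites dog", "dog eats meat", "man eats food"]
vectorizer = CountVectorizer(binary=True).fit(corpus)   # fixed 6-token vocabulary

# The repeated "dog" is encoded as 1 (not 2); OOV words ("and", "are", "friends") are dropped
print(vectorizer.transform(["dog and dog are friends"]).toarray())  # -> [[0 1 0 0 0 0]]
```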
3 – Decision Tree
! Decision Tree

Ø A possible decision tree for the data
Ø Described by sets of IF-ELSE rules

Feature 1
├─ Value 11 → Feature 2
│   ├─ Value 21 → No
│   └─ Value 22 → Yes
├─ Value 12 → Yes
└─ Value 13 → Feature 3
    ├─ Value 31 → No
    └─ Value 32 → Yes
32
3 – Decision Tree
! Decision Tree

Ø Each internal node: tests an attribute X (feature)
Ø Each branch from a node: a value of X
Ø Each leaf node: a predicted value

(Same example tree as the previous slide)
33
3 – Decision Tree
! Decision Tree

Ø Example: Play Tennis dataset

34
3 – Decision Tree
! Decision Tree

Ø Example: Play Tennis dataset

Outlook
├─ Sunny → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Strong → No
    └─ Weak → Yes

What prediction would we make for
<outlook=Sunny, temperature=Hot, humidity=High, wind=Weak> ?
35
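A minimal sketch of this tree written as the IF-ELSE rules mentioned earlier (function and argument names are my own):

```python
def play_tennis(outlook, temperature, humidity, wind):
    # Decision tree from the slide; note that temperature is never tested
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"

# The question on the slide: Sunny outlook and High humidity -> "No"
print(play_tennis("Sunny", "Hot", "High", "Weak"))
```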
3 – Decision Tree
! Constructing Decision Tree: ID3

Why does it make more sense to test Feature 1 first?

(Same example tree as slide 32)
36
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Iterative Dichotomiser 3 (ID3): build the tree Top-Down, choosing the feature to test at each node by Information Gain (which is based on Entropy)

(Same example tree as slide 32)
37
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Entropy is a measure of the randomness/uncertainty of a set
Ø Data: a set S of examples with C classes
Ø The probability vector a = [p_1, p_2, …, p_C] is the class distribution of the set S
Ø Entropy of the set S:
E(S) = -\sum_{i \in C} p_i \log_2 p_i
Ø If a set S of examples has
q Some dominant classes => small entropy of the class distribution
q Equiprobable classes => high entropy of the class distribution

38
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Entropy of the set S:
E(S) = -\sum_{i \in C} p_i \log_2 p_i
Ø Example:
S: 14 examples (9 of class c1, 5 of class c2)
=> E(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) = 0.94

39
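A tiny sketch that reproduces the entropy example above:

```python
import math

def entropy(counts):
    # E(S) = -sum_i p_i * log2(p_i), where p_i are the class proportions of the set
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(round(entropy([9, 5]), 2))   # -> 0.94, the 14-example set on the slide
```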
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Entropy = 0 => all samples belong to class 1 (or all to class 2)
Ø Entropy = 1 => number of samples of class 1 = class 2
Ø Entropy ∈ (0, 1) => number of samples of class 1 ≠ class 2

[Figure] E(S) plotted against the proportion of class 1, ranging from 0 to 1

40
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG) on knowing the value of the feature F in S:

Gain(S, F) = E(S) - \sum_{f \in F} \frac{|S_f|}{|S|} E(S_f)

Ø S_f denotes the subset of elements of S for which feature F has value f

41
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG)
Ø Gain(S, Wind) with Wind: Weak or Strong
S = {9: Yes, 5: No}
S_weak = {6: Yes, 2: No}
S_strong = {3: Yes, 3: No}
=> Gain(S, Wind) = E(S) - (8/14)·E(S_weak) - (6/14)·E(S_strong)
                 = 0.94 - (8/14)·0.81 - (6/14)·1 = 0.048

42
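A small sketch that reproduces the Gain(S, Wind) computation above (helper names are my own):

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, children_counts):
    # Gain(S, F) = E(S) - sum_f (|S_f| / |S|) * E(S_f)
    total = sum(parent_counts)
    remainder = sum(sum(child) / total * entropy(child) for child in children_counts)
    return entropy(parent_counts) - remainder

# S = {9 Yes, 5 No}; S_weak = {6 Yes, 2 No}; S_strong = {3 Yes, 3 No}
print(round(information_gain([9, 5], [[6, 2], [3, 3]]), 3))   # -> 0.048
```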
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG)

Gain(S, Outlook) = 0.246
Gain(S, Temperature) = 0.029
Gain(S, Humidity) = 0.151
Gain(S, Wind) = 0.048
=> Choose Outlook: it has the highest Gain score

Outlook, S = {9+, 5-}
├─ Sunny → Node 1, S_sunny = {2+, 3-}
├─ Overcast → Yes, S_overcast = {4+, 0-}
└─ Rain → Node 2, S_rain = {3+, 2-}

43
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG)

Refine “Node 1”: which of {Temperature, Humidity, Wind} should it test?
Gain(S_sunny, Temperature) = 0.57
Gain(S_sunny, Humidity) = 0.97
Gain(S_sunny, Wind) = 0.02
=> Choose Humidity: it has the highest Gain score

Outlook, S = {9+, 5-}
├─ Sunny → Node 1, S_sunny = {2+, 3-}
├─ Overcast → Yes, S_overcast = {4+, 0-}
└─ Rain → Node 2, S_rain = {3+, 2-}
44
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG)

Humidity has the highest Gain on S_sunny, so “Node 1” becomes a Humidity test

Outlook, S = {9+, 5-}
├─ Sunny, S_sunny = {2+, 3-} → Humidity
│   ├─ High → No, {0+, 3-}
│   └─ Normal → Yes, {2+, 0-}
├─ Overcast → Yes, S_overcast = {4+, 0-}
└─ Rain → Node 2, S_rain = {3+, 2-}


45
3 – Decision Tree
! Constructing Decision Tree: ID3

Ø Information Gain (IG): the resulting tree

Outlook
├─ Sunny → Humidity
│   ├─ High → No
│   └─ Normal → Yes
├─ Overcast → Yes
└─ Rain → Wind
    ├─ Strong → No
    └─ Weak → Yes

Test = <outlook=Sunny, temperature=Hot, humidity=High, wind=Weak> => No


46
3 – Decision Tree
! Overfitting in Decision Trees

Ø Desired: a DT that is not too big in size, yet fits the training data reasonably

47
3 – Decision Tree
! Overfitting in Decision Trees

Ø Mainly two approaches


q Prune while building the tree (Stopping Early)
q Prune after building the tree (Post-Pruning)
Ø Criterion for judging which nodes could potentially be pruned: evaluate on a validation set

48
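A short sketch of post-pruning with scikit-learn's cost-complexity pruning (ccp_alpha), one concrete way to prune after building and judge candidates on a validation set; scikit-learn is an assumption here:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# Candidate pruning strengths along the cost-complexity pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Keep the alpha whose pruned tree scores best on the validation set
def val_score(alpha):
    tree = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)
    return tree.score(X_val, y_val)

best_alpha = max(path.ccp_alphas, key=val_score)
print(best_alpha, val_score(best_alpha))
```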
3 – Decision Tree
! Decision Tree for Iris Dataset

49
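A minimal sketch of fitting a decision tree on the Iris dataset and printing its rules (the slide presumably shows a tree plot; export_text gives a textual version):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=0)
tree.fit(iris.data, iris.target)

# Print the learned hierarchy of IF-ELSE rules
print(export_text(tree, feature_names=list(iris.feature_names)))
```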
SUMMARY
KNN
Ø Predicts based on the K-Nearest Neighbors from the training data, using geometric distance functions

Decision Tree
Ø Built with the ID3 algorithm
Ø Defined by a hierarchy of rules (in the form of a tree)

50
Thanks!
Any questions?

51
