
COS10022 – DATA SCIENCE PRINCIPLES

Dr Pei-Wei Tsai (Lecturer, Unit Convenor)


ptsai@swin.edu.au, EN508d
Week 05: Decision Tree, Ensemble Learning, Random Forest
DECISION TREE
Anatomy of a decision tree: the root and trunk sit at the top, internal decision nodes branch the data, and terminal nodes (leaves) hold the predictions.

Decision Tree characteristics:
◦ Learning type: Supervised Learning.
◦ Processing data types: Discrete and/or Continuous (depending on the algorithm).
◦ Splitting rules: Entropy, Gain Ratio, Chi-Square, Gini Index, etc.
◦ Pruning techniques: Predicting Error Rate, Entire Error Rate (Training and Predicting), or no pruning.

◦ Goal: Predict/classify the result based on the given criteria.

https://p7.hiclipart.com/preview/635/85/121/trunk-tree-stump-bark-tree.jpg, https://miro.medium.com/max/592/0*X-UrBzBeKMnoTY6H.png
Predictors: Outlook, Temp, Humidity, Windy. Target: Outdoor Exercise.

Outlook | Temp | Humidity | Windy | Outdoor Exercise
Rainy   | Hot  | High     | False | No
Rainy   | Hot  | High     | True  | No
Cloudy  | Hot  | High     | False | Yes
Sunny   | Mild | High     | False | Yes
Sunny   | Cool | Normal   | False | Yes
Sunny   | Cool | Normal   | True  | No
Cloudy  | Cool | Normal   | True  | Yes
Rainy   | Mild | High     | False | No
Rainy   | Cool | Normal   | False | Yes
Sunny   | Mild | Normal   | False | Yes
Rainy   | Mild | Normal   | True  | Yes
Cloudy  | Mild | High     | True  | Yes
Cloudy  | Hot  | Normal   | False | Yes
Sunny   | Mild | High     | True  | No

The resulting tree splits on Outlook at the root:
◦ Sunny → split on Windy: False → Yes, True → No.
◦ Cloudy → Yes.
◦ Rainy → split on Humidity: Normal → Yes, High → No.

Decision Tree (2)
◦ Using algorithms such as ID3 allows users to build decision trees easily.
https://medium.com/@rishabhjain_22692/decision-trees-it-begins-here-93ff54ef134
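A minimal sketch (assuming pandas and scikit-learn are available) of fitting a tree to the weather table above. Note that scikit-learn implements an optimised CART with binary splits rather than ID3, but criterion="entropy" uses the same entropy-based purity idea; the ordinal encoding step is a simplification for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier, export_text

# The 14-row weather table from the slide.
data = pd.DataFrame({
    "Outlook":  ["Rainy", "Rainy", "Cloudy", "Sunny", "Sunny", "Sunny", "Cloudy",
                 "Rainy", "Rainy", "Sunny", "Rainy", "Cloudy", "Cloudy", "Sunny"],
    "Temp":     ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                 "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy":    ["False", "True", "False", "False", "False", "True", "True",
                 "False", "False", "False", "True", "True", "False", "True"],
    "Exercise": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

features = ["Outlook", "Temp", "Humidity", "Windy"]
X = OrdinalEncoder().fit_transform(data[features])   # simple categorical encoding
y = data["Exercise"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(export_text(tree, feature_names=features))      # text view of the learned splits
```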
Decision Tree (2)

Outlook  | Wind | Temp | Storage | Sailing
Sunny    | 3    | 30   | Yes     | Yes
Sunny    | 3    | 25   | Yes     | No
Rain     | 12   | 15   | Yes     | Yes
Overcast | 15   | 2    | No      | No
Rain     | 16   | 25   | Yes     | Yes
Sunny    | 14   | 18   | Yes     | Yes
Rain     | 3    | 5    | No      | No
Sunny    | 9    | 20   | Yes     | Yes
Overcast | 14   | 5    | No      | No
Sunny    | 1    | 7    | No      | No
Rain     | 4    | 25   | Yes     | No
Rain     | 14   | 24   | Yes     | Yes
Sunny    | 11   | 20   | Yes     | Yes
Sunny    | 2    | 18   | Yes     | No
Overcast | 8    | 22   | Yes     | Yes
Overcast | 13   | 24   | Yes     | Yes
A Clever Choice of the Attribute for Splitting would Benefit Your Result

◦ How to construct a decision tree? With algorithms such as ID3, CART, etc.
◦ Purpose: split the dataset in a way that the subsets are as pure as possible in terms of the target variable.
◦ How does the algorithm decide when to use which feature in order to split the dataset? With split criteria such as Entropy, Information Gain Ratio, Gini Index, etc.
Entropy

◦ Where can we use Entropy?
◦ Entropy is one of the measurements in information theory, a field of study concerned with quantifying information for communication.
◦ Mathematically, it is defined as the negative of the sum, over all classes, of the probability of each class multiplied by the logarithm of that probability:

Η(p) = − Σ_{i=1}^{n} p_i log2(p_i), for p ∈ ℚ^n

where Η (the Greek capital letter eta) denotes the entropy, p_i is the probability mass function of class i, and n is the number of output states.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Entropy Calculation: Η(p) = − Σ_{i=1}^{n} p_i log2(p_i), for p ∈ ℚ^n

◦ Mixed node (7 of one class, 6 of the other): Η(p) = − (7/13) log2(7/13) − (6/13) log2(6/13) = 0.995
◦ Pure node (13 of one class, 0 of the other): Η(p) = − (13/13) log2(13/13) − (0/13) log2(0/13) = 0 (taking 0·log2(0) = 0)

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Entropy Calculation Example

◦ Calculate the entropy of the target "Sailing" over all observations in the sailing table above (9 Yes and 7 No out of 16):

Η(9+, 7−) = − (9/16) log2(9/16) − (7/16) log2(7/16) = 0.9887
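A minimal sketch (standard library only) of the entropy formula above, reproducing the worked values on these slides:

```python
import math

def entropy(counts):
    """H(p) = -sum(p_i * log2(p_i)), computed from raw class counts."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

print(f"{entropy([7, 6]):.3f}")   # 0.996 (the slide truncates this to 0.995)
print(f"{entropy([13, 0]):.3f}")  # 0.000 (pure node)
print(f"{entropy([9, 7]):.4f}")   # 0.9887 (sailing target: 9 Yes vs 7 No)
```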
Entropy Calculation Example 2

◦ Assume a pool of animals (cats and dogs) is somehow split into two groups.
◦ Let's find the entropy of the original pool (A) and of each group after the split (C and D).

Η_A(p) = − Σ_{i=1}^{2} p_i log2(p_i) = − (6/12) log2(6/12) − (6/12) log2(6/12) = 1
Η_C(p) = − Σ_{i=1}^{2} p_i log2(p_i) = − (5/6) log2(5/6) − (1/6) log2(1/6) = 0.65
Η_D(p) = − Σ_{i=1}^{2} p_i log2(p_i) = − (5/6) log2(5/6) − (1/6) log2(1/6) = 0.65

◦ Entropy reduces when the data in a set is more pure.
◦ The difference between the new and the original entropies is defined as the information gain.
Information Gain – One of the Possible Split Criteria

Η0(p) = − (7/13) log2(7/13) − (6/13) log2(6/13) = 0.995
Η1(p) = − (5/6) log2(5/6) − (1/6) log2(1/6) = 0.65
Η2(p) = − (2/7) log2(2/7) − (5/7) log2(5/7) = 0.863
w1 = 6/13 = 0.462, w2 = 7/13 = 0.539

◦ In the ID3 algorithm (one of the decision tree construction algorithms), the split criterion is the information gain, i.e. the difference of the entropies before and after applying the dataset splitting rule.
◦ Information Gain after the split:
Gain = Η0 − (w1·Η1 + w2·Η2)
     = 0.995 − 0.462 × 0.65 − 0.539 × 0.863
     = 0.995 − 0.766
     = 0.23
◦ Next splitting feature: start from the one with the highest Gain.

Disadvantage: information gain prefers highly represented features.
Solution: use the Gain Ratio in the C4.5 algorithm instead of ID3.
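A minimal sketch of the information-gain computation above (standard library only); the class counts are taken from the split on this slide (parent 7 vs 6, children 5-vs-1 and 2-vs-5).

```python
import math

def entropy(counts):
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

parent = [7, 6]               # H0 ≈ 0.996
children = [[5, 1], [2, 5]]   # H1 ≈ 0.650, H2 ≈ 0.863

n = sum(parent)
weighted = sum(sum(c) / n * entropy(c) for c in children)   # w1*H1 + w2*H2
print(f"Gain = {entropy(parent) - weighted:.2f}")           # Gain = 0.23
```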
C4.5 Algorithm

◦ To overcome the disadvantage of ID3, the C4.5 algorithm introduces the SplitInfo concept with the Gain Ratio.
◦ The SplitInfo is defined as the negative of the sum over the weights multiplied by the logarithm of the weights.
◦ The Gain Ratio is then calculated by dividing the gain from the ID3 algorithm by the SplitInfo:

GainRatio = Gain / SplitInfo = (Η0 − Σ_{i=1}^{k} w_i Η_i) / (− Σ_{i=1}^{k} w_i log2(w_i))

◦ Question: Can Entropy be greater than 1?

https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
CART Algorithm

◦ Another measure of purity (or, more precisely, impurity), used by the CART algorithm, is called the Gini index.
◦ Split criterion in the CART algorithm:

Gini_index = Σ_{i=1}^{n} w_i · Gini_impurity_i

◦ The Gini index is built on the Gini impurity:

Gini_impurity(p) = 1 − Σ_{i=1}^{n} p_i², for p ∈ ℚ^n

◦ In this case (p1 = 7/13, p2 = 6/13 before the split; w1 = 6/13 = 0.462, w2 = 7/13 = 0.539):

Gini_impurity(p) = 1 − (7/13)² − (6/13)² = 0.497
Gini_impurity_1 = 1 − (1/6)² − (5/6)² = 0.278
Gini_impurity_2 = 1 − (2/7)² − (5/7)² = 0.408
Gini_index = (6/13) × Gini_impurity_1 + (7/13) × Gini_impurity_2

◦ Next splitting feature: the feature with the lowest Gini index.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Gini Index Calculation Example

S/N | Gender | Class | Play Computer Games
1   | M      | IV    | Yes
2   | F      | IV    | Yes
3   | F      | IV    | Yes
4   | M      | IV    | No
5   | F      | IV    | No
6   | M      | IV    | Yes
7   | M      | IV    | No
8   | F      | V     | No
9   | M      | V     | Yes
10  | F      | V     | No
11  | M      | V     | Yes
12  | F      | V     | No
13  | F      | V     | No

Total Observations = 13; Play Computer Games = 6 (46%)

◦ Assume that we want to segregate students based on a specific target variable (playing computer games).
◦ We split the population using two input variables: Gender and Class.
◦ Now, we want to identify which split produces more homogeneous sub-nodes using the Gini index.
Split on Gender vs Split on Class (Total Observations = 13; Play Computer Games = 6, 46%)

◦ Split on Gender:
  Male: Students = 6, Play = 4 (67%); Female: Students = 7, Play = 2 (29%)
  Gini_impurity_M = 1 − (0.67² + 0.33²) = 0.44
  Gini_impurity_F = 1 − (0.29² + 0.71²) = 0.41
  Gini_index = (6/13) × 0.44 + (7/13) × 0.41 = 0.42

◦ Split on Class:
  Class IV: Students = 7, Play = 4 (57%); Class V: Students = 6, Play = 2 (33%)
  Gini_impurity_IV = 1 − (0.57² + 0.43²) = 0.49
  Gini_impurity_V = 1 − (0.33² + 0.67²) = 0.44
  Gini_index = (7/13) × 0.49 + (6/13) × 0.44 = 0.47

◦ The Gender split has the lower Gini index, so it is the better split.
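A minimal sketch (standard library only) of the Gini comparison above; the class counts per sub-node are read off the student table.

```python
def gini_impurity(counts):
    """Gini impurity 1 - sum(p_i^2), computed from raw class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_index(groups):
    """Weighted average of the sub-nodes' Gini impurities."""
    n = sum(sum(g) for g in groups)
    return sum(sum(g) / n * gini_impurity(g) for g in groups)

# (play, not play) counts in each sub-node of the student table.
gender_split = [[4, 2], [2, 5]]   # Male: 4 of 6 play; Female: 2 of 7 play
class_split = [[4, 3], [2, 4]]    # Class IV: 4 of 7 play; Class V: 2 of 6 play

print(f"{gini_index(gender_split):.2f}")  # 0.42 -> lower, so Gender is the better split
print(f"{gini_index(class_split):.2f}")   # 0.47
```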
Tree Construction Summary

◦ Which question to ask?
◦ How much does a question help to unmix the labels?
◦ Quantify the amount of uncertainty at a single node using the Gini impurity metric.
◦ Quantify how much a question reduces the uncertainty using Information Gain.
◦ Repeat asking questions until the data cannot be divided further.
General Steps for Tree Construction

◦ Start.
◦ Is the data pure?
  ◦ Yes → create the leaf → stop.
  ◦ No → determine the potential splits → determine the best split (using the splitting criterion, ≤ / >) → split the data → check purity again.

https://www.youtube.com/watch?v=u4kbPtiVVB8
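A minimal, self-contained sketch of this flow (not code from the slides): stop on pure nodes, otherwise pick the split with the highest information gain and recurse. It assumes numerical feature values and binary splits; names such as best_split and build_tree are illustrative.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def best_split(rows, labels):
    """Try every (feature, threshold) pair and keep the one with the highest gain."""
    base, n = entropy(labels), len(rows)
    best = (0.0, None, None)                      # (gain, feature index, threshold)
    for f in range(len(rows[0])):
        for t in sorted({r[f] for r in rows}):
            left = [lab for r, lab in zip(rows, labels) if r[f] <= t]
            right = [lab for r, lab in zip(rows, labels) if r[f] > t]
            if not left or not right:
                continue
            gain = base - (len(left) / n * entropy(left) + len(right) / n * entropy(right))
            if gain > best[0]:
                best = (gain, f, t)
    return best

def build_tree(rows, labels):
    if len(set(labels)) == 1:                     # "Is the data pure?" -> create the leaf
        return labels[0]
    gain, f, t = best_split(rows, labels)         # determine potential splits / best split
    if f is None:                                 # nothing useful to split on -> majority leaf
        return Counter(labels).most_common(1)[0][0]
    left = [(r, lab) for r, lab in zip(rows, labels) if r[f] <= t]
    right = [(r, lab) for r, lab in zip(rows, labels) if r[f] > t]
    return {
        "question": f"feature[{f}] <= {t}",
        "true": build_tree([r for r, _ in left], [lab for _, lab in left]),
        "false": build_tree([r for r, _ in right], [lab for _, lab in right]),
    }

# Tiny usage example: (diameter, is_oval) as numerical features.
rows = [(5, 0), (5, 0), (30, 0), (32, 1), (32, 1)]
labels = ["Golf ball", "Golf ball", "Basketball", "Football", "Football"]
print(build_tree(rows, labels))
```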
Numerical Feature Handling

Gini_impurity(p) = 1 − Σ_{i=1}^{n} p_i², for p ∈ ℚ^n
Gini_index = Σ_{i=1}^{n} w_i · Gini_impurity_i

◦ Is an impurity equal to 0 a good thing in all cases?
◦ If a cluster only contains a single data point (e.g. one branch for each possible numerical value), there is no point in having that cluster.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Numerical Feature Handling (2)

◦ Solution: binary splits.
◦ In the case of numerical features, the split is always a binary split.
◦ Candidate split points: the midpoints between consecutive values.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
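A minimal sketch of the candidate-split-point rule above; the Temp values are taken from the sailing table.

```python
def candidate_split_points(values):
    """Midpoints between consecutive distinct values of a numerical feature."""
    distinct = sorted(set(values))
    return [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]

# Temp column of the sailing table.
temps = [30, 25, 15, 2, 25, 18, 5, 20, 5, 7, 25, 24, 20, 18, 22, 24]
print(candidate_split_points(temps))
# [3.5, 6.0, 11.0, 16.5, 19.0, 21.0, 23.0, 24.5, 27.5]
```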
Right Size of the Tree?

◦ Like many other machine learning algorithms, Decision Trees can potentially overfit the training data.
◦ Too shallow: the model is too simple and doesn't fit the data.
◦ Too deep: the model is too detailed and doesn't generalize to new data.
◦ Solution: Pruning.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Pruning

◦ Goal: a tree that generalizes better.
◦ Idea: cut branches that seem to be the result of overfitting the training set.
◦ Common pruning methods:
  ◦ Reduced Error Pruning: prunes a branch if this doesn't decrease the accuracy for the training set.
  ◦ Minimum Description Length Pruning: prunes a branch if this decreases the description length (#bits(tree) + #bits(misclassified samples)) for the training set.
◦ The description length is defined as the number of bits needed to encode the tree plus the number of bits needed to encode the samples of the training set that have been misclassified.

◦ Pre-pruning: limit the minimum number of samples in a group; limit the depth of the tree.
◦ Post-pruning: let the tree grow first and then prune it back.

https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
ENSEMBLE LEARNING

Ensemble Learning

◦ Why is it necessary to ensemble?
  ◦ Training a single very strong classifier is relatively difficult compared with training multiple weak classifiers.
  ◦ The final result can be obtained by voting or by taking the average of the outputs.
◦ Commonly used methods:
  ◦ Bagging (Bootstrapping)
  ◦ Boosting
  ◦ Stacking

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping

◦ Bootstrap refers to: random sampling with replacement.
◦ Strong points: helps to better understand the bias, the mean, the variance, and the standard deviation of the dataset.
◦ Key operations:
  ◦ Random sampling of a small subset of data from the dataset.
  ◦ Sampling is done with replacement, so the same example can appear more than once.
  ◦ Every example in the dataset has an equal probability of being selected.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping (2)

◦ Assume we have a set of data X = {x_1, x_2, ⋯, x_n} containing n values, where n is not a relatively large number.
◦ If we simply calculate the mean, standard deviation, etc. by taking the whole dataset into account, the result can be biased from the actual situation.

mean(X) = (Σ_{i=1}^{n} x_i) / n

◦ We know that our sample is small and thus the mean has error in it.
◦ The estimation can be improved by bootstrapping.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping (3)

◦ Create Subsamples: create many (e.g. m) random sub-samples from the dataset with replacement (meaning we can select the same value multiple times).
◦ Mean Calculation: calculate the mean of each sub-sample.
◦ Ensemble Results: calculate the average of all collected means and use it as the estimated mean for the data.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping Application Example

◦ Assume we have a small sample dataset X = {13, 21, 17, 22, 4, 5, 9, 28} extracted from the population X̂ = {17, 10, 6, 13, 9, 12, 28, 13, 0, 4, 21, 5, 22, 9, 13, 4, 21, 16, 3, 22}.
◦ You'll find that the mean value of this dataset is mean(X) = (Σ_{i=1}^{n} x_i) / n, where n = 8 in this case, so mean(X) = 14.88.
◦ To calibrate the mean value, we can randomly take m sub-samples. Let's say m = 5 as an example.

Find the means:
X1 = {13, 22, 4, 28}      s1 = mean(X1) = (13 + 22 + 4 + 28) / 4 = 67/4 = 16.75
X2 = {13, 21, 5, 9, 28}   s2 = mean(X2) = (13 + 21 + 5 + 9 + 28) / 5 = 76/5 = 15.2
X3 = {17, 4, 9}           s3 = mean(X3) = (17 + 4 + 9) / 3 = 30/3 = 10
X4 = {21, 22, 5, 28}      s4 = mean(X4) = (21 + 22 + 5 + 28) / 4 = 76/4 = 19
X5 = {13, 17, 4, 9}       s5 = mean(X5) = (13 + 17 + 4 + 9) / 4 = 43/4 = 10.75

◦ Ensemble the final results:
Calibrated Mean = (Σ_{j=1}^{m} s_j) / m = (16.75 + 15.2 + 10 + 19 + 10.75) / 5 = 71.7 / 5 = 14.34
Population Mean = 12.4

◦ Is this always the case?
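A minimal sketch of the bootstrap procedure (standard library only). The hand-worked example above uses five small sub-samples; the sketch below uses the more common convention of drawing many same-sized resamples with replacement. Note that the bootstrap average converges to the sample mean rather than the population mean, which is one way to read the closing question above.

```python
import random

def bootstrap_mean(sample, n_subsamples=1000, seed=0):
    """Average the means of many resamples drawn with replacement."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_subsamples):
        resample = [rng.choice(sample) for _ in range(len(sample))]
        means.append(sum(resample) / len(resample))
    return sum(means) / len(means)

X = [13, 21, 17, 22, 4, 5, 9, 28]
print(sum(X) / len(X))        # plain sample mean: 14.875
print(bootstrap_mean(X))      # bootstrap estimate, close to the sample mean
```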
Bagging

◦ Bagging is also known as Bootstrap Aggregation.
◦ A simple but powerful ensemble method.
◦ It is typically an application of the Bootstrap procedure to decision trees, which are a high-variance machine learning algorithm.
◦ In bagging, training instances can be sampled several times for the same predictor.

https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8
Bagging (2)

◦ Moving on to bagging ensemble learning, the output of each predictor (classifier) is summarised by an aggregation function.
◦ The aggregation function is typically the statistical mode for classification and the average for regression.
◦ It mainly helps to reduce the variance of the predictions.

https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8
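A hedged sketch of bagging decision trees with scikit-learn (assuming version ≥ 1.2, where the base-learner argument is named estimator; the dataset and parameter values are illustrative, not from the slides).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

bagged = BaggingClassifier(
    estimator=DecisionTreeClassifier(),  # the high-variance base learner
    n_estimators=100,                    # number of bootstrap samples / trees
    bootstrap=True,                      # sample training instances with replacement
    random_state=0,
)
print(cross_val_score(bagged, X, y, cv=5).mean())  # accuracy of the aggregated model
```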
Boosting

◦ Boosting algorithms use weighted averages to turn weak learners into strong learners.
◦ Different from bagging, which runs each model independently and aggregates the outputs at the end without preference to any model, boosting is all about "teamwork."
◦ Each model that runs dictates which features the next model will focus on.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Boosting

◦ Data points
  ◦ + and – samples are equally weighted at the beginning. (The weight is denoted by the size of the symbols.)
◦ Box 1
  ◦ The first decision stump (D1) generates a vertical line to classify the data points.
  ◦ The three + on the right side are misclassified, so higher weights are assigned to them and the classification process is handed over to D2.
◦ Box 2
  ◦ Since those three + now have higher weights, the second decision stump (D2) tries to predict them correctly.
  ◦ A vertical line on the right side of Box 2 classifies the three previously misclassified + correctly.
  ◦ But this in turn causes misclassification errors on the three – on the left side, so those three – gain weight for D3.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Boosting

◦ Box 3
  ◦ Since the three misclassified – are given higher weights, a decision stump (D3) is applied to predict these misclassified observations correctly.
  ◦ This time, a horizontal line is generated to classify + and – based on the higher weight of the misclassified observations.
◦ Box 4
  ◦ By combining D1, D2, and D3, a strong predictor is generated, which has complex rules compared to any individual weak learner.

https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
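A hedged sketch of the re-weighting scheme above, using AdaBoost with decision stumps in scikit-learn (assuming version ≥ 1.2 for the estimator argument; dataset and parameters are illustrative).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

stump = DecisionTreeClassifier(max_depth=1)   # a one-split weak learner ("decision stump")
boosted = AdaBoostClassifier(estimator=stump, n_estimators=50, random_state=0)
print(cross_val_score(boosted, X, y, cv=5).mean())  # accuracy of the boosted ensemble
```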
Stacking

◦ Stacking often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model to output a prediction based on the different weak models' predictions.
◦ In recent years, a more modern approach is to create an ensemble of a well-chosen collection of strong yet diverse models.

https://www.researchgate.net/profile/Mahsa_Soufineyestani/publication/326224119/figure/download/fig1/AS:645263400112129@1530854191613/Stacking-model-1-S1.png
Stacking

◦ On the popular data science competition site Kaggle (https://www.kaggle.com/), you can explore numerous winning solutions through its discussion forums to get a flavour of the state of the art.
◦ Another popular data science competition is the KDD Cup (http://www.kdd.org/kdd-cup). The figure shown on the left is the winning solution for the 2015 competition, which used a three-stage stacked modelling approach.

https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/
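A hedged sketch of stacking with scikit-learn's StackingClassifier; the choice of base learners and the logistic-regression meta-model are illustrative assumptions, not the competition-winning pipeline mentioned above.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier, StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)

stack = StackingClassifier(
    estimators=[("rf", RandomForestClassifier(random_state=0)),   # heterogeneous base learners
                ("svm", SVC(probability=True, random_state=0)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),            # the meta-model
)
print(cross_val_score(stack, X, y, cv=5).mean())
```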
RANDOM FOREST

Where to use Random Forest?

◦ Remote Sensing: Enhanced Thematic Mapper (ETM) devices are used to acquire images of the earth's surface. The accuracy is higher and the training time is shorter compared with other ML methods.
◦ Object Detection: multiclass object detection is done using Random Forest algorithms, which provide better detection in complicated environments.
◦ Gaming consoles (object detection and tracking): Random Forest is used in the KINECT game console to track body movements and recreate them in the game.

Application of Random Forest

◦ Kinect pipeline: training set for identifying body parts → classifier learning → mapping / identifying body parts.

http://x-tech.am/wp-content/uploads/2013/10/blog11-884x442.jpg
Where does Random Forest fit in Data Science?

◦ Machine Learning splits into Supervised Learning, Unsupervised Learning, and Reinforcement Learning.
◦ Supervised Learning splits into Classification and Regression.
◦ Random Forest sits under Supervised Learning and can be used for both classification and regression.
Why use Random Forest?

◦ Reduced overfitting
  ◦ Using multiple trees reduces the risk of overfitting.
  ◦ Less training time.
◦ High accuracy
  ◦ Runs efficiently on large datasets.
  ◦ Produces highly accurate predictions on large datasets.
◦ Estimates missing data
  ◦ Random Forest can maintain accuracy when a large proportion of the data is missing.
What is Random Forest?

Tree 1, Tree 2, Tree 3 → Output 1, Output 2, Output 3 → Majority → Final Decision

◦ Definition: Random Forest (or Random Decision Forest) is a method that operates by constructing multiple Decision Trees during the training phase.
◦ Operation: it is an application of bagging (bootstrap aggregation), not stacking; no meta-model is trained. Each tree is built on a resampled subset of the data, and the trees' outputs are combined by majority vote.
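A hedged sketch of training a random forest with scikit-learn (dataset and parameter values are illustrative, not from the slides).

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(
    n_estimators=100,       # number of trees voting on the final decision
    max_features="sqrt",    # random subset of features considered at each split
    random_state=0,
).fit(X_train, y_train)

print(forest.score(X_test, y_test))   # accuracy of the majority vote
print(forest.predict(X_test[:1]))     # prediction for a single new sample
```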
HOW DOES RANDOM FOREST WORK?

Let’s Review How Decision Tree Works, First.

Classification criteria:
S/N | Condition
1   | Colour == Brown?
2   | Diameter < 8?
3   | Colour == White?
4   | Diameter == 30?

Training sample (excerpt; balls labelled Golf ball, Basketball, or Football):
Colour | Diameter | Shape | Label
White  | 5        | Round | Golf ball
White  | 5        | Round | Golf ball
Brown  | 30       | Round | Basketball
Brown  | 32       | Oval  | Football
Brown  | 32       | Oval  | Football
White  | 5        | Round | Golf ball
White  | 5        | Round | Golf ball
Brown  | 30       | Round | Basketball
Brown  | 30       | Round | Basketball
Brown  | 32       | Oval  | Football
Let’s Review How Decision Tree Works, First.

Entropy of the whole training set of 16 balls (7 golf balls, plus 4 and 5 of the other two classes):
Η0(p) = − Σ_{i=1}^{3} p_i log2(p_i) = − (7/16) log2(7/16) − (4/16) log2(4/16) − (5/16) log2(5/16) = 1.55 > 1

First question: Is the diameter <= 8?
◦ True branch (the 7 golf balls): Η11(p) = − (7/7) log2(7/7) = 0
◦ False branch (the 9 basketballs and footballs): Η12(p) = − (4/9) log2(4/9) − (5/9) log2(5/9) = 0.99

Candidate conditions:
S/N | Condition
1   | Colour == Brown?
2   | Diameter < 8?
3   | Colour == White?
4   | Shape == Oval?
Let’s Review How Decision Tree Works, First.

◦ First split: Is the diameter <= 8?
  ◦ True → Golf ball.
  ◦ False → second split: Shape == Oval?
    ◦ True → Football.
    ◦ False → Basketball.
◦ Now we have 3 baskets, whose entropies are all 0, with 100% prediction accuracy.
How Does Random Forest Work?
Tree 2

◦ Root question: Colour == Brown?
◦ The Brown branch is split further by Shape == Oval? (Oval → Football, Round → Basketball); balls that are not Brown are Golf balls.
How Does Random Forest Work?
Tree 3

◦ Root question: Colour == White?
◦ White balls are Golf balls; the non-White branch is split further by Shape == Oval? (Oval → Football, Round → Basketball).
How Does Random Forest Work?

◦ Assume we have a new test data point with no colour information.
◦ Let the default answer be "false" if a particular classifying criterion cannot be judged.
◦ Let's guess what this input will be classified as.
How Does Random Forest Work?

◦ Tree 1 starts from "Is the diameter <= 8?", Tree 2 from "Colour == Brown?", and Tree 3 from "Colour == White?"; each then asks "Shape == Oval?" where needed.
◦ Each tree classifies the test data on its own; the final decision is taken by a Majority Vote over the three outputs.
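A minimal sketch of the majority-vote step; the three tree outputs below are hypothetical placeholders, not the slides' actual predictions for the colour-less test ball.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine the individual trees' outputs into the forest's final decision."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical outputs of the three trees for the test ball:
print(majority_vote(["Golf ball", "Football", "Football"]))  # -> Football
```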
WEEK 05 SUMMARY

◦ Decision Tree: Entropy, Information Gain, Gini Index, tree creation, pruning.
◦ Ensemble Learning: Bagging, Boosting, Stacking.
◦ Random Forest: definition, majority vote.
