COS10022 DSP Week05 Decision Tree and Random Forest
Decision Tree
Ensemble Learning
Random Forest
DECISION TREE
Decision Tree
[Tree anatomy figure: Root, Trunk, Decision Node]
Learning Type: Supervised Learning
Processing data types: Discrete, Continuous (depending on the algorithm)
Splitting Rules: Entropy, Gain Ratio, Chi-Square Test, Gini Index, etc.
Pruning Techniques: Predicting Error Rate, Entire Error Rate (Training and Predicting), No pruning
◦ Goal: Predict/classify the result based on the given criteria.
https://p7.hiclipart.com/preview/635/85/121/trunk-tree-stump-bark-tree.jpg, https://miro.medium.com/max/592/0*X-UrBzBeKMnoTY6H.png
Predictors: Outlook, Temp, Humidity, Windy. Target: Outdoor Exercise.

Outlook  Temp  Humidity  Windy  Outdoor Exercise
Rainy    Hot   High      False  No
Rainy    Hot   High      True   No
Cloudy   Hot   High      False  Yes
Sunny    Mild  High      False  Yes
Sunny    Cool  Normal    False  Yes
Sunny    Cool  Normal    True   No
Cloudy   Cool  Normal    True   Yes
Rainy    Mild  High      False  No
Rainy    Cool  Normal    False  Yes
Sunny    Mild  Normal    False  Yes
Rainy    Mild  Normal    True   Yes
Cloudy   Mild  High      True   Yes
Cloudy   Hot   Normal    False  Yes
Sunny    Mild  High      True   No

Resulting decision tree: Outlook is the root node. Sunny → Windy (False: Yes, True: No); Cloudy → Yes; Rainy → Humidity (Normal: Yes, High: No).
Decision Tree (2)
◦ Using algorithms such as ID3 allows users to build decision trees easily.
https://medium.com/@rishabhjain_22692/decision-trees-it-begins-here-93ff54ef134
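Below is a minimal sketch (not from the slides) of building such a tree with scikit-learn on the Outdoor Exercise table above; the one-hot encoding step and the new-day query are illustrative assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# The Outdoor Exercise table from the previous slide
data = pd.DataFrame({
    "Outlook":  ["Rainy", "Rainy", "Cloudy", "Sunny", "Sunny", "Sunny", "Cloudy",
                 "Rainy", "Rainy", "Sunny", "Rainy", "Cloudy", "Cloudy", "Sunny"],
    "Temp":     ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                 "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity": ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                 "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy":    [False, True, False, False, False, True, True,
                 False, False, False, True, True, False, True],
    "Exercise": ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                 "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

X = pd.get_dummies(data.drop(columns="Exercise"))   # one-hot encode the categorical predictors
y = data["Exercise"]

tree = DecisionTreeClassifier(criterion="entropy", random_state=0)  # entropy-based splits, as in ID3
tree.fit(X, y)

# Classify a new day (Sunny, Cool, High humidity, not windy)
new_day = pd.get_dummies(pd.DataFrame([{"Outlook": "Sunny", "Temp": "Cool",
                                        "Humidity": "High", "Windy": False}]))
new_day = new_day.reindex(columns=X.columns, fill_value=0)
print(tree.predict(new_day))
```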
Outlook  Wind  Temp  Storage  Sailing
Sunny    3     30    Yes      Yes
A Clever Choice of the Attribute for Splitting Would Benefit Your Result
• Algorithms: ID3, CART, etc.
• Goal: splitting the dataset in a way that the subsets are as pure as possible in terms of the target variable.
• Purity measures: Entropy, Information Gain Ratio, Gini Index, etc.
Entropy
◦ Where can we use Entropy?
◦ Entropy is one of the measurements in information theory, a field of study concerned with quantifying information for communication.
◦ $H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$, for $p \in \mathbb{Q}^n$
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
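A minimal sketch of the formula above in Python (the helper name and the probability lists are illustrative):

```python
import math

def entropy(probs):
    """H(p) = -sum(p_i * log2(p_i)); terms with p_i equal to 0 or 1 contribute nothing."""
    return sum(-p * math.log2(p) for p in probs if 0 < p < 1)

print(entropy([7/13, 6/13]))    # ~0.9957, a mixed two-class node
print(entropy([13/13, 0/13]))   # 0, a pure node
```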
Entropy Calculation
$H(p) = -\sum_{i=1}^{n} p_i \log_2 p_i$ for $p \in \mathbb{Q}^n$
◦ Mixed node: $H(p) = -\frac{7}{13}\log_2\frac{7}{13} - \frac{6}{13}\log_2\frac{6}{13} = 0.995$
◦ Pure node: $H(p) = -\frac{13}{13}\log_2\frac{13}{13} - \frac{0}{13}\log_2\frac{0}{13} = 0$
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Entropy Calculation Example
◦ Calculate the Entropy for going sailing from all observations.

Outlook   Wind  Temp  Storage  Sailing
Sunny     3     30    Yes      Yes
Sunny     3     25    Yes      No
Rain      12    15    Yes      Yes
Overcast  15    2     No       No
Rain      16    25    Yes      Yes
Sunny     14    18    Yes      Yes
Rain      3     5     No       No
Sunny     9     20    Yes      Yes
Overcast  14    5     No       No
Sunny     1     7     No       No
Rain      4     25    Yes      No
Rain      14    24    Yes      Yes
Sunny     11    20    Yes      Yes
Sunny     2     18    Yes      No
Overcast  8     22    Yes      Yes
Overcast  13    24    Yes      Yes

◦ $H(9_+, 7_-) = -\frac{9}{16}\log_2\frac{9}{16} - \frac{7}{16}\log_2\frac{7}{16} = 0.9887$
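A quick check of the arithmetic above (self-contained so it runs on its own):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if 0 < p < 1)

print(round(entropy([9/16, 7/16]), 4))   # 0.9887, matching the calculation above
```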
Entropy Calculation Example 2
◦ Assume a pool of animals (cats and dogs) is somehow split into two groups.
◦ Let's find the entropy for the whole pool and for each group.
◦ Whole pool (6 cats, 6 dogs): $H_A(p) = -\sum_{i=1}^{2} p_i \log_2 p_i = -\frac{6}{12}\log_2\frac{6}{12} - \frac{6}{12}\log_2\frac{6}{12} = 1$
◦ Group C (5 vs. 1): $H_C(p) = -\sum_{i=1}^{2} p_i \log_2 p_i = -\frac{5}{6}\log_2\frac{5}{6} - \frac{1}{6}\log_2\frac{1}{6} = 0.65$
◦ Group D (5 vs. 1): $H_D(p) = -\sum_{i=1}^{2} p_i \log_2 p_i = -\frac{5}{6}\log_2\frac{5}{6} - \frac{1}{6}\log_2\frac{1}{6} = 0.65$
Information Gain – One of the Possible Split Criteria
◦ In the ID3 algorithm (one of the decision tree construction algorithms), the split criterion is the information gain: the difference between the entropy before and the weighted entropy after applying the dataset splitting rule.
◦ Entropy before the split: $H_0(p) = -\frac{7}{13}\log_2\frac{7}{13} - \frac{6}{13}\log_2\frac{6}{13} = 0.995$
◦ Entropies of the two subsets after the split:
  $H_1(p) = -\frac{5}{6}\log_2\frac{5}{6} - \frac{1}{6}\log_2\frac{1}{6} = 0.65$, with weight $w_1 = \frac{6}{13} = 0.462$
  $H_2(p) = -\frac{2}{7}\log_2\frac{2}{7} - \frac{5}{7}\log_2\frac{5}{7} = 0.863$, with weight $w_2 = \frac{7}{13} = 0.539$
◦ Information Gain after the split:
  $Gain = H_0 - (w_1 H_1 + w_2 H_2) = 0.995 - 0.462 \times 0.65 - 0.539 \times 0.863 = 0.995 - 0.766 = 0.23$
◦ Next splitting feature: start from the one with the highest Gain.
◦ Disadvantage: prefers highly represented features. Solution: use the Gain Ratio in the C4.5 algorithm to replace ID3.
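As a rough check, the same gain can be computed in a few lines of Python (values taken from this slide):

```python
import math

def entropy(probs):
    return sum(-p * math.log2(p) for p in probs if 0 < p < 1)

h0 = entropy([7/13, 6/13])       # entropy before the split, ~0.995
h1 = entropy([5/6, 1/6])         # subset 1, ~0.650
h2 = entropy([2/7, 5/7])         # subset 2, ~0.863
w1, w2 = 6/13, 7/13              # subset weights
gain = h0 - (w1 * h1 + w2 * h2)
print(round(gain, 2))            # ~0.23
```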
C4.5 Algorithm
◦ To overcome the disadvantage of ID3, the C4.5 algorithm introduces the SplitInfo concept with the Gain Ratio.
◦ The SplitInfo is defined as the negative sum over the weights multiplied by the logarithm of the weights.
◦ The Gain Ratio measure is then calculated by dividing the gain from the ID3 algorithm by the SplitInfo:
  $GainRatio = \frac{Gain}{SplitInfo} = \frac{H_0 - \sum_{i=1}^{k} w_i H_i}{-\sum_{i=1}^{k} w_i \log_2 w_i}$
◦ Question: Can Entropy be greater than 1?
https://towardsdatascience.com/entropy-how-decision-trees-make-decisions-2946b9c18c8
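A corresponding sketch of the Gain Ratio for the same split (the negative sign keeps SplitInfo positive; the weights are the ones from the previous slide):

```python
import math

weights = [6/13, 7/13]                                    # w_1, w_2 from the previous slide
gain = 0.23                                               # information gain computed above
split_info = -sum(w * math.log2(w) for w in weights)      # ~0.996
print(round(split_info, 3), round(gain / split_info, 3))  # gain ratio ~0.231
```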
CART Algorithm
◦ Another measurement of the purity (or, actually, the impurity), which is used by the CART algorithm, is called the Gini index.
◦ Gini impurity of a node: $Gini_{impurity}(p) = 1 - \sum_{i=1}^{n} p_i^2$
◦ For the whole set, with $p_1 = \frac{7}{13}$ and $p_2 = \frac{6}{13}$:
  $Gini_{impurity}(p) = 1 - \frac{7^2}{13^2} - \frac{6^2}{13^2} = 0.497$
◦ Split criterion in the CART algorithm: $Gini_{index} = \sum_{i=1}^{n} w_i \, Gini_{impurity_i}$
◦ In this case:
  $Gini_{impurity_1} = 1 - \frac{1^2}{6^2} - \frac{5^2}{6^2} = 0.278$, with $w_1 = \frac{6}{13} = 0.462$
  $Gini_{impurity_2} = 1 - \frac{2^2}{7^2} - \frac{5^2}{7^2} = 0.408$, with $w_2 = \frac{7}{13} = 0.539$
  $Gini_{index} = \frac{6}{13} \times Gini_{impurity_1} + \frac{7}{13} \times Gini_{impurity_2} = 0.348$
◦ Next splitting feature: the feature with the lowest $Gini_{index}$.
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
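The same Gini figures can be reproduced with a short sketch:

```python
def gini_impurity(probs):
    return 1 - sum(p * p for p in probs)

g0 = gini_impurity([7/13, 6/13])          # whole set, ~0.497
g1 = gini_impurity([1/6, 5/6])            # subset 1, ~0.278
g2 = gini_impurity([2/7, 5/7])            # subset 2, ~0.408
gini_index = (6/13) * g1 + (7/13) * g2    # weighted impurity of this split
print(round(g0, 3), round(g1, 3), round(g2, 3), round(gini_index, 3))
```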
Gini Index Calculation Example
◦ Assume that we want to segregate students based on a specific target variable (playing computer games).
◦ We split the population using two input variables: Gender and Class.
◦ Now, we want to identify which split produces more homogeneous sub-nodes using the Gini index.

S/N  Gender  Class  Play Computer Games
1    M       IV     Yes
2    F       IV     Yes
3    F       IV     Yes
4    M       IV     No
5    F       IV     No
6    M       IV     Yes
7    M       IV     No
8    F       V      No
9    M       V      Yes
10   F       V      No
11   M       V      Yes
12   F       V      No
13   F       V      No

Total Observations = 13
Play Computer Games = 6 (46%)
Split on Gender vs. Split on Class
(Using the student table above: Total Observations = 13, Play Computer Games = 6 (46%).)

Split on Gender:
  Male: Students = 6, Play = 4 (67%)
  Female: Students = 7, Play = 2 (29%)
  $Gini_{impurity_M} = 1 - (0.67^2 + 0.33^2) = 0.44$
  $Gini_{impurity_F} = 1 - (0.29^2 + 0.71^2) = 0.41$
  $Gini_{index} = \frac{6}{13} \times 0.44 + \frac{7}{13} \times 0.41 = 0.42$

Split on Class:
  Class IV: Students = 7, Play = 4 (57%)
  Class V: Students = 6, Play = 2 (33%)
  $Gini_{impurity_{IV}} = 1 - (0.57^2 + 0.43^2) = 0.49$
  $Gini_{impurity_V} = 1 - (0.33^2 + 0.67^2) = 0.44$
  $Gini_{index} = \frac{7}{13} \times 0.49 + \frac{6}{13} \times 0.44 = 0.47$

Better Split: Gender (lower Gini index).
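The comparison can be verified with a few lines (using the exact fractions from the table rather than the rounded percentages):

```python
def gini_impurity(probs):
    return 1 - sum(p * p for p in probs)

# Split on Gender: Male 4/6 play, Female 2/7 play
gini_gender = (6/13) * gini_impurity([4/6, 2/6]) + (7/13) * gini_impurity([2/7, 5/7])
# Split on Class: Class IV 4/7 play, Class V 2/6 play
gini_class = (7/13) * gini_impurity([4/7, 3/7]) + (6/13) * gini_impurity([2/6, 4/6])
print(round(gini_gender, 3), round(gini_class, 3))   # ~0.425 vs ~0.469: Gender gives the purer split
```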
Tree Construction Summary
Quantify the amount of uncertainty at a single node using the Gini impurity metric.
Quantify how much a question reduces the uncertainty using Information Gain.
Repeat asking questions until the data cannot be divided further.
General Steps for Tree Construction
[Flowchart: Start → evaluate the splitting criterion at each node → if the criterion is met (>), split and repeat; otherwise (≤), stop.]
https://www.youtube.com/watch?v=u4kbPtiVVB8
Numerical Feature Handling
$Gini_{impurity}(p) = 1 - \sum_{i=1}^{n} p_i^2$ for $p \in \mathbb{Q}^n$
$Gini_{index} = \sum_{i=1}^{n} w_i \, Gini_{impurity_i}$
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Numerical Feature Handling
◦ Solution: Binary splits.
◦ In the case of numerical features, the split is always a binary split.
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
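A minimal sketch of the binary-split idea for a numerical feature: candidate thresholds are placed between consecutive sorted values and the one with the lowest weighted Gini impurity is kept. The values below are the Wind column and Sailing label from the first eight rows of the earlier sailing table.

```python
def gini_impurity(labels):
    n = len(labels)
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    return 1 - sum((c / n) ** 2 for c in counts.values())

def best_binary_split(values, labels):
    pairs = sorted(zip(values, labels))
    best_thr, best_gini = None, float("inf")
    for i in range(1, len(pairs)):
        thr = (pairs[i - 1][0] + pairs[i][0]) / 2        # midpoint between consecutive values
        left = [lab for val, lab in pairs if val <= thr]
        right = [lab for val, lab in pairs if val > thr]
        if not left or not right:
            continue
        gini = (len(left) * gini_impurity(left) + len(right) * gini_impurity(right)) / len(pairs)
        if gini < best_gini:
            best_thr, best_gini = thr, gini
    return best_thr, best_gini

wind = [3, 3, 12, 15, 16, 14, 3, 9]                           # Wind (first eight rows)
sail = ["Yes", "No", "Yes", "No", "Yes", "Yes", "No", "Yes"]  # Sailing (first eight rows)
print(best_binary_split(wind, sail))
```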
Right Size of the Tree?
◦ Like many other machine learning algorithms, Decision Trees are subject to potentially overfitting the training data.
◦ Too shallow: a too-simple model that doesn't fit the data.
◦ Too deep: a too-detailed model that doesn't generalize to new data.
◦ Solution: Pruning.
https://www.knime.com/knime-introductory-course/chapter6/section3/decision-tree
Pruning
◦ Goal: a tree that generalizes better.
◦ Idea: cut branches that seem to be a result of overfitting the training set.
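One concrete way to prune is sketched below, using scikit-learn's cost-complexity pruning rather than the error-rate methods listed earlier; the dataset and the alpha value are illustrative assumptions.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full_tree = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned_tree = DecisionTreeClassifier(random_state=0, ccp_alpha=0.02).fit(X, y)  # cut weak branches

# The pruned tree has fewer nodes and usually generalizes better
print(full_tree.tree_.node_count, pruned_tree.tree_.node_count)
```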
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping (2)
◦ Assume we have a set of data $X = \{x_1, x_2, \cdots, x_n\}$ containing $n$ values, where $n$ is not a relatively large number.
◦ If we simply calculate the mean, standard deviation, etc. by taking the whole dataset into account, the result can be biased from the actual situation.
◦ $mean(X) = \frac{\sum_{i=1}^{n} x_i}{n}$
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping (3)
◦ Create Subsamples: create many (e.g. $m$) random sub-samples from the dataset with replacement (meaning we can select the same value multiple times).
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Bootstrapping Application Example
◦ Assume we have a small sample dataset $X = \{13, 21, 17, 22, 4, 5, 9, 28\}$ extracted from the population $\{17, 10, 6, 13, 9, 12, 28, 13, 0, 4, 21, 5, 22, 9, 13, 4, 21, 16, 3, 22\}$.
◦ You'll find that the mean value of this dataset is $mean(X) = \frac{\sum_{i=1}^{n} x_i}{n}$, where $n = 8$ in this case, and $mean(X) = 14.88$.
◦ To calibrate the mean value, we can randomly take $m$ sub-samples. Let's say $m = 5$ as an example. Find the means:
  $X_1 = \{13, 22, 4, 28\}$: $s_1 = mean(X_1) = \frac{13 + 22 + 4 + 28}{4} = \frac{67}{4} = 16.75$
  $X_2 = \{13, 21, 5, 9, 28\}$: $s_2 = mean(X_2) = \frac{13 + 21 + 5 + 9 + 28}{5} = \frac{76}{5} = 15.2$
  $X_3 = \{17, 4, 9\}$: $s_3 = mean(X_3) = \frac{17 + 4 + 9}{3} = \frac{30}{3} = 10$
  $X_4 = \{21, 22, 5, 28\}$: $s_4 = mean(X_4) = \frac{21 + 22 + 5 + 28}{4} = \frac{76}{4} = 19$
  $X_5 = \{13, 17, 4, 9\}$: $s_5 = mean(X_5) = \frac{13 + 17 + 4 + 9}{4} = \frac{43}{4} = 10.75$
◦ Ensemble the final results:
  Calibrated Mean $= \frac{\sum_{j=1}^{m} s_j}{m} = \frac{16.75 + 15.2 + 10 + 19 + 10.75}{5} = \frac{71.7}{5} = 14.34$
  Population Mean $= 12.4$
◦ Is this always the case?
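A minimal sketch of the same procedure in Python; here each subsample is drawn with replacement and, by the usual convention, has the same size as the original sample (the slide's subsamples are smaller, which is also allowed):

```python
import random

random.seed(42)
X = [13, 21, 17, 22, 4, 5, 9, 28]          # the small sample from the slide
m = 5                                      # number of bootstrap subsamples

means = []
for _ in range(m):
    subsample = random.choices(X, k=len(X))        # sampling with replacement
    means.append(sum(subsample) / len(subsample))

print(round(sum(means) / m, 2))            # the calibrated (bootstrap) estimate of the mean
```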
Bagging
◦ Bagging is also known as Bootstrap Aggregation.
◦ A simple but powerful ensemble method.
◦ It is typically an application of the Bootstrap procedure to the decision tree, which is a high-variance machine learning algorithm.
◦ In bagging, training instances can be sampled several times for the same predictor.
https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8
Bagging (2)
◦ Moving on to bagging ensemble learning, the output of each predictor (classifier) is summarised by the aggregation function.
◦ The aggregation function is typically the statistical mode for classification and the average for regression.
◦ It mainly helps to reduce variance: the aggregated model is more stable than any single tree.
https://medium.com/machine-learning-through-visuals/machine-learning-through-visuals-part-1-what-is-bagging-ensemble-learning-432059568cc8
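A rough bagging sketch (not the article's code): bootstrap the training set, grow one decision tree per subsample, and aggregate the predictions by majority vote. The Iris dataset and the number of trees are illustrative choices.

```python
import numpy as np
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
rng = np.random.default_rng(0)

n_trees = 25
trees = []
for _ in range(n_trees):
    idx = rng.integers(0, len(X), size=len(X))   # bootstrap sample: indices drawn with replacement
    trees.append(DecisionTreeClassifier().fit(X[idx], y[idx]))

def bagged_predict(x):
    votes = [int(tree.predict([x])[0]) for tree in trees]
    return Counter(votes).most_common(1)[0][0]   # statistical mode = majority vote

print(bagged_predict(X[0]), y[0])
```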
Boosting
◦ Boosting algorithms utilise weighted averages to turn weak learners into strong learners.
◦ Different from bagging, which runs each model independently before aggregating the outputs at the end without preference to any model, boosting is all about "teamwork."
◦ Each model that runs in boosting dictates what features the next model will focus on.
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Boosting
◦ Data points
  ◦ + and – samples are equally weighted at the beginning. (The weight is denoted by the size of the symbols.)
◦ Box 1
  ◦ The first decision stump (D1) generates a vertical line to classify the data points.
  ◦ The three + on the right side are mis-classified, so higher weights are assigned to them and the classification process is handed over to D2.
◦ Box 2
  ◦ Since the three + now have higher weights, the second decision stump (D2) tries to predict them correctly.
  ◦ A vertical line on the right side of Box 2 classifies the three previously mis-classified + correctly.
  ◦ But again, it causes mis-classification errors on the three – on the left side. Thus, those three – gain weight in D3.
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
Boosting
◦ Box 3
  ◦ Since the three mis-classified – are given higher weights, a decision stump (D3) is applied to predict these mis-classified observations correctly.
  ◦ This time, a horizontal line is generated to classify + and – based on the higher weights of the mis-classified observations.
◦ Box 4
  ◦ By combining D1, D2, and D3, a strong predictor, which has complex rules compared to the individual weak learners, is generated.
https://becominghuman.ai/ensemble-learning-bagging-and-boosting-d20f38be9b1e
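A simplified AdaBoost-style sketch of the re-weighting idea in the boxes above: each round fits a decision stump using the current sample weights, then increases the weights of the points it mis-classified. The dataset and the number of rounds are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=5, random_state=0)
y = np.where(y == 1, 1, -1)                 # use +/- labels, as in the boxes above

n_rounds = 5
w = np.full(len(X), 1 / len(X))             # all samples equally weighted at the beginning
stumps, alphas = [], []

for _ in range(n_rounds):
    stump = DecisionTreeClassifier(max_depth=1)          # a decision stump (D1, D2, ...)
    stump.fit(X, y, sample_weight=w)
    pred = stump.predict(X)
    err = np.sum(w[pred != y]) / np.sum(w)               # weighted error rate
    alpha = 0.5 * np.log((1 - err) / (err + 1e-10))      # stump's weight in the final vote
    w = w * np.exp(-alpha * y * pred)                    # mis-classified points gain weight
    w = w / w.sum()
    stumps.append(stump)
    alphas.append(alpha)

# Strong predictor: weighted vote over the weak stumps (D1, D2, ... combined)
strong = np.sign(sum(a * s.predict(X) for a, s in zip(alphas, stumps)))
print("training accuracy:", np.mean(strong == y))
```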
Stacking
◦ Stacking often considers heterogeneous weak learners, learns them in parallel, and combines them by training a meta-model to output a prediction based on the different weak models' predictions.
◦ In recent years, a more modern approach is to create an ensemble of a well-chosen collection of strong yet diverse models in stacking.
https://www.researchgate.net/profile/Mahsa_Soufineyestani/publication/326224119/figure/download/fig1/AS:645263400112129@1530854191613/Stacking-model-1-S1.png
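A minimal stacking sketch using scikit-learn's StackingClassifier (available in recent scikit-learn releases); the particular base learners and meta-model are illustrative assumptions, not the models from the figure.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import StackingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Heterogeneous weak learners, combined by a meta-model trained on their predictions
stack = StackingClassifier(
    estimators=[("tree", DecisionTreeClassifier(max_depth=3)),
                ("knn", KNeighborsClassifier())],
    final_estimator=LogisticRegression(max_iter=1000),
)
stack.fit(X_train, y_train)
print(stack.score(X_test, y_test))
```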
Stacking
◦ On the popular data science competition site Kaggle (https://www.kaggle.com/), you can explore numerous winning solutions through its discussion forums to get a flavour of the state of the art.
◦ Another popular data science competition is the KDD Cup (http://www.kdd.org/kdd-cup). The figure shown on the left is the winning solution for the 2015 competition, which used a three-stage stacked modelling approach.
https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/
RANDOM FOREST
Application of Random Forest: Where to Use
◦ Remote Sensing: Enhanced Thematic Mapping (ETM) devices are used to acquire images of the earth's surface; Random Forest achieves higher accuracy with less training time compared to other ML methods.
◦ Object Detection: multiclass object detection is done using Random Forest algorithms, providing better detection in complicated environments.
◦ Kinect: Random Forest is used in the Kinect game console to track body movements and recreate them in the game.
http://x-tech.am/wp-content/uploads/2013/10/blog11-884x442.jpg
Where does Random Forest fit in Data Science?
◦ Random Forest is a Machine Learning method that can be used for both Classification and Regression.
Why use Random Forest?
◦ Reduced overfitting (compared with a single decision tree)
◦ High accuracy
What is Random Forest?
◦ Definition: a Random Forest is an ensemble of decision trees (Tree 1, Tree 2, Tree 3, ...); each tree produces its own output (Output 1, Output 2, Output 3, ...), and the outputs are combined into the final prediction.
HOW DOES RANDOM FOREST WORK?
Let's Review How Decision Tree Works, First.
◦ Answering the earlier question: with three classes ($\frac{7}{16}$, $\frac{4}{16}$, $\frac{5}{16}$ of the observations), the entropy can exceed 1:
  $H_0(p) = -\sum_{i=1}^{3} p_i \log_2 p_i = -\frac{7}{16}\log_2\frac{7}{16} - \frac{4}{16}\log_2\frac{4}{16} - \frac{5}{16}\log_2\frac{5}{16} = 1.55 > 1$
◦ Candidate splitting conditions:
  1. Colour == Brown?
  2. Diameter < 8?
  3. Colour == White?
  4. Shape == Oval?
◦ A decision tree repeatedly picks one of these conditions and splits the data into True/False branches, e.g. one tree splits on "Shape == Oval?", another splits on "Colour == Brown?" and then on "Shape == Oval?".
How Does Random Forest Work?
◦ Tree 3 is built from the same pool of candidate conditions: it splits first on "Colour == White?" (True/False) and then on "Shape == Oval?" (True/False).
How Does Random Forest Work?
◦ Assume we have a new test data point with no colour information.
◦ Let the default answer be False if a tree cannot judge a particular classifying criterion.
◦ Let's guess what this input will be classified as.
How Does Random Forest Work?
◦ Each tree (Tree 1, Tree 2, Tree 3) makes its own prediction for the new input, and the final class is decided by a Majority Vote.
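A minimal sketch of the same idea with scikit-learn's RandomForestClassifier: many trees are grown on bootstrapped data with random feature subsets, and their votes are aggregated. The dataset is an illustrative choice.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

forest = RandomForestClassifier(n_estimators=100, random_state=0)   # 100 trees, each voting on the class
forest.fit(X_train, y_train)
print(forest.score(X_test, y_test))                                 # accuracy of the aggregated predictions
```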
WEEK 05 SUMMARY
• Decision Tree: Entropy, Information Gain, Gini Index, Creation, Pruning
• Ensemble Learning: Bagging, Boosting, Stacking
• Random Forest: Definition, Majority Vote