SCMA801204 – Advanced Machine Learning
Decision Tree
Dr. rer. nat. Hendri Murfi
Machine Learning Group
Department of Mathematics, Universitas Indonesia – Depok 16424
Tel. +62-21-7862719/7863439, Fax +62-21-7863439, Email: hendri@ui.ac.id
Decision Tree
Description
• A Decision Tree is a flowchart-like model consisting of nodes and branches
• Internal nodes represent tests on a particular feature
• Branches represent the outcomes of those tests
• Each leaf node represents a class or a value
• The path from the root to a leaf defines a rule
Decision Tree
Advantage
• A Decision Tree is a white-box machine learning model: the knowledge extracted by the model takes the form of rules that can easily be explained with Boolean logic (If … Then …).
• In contrast, many other machine learning models are black boxes, meaning the knowledge extracted by the model is difficult to interpret.
Decision Tree
Algorithm
• Iterative Dichotomiser 3 (ID3) is an algorithm for building a Decision Tree in the form of a multiway tree. At each node, the algorithm looks for the categorical feature that produces the largest information gain
− C4.5 is a successor of ID3 that removes the restriction that features must be categorical, by defining discrete features obtained from partitioning continuous features into several intervals
− The most recent version in this line of development is C5.0
• Classification And Regression Trees (CART) is an algorithm similar to C4.5, but it also supports continuous targets (regression problems) and builds a binary tree. At each node, the algorithm looks for the feature and threshold value that produce the largest information gain
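As a point of reference, scikit-learn implements an optimized version of CART. A minimal sketch of fitting and inspecting a classification tree on the library's bundled Iris data (the parameter choices here are illustrative only):

```python
# A minimal sketch of CART-style classification with scikit-learn,
# which implements an optimized version of the CART algorithm.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)

# criterion="gini" selects the Gini impurity; "entropy" is also available.
clf = DecisionTreeClassifier(criterion="gini", max_depth=3, random_state=0)
clf.fit(X, y)

# Each internal node tests one feature against one threshold (a binary tree).
print(export_text(clf, feature_names=load_iris().feature_names))
```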
CART
Problem Formulation
Given training data {xn, tn}, n = 1, ..., N,
where tn is the label or target value
• Problem:
How to build a binary tree from the training data so that training data with the same class or value are grouped together
• Method:
At each node, select the feature and threshold value that create the purest partition of the data, i.e., data with the same class or value are grouped together
CART
Algorithm
• For example, let Q be the data set at a node and let θ = (j, a) be a candidate split consisting of the j-th feature and a threshold value a. Then Qleft(θ) and Qright(θ) are the data in the left and right partitions defined by the threshold, namely:
Qleft(θ) = {(x, t) | x(j) ≤ a}
Qright(θ) = Q − Qleft(θ)
• The purity of a node is then evaluated with an impurity function H() through the information gain G(), namely
G(Q, θ) = (Nleft / N) H(Qleft(θ)) + (Nright / N) H(Qright(θ))
where N is the number of data points at the node, and Nleft and Nright are the numbers of data points in the left and right partitions
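A small numpy sketch of these definitions; the function names (split, weighted_impurity) are illustrative, and the variance-style impurity at the end is just one possible choice of H:

```python
import numpy as np

def split(X, t, j, a):
    """Partition the node data into Qleft = {(x, t) | x_j <= a} and its complement Qright."""
    mask = X[:, j] <= a
    return (X[mask], t[mask]), (X[~mask], t[~mask])

def weighted_impurity(X, t, j, a, H):
    """G(Q, theta) = (Nleft/N) H(Qleft) + (Nright/N) H(Qright) for theta = (j, a)."""
    (X_l, t_l), (X_r, t_r) = split(X, t, j, a)
    N = len(t)
    G = 0.0
    if len(t_l) > 0:
        G += len(t_l) / N * H(t_l)
    if len(t_r) > 0:
        G += len(t_r) / N * H(t_r)
    return G

# One possible impurity H: variance of the targets in the partition (regression).
variance = lambda t: float(np.mean((t - t.mean()) ** 2))
```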
CART
Algorithm
• Thus, the selection of the feature and threshold value that creates the purest data partition at a node can be expressed as the following optimization problem:
θ* = argmin_θ G(Q, θ)
Since the impurity H(Q) of the node itself is fixed, minimizing the weighted impurity G is equivalent to maximizing the information gain
• The process of selecting features and threshold values is carried out recursively on the subsets Qleft(θ*) and Qright(θ*) until a stopping criterion is met, for example the maximum tree depth is reached or the minimum number of data points Nm at a node is reached; a self-contained sketch of this recursion is given below
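A hedged, self-contained sketch of this recursive procedure: at each node it searches all (feature, threshold) candidates for the θ that minimizes G, then recurses on Qleft and Qright. The names best_split and build_tree and the MSE impurity are assumptions for illustration, not taken from the slides:

```python
import numpy as np

def mse(t):
    """Impurity of a node: mean squared deviation of the targets from their mean."""
    return float(np.mean((t - t.mean()) ** 2)) if len(t) else 0.0

def best_split(X, t):
    """Return theta* = (j, a) minimizing G(Q, theta), or None if no valid split exists."""
    N, D = X.shape
    best, best_G = None, np.inf
    for j in range(D):
        for a in np.unique(X[:, j]):          # candidate thresholds from the feature values
            left = X[:, j] <= a
            nl, nr = left.sum(), N - left.sum()
            if nl == 0 or nr == 0:            # skip splits that leave one side empty
                continue
            G = nl / N * mse(t[left]) + nr / N * mse(t[~left])
            if G < best_G:
                best, best_G = (j, a), G
    return best

def build_tree(X, t, depth=0, max_depth=3, min_samples=2):
    """Recursively split until a stopping criterion is met; leaves predict the mean target."""
    theta = best_split(X, t)
    if depth >= max_depth or len(t) <= min_samples or theta is None:
        return {"leaf": float(t.mean())}
    j, a = theta
    left = X[:, j] <= a
    return {"feature": j, "threshold": float(a),
            "left": build_tree(X[left], t[left], depth + 1, max_depth, min_samples),
            "right": build_tree(X[~left], t[~left], depth + 1, max_depth, min_samples)}
```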
CART
Algorithm
• Given data S = {xm, tm}, m = 1, ..., M, with class labels tm ∈ {0, 1, ..., K−1}. There are several impurity functions for classification problems, namely:
Gini: H(S) = ∑k pk (1 − pk)
Entropy: H(S) = − ∑k pk log(pk)
Misclassification: H(S) = 1 − maxk (pk)
where pk is the probability (proportion) of class k at the node,
pk = (1/M) ∑m I(tm = k)
and I(tm = k) is the indicator function having value 1 if tm = k and 0 otherwise
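These three impurity functions can be written directly in numpy; the helper name class_probabilities is illustrative:

```python
import numpy as np

def class_probabilities(t):
    """p_k = (1/M) * sum_m I(t_m = k) for each class k present at the node."""
    _, counts = np.unique(t, return_counts=True)
    return counts / len(t)

def gini(t):
    p = class_probabilities(t)
    return float(np.sum(p * (1.0 - p)))

def entropy(t):
    p = class_probabilities(t)
    return float(-np.sum(p * np.log(p)))   # natural log; base 2 is also common

def misclassification(t):
    p = class_probabilities(t)
    return float(1.0 - np.max(p))

# Example: a node with labels [0, 0, 1, 1, 1]
t = np.array([0, 0, 1, 1, 1])
print(gini(t), entropy(t), misclassification(t))  # 0.48, ~0.673, 0.4
```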
CART
Algorithm
• Given data S = {xm, tm}, m = 1, ..., M, with continuous targets tm. There are several impurity functions for regression problems, namely:
Mean Squared Error: H(S) = (1/M) ∑m (tm − t̄)²
Mean Absolute Error: H(S) = (1/M) ∑m |tm − t̄|
where t̄ = (1/M) ∑m tm
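The corresponding regression impurities, written around the node mean t̄ exactly as defined above (note that scikit-learn's absolute_error criterion uses the node median instead of the mean):

```python
import numpy as np

def mse(t):
    """H(S) = (1/M) * sum_m (t_m - t_bar)^2"""
    return float(np.mean((t - np.mean(t)) ** 2))

def mae(t):
    """H(S) = (1/M) * sum_m |t_m - t_bar|"""
    return float(np.mean(np.abs(t - np.mean(t))))
```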
CART
Illustration
Given seven observations of mortality rates together with five air-quality features for several regions in England in 2007–2012. Which feature will be selected as the root node?
Obs  O3      PM10    PM2.5  NO2     T2M      Mortality Rate
1    22.251  20.447  8.891  28.858  274.723  1.208
2    30.977  15.687  7.250  27.145  276.907  1.498
3    46.849  13.461  4.166  16.238  280.981  1.387
4    64.451  7.447   2.109  5.192   277.892  1.342
5    59.234  7.281   2.781  7.062   278.626  1.431
6    42.514  8.020   4.256  16.457  279.427  1.319
7    30.100  15.848  6.924  23.323  279.413  1.252
Source: K. C. Dewi. Analisis Akurasi Model Random Forest untuk Big Data – Studi Kasus Prediksi Klaim Severity pada Asuransi Mobil. Master's thesis, Program Magister Matematika, Departemen Matematika FMIPA UI, 2019
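For later reference, the table can be transcribed into numpy arrays (decimal points; values exactly as in the table):

```python
import numpy as np

# Columns: O3, PM10, PM2.5, NO2, T2M (features); t = Mortality Rate (target)
feature_names = ["O3", "PM10", "PM2.5", "NO2", "T2M"]
X = np.array([
    [22.251, 20.447, 8.891, 28.858, 274.723],
    [30.977, 15.687, 7.250, 27.145, 276.907],
    [46.849, 13.461, 4.166, 16.238, 280.981],
    [64.451,  7.447, 2.109,  5.192, 277.892],
    [59.234,  7.281, 2.781,  7.062, 278.626],
    [42.514,  8.020, 4.256, 16.457, 279.427],
    [30.100, 15.848, 6.924, 23.323, 279.413],
])
t = np.array([1.208, 1.498, 1.387, 1.342, 1.431, 1.319, 1.252])
```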
CART
Illustration
• There are five features that are candidates for the root node. For problems with a very large number of features, candidate features are often selected randomly from a subset of the features
• The threshold value can be chosen randomly from the observed feature values. Alternatively, it can be chosen as the midpoint between consecutive feature values (Cutler, Cutler, & Stevens, 2012); a small sketch of this midpoint rule follows below
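A small sketch of the midpoint rule for candidate thresholds, shown on the O3 column of the table above (the function name candidate_thresholds is illustrative):

```python
import numpy as np

def candidate_thresholds(x):
    """Midpoints between consecutive sorted unique values of one feature column."""
    v = np.unique(x)                 # sorted unique feature values
    return (v[:-1] + v[1:]) / 2.0    # one midpoint per adjacent pair

# Example with the O3 column of the table above:
o3 = np.array([22.251, 30.977, 46.849, 64.451, 59.234, 42.514, 30.100])
print(candidate_thresholds(o3))
```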
CART
Illustration
For example, consider the feature O3 with the threshold value 22.251, i.e. θ = {O3, 22.251}. The test O3 ≤ 22.251 partitions the observations as follows:
Yes (O3 ≤ 22.251): observation 1, with mortality rate 1.208
No (O3 > 22.251): observations 2–7, with mortality rates 1.498, 1.387, 1.342, 1.431, 1.319, 1.252
CART
Illustration
So, using the mean squared error as the impurity function, the impurity of each branch of θ = {O3, 22.251} is as follows:
H(Qleft(θ)) = (1/nleft) ∑ (t − t̄left)²
= (1.208 − 1.208)²
= 0
H(Qright(θ)) = (1/nright) ∑ (t − t̄right)²
= (1/6) [(1.498 − 1.3715)² + (1.387 − 1.3715)² + (1.342 − 1.3715)² + (1.431 − 1.3715)² + (1.319 − 1.3715)² + (1.252 − 1.3715)²]
= (1/6)(0.0376895)
≈ 0.0063
where t̄left = 1.208 and t̄right = 1.3715 are the mean mortality rates in the left and right partitions
CART
Illustration
• So, the information gain of θ = {O3, 22.251} is as follows:
G(Q, θ) = (nleft/n) H(Qleft(θ)) + (nright/n) H(Qright(θ))
= (1/7)(0) + (6/7)(0.0063)
≈ 0.0054
• Using the same procedure, compute the information gain for the features PM10, PM2.5, NO2, and T2M.
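This worked example can be checked numerically; the short sketch below reproduces H(Qleft) = 0, H(Qright) ≈ 0.0063, and G ≈ 0.0054 for θ = {O3, 22.251}:

```python
import numpy as np

o3 = np.array([22.251, 30.977, 46.849, 64.451, 59.234, 42.514, 30.100])
t  = np.array([1.208, 1.498, 1.387, 1.342, 1.431, 1.319, 1.252])

mse = lambda y: float(np.mean((y - y.mean()) ** 2))

left = o3 <= 22.251
H_left, H_right = mse(t[left]), mse(t[~left])
G = left.sum() / len(t) * H_left + (~left).sum() / len(t) * H_right

print(H_left)   # 0.0
print(H_right)  # ~0.0063  (0.0376895 / 6)
print(G)        # ~0.0054
```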
CART
Illustration
For example, for each of the remaining features the chosen threshold value is as follows: θ = {PM10, 15.687}, θ = {PM2.5, 4.166}, θ = {NO2, 5.192}, θ = {T2M, 278.626}. These tests partition the mortality rates as follows:
PM10 ≤ 15.687: Yes → 1.498, 1.387, 1.342, 1.431, 1.319; No → 1.208, 1.252
PM2.5 ≤ 4.166: Yes → 1.387, 1.342, 1.431; No → 1.208, 1.498, 1.319, 1.252
NO2 ≤ 5.192: Yes → 1.342; No → 1.208, 1.498, 1.387, 1.431, 1.319, 1.252
T2M ≤ 278.626: Yes → 1.208, 1.498, 1.342, 1.431; No → 1.387, 1.319, 1.252
CART
Illustration
• So, the information gain for each candidate split θ is obtained as follows:
θ = {O3, 22.251}: 0.0054
θ = {PM10, 15.687}: 0.0031
θ = {PM2.5, 4.166}: 0.0076
θ = {NO2, 5.192}: 0.0087
θ = {T2M, 278.626}: 0.0080
• Next, choose the θ that minimizes this quantity, i.e.
θ* = argmin_θ G(Q, θ)
• Because θ = {PM10, 15.687} gives the smallest value, the PM10 feature with a threshold value of 15.687 is selected as the root node
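The whole comparison can be reproduced with a short loop over the five candidate splits; up to the rounding used in the slides, it yields the values listed above and selects PM10:

```python
import numpy as np

feature_names = ["O3", "PM10", "PM2.5", "NO2", "T2M"]
X = np.array([
    [22.251, 20.447, 8.891, 28.858, 274.723],
    [30.977, 15.687, 7.250, 27.145, 276.907],
    [46.849, 13.461, 4.166, 16.238, 280.981],
    [64.451,  7.447, 2.109,  5.192, 277.892],
    [59.234,  7.281, 2.781,  7.062, 278.626],
    [42.514,  8.020, 4.256, 16.457, 279.427],
    [30.100, 15.848, 6.924, 23.323, 279.413],
])
t = np.array([1.208, 1.498, 1.387, 1.342, 1.431, 1.319, 1.252])

candidates = [("O3", 22.251), ("PM10", 15.687), ("PM2.5", 4.166),
              ("NO2", 5.192), ("T2M", 278.626)]

mse = lambda y: float(np.mean((y - y.mean()) ** 2))

gains = {}
for name, a in candidates:
    j = feature_names.index(name)
    left = X[:, j] <= a
    gains[(name, a)] = (left.sum() * mse(t[left]) + (~left).sum() * mse(t[~left])) / len(t)

best = min(gains, key=gains.get)
print(gains)  # close to the values on the slide (small differences come from rounding)
print(best)   # ('PM10', 15.687) -> selected as the root node
```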
CART
Illustration
• Next, apply the same steps to find the binary split θ for the samples contained in the left child node and the right child node.
• Perform this step recursively until the stopping criteria are met, for example the minimum number of samples at a node (Nm ≤ 2) or the maximum tree depth.
[Figure: the resulting tree. The root tests PM10 ≤ 15.687; its No branch is a leaf containing the mortality rates 1.208 and 1.252, while its Yes branch is split further by T2M ≤ 276.907, T2M ≤ 278.626, and PM2.5 ≤ 4.166, giving leaves with the mortality rates 1.498, 1.431, 1.387, 1.342, and 1.319.]
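The same construction can be reproduced with scikit-learn's CART implementation. This is a sketch under the assumption that MSE is the splitting criterion and nodes with at most two samples are not split; note that the library places thresholds at midpoints between feature values, so the printed thresholds may differ slightly from those in the illustration:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor, export_text

feature_names = ["O3", "PM10", "PM2.5", "NO2", "T2M"]
X = np.array([[22.251, 20.447, 8.891, 28.858, 274.723],
              [30.977, 15.687, 7.250, 27.145, 276.907],
              [46.849, 13.461, 4.166, 16.238, 280.981],
              [64.451,  7.447, 2.109,  5.192, 277.892],
              [59.234,  7.281, 2.781,  7.062, 278.626],
              [42.514,  8.020, 4.256, 16.457, 279.427],
              [30.100, 15.848, 6.924, 23.323, 279.413]])
t = np.array([1.208, 1.498, 1.387, 1.342, 1.431, 1.319, 1.252])

# squared_error is CART's MSE criterion; min_samples_split=3 stops nodes with <= 2 samples
reg = DecisionTreeRegressor(criterion="squared_error", min_samples_split=3, random_state=0)
reg.fit(X, t)
print(export_text(reg, feature_names=feature_names))
```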
CART
Practical Usage Tips
• The main parameter that needs to be optimized at the model selection stage is the maximum tree depth (tree size)
• Other parameters that are often optimized are the minimum number of samples required for a node to be split and the minimum number of samples in a leaf
• Some other practical usage tips can be found at https://scikit-learn.org/stable/modules/tree.html
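As an illustration of such tuning, here is a hedged sketch of a cross-validated grid search over these parameters (the dataset and grid values are placeholders, not from the slides):

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

X, y = load_diabetes(return_X_y=True)

# The parameters discussed above: tree size and the minimum-sample constraints.
param_grid = {
    "max_depth": [2, 3, 5, 8, None],
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 5],
}

search = GridSearchCV(DecisionTreeRegressor(random_state=0), param_grid,
                      cv=5, scoring="neg_mean_squared_error")
search.fit(X, y)
print(search.best_params_, search.best_score_)
```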
References
C. M. Bishop (2006). Pattern Recognition and Machine Learning, Springer (Chapter 14.4)