Decision Tree
❖ A decision tree is a hierarchical structure that resembles an inverted tree, where each internal
node represents a decision based on a feature or attribute, and each leaf node represents a class
label (in classification) or a numerical value (in regression).
❖ Decision trees are versatile machine learning algorithms that can perform classification, regression, and even multioutput tasks. They are capable of fitting complex datasets and are the fundamental building blocks of random forests.
Working of Decision Tree: Toy Example
❖ We have data for 15 students on pass or fail in an online ML exam (Table 1: the toy data set). To understand the basic process, we begin with a dataset comprising a binary target variable (Pass/Fail) and various binary or categorical predictor variables such as Work Status, Student Background, and Online Courses.
❖ Entropy: In information theory, the entropy of a random variable is the average level of “information”, “surprise”, or “uncertainty” inherent to the variable’s possible outcomes. It is measured by the formula:
E = − Σ_i p_i · log₂(p_i)
where p_i is the probability of randomly selecting an example of the i-th outcome of the target variable. For example, p_pass = number of passing students / total students = 8/15 and p_fail = number of fails / total students = 7/15.
❖ Information Gain: Entropy_parent − Entropy_children
where Entropy_parent is the entropy of the parent node and Entropy_children represents the average entropy of the child nodes obtained by splitting the parent node.
❖ To calculate entropy, let us first put our formulas for Entropy and Information Gain in terms of the variables in our dataset:
• Probability of pass or fail at each node = No. of pass or fail / sample size of the node
• Average entropy across the child nodes: E_av,child = Σ_{j=1}^{total child nodes} (S_c,j / S_p) · E_j , where E_j is the entropy of child node j, S_c,j is the sample size of child node j, and S_p is the sample size of the parent node
• Information Gain = E_parent − E_av,child
Toy Example (cont.): Evaluating candidate variables for root node
❖ E_parent = −(p_pass · log₂(p_pass) + p_fail · log₂(p_fail)) = −(8/15 · log₂(8/15) + 7/15 · log₂(7/15)) = 0.9968
❖ Let us calculate the average entropy of the child nodes, E_av,child, and the Information Gain for each candidate feature variable in turn.
❖ Work Status as the feature variable of the root node:
E_working = −(3/9 · log₂(3/9) + 6/9 · log₂(6/9)) = 0.9183
E_not-working = −(5/6 · log₂(5/6) + 1/6 · log₂(1/6)) = 0.65
E_av,child = (9/15) · 0.9183 + (6/15) · 0.65 = 0.8110
Information Gain = 0.9968 − 0.8110 = 0.1858
❖ In a similar fashion we can evaluate the entropy and information gain for the Student Background and Online Courses variables. The results are provided in the table below:
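These calculations can be reproduced with a minimal Python sketch; the class counts (8 pass / 7 fail at the root, 3/6 in the working node, 5/1 in the not-working node) are the ones assumed from the toy example above:

import numpy as np

def entropy(counts):
    # Entropy of a node given its class counts, e.g. [n_pass, n_fail]
    p = np.array(counts) / np.sum(counts)
    return float(-np.sum(p * np.log2(p)))

e_parent = entropy([8, 7])                                    # ≈ 0.9968
e_working, e_not_working = entropy([3, 6]), entropy([5, 1])   # ≈ 0.9183, 0.6500
e_av_child = (9 / 15) * e_working + (6 / 15) * e_not_working  # ≈ 0.8110
info_gain = e_parent - e_av_child                             # ≈ 0.1858
print(round(e_parent, 4), round(e_av_child, 4), round(info_gain, 4))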
❖ Gini impurity: An alternative impurity measure to entropy is the Gini impurity of a node, defined as
G_i = 1 − Σ_k p_{i,k}²
In this equation:
• G_i is the Gini impurity of the i-th node.
• p_{i,k} is the ratio of class k instances among the training instances in the i-th node.
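For comparison with entropy, a short sketch computing the Gini impurity of the toy example’s root node (using the same assumed 8 pass / 7 fail counts):

def gini(counts):
    # Gini impurity of a node given its class counts
    total = sum(counts)
    return 1 - sum((c / total) ** 2 for c in counts)

print(gini([8, 7]))   # ≈ 0.4978 for the root node of the toy example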
❖ ID3 (Iterative Dichotomiser 3) was developed in 1986 by Ross Quinlan. It supports categorical features only; the tree is grown to full size and pruning is applied afterwards.
❖ C4.5 is the successor to ID3 and removed the restriction that features must be categorical.
❖ C5.0 is Quinlan’s latest version, released under a proprietary license. It uses less memory and builds smaller rulesets than C4.5 while being more accurate.
❖ CART (Classification and Regression Trees) is very similar to C4.5, but it differs in that it supports
numerical target variables (regression) and does not compute rule sets. CART constructs binary
trees using the feature and threshold that yield the largest information gain at each node.
❖ scikit-learn uses an optimized version of the CART algorithm; however, the scikit-learn implementation does not support categorical variables for now.
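Because of this restriction, categorical predictors such as those in the toy example must be encoded numerically before fitting a scikit-learn tree. A minimal sketch (the column names and values here are illustrative, not taken from Table 1):

import pandas as pd
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

# Illustrative toy-style data with categorical predictors
X = pd.DataFrame({
    "WorkStatus": ["working", "not-working", "working", "not-working"],
    "OnlineCourses": ["yes", "no", "no", "yes"],
})
y = ["fail", "pass", "fail", "pass"]

X_enc = OrdinalEncoder().fit_transform(X)   # map each category to an integer code
clf = DecisionTreeClassifier(criterion="entropy", random_state=42)
clf.fit(X_enc, y)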
CART (Classification and Regression Trees)
❖ CART algorithm recursively partitions the feature space such that the samples with the same
labels or similar target values are grouped together.
❖ The algorithm works by first splitting the training set into two subsets using a single feature k and
a threshold tk (e.g., “petal length ≤ 2.45 cm”) such that it produces the purest subsets, weighted
by their size.
❖ The process continues until no further split is possible or it hits one of its stopping criteria.
❖ The CART algorithm is greedy: it searches for the best split at the current node without checking whether that choice leads to the lowest impurity several levels down. It therefore often produces a solution that is reasonably good but not guaranteed to be optimal.
❖ Finding the optimal tree is known to be an NP-complete problem. It requires O(exp(m)) time,
making the problem intractable even for small training sets.
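A brief scikit-learn sketch of this greedy splitting on the iris data, restricted to petal length and width so that the root split matches the “petal length ≤ 2.45 cm” example above (the exact tree depends on the data and random_state):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X = iris.data[:, 2:]          # petal length and petal width only
y = iris.target

tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(X, y)

# Show the learned splits; the root typically comes out as "petal length (cm) <= 2.45"
print(export_text(tree_clf, feature_names=iris.feature_names[2:]))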
❖ For any subtree T ≤ T_max, define its complexity as |T̃|, the number of terminal nodes in T. Let α ≥ 0 be a real number called the complexity parameter and define the cost-complexity measure R_α(T) as
R_α(T) = R(T) + α·|T̃|
where R(T) is the misclassification (resubstitution) cost of the subtree T.
❖ For each value of α, the algorithm finds the subtree T(α) ≤ T_max which minimizes R_α(T).
❖ If α is small, the penalty for having a large number of terminal nodes is small and T(α) will be large.
For instance, if T_max is so large that each terminal node contains only one case, then every case is classified correctly and R(T_max) = 0, so that T_max minimizes R_α(T) for small α.
❖ Finally, for α sufficiently large, the minimizing subtree T(α) will consist of the root node only, and the tree T_max will have been completely pruned.
❖ Although 𝛼 runs through a continuum of values, there are at most a finite number of subtrees of
𝑇𝑚𝑎𝑥 . Thus, the pruning process produces a finite sequence of subtrees T1, T2, T3, ... with
progressively fewer terminal nodes.
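In scikit-learn this pruning scheme is exposed through the ccp_alpha parameter and the cost_complexity_pruning_path method of the tree estimators; a short sketch on an arbitrary built-in dataset:

from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Effective alpha values at which successive subtrees are pruned away
path = DecisionTreeClassifier(random_state=42).cost_complexity_pruning_path(X, y)
print(path.ccp_alphas[:5])

# A larger ccp_alpha penalizes terminal nodes more heavily and yields a smaller tree
pruned = DecisionTreeClassifier(random_state=42, ccp_alpha=path.ccp_alphas[-2])
pruned.fit(X, y)
print(pruned.get_n_leaves())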
Decision Tree: Regression
❖ Regressor: Decision trees are also capable of performing regression tasks. The main difference from a classification tree is that instead of predicting a class at each node, it predicts a numerical value.
❖ Accordingly, the CART regression algorithm works such that instead of trying to split the training set in a way that minimizes impurity, it now tries to split the training set in a way that minimizes the MSE. The CART cost function for regression is given below:
J(k, t_k) = (m_left / m) · MSE_left + (m_right / m) · MSE_right
where MSE_node is the mean squared error of the node’s samples around the node’s mean target value, and m_left, m_right, m are the numbers of instances in the left subset, the right subset, and the node being split.
from sklearn.tree import DecisionTreeRegressor

tree_reg = DecisionTreeRegressor(max_depth=2, random_state=42)
tree_reg.fit(X_quad, y_quad)

Fig. 5: A decision tree for regression
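The snippet above assumes the quadratic toy dataset X_quad, y_quad is already defined (it is not shown on the slide); a noisy quadratic dataset along the following lines reproduces the setup of Fig. 5:

import numpy as np

np.random.seed(42)
X_quad = np.random.rand(200, 1) - 0.5                        # a single feature in [-0.5, 0.5]
y_quad = X_quad[:, 0] ** 2 + 0.025 * np.random.randn(200)    # quadratic target plus Gaussian noise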
Decision Tree: Regression (cont.)
❖ Sensitivity to Axis Orientation: Decision boundaries are always orthogonal to a feature axis; consequently, when the data’s natural boundaries are not axis-aligned (for example, after a simple rotation of the dataset), the tree tends to become unnecessarily complex.
❖ Decision Trees Have a High Variance: small changes in the data or hyperparameters can produce very different trees. Luckily, by averaging predictions over many trees, it’s possible to reduce variance significantly. Such an ensemble of trees is called a random forest, and it’s one of the most powerful types of models available today.
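A short sketch of the variance-reduction idea, comparing the cross-validated accuracy of a single tree against a random forest on an arbitrary synthetic dataset (all names and parameters here are illustrative):

from sklearn.datasets import make_moons
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=500, noise=0.30, random_state=42)

tree = DecisionTreeClassifier(random_state=42)
forest = RandomForestClassifier(n_estimators=200, random_state=42)  # averages many randomized trees

# The averaged forest predictions typically generalize better than a single unpruned tree
print(cross_val_score(tree, X, y, cv=5).mean())
print(cross_val_score(forest, X, y, cv=5).mean())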