Decision Tree Algorithm With Hands-On Example

Arun Mohan
Published in DataDriveninvestor · 6 min read · Jan 23, 2019
The decision tree is one of the most important machine learning algorithms.
It is used for both classification and regression problems. In this article, we
will go through the classification part.
What is a decision tree?
A decision tree is a classification and prediction tool having a tree-like
structure, where each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and each leaf node (terminal
node) holds a class label.
[Figure: a small decision tree. Root: Height > 180cm? Yes → Male; No → Weight > 80kg? Yes → Male, No → Female.]
Above we have a small decision tree. An important advantage of the decision
tree is that it is highly interpretable. Here, if height > 180cm, or if height <
180cm and weight > 80kg, the person is male; otherwise female. Did you ever
think about how we came up with this decision tree? I will try to explain it
using the weather dataset.
Before going further, I will explain some important terms related to
decision trees.
Entropy
In machine learning, entropy is a measure of the randomness in the
information being processed. The higher the entropy, the harder it is to draw
any conclusions from that information.
H(S) = -\sum_{i=1}^{c} p_i \log_2 p_i
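To make the formula concrete, here is a minimal Python sketch (the function and its name are my own illustration, not code from the article):

```python
# Entropy of a set of class labels: H(S) = -sum(p_i * log2(p_i))
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

print(entropy(["yes", "yes", "no", "no"]))  # 1.0 -- an evenly split node is maximally random
```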
Information Gain
Information gain can be defined as the amount of information gained about
a random variable or signal from observing another random variable. It can
be considered as the difference between the entropy of the parent node and
the weighted average entropy of the child nodes.
IG(S, A) = H(S) - H(S, A)

Alternatively,

IG(S, A) = H(S) - \sum_{t} p(t)\, H(t)
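As an illustrative sketch of that second form, the gain can be computed as the parent entropy minus the weighted average entropy of the child groups (helper names are mine, not the article's):

```python
# Information gain = entropy(parent) - weighted average entropy of the children
from collections import Counter
from math import log2

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(parent_labels, child_label_groups):
    n = len(parent_labels)
    weighted = sum(len(g) / n * entropy(g) for g in child_label_groups)
    return entropy(parent_labels) - weighted

# A perfectly separating split recovers all of the parent's entropy.
print(information_gain(["yes", "yes", "no", "no"], [["yes", "yes"], ["no", "no"]]))  # 1.0
```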
Gini Impurity
Gini impurity is a measure of how often a randomly chosen element from
the set would be incorrectly labeled if it was randomly labeled according to
the distribution of labels in the subset.
Gini(E) = 1 - \sum_{i=1}^{c} p_i^2
Gini impurity is lower bounded by 0, with 0 occurring if the data set contains
only one class.
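A small sketch of the formula (again, the function is my own illustration):

```python
# Gini impurity: 1 - sum(p_i^2) over the class distribution
from collections import Counter

def gini(labels):
    total = len(labels)
    return 1 - sum((count / total) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes", "yes"]))   # 0.0 -> pure node, only one class
print(gini(["yes", "no", "yes", "no"]))     # 0.5 -> maximally mixed for two classes
```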
Entropy vs Gini

[Figure: plot comparing entropy and Gini impurity as functions of the class probability.]
There are many algorithms to build a decision tree. Among them:
1. CART (Classification and Regression Trees) — this makes use of Gini
impurity as the metric.
2. ID3 (Iterative Dichotomiser 3) — this uses entropy and information gain
as the metric.
In this article, I will go through ID3. Once you understand it, it is easy to implement
the same using CART.
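In practice you rarely build either by hand. As a sketch (assuming scikit-learn, which the article itself does not use), DecisionTreeClassifier always grows CART-style binary trees, and the criterion parameter only switches the impurity measure between Gini and entropy:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# "gini" is the CART-style default; "entropy" uses information gain like ID3.
gini_tree = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
entropy_tree = DecisionTreeClassifier(criterion="entropy", random_state=0).fit(X, y)
print(gini_tree.get_depth(), entropy_tree.get_depth())
```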
Classification using the ID3 algorithm
Consider a dataset based on which we will determine whether to
play football or not.
Outlook   Temperature  Humidity  Wind    Played football (yes/no)
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Overcast  Hot          High      Weak    Yes
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Overcast  Cool         Normal    Strong  Yes
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Rain      Mild         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Rain      Mild         High      Strong  No
Here there are four independent variables to determine the dependent
variable. The independent variables are Outlook, Temperature, Humidity,
and Wind. The dependent variable is whether to play football or not.
As the first step, we have to find the root node for our decision tree. For
that, follow these steps:
Find the entropy of the class variable:

E(S) = -[(9/14) log(9/14) + (5/14) log(5/14)] = 0.94
Note: here we typically take log to base 2. In total there are 14 examples,
of which 9 are yes and 5 are no. Based on that we calculated the probabilities above.
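A quick numeric check of this value (a sketch, not from the article):

```python
from math import log2

# 9 "yes" and 5 "no" out of 14 examples
e_s = -((9/14) * log2(9/14) + (5/14) * log2(5/14))
print(round(e_s, 3))  # 0.94
```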
From the above data, for Outlook we can arrive at the following table easily:

Outlook    yes  no  total
sunny      3    2   5
overcast   4    0   4
rainy      2    3   5
Now we have to calculate the average weighted entropy, i.e., the entropy of each
child node weighted by the fraction of examples that reach it.
E(S, Outlook) = (5/14)*E(3,2) + (4/14)*E(4,0) + (5/14)*E(2,3)
= (5/14)*(-(3/5)log(3/5) - (2/5)log(2/5)) + (4/14)*(0) + (5/14)*(-(2/5)log(2/5) - (3/5)log(3/5))
= 0.693
The next step is to find the information gain. It is the difference between
parent entropy and average weighted entropy we found above.
IG(S, outlook) = 0.94 - 0.693 = 0.247
Similarly find Information gain for Temperature, Humidity, and Windy.
IG(S, Temperature) = 0.940 - 0.911 = 0.029
IG(S, Humidity) = 0.940 - 0.788 = 0.152
IG(S, Windy) = 0.940 - 0.8932 = 0.048
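All four gains can be reproduced with a short, self-contained sketch over the same weather table (the helper functions and their names are my own):

```python
from collections import Counter
from math import log2

# (Outlook, Temperature, Humidity, Wind, Played football)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"), ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"), ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])   # group labels by attribute value
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - weighted

for col, name in enumerate(["Outlook", "Temperature", "Humidity", "Wind"]):
    print(name, round(info_gain(data, col), 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Wind 0.048
```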
Now select the feature having the largest information gain. Here it is Outlook, so it
forms the first node (root node) of our decision tree.

Now our data, split on Outlook, looks as follows:
Outlook   Temperature  Humidity  Wind    Played football (yes/no)
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes

Outlook   Temperature  Humidity  Wind    Played football (yes/no)
Overcast  Hot          High      Weak    Yes
Overcast  Cool         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes

Outlook   Temperature  Humidity  Wind    Played football (yes/no)
Rain      Mild         High      Weak    Yes
Rain      Cool         Normal    Weak    Yes
Rain      Cool         Normal    Strong  No
Rain      Mild         Normal    Weak    Yes
Rain      Mild         High      Strong  No
Since the Overcast branch contains only examples of class 'Yes', we can set it as a
Yes leaf. That means if the outlook is overcast, football will be played. Now our
decision tree looks as follows.
[Figure: partial decision tree. Outlook is the root; the Overcast branch is a Yes leaf, while the Sunny and Rain branches still need to be split.]
The next step is to find the next node in our decision tree. Now we will find
the one under Sunny. We have to determine which of Temperature,
Humidity, or Wind has the highest information gain.
Outlook   Temperature  Humidity  Wind    Played football (yes/no)
Sunny     Hot          High      Weak    No
Sunny     Hot          High      Strong  No
Sunny     Mild         High      Weak    No
Sunny     Cool         Normal    Weak    Yes
Sunny     Mild         Normal    Strong  Yes
Calculate parent entropy E(sunny)
E(sunny) = (-(3/5)log(3/5)-(2/5)log(2/5)) = 0.971.
Now calculate the information gain of Temperature, IG(sunny, Temperature).
Temperature  yes  no  total
hot          0    2   2
mild         1    1   2
cool         1    0   1
E(sunny, Temperature) = (2/5)*E(0,2) + (2/5)*E(1,1) + (1/5)*E(1,0)=2/5=0.4
Now calculate information gain.
IG(sunny, Temperature) = 0.971 - 0.4 = 0.571

Similarly we get:

IG(sunny, Humidity) = 0.971
IG(sunny, Windy) = 0.020
Here IG(sunny, Humidity) is the largest value. So Humidity is the node that
comes under sunny.
Humidity  yes  no
high      0    3
normal    2    0
From the above table for Humidity, we can say that play will occur if
humidity is normal and will not occur if it is high. Similarly, find the nodes
under the Rain branch.
Note: A branch with entropy more than 0 needs further splitting.
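As a sketch, running the same procedure on the Rain subset (helper names are mine) shows that Wind separates it perfectly, so Wind becomes the node under Rain:

```python
from collections import Counter
from math import log2

# (Temperature, Humidity, Wind, Played football) for the Outlook = Rain rows
rain_rows = [
    ("Mild", "High", "Weak", "Yes"), ("Cool", "Normal", "Weak", "Yes"),
    ("Cool", "Normal", "Strong", "No"), ("Mild", "Normal", "Weak", "Yes"),
    ("Mild", "High", "Strong", "No"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def info_gain(rows, col):
    groups = {}
    for r in rows:
        groups.setdefault(r[col], []).append(r[-1])
    weighted = sum(len(g) / len(rows) * entropy(g) for g in groups.values())
    return entropy([r[-1] for r in rows]) - weighted

for col, name in enumerate(["Temperature", "Humidity", "Wind"]):
    print(name, round(info_gain(rain_rows, col), 3))
# Wind has the highest gain (0.971, a perfect split), so Rain splits on Wind.
```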
Finally, our decision tree will look as below:
[Figure: final decision tree. Outlook is the root; Overcast → Yes; Sunny → Humidity (High → No, Normal → Yes); Rain → Wind (Strong → No, Weak → Yes).]
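One way to see that the finished tree is directly usable is to write it down as a nested dictionary and walk it to make a prediction. This is only an illustrative representation of the figure above, not code from the article:

```python
tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Rain": {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def predict(node, example):
    # Walk the nested dict until a leaf (a class label string) is reached.
    while isinstance(node, dict):
        attribute, branches = next(iter(node.items()))
        node = branches[example[attribute]]
    return node

print(predict(tree, {"Outlook": "Sunny", "Humidity": "High", "Wind": "Weak"}))   # No
print(predict(tree, {"Outlook": "Rain", "Humidity": "Normal", "Wind": "Weak"}))  # Yes
```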
Classification using CART algorithm
Classification using CART is similar to ID3, but instead of entropy we use Gini
impurity.
So as the first step, we will find the root node of our decision tree. For that,
calculate the Gini index of the class variable:
Gini(S) = 1 - [(9/14)² + (5/14)²] = 0.459
As the next step, we will calculate the Gini gain. For that, we first find
the average weighted Gini impurity of Outlook, Temperature, Humidity, and
Wind.

First, consider the case of Outlook.
Outlook    yes  no  total
sunny      3    2   5
overcast   4    0   4
rainy      2    3   5
Gini(S, Outlook) = (5/14)*gini(3,2) + (4/14)*gini(4,0) + (5/14)*gini(2,3)
= (5/14)*(1 - (3/5)² - (2/5)²) + (4/14)*0 + (5/14)*(1 - (2/5)² - (3/5)²)
= 0.171 + 0 + 0.171 = 0.342
Gini gain (S, outlook) = 0.459 - 0.342 = 0.117
Gini gain(S, Temperature) = 0.459 - 0.4405 = 0.0185
Gini gain(S, Humidity) = 0.459 - 0.3674 = 0.0916
Gini gain(S, windy) = 0.459 - 0.4286 = 0.0304
Choose the one that has the highest Gini gain. The Gini gain is highest for Outlook, so we
can choose it as our root node.
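A quick check of the Outlook Gini gain from the counts above (an illustrative sketch):

```python
def gini_from_counts(yes, no):
    total = yes + no
    return 1 - (yes / total) ** 2 - (no / total) ** 2

gini_s = gini_from_counts(9, 5)                      # ~0.459
weighted = (5/14) * gini_from_counts(3, 2) \
         + (4/14) * gini_from_counts(4, 0) \
         + (5/14) * gini_from_counts(2, 3)           # ~0.343
print(round(gini_s - weighted, 3))                   # 0.116 (0.117 with the article's rounding)
```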
Now you have got an idea of how to proceed further. Repeat the same steps
we used in the ID3 algorithm.
Advantages and disadvantages of decision trees
Advantages:
1. Decision trees are super interpretable
2. Require little data preprocessing
3. Suitable for low latency applications
htpssimecium,datadrveninvestorcomidecsion-ree-algort-wi-hands-or-example-eBc2afb40u38 wr3727124, 12:46 PM Decision Tree Algorithm Wih Hands-On Example | by Arun Mohan | DataDrveninvestor
Disadvantages:
1. More likely to overfit noisy data. The probability of overfitting on noise
increases as a tree gets deeper. A solution for this is pruning. You can read
more about pruning in my Kaggle notebook. Another way to avoid
overfitting is to use bagging techniques like Random Forest. You can read
more about Random Forest in an article from neptune.ai.
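As an illustrative sketch (assuming scikit-learn and one of its bundled datasets), both remedies amount to a parameter or a different estimator: limit or prune a single tree, or bag many trees with a random forest:

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)        # unconstrained, prone to overfit
pruned = DecisionTreeClassifier(max_depth=3, ccp_alpha=0.01,
                                random_state=0).fit(X_train, y_train)      # depth limit + cost-complexity pruning
forest = RandomForestClassifier(n_estimators=100,
                                random_state=0).fit(X_train, y_train)      # bagging of many trees

for name, model in [("unpruned tree", deep), ("pruned tree", pruned), ("random forest", forest)]:
    print(name, round(model.score(X_test, y_test), 3))
```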
References:
- https://www.saedsayad.com/decision_tree.htm
- Applied AI course