Decision Tree
Sudhir Shenai
Content Outline
• What is the purpose of a Decision Tree?
• What is a Decision Tree?
• How to build a Decision Tree?
• How to optimize a Decision Tree?
What is the purpose of a Decision Tree?
• To build data models that predict class labels or values to support the
decision-making process.
• A decision tree lets us visualize the decisions, which makes the model
easy to understand.
What is a Decision Tree?
• A decision tree is a flowchart-like tree structure that is built from the
tuples of a training set.
• The dataset is broken down into smaller subsets, which are represented
as the nodes of the tree.
• Some decision trees are binary, meaning every node has exactly two
branches, while other decision trees are non-binary.
An example of a decision tree with the dataset
Structure of Decision Tree
• The tree structure has a root node, internal nodes (also called decision
nodes), leaf nodes, and branches.
• The root node is the topmost node. It represents the best attribute
selected for classification.
• Internal nodes, or decision nodes, each represent a test on an attribute
of the dataset.
• Leaf nodes, or terminal nodes, represent the classification or decision
label.
• The branches show the outcomes of the tests performed.
The image below shows the decision tree for the Titanic
dataset to predict whether the passenger will survive or not.
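To make these terms concrete, a tiny hand-coded tree in the spirit of the Titanic example is sketched below. This is a rough illustration only: the attribute tests and the age threshold are assumptions, not the actual tree from the figure.

    # Root node: test on the "sex" attribute; decision node: test on "age";
    # branches: the outcomes of each test; leaf nodes: the class labels.
    def predict_survival(passenger):
        if passenger["sex"] == "female":      # root node
            return "survived"                 # leaf node
        else:                                 # branch for sex == "male"
            if passenger["age"] <= 9.5:       # internal / decision node
                return "survived"             # leaf node
            return "died"                     # leaf node

    print(predict_survival({"sex": "male", "age": 30}))    # -> died
    print(predict_survival({"sex": "female", "age": 30}))  # -> survived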
How does a Decision Tree work?
• A decision tree is a supervised learning algorithm that works for both
discrete and continuous variables.
• It splits the dataset into subsets on the basis of the most significant
attribute in the dataset.
• How the decision tree identifies this attribute and how the splitting is
done is decided by the algorithm used.
• The most significant predictor is designated as the root node, splitting
is done to form sub-nodes called decision nodes, and the nodes which
do not split further are terminal or leaf nodes.
• In the decision tree, the dataset is divided into homogeneous and non-
overlapping regions.
• It follows a top-down approach: the top region holds all the
observations in a single place, which splits into two or more branches
that split further.
• This approach is also called a greedy approach because it considers only
the current node being worked on, without looking ahead to future
nodes.
• The decision tree algorithm keeps running until a stopping criterion,
such as a minimum number of observations per node, is reached.
• Once a decision tree is built, many nodes may represent outliers or
noisy data.
• A tree pruning method is applied to remove these unwanted nodes. This,
in turn, improves the accuracy of the classification model.
• To find the accuracy of the model, a test set consisting of test tuples
and class labels is used.
• The percentage of test set tuples correctly classified by the model gives
the accuracy of the model (a code sketch of these steps follows this list).
• If the model is found to be accurate then it is used to classify the data
tuples for which the class labels are not known.
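As a rough sketch of these steps in code (assuming a generic labeled dataset and scikit-learn; the parameter values below are illustrative, not taken from the slides), a stopping criterion can be set with min_samples_leaf, noisy branches can be removed with cost-complexity pruning (ccp_alpha), and accuracy is measured on a held-out test set:

    from sklearn.datasets import load_breast_cancer
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    # Any labeled dataset will do; this one ships with scikit-learn.
    X, y = load_breast_cancer(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=0)

    # Stopping criterion: each leaf must contain at least 5 observations.
    # ccp_alpha > 0 prunes branches that add little, reducing the effect of noise/outliers.
    tree = DecisionTreeClassifier(min_samples_leaf=5, ccp_alpha=0.01, random_state=0)
    tree.fit(X_train, y_train)

    # Accuracy = percentage of test tuples classified correctly by the model.
    print("test accuracy:", accuracy_score(y_test, tree.predict(X_test)))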
General approach for building a decision tree classification model
Example of Creating a Decision Tree
#1) Learning Step: The training data is fed into the system to be
analyzed by a classification algorithm. In this example, the class label
attribute is “loan decision”. The model built from this training data is
represented in the form of decision rules.
#2) Classification: The test dataset is fed to the model to check the
accuracy of the classification rules. If the model gives acceptable results,
it is applied to a new dataset with unknown class labels.
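A minimal sketch of these two steps follows; the loan attributes and values below are made up for illustration, and only the idea of a “loan decision” class label comes from the example above.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    # 1) Learning step: training tuples with a known class label "loan_decision".
    train = pd.DataFrame({
        "income":        [20, 35, 60, 80, 15, 90, 40, 75],          # hypothetical, in $1000s
        "credit_score":  [500, 620, 700, 710, 480, 780, 640, 690],  # hypothetical
        "loan_decision": ["risky", "risky", "safe", "safe",
                          "risky", "safe", "risky", "safe"],
    })
    model = DecisionTreeClassifier(random_state=0)
    model.fit(train[["income", "credit_score"]], train["loan_decision"])

    # The learned model can be read off as decision rules (IF ... THEN ...).
    print(export_text(model, feature_names=["income", "credit_score"]))

    # 2) Classification step: apply the model to new tuples with unknown class labels.
    new_applicants = pd.DataFrame({"income": [55, 18], "credit_score": [680, 510]})
    print(model.predict(new_applicants))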
Classification Model
Some of the decision tree algorithms are
Hunt’s Algorithm
ID3
C4.5
CART
Decision Tree Algorithm: CART
• The CART model, i.e. Classification and Regression Trees, is a decision
tree algorithm for building models. A decision tree model whose
target values have a discrete nature is called a classification model.
• In the late 1970s and early 1980s, J. Ross Quinlan, a machine learning
researcher, built a decision tree algorithm known as ID3 (Iterative
Dichotomiser). This algorithm was an extension of the concept
learning systems described by E. B. Hunt, J. Marin, and P. T. Stone.
• Quinlan later presented C4.5, a successor of ID3. Both ID3 and C4.5
follow a greedy top-down approach for constructing decision trees. The
algorithm starts with a training dataset with class labels that is
partitioned into smaller subsets as the tree is being constructed.
Decision Tree Induction for Machine Learning:
ID3
#1) Initially, there are three parameters i.e. attribute list, attribute
selection method and data partition. The attribute list describes the
attributes of the training set tuples.
#2) The attribute selection method specifies how to select the attribute
that best discriminates among the tuples. The measures used for
attribute selection are typically Information Gain or the Gini Index (a
sketch of the Gini Index follows this list).
#3) The structure of the tree (binary or non-binary) is decided by the
attribute selection method.
#4) Construction of the decision tree starts with a single node
representing all the training tuples.
#5) If the root node tuples belong to different class labels, the algorithm
calls the attribute selection method to split or partition the tuples. This
step leads to the formation of branches and decision nodes.
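The Gini Index mentioned in step #2 can be sketched in a few lines; the only data used is the 9 “Yes” / 5 “No” class split worked through in the entropy example that follows.

    # Gini impurity of a node: 1 - sum(p_i^2) over the class proportions.
    def gini(class_counts):
        total = sum(class_counts)
        return 1.0 - sum((c / total) ** 2 for c in class_counts)

    # For a node with 9 "Yes" and 5 "No" tuples:
    print(round(gini([9, 5]), 3))  # -> 0.459 (0 = pure node, 0.5 = maximally mixed, two classes)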
Calculation of Entropy:
Yes = 9, No = 5 (14 tuples in total)
For this 9/5 split, the entropy is
H(S) = -(9/14)log2(9/14) - (5/14)log2(5/14) ≈ 0.94.
If the entropy is zero, all members belong to the same class; if the entropy
is one, half of the tuples belong to one class and the other half belong to
the other class. An entropy of 0.94 therefore means a fairly even distribution.
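The 0.94 value can be verified with a short Python sketch, using only the Yes/No counts above:

    from math import log2

    # Entropy H(S) = -sum(p_i * log2(p_i)) over the class proportions.
    def entropy(class_counts):
        total = sum(class_counts)
        return -sum((c / total) * log2(c / total) for c in class_counts if c > 0)

    print(round(entropy([9, 5]), 2))   # 9 "Yes", 5 "No" -> 0.94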
• Find the attribute that gives the maximum information gain.
• For example, the attribute “Wind” takes two values, Strong and Weak;
therefore x = {Strong, Weak}.
• Find H(x) and P(x) for x = Weak and x = Strong. H(S) has already been
calculated above.
• Weak = 8 tuples
• Strong = 6 tuples
• For “weak” wind, 6 of the tuples say “Yes” to play cricket and 2 say “No”. So the
entropy is H(Weak) = -(6/8)log2(6/8) - (2/8)log2(2/8) ≈ 0.811.
• For “strong” wind, 3 tuples said “No” to play cricket and 3 said “Yes”, so the
entropy is H(Strong) = -(3/6)log2(3/6) - (3/6)log2(3/6) = 1.0.
• This shows perfect randomness, as half of the items belong to one class and the
remaining half belong to the other.
• Calculate the information gain for Wind:
Gain(S, Wind) = H(S) - (8/14)·H(Weak) - (6/14)·H(Strong)
= 0.94 - (8/14)(0.811) - (6/14)(1.0) ≈ 0.048.
• Similarly, the information gain is calculated for the other attributes. The
attribute Outlook gives the highest information gain and is therefore chosen
as the root node.
• For the Sunny branch of Outlook, the information gain for Humidity is highest,
therefore it is chosen as the next node. Similarly, entropy and information gain
are calculated for the Rain branch, where Wind gives the highest information gain.
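The Wind calculation can be verified with a short sketch, using only the counts given above (8 Weak tuples: 6 Yes / 2 No, and 6 Strong tuples: 3 Yes / 3 No):

    from math import log2

    def entropy(counts):
        total = sum(counts)
        return -sum((c / total) * log2(c / total) for c in counts if c > 0)

    h_s      = entropy([9, 5])   # entropy of the full set, ~0.94
    h_weak   = entropy([6, 2])   # Wind = Weak: 6 Yes, 2 No -> ~0.811
    h_strong = entropy([3, 3])   # Wind = Strong: 3 Yes, 3 No -> 1.0

    # Information gain = H(S) minus the weighted average entropy of the subsets.
    gain_wind = h_s - (8 / 14) * h_weak - (6 / 14) * h_strong
    print(round(gain_wind, 3))   # -> 0.048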
The decision tree would look like below: