Decision Tree

Sudhir Shenai
Content Outline
• What is the purpose of a Decision Tree?
• What is a Decision Tree?
• How to build a Decision Tree?
• How to optimize a Decision Tree?
What is the purpose of a Decision Tree?
• To create data models that predict class labels or values to support the decision-making process.

• A decision tree can be a classification model or a regression model.

• A decision tree lets us visualize the decisions, which makes the model easy to understand.
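
As a minimal sketch of these ideas (assuming scikit-learn and matplotlib are available; the Iris data and the parameter choices here are illustrative, not part of the original slides), a classification tree can be fitted and visualized in a few lines of Python:

```python
# Minimal sketch: fit a decision tree classifier and visualize it.
# Assumes scikit-learn and matplotlib are installed; uses the bundled Iris dataset.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree
import matplotlib.pyplot as plt

X, y = load_iris(return_X_y=True)

clf = DecisionTreeClassifier(random_state=0)   # default settings, for illustration only
clf.fit(X, y)

plot_tree(clf, filled=True)   # each node shows its attribute test and class counts
plt.show()
```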
What is a Decision Tree?
• A decision tree is a flowchart-like tree structure built from the training set tuples.

• The dataset is broken down into smaller and smaller subsets, which are represented as the nodes of a tree.

• Some decision trees have only binary nodes, meaning exactly two branches per node, while other decision trees are non-binary.
An example of a decision tree with the dataset
Structure of Decision Tree
• The tree structure has a root node, internal (decision) nodes, leaf nodes, and branches.
• The root node is the topmost node. It represents the best attribute selected for classification.
• Internal nodes, or decision nodes, represent tests on attributes of the dataset.
• Leaf nodes, or terminal nodes, represent the classification or decision labels.
• The branches show the outcomes of the tests performed.
The image below shows the decision tree for the Titanic
dataset to predict whether the passenger will survive or not.
How does a Decision Tree work?
• A decision tree is a supervised learning algorithm that works for both
discrete and continuous variables.
• It splits the dataset into subsets on the basis of the most significant attribute in the dataset.
• How the decision tree identifies this attribute and how this splitting is
done is decided by the algorithms.
• The most significant predictor is designated as the root node, splitting
is done to form sub-nodes called decision nodes, and the nodes which
do not split further are terminal or leaf nodes.
• In the decision tree, the dataset is divided into homogeneous and non-
overlapping regions.
• It follows a top-down approach: the top node holds all the observations in a single place, which is split into two or more branches that are split further in turn.
• This approach is also called a greedy approach, because it considers only the node currently being worked on, without looking ahead to future nodes.
• The decision tree algorithm keeps running until a stopping criterion, such as a minimum number of observations per node, is reached.
• Once a decision tree is built, many nodes may represent outliers or noisy data.
• A tree-pruning method is applied to remove such unwanted nodes, which in turn improves the accuracy of the classification model.
• To measure the accuracy of the model, a test set consisting of test tuples and their class labels is used.
• The percentage of test-set tuples that the model classifies correctly gives the accuracy of the model.
• If the model is found to be accurate then it is used to classify the data
tuples for which the class labels are not known.
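
A hedged sketch of the workflow described above, using scikit-learn (the dataset, the min_samples_split stopping value, and the 70/30 split are arbitrary choices for illustration):

```python
# Sketch of the train / stop / evaluate workflow described above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# min_samples_split acts as a stopping criterion: a node with fewer
# observations than this is not split further.
clf = DecisionTreeClassifier(min_samples_split=20, random_state=0)
clf.fit(X_train, y_train)

# Accuracy = percentage of test-set tuples the model classifies correctly.
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```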
General approach for building a decision tree classification model
Example of Creating a Decision Tree
#1) Learning Step: The training data is fed into the system and analyzed by a classification algorithm. In this example, the class label is the attribute "loan decision". The model built from this training data is represented in the form of decision rules.
#2) Classification: The test dataset is fed to the model to check the accuracy of the classification rules. If the model gives acceptable results, it is applied to a new dataset with unknown class labels.
Classification Model
Some of the decision tree algorithms are:
• Hunt's Algorithm
• ID3
• C4.5
• CART
Decision Tree Algorithm: CART
• CART, i.e. Classification and Regression Trees, is a decision tree algorithm for building models. A decision tree model whose target values are discrete is called a classification model.

• A discrete attribute takes its values from a finite or countably infinite set, for example, age, size, etc. Models whose target values are continuous, usually floating-point numbers, are called regression models.

• These two kinds of models together are called CART.
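
As a small illustration (assuming scikit-learn and NumPy; the synthetic data and max_depth value are my own choices), the two halves of CART correspond to two estimators:

```python
# Sketch: the two halves of CART in scikit-learn — a classifier for a
# discrete target and a regressor for a continuous target.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(100, 2))

y_class = (X[:, 0] > 5).astype(int)            # discrete target -> classification model
y_reg = 2.0 * X[:, 0] + rng.normal(size=100)   # continuous target -> regression model

DecisionTreeClassifier(max_depth=3).fit(X, y_class)
DecisionTreeRegressor(max_depth=3).fit(X, y_reg)
```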
Decision Tree Induction Algorithm
1. Decision tree induction is the method of learning decision trees from the training set.

2. The training set consists of attributes and class labels.
Decision Tree Induction for Machine Learning: ID3

• In the late 1970s and early 1980s, J. Ross Quinlan, a machine learning researcher, developed a decision tree algorithm known as ID3 (Iterative Dichotomiser 3). The algorithm extended the concept learning systems described by E. B. Hunt, J. Marin, and P. T. Stone.

• Quinlan later presented C4.5, a successor of ID3. ID3 and C4.5 follow a greedy top-down approach for constructing decision trees. The algorithm starts with a training dataset with class labels that is partitioned into smaller and smaller subsets as the tree is being constructed.
Decision Tree Induction for Machine Learning:
ID3
#1) Initially, there are three parameters i.e. attribute list, attribute
selection method and data partition. The attribute list describes the
attributes of the training set tuples.
#2) The attribute selection method describes the method for selecting
the best attribute for discrimination among tuples. The methods used
for attribute selection can either be Information Gain or Gini Index.
#3) The structure of the tree (binary or non-binary) is decided by the
attribute selection method.
#4) The construction of the decision tree starts with a single node representing all the training tuples.
Decision Tree Induction for Machine Learning:
ID3
#5) If the tuples at the starting node belong to different classes, the attribute selection method is called to split (partition) the tuples. This step leads to the formation of branches and decision nodes.

#6) The splitting method determines which attribute should be selected to partition the data tuples, and which branches to grow from the node according to the test outcomes. The main motive of the splitting criterion is that the partition at each branch of the decision tree should, as far as possible, contain tuples of the same class label.
An example of a splitting attribute is shown below: the first figure shows a split on a discrete-valued attribute, the second a split on a continuous-valued attribute.

Decision Tree Induction for Machine Learning:
ID3
• #7) The above partitioning steps are applied recursively to form a decision tree for the training dataset tuples.
• #8) The partitioning stops only when all the partitions are pure or when the remaining tuples cannot be partitioned any further.
• #9) The computational complexity of the algorithm is O(n × |D| × log|D|), where n is the number of attributes and |D| is the number of tuples in the training dataset D.
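
The recursion in steps #5 to #8 can be sketched in plain Python as follows; select_attribute is a hypothetical stand-in for the attribute selection method (information gain, gain ratio, or Gini index, discussed later):

```python
from collections import Counter

def build_tree(rows, attributes, select_attribute):
    """Simplified ID3-style recursion.

    rows: list of dicts mapping attribute name -> value, plus a 'label' key.
    select_attribute: callable(rows, attributes) -> best attribute name,
                      e.g. by information gain; a stand-in here.
    """
    labels = [r["label"] for r in rows]
    if len(set(labels)) == 1:            # all tuples share one class -> leaf
        return labels[0]
    if not attributes:                   # nothing left to split on -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]

    best = select_attribute(rows, attributes)
    tree = {best: {}}
    for value in {r[best] for r in rows}:             # one branch per attribute value
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = build_tree(subset, remaining, select_attribute)
    return tree
```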
What is Greedy Recursive Binary Splitting?
• In the binary splitting method, every candidate split of the tuples is evaluated with a cost function, and the split with the lowest cost is selected.
• The split is binary: each split produces exactly two branches.
• It is recursive because the same method (calculating the cost) is applied again to split the resulting subsets of the dataset.
• The algorithm is called greedy because it focuses only on the current node: it lowers the cost at that node while all other (future) nodes are ignored.
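
A rough sketch of one such cost-based binary split on a numeric attribute is shown below; the choice of Gini impurity as the cost function and all names are illustrative assumptions, not a specific library's implementation:

```python
def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((labels.count(c) / n) ** 2 for c in set(labels))

def best_binary_split(values, labels):
    """Try every threshold on one numeric attribute; return (threshold, cost)
    for the binary split with the lowest weighted Gini impurity."""
    n = len(values)
    best = (None, float("inf"))
    for t in sorted(set(values)):
        left = [lab for v, lab in zip(values, labels) if v <= t]
        right = [lab for v, lab in zip(values, labels) if v > t]
        if not left or not right:
            continue                      # not a real two-way split
        cost = len(left) / n * gini(left) + len(right) / n * gini(right)
        if cost < best[1]:
            best = (t, cost)
    return best
```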
How to select attributes for creating a Tree?
• Attribute selection measures, also called splitting rules, decide how the tuples are split.
• The splitting criterion is used to find the best partition of the dataset.
• These measures provide a ranking of the attributes for partitioning the training tuples.
• Popular attribute selection measures are:
1. Information Gain
2. Gain Ratio
3. Gini Index
#1) Information Gain: This is the main measure used to build decision trees. The attribute with the highest information gain is selected for the split, because it reduces the information, and hence the number of tests, still needed to classify the tuples. The original information needed to classify a tuple in dataset D is

Info(D) = − Σ p_i log2(p_i)

where p_i is the probability that a tuple belongs to class C_i. The information is encoded in bits, therefore log to the base 2 is used. Info(D), also called the entropy of D, is the average amount of information required to identify the class label of a tuple in D.

The information still required for an exact classification after partitioning D by attribute X into partitions D_1, ..., D_v is

Info_X(D) = Σ (|D_j| / |D|) × Info(D_j)

where |D_j| / |D| is the weight of partition D_j.

Information gain is the difference between the original information and the expected information required to classify the tuples of D after partitioning by X:

Gain(X) = Info(D) − Info_X(D)

Gain(X) is the reduction in required information that we obtain by knowing the value of X. The attribute with the highest information gain is chosen as the "best" splitting attribute.
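
A small Python sketch of these two formulas (function and variable names are my own; rows are assumed to be dictionaries with a 'label' key):

```python
import math
from collections import Counter

def entropy(labels):
    """Info(D) = -sum(p_i * log2(p_i)) over the classes present in D."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute):
    """Gain(X) = Info(D) - Info_X(D) for a categorical attribute X."""
    labels = [r["label"] for r in rows]
    total = entropy(labels)
    n = len(rows)
    expected = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["label"] for r in rows if r[attribute] == value]
        expected += len(subset) / n * entropy(subset)   # weighted partition entropy
    return total - expected
```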
#2) Gain Ratio
Information gain is biased toward attributes with many distinct values and might therefore select a split that is useless for classification (for example, a unique identifier). The gain ratio corrects this by dividing the gain by the split information, which accounts for the number of tuples in each partition relative to the total:

SplitInfo_X(D) = − Σ (|D_j| / |D|) log2(|D_j| / |D|),   GainRatio(X) = Gain(X) / SplitInfo_X(D)

The attribute with the maximum gain ratio is used as the splitting attribute.
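
A corresponding sketch of the gain ratio (again, names and data layout are my own assumptions):

```python
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain_ratio(rows, attribute):
    """GainRatio(X) = Gain(X) / SplitInfo_X(D).

    SplitInfo penalizes attributes that split D into many small partitions."""
    labels = [r["label"] for r in rows]
    n = len(rows)
    gain, split_info = _entropy(labels), 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r["label"] for r in rows if r[attribute] == value]
        w = len(subset) / n
        gain -= w * _entropy(subset)      # Gain(X) = Info(D) - Info_X(D)
        split_info -= w * math.log2(w)    # SplitInfo_X(D)
    return gain / split_info if split_info else 0.0
```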
#3) Gini Index
The Gini index considers only binary splits. It measures the impurity of the training tuples in dataset D:

Gini(D) = 1 − Σ p_i^2

where p_i is the probability that a tuple belongs to class C_i.

For a binary split of D by attribute A into partitions D_1 and D_2, the Gini index is

Gini_A(D) = (|D_1| / |D|) Gini(D_1) + (|D_2| / |D|) Gini(D_2)

The reduction in impurity is the difference between the Gini index of the original dataset D and the Gini index after partitioning by attribute A:

ΔGini(A) = Gini(D) − Gini_A(D)

The attribute giving the maximum reduction in impurity (equivalently, the minimum Gini_A(D)) is selected as the best splitting attribute.
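
A sketch of the Gini computations for a binary split (the left_values argument, which defines the two-way grouping of attribute values, is an illustrative assumption):

```python
from collections import Counter

def gini(labels):
    """Gini(D) = 1 - sum(p_i^2) over the classes present in D."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def gini_reduction(rows, attribute, left_values):
    """Impurity reduction for the binary split of D by attribute A into
    D1 (attribute value in left_values) and D2 (the rest)."""
    d1 = [r["label"] for r in rows if r[attribute] in left_values]
    d2 = [r["label"] for r in rows if r[attribute] not in left_values]
    if not d1 or not d2:                  # not a real two-way split
        return 0.0
    labels = [r["label"] for r in rows]
    n = len(rows)
    gini_a = len(d1) / n * gini(d1) + len(d2) / n * gini(d2)
    return gini(labels) - gini_a          # delta Gini(A)
```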
Overfitting in Decision Trees
• Overfitting happens when a decision tree tries to fit the training data as perfectly as possible by increasing the depth of its tests, thereby reducing the training error.
• This results in very complex trees.
• Overfitting reduces the predictive power of the decision tree on unseen data.
• The approaches to avoid overfitting include pre-pruning and post-pruning.
What Is Tree Pruning?
• Pruning is the method of removing unwanted branches from the decision tree.
• Some branches of the decision tree might represent outliers or noisy data.
• Tree pruning removes these unwanted branches.
• This reduces the complexity of the tree and helps effective predictive analysis.
• It reduces overfitting, since the unimportant branches are removed from the tree.
• There are two ways of tree pruning: #1 Pre-Pruning and #2 Post-Pruning.
Unpruned and Pruned Tree
#1) Pre-pruning:
• In this approach, the construction of the decision tree is stopped early: it is decided not to partition the branches any further. The last node constructed becomes a leaf node, and this leaf node may hold the most frequent class among its tuples.
• The attribute selection measures are used to evaluate the goodness of a split. Threshold values are prescribed to decide which splits are regarded as useful. If partitioning a node would make the measure fall below the threshold, the process is halted at that node.
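
In scikit-learn, pre-pruning corresponds to the stopping parameters of the tree estimator; the sketch below uses arbitrary threshold values for illustration:

```python
# Pre-pruning sketch: scikit-learn's stopping parameters halt tree growth early.
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

clf = DecisionTreeClassifier(
    max_depth=4,                  # stop growing below this depth
    min_samples_leaf=10,          # a leaf must keep at least 10 tuples
    min_impurity_decrease=0.01,   # a split must reduce impurity by at least this much
    random_state=0,
).fit(X, y)

print("depth:", clf.get_depth(), "leaves:", clf.get_n_leaves())
```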
#2) Post-pruning:
• This method removes the outlier branches from a fully grown tree. The unwanted branches are removed and replaced by a leaf node denoting the most frequent class label. This technique requires more computation than pre-pruning; however, it is more reliable.
• The pruned trees are more precise and compact than unpruned trees, but decision trees can still suffer from replication and repetition.
• Repetition occurs when the same attribute is tested again and again along a branch of the tree. Replication occurs when duplicate subtrees are present within the tree. These issues can be addressed by multivariate splits.
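
The slides describe post-pruning as subtree replacement; as one concrete, readily available mechanism, scikit-learn offers minimal cost-complexity pruning via ccp_alpha. A sketch (the choice of alpha here is arbitrary):

```python
# Post-pruning sketch: grow a full tree, then prune it back with
# minimal cost-complexity pruning (scikit-learn's ccp_alpha).
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
path = full.cost_complexity_pruning_path(X_train, y_train)

# Refit with a moderate alpha from the path; larger alpha -> smaller tree.
alpha = path.ccp_alphas[len(path.ccp_alphas) // 2]
pruned = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha).fit(X_train, y_train)

print("full leaves:", full.get_n_leaves(), "pruned leaves:", pruned.get_n_leaves())
print("full test acc:", full.score(X_test, y_test),
      "pruned test acc:", pruned.score(X_test, y_test))
```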
What Is Predictive Modelling?

• A classification model can be used to predict the outcomes for an unknown set of attribute values.
• When a dataset with unknown class labels is fed into the model, the model assigns class labels to it. This method of applying the learned probabilities to predict outcomes is called predictive modelling.
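
A minimal sketch of this prediction step (the Iris data and the two new tuples are illustrative):

```python
# Sketch: using a fitted classifier to label new, unseen tuples.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

new_tuples = [[5.0, 3.4, 1.5, 0.2],     # attribute values with unknown class labels
              [6.7, 3.0, 5.2, 2.3]]
print(model.predict(new_tuples))        # assigned class labels
print(model.predict_proba(new_tuples))  # class probabilities behind the prediction
```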
Advantages Of Decision Tree Classification

1. Decision tree classification does not require any domain knowledge; hence, it is appropriate for the knowledge discovery process.
2. The representation of data in the form of a tree is intuitive and easily understood by humans.
3. It can handle multidimensional data.
4. Training is relatively fast and the resulting models often achieve good accuracy.
Disadvantages Of Decision Tree
Classification
1. Decision trees can become very complex; such trees are called overfitted trees.
2. The decision tree algorithm may not find the globally optimal tree.
3. Decision trees may return a biased solution if some class label dominates the dataset.
Conclusion
• Decision trees are data mining techniques for classification and regression analysis.
• The technique now spans many areas, such as medical diagnosis and target marketing. The trees are constructed by following an algorithm such as ID3 or CART; these algorithms find different ways to split the data into partitions.
• Decision trees are among the most widely known supervised learning techniques used in machine learning and pattern analysis. They predict the values of the target variable by building models that learn from the training set provided to the system.
Example
Constructing a Decision Tree
Let us take an example of a 14-day weather dataset with the attributes Outlook, Temperature, Humidity, and Wind. The outcome variable is whether cricket is played or not. We will use the ID3 algorithm to build the decision tree.
Day  Outlook   Temperature  Humidity  Wind    Play cricket
1    Sunny     Hot          High      Weak    No
2    Sunny     Hot          High      Strong  No
3    Overcast  Hot          High      Weak    Yes
4    Rain      Mild         High      Weak    Yes
5    Rain      Cool         Normal    Weak    Yes
6    Rain      Cool         Normal    Strong  No
7    Overcast  Cool         Normal    Strong  Yes
8    Sunny     Mild         High      Weak    No
9    Sunny     Cool         Normal    Weak    Yes
10   Rain      Mild         Normal    Weak    Yes
11   Sunny     Mild         Normal    Strong  Yes
12   Overcast  Mild         High      Strong  Yes
13   Overcast  Hot          Normal    Weak    Yes
14   Rain      Mild         High      Strong  No

• Step 1: Create a root node.
• Step 2: If all observations have the same outcome, return a leaf node with that label ("Yes" or "No").
• Step 3: Find the entropy of all observations, E(S), and the entropy with respect to each attribute x, E(S, x).
• Step 4: Compute the information gain and select the attribute with the highest information gain.
• Step 5: Repeat the above steps for each branch until all attributes are covered or a leaf is reached.

Calculation of Entropy:
Yes = 9, No = 5

E(S) = −(9/14) log2(9/14) − (5/14) log2(5/14) = 0.94

If the entropy is zero, all members belong to the same class; if the entropy is one, half of the tuples belong to one class and the other half to the other class. A value of 0.94 indicates a fairly mixed distribution.
• Find the attribute that gives the maximum information gain.
• For example, "Wind" takes two values, Strong and Weak, therefore x = {Strong, Weak}.
• Find H(x) and P(x) for x = Weak and x = Strong. H(S) = 0.94 was already calculated above.
• Weak = 8 tuples, Strong = 6 tuples.
• For "Weak" wind, 6 tuples say "Yes" to play cricket and 2 say "No", so the entropy is:

E(Weak) = −(6/8) log2(6/8) − (2/8) log2(2/8) = 0.811

• For "Strong" wind, 3 said "No" to play cricket and 3 said "Yes":

E(Strong) = −(3/6) log2(3/6) − (3/6) log2(3/6) = 1.0

• This shows perfect randomness, as half of the items belong to one class and the remaining half to the other.
Calculate the information gain for Wind:

Gain(S, Wind) = E(S) − (8/14) E(Weak) − (6/14) E(Strong) = 0.94 − (8/14)(0.811) − (6/14)(1.0) = 0.048

Similarly, the information gain for the other attributes is:

Gain(S, Outlook) = 0.246, Gain(S, Humidity) = 0.151, Gain(S, Temperature) = 0.029

• The attribute Outlook has the highest information gain (0.246), thus it is chosen as the root.
• Outlook has 3 values: Sunny, Overcast, and Rain. When Outlook is Overcast, Play cricket is always "Yes", so that branch ends in a leaf node "Yes". The branches for "Sunny" and "Rain" are partitioned further.
• Table for Outlook as “Sunny” will be:
Temperature  Humidity  Wind    Play cricket
Hot          High      Weak    No
Hot          High      Strong  No
Mild         High      Weak    No
Cool         Normal    Weak    Yes
Mild         Normal    Strong  Yes


• Entropy for Outlook = "Sunny" is:

E(Sunny) = −(2/5) log2(2/5) − (3/5) log2(3/5) = 0.971

• Information gain for the attributes with respect to Sunny is:

Gain(Sunny, Humidity) = 0.971, Gain(Sunny, Temperature) = 0.571, Gain(Sunny, Wind) = 0.020

• The information gain for Humidity is the highest, therefore it is chosen as the next node under "Sunny". Similarly, the entropy is calculated for "Rain"; there, Wind gives the highest information gain.
The final decision tree is therefore: Outlook at the root; the Overcast branch is a leaf "Yes"; the Sunny branch splits on Humidity (High → No, Normal → Yes); and the Rain branch splits on Wind (Strong → No, Weak → Yes).
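
To double-check the hand calculations in this example, the short script below (all names are my own) recomputes the entropies and information gains on the 14-day table:

```python
import math
from collections import Counter

# The 14-day weather table: (Outlook, Temperature, Humidity, Wind, Play cricket)
data = [
    ("Sunny", "Hot", "High", "Weak", "No"),      ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),  ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),   ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),  ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"), ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"), ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = {"Outlook": 0, "Temperature": 1, "Humidity": 2, "Wind": 3}

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

labels = [row[-1] for row in data]
print("E(S) =", round(entropy(labels), 3))            # expected ~0.940

for name, i in attributes.items():
    expected = 0.0
    for value in {row[i] for row in data}:
        subset = [row[-1] for row in data if row[i] == value]
        expected += len(subset) / len(data) * entropy(subset)
    print(f"Gain(S, {name}) =", round(entropy(labels) - expected, 3))
# Outlook should come out highest (~0.25), so it is chosen as the root.
```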
