
DecisionTree

Lecture 5

Salahadin Seid

School of Data Science


Emerland University

September 1, 2024

Outline

• What is a Decision Tree?


• Example
• How Does the Decision Tree Algorithm Work?
• Attribute Selection Measures
• Information Gain
• Gini Index
• Python Implementations

What is a Decision Tree?

• Decision trees are another popular ML algorithm that can be used for both
  regression and classification tasks.
• Decision trees make predictions by recursively splitting on different
  attributes according to a tree structure.
• A decision tree has a hierarchical structure consisting of a root node,
  branches, internal nodes, and leaf nodes.
• They are easy to understand, interpret, and implement, making them an ideal
  choice for beginners in the field of ML.

Example

• Example: classifying fruit as an orange or a lemon based on height and width

Example ...
• For continuous attributes, split based on whether the value is less than or
  greater than some threshold.
• Thus, the input space is divided into regions with boundaries parallel to
  the axes.

How Does the Decision Tree Algorithm Work?

The basic idea behind any decision tree algorithm is as follows (a minimal
sketch of the recursive procedure is shown after this list):

1 Select the best attribute using an Attribute Selection Measure (ASM) to
  split the records.
2 Make that attribute a decision node and break the dataset into smaller
  subsets.
3 Build the tree by repeating this process recursively for each child until
  one of the following conditions is met:
  • All the tuples belong to the same class.
  • There are no remaining attributes.
  • There are no remaining instances.
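A minimal sketch of this recursive procedure in Python, assuming examples are
given as (feature-dict, label) pairs and score_fn is some attribute selection
measure such as information gain; the helper names here are illustrative, not
taken from the lecture's code:

from collections import Counter

def build_tree(rows, attributes, score_fn):
    """Recursively build a decision tree.

    rows       : list of (feature_dict, label) pairs
    attributes : attribute names still available for splitting
    score_fn   : attribute selection measure, e.g. information gain
    """
    if not rows:                      # no instances left
        return None
    labels = [label for _, label in rows]
    if len(set(labels)) == 1:         # all tuples belong to the same class
        return labels[0]
    if not attributes:                # no attributes left: return majority class
        return Counter(labels).most_common(1)[0][0]

    # 1) Select the best attribute according to the ASM.
    best = max(attributes, key=lambda a: score_fn(rows, a))

    # 2) Make it a decision node and split the dataset on its values.
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]

    # 3) Recurse on each child subset.
    for value in {features[best] for features, _ in rows}:
        subset = [(f, l) for f, l in rows if f[best] == value]
        node["children"][value] = build_tree(subset, remaining, score_fn)
    return node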

Attribute Selection Measures

• An attribute selection measure (ASM), also known as a splitting rule, is a
  heuristic for selecting the splitting criterion that partitions the data in
  the best possible manner.
• An ASM assigns a rank to each feature (or attribute) of the given dataset.
• The attribute with the best score is selected as the splitting attribute.
• The most popular selection measures are Information Gain and the Gini Index.

Information Gain

• Information gain is a measure used to determine which feature should be used
  to split the data at each internal node of the decision tree. It is
  calculated using entropy.
• Entropy is a metric that measures the impurity in a given attribute; it
  quantifies the randomness in the data.
• In a decision tree, the goal is to decrease the entropy of the dataset by
  creating purer subsets of data.
• Since entropy is a measure of impurity, by decreasing the entropy we are
  increasing the purity of the data.

Information Gain

Entropy

$H(X) = -\sum_{x \in X} p(x) \log_2 p(x)$

where $p_i$ is the probability of randomly selecting an example in class $i$.

For a dataset made up of 3 colours (red, purple, and yellow), the equation
becomes: $H(X) = -(p_r \log_2 p_r + p_p \log_2 p_p + p_y \log_2 p_y)$
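As a quick numerical illustration, a short Python check of this formula on a
hypothetical 10-example dataset with 4 red, 3 purple, and 3 yellow items (the
counts are made up purely for illustration):

import math

def entropy(probabilities):
    """H = -sum(p * log2 p) over the class probabilities."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

# Hypothetical class proportions: 4 red, 3 purple, 3 yellow out of 10 examples.
print(entropy([0.4, 0.3, 0.3]))   # ~1.57 bits (the 3-class maximum, log2(3), is ~1.585)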

Information Gain...

Information Gain

$Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \cdot Entropy(S_v)$

Information Gain = entropy(parent) − [weighted average] · entropy(children)
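A minimal sketch of these two formulas in Python, assuming labelled examples
are given as (attribute_value, class_label) pairs for a single attribute (the
helper names are mine, not from the lecture files):

import math
from collections import Counter

def entropy_of_labels(labels):
    """H(S) = -sum p(c) log2 p(c) over the classes present in labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(pairs):
    """Gain for one attribute, given (attribute_value, class_label) pairs."""
    parent_entropy = entropy_of_labels([label for _, label in pairs])

    # Weighted average of the children's entropies, one child per attribute value.
    value_counts = Counter(value for value, _ in pairs)
    weighted = sum(
        (count / len(pairs)) * entropy_of_labels([l for v, l in pairs if v == value])
        for value, count in value_counts.items()
    )
    return parent_entropy - weighted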

Information Gain... example

• We want a decision tree to predict whether a loan given to a person would
  result in a write-off or not.
• Our entire population consists of 30 instances:
  • 16 belong to the write-off class, and
  • the other 14 belong to the non-write-off class.
• We have two features, namely:
  • Balance, which can take on two values: < 50K or >= 50K
  • Residence, which can take on three values: OWN, RENT, or OTHER.

Information Gain... example

• Feature 1: Balance
  • Balance < 50K: subtotal 13 (12 write-off, 1 not)
  • Balance >= 50K: subtotal 17 (4 write-off, 13 not)

$E(parent) = -\frac{16}{30}\log_2\frac{16}{30} - \frac{14}{30}\log_2\frac{14}{30} = 0.99$

$E(Balance < 50K) = -\frac{12}{13}\log_2\frac{12}{13} - \frac{1}{13}\log_2\frac{1}{13} = 0.39$

$E(Balance \geq 50K) = -\frac{4}{17}\log_2\frac{4}{17} - \frac{13}{17}\log_2\frac{13}{17} = 0.79$

Weighted average of entropy:
$E(Balance) = \frac{13}{30} \cdot 0.39 + \frac{17}{30} \cdot 0.79 = 0.62$

Information Gain:
$IG(Parent, Balance) = E(Parent) - E(Balance) = 0.99 - 0.62 = 0.37$
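Reusing the information_gain helper sketched earlier, this figure can be
checked numerically; the 30 (balance, class) pairs below simply reproduce the
counts given above:

# 12 write-offs and 1 non-write-off with Balance < 50K,
# 4 write-offs and 13 non-write-offs with Balance >= 50K.
balance_pairs = ([("<50K", "write-off")] * 12 + [("<50K", "non-write-off")] * 1 +
                 [(">=50K", "write-off")] * 4 + [(">=50K", "non-write-off")] * 13)

# ~0.38; rounding the intermediate entropies to two decimals first gives the 0.37 above.
print(round(information_gain(balance_pairs), 2))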

Information Gain... example

• Feature 2: Residence
  • Residence = OWN: subtotal 8 (7 write-off, 1 not)
  • Residence = RENT: subtotal 10 (4 write-off, 6 not)
  • Residence = OTHER: subtotal 12 (5 write-off, 7 not)

$E(parent) = -\frac{16}{30}\log_2\frac{16}{30} - \frac{14}{30}\log_2\frac{14}{30} = 0.99$

$E(Residence = own) = -\frac{7}{8}\log_2\frac{7}{8} - \frac{1}{8}\log_2\frac{1}{8} = 0.54$

$E(Residence = rent) = -\frac{4}{10}\log_2\frac{4}{10} - \frac{6}{10}\log_2\frac{6}{10} = 0.97$

$E(Residence = other) = -\frac{5}{12}\log_2\frac{5}{12} - \frac{7}{12}\log_2\frac{7}{12} = 0.98$

Weighted average of entropy:
$E(Residence) = \frac{8}{30} \cdot 0.54 + \frac{10}{30} \cdot 0.97 + \frac{12}{30} \cdot 0.98 = 0.86$

Information Gain:
$IG(Parent, Residence) = E(Parent) - E(Residence) = 0.99 - 0.86 = 0.13$

The information gain from the feature Balance (0.37) is almost 3 times higher
than the information gain from Residence (0.13), which means Balance is more
informative and should be used to split the data at this node.
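The same check for Residence, again reconstructing the 30 (residence, class)
pairs from the counts above and reusing the information_gain helper:

residence_pairs = ([("own", "write-off")] * 7 + [("own", "non-write-off")] * 1 +
                   [("rent", "write-off")] * 4 + [("rent", "non-write-off")] * 6 +
                   [("other", "write-off")] * 5 + [("other", "non-write-off")] * 7)

# ~0.14 (the 0.13 above comes from rounding the intermediate entropies first);
# either way it is far below the ~0.37 gain of Balance.
print(round(information_gain(residence_pairs), 2))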
Gini Index

• The Gini Index is also known as Gini impurity; it is a measure of how mixed
  or impure a dataset is.
• Gini impurity ranges between 0 and 1, where 0 represents a pure dataset and
  1 represents a completely impure dataset.
• Pure dataset: all the samples belong to the same class or category.
• Impure dataset: contains a mixture of different classes or categories.

Gini Index


$Gini = 1 - \sum_{i} p(i)^2$

where $p(i)$ is the probability of a specific class and the summation is done
over all classes present in the dataset.

Let's consider a toy dataset with two classes, Yes and No, and the following
class probabilities: $p(Yes) = 0.3$ and $p(No) = 0.7$.

$Gini = 1 - (0.3)^2 - (0.7)^2 = 1 - 0.09 - 0.49 = 0.42$
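A one-line check of this value in Python (a minimal sketch, not the lecture's
script):

def gini_impurity(probabilities):
    """Gini = 1 - sum(p_i^2) over the class probabilities."""
    return 1 - sum(p ** 2 for p in probabilities)

print(gini_impurity([0.3, 0.7]))   # ~0.42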

Gini Index - Example

Consider a toy dataset with the following features and class labels:

• The target class label is Buys_insurance and it can take two values, Yes
  or No.
• We want to determine the best feature to use as the root node for the
  decision tree.
• To do this, we will calculate the Gini impurity for each feature and select
  the feature with the lowest Gini impurity.
Gini Index - Example

Calculate the Gini impurity for each feature using the formula:

$Gini = 1 - p(Feature = value_1)^2 - p(Feature = value_2)^2 - \dots - p(Feature = value_N)^2$

1 Age Gini impurity:
  $Gini = 1 - p(age = 20)^2 - p(age = 25)^2 - \dots - p(age = 45)^2$
  $= 1 - (1/6)^2 - (1/6)^2 - \dots - (1/6)^2 = 1 - 6 \cdot (1/6)^2 \approx 0.83$
2 Gender Gini impurity:
  $Gini = 1 - p(Gender = male)^2 - p(Gender = female)^2$
  $= 1 - (3/6)^2 - (3/6)^2 = 0.5$
3 Income Gini impurity:
  $Gini = 1 - p(Income = high)^2 - p(Income = medium)^2 - p(Income = low)^2$
  $= 1 - (3/6)^2 - (2/6)^2 - (1/6)^2 \approx 0.61$
4 Credit Score Gini impurity:
  $Gini = 1 - p(cs = excellent)^2 - p(cs = fair)^2 - p(cs = poor)^2$
  $= 1 - (3/6)^2 - (2/6)^2 - (1/6)^2 \approx 0.61$

From the above calculations, we can see that the feature Gender has the lowest
Gini impurity of 0.5, so we select Gender as the root node for the decision
tree.
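These numbers can be reproduced from the value counts used above (6 rows per
feature), reusing the gini_impurity helper sketched earlier with counts
converted to probabilities:

def gini_from_counts(counts):
    """Gini impurity computed from raw value counts rather than probabilities."""
    total = sum(counts)
    return gini_impurity([c / total for c in counts])

print(round(gini_from_counts([1, 1, 1, 1, 1, 1]), 2))   # Age          ~0.83
print(round(gini_from_counts([3, 3]), 2))               # Gender        0.5
print(round(gini_from_counts([3, 2, 1]), 2))            # Income       ~0.61
print(round(gini_from_counts([3, 2, 1]), 2))            # Credit score ~0.61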

Decision Trees - over-fitting issue

• Decision Trees are prone to overfitting.
• Overfitting in decision trees occurs when the tree becomes too complex and
  captures the noise in the training data rather than the underlying pattern.
• A decision tree will always overfit the training data if we allow it to grow
  to its maximum depth.
• This can lead to poor generalization performance on new, unseen data.

Python Implementations

1 Entropy example: entropy_iris.py
2 Training an unpruned decision tree: dt_iris_without.py
  • Accuracy on the training set: 1.0
  • Accuracy on the testing set: 0.9333
3 Using pruning - removing branches that do not provide much information gain
  or that are not necessary for the tree to make accurate predictions:
  dt_iris_with_pruning.py (a sketch of what such a script might look like is
  shown below)
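A minimal sketch, assuming scikit-learn and its built-in Iris dataset, of what
the unpruned and pruned training scripts might look like; the exact accuracies
depend on the train/test split and random seed, so the numbers above need not
be reproduced exactly:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Unpruned tree: grown to full depth, typically fits the training set perfectly.
unpruned = DecisionTreeClassifier(criterion="entropy", random_state=42)
unpruned.fit(X_train, y_train)
print("Unpruned - train:", unpruned.score(X_train, y_train),
      "test:", unpruned.score(X_test, y_test))

# Pruned tree: pre-pruning by limiting depth (cost-complexity pruning via
# ccp_alpha is another option) to reduce overfitting.
pruned = DecisionTreeClassifier(criterion="entropy", max_depth=3, random_state=42)
pruned.fit(X_train, y_train)
print("Pruned   - train:", pruned.score(X_train, y_train),
      "test:", pruned.score(X_test, y_test))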

Thanks For Your Attention!
Any questions?

