Entropy and Information Gain
Entropy
Entropy is a measure of disorder or uncertainty, and the goal of machine learning models, and of Data Scientists in general, is to reduce that uncertainty.
High vs. Low Entropy
“High Entropy”
X is drawn from a uniform-like distribution
Flat histogram
Values sampled from it are less predictable
“Low Entropy”
X is drawn from a varied (peaks and valleys) distribution
Histogram has distinct highs and lows
Values sampled from it are more predictable
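To make the contrast concrete, here is a minimal Python sketch; the probability vectors below are made-up illustrations, not data from the lecture. It computes the Shannon entropy of a flat histogram versus a peaked one.

import math

def shannon_entropy(probs):
    # Shannon entropy H(X) = -sum(p * log2(p)) over the non-zero probabilities
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Flat (uniform-like) histogram: every outcome equally likely -> high entropy
uniform = [0.25, 0.25, 0.25, 0.25]
# Peaked histogram: one outcome dominates -> low entropy
peaked = [0.85, 0.05, 0.05, 0.05]

print(shannon_entropy(uniform))  # 2.0 bits, the maximum for 4 outcomes
print(shannon_entropy(peaked))   # about 0.85 bits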
Decision Tree Classification
To build a decision tree, we need to calculate two types of entropy using frequency tables, as follows:
a) Entropy using the frequency table of one attribute:
H(S) = -Σ p(c) log2 p(c), summed over the target classes c
b) Entropy using the frequency table of two attributes:
H(S|A) = Σ P(v) H(S_v), summed over the values v of attribute A
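Both table types can be evaluated with a few lines of Python. The sketch below is illustrative only; the counts are loosely based on the classic play-tennis data and are an assumption, not the lecture's tables.

import math

def entropy_from_counts(counts):
    # Entropy from a one-attribute frequency table, e.g. {class: count}
    total = sum(counts.values())
    return -sum((n / total) * math.log2(n / total) for n in counts.values() if n > 0)

def conditional_entropy(table):
    # Weighted branch entropy H(S|A) from a two-attribute frequency table:
    # {attribute value: {class: count}}
    grand_total = sum(sum(branch.values()) for branch in table.values())
    return sum(
        (sum(branch.values()) / grand_total) * entropy_from_counts(branch)
        for branch in table.values()
    )

# One attribute: class counts of the target only (9 Yes / 5 No)
print(entropy_from_counts({"Yes": 9, "No": 5}))   # about 0.940

# Two attributes: target counts split by the values of one attribute
table = {
    "Sunny":    {"Yes": 2, "No": 3},
    "Overcast": {"Yes": 4, "No": 0},
    "Rain":     {"Yes": 3, "No": 2},
}
print(conditional_entropy(table))                 # about 0.69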
Information Gain
Information gain is based on the decrease in entropy after a dataset is split on an attribute. Constructing a decision tree is all about finding the attribute that returns the highest information gain (i.e., the most homogeneous branches).
Information gain
Step 1: Calculate the entropy of the target.
Information gain cont…
Step 2:
The dataset is then split on each of the attributes. The entropy of each branch is calculated.
The branch entropies are then added proportionally (weighted by branch size) to get the total entropy for the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy: IG(S, A) = H(S) - H(S|A).
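Steps 1 and 2 fit naturally into two small functions. A hedged Python sketch follows; the rows below are a made-up mini dataset, not the lecture's table.

import math
from collections import Counter

def entropy(labels):
    # Step 1: entropy of a list of class labels
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute, target):
    # Step 2: split on the attribute, weight each branch's entropy by its size,
    # then subtract the weighted total from the entropy before the split
    before = entropy([r[target] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r[target] for r in rows if r[attribute] == value]
        after += (len(branch) / len(rows)) * entropy(branch)
    return before - after

# Hypothetical mini dataset: a perfectly clean split on Wind
rows = [
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Weak",   "Play": "Yes"},
    {"Wind": "Strong", "Play": "No"},
    {"Wind": "Strong", "Play": "No"},
]
print(information_gain(rows, "Wind", "Play"))  # about 0.971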
Information gain cont…
Step 3: Choose the attribute with the largest information gain as the decision node, divide the dataset by its branches, and repeat the same process on every branch.
Information gain cont…
Step 4a: A branch with an entropy of 0 is a leaf node.
Information gain cont…
Step 4b: A branch with an entropy greater than 0 needs further splitting.
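Steps 3, 4a, and 4b together describe a recursion. Below is a compact ID3-style sketch of that recursion; it assumes the entropy and information_gain helpers from the earlier snippet are in scope, and it is an illustration rather than the lecture's code.

from collections import Counter

def build_tree(rows, attributes, target):
    # Recursive construction following Steps 3, 4a and 4b
    # (assumes information_gain(rows, attribute, target) from the previous sketch)
    labels = [r[target] for r in rows]
    # Step 4a: entropy 0 (only one class left) or no attributes left -> leaf node
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: choose the attribute with the largest information gain as the decision node
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    remaining = [a for a in attributes if a != best]
    # Step 4b: each branch with entropy > 0 is split further by the recursive call
    for value in {r[best] for r in rows}:
        branch_rows = [r for r in rows if r[best] == value]
        tree[best][value] = build_tree(branch_rows, remaining, target)
    return tree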
Information gain cont…
A decision tree can easily be transformed into a set of rules by mapping from the root node to the leaf nodes one by one.
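One way to read such rules off a nested-dict tree like the one built above (again a sketch under the same assumptions, not the lecture's code):

def tree_to_rules(tree, conditions=()):
    # Walk from the root to every leaf, emitting one IF ... THEN ... rule per path
    if not isinstance(tree, dict):  # a leaf holds the predicted class
        clause = " AND ".join(f"{a} = {v}" for a, v in conditions)
        return [f"IF {clause} THEN {tree}"]
    rules = []
    for attribute, branches in tree.items():
        for value, subtree in branches.items():
            rules.extend(tree_to_rules(subtree, conditions + ((attribute, value),)))
    return rules

# Hypothetical fragment of a tennis-style tree
example_tree = {"Outlook": {"Overcast": "Yes",
                            "Rain": {"Wind": {"Weak": "Yes", "Strong": "No"}}}}
for rule in tree_to_rules(example_tree):
    print(rule)
# IF Outlook = Overcast THEN Yes
# IF Outlook = Rain AND Wind = Weak THEN Yes
# IF Outlook = Rain AND Wind = Strong THEN No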
Decision Trees
When do I play tennis?
Decision Tree
Is the decision tree correct?
Let’s check whether the split on the Wind attribute is correct.
We need to show that the Wind attribute has the highest information gain.
When do I play tennis?
Wind attribute – 5 records match
Note: calculate the entropy only on the examples that got “routed” into our branch of the tree (Outlook = Rain).
Calculation
Let S = {D4, D5, D6, D10, D14}
Entropy:
H(S) = -3/5 log2(3/5) - 2/5 log2(2/5) = 0.971
Information Gain:
IG(S, Temp) = H(S) - H(S|Temp) = 0.01997
IG(S, Humidity) = H(S) - H(S|Humidity) = 0.01997
IG(S, Wind) = H(S) - H(S|Wind) = 0.971
Wind has the highest information gain, so splitting on Wind at this node is indeed correct.
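These numbers can be verified numerically. The sketch below uses the five Outlook = Rain records of the classic play-tennis dataset; the attribute values are taken from that standard dataset as an assumption, since the slides' data table is not reproduced here.

import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def information_gain(rows, attribute):
    before = entropy([r["Play"] for r in rows])
    after = 0.0
    for value in {r[attribute] for r in rows}:
        branch = [r["Play"] for r in rows if r[attribute] == value]
        after += (len(branch) / len(rows)) * entropy(branch)
    return before - after

# S = {D4, D5, D6, D10, D14}: the Outlook = Rain subset
S = [
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play": "Yes"},  # D4
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D5
    {"Temp": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play": "No"},   # D6
    {"Temp": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play": "Yes"},  # D10
    {"Temp": "Mild", "Humidity": "High",   "Wind": "Strong", "Play": "No"},   # D14
]

print(round(entropy([r["Play"] for r in S]), 5))   # 0.97095, i.e. 0.971
print(round(information_gain(S, "Temp"), 5))       # 0.01997
print(round(information_gain(S, "Humidity"), 5))   # 0.01997
print(round(information_gain(S, "Wind"), 5))       # 0.97095, i.e. 0.971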
Assignment #01
Imagine your own example for classification
Everyone should have a different example
What will be the root node?
Make rules after finalizing the decision tree.
Calculate entropy and IG
Note:
23rd Feb 2021 is the last date to submit.
No handwritten assignment will be accepted.
Copied assignment will be graded “0”.
No late submission will be accepted.