Building a Decision Tree Using the ID3 Algorithm
ID3 (Iterative Dichotomiser 3) is a popular algorithm for constructing a decision tree. It uses
entropy and information gain to choose the best split at each node.
Steps to Build a Decision Tree Using ID3
1. Define the Problem
Decide on the classification goal.
Example: Predict if a person will play outside based on weather conditions.
Weather Temperature Humidity Windy Play Outside?
Sunny Hot High No No
Sunny Hot High Yes No
Overcast Hot High No Yes
Rainy Mild High No Yes
Rainy Cool Normal No Yes
Rainy Cool Normal Yes No
Overcast Cool Normal Yes Yes
Sunny Mild High No No
Sunny Cool Normal No Yes
Rainy Mild Normal No Yes
Sunny Mild Normal Yes Yes
Overcast Mild High Yes Yes
Overcast Hot Normal No Yes
Rainy Mild High Yes No
2. Calculate Entropy (Impurity Measure)
Entropy measures how mixed the class labels are in a set S:
Entropy(S) = -p(Yes) log2 p(Yes) - p(No) log2 p(No)
The full dataset has 9 "Yes" and 5 "No" examples, so the parent entropy is:
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) ≈ 0.940
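As a quick check, the same calculation can be done with a short Python snippet (a minimal sketch; the entropy helper below is written for this tutorial, not taken from any library):

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

# The full dataset has 9 "Yes" and 5 "No" examples (in table order).
play = ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
        "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"]
print(round(entropy(play), 3))  # 0.94
```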
3. Compute Information Gain for Each Feature
Information Gain (IG) tells us which attribute best splits the data:
IG(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v)
For each attribute (Weather, Temperature, Humidity, Windy), calculate the weighted entropy of the subsets it produces.
Subtract that weighted entropy from the parent entropy.
The attribute with the highest IG becomes the root node.
3.1 Information Gain for Weather
We divide the dataset based on Weather (Sunny, Overcast, Rainy).
Entropy for "Sunny" Subset
Weather Play Outside?
Sunny No
Sunny No
Sunny No
Sunny Yes
Sunny Yes
Yes = 2, No = 3
Entropy(Sunny) = -(2/5) log2(2/5) - (3/5) log2(3/5) ≈ 0.971
Entropy for "Overcast" Subset
Weather Play Outside?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast Yes
All Yes, so entropy = 0.
Entropy for "Rainy" Subset
Weather Play Outside?
Rainy Yes
Rainy Yes
Rainy No
Rainy Yes
Rainy No
Yes = 3, No = 2
Entropy(Rainy) = -(3/5) log2(3/5) - (2/5) log2(2/5) ≈ 0.971
Weighted entropy after splitting on Weather:
(5/14)(0.971) + (4/14)(0) + (5/14)(0.971) ≈ 0.694
IG(Weather) = 0.940 - 0.694 ≈ 0.247
3.2 Calculate Information Gain for Other Attributes
Using the same method, we find:
IG(Temperature) = 0.029
IG(Humidity) = 0.151
IG(Windy) = 0.048
Since Weather has the highest Information Gain (0.247), we split the tree on Weather.
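These subset calculations can be automated. The sketch below stores the dataset from step 1 as a list of dictionaries and computes the information gain of every attribute; the data list and the information_gain helper are named here only for illustration:

```python
import math
from collections import Counter

columns = ["Weather", "Temperature", "Humidity", "Windy", "Play"]
rows = [
    ("Sunny", "Hot", "High", "No", "No"),
    ("Sunny", "Hot", "High", "Yes", "No"),
    ("Overcast", "Hot", "High", "No", "Yes"),
    ("Rainy", "Mild", "High", "No", "Yes"),
    ("Rainy", "Cool", "Normal", "No", "Yes"),
    ("Rainy", "Cool", "Normal", "Yes", "No"),
    ("Overcast", "Cool", "Normal", "Yes", "Yes"),
    ("Sunny", "Mild", "High", "No", "No"),
    ("Sunny", "Cool", "Normal", "No", "Yes"),
    ("Rainy", "Mild", "Normal", "No", "Yes"),
    ("Sunny", "Mild", "Normal", "Yes", "Yes"),
    ("Overcast", "Mild", "High", "Yes", "Yes"),
    ("Overcast", "Hot", "Normal", "No", "Yes"),
    ("Rainy", "Mild", "High", "Yes", "No"),
]
data = [dict(zip(columns, r)) for r in rows]

def entropy(labels):
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())

def information_gain(rows, attribute, target="Play"):
    """Parent entropy minus the weighted entropy of the subsets created by `attribute`."""
    parent = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return parent - weighted

for attr in ["Weather", "Temperature", "Humidity", "Windy"]:
    print(attr, round(information_gain(data, attr), 3))
# Weather has the highest information gain (≈ 0.247), so it becomes the root.
```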
4. Choose the Best Attribute & Split Data
Since Weather has the highest Information Gain, we split on it first:
The Overcast examples are all "Yes", so that branch becomes a leaf immediately.
Within the Sunny subset, we continue splitting on Humidity (the attribute with the highest
Information Gain inside that subset).
Within the Rainy subset, the same entropy and information gain calculations select Windy.
Stop splitting when all data in a node belongs to one class.
5. Continue Splitting Until Leaf Nodes
We repeat the process for the Sunny and Rainy subsets using the same entropy and information
gain calculations.
Once the splitting is done, the final decision tree has Weather at the root: the Overcast branch
is a "Yes" leaf, the Sunny branch splits on Humidity, and the Rainy branch splits on Windy.
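The whole procedure can be written as a short recursive function. The sketch below continues the snippet from step 3 (it reuses the data list, Counter import, and information_gain helper defined there); the id3 function name is just for illustration:

```python
def id3(rows, attributes, target="Play"):
    """Recursively build an ID3 tree as nested dicts; leaves are class labels."""
    labels = [r[target] for r in rows]
    # Stop when the node is pure, or when no attributes are left to split on.
    if len(set(labels)) == 1:
        return labels[0]
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Pick the attribute with the highest information gain.
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    for value in set(r[best] for r in rows):
        subset = [r for r in rows if r[best] == value]
        tree[best][value] = id3(subset, [a for a in attributes if a != best], target)
    return tree

print(id3(data, ["Weather", "Temperature", "Humidity", "Windy"]))
# {'Weather': {'Overcast': 'Yes',
#              'Sunny': {'Humidity': {'High': 'No', 'Normal': 'Yes'}},
#              'Rainy': {'Windy': {'No': 'Yes', 'Yes': 'No'}}}}  (key order may vary)
```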
6. Make Predictions
If Weather = Overcast, then Play = Yes.
If Weather = Sunny:
If Humidity = High, then Play = No.
If Humidity = Normal, then Play = Yes.
If Weather = Rainy:
If Windy = No, then Play = Yes.
If Windy = Yes, then Play = No.
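These rules translate directly into code. A minimal sketch (the function name predict_play is only illustrative):

```python
def predict_play(weather, humidity, windy):
    """Apply the learned ID3 tree to a single example."""
    if weather == "Overcast":
        return "Yes"
    if weather == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if weather == "Rainy":
        return "Yes" if windy == "No" else "No"
    raise ValueError(f"Unknown weather value: {weather}")

print(predict_play("Sunny", "High", "No"))    # No
print(predict_play("Rainy", "Normal", "No"))  # Yes
```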
Advantages of ID3
✅ Simple and easy to understand
✅ Works well with categorical data
✅ Produces a human-readable decision tree
Building a Decision Tree Using CART (Classification and
Regression Trees) for the Given Dataset
CART (Classification and Regression Trees) is a decision tree algorithm that uses Gini
Impurity instead of entropy to select the best attribute for splitting the data.
Step 1: Define the Dataset
We use the same dataset as before:
Weather Temperature Humidity Windy Play Outside?
Sunny Hot High No No
Sunny Hot High Yes No
Overcast Hot High No Yes
Rainy Mild High No Yes
Rainy Cool Normal No Yes
Rainy Cool Normal Yes No
Overcast Cool Normal Yes Yes
Sunny Mild High No No
Sunny Cool Normal No Yes
Rainy Mild Normal No Yes
Sunny Mild Normal Yes Yes
Overcast Mild High Yes Yes
Overcast Hot Normal No Yes
Rainy Mild High Yes No
Step 2: Compute Gini Impurity for the Dataset
Gini Impurity Formula:
Gini(S) = 1 - Σ (p_i)², which for a two-class problem is 1 - p(Yes)² - p(No)²
For the full dataset (9 "Yes", 5 "No"):
Gini(S) = 1 - (9/14)² - (5/14)² ≈ 0.459
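As a quick sanity check, here is a small Python sketch of the same calculation (the gini helper is written for this tutorial, not taken from any library):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((c / total) ** 2 for c in Counter(labels).values())

# 9 "Yes" and 5 "No" examples in the full dataset.
play = ["Yes"] * 9 + ["No"] * 5
print(round(gini(play), 3))  # 0.459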
Step 3: Compute Gini for Each Attribute
We calculate Gini impurity for each attribute to find the best split.
Gini for Weather
Weather = Sunny
Weather Play Outside?
Sunny No
Sunny No
Sunny No
Sunny Yes
Sunny Yes
Yes = 2, No = 3
Gini(Sunny) = 1 - (2/5)² - (3/5)² = 0.48
Weather = Overcast
Weather Play Outside?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast Yes
All Yes, so Gini = 0.
Weather = Rainy
Weather Play Outside?
Rainy Yes
Rainy Yes
Rainy No
Rainy Yes
Rainy No
Yes = 3, No = 2
Gini(Rainy) = 1 - (3/5)² - (2/5)² = 0.48
Weighted Gini for the Weather split:
(5/14)(0.48) + (4/14)(0) + (5/14)(0.48) ≈ 0.343
Gini for Humidity
Humidity = High
Humidity Play Outside?
High No
High No
High Yes
High Yes
High No
High Yes
High No
Yes = 3, No = 4
Gini(High) = 1 - (3/7)² - (4/7)² ≈ 0.490
Humidity = Normal
Humidity Play Outside?
Normal Yes
Normal No
Normal Yes
Normal Yes
Normal Yes
Normal Yes
Normal Yes
Yes = 6, No = 1
Gini(Normal) = 1 - (6/7)² - (1/7)² ≈ 0.245
Weighted Gini for the Humidity split:
(7/14)(0.490) + (7/14)(0.245) ≈ 0.367
Step 4: Select the Best Split
Comparing the weighted Gini values:
Gini(Weather) ≈ 0.343
Gini(Humidity) ≈ 0.367
(Temperature ≈ 0.440 and Windy ≈ 0.429 are higher still.)
Since Weather has the lowest Gini, we split on Weather first.
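Reusing the data list from the information gain sketch and the gini helper above, the weighted Gini of every candidate split can be computed like this (a sketch; gini_split is an illustrative name, not a library function):

```python
def gini_split(rows, attribute, target="Play"):
    """Weighted Gini impurity of the subsets produced by splitting on `attribute`."""
    total = len(rows)
    weighted = 0.0
    for value in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += len(subset) / total * gini(subset)
    return weighted

for attr in ["Weather", "Temperature", "Humidity", "Windy"]:
    print(attr, round(gini_split(data, attr), 3))
# Weather has the lowest weighted Gini (≈ 0.343), so it is chosen as the root.
```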
Step 5: Build the Decision Tree
Splitting on Weather first, then on Humidity inside the Sunny branch and on Windy inside the
Rainy branch, produces pure leaf nodes. The Overcast branch is already a pure "Yes" leaf, so the
resulting tree matches the one built with ID3.
Step 6: Decision Rules
1. If Weather = Overcast, then Play = Yes.
2. If Weather = Sunny:
   If Humidity = High, then Play = No.
   If Humidity = Normal, then Play = Yes.
3. If Weather = Rainy:
   If Windy = No, then Play = Yes.
   If Windy = Yes, then Play = No.
Conclusion
CART uses Gini Impurity instead of Entropy to build the tree.
The best split was Weather, and we continued splitting until we reached pure nodes.
This tree can now be used for making predictions.
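For completeness, a similar tree can be fitted with scikit-learn, whose DecisionTreeClassifier uses an optimized version of CART. The sketch below one-hot encodes the categorical features (one reasonable choice, not the only one); because the splits are then binary on the encoded columns, the printed tree may differ slightly in shape from the hand-built one while encoding the same decisions:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same dataset as step 1, in table order.
df = pd.DataFrame({
    "Weather":     ["Sunny", "Sunny", "Overcast", "Rainy", "Rainy", "Rainy", "Overcast",
                    "Sunny", "Sunny", "Rainy", "Sunny", "Overcast", "Overcast", "Rainy"],
    "Temperature": ["Hot", "Hot", "Hot", "Mild", "Cool", "Cool", "Cool",
                    "Mild", "Cool", "Mild", "Mild", "Mild", "Hot", "Mild"],
    "Humidity":    ["High", "High", "High", "High", "Normal", "Normal", "Normal",
                    "High", "Normal", "Normal", "Normal", "High", "Normal", "High"],
    "Windy":       ["No", "Yes", "No", "No", "No", "Yes", "Yes",
                    "No", "No", "No", "Yes", "Yes", "No", "Yes"],
    "Play":        ["No", "No", "Yes", "Yes", "Yes", "No", "Yes",
                    "No", "Yes", "Yes", "Yes", "Yes", "Yes", "No"],
})

# One-hot encode the categorical features so the tree can split on them.
X = pd.get_dummies(df.drop(columns="Play"))
y = df["Play"]

clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```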