Unit IV - Decision Tree With ID3

The document outlines the process of building a decision tree using the ID3 and CART algorithms, focusing on the calculation of entropy and information gain for ID3, and Gini impurity for CART. It provides detailed steps for defining the problem, calculating splits, and making predictions based on weather conditions. The final decision trees from both algorithms are designed to predict whether a person will play outside based on various attributes.

Building a Decision Tree Using the ID3 Algorithm

ID3 (Iterative Dichotomiser 3) is a popular algorithm for constructing a decision tree. It is based on entropy and information gain to find the best splits in the data.

Steps to Build a Decision Tree Using ID3

1. Define the Problem

Decide on the classification goal.

- Example: Predict if a person will play outside based on weather conditions.

Weather Temperature Humidity Windy Play Outside?


Sunny Hot High No No
Sunny Hot High Yes No
Overcast Hot High No Yes
Rainy Mild High No Yes
Rainy Cool Normal No Yes
Rainy Cool Normal Yes No
Overcast Cool Normal Yes Yes
Sunny Mild High No No
Sunny Cool Normal No Yes
Rainy Mild Normal No Yes
Sunny Mild Normal Yes Yes
Overcast Mild High Yes Yes
Overcast Hot Normal No Yes
Rainy Mild High Yes No

2. Calculate Entropy (Impurity Measure)

Entropy measures how mixed the classes are in a set:

Entropy(S) = -p(Yes)·log2 p(Yes) - p(No)·log2 p(No)

For the full dataset (9 Yes, 5 No out of 14 rows):

Entropy(S) = -(9/14)·log2(9/14) - (5/14)·log2(5/14) ≈ 0.940
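A minimal Python sketch of this entropy calculation (the function name and layout are illustrative, not part of the original notes):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, e.g. ["Yes", "No", ...]."""
    total = len(labels)
    return -sum((count / total) * log2(count / total)
                for count in Counter(labels).values())

# Entropy of the full dataset: 9 "Yes" and 5 "No" out of 14 rows.
print(f"{entropy(['Yes'] * 9 + ['No'] * 5):.3f}")  # 0.940
```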


3. Compute Information Gain for Each Feature

Information Gain (IG) tells us which attribute best splits the data:

IG(S, A) = Entropy(S) - Σ_v (|S_v| / |S|) · Entropy(S_v)

- Calculate the entropy of each subset produced by an attribute (Weather, Temperature, Humidity, Windy).
- Subtract the weighted average of these subset entropies from the parent entropy.
- The attribute with the highest IG becomes the root node (a small Python sketch of this computation follows below).
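Building on the entropy() helper above, information gain can be sketched as follows (the row-as-dict representation and the function name are assumptions for illustration):

```python
from collections import defaultdict

def information_gain(rows, attribute, target="Play Outside?"):
    """IG(S, A) = Entropy(S) minus the weighted entropy of the subsets.

    `rows` is a list of dicts; uses the entropy() helper from the previous sketch.
    """
    parent_labels = [row[target] for row in rows]
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    weighted_child_entropy = sum(
        len(labels) / len(rows) * entropy(labels) for labels in subsets.values()
    )
    return entropy(parent_labels) - weighted_child_entropy
```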

3.1 Information Gain for Weather

We divide the dataset based on Weather (Sunny, Overcast, Rainy).

Entropy for "Sunny" Subset


Weather Play Outside?
Sunny No
Sunny No
Sunny No
Sunny Yes
Sunny Yes

Yes = 2, No = 3 → Entropy = -(2/5)·log2(2/5) - (3/5)·log2(3/5) ≈ 0.971

Entropy for "Overcast" Subset


Weather Play Outside?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast Yes

All Yes, so entropy = 0.


Entropy for "Rainy" Subset
Weather Play Outside?
Rainy Yes
Rainy Yes
Rainy No
Rainy Yes
Rainy No

Yes = 3, No = 2 → Entropy ≈ 0.971

Putting it together for Weather:

IG(Weather) = 0.940 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) ≈ 0.247

3.2 Calculate Information Gain for Other Attributes

Using the same method, we find:

- IG(Temperature) = 0.029
- IG(Humidity) = 0.151
- IG(Windy) = 0.048

Since Weather has the highest Information Gain (0.247), we split the tree on Weather.
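To check these numbers, the helpers sketched above can be run over the 14-row table (the variable names are illustrative; with full precision, Humidity comes out at about 0.152, which the notes round to 0.151):

```python
COLUMNS = ["Weather", "Temperature", "Humidity", "Windy", "Play Outside?"]
DATA = [
    ("Sunny", "Hot", "High", "No", "No"),
    ("Sunny", "Hot", "High", "Yes", "No"),
    ("Overcast", "Hot", "High", "No", "Yes"),
    ("Rainy", "Mild", "High", "No", "Yes"),
    ("Rainy", "Cool", "Normal", "No", "Yes"),
    ("Rainy", "Cool", "Normal", "Yes", "No"),
    ("Overcast", "Cool", "Normal", "Yes", "Yes"),
    ("Sunny", "Mild", "High", "No", "No"),
    ("Sunny", "Cool", "Normal", "No", "Yes"),
    ("Rainy", "Mild", "Normal", "No", "Yes"),
    ("Sunny", "Mild", "Normal", "Yes", "Yes"),
    ("Overcast", "Mild", "High", "Yes", "Yes"),
    ("Overcast", "Hot", "Normal", "No", "Yes"),
    ("Rainy", "Mild", "High", "Yes", "No"),
]
rows = [dict(zip(COLUMNS, values)) for values in DATA]

# Uses information_gain() and entropy() from the sketches above.
for attribute in COLUMNS[:-1]:
    print(attribute, round(information_gain(rows, attribute), 3))
# Weather 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```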

4. Choose the Best Attribute & Split Data

Since Weather has the highest Information Gain, we split on it first:

- For Sunny, we continue splitting based on Humidity (which has the next-highest Information Gain within that subset).
- We continue splitting the Sunny and Rainy nodes using the same entropy and information-gain calculations.
- We stop splitting when all data in a node belongs to one class.

5. Continue Splitting Until Leaf Nodes

We repeat the process for the subsets (Sunny, Rainy) using the same entropy and information gain calculations. Once the splitting is done, the final decision tree looks like this:

Weather
├── Overcast → Play = Yes
├── Sunny
│   ├── Humidity = High → Play = No
│   └── Humidity = Normal → Play = Yes
└── Rainy
    ├── Windy = No → Play = Yes
    └── Windy = Yes → Play = No

6. Make Predictions

- If Weather = Overcast, then Play = Yes.
- If Weather = Sunny:
  - If Humidity = High, then Play = No.
  - If Humidity = Normal, then Play = Yes.
- If Weather = Rainy:
  - If Windy = No, then Play = Yes.
  - If Windy = Yes, then Play = No.
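These rules translate directly into code; a tiny, hypothetical predict() helper might look like this:

```python
def predict(weather, humidity, windy):
    """Decision rules read directly off the final ID3 tree above."""
    if weather == "Overcast":
        return "Yes"
    if weather == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if weather == "Rainy":
        return "Yes" if windy == "No" else "No"
    raise ValueError(f"Unknown weather value: {weather}")

print(predict("Sunny", "High", "No"))    # No
print(predict("Rainy", "Normal", "No"))  # Yes
```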

Advantages of ID3

✅ Simple and easy to understand
✅ Works well with categorical data
✅ Produces a human-readable decision tree

Building a Decision Tree Using CART (Classification and Regression Trees) for the Given Dataset

CART (Classification and Regression Trees) is a decision tree algorithm that uses Gini Impurity instead of entropy to select the best attribute for splitting the data.

Step 1: Define the Dataset

We use the same dataset as before:

Weather Temperature Humidity Windy Play Outside?


Sunny Hot High No No
Sunny Hot High Yes No
Overcast Hot High No Yes
Rainy Mild High No Yes
Rainy Cool Normal No Yes
Rainy Cool Normal Yes No
Overcast Cool Normal Yes Yes
Sunny Mild High No No
Sunny Cool Normal No Yes
Rainy Mild Normal No Yes
Sunny Mild Normal Yes Yes
Overcast Mild High Yes Yes
Overcast Hot Normal No Yes
Rainy Mild High Yes No

Step 2: Compute Gini Impurity for the Dataset

Gini Impurity Formula:

Gini(S) = 1 - p(Yes)² - p(No)²   (1 minus the sum of squared class proportions)

For the full dataset (9 Yes, 5 No out of 14 rows):

Gini(S) = 1 - (9/14)² - (5/14)² ≈ 0.459
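A minimal Python sketch of Gini impurity (illustrative names, mirroring the entropy helper from the ID3 section):

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    total = len(labels)
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

# Gini of the full dataset: 9 "Yes" and 5 "No" out of 14 rows.
print(f"{gini(['Yes'] * 9 + ['No'] * 5):.3f}")  # 0.459
```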

Step 3: Compute Gini for Each Attribute

We calculate Gini impurity for each attribute to find the best split.
Gini for Weather

Weather = Sunny
Weather Play Outside?
Sunny No
Sunny No
Sunny No
Sunny Yes
Sunny Yes

Yes = 2, No = 3 → Gini = 1 - (2/5)² - (3/5)² = 0.48

Weather = Overcast
Weather Play Outside?
Overcast Yes
Overcast Yes
Overcast Yes
Overcast Yes

All Yes, so Gini = 0.

Weather = Rainy
Weather Play Outside?
Rainy Yes
Rainy Yes
Rainy No
Rainy Yes
Rainy No

Yes = 3, No = 2 → Gini = 1 - (3/5)² - (2/5)² = 0.48

Weighted Gini for Weather:

Gini(Weather) = (5/14)(0.48) + (4/14)(0) + (5/14)(0.48) ≈ 0.343

Gini for Humidity

Humidity = High
Humidity Play Outside?
High No
High No
High Yes
High Yes
High No
High Yes
High No

Yes = 3, No = 4 → Gini = 1 - (3/7)² - (4/7)² ≈ 0.490

Humidity = Normal
Humidity Play Outside?
Normal Yes
Normal No
Normal Yes
Normal Yes
Normal Yes
Normal Yes
Normal Yes

Yes = 6, No = 1 → Gini = 1 - (6/7)² - (1/7)² ≈ 0.245

Weighted Gini for Humidity:

Gini(Humidity) = (7/14)(0.490) + (7/14)(0.245) ≈ 0.367

Step 4: Select the Best Split

Comparing the weighted Gini values:

- Gini(Weather) ≈ 0.343
- Gini(Humidity) ≈ 0.367

Since Weather has the lowest Gini, we split on Weather first.
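The weighted Gini of a split can be checked with a small helper, assuming the gini() function above and the rows list from the ID3 sketch:

```python
from collections import defaultdict

def gini_split(rows, attribute, target="Play Outside?"):
    """Weighted Gini impurity after splitting `rows` (list of dicts) on one attribute."""
    subsets = defaultdict(list)
    for row in rows:
        subsets[row[attribute]].append(row[target])
    return sum(len(labels) / len(rows) * gini(labels) for labels in subsets.values())

print(f"{gini_split(rows, 'Weather'):.3f}")   # 0.343
print(f"{gini_split(rows, 'Humidity'):.3f}")  # 0.367
```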

Step 5: Build the Decision Tree

Splitting on Weather first, then on Humidity for the Sunny branch and on Windy for the Rainy branch, produces the same tree as the ID3 example above: Overcast leads directly to Yes, while the Sunny and Rainy branches end in pure leaf nodes after one more split.
Step 6: Decision Rules

1. If Weather = Overcast, then Play = Yes.
2. If Weather = Sunny:
o If Humidity = High, then Play = No.
o If Humidity = Normal, then Play = Yes.
3. If Weather = Rainy:
o If Windy = No, then Play = Yes.
o If Windy = Yes, then Play = No.

Conclusion

- CART uses Gini Impurity instead of Entropy to build the tree.
- The best split was Weather, and we continued splitting until we reached pure nodes.
- This tree can now be used for making predictions.
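In practice, libraries implement CART directly. A short scikit-learn sketch on the same table is given below; one-hot encoding the categorical attributes is one common workaround, since scikit-learn's trees expect numeric inputs, so the learned splits are on indicator columns rather than the raw attributes:

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# DATA and COLUMNS are the table encoded in the earlier ID3 sketch.
df = pd.DataFrame(DATA, columns=COLUMNS)
X = pd.get_dummies(df.drop(columns=["Play Outside?"]))  # one-hot encode categoricals
y = df["Play Outside?"]

# criterion="gini" gives CART-style splits; criterion="entropy" would use
# an information-gain criterion similar to ID3.
clf = DecisionTreeClassifier(criterion="gini", random_state=0).fit(X, y)
print(export_text(clf, feature_names=list(X.columns)))
```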
