INTRODUCTION TO
MACHINE LEARNING
Unit 04
Compiled by Solat Jabeen
ACKNOWLEDGEMENT
• The slides and code in this lecture are primarily taken from
• Machine Learning with PyTorch and Scikit-Learn by Raschka et al.
• Data Mining Concepts and Techniques by Han et al.
• Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow by Géron
Fall 2024 2
TODAY’S AGENDA
• Steps in Decision Tree Learning
• Computing Gini Index for root node
• Recursive application of Gini Index
• Handling Continuous Variables
Fall 2024 3
DECISION TREES
• Decision trees are versatile machine learning algorithms that can
perform both classification and regression tasks.
• Decision trees are also the fundamental components of random
forests which are among the most powerful machine learning
algorithms available today.
• One of the many qualities of decision trees is that they require very
little data preparation.
• In fact, they don’t require feature scaling or centering at all.
Fall 2024 4
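The point about data preparation can be checked directly. The short sketch below is illustrative only (the iris data, the split, and the variable names are not from the slides): it fits the same scikit-learn tree on raw and on standardized features. Because a tree only compares each feature against a threshold, rescaling does not change which records fall on which side of a split, so the two scores should match.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Tree fitted on the raw, unscaled features
raw = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Tree fitted on standardized (centered and scaled) features
scaler = StandardScaler().fit(X_tr)
scaled = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X_tr), y_tr)

# The two test accuracies are expected to be essentially identical
print(raw.score(X_te, y_te), scaled.score(scaler.transform(X_te), y_te))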
EXAMPLE OF A DECISION TREE

Training Data:

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
- Refund = Yes -> Cheat = NO
- Refund = No  -> test MarSt
  - MarSt = Married            -> Cheat = NO
  - MarSt = Single or Divorced -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

Fall 2024 6
APPLY MODEL TO TEST DATA

Test Data:

Refund   Marital Status   Taxable Income   Cheat
No       Married          80K              ?

Start from the root of the tree built from the training data and, at each node, follow the branch that matches the test record:
1. Refund? The test record has Refund = No, so take the "No" branch to the MarSt node.
2. MarSt? The test record has Marital Status = Married, so take the "Married" branch, which leads to a leaf labelled NO.
3. The leaf predicts NO, so assign Cheat to "No".

Fall 2024 7-12
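For readers following along in code, here is a minimal sketch of the same traversal: the tree from the example written out as nested if/else tests. The function name predict_cheat and the handling of an income of exactly 80K are illustrative choices, not from the slides.

def predict_cheat(refund, marital_status, taxable_income_k):
    # Root node: Refund?
    if refund == "Yes":
        return "No"
    # Refund = No: test marital status
    if marital_status == "Married":
        return "No"
    # Single or Divorced: test taxable income
    # (what happens at exactly 80K is a convention choice; it does not arise here)
    return "No" if taxable_income_k < 80 else "Yes"

# The test record from the slide: Refund = No, Married, 80K
print(predict_cheat("No", "Married", 80))   # -> "No"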
ANOTHER EXAMPLE OF DECISION TREE

Using the same training data (Tids 1-10) as in the previous example, a different tree also fits the data, this time splitting on MarSt first:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

There could be more than one tree that fits the same data!

Fall 2024 13
IRIS DECISION TREE

• The label class=virginica means that the majority of the tuples at that node (41 of them) belong to the category virginica.
• Sometimes we need to stop the algorithm at a certain depth; the majority class is then used to assign the class label at that node.
• In short, this is not an error.

Pure node: all tuples belong to one class (perfect classification).
Mixed node: a mixture of classes; needs further splitting.

Fall 2024 14
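A figure like the one on this slide can be reproduced with scikit-learn. The sketch below is only an illustration (the iris data, the depth limit, and random_state are choices made here): it fits a depth-2 tree and plots it, so you can see mixed leaves being labelled with their majority class.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each box shows the per-class sample counts; a mixed leaf is labelled with
# its majority class (e.g. class=virginica), which is not an error.
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()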
DECISION TREE INDUCTION: ALGORITHM
1. The tree is constructed in a top-down, recursive, divide-and-conquer
manner.
2. At the start, all the training examples are at the root.
3. Examples are partitioned recursively based on the selected attributes.
4. At each node, the splitting attribute is selected based on the training
examples at that node, using a heuristic or statistical measure such as:
1. Entropy, or
2. Gini index
(A minimal code sketch of this procedure follows this slide.)
Fall 2024 19
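The sketch referred to above is a minimal, illustrative implementation of the top-down, recursive procedure for categorical attributes, using the Gini index as the selection measure. The function names (gini, best_attribute, build_tree) are made up for this example and do not come from any library.

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Pick the attribute whose split yields the lowest weighted Gini impurity
    def weighted_gini(attr):
        total = len(labels)
        score = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            score += len(subset) / total * gini(subset)
        return score
    return min(attributes, key=weighted_gini)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain; predict the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the best attribute and recurse on each partition
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (attr, children)

# Tiny usage example: rows are dicts mapping attribute -> value
rows = [{"refund": "Yes"}, {"refund": "No"}]
print(build_tree(rows, ["No", "Yes"], ["refund"]))   # e.g. ('refund', {'Yes': 'No', 'No': 'Yes'})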
EXAMPLE

Attributes:
1. Age [Youth, Middle-Aged, Senior]
2. Income [High, Medium, Low]
3. Student [Yes, No]
4. Credit_Rating [Fair, Excellent]

Class:
Buys_Computer [Yes, No]

age           income   student   credit_rating   buys_computer
youth         high     no        fair            no
youth         high     no        excellent       no
middle-aged   high     no        fair            yes
senior        medium   no        fair            yes
senior        low      yes       fair            yes
senior        low      yes       excellent       no
middle-aged   low      yes       excellent       yes
youth         medium   no        fair            no
youth         low      yes       fair            yes
senior        medium   yes       fair            yes
youth         medium   yes       excellent       yes
middle-aged   medium   no        excellent       yes
middle-aged   high     yes       fair            yes
senior        medium   no        excellent       no

Source: Han & Kamber (Chapter 8)

Fall 2024 22
EXAMPLE: COMPUTING GINI

Value counts from the training data:

Age               Income       Student   Credit_Rating   Buys_Computer
Youth = 5         Low = 4      Yes = 7   Fair = 8        Yes = 9
Middle-Aged = 4   Medium = 6   No = 7    Excellent = 6   No = 5
Senior = 5        High = 4

• Gini(Youth)       = 1 − (3/5)² − (2/5)² = 1 − 0.36 − 0.16 = 0.48
• Gini(Middle-Aged) = 1 − (0/4)² − (4/4)² = 1 − 0 − 1 = 0
• Gini(Senior)      = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48
• Total Gini index for Age = 5/14 Gini(Youth) + 4/14 Gini(Middle-Aged) +
  5/14 Gini(Senior)
  ≈ 0.343

Fall 2024 23
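As a quick sanity check on the arithmetic above, the snippet below (plain Python; the per-value yes/no counts are read off the training table) recomputes the three partition Ginis and the weighted Gini index for Age.

counts = {"Youth": (2, 3), "Middle-Aged": (4, 0), "Senior": (3, 2)}   # (yes, no) per Age value

def gini(yes, no):
    total = yes + no
    return 1.0 - (yes / total) ** 2 - (no / total) ** 2

n = sum(yes + no for yes, no in counts.values())   # 14 records in total
weighted = sum((yes + no) / n * gini(yes, no) for yes, no in counts.values())

for value, (yes, no) in counts.items():
    print(value, round(gini(yes, no), 2))   # 0.48, 0.0, 0.48
print(round(weighted, 3))                   # 0.343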
EXAMPLE: COMPUTING GINI

• Compute the Gini index for the remaining attributes in the same way:
  Income, Student, Credit_Rating
• Gini_age           = 0.34
• Gini_income        = 0.44
• Gini_student       = 0.37
• Gini_credit_rating = 0.43
• The attribute with the lowest Gini index is chosen for the root node
• Root Node → Age
Fall 2024 24
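For completeness, here is a hedged pandas sketch that recomputes the weighted Gini index of every attribute and picks the smallest one, confirming Age as the root split. The DataFrame simply re-enters the table from the EXAMPLE slide; pandas itself and the helper names are assumptions of this example.

import pandas as pd

data = pd.DataFrame({
    "age": ["youth", "youth", "middle-aged", "senior", "senior", "senior", "middle-aged",
            "youth", "youth", "senior", "youth", "middle-aged", "middle-aged", "senior"],
    "income": ["high", "high", "high", "medium", "low", "low", "low", "medium", "low",
               "medium", "medium", "medium", "high", "medium"],
    "student": ["no", "no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes",
                "no", "yes", "no"],
    "credit_rating": ["fair", "excellent", "fair", "fair", "fair", "excellent", "excellent",
                      "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent"],
    "buys_computer": ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes",
                      "yes", "yes", "yes", "no"],
})

def gini(labels):
    # Gini impurity from the class proportions of a pandas Series
    p = labels.value_counts(normalize=True)
    return 1.0 - (p ** 2).sum()

def weighted_gini(df, attribute, target="buys_computer"):
    # Weight each partition's Gini by its share of the records
    return sum(len(group) / len(df) * gini(group[target])
               for _, group in df.groupby(attribute))

scores = {a: weighted_gini(data, a) for a in ["age", "income", "student", "credit_rating"]}
print(scores)                        # age ~0.343, income ~0.44, student ~0.37, credit_rating ~0.43
print(min(scores, key=scores.get))   # -> "age", chosen as the root split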
EXAMPLE: RESULTANT DECISION TREE

age?
- Youth       -> student?
    - no  -> buys_computer = no
    - yes -> buys_computer = yes
- Middle-Aged -> buys_computer = yes
- Senior      -> credit_rating?
    - excellent -> buys_computer = yes
    - fair      -> buys_computer = no

Fall 2024 25
WHEN TO STOP?
• Once the decision tree algorithm has successfully split the training
set in two, it splits the subsets using the same logic, then the sub-
subsets, and so on, recursively.
• Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning
• There are no samples left
• The maximum depth (defined by the max_depth hyperparameter) is reached
• Prediction
• Majority voting is employed for classifying the leaf
Fall 2024 28
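In scikit-learn these stopping criteria and the majority-vote prediction are built in. The short example below is illustrative (the iris data and the chosen depth are assumptions): limiting the tree to max_depth=2 leaves some nodes mixed, and those leaves predict their majority class.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Splitting stops at pure nodes or at the max_depth limit, whichever comes first
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
clf.fit(X, y)

# Predictions at a mixed leaf are the majority class of the training samples that reached it
print(clf.predict(X[:5]))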
PRACTICE PROBLEM

• Generate the decision tree using Gini impurity and entropy.

Height   Hair    Eyes    Class
Short    Blond   Blue    +
Tall     Blond   Brown   -
Tall     Red     Blue    +
Short    Dark    Blue    -
Tall     Dark    Blue    -
Tall     Blond   Blue    +
Tall     Dark    Brown   -
Short    Blond   Brown   -

Fall 2024 29
HANDLING CONTINUOUS ATTRIBUTES
• Method 1: Discretize continuous values and treat them as categorical values
• E.g., age: < 20, 20..30, 30..40, 40..50, > 50
• Method 2: Determine the best split point for continuous-valued attribute A
• Sort: 15, 18, 21, 22, 24, 25, 29, 31, …
• Possible split points: (a_i + a_{i+1}) / 2, for example (15+18)/2 = 16.5,
• Similarly, 19.5, 21.5, 23, 24.5, 27, 30, …
• The point with the minimum Gini for A is selected as the split point for A
• Split: Based on split point P
• The set of tuples in D satisfying A ≤ P vs. those with A > P
Fall 2024 30
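Method 1 can be done in one line with pandas. In the sketch below the ages and bin edges are made-up values and pandas is an assumption of this example; it bins a continuous age column into ordered categories like those listed above. Method 2 is sketched in code after the next slide.

import pandas as pd

age = pd.Series([19, 23, 31, 38, 44, 52, 61])

# Discretize age into ordered categories that a categorical splitter can use
bins = pd.cut(age, bins=[0, 20, 30, 40, 50, 120],
              labels=["<20", "20-30", "30-40", "40-50", ">50"])
print(bins.value_counts())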
CONTINUOUS ATTRIBUTES: COMPUTING GINI INDEX

• For efficient computation, for each continuous attribute:
  • Sort the records on the attribute's values
  • Linearly scan these values, each time updating the count matrix and
    computing the Gini index
  • Choose the split position that has the least Gini index
(The scan is reproduced in code after this slide.)

Taxable Income from the training data used earlier (Tids 1-10), sorted,
with the corresponding Cheat labels:

Sorted values   60   70   75   85   90   95   100   120   125   220
Cheat           No   No   No   Yes  Yes  Yes  No    No    No    No

Candidate split positions, the class counts on each side, and the Gini index:

Split   Yes <=   Yes >   No <=   No >   Gini
 55     0        3       0       7      0.420
 65     0        3       1       6      0.400
 72     0        3       2       5      0.375
 80     0        3       3       4      0.343
 87     1        2       3       4      0.417
 92     2        1       3       4      0.400
 97     3        0       3       4      0.300
110     3        0       4       3      0.343
122     3        0       5       2      0.375
172     3        0       6       1      0.400
230     3        0       7       0      0.420

• The split position with the least Gini index is 97 (Gini = 0.300).

Fall 2024 31
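The scan above can be reproduced with a few lines of Python. In the sketch below (function and variable names are illustrative), the records are sorted by Taxable Income, each candidate split is evaluated with the weighted Gini index, and the best one is kept; the midpoints come out as 97.5 and so on rather than the rounded values printed on the slide.

incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]                # in thousands (Tids 1-10)
cheat   = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

pairs = sorted(zip(incomes, cheat))          # sort the records by income
values = [v for v, _ in pairs]

def gini(labels):
    if not labels:
        return 0.0
    p_yes = labels.count("Yes") / len(labels)
    return 1.0 - p_yes ** 2 - (1 - p_yes) ** 2

# Candidate splits: the two outer positions (55 and 230) plus the midpoints
candidates = [values[0] - 5] + [(a + b) / 2 for a, b in zip(values, values[1:])] + [values[-1] + 10]

best = None
for split in candidates:
    left  = [c for v, c in pairs if v <= split]
    right = [c for v, c in pairs if v > split]
    g = len(left) / len(pairs) * gini(left) + len(right) / len(pairs) * gini(right)
    if best is None or g < best[1]:
        best = (split, g)

print(best)   # -> roughly (97.5, 0.300): Taxable Income <= ~97K gives the smallest Gini, as on the slide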
PRACTICE PROBLEM
Generate the decision tree using
Gini impurity and entropy.
Fall 2024 34
EXTRACTING CLASSIFICATION RULES FROM TREES
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction. The leaf node holds
the class prediction
• Rules are easier for humans to understand
• Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
Fall 2024 35
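scikit-learn can print these root-to-leaf paths for a fitted tree. The short example below uses the iris data as a stand-in (the buys_computer table is not encoded here) and export_text; each printed branch can be read off as one conjunctive IF-THEN rule.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each root-to-leaf branch corresponds to one rule, read as
# IF <condition> AND <condition> ... THEN <class>
print(export_text(clf, feature_names=list(iris.feature_names)))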
CHARACTERISTICS OF DECISION TREE INDUCTION
• Decision tree induction is a non-parametric approach for building
classification models. In other words, it doesn’t require any prior
assumptions regarding the type of probability distributions satisfied by
the class and other attributes.
• Finding an optimal decision tree is an NP-complete problem. Many
decision tree algorithms employ a heuristic-based approach to guide
their search in the vast hypothesis space. For example, the algorithm
discussed in this unit uses a greedy, top-down, recursive partitioning
strategy for growing a decision tree.
Fall 2024 36
CHARACTERISTICS OF DECISION TREE INDUCTION (CONT’D)
• Techniques developed for constructing decision trees are computationally
inexpensive, making it possible to quickly construct models even when the
training set size is very large.
• Furthermore, once a decision tree has been built, classifying a test record
is extremely fast, with a worst-case complexity of O(w), where w is the
maximum depth of the tree.
• Decision trees, especially smaller-sized trees, are relatively easy to
interpret.
• Decision tree algorithms are quite robust to the presence of noise.
Fall 2024 37
CHARACTERISTICS OF DECISION TREE INDUCTION (CONT’D)
• The presence of redundant attributes does not adversely affect the
accuracy of decision trees. An attribute is redundant if it is strongly
correlated with another attribute in the data.
• One of the two redundant attributes will not be used for splitting once
the other attribute has been chosen.
• Studies have shown that the choice of impurity measures has little
effect on the performance of decision tree induction algorithms.
Fall 2024 38