INTRODUCTION TO
MACHINE LEARNING
Unit 04
Compiled by Solat Jabeen
ACKNOWLEDGEMENT
• The slides and code in this lecture are primarily taken from
• Machine Learning with PyTorch and Scikit-Learn by Raschka et al.
• Data Mining Concepts and Techniques by Han et al.
• Hands-On Machine Learning with Scikit-Learn, Keras, and
TensorFlow by Géron
Fall 2024 2
TODAY’S AGENDA
• Steps in Decision Tree Learning
• Computing Gini Index for root node
• Recursive application of Gini Index
• Handling Continuous Variables
Fall 2024 3
DECISION TREES
• Decision trees are versatile machine learning algorithms that can
perform both classification and regression tasks.
• Decision trees are also the fundamental components of random
forests which are among the most powerful machine learning
algorithms available today.
• One of the many qualities of decision trees is that they require very
little data preparation.
• In fact, they don’t require feature scaling or centering at all.
Fall 2024 4
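The point about data preparation can be checked directly. The short sketch below is illustrative only (the iris data, the split, and the variable names are not from the slides): it fits the same scikit-learn tree on raw and on standardized features. Because a tree only compares each feature against a threshold, rescaling does not change which records fall on which side of a split, so the two scores should match.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Tree fitted on the raw, unscaled features
raw = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)

# Tree fitted on standardized (centered and scaled) features
scaler = StandardScaler().fit(X_tr)
scaled = DecisionTreeClassifier(random_state=0).fit(scaler.transform(X_tr), y_tr)

# The two test accuracies are expected to be essentially identical
print(raw.score(X_te, y_te), scaled.score(scaler.transform(X_te), y_te))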
EXAMPLE OF A DECISION TREE

Training Data:

Tid   Refund   Marital Status   Taxable Income   Cheat
1     Yes      Single           125K             No
2     No       Married          100K             No
3     No       Single           70K              No
4     Yes      Married          120K             No
5     No       Divorced         95K              Yes
6     No       Married          60K              No
7     Yes      Divorced         220K             No
8     No       Single           85K              Yes
9     No       Married          75K              No
10    No       Single           90K              Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc)
- Refund = Yes -> Cheat = NO
- Refund = No  -> test MarSt
  - MarSt = Married            -> Cheat = NO
  - MarSt = Single or Divorced -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

Fall 2024 6
APPLY MODEL TO TEST DATA

Test Data:

Refund   Marital Status   Taxable Income   Cheat
No       Married          80K              ?

Start from the root of the tree built from the training data and, at each node, follow the branch that matches the test record:
1. Refund? The test record has Refund = No, so take the "No" branch to the MarSt node.
2. MarSt? The test record has Marital Status = Married, so take the "Married" branch, which leads to a leaf labelled NO.
3. The leaf predicts NO, so assign Cheat to "No".

Fall 2024 7-12
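For readers following along in code, here is a minimal sketch of the same traversal: the tree from the example written out as nested if/else tests. The function name predict_cheat and the handling of an income of exactly 80K are illustrative choices, not from the slides.

def predict_cheat(refund, marital_status, taxable_income_k):
    # Root node: Refund?
    if refund == "Yes":
        return "No"
    # Refund = No: test marital status
    if marital_status == "Married":
        return "No"
    # Single or Divorced: test taxable income
    # (what happens at exactly 80K is a convention choice; it does not arise here)
    return "No" if taxable_income_k < 80 else "Yes"

# The test record from the slide: Refund = No, Married, 80K
print(predict_cheat("No", "Married", 80))   # -> "No"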
ANOTHER EXAMPLE OF DECISION TREE

Using the same training data (Tids 1-10) as in the previous example, a different tree also fits the data, this time splitting on MarSt first:
- MarSt = Married            -> Cheat = NO
- MarSt = Single or Divorced -> test Refund
  - Refund = Yes -> Cheat = NO
  - Refund = No  -> test TaxInc
    - TaxInc < 80K -> Cheat = NO
    - TaxInc > 80K -> Cheat = YES

There could be more than one tree that fits the same data!

Fall 2024 13
IRIS DECISION TREE

• The label class=virginica means that the majority of the tuples at that node (41 of them) belong to the category virginica.
• Sometimes we need to stop the algorithm at a certain depth; the majority class is then used to assign the class label at that node.
• In short, this is not an error.

Pure node: all tuples belong to one class (perfect classification).
Mixed node: a mixture of classes; needs further splitting.

Fall 2024 14
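A figure like the one on this slide can be reproduced with scikit-learn. The sketch below is only an illustration (the iris data, the depth limit, and random_state are choices made here): it fits a depth-2 tree and plots it, so you can see mixed leaves being labelled with their majority class.

import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, plot_tree

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each box shows the per-class sample counts; a mixed leaf is labelled with
# its majority class (e.g. class=virginica), which is not an error.
plot_tree(clf, feature_names=iris.feature_names,
          class_names=list(iris.target_names), filled=True)
plt.show()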
DECISION TREE INDUCTION: ALGORITHM
1. The tree is constructed in a top-down, recursive, divide-and-conquer
manner.
2. At the start, all the training examples are at the root.
3. Examples are partitioned recursively based on the selected attributes.
4. At each node, the splitting attribute is selected based on the training
examples at that node, using a heuristic or statistical measure such as:
1. Entropy, or
2. Gini index
(A minimal code sketch of this procedure follows this slide.)
Fall 2024 19
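The sketch referred to above is a minimal, illustrative implementation of the top-down, recursive procedure for categorical attributes, using the Gini index as the selection measure. The function names (gini, best_attribute, build_tree) are made up for this example and do not come from any library.

from collections import Counter

def gini(labels):
    # Gini impurity of a list of class labels: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_attribute(rows, labels, attributes):
    # Pick the attribute whose split yields the lowest weighted Gini impurity
    def weighted_gini(attr):
        total = len(labels)
        score = 0.0
        for value in set(row[attr] for row in rows):
            subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
            score += len(subset) / total * gini(subset)
        return score
    return min(attributes, key=weighted_gini)

def build_tree(rows, labels, attributes):
    # Stop when the node is pure or no attributes remain; predict the majority class
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Otherwise split on the best attribute and recurse on each partition
    attr = best_attribute(rows, labels, attributes)
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in set(row[attr] for row in rows):
        idx = [i for i, row in enumerate(rows) if row[attr] == value]
        children[value] = build_tree([rows[i] for i in idx],
                                     [labels[i] for i in idx], remaining)
    return (attr, children)

# Tiny usage example: rows are dicts mapping attribute -> value
rows = [{"refund": "Yes"}, {"refund": "No"}]
print(build_tree(rows, ["No", "Yes"], ["refund"]))   # e.g. ('refund', {'Yes': 'No', 'No': 'Yes'})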
EXAMPLE

Attributes:
1. Age [Youth, Middle-Aged, Senior]
2. Income [High, Medium, Low]
3. Student [Yes, No]
4. Credit_Rating [Fair, Excellent]

Class:
Buys_Computer [Yes, No]

age           income   student   credit_rating   buys_computer
youth         high     no        fair            no
youth         high     no        excellent       no
middle-aged   high     no        fair            yes
senior        medium   no        fair            yes
senior        low      yes       fair            yes
senior        low      yes       excellent       no
middle-aged   low      yes       excellent       yes
youth         medium   no        fair            no
youth         low      yes       fair            yes
senior        medium   yes       fair            yes
youth         medium   yes       excellent       yes
middle-aged   medium   no        excellent       yes
middle-aged   high     yes       fair            yes
senior        medium   no        excellent       no

Source: Han & Kamber (Chapter 8)

Fall 2024 22
EXAMPLE: COMPUTING GINI

Value counts from the training data:

Age               Income       Student   Credit_Rating   Buys_Computer
Youth = 5         Low = 4      Yes = 7   Fair = 8        Yes = 9
Middle-Aged = 4   Medium = 6   No = 7    Excellent = 6   No = 5
Senior = 5        High = 4

• Gini(Youth)       = 1 − (3/5)² − (2/5)² = 1 − 0.36 − 0.16 = 0.48
• Gini(Middle-Aged) = 1 − (0/4)² − (4/4)² = 1 − 0 − 1 = 0
• Gini(Senior)      = 1 − (2/5)² − (3/5)² = 1 − 0.16 − 0.36 = 0.48
• Total Gini index for Age = 5/14 Gini(Youth) + 4/14 Gini(Middle-Aged) +
  5/14 Gini(Senior)
  ≈ 0.343

Fall 2024 23
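As a quick sanity check on the arithmetic above, the snippet below (plain Python; the per-value yes/no counts are read off the training table) recomputes the three partition Ginis and the weighted Gini index for Age.

counts = {"Youth": (2, 3), "Middle-Aged": (4, 0), "Senior": (3, 2)}   # (yes, no) per Age value

def gini(yes, no):
    total = yes + no
    return 1.0 - (yes / total) ** 2 - (no / total) ** 2

n = sum(yes + no for yes, no in counts.values())   # 14 records in total
weighted = sum((yes + no) / n * gini(yes, no) for yes, no in counts.values())

for value, (yes, no) in counts.items():
    print(value, round(gini(yes, no), 2))   # 0.48, 0.0, 0.48
print(round(weighted, 3))                   # 0.343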
EXAMPLE: COMPUTING GINI

• Compute the Gini index for the remaining attributes in the same way:
  Income, Student, Credit_Rating
• Gini_age           = 0.34
• Gini_income        = 0.44
• Gini_student       = 0.37
• Gini_credit_rating = 0.43
• The attribute with the lowest Gini index is chosen for the root node
• Root Node → Age
Fall 2024 24
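For completeness, here is a hedged pandas sketch that recomputes the weighted Gini index of every attribute and picks the smallest one, confirming Age as the root split. The DataFrame simply re-enters the table from the EXAMPLE slide; pandas itself and the helper names are assumptions of this example.

import pandas as pd

data = pd.DataFrame({
    "age": ["youth", "youth", "middle-aged", "senior", "senior", "senior", "middle-aged",
            "youth", "youth", "senior", "youth", "middle-aged", "middle-aged", "senior"],
    "income": ["high", "high", "high", "medium", "low", "low", "low", "medium", "low",
               "medium", "medium", "medium", "high", "medium"],
    "student": ["no", "no", "no", "no", "yes", "yes", "yes", "no", "yes", "yes", "yes",
                "no", "yes", "no"],
    "credit_rating": ["fair", "excellent", "fair", "fair", "fair", "excellent", "excellent",
                      "fair", "fair", "fair", "excellent", "excellent", "fair", "excellent"],
    "buys_computer": ["no", "no", "yes", "yes", "yes", "no", "yes", "no", "yes", "yes",
                      "yes", "yes", "yes", "no"],
})

def gini(labels):
    # Gini impurity from the class proportions of a pandas Series
    p = labels.value_counts(normalize=True)
    return 1.0 - (p ** 2).sum()

def weighted_gini(df, attribute, target="buys_computer"):
    # Weight each partition's Gini by its share of the records
    return sum(len(group) / len(df) * gini(group[target])
               for _, group in df.groupby(attribute))

scores = {a: weighted_gini(data, a) for a in ["age", "income", "student", "credit_rating"]}
print(scores)                        # age ~0.343, income ~0.44, student ~0.37, credit_rating ~0.43
print(min(scores, key=scores.get))   # -> "age", chosen as the root split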
EXAMPLE: RESULTANT DECISION TREE

age?
- Youth       -> student?
    - no  -> buys_computer = no
    - yes -> buys_computer = yes
- Middle-Aged -> buys_computer = yes
- Senior      -> credit_rating?
    - excellent -> buys_computer = yes
    - fair      -> buys_computer = no

Fall 2024 25
WHEN TO STOP?
• Once the decision tree algorithm has successfully split the training
set in two, it splits the subsets using the same logic, then the sub-
subsets, and so on, recursively.
• Conditions for stopping partitioning
• All samples for a given node belong to the same class
• There are no remaining attributes for further partitioning
• There are no samples left
• The maximum depth (defined by the max_depth hyperparameter) is reached
• Prediction
• Majority voting is employed for classifying the leaf
Fall 2024 28
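In scikit-learn these stopping criteria and the majority-vote prediction are built in. The short example below is illustrative (the iris data and the chosen depth are assumptions): limiting the tree to max_depth=2 leaves some nodes mixed, and those leaves predict their majority class.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Splitting stops at pure nodes or at the max_depth limit, whichever comes first
clf = DecisionTreeClassifier(criterion="gini", max_depth=2, random_state=42)
clf.fit(X, y)

# Predictions at a mixed leaf are the majority class of the training samples that reached it
print(clf.predict(X[:5]))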
PRACTICE PROBLEM

• Generate the decision tree using Gini impurity and entropy.

Height   Hair    Eyes    Class
Short    Blond   Blue    +
Tall     Blond   Brown   -
Tall     Red     Blue    +
Short    Dark    Blue    -
Tall     Dark    Blue    -
Tall     Blond   Blue    +
Tall     Dark    Brown   -
Short    Blond   Brown   -

Fall 2024 29
HANDLING CONTINUOUS ATTRIBUTES
• Method 1: Discretize continuous values and treat them as categorical values
• E.g., age: < 20, 20..30, 30..40, 40..50, > 50
• Method 2: Determine the best split point for continuous-valued attribute A
• Sort: 15, 18, 21, 22, 24, 25, 29, 31, …
• Possible split points: (a_i + a_{i+1}) / 2, for example (15+18)/2 = 16.5,
• Similarly, 19.5, 21.5, 23, 24.5, 27, 30, …
• The point with the minimum Gini for A is selected as the split point for A
• Split: Based on split point P
• The set of tuples in D satisfying A ≤ P vs. those with A > P
Fall 2024 30
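Method 1 can be done in one line with pandas. In the sketch below the ages and bin edges are made-up values and pandas is an assumption of this example; it bins a continuous age column into ordered categories like those listed above. Method 2 is sketched in code after the next slide.

import pandas as pd

age = pd.Series([19, 23, 31, 38, 44, 52, 61])

# Discretize age into ordered categories that a categorical splitter can use
bins = pd.cut(age, bins=[0, 20, 30, 40, 50, 120],
              labels=["<20", "20-30", "30-40", "40-50", ">50"])
print(bins.value_counts())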
CONTINUOUS ATTRIBUTES: COMPUTING GINI INDEX

• For efficient computation, for each continuous attribute:
  • Sort the records on the attribute's values
  • Linearly scan these values, each time updating the count matrix and
    computing the Gini index
  • Choose the split position that has the least Gini index
(The scan is reproduced in code after this slide.)

Taxable Income from the training data used earlier (Tids 1-10), sorted,
with the corresponding Cheat labels:

Sorted values   60   70   75   85   90   95   100   120   125   220
Cheat           No   No   No   Yes  Yes  Yes  No    No    No    No

Candidate split positions, the class counts on each side, and the Gini index:

Split   Yes <=   Yes >   No <=   No >   Gini
 55     0        3       0       7      0.420
 65     0        3       1       6      0.400
 72     0        3       2       5      0.375
 80     0        3       3       4      0.343
 87     1        2       3       4      0.417
 92     2        1       3       4      0.400
 97     3        0       3       4      0.300
110     3        0       4       3      0.343
122     3        0       5       2      0.375
172     3        0       6       1      0.400
230     3        0       7       0      0.420

• The split position with the least Gini index is 97 (Gini = 0.300).

Fall 2024 31
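The scan above can be reproduced with a few lines of Python. In the sketch below (function and variable names are illustrative), the records are sorted by Taxable Income, each candidate split is evaluated with the weighted Gini index, and the best one is kept; the midpoints come out as 97.5 and so on rather than the rounded values printed on the slide.

incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]                # in thousands (Tids 1-10)
cheat   = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]

pairs = sorted(zip(incomes, cheat))          # sort the records by income
values = [v for v, _ in pairs]

def gini(labels):
    if not labels:
        return 0.0
    p_yes = labels.count("Yes") / len(labels)
    return 1.0 - p_yes ** 2 - (1 - p_yes) ** 2

# Candidate splits: the two outer positions (55 and 230) plus the midpoints
candidates = [values[0] - 5] + [(a + b) / 2 for a, b in zip(values, values[1:])] + [values[-1] + 10]

best = None
for split in candidates:
    left  = [c for v, c in pairs if v <= split]
    right = [c for v, c in pairs if v > split]
    g = len(left) / len(pairs) * gini(left) + len(right) / len(pairs) * gini(right)
    if best is None or g < best[1]:
        best = (split, g)

print(best)   # -> roughly (97.5, 0.300): Taxable Income <= ~97K gives the smallest Gini, as on the slide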
PRACTICE PROBLEM
Generate the decision tree using
Gini impurity and entropy.
Fall 2024 34
EXTRACTING CLASSIFICATION RULES FROM TREES
• Represent the knowledge in the form of IF-THEN rules
• One rule is created for each path from the root to a leaf
• Each attribute-value pair along a path forms a conjunction. The leaf node holds
the class prediction
• Rules are easier for humans to understand
• Example
IF age = “<=30” AND student = “no” THEN buys_computer = “no”
IF age = “<=30” AND student = “yes” THEN buys_computer = “yes”
IF age = “31…40” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “excellent” THEN buys_computer = “yes”
IF age = “>40” AND credit_rating = “fair” THEN buys_computer = “no”
Fall 2024 35
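scikit-learn can print these root-to-leaf paths for a fitted tree. The short example below uses the iris data as a stand-in (the buys_computer table is not encoded here) and export_text; each printed branch can be read off as one conjunctive IF-THEN rule.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Each root-to-leaf branch corresponds to one rule, read as
# IF <condition> AND <condition> ... THEN <class>
print(export_text(clf, feature_names=list(iris.feature_names)))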
CHARACTERISTICS OF DECISION TREE INDUCTION
• Decision tree induction is a non-parametric approach for building
classification models. In other words, it doesn’t require any prior
assumptions regarding the type of probability distributions satisfied by
the class and other attributes.
• Finding an optimal decision tree is an NP-complete problem. Many
decision tree algorithms employ a heuristic-based approach to guide
their search in the vast hypothesis space. For example, the algorithm
discussed in this unit uses a greedy, top-down, recursive partitioning
strategy for growing a decision tree.
Fall 2024 36
CHARACTERISTICS OF DECISION TREE INDUCTION (CONT’D)
• Techniques developed for constructing decision trees are computationally
inexpensive, making it possible to quickly construct models even when the
training set size is very large.
• Furthermore, once a decision tree has been built, classifying a test record
is extremely fast, with a worst-case complexity of O(w), where w is the
maximum depth of the tree.
• Decision trees, especially smaller-sized trees, are relatively easy to
interpret.
• Decision tree algorithms are quite robust to the presence of noise.
Fall 2024 37
CHARACTERISTICS OF DECISION TREE INDUCTION (CONT’D)
• The presence of redundant attributes does not adversely affect the
accuracy of decision trees. An attribute is redundant if it is strongly
correlated with another attribute in the data.
• One of the two redundant attributes will not be used for splitting once
the other attribute has been chosen.
• Studies have shown that the choice of impurity measures has little
effect on the performance of decision tree induction algorithms.
Fall 2024 38