Decision Trees
Introduction
Entropy
Information gain
Tree models
Trees are expressive and easy to understand.
A feature tree is a tree such that each internal node is labeled
with a feature and each edge emanating from an internal node
is labeled with a literal.
The set of literals at a node is called a split.
Each leaf of the tree represents a logical expression, which is
the conjunction of the literals on the path from the root to that leaf.
The extension of that conjunction is called the instance space
segment associated with the leaf.
For example, in the tennis tree grown later in these slides, one leaf corresponds to the conjunction Outlook = Sunny ∧ Humidity = High, and its instance space segment is the set of all instances that satisfy this conjunction.
Tree models
An internal node tests the value of a particular feature.
A branch represents an outcome of the test.
A leaf node represents a class label h(x) or a class-label
distribution.
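As a minimal sketch of this structure (assuming discrete features; the class names are ours, and the example tree is the PlayTennis tree built later in these slides), a feature tree can be represented in Python as:

class Leaf:
    def __init__(self, label):
        self.label = label                      # class label h(x) stored at the leaf

class Node:
    def __init__(self, feature, children):
        self.feature = feature                  # feature tested at this internal node
        self.children = children                # dict: test outcome -> subtree

def predict(tree, x):
    # Follow the branch matching x's value for the tested feature until a leaf is reached.
    while isinstance(tree, Node):
        tree = tree.children[x[tree.feature]]
    return tree.label

# The tree grown at the end of these slides for the PlayTennis data:
tree = Node('Outlook', {
    'Sunny': Node('Humidity', {'High': Leaf('No'), 'Normal': Leaf('Yes')}),
    'Overcast': Leaf('Yes'),
    'Rain': Node('Wind', {'Strong': Leaf('No'), 'Weak': Leaf('Yes')}),
})
print(predict(tree, {'Outlook': 'Sunny', 'Humidity': 'Normal'}))   # Yes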
Building decision trees
Top-down tree construction
At start, all training examples are at the root.
Partition the examples recursively by choosing one attribute
each time.
Bottom-up tree pruning
Remove subtrees or branches, in a bottom-up manner, to
improve the estimated accuracy on new cases.
Building decision trees
The construction of a tree involves the following three general
elements:
The selection of the splits, i.e., which node to split and how to
split it?
When to declare a node terminal and stop splitting?
We have to assign each terminal node to a class. How do we
assign these class labels?
Building decision trees
The algorithm starts with a single node N, representing the
training tuples in D.
If the tuples in D are all of the same class, then node N becomes
a leaf and is labeled with that class.
Otherwise, the algorithm calls BestSplit to determine the splitting
criterion.
This tells which attribute to test at node N by determining the
best way to partition the tuples in D into individual classes.
The splitting attribute is determined so that the resulting
partitions are as pure as possible – a partition is pure if all the tuples in it belong to the same class.
Building decision trees
The node N is labeled with the splitting criterion, which serves as
a test at the node.
A branch is grown from N for each outcome of the splitting
criterion.
The tuples in D are partitioned accordingly.
The algorithm uses the same process recursively to form a decision
tree for the tuples in each resulting partition Dj of D.
Building decision trees
The recursive partitioning stops under any one of the following
terminating conditions:
All of the tuples in partition D (at node N) belong to the same
class, or
There are no remaining attributes on which the tuples may be
further partitioned, or
There are no tuples for a given branch, i.e., a partition Dj is
empty.
Choosing the Best attribute
How to choose the best attribute or feature?
A goodness-of-split measure is used for this purpose.
The goodness of split is in turn measured by an impurity function.
Typical impurity functions:
Minority class : min(p, 1-p)
Gini index : 2p(1-p)
Entropy : – p log2 p – (1 – p) log2 (1 – p)
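As a small illustration (a sketch, not from the sources; the function names are ours), these impurity measures can be written directly in Python as functions of the proportion p of positives in a node:

from math import log2

def minority_class(p):
    return min(p, 1 - p)

def gini(p):
    return 2 * p * (1 - p)

def entropy(p):
    if p in (0, 1):                      # convention: 0 log2 0 = 0
        return 0.0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(entropy(9 / 14))                   # ≈ 0.940, the entropy of the [9+, 5–] sample used below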
Choosing the Best attribute
A1 and A2 are “attributes” (i.e. features or inputs).
[Figure: the numbers of + and – examples before and after a split on A1 and on A2.]
- Many different frameworks for choosing BEST have been
proposed!
- We will look at Entropy Gain.
Entropy
A formula to calculate the homogeneity of a sample.
A completely homogeneous sample has entropy of 0.
An equally divided sample has entropy of 1.
p+ = no. of positive samples / total samples in S.
p- = no. of negative samples / total samples in S.
Entropy
Entropy(S) ≡ – p+ log2 p+ – p– log2 p– [0 log2 0 = 0]
Entropy([14+, 0–]) = – 14/14 log2 (14/14) – 0 log2 (0) = 0
Entropy([9+, 5–]) = – 9/14 log2 (9/14) – 5/14 log2 (5/14) = 0.94
Entropy([7+, 7–]) = – 7/14 log2 (7/14) – 7/14 log2 (7/14) = 1/2 + 1/2 = 1 [log2 1/2 = – 1]
Note: 0 ≤ p ≤ 1, 0 ≤ entropy ≤ 1
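These three values can be checked with a few lines of Python (a sketch; the entropy helper below works from the positive and negative counts and uses the 0 log2 0 = 0 convention):

from math import log2

def entropy(pos, neg):
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:                        # skip empty classes: 0 log2 0 = 0
            p = count / total
            result -= p * log2(p)
    return result

print(entropy(14, 0))                    # 0.0
print(round(entropy(9, 5), 2))           # 0.94
print(entropy(7, 7))                     # 1.0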
Entropy
[Figure: splitting a [9+, 5–] sample; a resulting node with 100% yes is pure, one with only 40% yes is not pure at all.]
Information gain (IG)
Which is the best attribute?
The one which will result in the smallest tree.
Heuristic: choose the attribute that produces the “purest” nodes
Popular impurity criterion: information gain
Information gain increases with the average purity of the
subsets that an attribute produces.
Information gain uses entropy H(p)
Strategy: choose attribute that results in greatest information
gain.
Information gain (IG)
The information gain is based on the decrease in entropy after a
dataset is split on an attribute.
Which attribute creates the most homogeneous branches?
Calculate the entropy of the total dataset D before split.
The dataset is then split on the different attributes.
The entropy of each branch is calculated. The branch entropies are then
added, weighted by the fraction of tuples in each branch, to get the total entropy for the split.
The resulting entropy is subtracted from the entropy before the split.
The result is the Information Gain, or decrease in entropy.
Information gain (IG)
Information gain = entropy before split – total entropy for the split.
The attribute that yields the largest IG is chosen for the decision node.
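A minimal sketch of this computation in Python, working directly from class counts (the function names and the two-way example split at the end are illustrative, not taken from the slides):

from math import log2

def entropy(counts):
    # Entropy of a node given its class counts, e.g. [9, 5]; 0 log2 0 = 0.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

def information_gain(parent_counts, branch_counts):
    # Entropy before the split minus the weighted entropy after the split.
    total = sum(parent_counts)
    after = sum(sum(b) / total * entropy(b) for b in branch_counts)
    return entropy(parent_counts) - after

# Illustrative split of a [9, 5] node into a pure branch and an evenly mixed branch.
print(round(information_gain([9, 5], [[4, 0], [5, 5]]), 3))     # ≈ 0.226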
Decision Trees
When do I play tennis?
Best node?
Attribute: Outlook
“Outlook” = “Sunny”:
info([2,3]) = entropy(2/5, 3/5) = −2/5 log(2/5) − 3/5 log(3/5) = 0.971 bits
“Outlook” = “Overcast”:
info([4,0]) = entropy(1, 0) = −1 log(1) − 0 log(0) = 0 bits
“Outlook” = “Rainy”:
info([3,2]) = entropy(3/5, 2/5) = −3/5 log(3/5) − 2/5 log(2/5) = 0.971 bits
Expected Information for “Outlook”:
info([2,3], [4,0], [3,2]) = (5/14) × 0.971 + (4/14) × 0 + (5/14) × 0.971
= 0.693 bits
Information gain
Entropy(S) = - (9/14) log2 (9/14) - (5/14) log2 (5/14) = 0.940
Information gain:
(information before split) – (information after split)
gain(" Outlook" ) = info([9,5]) - info([2,3], [4,0], [3,2]) = 0.940 - 0.693
= 0.247 bits
Similarly calculate the information gain for 'Wind” attribute!
Attribute: Wind
Values(Wind) = {Weak, Strong}
S = [9+, 5−], Entropy(S) = 0.94
SWeak = [6+, 2−], Entropy(SWeak) = − 6/8 log(6/8) − 2/8 log(2/8) = 0.811
SStrong = [3+, 3−], Entropy(SStrong) = − 3/6 log(3/6) − 3/6 log(3/6) = 1
Information gain due to knowing Wind:
Gain(S, Wind) = Entropy(S) − 8/14 Entropy(SWeak) − 6/14 Entropy(SStrong)
= 0.94 − 8/14 × 0.811 − 6/14 × 1.00
= 0.048
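The two gains can be double-checked with a short calculation (a sketch; H below is just the entropy computed from class counts):

from math import log2

def H(counts):
    # Entropy from class counts, with 0 log2 0 = 0.
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c)

before = H([9, 5])                                                   # 0.940
gain_outlook = before - (5/14 * H([2, 3]) + 4/14 * H([4, 0]) + 5/14 * H([3, 2]))
gain_wind = before - (8/14 * H([6, 2]) + 6/14 * H([3, 3]))
print(round(gain_outlook, 3), round(gain_wind, 3))                   # 0.247 0.048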
Best Attribute?
Which attribute should be tested at the root?
gain(" Outlook" ) = 0.247 bits
gain(" Temperature" ) = 0.029 bits
gain(" Humidity") = 0.152 bits
gain(" Windy") = 0.048 bits
Outlook provides the best prediction for the target.
Let's grow the tree:
add to the tree a successor for each possible value of
Outlook
partition the training samples according to the value of
Outlook
After the first step?
Second step
Gain(SSunny, Temp.) = 0.971 − 2/5 × 0.0 − 2/5 × 1.0 − 1/5 × 0.0 = 0.571
Gain(SSunny, Wind) = 0.971 − 2/5 × 1.0 − 3/5 × 0.918 = 0.020
Gain(SSunny, Humidity) = 0.971 − 3/5 × 0.0 − 2/5 × 0.0 = 0.971
Second step
Working on Outlook=Sunny node:
gain(" Temperature" ) = 0.571 bits
gain(" Windy") = 0.020 bits
gain(" Humidity") = 0.971 bits
Humidity provides the best prediction for the target
Let's grow the tree:
add to the tree a successor for each possible value of
Humidity
partition the training samples according to the value of
Humidity
After second step
[Tree after the second step: under Outlook = Sunny, the Humidity test gives the leaves {D1, D2, D8} → No and {D9, D11} → Yes.]
After second step
The test on Humidity provides two splits, and the resulting tuples
of each split fall in the same class.
Hence leaf nodes are labeled with that class.
Let's grow the tree at the Outlook = Rain node:
choose the best attribute to test at that node and add to the tree a successor for each of its possible values
partition the training samples according to the value of that attribute
Wind Attribute
Let S = {D4, D5, D6, D10, D14}
Entropy: H(S) = – 3/5 log(3/5) – 2/5 log(2/5) = 0.971
Information Gain
IG(S,Temp) = H(S) – H(S|Temp) = 0.01997
IG(S,Wind) = H(S) – H(S|Wind)
= 0.971 − 3/5 × 0.0 − 2/5 × 0.0 = 0.971
After third step
[Tree after the third step: under Outlook = Sunny, Humidity gives {D1, D2, D8} → No and {D9, D11} → Yes; under Outlook = Rain, Wind gives {D6, D14} → No and {D4, D5, D10} → Yes.]
Final decision tree
ID3 Algorithm
ID3(X, T, Attrs)
X: training examples; T: target attribute (e.g. PlayTennis);
Attrs: candidate attributes, initially all attributes
Create Root node
If all X's are +, return Root with class +
If all X's are –, return Root with class –
If Attrs is empty, return Root with class = most common value of T in X
else
A ← best attribute; decision attribute for Root ← A
For each possible value vi of A:
- add a new branch below Root, for test A = vi
- Xi ←subset of X with A = vi
- If Xi is empty then add a new leaf with class the most common
value of T in X
else add the subtree generated by ID3(Xi, T, Attrs − {A})
return Root
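A compact, runnable Python sketch of this algorithm (an approximation for discrete attributes, not the exact implementation from the sources; examples are dicts mapping attribute names to values, and leaves are plain class labels):

from collections import Counter
from math import log2

def entropy(examples, target):
    counts = Counter(x[target] for x in examples)
    total = len(examples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def info_gain(examples, attr, target):
    total = len(examples)
    remainder = 0.0
    for value in {x[attr] for x in examples}:
        subset = [x for x in examples if x[attr] == value]
        remainder += len(subset) / total * entropy(subset, target)
    return entropy(examples, target) - remainder

def id3(examples, target, attrs):
    labels = [x[target] for x in examples]
    if len(set(labels)) == 1:                  # all examples share one class: leaf
        return labels[0]
    if not attrs:                              # no attributes left: majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(examples, a, target))
    # One branch per value of the best attribute that occurs in the current examples
    # (so the "empty partition" case of the pseudocode does not arise here).
    tree = {best: {}}
    for value in {x[best] for x in examples}:
        subset = [x for x in examples if x[best] == value]
        tree[best][value] = id3(subset, target, [a for a in attrs if a != best])
    return tree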
Acknowledgement
Machine Learning – Peter Flach
Notes from Tom Mitchell