Decision Trees
Decision Trees are a type of Supervised Machine Learning (that is, the training data specifies both the input and the corresponding output) where the data is continuously split according to a certain parameter. The tree can be explained by two entities: decision nodes and leaves. The leaves are the decisions or final outcomes, and the decision nodes are where the data is split.
There are two main types of Decision Trees:
Classification trees (Yes/No types): the decision variable is categorical.
Regression trees (continuous data types): the decision or outcome variable is continuous.
Decision Trees
A decision tree is a flow-chart-like tree structure
Internal node denotes a test on an attribute (feature)
Branch represents an outcome of the test
All records in a branch have the same value for the tested attribute
Leaf node represents class label or class label distribution
[Figure: example tree. The root tests outlook (sunny / overcast / rain); the sunny branch tests humidity (high → N, normal → P), the overcast branch is the leaf P, and the rain branch tests windy (true → N, false → P).]
Instance Language for Classification
Example: “is it a good day to play golf?”
A set of attributes and their possible values:
outlook: sunny, overcast, rain
temperature: cool, mild, hot
humidity: high, normal
windy: true, false
A particular instance in the training set might be:
<overcast, hot, normal, false>: play
In this case, the target class is a binary attribute, so each instance represents a positive or a negative example.
Using Decision Trees for Classification
Examples can be classified as follows:
1. start at the root and look at the example's value for the feature tested at the current node
2. move along the edge labeled with this value
3. if you reach a leaf, return the label of the leaf
4. otherwise, repeat from step 1 at the node you reached
Example (a decision tree to decide whether to go on a picnic):
[Figure: the same tree as above, with outlook at the root (sunny / overcast / rain), humidity tested on the sunny branch (high → N, normal → P), the leaf P under overcast, and windy tested on the rain branch (true → N, false → P).]
So a new instance:
<rainy, hot, normal, true>: ?
will be classified as “no play”.
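To make the procedure concrete, here is a minimal sketch (not from the slides) that encodes the picnic tree as nested Python dictionaries and walks it exactly as in the steps above; the dictionary encoding and attribute names are assumptions made for this example.

```python
# A decision tree encoded as nested dicts: an internal node maps an attribute
# name to a dict of {attribute value: subtree}; a leaf is just a class label.
picnic_tree = {
    "outlook": {
        "sunny": {"humidity": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rain": {"windy": {True: "N", False: "P"}},
    }
}

def classify(tree, instance):
    """Walk the tree: test the attribute at the current node, follow the
    branch labeled with the instance's value, stop at a leaf."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))                      # attribute tested at this node
        tree = tree[attribute][instance[attribute]]       # follow the matching branch
    return tree                                           # leaf: the class label

# The new instance from the slide: <rainy, hot, normal, true>
print(classify(picnic_tree, {"outlook": "rain", "temp": "hot",
                             "humidity": "normal", "windy": True}))  # -> "N" (no play)
```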
Decision Trees and Decision Rules
If attributes are continuous, internal nodes may test against a threshold.
[Figure: the same tree with thresholds. The sunny branch tests humidity (> 75% → N, <= 75% → P); overcast is the leaf P; the rain branch tests wind (> 20 → N, <= 20 → P).]
Each path in the tree represents a decision rule:
Rule 1: If (outlook = “sunny”) AND (humidity <= 0.75) Then (play = “yes”)
Rule 2: If (outlook = “rainy”) AND (wind > 20) Then (play = “no”)
Rule 3: If (outlook = “overcast”) Then (play = “yes”)
...
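The same rules can be written directly as code. This small function (mine, not from the slides) encodes Rules 1-3 and their complementary branches, with humidity given as a fraction and wind in the slide's unspecified units.

```python
def play_decision(outlook, humidity, wind):
    """Decision rules read off the tree paths above. Returns "yes" or "no"."""
    if outlook == "sunny":
        return "yes" if humidity <= 0.75 else "no"   # Rule 1 and its complement
    if outlook == "overcast":
        return "yes"                                 # Rule 3
    if outlook == "rainy":
        return "no" if wind > 20 else "yes"          # Rule 2 and its complement
    raise ValueError(f"unknown outlook: {outlook}")

print(play_decision("sunny", humidity=0.70, wind=5))   # -> "yes"
print(play_decision("rainy", humidity=0.80, wind=25))  # -> "no"
```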
Top-Down Decision Tree Generation
The basic approach usually consists of two phases:
Tree construction
At the start, all the training instances are at the root
Partition instances recursively based on selected attributes
Tree pruning (to improve accuracy)
remove tree branches that may reflect noise in the training data and lead to errors when
classifying test data
Basic Steps in Decision Tree Construction
Tree starts as a single node representing all the data
If the instances are all of the same class, the node becomes a leaf labeled with that class label
Otherwise, select the feature that best distinguishes among the instances
Partition the data based on the values of the selected feature (with each branch representing one partition)
Recursion stops when:
the instances in a node belong to the same class (or too few instances remain)
there are no remaining attributes on which to split
Tree Construction Algorithm (ID3)
Decision Tree Learning Method (ID3)
Input: a set of training instances S, a set of features F
1. If every element of S has the class value “yes”, return “yes”; if every element of S has the class value “no”, return “no”
2. Otherwise, choose the best feature f from F (if there are no features remaining, then return failure)
3. Extend the tree from f by adding a new branch for each attribute value of f
3.1. Set F’ = F – {f}
4. Distribute the training instances to the leaf nodes (so each leaf node n represents the subset Sn of S with the corresponding attribute value)
5. Repeat steps 1-5 for each leaf node n, with Sn as the new set of training instances and F’ as the new set of features
Main Question:
how do we choose the best feature at each step?
Note: the ID3 algorithm only deals with categorical attributes, but it can be extended (as in C4.5) to handle continuous attributes
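The following is a minimal sketch of this recursive procedure in Python, assuming categorical features and training instances given as (feature-dict, label) pairs; the names id3 and select_best are illustrative, and the selection function (information gain, covered next) is passed in as a parameter. Where step 2 returns failure when no features remain, this sketch falls back to the majority class, a common variant.

```python
from collections import Counter

def id3(instances, features, select_best):
    """instances: list of (feature_dict, label) pairs; features: list of names;
    select_best: callable(instances, features) -> feature name.
    Returns a nested-dict tree as in the earlier classification sketch."""
    labels = [label for _, label in instances]
    if len(set(labels)) == 1:                     # step 1: pure node -> leaf
        return labels[0]
    if not features:                              # no features left -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    f = select_best(instances, features)          # step 2: choose the best feature
    remaining = [g for g in features if g != f]   # step 3.1: F' = F - {f}
    tree = {f: {}}
    for v in {x[f] for x, _ in instances}:        # steps 3-4: one branch per value of f
        subset = [(x, y) for x, y in instances if x[f] == v]
        tree[f][v] = id3(subset, remaining, select_best)   # step 5: recurse
    return tree
```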
Choosing the “Best” Feature
Use Information Gain to find the “best” (most discriminating) feature
Assume there are two classes, P and N (e.g., P = “yes” and N = “no”)
Let the set of instances S (the training data) contain p elements of class P and n elements of class N
The amount of information needed to decide whether an arbitrary example in S belongs to P or N is defined in terms of entropy, I(p, n):
I(p, n) = −Pr(P) log2 Pr(P) − Pr(N) log2 Pr(N)
Note that Pr(P) = p / (p + n) and Pr(N) = n / (p + n)
In other words, the entropy of a set of instances S is a function of the probability distribution of classes among the instances in S.
Entropy
[Figure: entropy for a two-class variable.]
Entropy in Multi-Class Problems
More generally, if we have m classes, c1, c2, …, cm, with s1, s2, …, sm as the numbers of instances from S in each class, then the entropy is:
I(s1, s2, …, sm) = − Σi=1..m pi log2 pi
where pi = si / |S| is the probability that an arbitrary instance belongs to the class ci.
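A minimal sketch of this entropy computation in Python (the function name and interface are my own); it reproduces the two-class value used in the worked example later, I(9, 5) ≈ 0.94.

```python
from math import log2

def entropy(*counts):
    """Entropy I(s1, ..., sm) of a set with the given per-class counts,
    using the convention 0 * log2(0) = 0."""
    total = sum(counts)
    probs = [c / total for c in counts if c > 0]    # skip empty classes
    return -sum(p * log2(p) for p in probs)

print(round(entropy(9, 5), 3))   # -> 0.94  (the golf data: 9 positive, 5 negative)
print(entropy(4, 0) == 0)        # -> True  (a pure node has zero entropy)
```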
Information Gain
Now, assume that using attribute A a set S of instances will be
partitioned into sets S1, S2 , …, Sv each corresponding to distinct
values of attribute A.
If Si contains pi cases of P and ni cases of N, then the entropy, or the expected information needed to classify objects in all subtrees Si, is:
E(A) = Σi=1..v Pr(Si) I(pi, ni), where Pr(Si) = |Si| / |S| = (pi + ni) / (p + n) is the probability that an arbitrary instance in S belongs to the partition Si
The encoding information that would be gained by branching on A is:
Gain(A) = I(p, n) − E(A)
At any point we want to branch using the attribute that provides the highest information gain.
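A short sketch of Gain(A) for a categorical attribute, under the same (feature-dict, label) representation as the earlier ID3 sketch; the helper names are illustrative.

```python
from collections import Counter, defaultdict
from math import log2

def _entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(instances, attribute):
    """Gain(A) = I(p, n) - E(A) for a categorical attribute A.
    instances: list of (feature_dict, class_label) pairs."""
    labels = Counter(label for _, label in instances)
    base = _entropy(labels.values())                      # I(p, n) for all of S
    partitions = defaultdict(Counter)
    for features, label in instances:
        partitions[features[attribute]][label] += 1       # class counts per value of A
    n = len(instances)
    expected = sum(sum(c.values()) / n * _entropy(c.values())
                   for c in partitions.values())          # E(A) = sum Pr(Si) * I(pi, ni)
    return base - expected
```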
Attribute Selection - Example
The “Golf” example: what attribute should we choose as the root?

Day  outlook   temp  humidity  wind    play
D1   sunny     hot   high      weak    No
D2   sunny     hot   high      strong  No
D3   overcast  hot   high      weak    Yes
D4   rain      mild  high      weak    Yes
D5   rain      cool  normal    weak    Yes
D6   rain      cool  normal    strong  No
D7   overcast  cool  normal    strong  Yes
D8   sunny     mild  high      weak    No
D9   sunny     cool  normal    weak    Yes
D10  rain      mild  normal    weak    Yes
D11  sunny     mild  normal    strong  Yes
D12  overcast  mild  high      strong  Yes
D13  overcast  hot   normal    weak    Yes
D14  rain      mild  high      strong  No

Splitting S: [9+, 5-] on Outlook gives sunny [2+, 3-], overcast [4+, 0-], and rainy [3+, 2-].

I(9,5) = -(9/14)·log(9/14) - (5/14)·log(5/14) = 0.94
I(4,0) = -(4/4)·log(4/4) - (0/4)·log(0/4) = 0
I(2,3) = -(2/5)·log(2/5) - (3/5)·log(3/5) = 0.97
I(3,2) = -(3/5)·log(3/5) - (2/5)·log(2/5) = 0.97

Gain(outlook) = .94 - (4/14)*0 - (5/14)*.97 - (5/14)*.97 = .24
Attribute Selection - Example (Cont.)
Splitting S: [9+, 5-] (I = 0.940) on humidity gives high [3+, 4-] (I = 0.985) and normal [6+, 1-] (I = 0.592).
Gain(humidity) = .940 - (7/14)*.985 - (7/14)*.592 = .151
Splitting S: [9+, 5-] (I = 0.940) on wind gives weak [6+, 2-] (I = 0.811) and strong [3+, 3-] (I = 1.00).
Gain(wind) = .940 - (8/14)*.811 - (6/14)*1.0 = .048
So, classifying examples by humidity provides more information gain than by wind. Similarly, we must find the information gain for “temp”. In this case, however, you can verify that outlook has the largest information gain, so it will be selected as the root.
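For readers who want to check these numbers, the following self-contained sketch recomputes the information gain of every attribute on the golf table above; the data layout and helper names are my own.

```python
from collections import Counter, defaultdict
from math import log2

# The golf training set from the table above: (outlook, temp, humidity, wind, play)
DATA = [
    ("sunny", "hot", "high", "weak", "No"),          ("sunny", "hot", "high", "strong", "No"),
    ("overcast", "hot", "high", "weak", "Yes"),      ("rain", "mild", "high", "weak", "Yes"),
    ("rain", "cool", "normal", "weak", "Yes"),       ("rain", "cool", "normal", "strong", "No"),
    ("overcast", "cool", "normal", "strong", "Yes"), ("sunny", "mild", "high", "weak", "No"),
    ("sunny", "cool", "normal", "weak", "Yes"),      ("rain", "mild", "normal", "weak", "Yes"),
    ("sunny", "mild", "normal", "strong", "Yes"),    ("overcast", "mild", "high", "strong", "Yes"),
    ("overcast", "hot", "normal", "weak", "Yes"),    ("rain", "mild", "high", "strong", "No"),
]
ATTRS = ["outlook", "temp", "humidity", "wind"]

def entropy(counts):
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def gain(column):
    base = entropy(Counter(row[-1] for row in DATA).values())   # I(9, 5) = 0.940
    parts = defaultdict(Counter)
    for row in DATA:
        parts[row[column]][row[-1]] += 1                        # class counts per value
    return base - sum(sum(c.values()) / len(DATA) * entropy(c.values())
                      for c in parts.values())

for i, name in enumerate(ATTRS):
    print(f"Gain({name}) = {gain(i):.3f}")
# -> outlook 0.247, temp 0.029, humidity 0.152, wind 0.048: outlook wins the root
```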
Attribute Selection - Example (Cont.)
Partially learned decision tree: Outlook is at the root, with S: [9+, 5-] = {D1, D2, …, D14}.
sunny → ? [2+, 3-] {D1, D2, D8, D9, D11}
overcast → yes [4+, 0-] {D3, D7, D12, D13}
rainy → ? [3+, 2-] {D4, D5, D6, D10, D14}
Which attribute should be tested at the sunny branch?
Ssunny = {D1, D2, D8, D9, D11}
Gain(Ssunny, humidity) = .970 - (3/5)*0.0 - (2/5)*0.0 = .970
Gain(Ssunny, temp) = .970 - (2/5)*0.0 - (2/5)*1.0 - (1/5)*0.0 = .570
Gain(Ssunny, wind) = .970 - (2/5)*1.0 - (3/5)*.918 = .019
Other Attribute Selection Measures
Gain ratio:
The Information Gain measure tends to be biased in favor of attributes with a large number of values
Gain Ratio normalizes the Information Gain with respect to the total entropy of all
splits based on values of an attribute
Used by C4.5 (the successor of ID3)
But, tends to prefer unbalanced splits (one partition much smaller than others)
Gini index:
A measure of impurity (based on relative frequencies of classes in a set of instances)
The attribute that provides the smallest Gini index (or the largest reduction in impurity
due to the split) is chosen to split the node
Possible Problems:
Biased towards multivalued attributes; similar to Info. Gain.
Has difficulty when # of classes is large
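For comparison with information gain, here is a minimal sketch of the Gini index and the impurity reduction of a categorical split; the names gini and gini_gain are illustrative, not part of any particular library.

```python
from collections import Counter, defaultdict

def gini(counts):
    """Gini impurity 1 - sum(p_i^2) for the given per-class counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def gini_gain(instances, attribute):
    """Reduction in Gini impurity from splitting on a categorical attribute.
    instances: list of (feature_dict, class_label) pairs."""
    base = gini(Counter(label for _, label in instances).values())
    parts = defaultdict(Counter)
    for features, label in instances:
        parts[features[attribute]][label] += 1      # class counts per attribute value
    n = len(instances)
    weighted = sum(sum(c.values()) / n * gini(c.values()) for c in parts.values())
    return base - weighted                          # largest reduction -> chosen split
```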
Overfitting
• An induced tree may overfit the training data
• Too many branches, some may reflect anomalies due to noise or
outliers
• Some splits or leaf nodes may be the result of decisions based on very few instances, resulting in poor accuracy on unseen instances
Overfitting and Tree Pruning
Two approaches to avoid overfitting
Prepruning: Halt tree construction early; do not split a node if this would result in the error rate going above a pre-specified threshold
Difficult to choose an appropriate threshold
Postpruning: Remove branches from a “fully grown” tree
Get a sequence of progressively pruned trees
Use test data different from the training data to measure error rates
Select the “best pruned tree”
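As one practical illustration (not part of these slides), scikit-learn's DecisionTreeClassifier supports postpruning via cost-complexity pruning; the sketch below grows a full tree, builds a sequence of progressively pruned trees, and keeps the one that scores best on held-out data, roughly in the spirit of the procedure above. The dataset is just a stand-in.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_held, y_train, y_held = train_test_split(X, y, random_state=0)

# Candidate pruning strengths derived from the fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

# Fit one progressively pruned tree per alpha and keep the one that does
# best on data not used for training.
best = max(
    (DecisionTreeClassifier(random_state=0, ccp_alpha=a).fit(X_train, y_train)
     for a in path.ccp_alphas),
    key=lambda tree: tree.score(X_held, y_held),
)
print(best.get_n_leaves(), best.score(X_held, y_held))
```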
Enhancements to Basic Decision Tree
Learning Approach
Allow for continuous-valued attributes
Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals (see the sketch after this list)
Handle missing attribute values
Assign the most common value of the attribute
Assign a probability to each of the possible values
Attribute construction
Create new attributes based on existing ones that are sparsely represented
This reduces fragmentation, repetition, and replication
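As a sketch of the first enhancement, a common approach (used in C4.5) is to sort the continuous attribute and evaluate candidate thresholds at the midpoints between consecutive distinct values, keeping the threshold with the highest information gain; the helper below and its example numbers are illustrative only.

```python
from collections import Counter
from math import log2

def entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Pick the binary split (<= t vs. > t) on a continuous attribute that
    gives the highest information gain. Candidate thresholds are midpoints
    between consecutive distinct sorted values."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_t = float("-inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                                  # same value: no class boundary here
        t = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [l for v, l in pairs if v <= t]
        right = [l for v, l in pairs if v > t]
        gain = base - len(left) / len(pairs) * entropy(left) \
                    - len(right) / len(pairs) * entropy(right)
        if gain > best_gain:
            best_gain, best_t = gain, t
    return best_gain, best_t

# e.g. humidity readings with play / no-play labels (made-up numbers)
print(best_threshold([65, 70, 75, 80, 85, 90],
                     ["yes", "yes", "yes", "no", "no", "no"]))  # -> (1.0, 77.5)
```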