Classification: Decision Tree (DT)
Adama Science and Technology University
School of Electrical Engineering and Computing
Department of CSE
Dr. Mesfin Abebe Haile (2020)
Outline
What is the decision tree (DT) algorithm
Why we need DT
The Pros and Cons of DT
Information Theory
Some Issues in DT
Assignment II
Decision Tree (DT)
Decision trees: splitting datasets one feature at a time.
The decision tree is one of the most commonly used classification techniques.
It has decision blocks (rectangles)
and terminating blocks (ovals).
The right and left arrows are called branches.
The kNN algorithm can do a great job of classification, but it does not give any
insight into the underlying structure of the data.
Decision Tree (DT)
Figure 1: A decision tree
Decision Tree (DT)
The best part of the DT (decision tree) algorithm is that humans can easily
understand the data:
The DT algorithm:
Takes a set of data (training examples),
Builds a decision tree (model), and draws it.
It can also be re-represented as a set of if-then rules to improve human readability.
The DT does a great job of distilling data into knowledge:
it takes a set of unfamiliar data and extracts a set of rules.
DT is often used in expert system development.
Decision Tree (DT)
The DT can be expressed using the following expression:
(Outlook = Sunny ∧ Humidity = Normal) → Yes
∨ (Outlook = Overcast) → Yes
∨ (Outlook = Rain ∧ Wind = Weak) → Yes
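Since the tree can be re-written as if-then rules, the same expression can be sketched directly as code. A minimal Python sketch; the function name and the implied "No" branches are assumptions for illustration, not part of the slide:

```python
def play_tennis(outlook, humidity, wind):
    """Walk the rules above; the 'No' branches are implied by the tree,
    not spelled out in the expression."""
    if outlook == "Sunny":
        return "Yes" if humidity == "Normal" else "No"
    if outlook == "Overcast":
        return "Yes"
    if outlook == "Rain":
        return "Yes" if wind == "Weak" else "No"
    return "No"  # unseen Outlook value

print(play_tennis("Sunny", "Normal", "Strong"))  # -> Yes
```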
Decision Tree (DT)
The pros and cons of DT:
Pros of DT:
Computationally cheap to use,
Easy for humans to understand the learned results,
Missing values OK (robust to errors),
Can deal with irrelevant features.
Cons of DT:
Prone to overfitting.
Works with: numeric values, nominal values.
Decision Tree (DT)
Appropriate problems for DT learning:
Instances are represented by attribute-value pairs (a fixed set of attributes and
their values),
The target function has discrete output values,
Disjunctive descriptions may be required,
The training data may contain errors,
The training data may contain missing attribute values.
Decision Tree (DT)
The mathematics used by the DT to split the dataset comes from
information theory:
The first decision you need to make is:
which feature should be used to split the data.
You need to try every feature and measure which split will give the
best result.
Then split the dataset into subsets.
The subsets will then traverse down the branches of the decision
node.
If the data on a branch all belongs to the same class, stop; otherwise repeat the splitting.
Decision Tree (DT)
Figure 2: Pseudo-code for the splitting function
Decision Tree (DT)
General approach to decision trees:
Collect: Any method.
Prepare: the ID3 algorithm works only on nominal values, so any
continuous values will need to be quantized.
Analyze: Any method. You should visually inspect the tree after it
is built.
Train: Construct a tree data structure. (DT)
Test: Calculate the error rate with the learned tree.
Use: This can be used in any supervised learning task; often it is used to
better understand the data.
Decision Tree (DT)
We would like to classify the following animals into two
classes:
Fish and not Fish
Table 1: Marine animal data
Decision Tree (DT)
We need to decide whether to split the data based on the
first feature or the second feature:
The goal is to better organize the unorganized data.
One way to do this is to measure the information.
Measure the information before and after the split.
Information theory is a branch of science that is concerned with
quantifying information.
The change in information before and after the split is known
as the information gain.
Decision Tree (DT)
The split with the highest information gain is the best option.
The measure of information of a set is known as the Shannon
entropy or entropy.
Decision Tree (DT)
To calculate entropy, you need the expected value of the
information of all possible values of our class.
This is given by:
H(X) = -∑ p(xᵢ) log₂ p(xᵢ), summed over i = 1, ..., n
where n is the number of classes and p(xᵢ) is the proportion of examples in class i.
Decision Tree (DT)
The higher the entropy, the more mixed up the data.
Another common measure of disorder in a set is the Gini
impurity:
the probability that an item chosen at random from the set would be
misclassified if it were labelled randomly according to the class distribution of the set.
Building the tree requires three pieces (a minimal sketch follows below):
Calculate the Shannon entropy of a dataset.
Split the dataset on a given feature.
Choose the best feature to split on.
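A minimal Python sketch of the three steps just listed, assuming each dataset row is a list of feature values with the class label in the last column; the function names and the toy rows are illustrative (the toy rows are in the spirit of Table 1, not the actual table):

```python
from math import log2
from collections import Counter

def calc_shannon_entropy(dataset):
    """Shannon entropy of the class labels (last column of each row)."""
    counts = Counter(row[-1] for row in dataset)
    total = len(dataset)
    return -sum((n / total) * log2(n / total) for n in counts.values())

def split_dataset(dataset, feature, value):
    """Rows whose `feature` equals `value`, with that feature column removed."""
    return [row[:feature] + row[feature + 1:]
            for row in dataset if row[feature] == value]

def choose_best_feature(dataset):
    """Index of the feature whose split yields the highest information gain."""
    base_entropy = calc_shannon_entropy(dataset)
    n_features = len(dataset[0]) - 1
    best_gain, best_feature = 0.0, -1
    for f in range(n_features):
        new_entropy = 0.0
        for value in {row[f] for row in dataset}:
            subset = split_dataset(dataset, f, value)
            new_entropy += (len(subset) / len(dataset)) * calc_shannon_entropy(subset)
        gain = base_entropy - new_entropy
        if gain > best_gain:
            best_gain, best_feature = gain, f
    return best_feature

# Toy rows: [can survive without surfacing?, has flippers?, label]
toy = [[1, 1, "yes"], [1, 1, "yes"], [1, 0, "no"], [0, 1, "no"], [0, 1, "no"]]
print(round(calc_shannon_entropy(toy), 3))  # ~0.971 before any split
print(choose_best_feature(toy))             # -> 0 (the first feature gives the best split)
```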
Decision Tree (DT)
Recursively building the tree (see the sketch below):
Start with the dataset and split it based on the best attribute.
The data will traverse down the branches of the tree to another
node.
This node will then split the data again (recursively).
Stop under the following conditions: we run out of attributes, or all the
instances in a branch belong to the same class.
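A rough sketch of this recursion, reusing `calc_shannon_entropy`, `split_dataset`, and `choose_best_feature` from the previous sketch (assumed names, not from the slides); the tree is stored as nested dictionaries:

```python
from collections import Counter

def create_tree(dataset, feature_names):
    """Recursively build a decision tree as nested dicts."""
    labels = [row[-1] for row in dataset]
    if labels.count(labels[0]) == len(labels):    # all rows share one class: leaf
        return labels[0]
    if len(dataset[0]) == 1:                      # no attributes left: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = choose_best_feature(dataset)           # helper from the earlier sketch
    if best == -1:                                # no informative split remains
        return Counter(labels).most_common(1)[0][0]
    name = feature_names[best]
    remaining = feature_names[:best] + feature_names[best + 1:]
    return {name: {value: create_tree(split_dataset(dataset, best, value), remaining)
                   for value in {row[best] for row in dataset}}}

# e.g. create_tree(toy, ["no surfacing", "flippers"]) on the toy rows above gives
# {'no surfacing': {0: 'no', 1: {'flippers': {0: 'no', 1: 'yes'}}}} (key order may vary)
```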
Decision Tree (DT)
Table 2: Example training sets
Decision Tree (DT)
Figure 3: Data path while splitting
Decision Tree (DT)
ID3 uses the information gain measure to select among the
candidate attributes.
Start with the dataset and split it based on the best attribute.
Given a collection S containing positive and negative examples
of some target concept,
the entropy of S relative to this Boolean classification is:
Entropy(S) = -p₊ log₂ p₊ - p₋ log₂ p₋
where p₊ is the proportion of positive examples in S and p₋ the proportion of negative examples.
Decision Tree (DT)
Example:
The target attribute is PlayTennis. (yes/no)
Table 3: Example training sets
Decision Tree (DT)
Suppose S is a collection of 14 examples of some Boolean
concept, including 9 positive and 5 negative examples.
Then the entropy of S relative to this Boolean classification is:
Entropy(S) = -(9/14) log₂(9/14) - (5/14) log₂(5/14) = 0.940
Decision Tree (DT)
Note that the entropy is 0 if all members of S belong to the
same class.
For example, if all the members are positive (p₊ = 1), then p₋ = 0, and:
Entropy(S) = -1·log₂(1) - 0·log₂(0) = 0 (defining 0·log₂(0) to be 0)
Note the entropy is one (1) when the collection contains an
equal number of positive and negative examples.
If the collection contains unequal numbers of positive and
negative examples, the entropy is between 0 and 1.
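These three cases can be checked numerically; a small Python check (the `entropy` helper is illustrative, not from the slides):

```python
from math import log2

def entropy(pos, neg):
    """Entropy of a collection with pos positive and neg negative examples;
    0 * log2(0) is treated as 0."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:
            p = count / total
            result -= p * log2(p)
    return result

print(entropy(14, 0))           # 0.0  -> all members in one class
print(entropy(7, 7))            # 1.0  -> equal numbers of positive and negative
print(round(entropy(9, 5), 3))  # 0.94 -> the 9+/5- collection above, between 0 and 1
```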
Decision Tree (DT)
Suppose S is a collection of training-example days described by
the attribute Wind (Weak, Strong).
The information gain is the measure used by ID3 to select the
best attribute at each step in growing the tree:
Gain(S, A) = Entropy(S) - ∑ (|Sᵥ| / |S|) Entropy(Sᵥ), summed over the values v of attribute A.
Decision Tree (DT)
Information gain of the two attributes, Humidity and Wind (a quick numeric check follows below).
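A small Python check of the two gains, assuming the standard 14-day PlayTennis counts (9+/5- overall; Humidity: High = 3+/4-, Normal = 6+/1-; Wind: Weak = 6+/2-, Strong = 3+/3-); these counts are an assumption taken from the classic example, not shown on the slide:

```python
from math import log2

def entropy(pos, neg):
    total = pos + neg
    return -sum((c / total) * log2(c / total) for c in (pos, neg) if c)

def gain(pos, neg, partitions):
    """Information gain of splitting a (pos, neg) collection into the
    given (pos, neg) partitions, one per attribute value."""
    total = pos + neg
    remainder = sum((p + n) / total * entropy(p, n) for p, n in partitions)
    return entropy(pos, neg) - remainder

print(gain(9, 5, [(3, 4), (6, 1)]))  # ~0.1518 -> Gain(S, Humidity), quoted as 0.151 below
print(gain(9, 5, [(6, 2), (3, 3)]))  # ~0.0481 -> Gain(S, Wind), quoted as 0.048 below
```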
Decision Tree (DT)
Example:
ID3 determines the information gain for each attribute.
(Outlook, Temperature, Humidity and Wind)
Then select the one with the highest information gain.
The information gain values for all four attributes are:
Gain (S, Outlook) = 0.246
Gain (S, Humidity) = 0.151
Gain (S, Wind) = 0.048
Gain (S, Temperature) = 0.029
Outlook provides greater information gain than the other attributes.
Decision Tree (DT)
Example:
According to the information gain measure, the Outlook attribute
is selected as the root node.
Branches are created below the root for each of its possible
values. (Sunny, Overcast, and Rain)
Decision Tree (DT)
The partially learned decision tree resulting from the first step of ID3
Decision Tree (DT)
The Overcast descendant has only positive examples and
therefore becomes a leaf node with classification Yes.
The other two nodes will be expanded by selecting the attribute
with the highest information gain relative to their new subsets of examples.
Decision Tree (DT)
Decision Tree learning can be:
Classification tree: the target variable takes a finite set of values.
Regression tree: the target variable takes continuous values.
There are many specific Decision Tree algorithms:
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Tree)
CHAID (Chi-squared Automatic Interaction Detector)
MARS (Multivariate Adaptive Regression Splines): extends DTs to handle numerical data better
Decision Tree (DT)
Different Decision Tree algorithms use different metrics for
measuring the “best attribute”:
Information gain: used by ID3, C4.5 and C5.0
Gini impurity: used by CART
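The two metrics behave similarly; a small Python comparison on a two-class distribution (illustrative only):

```python
from math import log2

def entropy(probs):
    """Shannon entropy of a class-probability distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

def gini(probs):
    """Gini impurity of the same distribution."""
    return 1.0 - sum(p * p for p in probs)

# Both measures are 0 for a pure node and largest for a 50/50 mix.
for p in (0.0, 0.25, 0.5, 0.75, 1.0):
    dist = (p, 1.0 - p)
    print(f"p={p:.2f}  entropy={entropy(dist):.3f}  gini={gini(dist):.3f}")
```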
Decision Tree (DT)
ID3 in terms of its search space and search strategy:
ID3’s hypothesis space of all decision trees is a complete space of
finite discrete-valued functions.
ID3 maintains only a single current hypothesis as it searches
through the space of decision trees.
ID3 in its pure form performs no backtracking in its search
(post-pruning the decision tree adds a form of backtracking).
ID3 uses all training examples at each step in the search to make
statistically based decisions regarding how to refine its current
hypothesis (making it much less sensitive to errors in individual examples).
Decision Tree (DT)
Inductive bias in Decision Tree learning (ID3) :
The inductive bias is the set of assumptions a learner uses to generalize beyond the training data.
ID3 selects in favor of shorter trees over longer ones (comparable to a
breadth-first preference for small trees).
Selects trees that place the attributes with the highest information
gain closest to the root.
Decision Tree (DT)
Issues in Decision Tree learning:
How deeply to grow the decision tree,
Handling continuous attributes,
Choosing an appropriate attribute selection measure,
Handling training data with missing attribute values,
Handling attributes with differing costs, and
Improving computational efficiency.
ID3 was extended to address most of these issues, resulting in C4.5.
Decision Tree (DT)
Avoiding overfitting the data:
Noisy data and too few training examples are problems.
Overfitting is a practical problem for decision trees and many other
learning algorithms.
Overfitting has been found to decrease the accuracy of the learned tree
by 10-25%.
Approaches to avoid overfitting:
Stop growing the tree before it overfits (direct, but less practical; see the sketch below).
Allow the tree to overfit, and then post-prune it (the most successful approach
in practice).
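A minimal sketch of the first approach (stop growing early), written as a depth-limited variant of the `create_tree` sketch from earlier; `choose_best_feature` and `split_dataset` are the assumed helpers from that sketch, and `max_depth` is just one illustrative stopping criterion (a validation-set test is another common one):

```python
from collections import Counter

def create_tree_limited(dataset, feature_names, max_depth=3):
    """Pre-pruning variant of the earlier create_tree sketch: stop splitting
    once max_depth is reached and return a majority-vote leaf instead."""
    labels = [row[-1] for row in dataset]
    if labels.count(labels[0]) == len(labels) or len(dataset[0]) == 1 or max_depth == 0:
        return Counter(labels).most_common(1)[0][0]
    best = choose_best_feature(dataset)          # helper from the earlier sketch
    if best == -1:
        return Counter(labels).most_common(1)[0][0]
    name = feature_names[best]
    remaining = feature_names[:best] + feature_names[best + 1:]
    return {name: {value: create_tree_limited(split_dataset(dataset, best, value),
                                              remaining, max_depth - 1)
                   for value in {row[best] for row in dataset}}}
```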
Decision Tree (DT)
Incorporating continuous-valued attributes:
In the initial definition of ID3, attributes and the target value must
take values from a discrete set.
The attributes tested in the decision nodes of the tree must be
discrete-valued:
Create a new Boolean attribute by thresholding the continuous value (see the sketch below), or
Split the range into multiple intervals rather than just two.
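A minimal sketch of the first option (a Boolean test "value ≤ c"): scan candidate thresholds at the midpoints between adjacent sorted values and keep the one with the highest information gain. The Temperature numbers below follow the small example commonly used in Mitchell's discussion; all names are illustrative:

```python
from math import log2
from collections import Counter

def entropy(labels):
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def best_threshold(values, labels):
    """Best cut point c for a Boolean test `value <= c` on a continuous attribute."""
    pairs = sorted(zip(values, labels))
    base = entropy(labels)
    best_gain, best_c = 0.0, None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue
        c = (pairs[i - 1][0] + pairs[i][0]) / 2      # candidate midpoint
        left = [l for v, l in pairs if v <= c]
        right = [l for v, l in pairs if v > c]
        remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if base - remainder > best_gain:
            best_gain, best_c = base - remainder, c
    return best_c, best_gain

# Temperature 40 48 60 72 80 90 with PlayTennis No No Yes Yes Yes No
print(best_threshold([40, 48, 60, 72, 80, 90],
                     ["No", "No", "Yes", "Yes", "Yes", "No"]))  # -> (54.0, ~0.459)
```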
Decision Tree (DT)
Alternative measure for selecting attributes:
Information gain favors attributes with many values.
One alternative measure that has been used successfully is the
gain ratio (sketched below).
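A minimal sketch of the gain ratio as used in C4.5: the information gain divided by the "split information" (the entropy of the attribute's own value distribution). The per-example identifier attribute below shows how a large split information penalises a many-valued attribute even when its raw gain is maximal; all names are illustrative:

```python
from math import log2
from collections import Counter

def entropy(items):
    total = len(items)
    return -sum((n / total) * log2(n / total) for n in Counter(items).values())

def gain_ratio(feature_values, labels):
    """Information gain of splitting on the feature, divided by the
    split information (entropy of the feature's value distribution)."""
    total = len(labels)
    remainder = 0.0
    for value in set(feature_values):
        subset = [l for v, l in zip(feature_values, labels) if v == value]
        remainder += len(subset) / total * entropy(subset)
    gain = entropy(labels) - remainder
    split_info = entropy(feature_values)
    return gain / split_info if split_info else 0.0

labels = ["Yes"] * 9 + ["No"] * 5
unique_ids = list(range(14))   # an attribute with a distinct value for every example
print(round(gain_ratio(unique_ids, labels), 3))  # ~0.247: gain 0.94 divided by log2(14)
```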
Decision Tree (DT)
Handling training examples with missing attribute values:
Assign the most common value among the training examples at
node n, or
Assign a probability to each of the possible values of the attribute (distributing the example fractionally).
The second approach is used in C4.5; a sketch of the first appears below.
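A tiny sketch of the first strategy (the row layout, function name, and the "?" marker are assumptions for illustration):

```python
from collections import Counter

def fill_missing(rows, feature, missing="?"):
    """Strategy 1 above: replace a missing entry in one feature column with
    the most common observed value among the examples at this node.
    (C4.5's strategy 2 instead splits the example into fractional pieces,
    weighted by the observed frequency of each value.)"""
    observed = [row[feature] for row in rows if row[feature] != missing]
    most_common = Counter(observed).most_common(1)[0][0]
    return [row[:feature] + [most_common] + row[feature + 1:]
            if row[feature] == missing else row
            for row in rows]

rows = [["Sunny", "High", "No"], ["Rain", "?", "Yes"], ["Rain", "High", "Yes"]]
print(fill_missing(rows, 1))  # the '?' becomes 'High', the most common observed value
```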
Decision Tree (DT)
Handling attributes with different costs:
Prefer low-cost attributes to high-cost attributes where possible.
ID3 can be modified to take costs into account by introducing a cost
term into the attribute selection measure:
Divide the gain by the cost of the attribute (a tiny sketch follows below).
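A tiny sketch of dividing gain by cost; the attribute names, gains, and costs are made up for illustration (other cost-weighted criteria, such as Gain²/Cost, also appear in the literature):

```python
def choose_cheapest_informative(gains, costs):
    """Pick the attribute with the highest gain-to-cost ratio."""
    return max(gains, key=lambda attr: gains[attr] / costs[attr])

# Hypothetical gains and measurement costs, for illustration only.
gains = {"Temperature": 0.029, "Pulse": 0.15, "BloodTest": 0.25}
costs = {"Temperature": 1.0,   "Pulse": 1.5,  "BloodTest": 20.0}
print(choose_cheapest_informative(gains, costs))  # -> Pulse (0.10 gain per unit cost)
```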
Question & Answer
Thank You !!!
Assignment II
Answer the given questions by considering the following set of training examples.
Assignment II
(a) What is the entropy of this collection of training examples with respect to the target function classification?
(b) What is the information gain of a2 relative to these training examples?
Decision Tree (DT)
Do some research on the following Decision Tree algorithms:
ID3 (Iterative Dichotomiser 3)
C4.5 (successor of ID3)
CART (Classification And Regression Tree)
CHAID (Chi-squared Automatic Interaction Detector)
MARS (Multivariate Adaptive Regression Splines): extends DTs to handle numerical data better