Section 04 - Classification - 002 - Decision Tree

The document provides an overview of Decision Trees as a supervised learning method used for classification tasks, detailing their structure, terms, and variations such as Boosted Decision Trees and Decision Forests. It explains the concepts of ensemble learning, including bagging and boosting, and discusses their applications in predicting outcomes based on various datasets. Additionally, it highlights the importance of parameters like tree depth and the number of splits in optimizing model performance.


© Jitesh Khurkhuriya – Azure ML Online Course

Classification

What is a Decision Tree?

• Supervised learning method

• Decision support tool that uses a tree-like graph or model of decisions and their possible consequences

• Comes in several variations, such as Boosted Decision Tree, Decision Forest and Decision Jungle

• Can be used for categorical as well as continuous variables


Loan ID  Income Level  Credit Score  Employment     Approved?
L1       Medium        Low           Self-Employed  No
L2       High          Low           Self-Employed  Yes
L3       High          High          Salaried       Yes
L4       Medium        Low           Salaried       Yes
L5       Low           High          Salaried       No
L6       Low           Low           Self-Employed  No
L7       High          Low           Salaried       Yes
L8       Medium        Low           Self-Employed  No
L9       High          High          Self-Employed  Yes
L10      Medium        High          Self-Employed  Yes
L11      High          Low           Salaried       Yes
L12      Medium        High          Salaried       Yes
L13      Medium        High          Self-Employed  Yes
L14      Low           Low           Self-Employed  No
L15      Low           High          Self-Employed  No

Split on Income Level

Columns: LID = Loan ID, IL = Income Level, CS = Credit Score, ET = Employment Type (SE = Self-Employed), Status = Approved?

Income Level = High: Pure Subset
LID  IL      CS    ET        Status
L2   High    Low   SE        Yes
L3   High    High  Salaried  Yes
L7   High    Low   Salaried  Yes
L9   High    High  SE        Yes
L11  High    Low   Salaried  Yes

Income Level = Low: Pure Subset
LID  IL      CS    ET        Status
L5   Low     High  Salaried  No
L6   Low     Low   SE        No
L14  Low     Low   SE        No
L15  Low     High  SE        No

Income Level = Medium: Split Further
LID  IL      CS    ET        Status
L1   Medium  Low   SE        No
L4   Medium  Low   Salaried  Yes
L8   Medium  Low   SE        No
L10  Medium  High  SE        Yes
L12  Medium  High  Salaried  Yes
L13  Medium  High  SE        Yes

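The split on Income Level can be checked with a short sketch. It groups the 15 loans from the table by one column and reports whether each group is a pure subset (all rows share the same answer). The helper names `split_by`, `is_pure` and `describe_split` are made up for this illustration and are not part of the course material.

```python
from collections import defaultdict

# Loan data transcribed from the table:
# (loan_id, income_level, credit_score, employment, approved)
LOANS = [
    ("L1", "Medium", "Low", "Self-Employed", "No"),
    ("L2", "High", "Low", "Self-Employed", "Yes"),
    ("L3", "High", "High", "Salaried", "Yes"),
    ("L4", "Medium", "Low", "Salaried", "Yes"),
    ("L5", "Low", "High", "Salaried", "No"),
    ("L6", "Low", "Low", "Self-Employed", "No"),
    ("L7", "High", "Low", "Salaried", "Yes"),
    ("L8", "Medium", "Low", "Self-Employed", "No"),
    ("L9", "High", "High", "Self-Employed", "Yes"),
    ("L10", "Medium", "High", "Self-Employed", "Yes"),
    ("L11", "High", "Low", "Salaried", "Yes"),
    ("L12", "Medium", "High", "Salaried", "Yes"),
    ("L13", "Medium", "High", "Self-Employed", "Yes"),
    ("L14", "Low", "Low", "Self-Employed", "No"),
    ("L15", "Low", "High", "Self-Employed", "No"),
]

# Column positions inside each tuple.
INCOME, CREDIT, EMPLOYMENT, LABEL = 1, 2, 3, 4


def split_by(rows, column):
    """Group rows by the value they hold in the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row)
    return dict(groups)


def is_pure(rows):
    """A subset is pure when every row has the same label."""
    return len({row[LABEL] for row in rows}) == 1


def describe_split(rows, column):
    """Map each value of the column to 'Pure Subset' or 'Split Further'."""
    return {
        value: "Pure Subset" if is_pure(subset) else "Split Further"
        for value, subset in split_by(rows, column).items()
    }


print(describe_split(LOANS, INCOME))
```

Only the Medium group mixes Yes and No, which is why it is the only branch that needs another split.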


Split the Medium subset on Credit Score

Income Level = Medium, Credit Score = Low: Split Further
LID  IL      CS    ET        Status
L1   Medium  Low   SE        No
L4   Medium  Low   Salaried  Yes
L8   Medium  Low   SE        No

Income Level = Medium, Credit Score = High: Pure Subset
LID  IL      CS    ET        Status
L10  Medium  High  SE        Yes
L12  Medium  High  Salaried  Yes
L13  Medium  High  SE        Yes



Split the Medium / Low Credit Score subset on Employment Type

Income Level = Medium, Credit Score = Low, Employment Type = SE: Pure Subset
LID  IL      CS    ET        Status
L1   Medium  Low   SE        No
L8   Medium  Low   SE        No

Income Level = Medium, Credit Score = Low, Employment Type = Salaried: Pure Subset
LID  IL      CS    ET        Status
L4   Medium  Low   Salaried  Yes

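The three splits can be reproduced with a small recursive sketch. It is a simplification: the attribute order (Income Level, then Credit Score, then Employment) is fixed to match the slides, whereas a real decision tree learner chooses each split with an impurity measure. The functions `build_tree` and `predict` are hypothetical helpers written for this example.

```python
from collections import Counter

COLUMNS = ("Loan ID", "Income Level", "Credit Score", "Employment", "Approved")
RAW = [
    ("L1", "Medium", "Low", "Self-Employed", "No"),
    ("L2", "High", "Low", "Self-Employed", "Yes"),
    ("L3", "High", "High", "Salaried", "Yes"),
    ("L4", "Medium", "Low", "Salaried", "Yes"),
    ("L5", "Low", "High", "Salaried", "No"),
    ("L6", "Low", "Low", "Self-Employed", "No"),
    ("L7", "High", "Low", "Salaried", "Yes"),
    ("L8", "Medium", "Low", "Self-Employed", "No"),
    ("L9", "High", "High", "Self-Employed", "Yes"),
    ("L10", "Medium", "High", "Self-Employed", "Yes"),
    ("L11", "High", "Low", "Salaried", "Yes"),
    ("L12", "Medium", "High", "Salaried", "Yes"),
    ("L13", "Medium", "High", "Self-Employed", "Yes"),
    ("L14", "Low", "Low", "Self-Employed", "No"),
    ("L15", "Low", "High", "Self-Employed", "No"),
]
ROWS = [dict(zip(COLUMNS, values)) for values in RAW]


def build_tree(rows, attributes):
    """Split on the attributes in the given order until each subset is pure.

    A leaf is a plain label string; a splitting node is a tuple
    (attribute_name, {attribute_value: subtree}).
    """
    labels = Counter(row["Approved"] for row in rows)
    if len(labels) == 1 or not attributes:
        # Pure subset (or nothing left to split on): return the majority label.
        return labels.most_common(1)[0][0]
    attribute, remaining = attributes[0], attributes[1:]
    subsets = {}
    for row in rows:
        subsets.setdefault(row[attribute], []).append(row)
    return (attribute, {value: build_tree(subset, remaining)
                        for value, subset in subsets.items()})


def predict(tree, row):
    """Walk from the root to a leaf, following the row's attribute values."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[row[attribute]]
    return tree


TREE = build_tree(ROWS, ["Income Level", "Credit Score", "Employment"])
correct = sum(predict(TREE, row) == row["Approved"] for row in ROWS)
print(TREE)
print(f"{correct} of {len(ROWS)} training rows classified correctly")
```

Because every leaf is a pure subset, the tree reproduces the Approved? column for all training rows. That says nothing about how it would perform on unseen loans.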
Decision Tree Terms

Root Node
    Decision Node
        Terminal Node (Leaf)
        Decision Node
            Terminal Node (Leaf)
            Terminal Node (Leaf)
    Decision Node
        Terminal Node (Leaf)
        Terminal Node (Leaf)

Branch or Subtree: any node below the root together with everything beneath it.



Definitions
• Root Node: Represents the entire population or sample, which is then divided into two or more homogeneous sets.

• Splitting: The process of dividing a node into two or more sub-nodes.

• Decision Node: A sub-node that splits into further sub-nodes.

• Leaf / Terminal Node: A node that does not split any further.

• Branch / Sub-Tree: A subsection of the entire tree.

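These terms map directly onto a small data structure. The sketch below is only an illustration: the `Node` class and the two counting functions are hypothetical names, and the tree built at the end is the loan approval tree from the earlier slides.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """One node of a decision tree (illustrative helper, not from the course)."""
    label: str
    children: List["Node"] = field(default_factory=list)

    def is_leaf(self) -> bool:
        # Leaf / Terminal Node: a node that does not split any further.
        return not self.children


def count_leaves(node: Node) -> int:
    """Count the terminal nodes in the branch that starts at this node."""
    if node.is_leaf():
        return 1
    return sum(count_leaves(child) for child in node.children)


def count_decision_nodes(node: Node, is_root: bool = True) -> int:
    """Count decision nodes: sub-nodes (not the root) that split further."""
    own = 0 if (is_root or node.is_leaf()) else 1
    return own + sum(count_decision_nodes(child, is_root=False)
                     for child in node.children)


# Loan approval tree: Income Level is the root node.
employment = Node("Employment Type", [Node("SE: No"), Node("Salaried: Yes")])
credit = Node("Credit Score", [employment, Node("High: Yes")])
root = Node("Income Level", [Node("High: Yes"), Node("Low: No"), credit])

print("leaves:", count_leaves(root))
print("decision nodes:", count_decision_nodes(root))
```

Calling `count_leaves(credit)` instead of `count_leaves(root)` counts the leaves of a single branch, which is the Branch / Sub-Tree idea from the definitions.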


Ensemble Learning

Everyday Ensemble Learning

Decision?

• Is this price fair?
• Appreciation of price?
• Construction Quality?
• Neighbourhood?
• Location appropriate?

Who can answer these questions?

• Broker or real estate portal to check fair price and price appreciation
• Friend or colleague who stays nearby or has stayed in the neighbourhood
• Inspection by an architect for quality checks and structural defects

The individual answers are combined into the final decision by Majority or Weighted Average.


Ensemble Learning
• All algorithms have errors

• Collective wisdom is higher than individual intelligence

• Generate a group of base learners; their combined result gives higher accuracy

• Different base learners can use different:
    • Parameters
    • Sequence
    • Training sets, etc.

• Two major Ensemble Learning methods:
    • Bagging
    • Boosting

Bagging
• Various models are built in parallel

• All models vote to give the final prediction

Example: the individual models predict Y, Y, Y and N. The vote is 3 Y to 1 N, so the final prediction is Y.

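The voting step can be sketched in a few lines. This is a minimal illustration of majority voting only; it assumes the four model outputs are already available, and `majority_vote` is a made-up helper name.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the most common label and the full tally of votes.

    On a tie, Counter.most_common keeps the label that appeared first.
    """
    tally = Counter(predictions)
    winner, _ = tally.most_common(1)[0]
    return winner, dict(tally)


# Outputs of four models built in parallel, as in the slide.
winner, tally = majority_vote(["Y", "Y", "Y", "N"])
print(winner, tally)
```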
Boosting

• Train the decision trees in a sequence

• Each tree learns from the previous tree by focusing on the observations it predicted incorrectly

• Build the new model with a higher weight for the observations that the previous model in the sequence got wrong

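One classic way to give incorrect observations a higher weight is the AdaBoost weight update, sketched below for a single round. This is an assumption chosen for illustration; it is not necessarily the procedure used by the Azure ML boosted decision tree module, and `reweight` is a hypothetical helper.

```python
import math


def reweight(weights, is_wrong):
    """One AdaBoost-style round: raise the weight of misclassified observations.

    weights  -- current observation weights, summing to 1
    is_wrong -- True where the previous tree predicted the wrong label
    """
    error = sum(w for w, wrong in zip(weights, is_wrong) if wrong)
    if not 0.0 < error < 1.0:
        raise ValueError("weighted error must be strictly between 0 and 1")
    # The lower the error, the more say the tree gets and the stronger the update.
    alpha = 0.5 * math.log((1.0 - error) / error)
    raw = [w * math.exp(alpha if wrong else -alpha)
           for w, wrong in zip(weights, is_wrong)]
    total = sum(raw)
    return [w / total for w in raw]  # normalise so the weights sum to 1 again


# Five observations with equal weight; the previous tree got the third one wrong.
weights = [0.2] * 5
is_wrong = [False, False, True, False, False]
new_weights = reweight(weights, is_wrong)
print(new_weights)
```

After the update the misclassified observation carries half of the total weight, so the next tree in the sequence is pushed to get it right.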
Boosted Model

Two Class Boosted Decision Tree


Bank Telemarketing
• Goal is to predict whether the client will subscribe to a product or not

• Number of instances: 45,211

• Columns:

1. Age
2. Job Type
3. Marital Status
4. Education Level
5. Credit Default?
6. Housing Loan?
7. Personal Loan
8. Contacted Type
9. Contacted Month
10. Last Contacted day
11. Contact Duration
12. Campaign Type
13. P-Days
14. Previous
15. P-Outcome
16. Emp-Var-Rate
17. Consumer Price Index
18. Consumer Confidence Index
19. Euribor 3 Month Rate
20. Number of employees
21. Subscribed?

https://archive.ics.uci.edu/ml/datasets/bank+marketing
Source: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014

What is a Two-Class Boosted Decision Tree?
• Machine learning model based on the boosted decision trees algorithm

• Based on the ensemble learning method

• Among the easiest methods with which to get top performance

• One of the more memory-intensive learners


How should the model be trained?
• Single Parameter
• Parameter Range


Leaves/Terminal Nodes
• A higher number of leaves increases the size of the tree
• It can give better precision
• It carries a risk of overfitting and longer training time


Number of cases required
• Increases or decreases the threshold for creating a new node


Learning Rate
• Number between 0 and 1
• Defines the step size of learning
• Determines how fast or slow the learner reaches the optimal solution
• The smaller the rate, the longer it takes to reach the solution, but the accuracy can be higher
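The effect of the step size can be seen in a toy sketch. It assumes an idealised boosting loop in which every round adds `learning_rate` times the current residual to the prediction, which is a strong simplification of how real trees are fitted. The function `rounds_to_fit` and the target value are made up for this example, and it illustrates only the "smaller rate takes longer" point, not the accuracy claim.

```python
def rounds_to_fit(target, learning_rate, tolerance=1e-3, max_rounds=10_000):
    """Count the rounds needed until the prediction is within tolerance of target.

    Each round moves the prediction towards the target by
    learning_rate * residual, mimicking a shrunken boosting step.
    """
    prediction = 0.0
    for round_number in range(1, max_rounds + 1):
        residual = target - prediction
        prediction += learning_rate * residual
        if abs(target - prediction) < tolerance:
            return round_number
    return max_rounds


for rate in (0.9, 0.5, 0.1):
    print(f"learning rate {rate}: {rounds_to_fit(3.0, rate)} rounds")
```

On real data a small learning rate is usually paired with a larger number of trees, which is where the longer training time comes from.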


Number of Trees
• Any integer value
• The higher the number, the better the accuracy, at the cost of more training time


Random Number Seed
• Non-negative number used for reproducing the results
• Default is zero


Decision Forest

Bagging

Original dataset, split into Train and Test

Model    Model 1  Model 2  Model 3  Model 4
Sample   D1       D2       D3       D4
Output   Y1       Y2       Y3       Y4
Weight   w1       w2       w3       w4

Final prediction Y = Majority / Weighted Average of the outputs of Model 1 to Model 4

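The weighted variant of the vote can be sketched as follows. The weights are invented for illustration; the slides do not say how w1 to w4 are obtained, and `weighted_vote` is a hypothetical helper.

```python
def weighted_vote(predictions, weights):
    """Sum the weight behind each label and return the heaviest label."""
    if len(predictions) != len(weights):
        raise ValueError("need exactly one weight per prediction")
    totals = {}
    for label, weight in zip(predictions, weights):
        totals[label] = totals.get(label, 0.0) + weight
    return max(totals, key=totals.get), totals


outputs = ["Y", "Y", "Y", "N"]  # Y1..Y4 from Model 1..Model 4

# Equal weights behave like a plain majority vote.
print(weighted_vote(outputs, [0.25, 0.25, 0.25, 0.25])[0])

# A heavily trusted fourth model can outvote the other three.
print(weighted_vote(outputs, [0.1, 0.1, 0.1, 0.6])[0])
```

The second call shows why the choice between majority and weighted average matters: the same four outputs lead to a different final prediction.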
Two Class Decision Forest



Adult Census Data
• Problem statement: Predict whether income exceeds $50K/yr based on census data.

1. Age
2. Workclass
3. Fnlwgt
4. Education
5. Education-Num
6. Marital Status
7. Occupation
8. Relationship
9. Race
10. Sex
11. Capital Gains
12. Capital Losses
13. Hours per week
14. Native Country
15. Income

Bagging

• Each individual tree is grown on a new sample
• The sample is drawn at random from the dataset, with replacement
• The outputs of the trees are combined by voting


Replicate

• Each tree is trained on the same dataset

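The difference between the two resampling options can be sketched with the standard library. The function names and the toy list of loan IDs are assumptions for this example; the fixed seed is there so that the run is reproducible.

```python
import random


def bagging_sample(rows, rng):
    """Draw len(rows) rows at random WITH replacement (a bootstrap sample)."""
    return [rng.choice(rows) for _ in rows]


def replicate_sample(rows):
    """Replicate: every tree sees exactly the same rows."""
    return list(rows)


rows = [f"L{i}" for i in range(1, 16)]  # loan IDs L1..L15
rng = random.Random(0)                   # fixed seed for reproducible samples

for tree_number in range(1, 4):
    sample = bagging_sample(rows, rng)
    print(f"tree {tree_number}: {len(set(sample))} distinct rows out of {len(sample)}")

print("replicate:", replicate_sample(rows) == rows)
```

Because the draw is with replacement, a bagging sample usually repeats some rows and leaves others out, so each tree sees a slightly different dataset.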


Maximum Depth
• A number that limits the maximum depth of the decision trees.
• Increasing the depth of the tree might increase precision, at the risk of some overfitting and increased training time.


Number of Random Splits per Node
• The number of splits to use when building each node of the tree.
• A split means that features in each level of the tree (node) are randomly divided.



Thank You…!
