Lecture #15: Regression Trees & Random Forests
Data Science 1
CS 109A, STAT 121A, AC 209A, E-109A
Pavlos Protopapas, Kevin Rader, Rahul Dave, Margo Levine
Lecture Outline
Review
Decision Trees for Regression
Bagging
Random Forests
Review
Decision Trees
A decision tree model is an interpretable model in which the
final output is based on a series of comparisons of the values
of predictors against threshold values.
Graphically, decision trees can be represented by a flow chart.
Geometrically, the model partitions the feature space into regions, and each region is assigned a response variable value based on the training points contained in the region.
Learning Algorithm
To learn a decision tree model, we take a greedy
approach:
1. Start with an empty decision tree (undivided
feature space)
2. Choose the ‘optimal’ predictor on which to split and
choose the ‘optimal’ threshold value for splitting by
applying a splitting criterion
3. Recurse on each new node until a stopping condition is met
For classification, we label each region in the model with the class to which the plurality of the training points within the region belongs.
Decision Trees for Regression
Adaptations for Regression
With just two modifications, we can use a decision tree model for
regression:
▶ The three splitting criteria we’ve examined each promoted splits that were pure - new regions increasingly specialized in a single class.
For classification, purity of the regions is a good indicator of the performance of the model.
For regression, we want to select a splitting criterion that promotes splits that improve the predictive accuracy of the model as measured by, say, the MSE.
▶ For regression with output in R, we want to label each region
in the model with a real number - typically the average of the
output values of the training points contained in the region.
Learning Regression Trees
The learning algorithm for decision trees in regression tasks is:
1. Start with an empty decision tree (undivided feature space)
2. Choose a predictor $j$ on which to split and a threshold value $t_j$ for splitting such that the weighted average MSE of the new regions is as small as possible (see the sketch after this list):
$$\underset{j,\,t_j}{\operatorname{argmin}} \left\{ \frac{N_1}{N}\,\operatorname{MSE}(R_1) + \frac{N_2}{N}\,\operatorname{MSE}(R_2) \right\}$$
or equivalently,
$$\underset{j,\,t_j}{\operatorname{argmin}} \left\{ \frac{N_1}{N}\,\operatorname{Var}[\,y \mid x \in R_1\,] + \frac{N_2}{N}\,\operatorname{Var}[\,y \mid x \in R_2\,] \right\}$$
where $N_i$ is the number of training points in $R_i$ and $N$ is the number of points in $R$.
3. Recurse on each new node until a stopping condition is met
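A minimal Python sketch of step 2 (not the lecture's reference code; `region_mse` and `best_split` are assumed helper names), scanning every predictor and every observed threshold and keeping the split with the smallest weighted average MSE:

```python
import numpy as np

def region_mse(y):
    # MSE of a region when we predict its mean; this equals Var[y] within the region
    return np.mean((y - y.mean()) ** 2) if len(y) else 0.0

def best_split(X, y):
    # Greedy search over predictors j and thresholds t_j for a single split
    N = len(y)
    best_j, best_t, best_score = None, None, np.inf
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j]):
            left, right = y[X[:, j] <= t], y[X[:, j] > t]
            if len(left) == 0 or len(right) == 0:
                continue
            # weighted average MSE of the two candidate regions
            score = len(left) / N * region_mse(left) + len(right) / N * region_mse(right)
            if score < best_score:
                best_j, best_t, best_score = j, t, score
    return best_j, best_t
```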
Regression Trees Prediction
For any data point $x_i$:
1. Traverse the tree until we reach a leaf node.
2. The average of the response values $y$ of the training points in that leaf is the prediction $\hat{y}_i$ (as sketched below).
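Scikit-learn's DecisionTreeRegressor behaves this way at prediction time: each $x_i$ is routed to a leaf and $\hat{y}_i$ is the mean of the training responses in that leaf. A small illustrative example (the toy data set is an assumption, not from the lecture):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))            # toy 1-D predictors
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 200)    # noisy non-linear response

tree = DecisionTreeRegressor(max_depth=4).fit(X, y)
y_hat = tree.predict(X)   # each prediction is a leaf's training-set mean
```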
Stopping Conditions
Most of the stopping conditions we saw last time, like maximum depth or minimum number of points in a region, can still be applied.
In place of purity gain, we can instead compute the accuracy gain for splitting a region $R$:
$$\operatorname{Gain}(R) = \Delta(R) = \operatorname{MSE}(R) - \frac{N_1}{N}\,\operatorname{MSE}(R_1) - \frac{N_2}{N}\,\operatorname{MSE}(R_2)$$
and stop growing the tree when the gain is less than some pre-defined threshold (a minimal sketch follows).
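A hedged sketch of this stopping rule (the helper names and the `min_gain` threshold are assumptions for illustration, not the lecture's code):

```python
import numpy as np

def mse(y):
    return np.mean((y - y.mean()) ** 2)

def gain(y_parent, y_left, y_right):
    # drop in weighted MSE achieved by splitting the parent region
    n = len(y_parent)
    return mse(y_parent) - len(y_left) / n * mse(y_left) - len(y_right) / n * mse(y_right)

# inside the tree-growing recursion:
# if gain(y_R, y_R1, y_R2) < min_gain:   # min_gain is a pre-defined threshold
#     stop splitting region R
```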
Expressiveness of Decision Trees
We’ve seen that classification trees approximate boundaries
in the feature space that separate classes.
Regression trees, on the other hand, define step functions: functions that are defined on a partition of the feature space and are constant over each part.
Expressiveness of Decision Trees
For a fine enough partition of the feature space, these
functions can approximate complex non-linear
functions.
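As a quick illustration of this point (toy data and depth values are assumptions for illustration only): deeper trees give finer partitions, and the training MSE on a noisy sine curve shrinks accordingly.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, 500)

for depth in (1, 2, 4, 8):
    model = DecisionTreeRegressor(max_depth=depth).fit(X, y)
    print(depth, mean_squared_error(y, model.predict(X)))   # finer partition, lower training MSE
```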
Bagging
Limitations of Decision Tree Models
Decision tree models are highly interpretable and fast to train, using our greedy learning algorithm.
However, in order to capture a complex decision
boundary (or to approximate a complex function), we
need to use a large tree (since each time we can only
make axis aligned splits).
We’ve seen that large trees have high variance and are
prone to overfitting.
For these reasons, in practice, decision tree models often underperform when compared with other classification or regression methods.
Bagging
One way to adjust for the high variance of the output of an
experiment is to perform the experiment multiple times and
then average the results.
The same idea can be applied to high variance models:
1. (Bootstrap) we generate multiple samples of training
data, via bootstrapping. We train a full decision tree on
each sample of data.
2. (Aggregate) for a given input, we output the averaged
outputs of all the models for that input.
For classification, we return the class that is outputted
by the plurality of the models.
This method is called Bagging (Breiman, 1996), short for, of
course, Bootstrap Aggregating.
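A minimal from-scratch sketch of the two steps above for regression (assumed helper names; full, unpruned trees as described):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def bagging_fit(X, y, B=100, seed=0):
    # (Bootstrap) train one full tree per bootstrap sample of the training data
    rng = np.random.default_rng(seed)
    models = []
    for _ in range(B):
        idx = rng.integers(0, len(y), size=len(y))   # sample with replacement
        models.append(DecisionTreeRegressor().fit(X[idx], y[idx]))
    return models

def bagging_predict(models, X_new):
    # (Aggregate) average the predictions of all trees
    return np.mean([m.predict(X_new) for m in models], axis=0)
```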
Bagging
Note that bagging enjoys the benefits of
1. High expressiveness - by using full trees each
model is able to approximate complex functions
and decision boundaries.
2. Low variance - averaging the prediction of all the
models reduces the variance in the final prediction,
assuming that we choose a sufficiently large
number of trees.
Bagging
However, the major drawback of bagging (and other
ensemble methods that we will study) is that the
averaged model is no longer easily interpretable - i.e.
one can no longer trace the ‘logic’ of an output through
a series of decisions based on predictor values!
Variable Importance for Bagging
Bagging improves prediction accuracy at the expense of
interpretability.
To measure variable importance, calculate the total amount that the RSS (for regression) or Gini index (for classification) is decreased due to splits over a given predictor, averaged over all $B$ trees.
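One hedged way to compute this with scikit-learn, reusing the `models` list from the bagging sketch above: each fitted tree exposes impurity-based importances (its total reduction in squared error attributable to each predictor), which we average over the $B$ trees.

```python
import numpy as np

# average impurity-based importance of each predictor over the bagged trees
importances = np.mean([m.feature_importances_ for m in models], axis=0)
```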
Out-of-Bag Error
Bagging is an example of an ensemble method, a method of building
a single model by training and aggregating multiple models.
With ensemble methods, we get a new metric for assessing the
predictive performance of the model, the out-of-bag error.
Given a training set and an ensemble of models, each trained on a bootstrap sample, we compute the out-of-bag error of the averaged model as follows:
1. for each point in the training set, we average the predicted
output for this point over the models whose bootstrap
training set excludes this point.
We compute the error or squared error of this averaged
prediction. Call this the point-wise out-of-bag error.
2. we average the point-wise out-of-bag error over the full
training set.
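A hedged sketch of this procedure for the regression case (the function and variable names are assumptions; squared error is used as the point-wise error):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def oob_error(X, y, B=100, seed=0):
    rng = np.random.default_rng(seed)
    n = len(y)
    pred_sums, counts = np.zeros(n), np.zeros(n)
    for _ in range(B):
        idx = rng.integers(0, n, size=n)           # bootstrap sample
        oob = np.setdiff1d(np.arange(n), idx)      # points this tree never saw
        tree = DecisionTreeRegressor().fit(X[idx], y[idx])
        pred_sums[oob] += tree.predict(X[oob])
        counts[oob] += 1
    seen = counts > 0                              # points that were out-of-bag at least once
    oob_preds = pred_sums[seen] / counts[seen]     # average over the models excluding each point
    return np.mean((y[seen] - oob_preds) ** 2)     # average point-wise OOB error
```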
Bagging, correlated data set
[show example]
Random Forests
Improving on Bagging
In practice, the trees in the ensembles produced by bagging tend to be highly correlated.
Suppose we have an extremely strong predictor, $x_j$, in the training set amongst moderately strong predictors. Then the greedy learning algorithm ensures that most of the models in the ensemble will choose to split on $x_j$ in early iterations.
That is, each tree in the ensemble is identically
distributed, with the expected output of the averaged
model the same as the expected output of any one of
the trees.
Improving on Bagging
Recall that for $B$ identically distributed, but not independent, random variables with pairwise correlation $\rho$ and variance $\sigma^2$, the variance of their mean is
$$\rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2.$$
As we increase B, the second term vanishes but the
first term remains.
Consequently, variance reduction in bagging is limited
by the fact that we are averaging over highly correlated
trees.
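For completeness, a short derivation of this formula (not on the slide), writing $T_1, \dots, T_B$ for the identically distributed tree outputs:
$$\operatorname{Var}\!\left(\frac{1}{B}\sum_{i=1}^{B} T_i\right) = \frac{1}{B^2}\Big(\sum_{i}\operatorname{Var}(T_i) + \sum_{i \neq j}\operatorname{Cov}(T_i, T_j)\Big) = \frac{B\sigma^2 + B(B-1)\,\rho\sigma^2}{B^2} = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2.$$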
Random Forests
Random Forest is a modified form of bagging that
creates ensembles of independent decision trees.
To de-correlate the trees, we:
1. train each tree on a separate bootstrap sample of
the full training set (same as in bagging)
2. for each tree, at each split, we randomly select a set of $J'$ predictors from the full set of predictors.
From amongst these $J'$ predictors, we select the optimal predictor and the corresponding optimal threshold for the split.
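A hedged scikit-learn rendering of this recipe (the toy data set is an assumption): RandomForestRegressor trains each tree on a bootstrap sample and, at every split, considers only a random subset of predictors of size max_features, playing the role of $J'$.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 10))                              # toy predictors
y = X[:, 0] + 0.5 * X[:, 1] ** 2 + rng.normal(0, 0.1, 300)  # toy response

rf = RandomForestRegressor(
    n_estimators=500,      # number of trees in the ensemble
    max_features="sqrt",   # size of the random predictor subset (J') at each split
    bootstrap=True,        # each tree is trained on a bootstrap sample
    random_state=0,
).fit(X, y)
```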
Tuning Random Forests
Random forest models have multiple hyper-parameters
to tune:
1. the number of predictors to randomly select at
each split
2. the total number of trees in the ensemble
3. the minimum leaf node size
In theory, each tree in the random forest is grown to full depth, but in practice this can be computationally expensive (and can add redundancy to the model); thus, imposing a minimum node size is not unusual.
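A hedged cross-validation sketch over the three hyper-parameters listed above (the grid values are illustrative only; it reuses the toy X, y from the previous sketch):

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

param_grid = {
    "max_features": [1, "sqrt", 0.5],   # predictors randomly selected at each split
    "n_estimators": [100, 300, 500],    # total number of trees in the ensemble
    "min_samples_leaf": [1, 5, 10],     # minimum leaf node size
}
search = GridSearchCV(RandomForestRegressor(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)
```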
Tuning Random Forests
There are standard (default) values for each of the random forest hyper-parameters, recommended by long-time practitioners, but generally these parameters should be tuned through cross validation (making them data and problem dependent).
Using out-of-bag errors, training and cross validation can be done in a single sequence - we cease training once the out-of-bag error stabilizes (as sketched below).
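A hedged sketch of that single sequence with scikit-learn (reusing the toy X, y above): grow the forest in chunks with warm_start=True, which keeps the already-fitted trees, and stop once the reported OOB score stops improving.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(oob_score=True, warm_start=True, random_state=0)
prev_score = -np.inf
for n in range(50, 1001, 50):
    rf.set_params(n_estimators=n)
    rf.fit(X, y)                               # only the new trees are fitted
    if rf.oob_score_ - prev_score < 1e-4:      # OOB R^2 has stabilized
        break
    prev_score = rf.oob_score_
```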
Variable Importance for RF
▶ Record the prediction accuracy on the oob samples
for each tree.
▶ Randomly permute the data for column $j$ in the oob samples, then record the accuracy again.
▶ The decrease in accuracy as a result of this
permuting is averaged over all trees, and is used as
a measure of the importance of variable j in the
random forest.
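Scikit-learn's permutation_importance implements a closely related variant of this recipe: it permutes each column on a supplied data set (rather than the per-tree oob samples) and reports the resulting drop in score, averaged over repeats. Reusing the fitted rf and toy data from the sketches above:

```python
from sklearn.inspection import permutation_importance

result = permutation_importance(rf, X, y, n_repeats=10, random_state=0)
print(result.importances_mean)   # one importance score per predictor
```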
Example
[compare RF, Bagging and Tree]
Final Thoughts on Random Forests
▶ When the number of predictors is large, but the number
of relevant predictors is small, random forests can
perform poorly.
In each split, the chance of selecting a relevant predictor will be low and hence most trees in the ensemble will be weak models.
Final Thoughts on Random Forests
▶ Increasing the number of trees in the ensemble
generally does not increase the risk of overfitting.
Again, by decomposing the generalization error in terms
of bias and variance, we see that increasing the number
of trees produces a model that is at least as robust as a
single tree.
However, if the number of trees is too large, then the trees in the ensemble may become more correlated, increasing the variance.