
Decision trees

Decision trees are a decision support tool that uses a tree-like graph or model of decisions and their possible consequences. They are also called an instance-based algorithm, because at each instance a decision is taken; in other words, a decision tree is a set of nested if-else conditions.
A decision tree is a non-linear model that is built from several linear, axis-parallel planes. In logistic regression we have a single separating plane, but in a decision tree we have multiple planes.
Suppose we want to classify whether a person is a defaulter in paying tax or not on the basis of three variables – employed, gender, and area (location). The decision tree is built as follows:

                     
(1 = no defaulter, 0 = defaulter)

ROOT NODE (N = 1000): 1 = 600, 0 = 400
|
+-- Employed == Yes (N = 700): 1 = 550, 0 = 150
|   |
|   +-- Gender == M (N = 300): 1 = 230, 0 = 70   -> no defaulter
|   +-- Gender == F (N = 400): 1 = 10,  0 = 390  -> defaulter
|
+-- Employed == No (N = 300): 1 = 50, 0 = 250
    |
    +-- Location == N  (N = 100): 1 = 25,  0 = 75   -> defaulter
    +-- Location != N  (N = 200): 1 = 150, 0 = 50   -> no defaulter

The decision tree can also be represented as a diagram (figure not reproduced here).

Geometrically, a decision tree is a set of axis-parallel planes that tessellate the feature space into hypercuboids or hypercubes.
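As a minimal sketch (with the counts and majority classes taken from the diagram above, and illustrative field names), the same tree can be written as nested if-else conditions in Python:

def predict_defaulter(employed, gender, location):
    # Each leaf returns the majority class of that node in the diagram above.
    if employed == "Yes":
        if gender == "M":
            return "no defaulter"   # 230 of 300 are class 1
        else:
            return "defaulter"      # 390 of 400 are class 0
    else:
        if location == "N":
            return "defaulter"      # 75 of 100 are class 0
        else:
            return "no defaulter"   # 150 of 200 are class 1

print(predict_defaulter("Yes", "F", "N"))   # -> defaulter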

Decision trees can be of two types – regression trees and classification trees. Regression trees are used when the dependent variable is continuous, while classification trees are used when the dependent variable is categorical.
Some terms related to decision trees are:
• Decision Node: Any node that gets split is known as a decision node.
• Root Node: This is the top-most decision node and represents the entire sample, which further gets divided.
• Leaf Node: Also known as a terminal node, this is a node that does not split further.
• Splitting: The process of dividing a node into sub-nodes. Most commonly, a top-down approach is taken, splitting each node on the best available split.
• Pruning: The opposite of splitting; here we remove sub-nodes of a decision node.

How to build a decision tree?

In the case of a classification problem we use entropy, information gain, or the Gini index as the measure for deciding which feature splits the data, whereas in the case of a regression tree we use the mean sum of squares (MSE) to split the data.
The most common criteria/metrics are:
• Gini Index
• Entropy
• Chi-Square (for classification trees)
• Reduction in Variance (for regression trees)
• ANOVA

1. Entropy: Consider a case where we need to classify data points into two categories, 1 and 0. Let p be the probability that the outcome variable (defaulter) is classified in class 1 and q the probability that it is classified in class 0. Then the entropy is calculated as

H = −p log₂(p) − q log₂(q)

Entropy measures uncertainty, or randomness; in other words, it is a measure of unpredictability.
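As a small illustrative sketch (not part of the original notes), the entropy of a binary node can be computed directly from this formula:

import math

def entropy(p):
    # H = -p*log2(p) - q*log2(q), where q = 1 - p
    q = 1 - p
    if p == 0 or q == 0:        # a pure node has zero entropy
        return 0.0
    return -p * math.log2(p) - q * math.log2(q)

print(entropy(0.5))   # 1.0  -> maximum uncertainty
print(entropy(0.9))   # ~0.47 -> fairly pure node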
Suppose we have data on 1000 employees, a sample of which looks like:

Defaulter  Employed  Area  Gender
1          1         N     M
0          0         S     F
0          0         S     M
1          0         S     F
0          1         N     F
1          0         N     M
0          0         N     M
1          1         S     M
1          0         S     F
0          1         S     M
0          0         N     F

We want to classify whether a person is a defaulter or not on the basis of employed, area, and gender.

The steps in computing the entropy are as follows.

Suppose Entropy(employed) is 0.97 (using the formula above), Entropy(area) is 0.43 and Entropy(gender) is 0.58. We then choose employed as our root node, because using the employed variable we get a more random (higher-entropy) split than with the other variables (the percentage of persons in class 1 for employed = Yes is 78% and the percentage in class 1 for employed = No is 83%, which are nearly comparable, i.e. homogeneous).

               
Employed == Y?
+-- Yes (N = 700): 1 = 550, 0 = 150
+-- No  (N = 300): 1 = 50,  0 = 250

Now we compute Entropy(gender | employed = Yes) = 0.74 and Entropy(location = North | employed = Yes) = 0.35. Since Entropy(gender | employed = Yes) gives a more random (higher-entropy) split than Entropy(location = North | employed = Yes), we choose gender as the feature for further splitting.

           
Employed == Yes (N = 700): 1 = 550, 0 = 150
|
Gender == M?
+-- Gender == M (N = 300): 1 = 230 (77%),  0 = 70 (23%)
+-- Gender == F (N = 400): 1 = 10 (2.5%),  0 = 390 (97.5%)

Similarly we compute values for employed = No.

         
Employed == No (N = 300): 1 = 50, 0 = 250
|
Location == N?
+-- Location == N  (N = 100): 1 = 25 (25%), 0 = 75 (75%)
+-- Location != N  (N = 200): 1 = 150, 0 = 50
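To make the choice of split concrete, here is a hedged sketch (using the class counts from the diagrams above) that computes the weighted child entropy and the information gain of the employed split:

import math

def entropy_from_counts(n1, n0):
    # Entropy of a node given its class counts.
    total = n1 + n0
    h = 0.0
    for n in (n1, n0):
        if n:
            p = n / total
            h -= p * math.log2(p)
    return h

h_root  = entropy_from_counts(600, 400)                # root node, ~0.97
h_yes   = entropy_from_counts(550, 150)                # employed = Yes node
h_no    = entropy_from_counts(50, 250)                 # employed = No node
h_split = (700 / 1000) * h_yes + (300 / 1000) * h_no   # weighted child entropy
print("information gain:", round(h_root - h_split, 2))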

2. Gini Impurity: This method uses an impurity-based criterion, is used in CART, and only performs binary splits.

Let us consider an example:

We find that out of a total of 50 students, half of them were selected for the coaching.

To calculate the Gini index, we first find the proportions of these values.

Now we can find the weighted Gini for split X1 (Grade) by calculating the Gini for the sub-nodes 'A or B' and 'C or D'. The calculations are shown below.

We can similarly do the same for X2, which here is 'Class', and find the Gini for splitting 'Class'.

For X3, Height, which is a continuous variable, the statistical software in the backend splits the variable into height > x and height < x and finds the Gini value; whichever value of x provides the minimum Gini value is used to calculate the Gini for splitting this continuous variable.

In our example, we find that x = 1.75 provides the minimum value in comparison to other values of height, say 1.80 or 1.60. We find that the Gini for the height split comes out to be 0.23.

Now we compare the Gini values and find that the minimum Gini is provided by height (0.23); thus, this variable will be the root node that splits the data into two more homogeneous, 'purer' sets. The second minimum value of Gini is found for the variable 'Grade' and the maximum value of Gini is found for 'Class'. Thus the data will first be split by height, as it plays the major role in deciding whether a student will be selected for the basketball coaching or not.

Gini impurity and entropy serve the same purpose. The only difference is that the Gini impurity formula does not use a logarithm, which makes it faster to compute than entropy (lower computational complexity).
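A minimal sketch (illustrative, not the notes' original code) of the Gini impurity of a node and the weighted Gini of a binary split:

def gini(n1, n0):
    # Gini impurity of a node: 1 - p1^2 - p0^2.
    total = n1 + n0
    p1, p0 = n1 / total, n0 / total
    return 1 - p1 ** 2 - p0 ** 2

def weighted_gini(left, right):
    # Weighted Gini of a binary split; left and right are (n1, n0) count pairs.
    n_left, n_right = sum(left), sum(right)
    n = n_left + n_right
    return (n_left / n) * gini(*left) + (n_right / n) * gini(*right)

# The split with the lowest weighted Gini is chosen; counts are illustrative.
print(weighted_gini((550, 150), (50, 250)))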

3. Mean sum of squares (MSE): It is defined as

MSE = Σ(xᵢ − x̄)² ÷ n

However, when the variable is categorical it can get a bit complicated. We build the tree top-down, from the root to the leaves, and we want all similar data points to end up in the same leaf, so that the standard deviation (our measure of homogeneity) is as small as possible.

For example, let us calculate the mean and variance for the root node. Here we have 25 ones (y = 1) and 25 zeros (y = 0). We calculate the mean as ((25 × 1) + (25 × 0)) ÷ 50, which comes out to be 0.5. We can now use the formula for variance, with x̄ = 0.5 and each x equal to 1 or 0. Since there are 50 entries, the full sum contains 25 terms of (1 − 0.5)² and 25 terms of (0 − 0.5)², so it can be written compactly as

Variance = (25 × (1 − 0.5)² + 25 × (0 − 0.5)²) ÷ 50 = 0.25

We use the above formula to find the mean and variance for each variable. See below the calculation to find the variance of the X1 (Grade) variable.

We can use the same formula to find the variance for Split Class.

We find that the results are similar to entropy, with 'Grade' being the less important variable, as its variance (0.24) is higher than that of the other categorical variable, 'Class', which has a variance of 0.21. The variance of a numerical variable follows similar steps, and the variable which provides the minimum variance is considered for the split.
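As an illustrative sketch (the 25/25 counts come from the root-node example above), node variance and the weighted variance after a split can be computed as:

def variance(values):
    # Mean squared deviation from the mean: sum((x - mean)^2) / n.
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / len(values)

def variance_after_split(left, right):
    # Weighted variance of the child nodes; a lower value means a better split.
    n = len(left) + len(right)
    return (len(left) / n) * variance(left) + (len(right) / n) * variance(right)

root = [1] * 25 + [0] * 25
print(variance(root))   # 0.25, as in the worked example above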

Now we know how to build a decision tree. But we must take care while building it: as the depth increases, the chance of overfitting increases (the tree starts fitting noisy points in the data), and if the depth is too small, say 1, the decision tree underfits. So an appropriate depth should be found using cross-validation, as in the sketch below.
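A minimal sketch of choosing the depth with cross-validation, assuming scikit-learn and an already prepared feature matrix X and target vector y:

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# X, y are assumed to be the prepared features and target.
param_grid = {"max_depth": [1, 2, 3, 4, 5, 6, 8, 10]}
search = GridSearchCV(DecisionTreeClassifier(random_state=42),
                      param_grid, cv=5, scoring="accuracy")
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))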
Parameters Related to the Size of the Tree:

Maximum Depth: Among the easiest parameters to understand, maximum depth is an important parameter for tuning trees and tree-based models. It is the length of the longest path from the root node to a leaf node, or the depth of the layer of leaves. We often provide a maximum value/depth limit to the decision tree model to limit further splitting of nodes, so that when the specified depth is reached, the tree stops splitting.

In the example (figure not reproduced here), we have two features, one on the x-axis and the other on the y-axis. The tree finds that the first best split can be made on the x-axis, i.e. it splits on the x-axis variable and finds the value which provides the best result (the value that provides the minimum entropy or Gini, depending on the algorithm the decision tree is using). For max depth = 1, it splits the data and says that data points on the left will be classified as 0 (blue background colour), while the actual 0s are the blue dots. This depth makes the model too generalized and it underfits. However, if it keeps going and recursively partitions the data, it finds further ways to split the data, creating more and more branches and splitting the space further. By the time it reaches the maximum depth all the leaves are pure; however, this leads to overfitting, as the decision boundary becomes very complex and will yield very poor results in testing or on unseen new data. Thus we can set the maximum depth at 5 to have a moderately complex and accurate decision tree model.
Minimum Split Size: This is another parameter, through which we define the minimum number of observations required to allow a node to split further. This can control overfitting and underfitting. A very low value of minimum split size will lead to overfitting, with a small number of observations being considered for further splitting; an extremely large value, however, will lead to underfitting, as a good number of observations that could provide useful insight when split will not be split.

Minimum Leaf Size


Here we provide a value that decides the minimum number of observations a terminal node must have. It helps with imbalanced splits: for example, if we define the minimum split size as 99 and we have a node with 100 observations, it can be split; however, upon splitting we may find that the child nodes have 99 observations in one node and one observation in the other. We discussed in ANOVA and other basic statistical concepts that each class of a variable should have a minimum number of observations to be considered for any statistical test. Similarly, here we define the minimum leaf size, which is generally 10% of the minimum split size.

Maximum Impurity
In Maximum Depth we saw how a low depth caused the red points to be classified as blue. We can limit how many wrongly classified points are allowed in a region before splitting is stopped.
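A minimal sketch of how these size-related parameters map onto scikit-learn's DecisionTreeClassifier (the values are illustrative, and min_impurity_decrease is only a loose analogue of the 'maximum impurity' idea above):

from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier(
    max_depth=5,                 # maximum depth of the tree
    min_samples_split=20,        # minimum observations a node needs before it can be split
    min_samples_leaf=2,          # minimum observations required in each terminal node
    min_impurity_decrease=0.01,  # require at least this impurity reduction to make a split
    random_state=42,
)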
PYTHON CODE:

In case of classification

The dataset has the following columns:


['Pregnancies', 'Glucose', 'BloodPressure', 'SkinThickness', 'Insulin',
'BMI', 'DiabetesPedigreeFunction', 'Age', 'Outcome']

Outcome is the dependent variable and the rest are independent variables.

Outcome is imbalanced.

It is clearly visible that there is no or low multicollinearity.

First, we balance the dataset.

We have now removed outliers, and there are no missing values and no multicollinearity.


Now we train our model, as in the sketch below.
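Since the original code screenshots are not reproduced here, a hedged sketch of this training step might look as follows (the file name diabetes.csv is an assumption, and balancing/outlier treatment are assumed to have been done already):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

df = pd.read_csv("diabetes.csv")                  # assumed file name
X = df.drop(columns=["Outcome"])
y = df["Outcome"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y)

model = DecisionTreeClassifier(random_state=42)   # default parameters, no tuning yet
model.fit(X_train, y_train)

print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy:", accuracy_score(y_test, model.predict(X_test)))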

Accuracy before cross-validation and hyperparameter tuning:

After K-fold cross-validation:

Accuracy:

The training accuracy is 100% and the test accuracy is 70%.


After hyperparameter tuning:

Accuracy:

After parameter tuning, the test accuracy increased from 60% to 73%.

In case of regression

The train data has 8,523 rows and the following columns:
# Item_Identifier: product ID
# Item_Weight: weight of the product in kg
# Item_Fat_Content: whether the product is low fat or not
# Item_Visibility: the % of total display area of the product in a store
# Item_Type: the category to which the product belongs
# Item_MRP: Maximum Retail Price (selling price) of the product
# Outlet_Identifier: unique store ID
# Outlet_Establishment_Year: the year in which the store was established
# Outlet_Size: the size of the store in terms of ground area covered
# Outlet_Location_Type: the type of city in which the store is located
# Outlet_Type: whether the outlet is just a grocery store or a supermarket
# Item_Outlet_Sales: sales of the product in the particular store; this is the outcome variable to be predicted.
Item_Outlet_Sales is the dependent variable and the rest are independent variables.

After all the preprocessing steps, we train a decision tree model without hyperparameter tuning, as in the sketch below.
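Since the original screenshot is not reproduced here, a hedged sketch of this step might look as follows (the file name and the exact preprocessing/encoding are assumptions):

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error

df = pd.read_csv("train.csv")    # assumed file name; preprocessing/encoding already applied
X = df.drop(columns=["Item_Outlet_Sales", "Item_Identifier"])
y = df["Item_Outlet_Sales"]

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

reg = DecisionTreeRegressor(random_state=42)      # default parameters, fully grown tree
reg.fit(X_train, y_train)

print("train score (R^2):", reg.score(X_train, y_train))
print("test score (R^2):", reg.score(X_test, y_test))
print("test MSE:", mean_squared_error(y_test, reg.predict(X_test)))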
Model score

We got a 100% score on the training data.

On the test data we got a 57% score, because we have not tuned any parameters and simply initialized the tree with default values. The tree has therefore expanded to its full depth, splitting on features that are not actually useful. That is why we get 100% accuracy on the training data (i.e. a highly overfitted model).

Now we tune the parameters to get rid of this.


Simple model without any hyperparameter tuning, but with K-fold cross-validation:

Error after cross-validation:

After hyperparameter tuning:

Error after hyperparameter tuning:

It was observed that after tuning, the MSE decreased to a smaller value.
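A hedged sketch of how such tuning might be done (parameter ranges are illustrative, and X_train, y_train come from the earlier sketch):

from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

param_grid = {
    "max_depth": [3, 5, 8, 10],
    "min_samples_split": [10, 20, 50],
    "min_samples_leaf": [2, 5, 10],
}
search = GridSearchCV(DecisionTreeRegressor(random_state=42),
                      param_grid, cv=5, scoring="neg_mean_squared_error")
search.fit(X_train, y_train)
print(search.best_params_)
print("cross-validated MSE:", -search.best_score_)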

When to stop growing?

• Stop growing when a node becomes pure.
• Stop growing when there are too few points left to split.
• Stop growing when the tree becomes too deep.

Advantages of decision trees:

• Decision trees provide high accuracy and require very little data preprocessing, such as outlier capping, missing-value treatment, variable transformation, etc.
• They also work when the relationship between the dependent and independent variables is not linear, i.e. they can capture non-linear relationships.
• Decision trees require less data than a regression model and are easy to understand as well as quite intuitive.
• Tree-based models can be easily visualized, with clear-cut demarcation, allowing people with no background in statistics to understand the process easily.
• Decision trees can also be used for data cleaning, data exploration, and variable selection and creation.
• Decision trees can work with high-dimensional data having both continuous and categorical variables.
• Feature interactions are built into decision trees.
• Decision trees are highly interpretable.
