Understanding Decision Trees in ML

Data Mining Machine Learning

Md. Abdullah-Al-Kafi
Lecturer
Daffodil International University

Understanding Decision Trees

A decision tree is a model that makes a prediction by asking a sequence of questions about the data. Think
of it as a flowchart: each step (a node) is a decision point that leads to the next step, until you arrive
at a final decision (a leaf).

Key Components of a Decision Tree

• Root Node: This is the topmost decision point in the tree. It represents the starting point where
the first decision is made.
• Decision Nodes: These are points where decisions are made based on data features. Each node
represents a question or test about the data.
• Branches: These are the possible outcomes of a decision node. Each branch leads to another
decision node or a leaf node.
• Leaf Nodes (Terminal Nodes): These are the end points of the tree, representing the final
decision or outcome after all the decisions have been made.

How Decision Trees Work

1. Splitting the Data: The algorithm starts at the root node and splits the data on the feature
that best separates the different classes (for classification) or outcomes (for regression).
The “best” feature is usually chosen with a metric such as Gini impurity or information gain
for classification, or variance reduction for regression.
2. Recursive Process: The algorithm continues to split the data recursively at each decision node,
creating branches, until one of the stopping criteria is met (like reaching a maximum depth, having
too few samples to split further, or all the data in a node belonging to a single class).
3. Final Decision: When the algorithm reaches a leaf node, it assigns the most common class (for
classification) or the average value (for regression) of the data points in that node as the final
decision.
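
The three steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: it handles categorical features only, uses a multiway split on the feature with the lowest weighted Gini impurity, and stops on a pure node, an empty feature list, or a maximum depth. The names `build_tree`, `predict`, and the toy data are hypothetical.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def build_tree(rows, labels, features, depth=0, max_depth=3):
    """rows: list of dicts (feature name -> category); labels: list of classes."""
    # Stopping criteria: pure node, no features left, or maximum depth reached.
    if len(set(labels)) == 1 or not features or depth >= max_depth:
        return Counter(labels).most_common(1)[0][0]  # leaf: most common class

    def weighted_gini(feature):
        groups = {}
        for row, label in zip(rows, labels):
            groups.setdefault(row[feature], []).append(label)
        return sum(len(g) / len(labels) * gini(g) for g in groups.values())

    # Step 1: pick the feature with the lowest weighted impurity.
    best = min(features, key=weighted_gini)
    remaining = [f for f in features if f != best]

    # Step 2: recurse into one branch per category of the chosen feature.
    children = {}
    for value in {row[best] for row in rows}:
        pairs = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        children[value] = build_tree(
            [r for r, _ in pairs], [l for _, l in pairs],
            remaining, depth + 1, max_depth,
        )
    return {"feature": best, "children": children}

def predict(tree, row):
    # Step 3: follow branches until a leaf (a plain label) is reached.
    # Assumption: every category in `row` was seen during training.
    while isinstance(tree, dict):
        tree = tree["children"][row[tree["feature"]]]
    return tree

# Hypothetical toy data, for illustration only.
rows = [
    {"sunny": "yes", "warm": "yes"},
    {"sunny": "yes", "warm": "no"},
    {"sunny": "no", "warm": "yes"},
    {"sunny": "no", "warm": "no"},
]
labels = ["play", "stay", "stay", "stay"]
tree = build_tree(rows, labels, ["sunny", "warm"])
print(predict(tree, {"sunny": "yes", "warm": "yes"}))
```

Libraries that implement CART typically use binary splits and numeric thresholds; the multiway split here is a simplification chosen to keep the sketch short.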

Example
Imagine you want to decide whether to play outside or stay indoors based on the weather. Your decision
tree might look like this:
• Root Node: Is it sunny?
– If yes: Is it warm?
∗ If yes: Go play outside! (Leaf Node)
∗ If no: Stay indoors. (Leaf Node)
– If no: Is it raining?
∗ If yes: Stay indoors. (Leaf Node)
∗ If no: Is it windy?
· If yes: Stay indoors. (Leaf Node)
· If no: Go play outside! (Leaf Node)
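
Written as code, the same tree is just nested conditions, one per decision node. The function below is a hand-written sketch of the example; the name `decide` and the boolean parameters are assumptions made for illustration.

```python
def decide(sunny, warm, raining, windy):
    """Walk the example tree from the root to a leaf."""
    if sunny:                      # root node
        if warm:
            return "play outside"  # leaf
        return "stay indoors"      # leaf
    if raining:
        return "stay indoors"      # leaf
    if windy:
        return "stay indoors"      # leaf
    return "play outside"          # leaf

print(decide(sunny=True, warm=True, raining=False, windy=False))
print(decide(sunny=False, warm=False, raining=False, windy=True))
```

Note that each call only evaluates the questions along one path: when it is sunny, rain and wind are never checked.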

Advantages and Disadvantages
• Advantages:
– Easy to understand and interpret.
– Can handle both numerical and categorical data.
– No need for feature scaling (normalization or standardization).
• Disadvantages:
– Can overfit the data, meaning it might learn noise instead of the actual pattern.
– Can become complex and less interpretable if the tree is too deep.

Decision trees are powerful tools for making decisions and are widely used in machine learning for
tasks like classification and regression.

Decision Tree Example Using Gini Index

Suppose we have a dataset of 10 samples with two features, Weather (Sunny, Overcast, Rainy) and
Temperature (Hot, Mild, Cool), and a target variable, Play Tennis (Yes, No). The dataset is as
follows:
Sample  Weather   Temperature  Play Tennis
1       Sunny     Hot          No
2       Sunny     Hot          No
3       Overcast  Hot          Yes
4       Rainy     Mild         Yes
5       Rainy     Cool         Yes
6       Rainy     Cool         No
7       Overcast  Cool         Yes
8       Sunny     Mild         No
9       Sunny     Cool         Yes
10      Rainy     Mild         Yes

We want to determine which feature (Weather or Temperature) is the best first split using the Gini
index.

Calculating the Gini Index

The Gini index measures the impurity of a dataset; a Gini index of 0 indicates a perfectly pure node
(all samples belong to one class).
The formula for the Gini index is:
\[ \text{Gini} = 1 - \sum_i p_i^2 \]
where $p_i$ is the proportion of samples that belong to class $i$.
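
The formula translates directly into a short function. This is a minimal sketch that assumes the node is described by its per-class sample counts; the name `gini` and the decision to return 0 for an empty node are illustrative choices.

```python
def gini(counts):
    """Gini index of a node, given the number of samples in each class."""
    total = sum(counts)
    if total == 0:
        return 0.0  # assumption: treat an empty node as pure
    return 1.0 - sum((c / total) ** 2 for c in counts)

print(gini([5, 5]))   # evenly mixed two-class node
print(gini([10, 0]))  # pure node
```

For two classes the value ranges from 0 (pure) to 0.5 (evenly mixed).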

Step-by-Step Calculation
1. Calculate the Gini index for the overall dataset.
There are 10 samples, with 6 "Yes" and 4 "No".
\[ \text{Gini}_{\text{parent}} = 1 - \left(\frac{6}{10}\right)^2 - \left(\frac{4}{10}\right)^2 = 1 - 0.36 - 0.16 = 0.48 \]

2. Calculate the Gini index for splits on the feature "Weather".
The "Weather" feature has three categories: Sunny, Overcast, and Rainy.
- Sunny: 4 samples (3 "No", 1 "Yes"; samples 1, 2, 8, and 9 in the table)
\[ \text{Gini}_{\text{Sunny}} = 1 - \left(\frac{3}{4}\right)^2 - \left(\frac{1}{4}\right)^2 = 1 - 0.5625 - 0.0625 = 0.375 \]
- Overcast: 2 samples (0 "No", 2 "Yes")
\[ \text{Gini}_{\text{Overcast}} = 1 - \left(\frac{2}{2}\right)^2 - \left(\frac{0}{2}\right)^2 = 1 - 1 - 0 = 0 \]
- Rainy: 4 samples (1 "No", 3 "Yes")
\[ \text{Gini}_{\text{Rainy}} = 1 - \left(\frac{3}{4}\right)^2 - \left(\frac{1}{4}\right)^2 = 1 - 0.5625 - 0.0625 = 0.375 \]
3. Calculate the weighted Gini index for the "Weather" split.
\[ \text{Gini}_{\text{Weather}} = \frac{4}{10} \times 0.375 + \frac{2}{10} \times 0 + \frac{4}{10} \times 0.375 = 0.15 + 0 + 0.15 = 0.30 \]

4. Calculate the Gini index for splits on the feature "Temperature".
The "Temperature" feature has three categories: Hot, Mild, and Cool.
- Hot: 3 samples (2 "No", 1 "Yes")
\[ \text{Gini}_{\text{Hot}} = 1 - \left(\frac{2}{3}\right)^2 - \left(\frac{1}{3}\right)^2 = 1 - 0.4444 - 0.1111 \approx 0.4444 \]
- Mild: 3 samples (1 "No", 2 "Yes")
\[ \text{Gini}_{\text{Mild}} = 1 - \left(\frac{2}{3}\right)^2 - \left(\frac{1}{3}\right)^2 \approx 0.4444 \]
- Cool: 4 samples (1 "No", 3 "Yes")
\[ \text{Gini}_{\text{Cool}} = 1 - \left(\frac{3}{4}\right)^2 - \left(\frac{1}{4}\right)^2 = 0.375 \]
5. Calculate the weighted Gini index for the "Temperature" split.
\[ \text{Gini}_{\text{Temperature}} = \frac{3}{10} \times 0.4444 + \frac{3}{10} \times 0.4444 + \frac{4}{10} \times 0.375 \approx 0.1333 + 0.1333 + 0.15 = 0.4167 \]

Conclusion
The feature with the lower weighted Gini index is the better feature for splitting the data first. In this
case:
\[ \text{Gini}_{\text{Weather}} = 0.30, \qquad \text{Gini}_{\text{Temperature}} \approx 0.4167 \]
"Weather" has the lower weighted Gini index (0.30), so it is the better feature to split on first.
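
The comparison can be recomputed directly from the dataset table. The sketch below copies the ten rows from the table, groups the labels by each feature value, and reports the weighted Gini index per feature. The names `DATA`, `weighted_gini`, and `scores` are assumptions made for illustration.

```python
from collections import Counter, defaultdict

# (Weather, Temperature, Play Tennis) rows copied from the dataset table.
DATA = [
    ("Sunny", "Hot", "No"),
    ("Sunny", "Hot", "No"),
    ("Overcast", "Hot", "Yes"),
    ("Rainy", "Mild", "Yes"),
    ("Rainy", "Cool", "Yes"),
    ("Rainy", "Cool", "No"),
    ("Overcast", "Cool", "Yes"),
    ("Sunny", "Mild", "No"),
    ("Sunny", "Cool", "Yes"),
    ("Rainy", "Mild", "Yes"),
]

def gini(labels):
    """Gini index of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(rows, column):
    """Weighted Gini index after splitting `rows` on the given column."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[column]].append(row[-1])  # last field is the target
    return sum(len(g) / len(rows) * gini(g) for g in groups.values())

scores = {
    "Weather": weighted_gini(DATA, 0),
    "Temperature": weighted_gini(DATA, 1),
}
best = min(scores, key=scores.get)  # lower weighted Gini is better

for feature, score in scores.items():
    print(f"{feature}: {score:.4f}")
print("Best first split:", best)
```

Grouping by value rather than hard-coding the counts makes it easy to check each per-category tally against the table.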

Math Questions on Decision Trees and Gini Index

1. Calculate the Gini index for a dataset with two classes.
• Suppose you have a dataset with 8 samples: 5 are labeled "Yes" and 3 are labeled "No".
Calculate the Gini index for this dataset.
Answer:
\[ \text{Gini} = 1 - \left(\frac{5}{8}\right)^2 - \left(\frac{3}{8}\right)^2 = 1 - 0.3906 - 0.1406 = 0.4688 \]

2. Calculate the weighted Gini index for a split.
• Given a dataset split into two groups: Group 1 has 6 samples (4 "Yes", 2 "No") and Group
2 has 4 samples (2 "Yes", 2 "No"). Calculate the weighted Gini index for this split.
Answer:
\[ \text{Gini}_1 = 1 - \left(\frac{4}{6}\right)^2 - \left(\frac{2}{6}\right)^2 = 1 - 0.4444 - 0.1111 \approx 0.4444 \]
\[ \text{Gini}_2 = 1 - \left(\frac{2}{4}\right)^2 - \left(\frac{2}{4}\right)^2 = 1 - 0.25 - 0.25 = 0.5 \]
\[ \text{Weighted Gini} = \frac{6}{10} \times 0.4444 + \frac{4}{10} \times 0.5 = 0.2667 + 0.2 = 0.4667 \]

3. Find the Gini index for a dataset with three classes.
• Suppose you have a dataset with 9 samples: 3 are labeled "A", 3 are labeled "B", and 3 are
labeled "C". Calculate the Gini index for this dataset.
Answer:
\[ \text{Gini} = 1 - \left(\frac{3}{9}\right)^2 - \left(\frac{3}{9}\right)^2 - \left(\frac{3}{9}\right)^2 = 1 - 0.1111 - 0.1111 - 0.1111 = 0.6667 \]

4. Compute the Gini index for a perfect split.
• A dataset has 10 samples, 5 labeled "Yes" and 5 labeled "No". If we perfectly split the dataset
into two groups of 5 samples each, one group entirely "Yes" and the other entirely "No", what
is the Gini index after the split?
Answer: Gini = 0 (since both groups are pure with only one class)

5. Determine the Gini index for an imbalanced dataset.
• A dataset has 20 samples: 16 are labeled "Positive" and 4 are labeled "Negative". Calculate
the Gini index for this dataset.
Answer:
\[ \text{Gini} = 1 - \left(\frac{16}{20}\right)^2 - \left(\frac{4}{20}\right)^2 = 1 - 0.64 - 0.04 = 0.32 \]
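
Answers to questions of this kind can be checked with a small helper that works from class counts. The sketch below assumes each group is given as a list of per-class counts; the function names are hypothetical.

```python
def gini(counts):
    """Gini index of one group, given its per-class sample counts."""
    total = sum(counts)
    return 1.0 - sum((c / total) ** 2 for c in counts)

def weighted_gini(groups):
    """Size-weighted average of the Gini index over the groups of a split."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * gini(g) for g in groups)

# Question 2: Group 1 = (4 Yes, 2 No), Group 2 = (2 Yes, 2 No).
print(round(weighted_gini([[4, 2], [2, 2]]), 4))

# Question 4: a perfect split leaves two pure groups.
print(weighted_gini([[5, 0], [0, 5]]))
```

Each group's weight is its share of the samples, so a large impure group pulls the result up more than a small one.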

Math Questions on Decision Trees Using Entropy

1. Calculate the entropy for a dataset with two classes.
• Suppose you have a dataset with 8 samples: 5 are labeled "Yes" and 3 are labeled "No".
Calculate the entropy for this dataset.
\[ \text{Entropy} = -\frac{5}{8}\log_2\frac{5}{8} - \frac{3}{8}\log_2\frac{3}{8} \]
\[ \text{Entropy} = -(0.625 \times \log_2 0.625) - (0.375 \times \log_2 0.375) \]
\[ \text{Entropy} = -(0.625 \times -0.678) - (0.375 \times -1.415) \approx 0.954 \]

2. Calculate the weighted entropy for a split.
• Given a dataset split into two groups: Group 1 has 6 samples (4 "Yes", 2 "No") and Group
2 has 4 samples (2 "Yes", 2 "No"). Calculate the weighted entropy for this split.
\[ \text{Entropy}_1 = -\frac{4}{6}\log_2\frac{4}{6} - \frac{2}{6}\log_2\frac{2}{6} \]
\[ \text{Entropy}_1 = -(0.667 \times -0.585) - (0.333 \times -1.585) \]
\[ \text{Entropy}_1 = 0.390 + 0.528 = 0.918 \]
\[ \text{Entropy}_2 = -\frac{2}{4}\log_2\frac{2}{4} - \frac{2}{4}\log_2\frac{2}{4} \]
\[ \text{Entropy}_2 = -(0.5 \times -1) - (0.5 \times -1) \]
\[ \text{Entropy}_2 = 0.5 + 0.5 = 1 \]
\[ \text{Weighted Entropy} = \frac{6}{10} \times 0.918 + \frac{4}{10} \times 1 = 0.551 + 0.4 = 0.951 \]

3. Find the entropy for a dataset with three classes.
• Suppose you have a dataset with 9 samples: 3 are labeled "A", 3 are labeled "B", and 3 are
labeled "C". Calculate the entropy for this dataset.
\[ \text{Entropy} = -\frac{3}{9}\log_2\frac{3}{9} - \frac{3}{9}\log_2\frac{3}{9} - \frac{3}{9}\log_2\frac{3}{9} \]
\[ \text{Entropy} = -(0.333 \times -1.585) - (0.333 \times -1.585) - (0.333 \times -1.585) \]
\[ \text{Entropy} = 0.5283 + 0.5283 + 0.5283 \approx 1.585 \]
This equals $\log_2 3$, the maximum entropy for three classes.

4. Compute the entropy for a perfect split.
• A dataset has 10 samples, 5 labeled "Yes" and 5 labeled "No". If we perfectly split the dataset
into two groups of 5 samples each, one group entirely "Yes" and the other entirely "No", what
is the entropy after the split?
Entropy = 0 (since both groups are pure with only one class)

5. Determine the entropy for an imbalanced dataset.
• A dataset has 20 samples: 16 are labeled "Positive" and 4 are labeled "Negative". Calculate
the entropy for this dataset.
\[ \text{Entropy} = -\frac{16}{20}\log_2\frac{16}{20} - \frac{4}{20}\log_2\frac{4}{20} \]
\[ \text{Entropy} = -(0.8 \times -0.322) - (0.2 \times -2.322) = 0.258 + 0.464 = 0.722 \]
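
The entropy answers can be checked the same way. The sketch below assumes the usual Shannon entropy, the sum over classes of $-p_i \log_2 p_i$ (with empty classes contributing 0), and defines information gain as the parent entropy minus the weighted entropy of the split. The function names are hypothetical.

```python
import math

def entropy(counts):
    """Shannon entropy (in bits) of a group, given its per-class counts."""
    total = sum(counts)
    # p * log2(1/p) equals -p * log2(p); classes with zero samples add nothing.
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)

def weighted_entropy(groups):
    """Size-weighted average entropy over the groups of a split."""
    total = sum(sum(g) for g in groups)
    return sum(sum(g) / total * entropy(g) for g in groups)

def information_gain(parent, groups):
    """Reduction in entropy achieved by splitting `parent` into `groups`."""
    return entropy(parent) - weighted_entropy(groups)

# Question 2: the parent holds 6 Yes and 4 No before the split.
split = [[4, 2], [2, 2]]
print(round(weighted_entropy(split), 3))
print(round(information_gain([6, 4], split), 3))
```

A split with a higher information gain leaves purer groups, which is the entropy-based counterpart of choosing the lowest weighted Gini index.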
