Q1: What are Decision Trees?
Decision Trees are a fundamental machine learning algorithm used for both
classification and regression tasks. They are a non-parametric supervised learning
method that is widely used in data science. Decision Trees create a tree-like model of
decisions and their possible consequences.
In a Decision Tree:
● The root node represents the entire dataset or the initial problem.
● Internal nodes are decision nodes that split the data into subgroups based on specific criteria, often using features from the dataset.
● Leaf nodes are the final outcomes or predictions.
The process of building a Decision Tree involves selecting the best feature to split the
data, typically using metrics like Gini impurity or information gain. Decision Trees are
known for their interpretability, as you can easily visualize the tree structure, making it
understandable even to non-technical stakeholders.
However, Decision Trees can be prone to overfitting, so techniques like pruning or using
ensemble methods like Random Forests are often employed to improve their
performance.
● A decision tree is a tool that uses a tree-like model of decisions and their possible consequences. If an algorithm contains only conditional control statements, a decision tree can model that algorithm very well.
● Decision trees are a non-parametric, supervised learning method.
● Decision trees are used for classification and regression tasks.
● A classic illustrative example is a decision tree trained on the Titanic dataset to predict whether a passenger survived or not; a minimal version is sketched below.
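As a rough sketch (assuming scikit-learn is installed), such a classifier could look like the following; the inline data and feature encoding are made up for illustration and are not the real Titanic dataset:

```python
# Minimal sketch: fitting a decision tree classifier with scikit-learn.
# The tiny dataset below is illustrative only, not the real Titanic data.
from sklearn.tree import DecisionTreeClassifier

# Features per passenger: [passenger_class, is_male, age]
X = [[3, 1, 22], [1, 0, 38], [3, 0, 26], [1, 0, 35], [3, 1, 35], [2, 1, 54]]
y = [0, 1, 1, 1, 0, 0]  # 1 = survived, 0 = did not survive

# criterion="gini" splits on Gini impurity; criterion="entropy" uses information gain
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X, y)

print(clf.predict([[2, 0, 30]]))  # predicted survival for a new passenger
```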
Q2: Explain the structure of a Decision Tree
A decision tree is a flowchart-like structure in which:
● Each internal node represents a test on an attribute (e.g. the outcome of a coin flip).
● Each branch represents the outcome of the test.
● Each leaf node represents a class label.
● The paths from the root to leaf represent the classification rules.
● Splitting Criteria: At each internal node, a splitting criterion is used to determine
how the data should be divided into subgroups. Common criteria include Gini
impurity, information gain, or mean squared error, depending on whether it's a
classification or regression tree.
● Depth of the Tree: The depth of the tree is the length of the longest path from the
root node to a leaf node. A deeper tree can capture more complex patterns in the
data but is also more prone to overfitting.
● Features: Each internal node uses a specific feature from the dataset to make a
decision on how to split the data.
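A quick way to see this structure in practice (a sketch assuming scikit-learn and its bundled Iris dataset) is to print a fitted tree as text: each indented line is an internal node testing one feature, and the "class:" lines are the leaves.

```python
# Sketch: inspecting the flowchart-like structure of a fitted tree.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(iris.data, iris.target)

# Internal nodes appear as feature tests (e.g. "petal width (cm) <= 0.80"),
# branches as the indentation, and leaf nodes as "class: ..." lines.
print(export_text(clf, feature_names=list(iris.feature_names)))
print("Tree depth:", clf.get_depth())  # length of the longest root-to-leaf path
```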
Q3: How are the different nodes of decision trees represented?
A decision tree consists of three types of nodes:
● Decision nodes: represented by squares. A decision node is a point where the flow branches into several optional paths.
● Chance nodes: represented by circles. A chance node represents the probabilities of certain results.
● End nodes: represented by triangles. An end node shows the final outcome of a decision path.
Q4: What are some advantages of using Decision Trees?
● It is simple to understand and interpret. It can be visualized easily.
● It does not require as much data preprocessing as other methods.
● It can handle both numerical and categorical data.
● It can handle multiple output problems.
Advantages of using Decision Trees:
● Interpretability: Decision Trees are easy to understand and visualize, making them great for explaining decisions.
● Simple implementation: they are straightforward to implement and can handle various data types without much preprocessing.
● Versatility: suitable for classification and regression tasks, making them applicable to a wide range of problems.
● Feature selection: they can automatically rank and select important features.
● Robustness: can handle missing values and are robust to outliers.
● Scalability: efficient on large datasets and parallelizable.
● No data normalization: they don't require data normalization.
● Non-linear relationships: can model complex non-linear relationships in the data.
Q5: What type of node is considered Pure?
In the context of Decision Trees, a node is considered "pure" when it contains only one class or category
in a classification problem or when it contains data points with the same target value in a regression
problem.
● If the Gini Index of the data is 0 then it means that all the elements belong to a specific class.
When this happens it is said to be pure.
● When all of the data belongs to a single class (pure) then the leaf node is reached in the tree.
● The leaf node represents the class label in the tree (which means that it gives the final output).
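As a small illustration (plain Python, no particular library assumed), the Gini impurity of a node can be computed directly; a pure node comes out at exactly 0:

```python
# Sketch: Gini impurity of a node; a pure node (a single class) has Gini = 0.
from collections import Counter

def gini(labels):
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

print(gini(["yes", "yes", "yes"]))       # 0.0 -> pure node (leaf)
print(gini(["yes", "no", "yes", "no"]))  # 0.5 -> maximally impure for two classes
```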
Q7: What is the difference between OOB score and validation score?
The oob_score is computed on the "left-over" (out-of-bag) samples that were not used to fit a given tree, whereas the validation score is computed on a sample of data you explicitly set aside yourself. In this way, the OOB sample is a little more random than the validation set, so the OOB sample (on which the oob_score is measured) may be "harder" than the validation set. As a consequence, the oob_score may on average be slightly lower than the validation score.
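A sketch of how the two scores can be compared in practice (assuming scikit-learn and one of its bundled datasets; the split size and number of trees are arbitrary):

```python
# Sketch: OOB score vs. validation score for a random forest.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=0)

# oob_score=True scores each tree on the bootstrap samples it did NOT see
rf = RandomForestClassifier(n_estimators=200, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

print("OOB score:       ", rf.oob_score_)
print("Validation score:", rf.score(X_val, y_val))
```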
Q8: How would you deal with an Overfitted Decision Tree?
Two approaches to avoiding overfitting can be distinguished: pre-pruning (generating a tree with fewer branches than would otherwise be the case) and post-pruning (generating the tree in full and then removing parts of it). Pre-pruning is typically done with either a size cutoff or a maximum-depth cutoff. Post-pruning can be done by comparing the static and backed-up estimated error rates at each node and removing subtrees whose backed-up estimate is no better than the static one.
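In scikit-learn terms, a rough sketch of both ideas might look like this (note that scikit-learn's built-in post-pruning is minimal cost-complexity pruning rather than the error-comparison method described above, and the cutoff values are arbitrary):

```python
# Sketch: pre-pruning vs. post-pruning of a decision tree.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with a depth / leaf-size cutoff.
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=5, random_state=0)
pre.fit(X_train, y_train)

# Post-pruning: grow the full tree, then prune with cost-complexity pruning.
full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = full.cost_complexity_pruning_path(X_train, y_train).ccp_alphas
post = DecisionTreeClassifier(ccp_alpha=alphas[len(alphas) // 2], random_state=0)
post.fit(X_train, y_train)

print("pre-pruned accuracy: ", pre.score(X_test, y_test))
print("post-pruned accuracy:", post.score(X_test, y_test))
```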
Q9: What are some disadvantages of using Decision Trees and how would you solve them?
1. Overfitting the data:
Definition: given a hypothesis space H, a hypothesis h ∈ H is said to overfit the training
data if there exists some alternative hypothesis h' ∈ H, such that h has smaller error than h'
over the training examples, but h' has smaller error than h over the entire distribution of
instances.
How can we prevent overfitting? Here are some common heuristics:
● Don't try to fit all examples, stop before the training set is exhausted.
● Fit all examples then prune the resultant tree.
How does one tell if a given tree is one that has overfit the data?
● Extract a validation set not used for training from the training set and use this to check for overfitting.
Usually the validation set consists of one-third of the training set, chosen randomly.
● Then use statistical tests, e.g. the chi-squared test, to determine if changing the tree improves its
performance over the validation set.
● A variation of the above is to use MDL (minimum description length) to check if modifying the tree increases its MDL with respect
to the validation set.
If we use the validation set to guide pruning, how do we tell that the tree is not overfitting the validation set? In
this case, we need to extract yet another set called the test set from the training set and use this for the final
check.
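A sketch of that train/validation/test workflow (assuming scikit-learn; here the pruning strength ccp_alpha plays the role of "changing the tree", and the split sizes are arbitrary):

```python
# Sketch: use a validation set to guide pruning, and a separate test set
# for the final, unbiased check.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_rest, y_train, y_rest = train_test_split(X, y, test_size=0.4, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(X_rest, y_rest, test_size=0.5, random_state=0)

full = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
alphas = full.cost_complexity_pruning_path(X_train, y_train).ccp_alphas

# Pick the pruning strength that scores best on the validation set...
best = max(alphas, key=lambda a: DecisionTreeClassifier(ccp_alpha=a, random_state=0)
           .fit(X_train, y_train).score(X_val, y_val))

# ...then report the final estimate on the untouched test set.
final = DecisionTreeClassifier(ccp_alpha=best, random_state=0).fit(X_train, y_train)
print("test accuracy:", final.score(X_test, y_test))
```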
2. Guarding against bad attribute choices:
The information gain measure has a bias that favors attributes with many values over those with only a few.
For example, if you imagine an attribute Date with a unique value for each training example, then Gain(S, Date)
will yield the maximum possible value, H(S), since each partition Sv contains a single example and therefore Σv (|Sv|/|S|) · H(Sv) = 0.
Obviously no other attribute can do better. This will result in a very broad tree of depth 1.
To guard against this, Quinlan uses GainRatio(S,A) instead of Gain(S,A), where

GainRatio(S,A) = Gain(S,A) / SplitInformation(S,A)
SplitInformation(S,A) = − Σv P(Sv) · log2 P(Sv)

with P(Sv) estimated by relative frequency as before.
This introduces a penalty for partitioning the data into many equally sized (equipossible) groups.
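A small illustration of the penalty (plain Python; the toy labels and attributes are invented for the example):

```python
# Sketch: information gain vs. gain ratio. An attribute with a unique value per
# example (like Date) gets a large split information and hence a lower gain ratio.
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

def gain_and_ratio(labels, attribute_values):
    n = len(labels)
    groups = {}
    for value, label in zip(attribute_values, labels):
        groups.setdefault(value, []).append(label)
    gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = -sum(len(g) / n * math.log2(len(g) / n) for g in groups.values())
    return gain, (gain / split_info if split_info > 0 else 0.0)

labels = ["yes", "yes", "no", "no"]
date  = ["d1", "d2", "d3", "d4"]      # unique per example
windy = [True, True, False, False]    # equally predictive, but only two groups
print(gain_and_ratio(labels, date))   # (1.0, 0.5) -> penalized
print(gain_and_ratio(labels, windy))  # (1.0, 1.0) -> preferred by GainRatio
```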
3. Handling continuous valued attributes:
Note that continuous valued attributes can be partitioned into a discrete number of disjoint intervals. Then we
can test for membership to these intervals.
If the Temperature attribute in the PlayTennis example took on continuous values in the range 40-90, note that
we cannot just use the same approach as before.
Why? Because Temperature would then be a bad choice for classification: treated as a discrete attribute, it alone may perfectly classify the
training examples and therefore promise the highest information gain (as in the earlier Date example), while
remaining a poor predictor on the test set.
The solution to this problem is to classify based not on the actual temperature, but on dynamically determined
intervals within which the temperature falls.
For instance, we can introduce boolean attributes T > a, T > b, and T > c
instead of the real-valued T. The thresholds a, b and c can be computed dynamically, based on boundaries where T is likely or
known to change.
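A sketch of how such thresholds can be found, using the Temperature/PlayTennis values as they are usually quoted (plain Python):

```python
# Sketch: derive candidate thresholds c for boolean tests T > c at the
# boundaries where the class label changes between sorted adjacent values.
def candidate_thresholds(values, labels):
    pairs = sorted(zip(values, labels))
    thresholds = []
    for (v1, l1), (v2, l2) in zip(pairs, pairs[1:]):
        if l1 != l2:                          # class changes between neighbours
            thresholds.append((v1 + v2) / 2)  # midpoint as candidate cut point
    return thresholds

temperature = [40, 48, 60, 72, 80, 90]
play_tennis = ["no", "no", "yes", "yes", "yes", "no"]
print(candidate_thresholds(temperature, play_tennis))  # [54.0, 85.0]
```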
4. Handling missing attribute values:
What happens if some of the training examples contain one or more "?", meaning "value not known", instead
of the actual attribute values?
Here are some common ad hoc solutions:
● Substitute "?" by the most common value in that column.
● Substitute "?" by the most common value among all training examples that have been sorted into the
tree at that node.
● Substitute "?" by the most common value among all training examples that have been sorted into the
tree at that node with the same classification as the incomplete example.
● etc.
Quinlan's solution in C4.5 is slightly more complex: instead of a single value, he assigns a distribution over values to
the "?". This distribution is used when computing Gain(S,A): it determines how much of the incomplete example
contributes to each partition's entropy H(Sv).
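A sketch of the simplest of these fixes, substituting "?" by the most common value in the column (plain Python; C4.5's fractional weighting is more involved and not shown):

```python
# Sketch: replace "?" with the most common known value in the same column.
from collections import Counter

def fill_missing(column, missing="?"):
    known = [v for v in column if v != missing]
    most_common = Counter(known).most_common(1)[0][0]
    return [most_common if v == missing else v for v in column]

outlook = ["sunny", "?", "rain", "sunny", "?", "overcast", "sunny"]
print(fill_missing(outlook))  # every "?" becomes "sunny"
```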
5. Handling attributes with differing costs:
Sometimes we want to introduce an additional bias against the selection of certain attributes, e.g. "wherever
possible, try not to use InvasiveTest as an attribute, since this might inconvenience the patient."
Here is another ad hoc heuristic to fix this:
introduce a user-defined measure Cost(A) to specify a fixed bias against each attribute.
Now we can use a CostedGain(S,A), defined along the lines of Tan and Schlimmer's

CostedGain(S,A) = Gain(S,A)^2 / Cost(A)

so that an expensive attribute must promise a correspondingly higher information gain in order to be selected.
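A toy illustration of that bias (plain Python; the gain and cost numbers are made up):

```python
# Sketch: cost-sensitive attribute selection using Gain(S, A)^2 / Cost(A).
def costed_gain(gain, cost):
    return gain ** 2 / cost

attributes = {"Temperature": (0.6, 1.0), "InvasiveTest": (0.9, 10.0)}  # (gain, cost)
for name, (gain, cost) in attributes.items():
    print(name, round(costed_gain(gain, cost), 3))
# InvasiveTest has the higher raw gain, but its cost penalty lets Temperature win.
```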
Q10: How would you define the Stopping Criteria for decision trees?
All decision trees need stopping criteria, or it would be possible, and undesirable, to grow a tree
in which each case occupies its own node. The resulting tree would be computationally
expensive, difficult to interpret and would probably not work very well with new data. For
example, picture training data in which a dotted, highly convoluted decision boundary
accurately separates out the two classes; a simple diagonal line is probably a better
decision boundary for new cases.
The stopping criteria used by CTREE are typical of many decision tree programs.
● Number of cases in the node is less than some pre-specified limit.
● Purity of the node is more than some pre-specified limit. Because CTREE only allows
equal costs, this means that the proportion of cases in the node belonging to the
majority class is more than the pre-specified limit.
● Depth of the node is more than some pre-specified limit.
● Predictor values for all records are identical, in which case no rule could be generated to split
them.
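Expressed as scikit-learn parameters, a rough equivalent of these criteria is sketched below (the values are illustrative, and scikit-learn has no direct purity cutoff; min_impurity_decrease is the closest analogue):

```python
# Sketch: typical stopping criteria as DecisionTreeClassifier parameters.
from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    min_samples_split=20,        # don't split nodes with fewer than 20 cases
    min_samples_leaf=10,         # every leaf must keep at least 10 cases
    max_depth=6,                 # stop once a node would exceed this depth
    min_impurity_decrease=0.01,  # skip splits that barely improve purity
    random_state=0,
)
# Identical predictor values in a node are handled automatically: no split is possible.
```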