Decision Trees

Decision trees are models used for classification and regression tasks, utilizing features to make decisions through a structured tree-like format. The learning process involves selecting features to split data, measuring purity with entropy, and optimizing information gain until stopping criteria are met. Tree ensembles, such as random forests and XGBoost, enhance decision tree performance by combining multiple trees to improve accuracy and reduce overfitting.

Decision trees are a type of model used for making decisions based on a set of features.
Here's an overview:

Cat Classification Example

Consider an example where you want to classify if an animal is a cat or not. You might use
features like ear shape, face shape, and presence of whiskers to make this classification.

Here's an image that shows the decision-making process:

A decision tree consists of:

 Root Node: The initial decision point.

 Decision Nodes: Intermediate nodes that split the data based on features.

 Leaf Nodes: Final nodes that give the prediction.

⚙️ Decision Tree Learning Process

To build a decision tree, you need to learn how to make splits based on the features. For
example, you might start by splitting based on ear shape:

The image above shows how ear shape (pointy vs floppy) can be used to begin classifying
animal faces.

You continue splitting until you reach a point where you can confidently classify the data.
The image above provides a simple and visual guide for distinguishing between cats and other animals.

🤔 Key Decisions in Decision Tree Learning

There are two critical decisions to make when learning a decision tree:

1. How to choose what feature to split on at each node?

 Maximize purity (or minimize impurity).

2. When do you stop splitting?

 When a node is 100% one class.

 When splitting a node will result in the tree exceeding a maximum depth.

 When improvements in purity score are below a threshold.

 When the number of examples in a node is below a threshold.

The image below shows a tree data structure with nodes labeled with "Depth" values, ranging from 0 at the top to 2 at the bottom, illustrating a basic hierarchical structure.

📏 Measuring Purity: Entropy

Entropy is a measure of impurity. In the context of decision trees, it helps determine the
best splits to maximize information gain. Entropy is measured by the following formula:

H(p1) = -p1 log2(p1) - (1 - p1) log2(1 - p1)

where p1 is the fraction of examples that are cats and (1 - p1) is the fraction that are not. By convention, 0 log2(0) is treated as 0, so a completely pure node has entropy 0 and a 50/50 split has entropy 1.

The image below shows a graphical representation of the relationship between p1 and H(p1).
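
As a minimal sketch (not code from the original notes), this is how H(p1) could be computed in Python:

import numpy as np

def entropy(p1):
    # Entropy of a binary node, where p1 is the fraction of positive (cat) examples.
    if p1 == 0 or p1 == 1:
        return 0.0  # a pure node has zero entropy (0 * log2(0) is treated as 0)
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

print(entropy(0.5))    # 1.0  -> a 50/50 split is maximally impure
print(entropy(5 / 6))  # ~0.65
print(entropy(1.0))    # 0.0  -> all examples are cats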

📈 Choosing a Split: Information Gain


Information Gain is used to determine the best feature to split on. It measures how much
information is gained about the target variable, given a specific feature. The general formula
is:

Information Gain = H(p1_root) - ( (n_left / (n_left + n_right)) * H(p1_left) + (n_right / (n_left + n_right)) * H(p1_right) )

Where:

 H(p1_root) is the entropy of the node being split, with p1_root the fraction of cats at that node.

 H(p1_left) is the entropy of the left child node, and n_left is the number of examples that go to the left branch.

 H(p1_right) is the entropy of the right child node, and n_right is the number of examples that go to the right branch.
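
To make the formula concrete, here is a small Python sketch (an illustration, not code from the original notes) that computes the information gain of a candidate split from the 0/1 label arrays of a node and its two branches:

import numpy as np

def entropy(p1):
    # Entropy of a binary node; p1 is the fraction of positive (cat) examples.
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def information_gain(y_node, y_left, y_right):
    # Reduction in entropy from splitting y_node into y_left and y_right,
    # weighted by the fraction of examples sent to each branch.
    w_left = len(y_left) / len(y_node)
    w_right = len(y_right) / len(y_node)
    return entropy(y_node.mean()) - (
        w_left * entropy(y_left.mean()) + w_right * entropy(y_right.mean())
    )

# Example: 10 animals at the node (5 cats), split into 5 left (4 cats) and 5 right (1 cat)
y_node = np.array([1] * 5 + [0] * 5)
y_left = np.array([1, 1, 1, 1, 0])
y_right = np.array([1, 0, 0, 0, 0])
print(information_gain(y_node, y_left, y_right))  # ~0.28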

🧩 Putting It Together

Here’s the process for constructing a decision tree:

 Start with all examples at the root node.

 Calculate information gain for all possible features and pick the one with the highest
information gain.

 Split the dataset according to the selected feature, and create left and right branches
of the tree.

 Keep repeating the splitting process until stopping criteria are met:

 When a node is 100% one class.

 When splitting a node will result in the tree exceeding a maximum depth.

 When the information gain from additional splits is less than a threshold.

 When the number of examples in a node is below a threshold.

The image below shows how face shape can be used to classify cats, with ear shape as a
secondary factor.
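
As a deliberately simplified Python sketch of this recursive procedure (assuming binary, one-hot encoded features and 0/1 labels, with the stopping criteria reduced to purity, maximum depth, and a minimum information gain):

import numpy as np

def entropy(p1):
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def best_split(X, y):
    # Return (feature index, information gain) of the best split over binary features.
    best_f, best_gain = None, 0.0
    for f in range(X.shape[1]):
        left, right = y[X[:, f] == 1], y[X[:, f] == 0]
        if len(left) == 0 or len(right) == 0:
            continue  # this feature does not actually split the data
        gain = entropy(y.mean()) - (
            len(left) / len(y) * entropy(left.mean())
            + len(right) / len(y) * entropy(right.mean())
        )
        if gain > best_gain:
            best_f, best_gain = f, gain
    return best_f, best_gain

def build_tree(X, y, depth=0, max_depth=2, min_gain=1e-6):
    # Stop when the node is pure, the maximum depth is reached, or no split helps enough.
    if len(set(y)) == 1 or depth == max_depth:
        return {"leaf": True, "prediction": int(round(y.mean()))}
    f, gain = best_split(X, y)
    if f is None or gain < min_gain:
        return {"leaf": True, "prediction": int(round(y.mean()))}
    mask = X[:, f] == 1
    return {
        "leaf": False,
        "feature": f,
        "left": build_tree(X[mask], y[mask], depth + 1, max_depth, min_gain),
        "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth, min_gain),
    }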

Using One-Hot Encoding for Categorical Features

When dealing with categorical features, it's common to use one-hot encoding.

One-Hot Encoding: If a categorical feature can take on n values, create n binary features (0
or 1 valued).

Here's an example of how to apply one-hot encoding to the ear shape feature:

Ear shape | Pointy | Floppy | Oval
Pointy    |   1    |   0    |   0
Floppy    |   0    |   1    |   0
Oval      |   0    |   0    |   1

The image below shows a table with cat icons and features that have been one-hot encoded.
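
In practice this encoding is usually done with a library call rather than by hand. For example, a minimal pandas sketch (the DataFrame and column name here are assumed for illustration):

import pandas as pd

# Hypothetical ear-shape values for a few animals
df = pd.DataFrame({"ear_shape": ["Pointy", "Floppy", "Oval", "Pointy"]})

# One 0/1 column is created per category value
encoded = pd.get_dummies(df, columns=["ear_shape"], dtype=int)
print(encoded)  # columns: ear_shape_Floppy, ear_shape_Oval, ear_shape_Pointy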

⚖️ Continuous Valued Features

For continuous features, you can split the data based on a threshold. For example, if you have a weight feature, you might split the data into examples where the weight is less than or equal to a certain value, and examples where the weight is greater than that value. To pick the threshold, you typically try many candidate values (for example, midpoints between consecutive sorted values) and choose the one that yields the highest information gain. The image below shows how a weight feature can be used to classify an animal.
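
A rough Python sketch of this idea, trying midpoints between consecutive sorted weights and keeping the threshold with the highest information gain (the weights and labels below are made up for illustration):

import numpy as np

def entropy(p1):
    if p1 == 0 or p1 == 1:
        return 0.0
    return -p1 * np.log2(p1) - (1 - p1) * np.log2(1 - p1)

def best_threshold(values, y):
    # Try midpoints between consecutive sorted values; return (threshold, information gain).
    order = np.argsort(values)
    values, y = values[order], y[order]
    best_t, best_gain = None, 0.0
    for i in range(len(values) - 1):
        t = (values[i] + values[i + 1]) / 2
        left, right = y[values <= t], y[values > t]
        if len(left) == 0 or len(right) == 0:
            continue
        gain = entropy(y.mean()) - (
            len(left) / len(y) * entropy(left.mean())
            + len(right) / len(y) * entropy(right.mean())
        )
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

weights = np.array([7.2, 8.4, 9.2, 10.2, 11.0, 13.4, 15.0, 17.0, 18.0, 20.0])
is_cat  = np.array([1,   1,   1,   1,    1,    0,    0,    0,    0,    0])
print(best_threshold(weights, is_cat))  # threshold ~12.2 separates the two classes here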

🌲 Regression Trees (Optional)

Decision trees can also be used for regression tasks, where the goal is to predict a continuous value. In this case, each leaf node contains the average value of the target variable for the training examples that fall into that node, and splits are chosen to maximize the reduction in the variance of the target values rather than information gain based on entropy.

The image below shows how ear and face shape could be used to predict a dog's weight.
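
For instance, a minimal scikit-learn sketch (the features and weights below are made-up values, not data from the notes):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

# Columns: [pointy ears (0/1), round face (0/1)]; target: animal weight in lbs
X = np.array([[1, 1], [1, 0], [0, 1], [0, 0], [1, 1], [0, 0]])
y = np.array([7.2, 8.4, 9.2, 15.0, 7.6, 17.0])

model = DecisionTreeRegressor(max_depth=2)
model.fit(X, y)

# Each leaf predicts the mean target value of the training examples that reach it
print(model.predict(np.array([[1, 1], [0, 0]])))  # roughly [7.4, 16.0]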

🌳🌲🌴 Tree Ensembles

To improve the performance and robustness of decision trees, you can use tree ensembles.
These methods combine multiple decision trees to make predictions. The image below
shows the faces of a group of cats.

Tree ensembles can reduce overfitting and improve generalization performance.

Here's how a prediction might look using a tree ensemble:
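
As a stand-in for that figure, here is a tiny Python sketch of majority voting across an ensemble (the individual tree votes are assumed purely for illustration):

import numpy as np

# Hypothetical predictions from three trees for one animal (1 = cat, 0 = not cat)
tree_votes = np.array([1, 0, 1])

# The ensemble classifies by majority vote across its trees
ensemble_prediction = int(tree_votes.sum() > len(tree_votes) / 2)
print(ensemble_prediction)  # 1 -> cat, because 2 of the 3 trees voted "cat"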


🌳 Tree Ensembles

Sampling with Replacement

Sampling with replacement involves selecting items from a dataset where each item, once
chosen, is returned to the pool, allowing it to be selected again.

Here's an image demonstrating the concept of sampling with replacement:

This image displays a grid of colored blocks, each representing an item that can be sampled.
The colors are randomly distributed, illustrating how, with replacement, the same item
(color) can be chosen multiple times in a sample.
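
A minimal NumPy sketch of sampling with replacement to build one new training set (X_train and y_train are assumed to exist, as in the code snippets later in these notes):

import numpy as np

rng = np.random.default_rng(0)
m = 10  # size of the original training set

# Draw m indices WITH replacement: some examples repeat, some are left out
indices = rng.choice(m, size=m, replace=True)
print(indices)

# The bootstrap training set is then X_train[indices], y_train[indices]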

🌲 Random Forest Algorithm

The random forest algorithm is an ensemble learning method that operates by constructing
multiple decision trees during training and outputting the mode of the classes (classification)
or mean prediction (regression) of the individual trees.

The process can be described as follows:

1. Given a training set of size m.

2. For b = 1 to B, where B is the number of trees in the ensemble:

 Use sampling with replacement to create a new training set of size m.

 Train a decision tree on the new dataset.

Randomizing the Feature Choice

At each node, when choosing a feature to use to split:

1. If n features are available, pick a random subset of k < n features (a common choice is k ≈ √n).

2. Allow the algorithm to only choose from that subset of features.
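
For example, a scikit-learn sketch of a random forest (X_train, y_train, and X_test are assumed to exist, matching the XGBoost snippets later in these notes):

from sklearn.ensemble import RandomForestClassifier

# 100 trees, each trained on a bootstrap sample; every split considers only a
# random subset of features (here the square root of the total number of features)
model = RandomForestClassifier(n_estimators=100, max_features="sqrt")
model.fit(X_train, y_train)
y_pred = model.predict(X_test)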

🚀 XGBoost (eXtreme Gradient Boosting)

XGBoost is an optimized, distributed gradient boosting library designed to be highly efficient, flexible, and portable. It implements machine learning algorithms under the Gradient Boosting framework. XGBoost provides parallel tree boosting (also known as a Gradient Boosting Machine) that solves many data science problems in a fast and accurate way.

Boosted Trees Intuition

1. Given a training set of size m.

2. For b = 1 to B, where B is the number of trees:

 Use sampling with replacement to create a new training set of size m. But
instead of picking from all examples with equal (1/m) probability, make it
more likely to pick examples that the previously trained trees misclassify.

 Train a decision tree on the new dataset.
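
The NumPy sketch below illustrates only this intuition of preferentially sampling previously misclassified examples; it is not how XGBoost is actually implemented (XGBoost uses gradient-based boosting rather than resampling), and the misclassification flags and weighting factor are assumed for illustration:

import numpy as np

rng = np.random.default_rng(0)
m = 10

# Hypothetical flags marking which examples the trees built so far get wrong
misclassified = np.array([0, 1, 0, 0, 1, 1, 0, 0, 0, 1])

# Give misclassified examples a larger (arbitrary, illustrative) sampling weight
weights = np.where(misclassified == 1, 3.0, 1.0)
probs = weights / weights.sum()

# Sample with replacement, but biased toward the hard examples
indices = rng.choice(m, size=m, replace=True, p=probs)
print(indices)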

XGBoost Features:

 Open-source implementation of boosted trees

 Fast, efficient implementation

 Good choice of default splitting criteria and criteria for when to stop splitting

 Built-in regularization to prevent overfitting

 Highly competitive algorithm for machine learning competitions (e.g., Kaggle competitions)

This image shows a decision tree diagram for classifying animals as cats or not-cats based on physical characteristics.

Here are the code snippets for using XGBoost in classification and regression tasks:

# Classification
from xgboost import XGBClassifier

model = XGBClassifier()
model.fit(X_train, y_train)       # train on the labeled training set
y_pred = model.predict(X_test)    # predicted class labels

# Regression
from xgboost import XGBRegressor

model = XGBRegressor()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)    # predicted continuous values

🤔 When to Use Decision Trees

Decision Trees vs Neural Networks

Feature | Decision Trees and Tree Ensembles | Neural Networks
Data Type | Works well on tabular (structured) data | Works well on all types of data, including tabular and unstructured data
Data Recommendation | Not recommended for unstructured data (images, audio, text) |
Speed | Fast | May be slower than a decision tree
Transfer Learning | | Works with transfer learning
Interpretability | Small decision trees may be human interpretable |
System Building | | When building a system of multiple models working together, it may be easier to string together multiple neural networks
