Decision Trees in Data Mining
Decision trees are a powerful data mining technique used to predict
outcomes. They resemble a flowchart, branching out from a root node
to predict a target variable based on input features.
Introduction to Decision Trees
1 Tree-like Structure
Decision trees visually represent a series of decisions leading to a conclusion.
2 Predictive Power
They predict target variables based on input features, creating a hierarchy of decisions.
3 Human-Readable
Their simplicity and visual nature make them understandable to humans, facilitating decision-making.
Advantages of Decision Trees
Easy to Interpret
Their intuitive structure and clear rules make them easily interpretable, even for non-technical users.
Handling Mixed Data
Decision trees can effectively handle both categorical and numerical data, making them versatile.
Non-Parametric Method
They don't require assumptions about the data distribution, making them adaptable to various data types.
Key Components of a Decision Tree
Root Node
The starting point of the tree, representing the entire dataset.
Internal Nodes
Represent features used to split the data, leading to different branches.
Leaf Nodes
The final nodes, representing the predicted outcome based on the feature path.
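As a rough sketch (not part of the original slides), these three components can be modeled with a small Python class; the class and field names below are illustrative, not a standard API.

# Minimal, illustrative node structure for a binary decision tree.
class TreeNode:
    def __init__(self, feature=None, threshold=None, left=None, right=None, prediction=None):
        self.feature = feature        # feature index used to split (root and internal nodes)
        self.threshold = threshold    # split threshold for that feature
        self.left = left              # subtree where feature value <= threshold
        self.right = right            # subtree where feature value > threshold
        self.prediction = prediction  # set only on leaf nodes: the predicted outcome

    def is_leaf(self):
        return self.prediction is not None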
Information Gain and Entropy
Entropy
Measures the impurity or disorder in a dataset.
Information Gain
Determines the best feature to split the data at each node.
Decision Node
The feature with the highest information gain is selected
for splitting the data.
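A minimal Python sketch of both measures, assuming NumPy; the threshold-based split and variable names are illustrative.

import numpy as np

def entropy(labels):
    # Entropy H(S) = -sum(p * log2(p)) over the class proportions p.
    if len(labels) == 0:
        return 0.0
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(labels, feature_values, threshold):
    # Gain = entropy of the parent minus the weighted entropy of the
    # two subsets produced by splitting on feature_values <= threshold.
    left = labels[feature_values <= threshold]
    right = labels[feature_values > threshold]
    weighted = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
    return entropy(labels) - weighted

# Toy usage: how much does splitting on age at 40 reduce impurity?
labels = np.array(["yes", "yes", "no", "no", "yes"])
ages = np.array([25, 32, 47, 51, 29])
print(information_gain(labels, ages, threshold=40))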
Building a Decision Tree
1 Data Preparation
The dataset is cleaned, preprocessed, and divided into training and testing sets.
2 Tree Induction
The dataset is split recursively based on information gain until stopping criteria are met.
3 Tree Evaluation
The built tree is tested on unseen data to measure its accuracy and performance.
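A compact sketch of these three steps, assuming scikit-learn and its bundled Iris dataset (the slides do not name a specific library or dataset).

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Data preparation: load a sample dataset and split it into training and testing sets.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Tree induction: recursive splitting driven by an impurity criterion (entropy here).
tree = DecisionTreeClassifier(criterion="entropy", random_state=42)
tree.fit(X_train, y_train)

# Tree evaluation: measure accuracy on the unseen test set.
print("Test accuracy:", accuracy_score(y_test, tree.predict(X_test)))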
Pruning and Overfitting
Pruning
Removing branches from the tree to prevent overfitting and improve generalization.
Overfitting
Occurs when the tree becomes too complex, memorizing the training data instead of learning general patterns.
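One possible way to prune in practice is scikit-learn's cost-complexity pruning via ccp_alpha; the sketch below uses the Iris data again, and the alpha value is illustrative rather than tuned.

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Fully grown tree: splits until leaves are (nearly) pure, which risks overfitting.
full_tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)

# Cost-complexity pruning: branches whose improvement does not justify their
# added complexity are removed (alpha chosen for illustration only).
pruned_tree = DecisionTreeClassifier(ccp_alpha=0.01, random_state=42).fit(X_train, y_train)

print("Leaves before pruning:", full_tree.get_n_leaves())
print("Leaves after pruning: ", pruned_tree.get_n_leaves())
print("Test accuracy (pruned):", pruned_tree.score(X_test, y_test))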
Classification and Regression Trees
Classification Trees
Predict categorical outcomes, like predicting whether a customer will churn.
Regression Trees
Predict numerical outcomes, like predicting the price of a house.
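A small sketch contrasting the two tree types, assuming scikit-learn; the churn and house-price values below are made-up toy data for illustration.

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Classification tree: predicts a categorical outcome (e.g. churn vs. stay).
X_cls = [[25, 1], [40, 0], [30, 1], [55, 0]]   # toy features, e.g. [age, on_promo_plan]
y_cls = ["churn", "stay", "churn", "stay"]
clf = DecisionTreeClassifier().fit(X_cls, y_cls)
print(clf.predict([[35, 1]]))                  # -> a class label

# Regression tree: predicts a numerical outcome (e.g. house price).
X_reg = [[1200], [1500], [2000], [2500]]       # toy feature, e.g. [square_feet]
y_reg = [200000, 250000, 320000, 400000]
reg = DecisionTreeRegressor().fit(X_reg, y_reg)
print(reg.predict([[1800]]))                   # -> a number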
Interpreting Decision Trees
1 Path Tracing
Following a specific path from the root to a leaf node reveals the decision-making process.
2 Feature Importance
The depth and frequency of a feature in the tree indicate its importance in making predictions.
3 Decision Rules
Each path can be translated into a set of decision rules, making the model transparent and understandable.
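A brief sketch of how decision rules and feature importance can be read off a fitted tree, again assuming scikit-learn and the Iris dataset.

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=3, random_state=42).fit(iris.data, iris.target)

# Decision rules: each root-to-leaf path printed as a readable if/else chain.
print(export_text(tree, feature_names=list(iris.feature_names)))

# Feature importance: how much each feature contributes to the splits.
for name, importance in zip(iris.feature_names, tree.feature_importances_):
    print(f"{name}: {importance:.2f}")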
Real-world Applications of Decision Trees
Medical Diagnosis
Decision trees are used to predict diseases based on patient symptoms.
Financial Modeling
They can be used to predict credit risk or stock prices.
E-commerce
Used to recommend products based on customer behavior.