What is a Decision Tree?
A decision tree is a flowchart-like structure used to make decisions or
predictions. Each internal node represents a test on an attribute, each branch
represents an outcome of that test, and each leaf node represents a final
outcome or prediction: a class label for classification, or a continuous value
for regression.
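To make the structure concrete, here is a minimal Python sketch of a hand-built tree for a hypothetical "play tennis?" decision; the attributes ("outlook", "humidity"), the threshold, and the labels are invented for illustration. Each if test is an internal node, each branch of the if is a test outcome, and each return is a leaf.

```python
# A minimal hand-built decision tree (illustrative attributes and labels).
def predict_play_tennis(sample):
    if sample["outlook"] == "sunny":        # internal node: test on "outlook"
        if sample["humidity"] > 70:         # internal node: test on "humidity"
            return "no"                     # leaf: class label
        return "yes"                        # leaf: class label
    elif sample["outlook"] == "rainy":
        return "no"                         # leaf: class label
    return "yes"                            # leaf: class label ("overcast" branch)

print(predict_play_tennis({"outlook": "sunny", "humidity": 85}))  # -> no
```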
Advantages of Decision Trees
• Simplicity and Interpretability: Decision trees are easy to understand and
interpret. The visual representation closely mirrors human decision-making
processes.
• Versatility: Can be used for both classification and regression tasks.
• No Need for Feature Scaling: Decision trees do not require normalization
or scaling of the data (demonstrated in the sketch after this list).
• Handles Non-linear Relationships: Capable of capturing non-linear
relationships between features and target variables.
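As a rough demonstration of the scaling, versatility, and non-linearity points above, the sketch below fits both a classifier and a regressor on deliberately unscaled features. The synthetic data and hyperparameters are arbitrary choices for illustration, and scikit-learn is assumed to be available.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 1000, size=(200, 2))   # unscaled features, no normalization

# Classification task: step-function (non-linear) target
y_class = (X[:, 0] > 500).astype(int)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y_class)
print("classification accuracy:", clf.score(X, y_class))

# Regression task: smooth non-linear target
y_reg = np.sin(X[:, 0] / 200.0)
reg = DecisionTreeRegressor(max_depth=4).fit(X, y_reg)
print("regression R^2:", reg.score(X, y_reg))
```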
Disadvantages of Decision Trees
• Overfitting: Decision trees can easily overfit the training data, especially
when they grow deep with many nodes (see the sketch after this list).
• Instability: Small variations in the data can result in a completely different
tree being generated.
• Bias towards Features with More Levels: Features with more levels can
dominate the tree structure.
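A small sketch of the overfitting point, using synthetic noisy labels and scikit-learn (both assumptions, not from the original text): an unconstrained tree typically scores near-perfectly on the training set but worse on held-out data, while capping max_depth narrows that gap.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
X = rng.normal(size=(400, 5))
y = (X[:, 0] + 0.5 * rng.standard_normal(400) > 0).astype(int)  # noisy target
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_tr, y_tr)

# The deep tree usually fits the training set perfectly but generalizes worse.
print("deep    train/test:", deep.score(X_tr, y_tr), deep.score(X_te, y_te))
print("shallow train/test:", shallow.score(X_tr, y_tr), shallow.score(X_te, y_te))
```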
Example
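As one possible worked example, the sketch below fits a shallow scikit-learn tree on the well-known Iris dataset (an assumed dataset choice) and prints the learned rules as text, so the node/branch/leaf structure described above is visible directly.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(iris.data, iris.target)

# Indented "|---" lines are branches (test outcomes); "class:" lines are leaves.
print(export_text(tree, feature_names=list(iris.feature_names)))
```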
Points to Remember
● While building a decision tree, examine the given dataset carefully and try
to work out what pattern the output leaves follow. Pick one output value and
identify the attribute values that all samples with that output share.
● Datasets often contain redundant attributes that add no value when building
a decision tree. Note down which parameters directly affect the output and
build the tree from those alone (the sketch after this list shows one way to
check this).
● Multiple decision trees may yield correct predictions for the same dataset.
Choose the simplest one as the best.
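One way to check which parameters actually affect the output, as the second point above suggests, is a fitted tree's feature_importances_. The sketch below uses synthetic data where only the first feature determines the label (an invented setup, with scikit-learn assumed).

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(300, 4))
y = (X[:, 0] > 0).astype(int)            # only feature 0 affects the output

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
for name, imp in zip(["f0", "f1", "f2", "f3"], tree.feature_importances_):
    print(f"{name}: {imp:.3f}")          # f0 dominates; the rest are near zero
```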