
Decision Trees in Machine Learning

Trees occupy an important place in the life of man. They provide us with
flowers, fruits, fodder for animals, wood for fire and furniture, and cool
shade from the scorching sun. They give us so many good things and yet
expect nothing in return. Besides being such an important element for the
survival of human beings, trees have also inspired a wide variety of algorithms
in Machine Learning, for both classification and regression.

Representation of Algorithm as a Tree


The Decision Tree learning algorithm generates decision trees from training
data to solve classification and regression problems. Suppose you would like to
go outside for a game of tennis. The question is how one would decide
whether conditions are ideal for a game of tennis. This depends on
various factors like time, weather, temperature, etc. We call these factors
features, and they influence our decision. If you recorded all the factors
and the decision you took each day, you would get a table something like this.
Data collected over the last 14 days

With this table, other people would be able to use your intuition to decide
whether they should play tennis by looking up what you did given a certain
weather pattern. But even after just 14 days, it is a little unwieldy to match the
current weather situation against one of the rows in the table. We can instead
represent this tabular data in the form of a tree.
Creating a tree from the data

Here all the information is represented in the form of a tree. Each rectangular box
represents a node of the tree. The data is split by asking a question at a node,
and the branches represent the possible answers to that question. The end nodes
are the leaves; they represent the classes into which the data can be classified.
The two classes in this example are Yes and No. Thus, to obtain the class (the
final output), ask the question at the root node and follow the branch
corresponding to the answer, repeating until you reach a leaf node.

Algorithm
1. Start with a training data set, which we will call S. It should have
attributes and a classification for each example.

2. Determine the best attribute in the dataset. (We will go over the
definition of the best attribute shortly.)

3. Split S into subsets, one for each possible value of the best
attribute.

4. Make a decision tree node that contains the best attribute.

5. Recursively generate new decision trees using the subsets of data
created in step 3, until a stage is reached where you cannot classify
the data further. Represent each class as a leaf node. (A code sketch
of this recursion is given below.)
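A minimal sketch of this recursive procedure in Python is given below. The dictionary-based representation of the examples, the attribute names, and the helper best_attribute (which picks the attribute with the highest information gain and is sketched after the Information Gain section) are assumptions made only for illustration.

```python
from collections import Counter

def build_tree(examples, attributes, target="PlayTennis"):
    """Recursively build a decision tree (ID3-style sketch).

    examples   : list of dicts, e.g. {"Outlook": "Sunny", ..., "PlayTennis": "No"}
    attributes : attribute names still available for splitting
    """
    labels = [ex[target] for ex in examples]

    # Case 1: all examples share one class -- return that class as a leaf.
    if len(set(labels)) == 1:
        return labels[0]

    # Case 2: no attributes left to split on -- return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]

    # Pick the attribute with the highest information gain (sketched later).
    best = best_attribute(examples, attributes, target)

    # Create a node for the best attribute and recurse on each subset.
    node = {best: {}}
    for value in set(ex[best] for ex in examples):
        subset = [ex for ex in examples if ex[best] == value]
        remaining = [a for a in attributes if a != best]
        node[best][value] = build_tree(subset, remaining, target)
    return node
```

Calling build_tree on the 14-day table would return a nested dictionary whose top-level key is the attribute chosen for the root.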

In the Decision Tree algorithm, "best" means the attribute which has the
most information gain.
In the figure, the left split has less information gain because each of the resulting
groups contains almost equal numbers of '+' and '-' examples, while the split on the
right puts more '+' examples in one group and more '-' examples in the other. In order to
calculate the best attribute we will use Entropy.

Entropy
In the machine learning sense, and especially in this case, entropy is a measure of
the impurity of the data, i.e. how far it is from being homogeneous. For a two-class
problem its value ranges from 0 to 1: it is close to 0 if all the examples belong to
the same class, and close to 1 if the data is split almost equally between the
classes. The formula to calculate entropy is:

Entropy(S) = - Σ_{i=1}^{c} p_i log2(p_i)

Here p_i represents the proportion of the data with the i-th classification and c
represents the number of different classifications.
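As a rough illustration, the entropy of a list of class labels could be computed like this (the function name and label format are chosen only for this sketch):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a list of class labels, e.g. ["Yes", "No", "Yes", ...]."""
    total = len(labels)
    counts = Counter(labels)
    # Sum -p_i * log2(p_i) over every class present in the labels.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())
```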

Information Gain measures the reduction in entropy achieved by classifying the
data on a particular attribute. The formula for the gain obtained by splitting the
dataset S on the attribute A is:

Gain(S, A) = Entropy(S) - Σ_{v ∈ Values(A)} (|S_v| / |S|) Entropy(S_v)

Here Entropy(S) represents the entropy of the dataset, and the second term on
the right is the weighted entropy of the subsets S_v obtained after the split.
The goal is to maximize this information gain: the attribute with the maximum
information gain is selected as the node, and the data is then split on that node.
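Building on the entropy sketch above, information gain and the choice of the best attribute might be computed as follows (again only a sketch; the dictionary-based example format is an assumption):

```python
def information_gain(examples, attribute, target="PlayTennis"):
    """Gain(S, A) = Entropy(S) - sum over values v of (|S_v| / |S|) * Entropy(S_v)."""
    all_labels = [ex[target] for ex in examples]
    weighted = 0.0
    for value in set(ex[attribute] for ex in examples):
        subset_labels = [ex[target] for ex in examples if ex[attribute] == value]
        weighted += (len(subset_labels) / len(examples)) * entropy(subset_labels)
    return entropy(all_labels) - weighted

def best_attribute(examples, attributes, target="PlayTennis"):
    """Return the attribute with the maximum information gain."""
    return max(attributes, key=lambda a: information_gain(examples, a, target))
```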

Computing the entropy of the dataset, Entropy(S), and then applying the
information gain formula to each of the attributes, we find that the Outlook
attribute has the maximum information gain, and hence it is placed at the top of
the tree.
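The exact numbers depend on the table above, which is not reproduced here. Assuming the classic 14-day play-tennis data with 9 "Yes" and 5 "No" examples, the dataset entropy can be checked with the entropy sketch from earlier:

```python
# Assumption: 9 "Yes" and 5 "No" rows in the 14-day table.
labels = ["Yes"] * 9 + ["No"] * 5
print(round(entropy(labels), 3))   # prints 0.94
```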

Problem with Continuous Data

One question that may arise is how the data is split in the case of continuous
data. Suppose there is an attribute temperature whose values range from 10 to 45
degrees Celsius. We cannot make a split on every individual value. What we can do
in this case is group the continuous values into discrete ranges, for example

10-20 as class1, 20-30 as class2, and so on, with each value assigned to the class
whose range contains it.
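A simple way to express this kind of binning in Python might be the following (the bin boundaries continue the 10-20, 20-30 pattern mentioned above; the third range is an assumption):

```python
def temperature_bin(value):
    """Map a continuous temperature (10-45 degrees Celsius) to a discrete class."""
    if 10 <= value < 20:
        return "class1"
    if 20 <= value < 30:
        return "class2"
    if 30 <= value <= 45:
        return "class3"
    raise ValueError(f"temperature {value} is outside the expected 10-45 range")
```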

Avoiding overfitting the data

The main problem with decision trees is that they are prone to overfitting. If we
keep growing a tree until it classifies the training data perfectly, or until no
attribute is left to split on, it will work well on the training dataset but give
bad results on the testing dataset. There are two popular approaches to avoid this
in decision trees: stop growing the tree before it becomes too large, or prune the
tree after it becomes too large. Typically, a limit on a decision tree's growth is
specified in terms of the maximum number of layers, or depth, it is allowed to
have. The data available to train the decision tree is split into a training set
and a test set, and trees with various maximum depths are created from the
training set and tested against the test set. Cross-validation can be used as part
of this approach as well. Pruning the tree, on the other hand, involves testing
the original tree against pruned versions of it. Leaf nodes are removed from the
tree as long as the pruned tree performs better against the test data than the
larger tree.
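As a rough illustration of the depth-limiting approach, the sketch below uses scikit-learn; the dataset and the list of depths are placeholders:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Any labelled dataset works here; the iris data is used only as a placeholder.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Grow trees of various maximum depths and compare them on the held-out test set.
for depth in [1, 2, 3, 5, None]:
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    print(depth, round(tree.score(X_test, y_test), 3))
```

The depth that gives the best test-set score would then be kept; a depth of None lets the tree grow until the training data is classified perfectly, which typically shows the overfitting described above.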
