Data Analytics
Session 4-5: Decision Trees / Classification and Regression Trees
Arghya Ray
Decision Tree
• A decision tree is a popular classification method that results in a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf represents a class.
• A decision tree is a model that is both predictive and descriptive.
• Advantages:
• The decision tree approach is widely used since it is efficient and can deal with both continuous and categorical variables.
• It can deal with missing values in the training data and can tolerate some errors (noise) in the data.
• It is perhaps the best approach when each attribute takes only a small number of possible values.
• Disadvantages:
• Decision trees are less appropriate for predicting the values of a continuous variable, such as a share price or an interest rate.
• Decision trees can lead to a large number of errors if the number of training examples per class is small.
• The complexity of a decision tree increases as the number of attributes increases.
• Measuring the quality of a decision tree is an interesting problem in itself. Classification accuracy on test data is obviously a good measure, but other measures, such as the average cost and the worst-case cost of classifying an object, may also be used.
Picture taken from Velocity Business Solutions. Link: https://www.vebuso.com/2020/01/decision-tree-intuition-from-concept-to-application/
• Decision trees can also be used to visualize classification rules.
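As a quick illustration (scikit-learn is not named in these slides, so treat this as an assumed tool choice), a minimal sketch of fitting and printing such a flowchart-like tree:

```python
# Minimal sketch: fit a decision tree and print its flowchart of attribute
# tests. The tiny dataset below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Four records, two numeric attributes (income, age), binary class label.
X = [[25, 40], [80, 35], [30, 22], [95, 50]]
y = [0, 1, 0, 1]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(X, y)

# Each printed line is a test on an attribute; leaves show the class.
print(export_text(tree, feature_names=["income", "age"]))
```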
Classification and Regression Trees
Goal: Classify or predict an outcome based on a set of predictors. The output is a set of rules.
Example:
• Goal: classify a record as "will accept credit card offer" or "will not accept".
• A rule might be: "IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (non-acceptor)".
• Recursive partitioning: Repeatedly split the records into two parts so as to achieve maximum homogeneity within the new
parts
Recursive partitioning steps (a code sketch follows this list):
• Pick one of the predictor variables, x_i
• Pick a value of x_i, say s_i, that divides the training data into two (not necessarily equal) portions
• Measure how "pure" or homogeneous each of the resulting portions is
• "Pure" = containing records of mostly one class
• The algorithm tries different variables x_i and different split values s_i to maximize purity in a split
• After you get a "maximum purity" split, repeat the process for a second split, and so on
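A minimal sketch of one such split search, assuming numeric predictors and using the Gini index defined later in this deck; all names here are illustrative:

```python
# Sketch of one recursive-partitioning step: try every (variable, value)
# split and keep the one whose two parts are purest on average.
from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels (0 when all labels agree).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """rows: list of numeric feature vectors; labels: class per row."""
    best = None  # (weighted impurity, variable index i, split value s)
    n = len(rows)
    for i in range(len(rows[0])):           # pick a predictor x_i
        for s in {r[i] for r in rows}:      # pick a candidate value s_i
            left  = [lab for r, lab in zip(rows, labels) if r[i] <= s]
            right = [lab for r, lab in zip(rows, labels) if r[i] >  s]
            if not left or not right:       # skip splits that separate nothing
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, i, s)
    return best

# Example: the best split (on variable 0 at value 60) is perfectly pure.
rows   = [[92, 1], [95, 2], [60, 1], [55, 3]]
labels = [1, 1, 0, 0]
print(best_split(rows, labels))   # -> (0.0, 0, 60)
```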
Forming a tree from the given example
RID | Age   | Income | Student | Credit rating | Class (buys computer)
----|-------|--------|---------|---------------|----------------------
1   | <=30  | High   | No      | Fair          | No
2   | <=30  | High   | No      | Excellent     | No
3   | 31-40 | High   | No      | Fair          | Yes
4   | >40   | Medium | No      | Fair          | Yes
5   | >40   | Low    | Yes     | Fair          | Yes
6   | >40   | Low    | Yes     | Excellent     | No
7   | 31-40 | Low    | Yes     | Excellent     | Yes
8   | <=30  | Medium | No      | Fair          | No
9   | <=30  | Low    | Yes     | Fair          | Yes
10  | >40   | Medium | Yes     | Excellent     | Yes
11  | <=30  | Medium | Yes     | Excellent     | Yes
12  | 31-40 | Medium | No      | Excellent     | Yes
13  | 31-40 | High   | Yes     | Fair          | Yes
14  | >40   | Medium | No      | Excellent     | No
Splitting on Age gives three partitions:

Age <= 30:
Income | Student | Credit rating | Class
High   | No      | Fair          | No
High   | No      | Excellent     | No
Medium | No      | Fair          | No
Low    | Yes     | Fair          | Yes
Medium | Yes     | Excellent     | Yes

Age 31-40 (pure: every record is "Yes", so this branch becomes a leaf):
Income | Student | Credit rating | Class
High   | No      | Fair          | Yes
Low    | Yes     | Excellent     | Yes
Medium | No      | Excellent     | Yes
High   | Yes     | Fair          | Yes

Age > 40:
Income | Student | Credit rating | Class
Medium | No      | Fair          | Yes
Low    | Yes     | Fair          | Yes
Low    | Yes     | Excellent     | No
Medium | Yes     | Excellent     | Yes
Medium | No      | Excellent     | No
Measuring Impurity
• Gini index (measure of impurity)
  For a rectangle (partition) A with m classes, let p_k be the proportion of cases in A that belong to class k. Then
  I(A) = 1 - \sum_{k=1}^{m} p_k^2
  I(A) = 0 when all cases belong to the same class (most pure).
• Entropy (measure of impurity)
  With p_k as above,
  entropy(A) = - \sum_{k=1}^{m} p_k \log_2(p_k)
  Entropy ranges between 0 (most pure) and \log_2(m) (equal representation of all m classes).
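A minimal sketch of both measures in Python, matching the formulas above (function names are illustrative):

```python
# Both impurity measures from this slide, computed from a list of class labels.
from math import log2
from collections import Counter

def gini_index(labels):
    """I(A) = 1 - sum_k p_k^2; equals 0 when all cases share one class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """entropy(A) = -sum_k p_k log2(p_k); ranges from 0 up to log2(m)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini_index(["yes"] * 4))         # 0.0  (pure rectangle)
print(entropy(["yes", "no"]))          # 1.0  (two classes, equal shares)
print(entropy(["a", "b", "c", "d"]))   # 2.0  = log2(4), maximum for 4 classes
```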
Using the principle of 'information entropy', build a decision tree from the training data given below. Divide the 'credit rating' attribute into ranges as follows: (0,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], (2.0,5.0]
Sr. No. | Profession | Credit rating | Class
--------|------------|---------------|---------------------------
1       | Business   | 1.60          | Buys only laptop
2       | Service    | 2.00          | Buys laptop with CD writer
3       | Business   | 1.90          | Buys laptop with printer
4       | Business   | 1.88          | Buys laptop with printer
5       | Business   | 1.70          | Buys only laptop
6       | Service    | 1.85          | Buys laptop with printer
7       | Business   | 1.60          | Buys only laptop
8       | Service    | 1.70          | Buys only laptop
9       | Service    | 2.20          | Buys laptop with CD writer
10      | Service    | 2.10          | Buys laptop with CD writer
11      | Business   | 1.80          | Buys laptop with printer
12      | Service    | 1.95          | Buys laptop with printer
13      | Business   | 1.90          | Buys laptop with printer
14      | Business   | 1.80          | Buys laptop with printer
15      | Business   | 1.75          | Buys laptop with printer
Splitting on Profession (original Sr. No. retained):

Profession = Business (9 records):
Sr. No. | Credit rating | Class
1       | 1.60          | Buys only laptop
3       | 1.90          | Buys laptop with printer
4       | 1.88          | Buys laptop with printer
5       | 1.70          | Buys only laptop
7       | 1.60          | Buys only laptop
11      | 1.80          | Buys laptop with printer
13      | 1.90          | Buys laptop with printer
14      | 1.80          | Buys laptop with printer
15      | 1.75          | Buys laptop with printer

Profession = Service (6 records):
Sr. No. | Credit rating | Class
2       | 2.00          | Buys laptop with CD writer
6       | 1.85          | Buys laptop with printer
8       | 1.70          | Buys only laptop
9       | 2.20          | Buys laptop with CD writer
10      | 2.10          | Buys laptop with CD writer
12      | 1.95          | Buys laptop with printer
Splitting on Credit Rating (first two ranges shown):

Credit rating in (0,1.6]:
Sr. No. | Profession | Class
1       | Business   | Buys only laptop
7       | Business   | Buys only laptop

Credit rating in (1.6,1.7]:
Sr. No. | Profession | Class
5       | Business   | Buys only laptop
8       | Service    | Buys only laptop
• Initially there are 3 classes: buys only laptop (4 records), buys laptop with CD writer (3 records), buys laptop with printer (8 records).
• Initial overall entropy:
  E0 = -(4/15) \log_2(4/15) - (3/15) \log_2(3/15) - (8/15) \log_2(8/15) ≈ 1.457
• Based on Profession: 9 Business, 6 Service.
  Entropy(Business) = -(3/9) \log_2(3/9) - (6/9) \log_2(6/9) ≈ 0.918
  Entropy(Service) = -(1/6) \log_2(1/6) - (3/6) \log_2(3/6) - (2/6) \log_2(2/6) ≈ 1.459
  E(Profession) = (9/15)(0.918) + (6/15)(1.459) ≈ 1.135
• Information Gain (Profession) = E0 - E(Profession) ≈ 1.457 - 1.135 = 0.322
• Entropy(CR (0,1.6]) = Entropy(CR (1.6,1.7]) = Entropy(CR (1.7,1.8]) = Entropy(CR (1.8,1.9]) = Entropy(CR (2,5]) = 0, since each of these ranges is pure.
• Entropy(CR (1.9,2]) = -(1/2) \log_2(1/2) - (1/2) \log_2(1/2) = 1 (one printer record, one CD-writer record)
• E(Credit Rating) = (2/15)(1) ≈ 0.133
• Information Gain (Credit Rating) = E0 - E(Credit Rating) ≈ 1.457 - 0.133 = 1.323
• Credit Rating gives the larger information gain, so it becomes the root split.
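A minimal, self-contained sketch that reproduces these numbers from the table above (the encodings "B"/"S" and the shortened class labels are illustrative):

```python
# Reproduce the information-gain calculation for the laptop training data.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (profession, credit rating, class) for the 15 training records
data = [
    ("B", 1.60, "laptop"),  ("S", 2.00, "cd"),      ("B", 1.90, "printer"),
    ("B", 1.88, "printer"), ("B", 1.70, "laptop"),  ("S", 1.85, "printer"),
    ("B", 1.60, "laptop"),  ("S", 1.70, "laptop"),  ("S", 2.20, "cd"),
    ("S", 2.10, "cd"),      ("B", 1.80, "printer"), ("S", 1.95, "printer"),
    ("B", 1.90, "printer"), ("B", 1.80, "printer"), ("B", 1.75, "printer"),
]
E0 = entropy([c for _, _, c in data])                # initial overall entropy

def split_entropy(groups):
    # Weighted average entropy of a partition into groups.
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy(g) for g in groups)

# Split on profession (Business vs. Service).
e_prof = split_entropy([[c for p, _, c in data if p == v] for v in ("B", "S")])

# Split on the credit-rating ranges (0,1.6], (1.6,1.7], ..., (2.0,5.0].
bins = [(0, 1.6), (1.6, 1.7), (1.7, 1.8), (1.8, 1.9), (1.9, 2.0), (2.0, 5.0)]
e_cr = split_entropy([[c for _, r, c in data if lo < r <= hi] for lo, hi in bins])

print(round(E0, 3), round(E0 - e_prof, 3), round(E0 - e_cr, 3))
# -> 1.457 0.322 1.323  (credit rating gives the larger gain)
```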
Final tree:

Credit Rating:
  (0,1.7]   -> Buys only laptop
  (1.7,1.9] -> Buys laptop with printer
  (1.9,2]   -> Profession (Service): Buys laptop with CD Writer (P = 0.5) or Buys laptop with printer (P = 0.5)
  (2,5]     -> Buys laptop with CD Writer
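Read off as classification rules, the final tree might be sketched as plain code; classify() is a hypothetical helper name, not from the slides:

```python
# The final tree expressed as classification rules (one IF per branch).
def classify(profession, credit_rating):
    if credit_rating <= 1.7:
        return "Buys only laptop"
    elif credit_rating <= 1.9:
        return "Buys laptop with printer"
    elif credit_rating <= 2.0:
        # Both training records in (1.9,2] are Service, one of each class,
        # so the tree can only return a 50/50 guess at this leaf.
        return "Buys laptop with CD Writer or printer (P = 0.5 each)"
    else:
        return "Buys laptop with CD Writer"

print(classify("Business", 1.65))   # Buys only laptop
print(classify("Service", 2.10))    # Buys laptop with CD Writer
```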
The content of these slides is prepared from different textbooks.
References:
• Data Mining and Predictive Analytics, by Daniel T. Larose. Copyright 2015, John Wiley & Sons, Inc.
• Predictive Analytics for Dummies, by Anasse Bari, Mohamed Chaouchi, & Tommy Jung. Copyright 2016, John Wiley & Sons, Inc.
• Introduction to Data Mining with Case Studies, by G.K. Gupta. Copyright 2014, PHI Learning Private Limited.
Thank you.