Data Analytics
Session 4-5: Decision Trees / Classification and Regression Trees
Arghya Ray
Decision Tree
• A decision tree is a popular classification method that results in a flowchart-like tree structure, where each internal node denotes a test on an attribute, each branch represents an outcome of the test, and each leaf represents a class.
• A decision tree is a model that is both predictive and descriptive.
• Advantages:
• The decision tree approach is widely used since it is efficient and can deal with both continuous and categorical variables.
• It can deal with missing values in the training data and can tolerate some errors (noise) in the data.
• It is perhaps the best approach when each attribute takes only a small number of possible values.
• Disadvantages:
• Decision trees are less appropriate for predicting the values of a continuous variable, such as a share price or an interest rate.
• Decision trees can lead to a large number of errors if the number of training examples per class is small.
• The complexity of a decision tree increases as the number of attributes increases.
• Measuring the quality of a decision tree is an interesting problem in itself. Classification accuracy on test data is obviously a good measure, but other measures, such as the average cost and the worst-case cost of classifying an object, may also be used.
Picture taken from Velocity Business Solutions. Link: https://www.vebuso.com/2020/01/decision-tree-intuition-from-concept-to-application/
• Decision trees can also be used to visualize classification rules.
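As a quick illustration (scikit-learn is not named in these slides, so treat this as an assumed tool choice), a minimal sketch of fitting and printing such a flowchart-like tree:

```python
# Minimal sketch: fit a decision tree and print its flowchart of attribute
# tests. The tiny dataset below is invented purely for illustration.
from sklearn.tree import DecisionTreeClassifier, export_text

# Four records, two numeric attributes (income, age), binary class label.
X = [[25, 40], [80, 35], [30, 22], [95, 50]]
y = [0, 1, 0, 1]

tree = DecisionTreeClassifier(criterion="entropy", max_depth=2)
tree.fit(X, y)

# Each printed line is a test on an attribute; leaves show the class.
print(export_text(tree, feature_names=["income", "age"]))
```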
Classification and Regression Trees
Goal: Classify or predict an outcome based on a set of predictors. The output is a set of rules.
Example:
• Goal: classify a record as "will accept credit card offer" or "will not accept".
• A rule might be: "IF (Income > 92.5) AND (Education < 1.5) AND (Family <= 2.5) THEN Class = 0 (non-acceptor)".
• Recursive partitioning: Repeatedly split the records into two parts so as to achieve maximum homogeneity within the new
parts
Recursive partitioning steps (a code sketch follows this list):
• Pick one of the predictor variables, x_i
• Pick a value of x_i, say s_i, that divides the training data into two (not necessarily equal) portions
• Measure how "pure" or homogeneous each of the resulting portions is
• "Pure" = containing records of mostly one class
• The algorithm tries different variables x_i and different split values s_i to maximize purity in a split
• After you get a "maximum purity" split, repeat the process for a second split, and so on
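A minimal sketch of one such split search, assuming numeric predictors and using the Gini index defined later in this deck; all names here are illustrative:

```python
# Sketch of one recursive-partitioning step: try every (variable, value)
# split and keep the one whose two parts are purest on average.
from collections import Counter

def gini(labels):
    # Gini impurity of a set of class labels (0 when all labels agree).
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows, labels):
    """rows: list of numeric feature vectors; labels: class per row."""
    best = None  # (weighted impurity, variable index i, split value s)
    n = len(rows)
    for i in range(len(rows[0])):           # pick a predictor x_i
        for s in {r[i] for r in rows}:      # pick a candidate value s_i
            left  = [lab for r, lab in zip(rows, labels) if r[i] <= s]
            right = [lab for r, lab in zip(rows, labels) if r[i] >  s]
            if not left or not right:       # skip splits that separate nothing
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if best is None or score < best[0]:
                best = (score, i, s)
    return best

# Example: the best split (on variable 0 at value 60) is perfectly pure.
rows   = [[92, 1], [95, 2], [60, 1], [55, 3]]
labels = [1, 1, 0, 0]
print(best_split(rows, labels))   # -> (0.0, 0, 60)
```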
Forming a tree from the given example
RID | Age   | Income | Student | Credit rating | Class (buys computer)
----|-------|--------|---------|---------------|----------------------
1   | <=30  | High   | No      | Fair          | No
2   | <=30  | High   | No      | Excellent     | No
3   | 31-40 | High   | No      | Fair          | Yes
4   | >40   | Medium | No      | Fair          | Yes
5   | >40   | Low    | Yes     | Fair          | Yes
6   | >40   | Low    | Yes     | Excellent     | No
7   | 31-40 | Low    | Yes     | Excellent     | Yes
8   | <=30  | Medium | No      | Fair          | No
9   | <=30  | Low    | Yes     | Fair          | Yes
10  | >40   | Medium | Yes     | Excellent     | Yes
11  | <=30  | Medium | Yes     | Excellent     | Yes
12  | 31-40 | Medium | No      | Excellent     | Yes
13  | 31-40 | High   | Yes     | Fair          | Yes
14  | >40   | Medium | No      | Excellent     | No
Splitting on Age gives three partitions:

Age <= 30:
Income | Student | Credit rating | Class
High   | No      | Fair          | No
High   | No      | Excellent     | No
Medium | No      | Fair          | No
Low    | Yes     | Fair          | Yes
Medium | Yes     | Excellent     | Yes

Age 31-40 (pure: every record is "Yes", so this branch becomes a leaf):
Income | Student | Credit rating | Class
High   | No      | Fair          | Yes
Low    | Yes     | Excellent     | Yes
Medium | No      | Excellent     | Yes
High   | Yes     | Fair          | Yes

Age > 40:
Income | Student | Credit rating | Class
Medium | No      | Fair          | Yes
Low    | Yes     | Fair          | Yes
Low    | Yes     | Excellent     | No
Medium | Yes     | Excellent     | Yes
Medium | No      | Excellent     | No
Measuring Impurity
• Gini index (measure of impurity)
  For a rectangle (partition) A with m classes, let p_k be the proportion of cases in A that belong to class k. Then
  I(A) = 1 - \sum_{k=1}^{m} p_k^2
  I(A) = 0 when all cases belong to the same class (most pure).
• Entropy (measure of impurity)
  With p_k as above,
  entropy(A) = - \sum_{k=1}^{m} p_k \log_2(p_k)
  Entropy ranges between 0 (most pure) and \log_2(m) (equal representation of all m classes).
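A minimal sketch of both measures in Python, matching the formulas above (function names are illustrative):

```python
# Both impurity measures from this slide, computed from a list of class labels.
from math import log2
from collections import Counter

def gini_index(labels):
    """I(A) = 1 - sum_k p_k^2; equals 0 when all cases share one class."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def entropy(labels):
    """entropy(A) = -sum_k p_k log2(p_k); ranges from 0 up to log2(m)."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

print(gini_index(["yes"] * 4))         # 0.0  (pure rectangle)
print(entropy(["yes", "no"]))          # 1.0  (two classes, equal shares)
print(entropy(["a", "b", "c", "d"]))   # 2.0  = log2(4), maximum for 4 classes
```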
Using the principle of 'information entropy', build a decision tree from the training data given below. Divide the 'credit rating' attribute into ranges as follows: (0,1.6], (1.6,1.7], (1.7,1.8], (1.8,1.9], (1.9,2.0], (2.0,5.0]
Sr. No. | Profession | Credit rating | Class
--------|------------|---------------|---------------------------
1       | Business   | 1.60          | Buys only laptop
2       | Service    | 2.00          | Buys laptop with CD writer
3       | Business   | 1.90          | Buys laptop with printer
4       | Business   | 1.88          | Buys laptop with printer
5       | Business   | 1.70          | Buys only laptop
6       | Service    | 1.85          | Buys laptop with printer
7       | Business   | 1.60          | Buys only laptop
8       | Service    | 1.70          | Buys only laptop
9       | Service    | 2.20          | Buys laptop with CD writer
10      | Service    | 2.10          | Buys laptop with CD writer
11      | Business   | 1.80          | Buys laptop with printer
12      | Service    | 1.95          | Buys laptop with printer
13      | Business   | 1.90          | Buys laptop with printer
14      | Business   | 1.80          | Buys laptop with printer
15      | Business   | 1.75          | Buys laptop with printer
Splitting on Profession (original Sr. No. retained):

Profession = Business (9 records):
Sr. No. | Credit rating | Class
1       | 1.60          | Buys only laptop
3       | 1.90          | Buys laptop with printer
4       | 1.88          | Buys laptop with printer
5       | 1.70          | Buys only laptop
7       | 1.60          | Buys only laptop
11      | 1.80          | Buys laptop with printer
13      | 1.90          | Buys laptop with printer
14      | 1.80          | Buys laptop with printer
15      | 1.75          | Buys laptop with printer

Profession = Service (6 records):
Sr. No. | Credit rating | Class
2       | 2.00          | Buys laptop with CD writer
6       | 1.85          | Buys laptop with printer
8       | 1.70          | Buys only laptop
9       | 2.20          | Buys laptop with CD writer
10      | 2.10          | Buys laptop with CD writer
12      | 1.95          | Buys laptop with printer
Splitting on Credit Rating (first two ranges shown):

Credit rating in (0,1.6]:
Sr. No. | Profession | Class
1       | Business   | Buys only laptop
7       | Business   | Buys only laptop

Credit rating in (1.6,1.7]:
Sr. No. | Profession | Class
5       | Business   | Buys only laptop
8       | Service    | Buys only laptop
• Initially there are 3 classes: buys only laptop (4 records), buys laptop with CD writer (3 records), buys laptop with printer (8 records).
• Initial overall entropy:
  E0 = -(4/15) \log_2(4/15) - (3/15) \log_2(3/15) - (8/15) \log_2(8/15) ≈ 1.457
• Based on Profession: 9 Business, 6 Service.
  Entropy(Business) = -(3/9) \log_2(3/9) - (6/9) \log_2(6/9) ≈ 0.918
  Entropy(Service) = -(1/6) \log_2(1/6) - (3/6) \log_2(3/6) - (2/6) \log_2(2/6) ≈ 1.459
  E(Profession) = (9/15)(0.918) + (6/15)(1.459) ≈ 1.135
• Information Gain (Profession) = E0 - E(Profession) ≈ 1.457 - 1.135 = 0.322
• Entropy(CR (0,1.6]) = Entropy(CR (1.6,1.7]) = Entropy(CR (1.7,1.8]) = Entropy(CR (1.8,1.9]) = Entropy(CR (2,5]) = 0, since each of these ranges is pure.
• Entropy(CR (1.9,2]) = -(1/2) \log_2(1/2) - (1/2) \log_2(1/2) = 1 (one printer record, one CD-writer record)
• E(Credit Rating) = (2/15)(1) ≈ 0.133
• Information Gain (Credit Rating) = E0 - E(Credit Rating) ≈ 1.457 - 0.133 = 1.323
• Credit Rating gives the larger information gain, so it becomes the root split.
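A minimal, self-contained sketch that reproduces these numbers from the table above (the encodings "B"/"S" and the shortened class labels are illustrative):

```python
# Reproduce the information-gain calculation for the laptop training data.
from math import log2
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

# (profession, credit rating, class) for the 15 training records
data = [
    ("B", 1.60, "laptop"),  ("S", 2.00, "cd"),      ("B", 1.90, "printer"),
    ("B", 1.88, "printer"), ("B", 1.70, "laptop"),  ("S", 1.85, "printer"),
    ("B", 1.60, "laptop"),  ("S", 1.70, "laptop"),  ("S", 2.20, "cd"),
    ("S", 2.10, "cd"),      ("B", 1.80, "printer"), ("S", 1.95, "printer"),
    ("B", 1.90, "printer"), ("B", 1.80, "printer"), ("B", 1.75, "printer"),
]
E0 = entropy([c for _, _, c in data])                # initial overall entropy

def split_entropy(groups):
    # Weighted average entropy of a partition into groups.
    n = sum(len(g) for g in groups)
    return sum(len(g) / n * entropy(g) for g in groups)

# Split on profession (Business vs. Service).
e_prof = split_entropy([[c for p, _, c in data if p == v] for v in ("B", "S")])

# Split on the credit-rating ranges (0,1.6], (1.6,1.7], ..., (2.0,5.0].
bins = [(0, 1.6), (1.6, 1.7), (1.7, 1.8), (1.8, 1.9), (1.9, 2.0), (2.0, 5.0)]
e_cr = split_entropy([[c for _, r, c in data if lo < r <= hi] for lo, hi in bins])

print(round(E0, 3), round(E0 - e_prof, 3), round(E0 - e_cr, 3))
# -> 1.457 0.322 1.323  (credit rating gives the larger gain)
```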
Final tree:

Credit Rating:
  (0,1.7]   -> Buys only laptop
  (1.7,1.9] -> Buys laptop with printer
  (1.9,2]   -> Profession (Service): Buys laptop with CD Writer (P = 0.5) or Buys laptop with printer (P = 0.5)
  (2,5]     -> Buys laptop with CD Writer
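Read off as classification rules, the final tree might be sketched as plain code; classify() is a hypothetical helper name, not from the slides:

```python
# The final tree expressed as classification rules (one IF per branch).
def classify(profession, credit_rating):
    if credit_rating <= 1.7:
        return "Buys only laptop"
    elif credit_rating <= 1.9:
        return "Buys laptop with printer"
    elif credit_rating <= 2.0:
        # Both training records in (1.9,2] are Service, one of each class,
        # so the tree can only return a 50/50 guess at this leaf.
        return "Buys laptop with CD Writer or printer (P = 0.5 each)"
    else:
        return "Buys laptop with CD Writer"

print(classify("Business", 1.65))   # Buys only laptop
print(classify("Service", 2.10))    # Buys laptop with CD Writer
```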
The content of these slides is prepared from different textbooks.
References:
• Data Mining and Predictive Analytics, by Daniel T. Larose. Copyright 2015, John Wiley & Sons, Inc.
• Predictive Analytics for Dummies, by Anasse Bari, Mohamed Chaouchi, & Tommy Jung. Copyright 2016, John Wiley & Sons, Inc.
• Introduction to Data Mining with Case Studies, by G.K. Gupta. Copyright 2014, PHI Learning Private Limited.
Thank you.