Data Mining
Classification: Basic Concepts, Decision Trees, and Model Evaluation
Lecture Notes for Chapter 4
Introduction to Data Mining
by Tan, Steinbach, Kumar
Classification: Definition
Given a collection of records (the training set):
- Each record contains a set of attributes; one of the attributes is the class.
Find a model for the class attribute as a function of the values of the other attributes.
Goal: previously unseen records should be assigned a class as accurately as possible.
- A test set is used to determine the accuracy of the model. Usually, the given data set is divided into training and test sets: the training set is used to build the model and the test set to validate it.
Illustrating Classification Task
[Figure: a learning algorithm induces a model from the training set (induction); the model is then applied to the test set (deduction). The induced model encodes rules such as "Attrib1 = Yes -> Class = No" and "Attrib1 = No and Attrib3 < 95K -> Class = Yes".]
Examples of Classification Task
- Predicting tumor cells as benign or malignant
- Classifying credit card transactions as legitimate or fraudulent
- Classifying secondary structures of protein as alpha-helix, beta-sheet, or random coil
- Categorizing news stories as finance, weather, entertainment, sports, etc.
Classification Techniques
- Decision tree based methods
- Rule-based methods
- Memory-based reasoning
- Neural networks
- Naïve Bayes and Bayesian belief networks
- Support vector machines
Example of a Decision Tree

Training Data (categorical: Refund, Marital Status; continuous: Taxable Income; class: Cheat):

  Tid  Refund  Marital Status  Taxable Income  Cheat
   1   Yes     Single          125K            No
   2   No      Married         100K            No
   3   No      Single          70K             No
   4   Yes     Married         120K            No
   5   No      Divorced        95K             Yes
   6   No      Married         60K             No
   7   Yes     Divorced        220K            No
   8   No      Single          85K             Yes
   9   No      Married         75K             No
  10   No      Single          90K             Yes

Model: Decision Tree (splitting attributes: Refund, MarSt, TaxInc):

  Refund?
    Yes -> NO
    No  -> MarSt?
             Married          -> NO
             Single, Divorced -> TaxInc?
                                   < 80K -> NO
                                   > 80K -> YES
Another Example of Decision Tree

Same training data as on the previous slide (categorical: Refund, Marital Status; continuous: Taxable Income; class: Cheat).

  MarSt?
    Married          -> NO
    Single, Divorced -> Refund?
                          Yes -> NO
                          No  -> TaxInc?
                                   < 80K -> NO
                                   > 80K -> YES

There could be more than one tree that fits the same data!
Decision Tree Classification Task
[Figure: a tree-induction algorithm learns a decision tree model from the training set (induction); a deduction step then applies the model to the test set.]
Apply Model to Test Data

Test record: Refund = No, Marital Status = Married, Taxable Income = 80K, Cheat = ?

Start from the root of the tree and follow the branches that match the record:
1. Refund = No -> take the "No" branch to the MarSt node.
2. Marital Status = Married -> take the "Married" branch, which is a leaf labeled NO.
3. Assign Cheat = "No" to the test record.
Decision Tree Induction
How do we build a decision tree from a training set? Many algorithms exist:
- Hunt's Algorithm (one of the earliest)
- CART
- ID3 (Iterative Dichotomiser 3) and C4.5 (Quinlan 1986, 1993); C5/See5 is the commercial successor
- SLIQ, SPRINT (IBM, 1996)
General Structure of Hunt's Algorithm
Let Dt be the set of training records that reach a node t.
General procedure (a sketch follows below):
- If Dt contains only records that belong to the same class yt, then t is a leaf node labeled as yt.
- If Dt is an empty set, then t is a leaf node labeled with the default class yd.
- If Dt contains records that belong to more than one class, use an attribute test to split the data into smaller subsets, and recursively apply the procedure to each subset.
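The procedure above maps naturally onto a short recursive routine. Below is a minimal, illustrative Python sketch (not the textbook's exact algorithm): it assumes records are (attribute-dict, label) pairs and simply consumes attributes in a fixed order, whereas a real implementation would choose the best attribute test at each node.

```python
from collections import Counter

def hunt(records, attributes, default_class):
    """records: list of (attrs, label) pairs; attributes: list of attribute names."""
    if not records:                          # empty D_t -> leaf with default class y_d
        return ("leaf", default_class)
    labels = [y for _, y in records]
    if len(set(labels)) == 1:                # all of D_t in one class y_t -> leaf y_t
        return ("leaf", labels[0])
    majority = Counter(labels).most_common(1)[0][0]
    if not attributes:                       # no tests left -> majority-class leaf
        return ("leaf", majority)
    attr, rest = attributes[0], attributes[1:]   # assumption: fixed attribute order
    children = {}
    for value in {r[attr] for r, _ in records}:  # one child per observed value
        subset = [(r, y) for r, y in records if r[attr] == value]
        children[value] = hunt(subset, rest, majority)
    return ("node", attr, children)
```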
Hunt's Algorithm
Applied to the Tid 1-10 training data, the tree grows in steps:
1. Start with a single leaf predicting the majority class: Don't Cheat.
2. Split on Refund: Yes -> Don't Cheat; No -> Don't Cheat.
3. Refine the "No" branch by splitting on Marital Status: Married -> Don't Cheat; Single, Divorced -> Cheat.
4. Refine the Single, Divorced branch by splitting on Taxable Income: < 80K -> Don't Cheat; >= 80K -> Cheat.
Tree Induction
Greedy strategy:
- A greedy search through the space of possible decision trees.
- Split the records based on the attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records:
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting.
Tree Induction
Greedy strategy:
- Split the records based on the attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records:
  - How to specify the attribute test condition? E.g., X < 1? or X + Y < 1?
  - How to determine the best split?
- Determine when to stop splitting.
How to Specify Test Condition?
Depends on attribute types:
- Nominal
- Ordinal
- Continuous
Depends on the number of ways to split:
- 2-way split
- Multi-way split
Splitting Based on Nominal Attributes
Multi-way split: use as many partitions as distinct values.
  CarType? -> Family | Sports | Luxury
Binary split: divides values into two subsets; need to find the optimal partitioning.
  CarType in {Sports, Luxury} vs. {Family}   OR   CarType in {Family, Luxury} vs. {Sports}
Splitting Based on Ordinal Attributes
Multi-way split: use as many partitions as distinct values.
  Size? -> Small | Medium | Large
Binary split: divides values into two subsets; need to find the optimal partitioning.
  Size in {Small, Medium} vs. {Large}   OR   Size in {Medium, Large} vs. {Small}
What about the split {Small, Large} vs. {Medium}? (It violates the ordering of the values.)
Splitting Based on Continuous Attributes
Different ways of handling:
- Discretization to form an ordinal categorical attribute:
  - Static: discretize once at the beginning.
  - Dynamic: ranges can be found by equal-interval bucketing, equal-frequency bucketing (percentiles), or clustering.
- Binary decision: (A < v) or (A >= v):
  - Consider all possible splits and find the best cut.
  - Can be more compute intensive.
Splitting Based on Continuous Attributes
[Figure: the same attribute handled two ways: a binary split (e.g., Taxable Income > 80K? Yes/No) and a multi-way split into ordered income ranges.]
Tree Induction
Greedy strategy:
- Split the records based on the attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records:
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting.
How to Determine the Best Split
Before splitting: 10 records of class 0 and 10 records of class 1.
[Figure: several candidate test conditions, each producing child nodes with different class distributions.]
Which test condition is the best?
How to Determine the Best Split
Greedy approach: nodes with a homogeneous class distribution are preferred.
Need a measure of node impurity:
- Non-homogeneous: high degree of impurity
- Homogeneous: low degree of impurity
Measures of Node Impurity
- Gini index
- Entropy
- Misclassification error
How to Find the Best Split
Before splitting, the parent node has impurity M0. Consider two candidate attributes:
- Split on A? -> nodes N1 (impurity M1) and N2 (impurity M2), with weighted impurity M12.
- Split on B? -> nodes N3 (impurity M3) and N4 (impurity M4), with weighted impurity M34.
Compare Gain = M0 - M12 vs. M0 - M34: choose the attribute with the higher gain, i.e., the lower weighted child impurity.
Measure of Impurity: GINI
Gini index for a given node t:
  GINI(t) = 1 - Σj [p(j|t)]²
(NOTE: p(j|t) is the relative frequency of class j at node t.)
- Maximum (1 - 1/nc, for nc classes) when records are equally distributed among all classes, implying the least interesting information.
- Minimum (0.0) when all records belong to one class, implying the most interesting information.

  C1 = 0, C2 = 6:  Gini = 0.000
  C1 = 1, C2 = 5:  Gini = 0.278
  C1 = 2, C2 = 4:  Gini = 0.444
  C1 = 3, C2 = 3:  Gini = 0.500

Example: Gini = 1 - (1/6)² - (5/6)² = 1 - 1/36 - 25/36 = 0.278
Examples for Computing GINI
  GINI(t) = 1 - Σj [p(j|t)]²

C1 = 0, C2 = 6:  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Gini = 1 - P(C1)² - P(C2)² = 1 - 0 - 1 = 0
C1 = 1, C2 = 5:  P(C1) = 1/6, P(C2) = 5/6
  Gini = 1 - (1/6)² - (5/6)² = 0.278
C1 = 2, C2 = 4:  P(C1) = 2/6, P(C2) = 4/6
  Gini = 1 - (2/6)² - (4/6)² = 0.444
Splitting Based on GINI
- Used in CART, SLIQ, SPRINT.
- When a node p is split into k partitions (children), the quality of the split is computed as (see the sketch below):
  GINIsplit = Σ(i=1..k) (ni/n) GINI(i)
  where ni = number of records at child i, and n = number of records at node p.
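A minimal sketch of both formulas, reproducing the node Gini values from the slide above and the weighted split quality for the B? split on the next slide (the child weights 7/12 and 5/12 come from the actual partition sizes):

```python
def gini(counts):
    """Gini(t) = 1 - sum_j p(j|t)^2, from per-class counts at a node."""
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def gini_split(partitions):
    """GINI_split = sum_i (n_i / n) * Gini(i) over the k children."""
    n = sum(sum(p) for p in partitions)
    return sum(sum(p) / n * gini(p) for p in partitions)

print(round(gini([1, 5]), 3))                  # 0.278
print(round(gini([2, 4]), 3))                  # 0.444
print(round(gini_split([[5, 2], [1, 4]]), 3))  # 0.371
```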
Binary Attributes: Computing GINI Index
- Splits into two partitions.
- Effect of weighing partitions: larger and purer partitions are sought.

Parent: C1 = 6, C2 = 6, Gini = 0.500
Split on B?  ->  N1: C1 = 5, C2 = 2;  N2: C1 = 1, C2 = 4

  Gini(N1) = 1 - (5/7)² - (2/7)² = 0.408
  Gini(N2) = 1 - (1/5)² - (4/5)² = 0.320
  Gini(Children) = 7/12 × 0.408 + 5/12 × 0.320 = 0.371

(Each child's Gini is weighted by its share of the records: N1 holds 7 of the 12 records, N2 holds 5.)
Categorical Attributes: Computing Gini Index
- For each distinct value, gather counts for each class in the dataset.
- Use the count matrix to make decisions.

Multi-way split:
           Family  Sports  Luxury
  C1          1       2       1
  C2          4       1       1
  Gini = 0.393

Two-way split (find the best partition of values):
  {Sports, Luxury} vs. {Family}:   C1: 3 | 1,  C2: 2 | 4,  Gini = 0.400
  {Family, Luxury} vs. {Sports}:   C1: 2 | 2,  C2: 5 | 1,  Gini = 0.419
Continuous Attributes: Computing Gini Index
- Use binary decisions based on one value.
- Several choices for the splitting value: the number of possible splitting values = the number of distinct values.
- Each splitting value v has a count matrix associated with it: class counts in each of the partitions, A < v and A >= v.
- Simple method to choose the best v: for each v, scan the database to gather the count matrix and compute its Gini index.
- Computationally inefficient! Repetition of work.
Continuous Attributes: Computing Gini Index...
For efficient computation, for each attribute:
- Sort the attribute on its values.
- Linearly scan these values, each time updating the count matrix and computing the Gini index (see the sketch below).
- Choose the split position that has the least Gini index.

Sorted Taxable Income values (with class Cheat) and the candidate split positions between them:

  Cheat:           No    No    No    Yes   Yes   Yes   No    No    No    No
  Income:          60    70    75    85    90    95    100   120   125   220
  Split position:  55    65    72    80    87    92    97    110   122   172   230
  Gini:           0.420 0.400 0.375 0.343 0.417 0.400 0.300 0.343 0.375 0.400 0.420

The best split is at 97 (Gini = 0.300). At each position, the count matrix of Yes/No on the <= and > sides is updated incrementally rather than recomputed from scratch.
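A minimal sketch of the sort-and-scan procedure on the Taxable Income column. It places cut points midway between distinct consecutive values, so the slide's split position 97 appears here as 97.5, with the same Gini of 0.300:

```python
def gini(counts):
    n = sum(counts)
    return 1.0 - sum((c / n) ** 2 for c in counts)

def best_split(values, labels):
    """Sort once, then linearly scan, updating class counts on each side."""
    data = sorted(zip(values, labels))
    classes = sorted(set(labels))
    left = {c: 0 for c in classes}
    right = {c: labels.count(c) for c in classes}
    n = len(labels)
    best_cut, best_g = None, 1.0
    for i in range(n - 1):
        v, y = data[i]
        left[y] += 1                    # move one record to the left partition
        right[y] -= 1
        if data[i + 1][0] == v:
            continue                    # only cut between distinct values
        cut = (v + data[i + 1][0]) / 2
        g = ((i + 1) / n) * gini(list(left.values())) \
            + ((n - i - 1) / n) * gini(list(right.values()))
        if g < best_g:
            best_cut, best_g = cut, g
    return best_cut, best_g

income = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]
cheat = ["No", "No", "No", "Yes", "Yes", "Yes", "No", "No", "No", "No"]
print(best_split(income, cheat))        # (97.5, 0.3)
```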
Alternative Splitting Criteria Based on INFO
Entropy at a given node t:
  Entropy(t) = - Σj p(j|t) log2 p(j|t)
(NOTE: p(j|t) is the relative frequency of class j at node t.)
Measures the homogeneity of a node:
- Maximum (log2 nc) when records are equally distributed among all classes, implying the least information.
- Minimum (0.0) when all records belong to one class, implying the most information.
Entropy-based computations are similar to the GINI index computations.
Examples for Computing Entropy
  Entropy(t) = - Σj p(j|t) log2 p(j|t)

C1 = 0, C2 = 6:  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Entropy = -0 log2 0 - 1 log2 1 = -0 - 0 = 0
C1 = 1, C2 = 5:  P(C1) = 1/6, P(C2) = 5/6
  Entropy = -(1/6) log2 (1/6) - (5/6) log2 (5/6) = 0.65
C1 = 2, C2 = 4:  P(C1) = 2/6, P(C2) = 4/6
  Entropy = -(2/6) log2 (2/6) - (4/6) log2 (4/6) = 0.92
Splitting Based on INFO...
Information Gain:
  GAINsplit = Entropy(p) - Σ(i=1..k) (ni/n) Entropy(i)
Parent node p is split into k partitions; ni is the number of records in partition i.
- Measures the reduction in entropy achieved because of the split. Choose the split that achieves the most reduction (maximizes GAIN).
- Used in ID3 and C4.5.
- Disadvantage: tends to prefer splits that result in a large number of partitions, each being small but pure.
Splitting Based on INFO...
Gain Ratio:
  GainRATIOsplit = GAINsplit / SplitINFO,  where  SplitINFO = - Σ(i=1..k) (ni/n) log2 (ni/n)
Parent node p is split into k partitions; ni is the number of records in partition i.
- Adjusts information gain by the entropy of the partitioning (SplitINFO). Higher-entropy partitioning (a large number of small partitions) is penalized: SplitINFO increases when a larger number of small partitions is produced, so GAINsplit is penalized when the split shatters the data. (A sketch follows below.)
- Used in C4.5; designed to overcome the disadvantage of information gain.
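A minimal sketch of information gain and gain ratio. The parent counts and the three-way partition below are made-up illustrative numbers, not from the slides:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

def info_gain(parent, partitions):
    n = sum(parent)
    return entropy(parent) - sum(sum(p) / n * entropy(p) for p in partitions)

def gain_ratio(parent, partitions):
    n = sum(parent)
    split_info = -sum(sum(p) / n * math.log2(sum(p) / n) for p in partitions)
    return info_gain(parent, partitions) / split_info

parent = [10, 10]                     # hypothetical class counts at node p
parts = [[8, 2], [2, 6], [0, 2]]      # hypothetical 3-way split
print(round(info_gain(parent, parts), 3))   # ~0.315
print(round(gain_ratio(parent, parts), 3))  # ~0.231 (SplitINFO ~1.361 penalizes the 3-way split)
```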
Splitting Criteria Based on Classification Error
Classification error at a node t:
  Error(t) = 1 - max_i P(i|t)
Measures the misclassification error made by a node:
- Maximum (1 - 1/nc) when records are equally distributed among all classes, implying the least interesting information.
- Minimum (0.0) when all records belong to one class, implying the most interesting information.
Examples for Computing Error
  Error(t) = 1 - max_i P(i|t)

C1 = 0, C2 = 6:  P(C1) = 0/6 = 0, P(C2) = 6/6 = 1
  Error = 1 - max(0, 1) = 1 - 1 = 0
C1 = 1, C2 = 5:  P(C1) = 1/6, P(C2) = 5/6
  Error = 1 - max(1/6, 5/6) = 1 - 5/6 = 1/6
C1 = 2, C2 = 4:  P(C1) = 2/6, P(C2) = 4/6
  Error = 1 - max(2/6, 4/6) = 1 - 4/6 = 1/3
Comparison Among Splitting Criteria
For a 2-class problem (see the sketch below):
[Figure: Gini, entropy, and misclassification error plotted against p, the fraction of records in one class; all three peak at p = 0.5 and vanish at p = 0 and p = 1.]
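A minimal sketch reproducing the comparison numerically for a 2-class node, where p is the fraction of records in class 1:

```python
import math

def gini(p):
    return 1 - p ** 2 - (1 - p) ** 2

def entropy(p):
    if p in (0.0, 1.0):
        return 0.0
    return -(p * math.log2(p) + (1 - p) * math.log2(1 - p))

def error(p):
    return 1 - max(p, 1 - p)

for p in [0.0, 0.1, 0.3, 0.5, 0.7, 0.9, 1.0]:
    print(f"p={p:.1f}  gini={gini(p):.3f}  entropy={entropy(p):.3f}  error={error(p):.3f}")
```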
Misclassification Error vs Gini

Parent: C1 = 7, C2 = 3, Gini = 0.42
Split on A?  ->  N1: C1 = 3, C2 = 0;  N2: C1 = 4, C2 = 3

  Gini(N1) = 1 - (3/3)² - (0/3)² = 0
  Gini(N2) = 1 - (4/7)² - (3/7)² = 0.489
  Gini(Children) = 3/10 × 0 + 7/10 × 0.489 = 0.342

Gini improves! Misclassification error does not: Error(Parent) = 3/10, and Error(Children) = 3/10 × 0 + 7/10 × (3/7) = 3/10 as well, so the error measure would see no benefit from this useful split.
Tree Induction
Greedy strategy:
- Split the records based on the attribute test that optimizes a certain criterion.
Issues:
- Determine how to split the records:
  - How to specify the attribute test condition?
  - How to determine the best split?
- Determine when to stop splitting.
Stopping Criteria for Tree Induction
- Stop expanding a node when all the records belong to the same class.
- Stop expanding a node when all the records have similar attribute values.
- Early termination (to be discussed later).
Decision Tree Based Classification
Advantages:
- Inexpensive to construct
- Extremely fast at classifying unknown records
- Easy to interpret for small-sized trees
- Accuracy comparable to other classification techniques for many simple data sets
Example: C4.5
- Simple depth-first construction.
- Uses information gain.
- Sorts continuous attributes at each node.
- Needs the entire data set to fit in memory; unsuitable for large datasets (would need out-of-core sorting).
- You can download the software from: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz
Practical Issues of Classification
- Underfitting and overfitting
- Missing values
- Costs of classification
Underfitting and Overfitting (Example)
500 circular and 500 triangular data points.
- Circular points: 0.5 <= sqrt(x1² + x2²) <= 1
- Triangular points: sqrt(x1² + x2²) < 0.5 or sqrt(x1² + x2²) > 1
Underfitting and Overfitting
[Figure: training and test error vs. model complexity; test error rises again in the overfitting regime.]
Underfitting: when the model is too simple, both training and test errors are large.
Overfitting due to Noise
The decision boundary is distorted by a noise point.
Overfitting due to Insufficient Examples
- Lack of data points in the lower half of the diagram makes it difficult to correctly predict the class labels of that region.
- An insufficient number of training records in the region causes the decision tree to predict the test examples using other training records that are irrelevant to the classification task.
Notes on Overfitting
- Overfitting results in decision trees that are more complex than necessary.
- Training error no longer provides a good estimate of how well the tree will perform on previously unseen records.
- We need new ways of estimating errors.
Estimating Generalization Errors
- Re-substitution errors: error on the training set ( e(t) ).
- Generalization errors: error on the test set ( e'(t) ).
Methods for estimating generalization errors:
- Optimistic approach: e'(t) = e(t)
- Pessimistic approach:
  - For each leaf node: e'(t) = e(t) + 0.5
  - Total errors: e'(T) = e(T) + N × 0.5 (N: number of leaf nodes)
  - For a tree with 30 leaf nodes and 10 errors on training (out of 1000 instances):
    Training error = 10/1000 = 1%
    Generalization error = (10 + 30 × 0.5)/1000 = 2.5%
- Reduced error pruning (REP): uses a validation data set to estimate the generalization error.
Occam's Razor
- Given two models with similar generalization errors, one should prefer the simpler model over the more complex model.
- For complex models, there is a greater chance that the model was fitted accidentally by errors in the data.
- Therefore, one should include model complexity when evaluating a model.
Minimum Description Length (MDL)
[Figure: a sender A knows the class labels y for records X1 ... Xn; a receiver B has the same records but unknown labels. A transmits a classification model plus a list of its misclassifications.]
- Cost(Model, Data) = Cost(Data|Model) + Cost(Model)
  - Cost is the number of bits needed for the encoding; search for the least costly model.
- Cost(Data|Model) encodes the misclassification errors.
- Cost(Model) uses node encoding (number of children) plus splitting condition encoding.
How to Address Overfitting
Pre-pruning (early stopping rule):
- Stop the algorithm before it becomes a fully grown tree.
- Typical stopping conditions for a node:
  - Stop if all instances belong to the same class.
  - Stop if all the attribute values are the same.
- More restrictive conditions:
  - Stop if the number of instances is less than some user-specified threshold.
  - Stop if the class distribution of instances is independent of the available features (e.g., using a chi-squared test).
  - Stop if expanding the current node does not improve impurity measures (e.g., Gini or information gain).
How to Address Overfitting...
Post-pruning:
- Grow the decision tree to its entirety.
- Trim the nodes of the decision tree in a bottom-up fashion.
- If the generalization error improves after trimming, replace the sub-tree by a leaf node; the class label of the leaf is determined from the majority class of instances in the sub-tree.
- Can use the minimum description length (MDL) principle for post-pruning.
Example of Post-Pruning
Node before splitting: Class = Yes: 20, Class = No: 10 -> Error = 10/30
- Training error (before splitting) = 10/30
- Pessimistic error (before splitting) = (10 + 0.5)/30 = 10.5/30
A 4-way split on A (branches A1, A2, A3, A4) gives:
- Training error (after splitting) = 9/30
- Pessimistic error (after splitting) = (9 + 4 × 0.5)/30 = 11/30
Since 11/30 > 10.5/30: PRUNE the split! (A sketch follows below.)
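The pruning decision reduces to a one-line comparison of pessimistic estimates; a minimal sketch of the example above:

```python
def pessimistic_error(train_errors, n_leaves, n_records, penalty=0.5):
    """e'(T) = (e(T) + N * 0.5) / n, the pessimistic estimate from the earlier slide."""
    return (train_errors + penalty * n_leaves) / n_records

before = pessimistic_error(10, 1, 30)   # unsplit node: 10.5/30 = 0.35
after = pessimistic_error(9, 4, 30)     # after the 4-way split: 11/30 ~ 0.367
print("PRUNE" if after >= before else "KEEP SPLIT")   # -> PRUNE
```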
Examples of Post-Pruning
Case 1: a node with children (C0: 11, C1: 3) and (C0: 2, C1: 4)
Case 2: a node with children (C0: 14, C1: 3) and (C0: 2, C1: 2)
- Optimistic error? Don't prune in either case.
- Pessimistic error? Don't prune case 1; prune case 2.
- Reduced error pruning? Depends on the validation set.
Handling Missing Attribute Values
Missing values affect decision tree construction in three different ways:
- How impurity measures are computed
- How to distribute an instance with a missing value to the child nodes
- How a test instance with a missing value is classified
Computing Impurity Measure
Class counts for the split attribute Refund (one record has a missing Refund value):

                 Class = Yes   Class = No
  Refund = Yes        0             3
  Refund = No         2             4
  Refund = ?          1             0

Before splitting:
  Entropy(Parent) = -0.3 log2(0.3) - 0.7 log2(0.7) = 0.8813
Split on Refund (the record with the missing value is excluded from the children):
  Entropy(Refund = Yes) = 0
  Entropy(Refund = No) = -(2/6) log2(2/6) - (4/6) log2(4/6) = 0.9183
  Entropy(Children) = (3/10) × 0 + (6/10) × 0.9183 = 0.551
  Gain = 0.8813 - 0.551 = 0.3303
(A sketch follows below.)
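A minimal sketch of the computation above, following the slide's weighting convention in which the children are weighted by 3/10 and 6/10, so the record with the missing value simply drops out of the child entropies:

```python
import math

def entropy(counts):
    n = sum(counts)
    return -sum(c / n * math.log2(c / n) for c in counts if c)

parent = [3, 7]                 # (Yes, No) counts over all 10 records -> 0.8813
refund_yes = [0, 3]
refund_no = [2, 4]

children = 0.3 * entropy(refund_yes) + 0.6 * entropy(refund_no)  # 0.551
print(round(entropy(parent) - children, 4))                      # 0.3303
```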
Distribute Instances
A training record with Refund = ? must still be sent down the Refund split. Based on the records with known Refund:
- Probability that Refund = Yes is 3/9
- Probability that Refund = No is 6/9
Assign the record to the left (Yes) child with weight 3/9 and to the right (No) child with weight 6/9.
Classify Instances
New record to classify: Refund = No and Marital Status missing.
Weighted class counts at the MarSt node (including the fractional record from the previous slide):

               Married   Single   Divorced   Total
  Class = No      3         1         0        4
  Class = Yes    6/9        1         1       2.67
  Total          3.67       2         1       6.67

- Probability that Marital Status = Married is 3.67/6.67
- Probability that Marital Status = {Single, Divorced} is 3/6.67
The record is sent down both branches with these weights, and the leaf predictions are combined accordingly.
Other Issues
- Data fragmentation
- Search strategy
- Expressiveness
- Tree replication
Data Fragmentation
- The number of instances gets smaller as you traverse down the tree.
- The number of instances at the leaf nodes could be too small to make any statistically significant decision.
Search Strategy
- Finding an optimal decision tree is NP-hard.
- The algorithm presented so far uses a greedy, top-down, recursive partitioning strategy to induce a reasonable solution.
- Other strategies?
  - Bottom-up
  - Bi-directional
Expressiveness
- A decision tree provides an expressive representation for learning discrete-valued functions.
- But decision trees do not generalize well to certain types of Boolean functions.
  - Example: the parity function:
    - Class = 1 if there is an even number of Boolean attributes with truth value = True
    - Class = 0 if there is an odd number of Boolean attributes with truth value = True
  - For accurate modeling, the tree must be complete.
- Not expressive enough for modeling continuous variables, particularly when the test condition involves only a single attribute at a time.
Decision Boundary
- The border line between two neighboring regions of different classes is known as the decision boundary.
- The decision boundary is parallel to the axes because each test condition involves a single attribute at a time.
Oblique Decision Trees
[Figure: a single oblique test, x + y < 1, separates Class = + from Class = -.]
- The test condition may involve multiple attributes.
- More expressive representation.
- Finding the optimal test condition is computationally expensive.
Tree Replication
The same subtree can appear in multiple branches, making the tree larger than necessary.
Model Evaluation
- Metrics for performance evaluation: How can we evaluate the performance of a model?
- Methods for performance evaluation: How can we obtain reliable estimates?
- Methods for model comparison: How can we compare the relative performance among competing models?
Metrics for Performance Evaluation
- Focus on the predictive capability of a model, rather than how fast it classifies or builds models, scalability, etc.
- Confusion matrix:

                        PREDICTED CLASS
                        Class=Yes   Class=No
  ACTUAL   Class=Yes     a (TP)      b (FN)
  CLASS    Class=No      c (FP)      d (TN)

  a: TP (true positive), b: FN (false negative), c: FP (false positive), d: TN (true negative)
Metrics for Performance Evaluation...
                        PREDICTED CLASS
                        Class=Yes   Class=No
  ACTUAL   Class=Yes     a (TP)      b (FN)
  CLASS    Class=No      c (FP)      d (TN)

Most widely-used metric:
  Accuracy = (a + d) / (a + b + c + d) = (TP + TN) / (TP + TN + FP + FN)
Limitation of Accuracy
- Consider a 2-class problem:
  - Number of Class 0 examples = 9990
  - Number of Class 1 examples = 10
- If the model predicts everything to be class 0, accuracy is 9990/10000 = 99.9%.
- Accuracy is misleading because the model does not detect any class 1 example.
Cost Matrix
                        PREDICTED CLASS
  C(i|j)                Class=Yes     Class=No
  ACTUAL   Class=Yes    C(Yes|Yes)    C(No|Yes)
  CLASS    Class=No     C(Yes|No)     C(No|No)

C(i|j): cost of misclassifying a class j example as class i.
Computing Cost of Classification
Cost matrix:
                   PREDICTED CLASS
  C(i|j)            +       -
  ACTUAL    +      -1      100
  CLASS     -       1       0

Model M1:                              Model M2:
           PREDICTED                              PREDICTED
            +     -                                +     -
  ACTUAL +  150   40                    ACTUAL +   250   45
  CLASS  -   60  250                    CLASS  -     5  200

  Accuracy = 80%, Cost = 3910            Accuracy = 90%, Cost = 4255

M2 has the higher accuracy but also the higher cost (see the sketch below).
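Computing both costs is a sum of element-wise products of the confusion counts and the per-cell costs. A minimal sketch, with the matrix layout assumed as [[TP, FN], [FP, TN]] (actual class on rows, predicted on columns):

```python
def total_cost(confusion, cost_matrix):
    """Sum of count * cost over the matrix cells."""
    return sum(n * c for row_n, row_c in zip(confusion, cost_matrix)
               for n, c in zip(row_n, row_c))

C = [[-1, 100], [1, 0]]            # C(+|+), C(-|+) / C(+|-), C(-|-)
M1 = [[150, 40], [60, 250]]
M2 = [[250, 45], [5, 200]]
print(total_cost(M1, C), total_cost(M2, C))   # 3910 4255
```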
Cost vs Accuracy
Count matrix:
                        PREDICTED CLASS
                        Class=Yes   Class=No
  ACTUAL   Class=Yes       a           b
  CLASS    Class=No        c           d

  N = a + b + c + d;  Accuracy = (a + d) / N

Cost matrix:
                        PREDICTED CLASS
                        Class=Yes   Class=No
  ACTUAL   Class=Yes       p           q
  CLASS    Class=No        q           p

Accuracy is proportional to cost if (1) C(Yes|No) = C(No|Yes) = q and (2) C(Yes|Yes) = C(No|No) = p:
  Cost = p (a + d) + q (b + c)
       = p (a + d) + q (N - a - d)
       = q N - (q - p)(a + d)
       = N [q - (q - p) × Accuracy]
Cost-Sensitive Measures
  Precision (p) = a / (a + c)
  Recall (r) = a / (a + b)
  F-measure (F) = 2rp / (r + p) = 2a / (2a + b + c)
- Precision is biased towards C(Yes|Yes) and C(Yes|No).
- Recall is biased towards C(Yes|Yes) and C(No|Yes).
- F-measure is biased towards all except C(No|No).
  Weighted Accuracy = (w1 a + w4 d) / (w1 a + w2 b + w3 c + w4 d)
(A sketch follows below.)
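A minimal sketch of these measures from the confusion-matrix cells; the counts in the example call are hypothetical:

```python
def metrics(a, b, c, d):
    """a=TP, b=FN, c=FP, d=TN, as in the confusion matrix above."""
    precision = a / (a + c)
    recall = a / (a + b)
    f_measure = 2 * recall * precision / (recall + precision)
    accuracy = (a + d) / (a + b + c + d)
    return precision, recall, f_measure, accuracy

p, r, f, acc = metrics(a=40, b=10, c=5, d=9945)   # hypothetical skewed data
print(f"precision={p:.3f} recall={r:.3f} F={f:.3f} accuracy={acc:.4f}")
# precision=0.889 recall=0.800 F=0.842 accuracy=0.9985
```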
Model Evaluation
- Metrics for performance evaluation: How can we evaluate the performance of a model?
- Methods for performance evaluation: How can we obtain reliable estimates?
- Methods for model comparison: How can we compare the relative performance among competing models?
Methods for Performance Evaluation
- How can we obtain a reliable estimate of performance?
- The performance of a model may depend on factors besides the learning algorithm:
  - Class distribution
  - Cost of misclassification
  - Size of training and test sets
Learning Curve
- A learning curve shows how accuracy changes with varying sample size.
- Requires a sampling schedule for creating the learning curve:
  - Arithmetic sampling (Langley et al.)
  - Geometric sampling (Provost et al.)
- Effects of small sample size:
  - Bias in the estimate
  - Variance of the estimate
Methods of Estimation
- Holdout: reserve 2/3 for training and 1/3 for testing.
- Random subsampling: repeated holdout.
- Cross validation (see the sketch below):
  - Partition data into k disjoint subsets.
  - k-fold: train on k-1 partitions, test on the remaining one.
  - Leave-one-out: k = n.
- Stratified sampling: oversampling vs. undersampling.
- Bootstrap: sampling with replacement.
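A minimal sketch of k-fold partitioning (the estimation idea only; training and scoring a model on each split is omitted):

```python
import random

def k_fold_indices(n, k, seed=0):
    """Split record indices into k disjoint folds; each fold is the test set
    once while the other k-1 folds form the training set."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

for train, test in k_fold_indices(n=10, k=5):   # k = n would be leave-one-out
    print(sorted(test))
```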
Model Evaluation
- Metrics for performance evaluation: How can we evaluate the performance of a model?
- Methods for performance evaluation: How can we obtain reliable estimates?
- Methods for model comparison: How can we compare the relative performance among competing models?
ROC (Receiver Operating Characteristic)
- Developed in the 1950s for signal detection theory to analyze noisy signals: it characterizes the trade-off between positive hits and false alarms.
- The ROC curve plots the TP rate (on the y-axis) against the FP rate (on the x-axis).
- The performance of each classifier is represented as a point on the ROC curve: changing the threshold of the algorithm, the sample distribution, or the cost matrix changes the location of the point.
ROC Curve
- A 1-dimensional data set containing 2 classes (positive and negative).
- Any point located at x > t is classified as positive.
- At threshold t: TP = 0.5, FN = 0.5, FP = 0.12, TN = 0.88.
ROC Curve
Landmark points (TP, FP):
- (0,0): declare everything to be the negative class
- (1,1): declare everything to be the positive class
- (1,0): ideal
- Diagonal line: random guessing
- Below the diagonal line: the prediction is the opposite of the true class
Using ROC for Model Comparison
- No model consistently outperforms the other:
  - M1 is better for small FPR
  - M2 is better for large FPR
- Area under the ROC curve (AUC):
  - Ideal: area = 1
  - Random guess: area = 0.5
How to Construct an ROC Curve
- Use a classifier that produces a posterior probability P(+|A) for each test instance A.
- Sort the instances according to P(+|A) in decreasing order.
- Apply a threshold at each unique value of P(+|A).
- Count the number of TP, FP, TN, FN at each threshold:
  - TP rate, TPR = TP/(TP+FN)
  - FP rate, FPR = FP/(FP+TN)

  Instance   P(+|A)   True Class
      1       0.95        +
      2       0.93        +
      3       0.87        -
      4       0.85        -
      5       0.85        -
      6       0.85        +
      7       0.76        -
      8       0.53        +
      9       0.43        -
     10       0.25        +
How to Construct an ROC Curve...
Sweeping the threshold over the sorted scores (a sketch follows below):

  Class:         +     -     +     -     -     -     +     -     +     +
  Threshold >=  0.25  0.43  0.53  0.76  0.85  0.85  0.85  0.87  0.93  0.95  1.00
  TP:             5     4     4     3     3     3     3     2     2     1     0
  FP:             5     5     4     4     3     2     1     1     0     0     0
  TN:             0     0     1     1     2     3     4     4     5     5     5
  FN:             0     1     1     2     2     2     2     3     3     4     5
  TPR:            1    0.8   0.8   0.6   0.6   0.6   0.6   0.4   0.4   0.2    0
  FPR:            1     1    0.8   0.8   0.6   0.4   0.2   0.2    0     0     0

Plotting (FPR, TPR) at each threshold traces the ROC curve.
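A minimal sketch of the construction: sort by P(+|A) descending and sweep the threshold one instance at a time. Tied scores, like the three 0.85 values, are swept one at a time here, producing the same intermediate columns as the table above:

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points as the threshold sweeps the sorted scores."""
    pairs = sorted(zip(scores, labels), reverse=True)
    P = labels.count("+")
    N = labels.count("-")
    tp = fp = 0
    points = [(0.0, 0.0)]            # threshold above the highest score
    for _, y in pairs:
        if y == "+":
            tp += 1
        else:
            fp += 1
        points.append((fp / N, tp / P))
    return points

scores = [0.95, 0.93, 0.87, 0.85, 0.85, 0.85, 0.76, 0.53, 0.43, 0.25]
labels = ["+", "+", "-", "-", "-", "+", "-", "+", "-", "+"]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.1f}  TPR={tpr:.1f}")
```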
Test of Significance
Given two models:
- Model M1: accuracy = 85%, tested on 30 instances
- Model M2: accuracy = 75%, tested on 5000 instances
Can we say M1 is better than M2?
- How much confidence can we place on the accuracy of M1 and M2?
- Can the difference in performance be explained as a result of random fluctuations in the test set?
Confidence Interval for Accuracy
- A prediction can be regarded as a Bernoulli trial: a Bernoulli trial has 2 possible outcomes, and the possible outcomes for a prediction are correct or wrong.
- A collection of Bernoulli trials has a binomial distribution: x ~ Bin(N, p), where x is the number of correct predictions.
  - E.g., toss a fair coin 50 times; how many heads would turn up? Expected number of heads = N × p = 50 × 0.5 = 25.
- Given x (the number of correct predictions), or equivalently acc = x/N, and N (the number of test instances), can we predict p (the true accuracy of the model)?
Confidence Interval for Accuracy
For large test sets (N > 30), acc has a normal distribution with mean p and variance p(1-p)/N:
  P( Z_{α/2} <= (acc - p) / sqrt( p(1-p)/N ) <= Z_{1-α/2} ) = 1 - α
(The area under the normal curve between Z_{α/2} and Z_{1-α/2} is 1 - α.)
Solving for p gives the confidence interval:
  p = ( 2 N acc + Z²_{α/2} ± Z_{α/2} sqrt( Z²_{α/2} + 4 N acc - 4 N acc² ) ) / ( 2 (N + Z²_{α/2}) )
Confidence Interval for Accuracy...
Consider a model that produces an accuracy of 80% when evaluated on 100 test instances:
- N = 100, acc = 0.8
- Let 1 - α = 0.95 (95% confidence)
- From the probability table, Z_{α/2} = 1.96

  1-α:   0.99   0.98   0.95   0.90
  Z:     2.58   2.33   1.96   1.65

  N:          50     100    500    1000   5000
  p(lower):  0.670  0.711  0.763  0.774  0.789
  p(upper):  0.888  0.866  0.833  0.824  0.811

The interval tightens around acc = 0.8 as N grows (a sketch follows below).
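A minimal sketch of the closed-form interval, reproducing the N = 100 row of the table above:

```python
import math

def acc_interval(acc, n, z):
    """Bounds for the true accuracy p, from inverting the normal approximation."""
    center = 2 * n * acc + z * z
    spread = z * math.sqrt(z * z + 4 * n * acc - 4 * n * acc * acc)
    denom = 2 * (n + z * z)
    return (center - spread) / denom, (center + spread) / denom

lo, hi = acc_interval(acc=0.8, n=100, z=1.96)
print(round(lo, 3), round(hi, 3))   # 0.711 0.867 (the slide's table rounds to 0.866)
```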
Comparing Performance of 2 Models
Given two models, say M1 and M2, which is better?
- M1 is tested on D1 (size = n1), with error rate e1.
- M2 is tested on D2 (size = n2), with error rate e2.
- Assume D1 and D2 are independent.
- If n1 and n2 are sufficiently large, then e1 ~ N(μ1, σ1) and e2 ~ N(μ2, σ2).
- Approximate each variance by:  σi² ≈ ei (1 - ei) / ni
Comparing Performance of 2 Models...
To test whether the performance difference is statistically significant, let d = e1 - e2:
- d ~ N(dt, σt), where dt is the true difference.
- Since D1 and D2 are independent, their variances add up:
  σt² ≈ e1(1 - e1)/n1 + e2(1 - e2)/n2
- At the (1 - α) confidence level:  dt = d ± Z_{α/2} σt
An Illustrative Example
Given: M1: n1 = 30, e1 = 0.15;  M2: n2 = 5000, e2 = 0.25
d = |e2 - e1| = 0.1 (2-sided test)
  σd² ≈ 0.15(1 - 0.15)/30 + 0.25(1 - 0.25)/5000 = 0.0043
At the 95% confidence level, Z_{α/2} = 1.96:
  dt = 0.100 ± 1.96 × sqrt(0.0043) = 0.100 ± 0.128
=> The interval contains 0 => the difference may not be statistically significant. (A sketch follows below.)
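A minimal sketch of the test, reproducing the interval above:

```python
import math

def diff_interval(e1, n1, e2, n2, z=1.96):
    """Confidence interval for the true difference d_t = |e1 - e2| +/- z * sigma."""
    d = abs(e2 - e1)
    var = e1 * (1 - e1) / n1 + e2 * (1 - e2) / n2
    half = z * math.sqrt(var)
    return d - half, d + half

lo, hi = diff_interval(e1=0.15, n1=30, e2=0.25, n2=5000)
print(round(lo, 3), round(hi, 3))   # -0.028 0.228 -> the interval contains 0
```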
Comparing Performance of 2 Algorithms
- Each learning algorithm may produce k models:
  - L1 may produce M11, M12, ..., M1k
  - L2 may produce M21, M22, ..., M2k
- If the models are generated on the same test sets D1, D2, ..., Dk (e.g., via cross-validation):
  - For each set, compute dj = e1j - e2j.
  - dj has mean dt and variance σt².
  - Estimate:  σt² ≈ Σ(j=1..k) (dj - d̄)² / (k(k-1)), where d̄ is the average of the dj.
  - Then, using the t-distribution:  dt = d̄ ± t_{1-α, k-1} σt