Data Mining Concepts and Techniques

Chapter 6 discusses classification and prediction in data mining, defining classification as a process that predicts categorical class labels and prediction as modeling continuous-valued functions. It covers various classification methods such as decision trees, support vector machines, and Bayesian classification, along with issues related to data preparation and evaluation of classification methods. The chapter also emphasizes the importance of model construction, accuracy measures, and techniques like information gain and Gini index for attribute selection.


Chapter 6. Classification and Prediction

Data Mining: Concepts and Techniques (Chapter 6)
Jiawei Han, Department of Computer Science, University of Illinois at Urbana-Champaign
www.cs.uiuc.edu/~hanj
©2006 Jiawei Han and Micheline Kamber. All rights reserved.

Chapter outline
 What is classification? What is prediction?
 Issues regarding classification and prediction
 Classification by decision tree induction
 Bayesian classification
 Rule-based classification
 Classification by backpropagation
 Support Vector Machines (SVM)
 Associative classification
 Lazy learners (or learning from your neighbors)
 Other classification methods
 Prediction
 Accuracy and error measures
 Ensemble methods
 Model selection
 Summary

Classification vs. Prediction

 Classification
  predicts categorical class labels (discrete or nominal)
  constructs a model based on the training set and the values (class labels) of a classifying attribute, and uses it to classify new data
 Prediction
  models continuous-valued functions, i.e., predicts unknown or missing values
 Typical applications
  Credit approval
  Target marketing
  Medical diagnosis
  Fraud detection

Classification: A Two-Step Process

 Model construction: describing a set of predetermined classes
  Each tuple/sample is assumed to belong to a predefined class, as determined by the class label attribute
  The set of tuples used for model construction is the training set
  The model is represented as classification rules, decision trees, or mathematical formulae
 Model usage: classifying future or unknown objects
  Estimate the accuracy of the model
   The known label of each test sample is compared with the classified result from the model
   Accuracy rate is the percentage of test set samples that are correctly classified by the model
   The test set is independent of the training set; otherwise over-fitting will occur
  If the accuracy is acceptable, use the model to classify data tuples whose class labels are not known
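A minimal sketch of this two-step process (not from the slides), assuming scikit-learn is available; rank is encoded numerically (0 = Assistant Prof, 1 = Associate Prof, 2 = Professor) purely for illustration, and the training/test tuples mirror the tenure example on the next slides.

    # Step 1: construct a model from the training data.
    # Step 2: estimate accuracy on an independent test set, then classify an unseen tuple.
    from sklearn.tree import DecisionTreeClassifier
    from sklearn.metrics import accuracy_score

    X_train = [[0, 3], [0, 7], [2, 2], [1, 7], [0, 6], [1, 3]]   # (rank, years)
    y_train = ["no", "yes", "yes", "yes", "no", "no"]
    X_test = [[0, 2], [1, 7], [2, 5], [0, 7]]                    # Tom, Merlisa, George, Joseph
    y_test = ["no", "no", "yes", "yes"]

    model = DecisionTreeClassifier()          # model construction (training step)
    model.fit(X_train, y_train)

    print(accuracy_score(y_test, model.predict(X_test)))   # accuracy on the independent test set
    print(model.predict([[2, 4]])[0])                       # unseen tuple (Jeff, Professor, 4)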
Process (1): Model Construction

 The training data is fed to a classification algorithm, which produces a classifier (the model)

    NAME    RANK            YEARS   TENURED
    Mike    Assistant Prof  3       no
    Mary    Assistant Prof  7       yes
    Bill    Professor       2       yes
    Jim     Associate Prof  7       yes
    Dave    Assistant Prof  6       no
    Anne    Associate Prof  3       no

 Learned model: IF rank = 'professor' OR years > 6 THEN tenured = 'yes'

Process (2): Using the Model in Prediction

 The classifier is evaluated on testing data and then applied to unseen data

    NAME     RANK            YEARS   TENURED
    Tom      Assistant Prof  2       no
    Merlisa  Associate Prof  7       no
    George   Professor       5       yes
    Joseph   Assistant Prof  7       yes

 Unseen data: (Jeff, Professor, 4) → Tenured? yes

Supervised vs. Unsupervised Learning

 Supervised learning (classification)
  Supervision: the training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
  New data is classified based on the training set
 Unsupervised learning (clustering)
  The class labels of the training data are unknown
  Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Issues: Data Preparation

 Data cleaning
  Preprocess data in order to reduce noise and handle missing values
 Relevance analysis (feature selection)
  Remove irrelevant or redundant attributes
 Data transformation
  Generalize and/or normalize data

Issues: Evaluating Classification Methods

 Accuracy
  Classifier accuracy: predicting the class label
  Predictor accuracy: guessing the value of the predicted attribute
 Speed
  Time to construct the model (training time)
  Time to use the model (classification/prediction time)
 Robustness: handling noise and missing values
 Scalability: efficiency on disk-resident databases
 Interpretability: understanding and insight provided by the model
 Other measures, e.g., goodness of rules, such as decision tree size or compactness of classification rules

Decision Tree Induction: Training Dataset

 This example follows Quinlan's ID3 (Playing Tennis)

    age     income  student  credit_rating  buys_computer
    <=30    high    no       fair           no
    <=30    high    no       excellent      no
    31…40   high    no       fair           yes
    >40     medium  no       fair           yes
    >40     low     yes      fair           yes
    >40     low     yes      excellent      no
    31…40   low     yes      excellent      yes
    <=30    medium  no       fair           no
    <=30    low     yes      fair           yes
    >40     medium  yes      fair           yes
    <=30    medium  yes      excellent      yes
    31…40   medium  no       excellent      yes
    31…40   high    yes      fair           yes
    >40     medium  no       excellent      no
Output: A Decision Tree for "buys_computer"

 Root test: age?
  age <=30 → student?
   student = no → no
   student = yes → yes
  age 31..40 → yes
  age >40 → credit_rating?
   credit_rating = excellent → no
   credit_rating = fair → yes

Algorithm for Decision Tree Induction

 Basic algorithm (a greedy algorithm)
  The tree is constructed in a top-down, recursive, divide-and-conquer manner
  At the start, all the training examples are at the root
  Attributes are categorical (continuous-valued attributes are discretized in advance)
  Examples are partitioned recursively based on selected attributes
  Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
 Conditions for stopping partitioning
  All samples for a given node belong to the same class
  There are no remaining attributes for further partitioning; majority voting is employed for classifying the leaf
  There are no samples left
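As a sketch of this induction algorithm in practice (not the book's code, assuming pandas and scikit-learn are available), the training table above can be handed to a standard tree learner; the one-hot encoding step is an implementation detail needed because scikit-learn trees expect numeric inputs.

    import pandas as pd
    from sklearn.tree import DecisionTreeClassifier, export_text

    rows = [
        ("<=30", "high", "no", "fair", "no"), ("<=30", "high", "no", "excellent", "no"),
        ("31..40", "high", "no", "fair", "yes"), (">40", "medium", "no", "fair", "yes"),
        (">40", "low", "yes", "fair", "yes"), (">40", "low", "yes", "excellent", "no"),
        ("31..40", "low", "yes", "excellent", "yes"), ("<=30", "medium", "no", "fair", "no"),
        ("<=30", "low", "yes", "fair", "yes"), (">40", "medium", "yes", "fair", "yes"),
        ("<=30", "medium", "yes", "excellent", "yes"), ("31..40", "medium", "no", "excellent", "yes"),
        ("31..40", "high", "yes", "fair", "yes"), (">40", "medium", "no", "excellent", "no"),
    ]
    df = pd.DataFrame(rows, columns=["age", "income", "student", "credit_rating", "buys_computer"])
    X = pd.get_dummies(df.drop(columns="buys_computer"))   # one-hot encode the categorical attributes
    y = df["buys_computer"]

    tree = DecisionTreeClassifier(criterion="entropy")     # information gain, as in ID3/C4.5
    tree.fit(X, y)
    print(export_text(tree, feature_names=list(X.columns)))  # text rendering of the induced tree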

Attribute Selection Measure: Information Gain (ID3/C4.5)

 Select the attribute with the highest information gain
 Let p_i be the probability that an arbitrary tuple in D belongs to class C_i, estimated by |C_{i,D}| / |D|
 Expected information (entropy) needed to classify a tuple in D:
   Info(D) = -\sum_{i=1}^{m} p_i \log_2(p_i)
 Information needed (after using A to split D into v partitions) to classify D:
   Info_A(D) = \sum_{j=1}^{v} \frac{|D_j|}{|D|} \times Info(D_j)
 Information gained by branching on attribute A:
   Gain(A) = Info(D) - Info_A(D)

Attribute Selection: Information Gain

 Class P: buys_computer = "yes" (9 tuples); Class N: buys_computer = "no" (5 tuples)
   Info(D) = I(9,5) = -\frac{9}{14}\log_2\frac{9}{14} - \frac{5}{14}\log_2\frac{5}{14} = 0.940
 For attribute age:

    age     p_i  n_i  I(p_i, n_i)
    <=30    2    3    0.971
    31…40   4    0    0
    >40     3    2    0.971

   Info_age(D) = \frac{5}{14} I(2,3) + \frac{4}{14} I(4,0) + \frac{5}{14} I(3,2) = 0.694
 Here \frac{5}{14} I(2,3) means "age <=30" covers 5 of the 14 samples, with 2 yes's and 3 no's. Hence
   Gain(age) = Info(D) - Info_age(D) = 0.246
 Similarly,
   Gain(income) = 0.029
   Gain(student) = 0.151
   Gain(credit_rating) = 0.048
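A minimal sketch (not the book's code) that reproduces Gain(age) = 0.246, using only the age column and the class column from the training table above.

    from math import log2
    from collections import Counter

    def info(labels):
        # Info(D) = -sum_i p_i * log2(p_i), the expected information (entropy)
        n = len(labels)
        return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

    def gain(attr_values, labels):
        # Gain(A) = Info(D) - Info_A(D), where attribute A is given by attr_values
        parts = {}
        for a, y in zip(attr_values, labels):
            parts.setdefault(a, []).append(y)
        info_a = sum(len(p) / len(labels) * info(p) for p in parts.values())
        return info(labels) - info_a

    age  = ["<=30", "<=30", "31..40", ">40", ">40", ">40", "31..40",
            "<=30", "<=30", ">40", "<=30", "31..40", "31..40", ">40"]
    buys = ["no", "no", "yes", "yes", "yes", "no", "yes",
            "no", "yes", "yes", "yes", "yes", "yes", "no"]

    print(round(gain(age, buys), 3))   # 0.246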
Computing Information Gain for Continuous-Valued Attributes

 Let attribute A be a continuous-valued attribute
 Must determine the best split point for A
  Sort the values of A in increasing order
  Typically, the midpoint between each pair of adjacent values is considered as a possible split point
   (a_i + a_{i+1})/2 is the midpoint between the values a_i and a_{i+1}
  The point with the minimum expected information requirement for A is selected as the split point for A
 Split:
  D1 is the set of tuples in D satisfying A <= split-point, and D2 is the set of tuples in D satisfying A > split-point

Gain Ratio for Attribute Selection (C4.5)

 The information gain measure is biased towards attributes with a large number of values
 C4.5 (a successor of ID3) uses gain ratio to overcome the problem (normalization of information gain):
   SplitInfo_A(D) = -\sum_{j=1}^{v} \frac{|D_j|}{|D|} \times \log_2\left(\frac{|D_j|}{|D|}\right)
 GainRatio(A) = Gain(A) / SplitInfo_A(D)
 Ex. splitting on income (4 low, 6 medium, and 4 high tuples):
   SplitInfo_income(D) = -\frac{4}{14}\log_2\frac{4}{14} - \frac{6}{14}\log_2\frac{6}{14} - \frac{4}{14}\log_2\frac{4}{14} = 1.557
   GainRatio(income) = 0.029 / 1.557 = 0.019
 The attribute with the maximum gain ratio is selected as the splitting attribute

Gini Index (CART, IBM IntelligentMiner)

 If a data set D contains examples from n classes, the gini index, gini(D), is defined as
   gini(D) = 1 - \sum_{j=1}^{n} p_j^2
 where p_j is the relative frequency of class j in D
 If a data set D is split on A into two subsets D1 and D2, the gini index gini_A(D) is defined as
   gini_A(D) = \frac{|D_1|}{|D|} gini(D_1) + \frac{|D_2|}{|D|} gini(D_2)
 Reduction in impurity:
   \Delta gini(A) = gini(D) - gini_A(D)
 The attribute that provides the smallest gini_split(D) (or, equivalently, the largest reduction in impurity) is chosen to split the node (this requires enumerating all possible splitting points for each attribute)

Gini Index: Example

 Ex. D has 9 tuples with buys_computer = "yes" and 5 with "no":
   gini(D) = 1 - \left(\frac{9}{14}\right)^2 - \left(\frac{5}{14}\right)^2 = 0.459
 Suppose the attribute income partitions D into D1 = {low, medium} with 10 tuples and D2 = {high} with 4 tuples:
   gini_{income \in \{low, medium\}}(D) = \frac{10}{14} gini(D_1) + \frac{4}{14} gini(D_2) = 0.443
 The other binary splits on income, {low, high}/{medium} and {medium, high}/{low}, give 0.458 and 0.450, so {low, medium} (equivalently {high}) gives the lowest gini index and is the best binary split on income
 All attributes are assumed continuous-valued
 May need other tools, e.g., clustering, to get the possible split values
 Can be modified for categorical attributes
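A small sketch that checks the values quoted above; the class counts per partition (D1 = {low, medium}: 7 yes / 3 no, D2 = {high}: 2 yes / 2 no) are tallied from the training table.

    def gini(counts):
        # gini(D) = 1 - sum_j p_j^2, given the class counts of a partition
        n = sum(counts)
        return 1.0 - sum((c / n) ** 2 for c in counts)

    def gini_split(partitions):
        # gini_A(D) = sum_k (|D_k| / |D|) * gini(D_k) for a binary (or n-ary) split
        total = sum(sum(p) for p in partitions)
        return sum(sum(p) / total * gini(p) for p in partitions)

    print(round(gini([9, 5]), 3))                  # 0.459 for the whole data set
    print(round(gini_split([[7, 3], [2, 2]]), 3))  # 0.443 for the {low, medium} | {high} split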
Comparing Attribute Selection Measures

 The three measures, in general, return good results, but
  Information gain: biased towards multivalued attributes
  Gain ratio: tends to prefer unbalanced splits in which one partition is much smaller than the others
  Gini index: biased towards multivalued attributes

Other Attribute Selection Measures

 CHAID: a popular decision tree algorithm; measure based on the χ2 test for independence
 C-SEP: performs better than information gain and gini index in certain cases
 G-statistic: has a close approximation to the χ2 distribution
 MDL (Minimal Description Length) principle (i.e., the simplest solution is preferred):
  The best tree is the one that requires the fewest bits to both (1) encode the tree and (2) encode the exceptions to the tree
 Multivariate splits (partition based on multiple variable combinations)
  CART: finds multivariate splits based on a linear combination of attributes
 Which attribute selection measure is the best?
  Most give good results; none is significantly superior to the others

Overfitting and Tree Pruning

 Overfitting: an induced tree may overfit the training data
  Too many branches, some of which may reflect anomalies due to noise or outliers
  Poor accuracy for unseen samples
 Two approaches to avoid overfitting
  Prepruning: halt tree construction early; do not split a node if this would result in the goodness measure falling below a threshold
   Difficult to choose an appropriate threshold
  Postpruning: remove branches from a "fully grown" tree, producing a sequence of progressively pruned trees
   Use a set of data different from the training data to decide which is the "best pruned tree"

Enhancements to Basic Decision Tree Induction

 Allow for continuous-valued attributes
  Dynamically define new discrete-valued attributes that partition the continuous attribute values into a discrete set of intervals
 Handle missing attribute values
  Assign the most common value of the attribute
  Assign a probability to each of the possible values
 Attribute construction
  Create new attributes based on existing ones that are sparsely represented
  This reduces fragmentation, repetition, and replication
Classification in Large Databases

 Classification: a classical problem extensively studied by statisticians and machine learning researchers
 Scalability: classifying data sets with millions of examples and hundreds of attributes with reasonable speed
 Why decision tree induction in data mining?
  Relatively fast learning speed (compared with other classification methods)
  Convertible to simple and easy-to-understand classification rules
  Can use SQL queries for accessing databases
  Comparable classification accuracy with other methods

Scalable Decision Tree Induction Methods

 SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
 SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute list data structure
 PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier
 RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): builds an AVC-list (attribute, value, class label)
 BOAT (PODS'99, Gehrke, Ganti, Ramakrishnan & Loh): uses bootstrapping to create several small samples

Bayesian Classification: Why?

 A statistical classifier: performs probabilistic prediction, i.e., predicts class membership probabilities
 Foundation: based on Bayes' theorem
 Performance: a simple Bayesian classifier, the naïve Bayesian classifier, has performance comparable with decision tree and selected neural network classifiers
 Incremental: each training example can incrementally increase/decrease the probability that a hypothesis is correct; prior knowledge can be combined with observed data
 Standard: even when Bayesian methods are computationally intractable, they provide a standard of optimal decision making against which other methods can be measured
Bayesian Theorem: Basics

 Let X be a data sample ("evidence"); its class label is unknown
 Let H be the hypothesis that X belongs to class C
 Classification is to determine P(H|X), the probability that the hypothesis holds given the observed data sample X
 P(H) (prior probability): the initial probability
  E.g., X will buy a computer, regardless of age, income, …
 P(X): the probability that the sample data is observed
 P(X|H) (likelihood): the probability of observing sample X given that the hypothesis holds
  E.g., given that X will buy a computer, the probability that X is 31..40 with medium income

Bayesian Theorem

 Given training data X, the posterior probability of a hypothesis H, P(H|X), follows Bayes' theorem:
   P(H|X) = \frac{P(X|H) P(H)}{P(X)}
 Informally, this can be written as posterior = likelihood × prior / evidence
 Predict that X belongs to C_i iff the probability P(C_i|X) is the highest among all the P(C_k|X) for the k classes
 Practical difficulty: requires initial knowledge of many probabilities, and incurs significant computational cost

Towards the Naïve Bayesian Classifier

 Let D be a training set of tuples and their associated class labels, with each tuple represented by an n-dimensional attribute vector X = (x1, x2, …, xn)
 Suppose there are m classes C1, C2, …, Cm
 Classification is to derive the maximum posterior, i.e., the maximal P(Ci|X)
 This can be derived from Bayes' theorem:
   P(C_i|X) = \frac{P(X|C_i) P(C_i)}{P(X)}
 Since P(X) is constant for all classes, only P(X|C_i) P(C_i) needs to be maximized

Derivation of the Naïve Bayes Classifier

 A simplifying assumption: attributes are conditionally independent given the class (i.e., no dependence relation between attributes):
   P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i) = P(x_1|C_i) \times P(x_2|C_i) \times \dots \times P(x_n|C_i)
 This greatly reduces the computation cost: only the class distribution needs to be counted
 If A_k is categorical, P(x_k|C_i) is the number of tuples in C_i having value x_k for A_k, divided by |C_{i,D}| (the number of tuples of C_i in D)
 If A_k is continuous-valued, P(x_k|C_i) is usually computed based on a Gaussian distribution with mean µ and standard deviation σ:
   g(x, \mu, \sigma) = \frac{1}{\sqrt{2\pi}\,\sigma} e^{-\frac{(x-\mu)^2}{2\sigma^2}}
 and P(x_k|C_i) = g(x_k, \mu_{C_i}, \sigma_{C_i})
Naïve Bayesian Classifier: Training Dataset

 Classes: C1: buys_computer = 'yes', C2: buys_computer = 'no' (using the AllElectronics training data shown earlier)
 Data sample to classify: X = (age <= 30, income = medium, student = yes, credit_rating = fair)

Naïve Bayesian Classifier: An Example

 P(Ci):
   P(buys_computer = "yes") = 9/14 = 0.643
   P(buys_computer = "no") = 5/14 = 0.357
 Compute P(X|Ci) for each class:
   P(age = "<=30" | buys_computer = "yes") = 2/9 = 0.222
   P(age = "<=30" | buys_computer = "no") = 3/5 = 0.6
   P(income = "medium" | buys_computer = "yes") = 4/9 = 0.444
   P(income = "medium" | buys_computer = "no") = 2/5 = 0.4
   P(student = "yes" | buys_computer = "yes") = 6/9 = 0.667
   P(student = "yes" | buys_computer = "no") = 1/5 = 0.2
   P(credit_rating = "fair" | buys_computer = "yes") = 6/9 = 0.667
   P(credit_rating = "fair" | buys_computer = "no") = 2/5 = 0.4
 For X = (age <= 30, income = medium, student = yes, credit_rating = fair):
   P(X | buys_computer = "yes") = 0.222 × 0.444 × 0.667 × 0.667 = 0.044
   P(X | buys_computer = "no") = 0.6 × 0.4 × 0.2 × 0.4 = 0.019
   P(X | buys_computer = "yes") × P(buys_computer = "yes") = 0.028
   P(X | buys_computer = "no") × P(buys_computer = "no") = 0.007
 Therefore, X belongs to the class buys_computer = "yes"
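A minimal naïve Bayes sketch (not the book's code) that reproduces the hand computation above for X = (age <= 30, income = medium, student = yes, credit_rating = fair).

    from collections import Counter, defaultdict

    data = [  # (age, income, student, credit_rating) -> buys_computer
        (("<=30", "high", "no", "fair"), "no"), (("<=30", "high", "no", "excellent"), "no"),
        (("31..40", "high", "no", "fair"), "yes"), ((">40", "medium", "no", "fair"), "yes"),
        ((">40", "low", "yes", "fair"), "yes"), ((">40", "low", "yes", "excellent"), "no"),
        (("31..40", "low", "yes", "excellent"), "yes"), (("<=30", "medium", "no", "fair"), "no"),
        (("<=30", "low", "yes", "fair"), "yes"), ((">40", "medium", "yes", "fair"), "yes"),
        (("<=30", "medium", "yes", "excellent"), "yes"), (("31..40", "medium", "no", "excellent"), "yes"),
        (("31..40", "high", "yes", "fair"), "yes"), ((">40", "medium", "no", "excellent"), "no"),
    ]

    priors = Counter(label for _, label in data)        # class counts: yes = 9, no = 5
    cond = defaultdict(Counter)                         # per-class (attribute index, value) counts
    for x, label in data:
        for k, v in enumerate(x):
            cond[label][(k, v)] += 1

    def score(x, label):
        # P(X|Ci) * P(Ci) under the class-conditional independence assumption
        p = priors[label] / len(data)
        for k, v in enumerate(x):
            p *= cond[label][(k, v)] / priors[label]
        return p

    x = ("<=30", "medium", "yes", "fair")
    for label in priors:
        print(label, round(score(x, label), 3))   # yes: ~0.028, no: ~0.007 -> predict "yes"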

Avoiding the Zero-Probability Problem

 Naïve Bayesian prediction requires each conditional probability to be non-zero; otherwise, the predicted probability will be zero:
   P(X|C_i) = \prod_{k=1}^{n} P(x_k|C_i)
 Ex. Suppose a dataset with 1000 tuples: income = low (0), income = medium (990), and income = high (10)
 Use the Laplacian correction (or Laplacian estimator)
  Adding 1 to each case:
   Prob(income = low) = 1/1003
   Prob(income = medium) = 991/1003
   Prob(income = high) = 11/1003
  The "corrected" probability estimates are close to their "uncorrected" counterparts

Naïve Bayesian Classifier: Comments

 Advantages
  Easy to implement
  Good results obtained in most of the cases
 Disadvantages
  Assumption of class-conditional independence, which causes a loss of accuracy
  Practically, dependencies exist among variables
   E.g., hospital patients: profile (age, family history, etc.), symptoms (fever, cough, etc.), disease (lung cancer, diabetes, etc.)
   Dependencies among these cannot be modeled by the naïve Bayesian classifier
 How to deal with these dependencies? Bayesian belief networks
Bayesian Belief Networks

 A Bayesian belief network allows a subset of the variables to be conditionally independent
 A graphical model of causal relationships
  Represents dependencies among the variables
  Gives a specification of the joint probability distribution
 Nodes: random variables
 Links: dependencies
  If X and Y are the parents of Z, and Y is the parent of P, then there is no direct dependency between Z and P
  The network has no loops or cycles

Bayesian Belief Network: An Example

 Variables: FamilyHistory (FH), Smoker (S), LungCancer (LC), Emphysema, PositiveXRay, Dyspnea
 The conditional probability table (CPT) for the variable LungCancer:

          (FH, S)  (FH, ~S)  (~FH, S)  (~FH, ~S)
    LC    0.8      0.5       0.7       0.1
    ~LC   0.2      0.5       0.3       0.9

 The CPT gives the conditional probability for each possible combination of values of a node's parents
 Derivation of the probability of a particular combination of values x1, …, xn from the CPTs:
   P(x_1, \dots, x_n) = \prod_{i=1}^{n} P(x_i \mid Parents(X_i))
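A sketch of that factored joint probability for a fragment of the network above. The LungCancer CPT values come from the table; the prior probabilities for FamilyHistory and Smoker are made-up placeholders, since the slides do not give them.

    p_fh = 0.10          # hypothetical P(FamilyHistory = true), for illustration only
    p_s = 0.30           # hypothetical P(Smoker = true), for illustration only
    p_lc_given = {       # CPT for LungCancer from the table, keyed by (FH, S)
        (True, True): 0.8, (True, False): 0.5, (False, True): 0.7, (False, False): 0.1,
    }

    def joint(fh, s, lc):
        # P(FH, S, LC) = P(FH) * P(S) * P(LC | FH, S); FH and S have no parents here
        p = (p_fh if fh else 1 - p_fh) * (p_s if s else 1 - p_s)
        p_lc = p_lc_given[(fh, s)]
        return p * (p_lc if lc else 1 - p_lc)

    print(joint(True, True, True))    # 0.10 * 0.30 * 0.8 = 0.024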

Training Bayesian Networks

 Several scenarios:
  Given both the network structure and all variables observable: learn only the CPTs
  Network structure known, some hidden variables: gradient descent (greedy hill-climbing) method, analogous to neural network learning
  Network structure unknown, all variables observable: search through the model space to reconstruct the network topology
  Unknown structure, all hidden variables: no good algorithms are known for this purpose
 Ref.: D. Heckerman, Bayesian networks for data mining
Using IF-THEN Rules for Classification

 Represent the knowledge in the form of IF-THEN rules
  R: IF age = youth AND student = yes THEN buys_computer = yes
  Rule antecedent/precondition vs. rule consequent
 Assessment of a rule: coverage and accuracy
  n_covers = number of tuples covered by R
  n_correct = number of tuples correctly classified by R
  coverage(R) = n_covers / |D|   /* D: training data set */
  accuracy(R) = n_correct / n_covers
 If more than one rule is triggered, conflict resolution is needed
  Size ordering: assign the highest priority to the triggering rule that has the "toughest" requirement (i.e., with the most attribute tests)
  Class-based ordering: decreasing order of prevalence or misclassification cost per class
  Rule-based ordering (decision list): rules are organized into one long priority list, according to some measure of rule quality or by experts

Rule Extraction from a Decision Tree

 Rules are easier to understand than large trees
 One rule is created for each path from the root to a leaf
 Each attribute-value pair along a path forms a conjunction; the leaf holds the class prediction
 Rules are mutually exclusive and exhaustive
 Example: rule extraction from our buys_computer decision tree
  IF age = young AND student = no THEN buys_computer = no
  IF age = young AND student = yes THEN buys_computer = yes
  IF age = mid-age THEN buys_computer = yes
  IF age = old AND credit_rating = excellent THEN buys_computer = no
  IF age = old AND credit_rating = fair THEN buys_computer = yes

Rule Extraction from the Training Data

 Sequential covering algorithm: extracts rules directly from training data
 Typical sequential covering algorithms: FOIL, AQ, CN2, RIPPER
 Rules are learned sequentially; each rule for a given class Ci will cover many tuples of Ci but none (or few) of the tuples of other classes
 Steps:
  Rules are learned one at a time
  Each time a rule is learned, the tuples covered by the rule are removed
  The process repeats on the remaining tuples until a termination condition, e.g., when there are no more training examples or when the quality of a rule returned is below a user-specified threshold
 Compare with decision-tree induction, which learns a set of rules simultaneously

How to Learn One Rule?

 Start with the most general rule possible: condition = empty
 Add new attribute tests by adopting a greedy depth-first strategy
  Pick the one that most improves the rule quality
 Rule-quality measures: consider both coverage and accuracy
  FOIL gain (in FOIL and RIPPER): assesses the information gained by extending the condition:
   FOIL\_Gain = pos' \times \left( \log_2 \frac{pos'}{pos' + neg'} - \log_2 \frac{pos}{pos + neg} \right)
  It favors rules that have high accuracy and cover many positive tuples
 Rule pruning is based on an independent set of test tuples:
   FOIL\_Prune(R) = \frac{pos - neg}{pos + neg}
  where pos/neg are the numbers of positive/negative tuples covered by R
  If FOIL_Prune is higher for the pruned version of R, prune R
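A small sketch of the two rule-quality formulas above; the coverage counts (pos = 20, neg = 10 before extending the condition, pos' = 15, neg' = 2 after) are illustrative numbers, not values from the text.

    from math import log2

    def foil_gain(pos, neg, pos2, neg2):
        # FOIL_Gain = pos' * (log2(pos'/(pos'+neg')) - log2(pos/(pos+neg)))
        return pos2 * (log2(pos2 / (pos2 + neg2)) - log2(pos / (pos + neg)))

    def foil_prune(pos, neg):
        # FOIL_Prune(R) = (pos - neg) / (pos + neg), computed on an independent pruning set
        return (pos - neg) / (pos + neg)

    print(round(foil_gain(20, 10, 15, 2), 3))   # information gained by the more specific condition
    print(round(foil_prune(15, 2), 3))          # prune R if the pruned version scores higher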
Classification: A Mathematical Mapping

 Classification: predicts categorical class labels
 E.g., personal homepage classification
  x_i = (x1, x2, x3, …), y_i = +1 or -1
  x1: number of occurrences of the word "homepage"
  x2: number of occurrences of the word "welcome"
 Mathematically
  x ∈ X = ℜ^n, y ∈ Y = {+1, -1}
  We want a function f: X → Y

Linear Classification

 Binary classification problem
 [Figure: two classes of points, 'x' and 'o', separated by a line; the data above the line belongs to class 'x', the data below the line to class 'o']
 Examples: SVM, Perceptron, probabilistic classifiers

Discriminative Classifiers

 Advantages
  Prediction accuracy is generally high (as compared to Bayesian methods in general)
  Robust: works when training examples contain errors
  Fast evaluation of the learned target function (Bayesian networks are normally slow)
 Criticism
  Long training time
  Difficult to understand the learned function (weights); Bayesian networks can be used easily for pattern discovery
  Not easy to incorporate domain knowledge, which is easy in the form of priors on the data or distributions
Perceptron & Winnow

 Vectors: x, w; scalars: x, y, w
 Input: {(x1, y1), …}
 Output: a classification function f(x) such that
  f(xi) > 0 for yi = +1
  f(xi) < 0 for yi = -1
 f(x): w·x + b = 0, or w1 x1 + w2 x2 + b = 0
 Perceptron: updates w additively
 Winnow: updates w multiplicatively

Classification by Backpropagation

 Backpropagation: a neural network learning algorithm
 Started by psychologists and neurobiologists to develop and test computational analogues of neurons
 A neural network: a set of connected input/output units where each connection has a weight associated with it
 During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
 Also referred to as connectionist learning due to the connections between units
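A minimal sketch of the perceptron's additive update described above; the 2-D toy data, learning rate, and number of passes are illustrative assumptions, not taken from the slides.

    data = [((2.0, 3.0), +1), ((3.0, 3.5), +1), ((1.0, 0.5), -1), ((0.5, 1.5), -1)]
    w, b, lr = [0.0, 0.0], 0.0, 0.1

    for _ in range(20):                       # a few passes over the training data
        for (x1, x2), y in data:
            f = w[0] * x1 + w[1] * x2 + b     # f(x) = w . x + b
            if y * f <= 0:                    # misclassified: update w additively
                w[0] += lr * y * x1
                w[1] += lr * y * x2
                b += lr * y

    print(w, b)                               # separating line w1*x1 + w2*x2 + b = 0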

Neural Network as a Classifier

 Weaknesses
  Long training time
  Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network
 Strengths
  High tolerance to noisy data
  Ability to classify untrained patterns
  Well suited for continuous-valued inputs and outputs
  Successful on a wide array of real-world data
  Algorithms are inherently parallel
  Techniques have recently been developed for the extraction of rules from trained neural networks

A Neuron (= a Perceptron)

 The inputs x0, …, xn, with weights w0, …, wn, are combined in a weighted sum with a bias µ_k and passed through an activation function f to produce the output y
 For example, with a sign activation function:
   y = sign\left( \sum_{i=0}^{n} w_i x_i + \mu_k \right)
 The n-dimensional input vector x is mapped into the variable y by means of the scalar product and a nonlinear function mapping
A Multi-Layer Feed-Forward Neural Network

 Layers: an input layer fed by the input vector X, one or more hidden layers, and an output layer that emits the output vector
 Forward pass and weight/bias updates (with learning rate l):
   I_j = \sum_i w_{ij} O_i + \theta_j
   O_j = \frac{1}{1 + e^{-I_j}}
   Err_j = O_j (1 - O_j)(T_j - O_j)              (for an output unit)
   Err_j = O_j (1 - O_j) \sum_k Err_k w_{jk}     (for a hidden unit)
   w_{ij} = w_{ij} + (l)\, Err_j\, O_i
   \theta_j = \theta_j + (l)\, Err_j

How a Multi-Layer Neural Network Works

 The inputs to the network correspond to the attributes measured for each training tuple
 Inputs are fed simultaneously into the units making up the input layer
 They are then weighted and fed simultaneously to a hidden layer
 The number of hidden layers is arbitrary, although usually only one is used
 The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction
 The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer
 From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function

Defining a Network Topology

 First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer
 Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
 One input unit per domain value, each initialized to 0
 Output: for classification with more than two classes, one output unit per class is used
 Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights

Backpropagation

 Iteratively process a set of training tuples and compare the network's prediction with the actual known target value
 For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
 Modifications are made in the "backwards" direction: from the output layer, through each hidden layer, down to the first hidden layer, hence "backpropagation"
 Steps
  Initialize weights (to small random numbers) and biases in the network
  Propagate the inputs forward (by applying the activation function)
  Backpropagate the error (by updating weights and biases)
  Check the terminating condition (when the error is very small, etc.)
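A sketch of these backpropagation updates for a network with one hidden layer of logistic units, trained on XOR; the layer sizes, learning rate, and number of epochs are illustrative choices, not values from the slides.

    import math, random

    random.seed(0)
    data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
    n_in, n_hid, l = 2, 3, 0.5                                  # l is the learning rate

    w_hid = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hid)]
    t_hid = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]   # hidden-unit biases
    w_out = [random.uniform(-0.5, 0.5) for _ in range(n_hid)]
    t_out = random.uniform(-0.5, 0.5)                           # output-unit bias

    def forward(x):
        # I_j = sum_i w_ij * O_i + theta_j ; O_j = 1 / (1 + e^(-I_j))
        o_hid = [1 / (1 + math.exp(-(sum(w_hid[j][i] * x[i] for i in range(n_in)) + t_hid[j])))
                 for j in range(n_hid)]
        o_out = 1 / (1 + math.exp(-(sum(w_out[j] * o_hid[j] for j in range(n_hid)) + t_out)))
        return o_hid, o_out

    for _ in range(5000):
        for x, target in data:
            o_hid, o_out = forward(x)                           # propagate the inputs forward
            err_out = o_out * (1 - o_out) * (target - o_out)    # error at the output unit
            err_hid = [o_hid[j] * (1 - o_hid[j]) * err_out * w_out[j] for j in range(n_hid)]
            for j in range(n_hid):                              # backpropagate: update w and theta
                w_out[j] += l * err_out * o_hid[j]
                for i in range(n_in):
                    w_hid[j][i] += l * err_hid[j] * x[i]
                t_hid[j] += l * err_hid[j]
            t_out += l * err_out

    print([round(forward(x)[1], 2) for x, _ in data])   # typically close to [0, 1, 1, 0]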
Backpropagation and Interpretability

 Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| × w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case
 Rule extraction from networks: network pruning
  Simplify the network structure by removing weighted links that have the least effect on the trained network
  Then perform link, unit, or activation value clustering
  The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
 Sensitivity analysis: assess the impact that a given input variable has on a network output; the knowledge gained from this analysis can be represented in rules

SVM: Support Vector Machines

 A relatively new classification method for both linear and nonlinear data
 It uses a nonlinear mapping to transform the original training data into a higher dimension
 Within the new dimension, it searches for the linear optimal separating hyperplane (i.e., "decision boundary")
 With an appropriate nonlinear mapping to a sufficiently high dimension, data from two classes can always be separated by a hyperplane
 SVM finds this hyperplane using support vectors ("essential" training tuples) and margins (defined by the support vectors)

SVM: History and Applications

 Vapnik and colleagues (1992): groundwork from Vapnik and Chervonenkis' statistical learning theory of the 1960s
 Features: training can be slow, but accuracy is high owing to their ability to model complex nonlinear decision boundaries (margin maximization)
 Used both for classification and prediction
 Applications: handwritten digit recognition, object recognition, speaker identification, benchmark time-series prediction tests
SVM: General Philosophy, Margins, and Support Vectors

 [Figure: a small margin vs. a large margin between two classes; the training tuples lying on the margin boundaries are the support vectors]

SVM: When Data Is Linearly Separable

 Let the data D be (X1, y1), …, (X|D|, y|D|), where the Xi are the training tuples with associated class labels yi
 There are an infinite number of lines (hyperplanes) separating the two classes, but we want to find the best one (the one that minimizes classification error on unseen data)
 SVM searches for the hyperplane with the largest margin, i.e., the maximum marginal hyperplane (MMH)

SVM: Linearly Separable

 A separating hyperplane can be written as
   W · X + b = 0
 where W = {w1, w2, …, wn} is a weight vector and b is a scalar (bias)
 For 2-D data it can be written as
   w0 + w1 x1 + w2 x2 = 0
 The hyperplanes defining the sides of the margin:
   H1: w0 + w1 x1 + w2 x2 >= 1 for yi = +1, and
   H2: w0 + w1 x1 + w2 x2 <= -1 for yi = -1
 Any training tuples that fall on hyperplanes H1 or H2 (i.e., the sides defining the margin) are support vectors
 This becomes a constrained (convex) quadratic optimization problem: a quadratic objective function with linear constraints → quadratic programming (QP) → Lagrangian multipliers
Why Is SVM Effective on High-Dimensional Data?

 The complexity of the trained classifier is characterized by the number of support vectors rather than the dimensionality of the data
 The support vectors are the essential or critical training examples: they lie closest to the decision boundary (MMH)
 If all other training examples were removed and the training repeated, the same separating hyperplane would be found
 The number of support vectors found can be used to compute an (upper) bound on the expected error rate of the SVM classifier, which is independent of the data dimensionality
 Thus, an SVM with a small number of support vectors can have good generalization, even when the dimensionality of the data is high

SVM: Linearly Inseparable

 Transform the original input data into a higher-dimensional space
 Search for a linear separating hyperplane in the new space

SVM: Kernel Functions

 Instead of computing the dot product on the transformed data tuples, it is mathematically equivalent to apply a kernel function K(Xi, Xj) to the original data, i.e., K(Xi, Xj) = Φ(Xi) · Φ(Xj)
 Typical kernel functions include the polynomial kernel, the Gaussian radial basis function (RBF) kernel, and the sigmoid kernel
 SVM can also be used for classifying multiple (> 2) classes and for regression analysis (with additional user parameters)

Scaling SVM by Hierarchical Micro-Clustering

 SVM is not scalable to the number of data objects in terms of training time and memory usage
 "Classifying Large Data Sets Using SVMs with Hierarchical Clusters", Hwanjo Yu, Jiong Yang, Jiawei Han, KDD'03
 CB-SVM (Clustering-Based SVM)
  Given a limited amount of system resources (e.g., memory), maximize the SVM performance in terms of accuracy and training speed
  Use micro-clustering to effectively reduce the number of points to be considered
  When deriving support vectors, de-cluster the micro-clusters near "candidate vectors" to ensure high classification accuracy
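A sketch (assuming scikit-learn is available) of a soft-margin SVM with an RBF kernel on a toy, linearly inseparable data set; the data generation and parameter values are illustrative, not from the slides.

    import numpy as np
    from sklearn.svm import SVC

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 1).astype(int)   # circular (nonlinear) class boundary

    clf = SVC(kernel="rbf", C=1.0, gamma="scale")       # K(Xi, Xj) = exp(-gamma * ||Xi - Xj||^2)
    clf.fit(X, y)

    print("training accuracy:", clf.score(X, y))
    print("support vectors per class:", clf.n_support_)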
CB-SVM: Clustering-Based SVM

 Training data sets may not even fit in memory
 Read the data set once (minimizing disk access)
  Construct a statistical summary of the data (i.e., hierarchical clusters) given a limited amount of memory
  The statistical summary maximizes the benefit of learning the SVM
  The summary plays a role in indexing SVMs
 Essence of micro-clustering (hierarchical indexing structure)
  Use a micro-cluster hierarchical indexing structure (CF-tree)
  Provide finer samples closer to the boundary and coarser samples farther from the boundary
  Selective de-clustering to ensure high accuracy

CB-SVM Algorithm: Outline

 Construct two CF-trees from the positive and negative data sets independently
  Needs one scan of the data set
 Train an SVM from the centroids of the root entries
 De-cluster the entries near the boundary into the next level
  The child entries de-clustered from the parent entries are accumulated into the training set, together with the non-declustered parent entries
 Train an SVM again from the centroids of the entries in the training set
 Repeat until nothing is accumulated

Selective Declustering

 The CF-tree is a suitable base structure for selective declustering
 De-cluster only a cluster Ei such that Di - Ri < Ds, where Di is the distance from the boundary to the center point of Ei and Ri is the radius of Ei
  De-cluster only the clusters whose subclusters have the possibility of being support clusters of the boundary
  "Support cluster": a cluster whose centroid is a support vector
Experiments on a Synthetic Dataset and on a Large Data Set

 [Figures: experimental results comparing CB-SVM with standard SVM training]

SVM vs. Neural Network

 SVM
  Relatively new concept
  Deterministic algorithm
  Nice generalization properties
  Hard to learn: learned in batch mode using quadratic programming techniques
  Using kernels, can learn very complex functions
 Neural Network
  Relatively old
  Nondeterministic algorithm
  Generalizes well but doesn't have a strong mathematical foundation
  Can easily be learned in an incremental fashion
  To learn complex functions, use a multilayer perceptron (not that trivial)

SVM Related Links

 SVM website: http://www.kernel-machines.org/
 Representative implementations
  LIBSVM: an efficient implementation of SVM, with multi-class classification, nu-SVM, one-class SVM, and various interfaces with Java, Python, etc.
  SVM-light: simpler, but performance is not better than LIBSVM; supports only binary classification and only the C language
  SVM-torch: another recent implementation, also written in C
SVM: Introductory Literature

 "Statistical Learning Theory" by Vapnik: extremely hard to understand, and contains many errors
 C. J. C. Burges, "A Tutorial on Support Vector Machines for Pattern Recognition", Knowledge Discovery and Data Mining, 2(2), 1998
  Better than Vapnik's book, but still too hard as an introduction, and the examples are not intuitive
 The book "An Introduction to Support Vector Machines" by N. Cristianini and J. Shawe-Taylor
  Also hard as an introduction, but the explanation of Mercer's theorem is better than in the above literature
 The neural network book by Haykin
  Contains one nice chapter introducing SVM

Associative Classification

 Associative classification
  Association rules are generated and analyzed for use in classification
  Search for strong associations between frequent patterns (conjunctions of attribute-value pairs) and class labels
  Classification: based on evaluating a set of rules of the form
   p1 ∧ p2 ∧ … ∧ pl → "A_class = C" (conf, sup)
 Why effective?
  It explores highly confident associations among multiple attributes and may overcome some constraints introduced by decision-tree induction, which considers only one attribute at a time
  In many studies, associative classification has been found to be more accurate than some traditional classification methods, such as C4.5

Typical Associative Classification Methods

 CBA (Classification By Association: Liu, Hsu & Ma, KDD'98)
  Mines possible association rules of the form: cond-set (a set of attribute-value pairs) → class label
  Builds a classifier: organizes rules according to decreasing precedence based on confidence and then support
 CMAR (Classification based on Multiple Association Rules: Li, Han, Pei, ICDM'01)
  Classification: statistical analysis on multiple rules
 CPAR (Classification based on Predictive Association Rules: Yin & Han, SDM'03)
  Generation of predictive rules (FOIL-like analysis)
  High efficiency, accuracy similar to CMAR
 RCBT (Mining top-k covering rule groups for gene expression data, Cong et al., SIGMOD'05)
  Explores high-dimensional classification using top-k rule groups
  Achieves high classification accuracy and high run-time efficiency
Associative Classification May Achieve High Accuracy and Efficiency (Cong et al., SIGMOD'05)

 [Figure: experimental comparison of accuracy and run time for associative classification methods]

A Closer Look at CMAR

 CMAR (Classification based on Multiple Association Rules: Li, Han, Pei, ICDM'01)
 Efficiency: uses an enhanced FP-tree that maintains the distribution of class labels among tuples satisfying each frequent itemset
 Rule pruning whenever a rule is inserted into the tree
  Given two rules R1 and R2, if the antecedent of R1 is more general than that of R2 and conf(R1) >= conf(R2), then R2 is pruned
  Prunes rules for which the rule antecedent and class are not positively correlated, based on a χ2 test of statistical significance
 Classification based on generated/pruned rules
  If only one rule satisfies tuple X, assign the class label of the rule
  If a rule set S satisfies X, CMAR
   divides S into groups according to class labels
   uses a weighted χ2 measure to find the strongest group of rules, based on the statistical correlation of rules within a group
   assigns X the class label of the strongest group

Lazy vs. Eager Learning

 Lazy vs. eager learning
  Lazy learning (e.g., instance-based learning): simply stores training data (or performs only minor processing) and waits until it is given a test tuple
  Eager learning (the methods discussed above): given a training set, constructs a classification model before receiving new (e.g., test) data to classify
 Lazy: less time in training but more time in predicting
 Accuracy
  Lazy methods effectively use a richer hypothesis space, since they use many local linear functions to form an implicit global approximation to the target function
  Eager: must commit to a single hypothesis that covers the entire instance space
Lazy Learner: Instance-Based Methods

 Instance-based learning: store training examples and delay the processing ("lazy evaluation") until a new instance must be classified
 Typical approaches
  k-nearest neighbor approach
   Instances represented as points in a Euclidean space
  Locally weighted regression
   Constructs a local approximation
  Case-based reasoning
   Uses symbolic representations and knowledge-based inference

The k-Nearest Neighbor Algorithm

 All instances correspond to points in the n-dimensional space
 The nearest neighbors are defined in terms of Euclidean distance, dist(X1, X2)
 The target function can be discrete- or real-valued
 For discrete-valued targets, k-NN returns the most common value among the k training examples nearest to the query point xq
 Voronoi diagram: the decision surface induced by 1-NN for a typical set of training examples
 [Figure: query point xq surrounded by positive (+) and negative (-) training examples]
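A minimal k-NN sketch (not the book's code): Euclidean distance plus a majority vote among the k nearest training examples; the toy points and k = 3 are illustrative assumptions.

    from collections import Counter
    from math import dist

    train = [((1.0, 1.0), "+"), ((1.5, 2.0), "+"), ((3.0, 3.5), "-"),
             ((4.0, 3.0), "-"), ((3.5, 4.0), "-"), ((0.5, 1.5), "+")]

    def knn_classify(xq, k=3):
        # return the most common class among the k training points nearest to xq
        neighbors = sorted(train, key=lambda item: dist(item[0], xq))[:k]
        return Counter(label for _, label in neighbors).most_common(1)[0][0]

    print(knn_classify((1.2, 1.4)))   # "+": its 3 nearest neighbors are all positive
    print(knn_classify((3.6, 3.4)))   # "-"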

Discussion on the k-NN Algorithm

 k-NN for real-valued prediction for a given unknown tuple
  Returns the mean value of the k nearest neighbors
 Distance-weighted nearest neighbor algorithm
  Weight the contribution of each of the k neighbors according to their distance to the query xq, giving greater weight to closer neighbors:
   w \equiv \frac{1}{d(x_q, x_i)^2}
 Robust to noisy data by averaging the k nearest neighbors
 Curse of dimensionality: the distance between neighbors can be dominated by irrelevant attributes
  To overcome it, stretch the axes or eliminate the least relevant attributes

Case-Based Reasoning (CBR)

 CBR: uses a database of problem solutions to solve new problems
 Stores symbolic descriptions (tuples or cases), not points in a Euclidean space
 Applications: customer service (product-related diagnosis), legal rulings
 Methodology
  Instances represented by rich symbolic descriptions (e.g., function graphs)
  Search for similar cases; multiple retrieved cases may be combined
  Tight coupling between case retrieval, knowledge-based reasoning, and problem solving
 Challenges
  Find a good similarity metric
  Indexing based on a syntactic similarity measure and, on failure, backtracking and adapting to additional cases
Genetic Algorithms (GA)

 Genetic algorithm: based on an analogy to biological evolution
 An initial population is created, consisting of randomly generated rules
  Each rule is represented by a string of bits
  E.g., IF A1 AND ¬A2 THEN C2 can be encoded as 100
  If an attribute has k > 2 values, k bits can be used
 Based on the notion of survival of the fittest, a new population is formed consisting of the fittest rules and their offspring
  The fitness of a rule is represented by its classification accuracy on a set of training examples
  Offspring are generated by crossover and mutation
 The process continues until a population P evolves in which each rule in P satisfies a prespecified fitness threshold
 Slow but easily parallelizable

Rough Set Approach

 Rough sets are used to approximately or "roughly" define equivalence classes
 A rough set for a given class C is approximated by two sets: a lower approximation (certain to be in C) and an upper approximation (cannot be described as not belonging to C)
 Finding the minimal subsets (reducts) of attributes for feature reduction is NP-hard, but a discernibility matrix (which stores the differences between attribute values for each pair of data tuples) is used to reduce the computational intensity

Fuzzy Set Approaches

 Fuzzy logic uses truth values between 0.0 and 1.0 to represent the degree of membership (such as in a fuzzy membership graph)
 Attribute values are converted to fuzzy values
  E.g., income is mapped into the discrete categories {low, medium, high} with fuzzy values calculated
 For a given new sample, more than one fuzzy value may apply
 Each applicable rule contributes a vote for membership in the categories
 Typically, the truth values for each predicted category are summed, and these sums are combined
What Is Prediction?

 (Numerical) prediction is similar to classification
  Construct a model
  Use the model to predict a continuous or ordered value for a given input
 Prediction is different from classification
  Classification refers to predicting a categorical class label
  Prediction models continuous-valued functions
 Major method for prediction: regression
  Model the relationship between one or more independent or predictor variables and a dependent or response variable
 Regression analysis
  Linear and multiple regression
  Nonlinear regression
  Other regression methods: generalized linear model, Poisson regression, log-linear models, regression trees

Linear Regression

 Linear regression: involves a response variable y and a single predictor variable x
   y = w_0 + w_1 x
 where w0 (y-intercept) and w1 (slope) are regression coefficients
 Method of least squares: estimates the best-fitting straight line
   w_1 = \frac{\sum_{i=1}^{|D|} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{|D|} (x_i - \bar{x})^2}, \qquad w_0 = \bar{y} - w_1 \bar{x}
 Multiple linear regression: involves more than one predictor variable
  Training data is of the form (X1, y1), (X2, y2), …, (X|D|, y|D|)
  Ex. for 2-D data, we may have: y = w0 + w1 x1 + w2 x2
  Solvable by an extension of the least squares method or using software such as SAS or S-Plus
  Many nonlinear functions can be transformed into the above

Nonlinear Regression

 Some nonlinear models can be modeled by a polynomial function
 A polynomial regression model can be transformed into a linear regression model. For example,
   y = w_0 + w_1 x + w_2 x^2 + w_3 x^3
 is convertible to linear form with the new variables x2 = x^2 and x3 = x^3:
   y = w_0 + w_1 x + w_2 x_2 + w_3 x_3
 Other functions, such as the power function, can also be transformed to a linear model
 Some models are intractably nonlinear (e.g., sums of exponential terms)
  It is possible to obtain least squares estimates through extensive calculation on more complex formulae
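A sketch of the least squares estimates above for simple linear regression; the small (x, y) data set is an illustrative assumption chosen so that the fit is roughly y = 2x.

    xs = [1.0, 2.0, 3.0, 4.0, 5.0]
    ys = [1.9, 4.1, 6.0, 8.1, 9.9]

    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    w1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / \
         sum((x - x_bar) ** 2 for x in xs)      # slope
    w0 = y_bar - w1 * x_bar                     # intercept: w0 = y_bar - w1 * x_bar

    print(round(w0, 3), round(w1, 3))           # 0.0 and 2.0 for this data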
Other Regression-Based Models

 Generalized linear model:
   Foundation on which linear regression can be applied to modeling categorical response variables
   Variance of y is a function of the mean value of y, not a constant
   Logistic regression: models the prob. of some event occurring as a linear function of a set of predictor variables
   Poisson regression: models data that exhibit a Poisson distribution
 Log-linear models (for categorical data)
   Approximate discrete multidimensional prob. distributions
   Also useful for data compression and smoothing
 Regression trees and model trees
   Trees to predict continuous values rather than class labels

Regression Trees and Model Trees

 Regression tree: proposed in the CART system (Breiman et al. 1984)
   CART: Classification And Regression Trees
   Each leaf stores a continuous-valued prediction
   It is the average value of the predicted attribute for the training tuples that reach the leaf
 Model tree: proposed by Quinlan (1992)
   Each leaf holds a regression model, a multivariate linear equation for the predicted attribute
   A more general case than the regression tree
 Regression and model trees tend to be more accurate than linear regression when the data are not represented well by a simple linear model
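As an illustration of the last point, the following sketch (not from the original slides) fits a straight line and a CART-style regression tree to the same synthetic data; the use of scikit-learn and the data are assumptions chosen for the example.

# Compare a linear fit with a regression tree on data a line does not represent well
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = np.sort(rng.uniform(0, 6, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)   # nonlinear target

lin = LinearRegression().fit(X, y)
tree = DecisionTreeRegressor(max_depth=4).fit(X, y)       # each leaf predicts the mean of its training tuples

X_new = np.linspace(0, 6, 5).reshape(-1, 1)
print("linear model   :", lin.predict(X_new))
print("regression tree:", tree.predict(X_new))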
Predictive Modeling in Multidimensional Databases

 Predictive modeling: predict data values or construct generalized linear models based on the database data
 One can only predict value ranges or category distributions
 Method outline:
   Minimal generalization
   Attribute relevance analysis
   Generalized linear model construction
   Prediction
 Determine the major factors which influence the prediction
   Data relevance analysis: uncertainty measurement, entropy analysis, expert judgement, etc.
 Multi-level prediction: drill-down and roll-up analysis

Prediction: Numerical Data
Prediction: Categorical Data
Classifier Accuracy Measures

                      C1 (predicted)     C2 (predicted)
  C1 (actual)         true positives     false negatives
  C2 (actual)         false positives    true negatives

  classes              buy_computer = yes   buy_computer = no   total   recognition(%)
  buy_computer = yes   6954                 46                  7000    99.34
  buy_computer = no    412                  2588                3000    86.27
  total                7366                 2634                10000   95.52

 Accuracy of a classifier M, acc(M): percentage of test set tuples that are correctly classified by the model M
   Error rate (misclassification rate) of M = 1 − acc(M)
 Given m classes, CMi,j, an entry in a confusion matrix, indicates the number of tuples in class i that are labeled by the classifier as class j
 Alternative accuracy measures (e.g., for cancer diagnosis)
   sensitivity = t-pos/pos /* true positive recognition rate */
   specificity = t-neg/neg /* true negative recognition rate */
   precision = t-pos/(t-pos + f-pos)
   accuracy = sensitivity * pos/(pos + neg) + specificity * neg/(pos + neg)
 This model can also be used for cost-benefit analysis

Predictor Error Measures

 Measure predictor accuracy: measure how far off the predicted value is from the actual known value
 Loss function: measures the error betw. yi and the predicted value yi'
   Absolute error: |yi − yi'|
   Squared error: (yi − yi')²
 Test error (generalization error): the average loss over the test set
   Mean absolute error: Σi=1..d |yi − yi'| / d
   Mean squared error: Σi=1..d (yi − yi')² / d
   Relative absolute error: Σi=1..d |yi − yi'| / Σi=1..d |yi − ȳ|
   Relative squared error: Σi=1..d (yi − yi')² / Σi=1..d (yi − ȳ)²
 The mean squared error exaggerates the presence of outliers
   Popularly used: the (square) root mean squared error and, similarly, the root relative squared error
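A small sketch (not part of the original slides) computing the measures defined above; the confusion-matrix counts come from the buy_computer example, and the continuous-valued data are made up.

# Classifier accuracy measures from a 2x2 confusion matrix
# (positive class = buy_computer = yes)
t_pos, f_neg = 6954, 46      # actual yes: predicted yes / predicted no
f_pos, t_neg = 412, 2588     # actual no:  predicted yes / predicted no
pos, neg = t_pos + f_neg, f_pos + t_neg

sensitivity = t_pos / pos                      # true positive recognition rate
specificity = t_neg / neg                      # true negative recognition rate
precision   = t_pos / (t_pos + f_pos)
accuracy    = sensitivity * pos / (pos + neg) + specificity * neg / (pos + neg)
print(sensitivity, specificity, precision, accuracy)   # accuracy is about 0.9542

# Predictor error measures for continuous predictions
def error_measures(y_true, y_pred):
    d = len(y_true)
    y_bar = sum(y_true) / d
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / d
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / d
    rae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / sum(abs(t - y_bar) for t in y_true)
    rse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / sum((t - y_bar) ** 2 for t in y_true)
    return mae, mse ** 0.5, rae, rse           # report RMSE rather than MSE

print(error_measures([3.0, 5.0, 2.5, 7.0], [2.5, 5.0, 3.0, 8.0]))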
Evaluating the Accuracy of a Classifier or Predictor (I)

 Holdout method
   Given data is randomly partitioned into two independent sets
     Training set (e.g., 2/3) for model construction
     Test set (e.g., 1/3) for accuracy estimation
   Random sampling: a variation of holdout
     Repeat holdout k times, accuracy = avg. of the accuracies obtained
 Cross-validation (k-fold, where k = 10 is most popular)
   Randomly partition the data into k mutually exclusive subsets, each of approximately equal size
   At the i-th iteration, use Di as the test set and the others as the training set
   Leave-one-out: k folds where k = # of tuples, for small-sized data
   Stratified cross-validation: folds are stratified so that the class dist. in each fold is approx. the same as that in the initial data

Evaluating the Accuracy of a Classifier or Predictor (II)

 Bootstrap
   Works well with small data sets
   Samples the given training tuples uniformly with replacement
     i.e., each time a tuple is selected, it is equally likely to be selected again and re-added to the training set
 Several bootstrap methods; a common one is the .632 bootstrap
   Suppose we are given a data set of d tuples. The data set is sampled d times, with replacement, resulting in a training set of d samples. The data tuples that did not make it into the training set end up forming the test set. About 63.2% of the original data will end up in the bootstrap, and the remaining 36.8% will form the test set (since (1 − 1/d)^d ≈ e⁻¹ = 0.368)
   Repeat the sampling procedure k times; overall accuracy of the model:
acc(M) = Σi=1..k (0.632 × acc(Mi)test_set + 0.368 × acc(Mi)train_set)
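A sketch of the .632 bootstrap loop described above (not from the original slides); train_fn and acc_fn are hypothetical placeholders for any learner and accuracy function, and the k per-round estimates are averaged here.

import random

def bootstrap_632(X, y, train_fn, acc_fn, k=10, seed=42):
    rng = random.Random(seed)
    d = len(X)
    total = 0.0
    for _ in range(k):
        sample = [rng.randrange(d) for _ in range(d)]      # d tuples drawn with replacement
        in_bag = set(sample)
        oob = [i for i in range(d) if i not in in_bag]     # roughly 36.8% of tuples form the test set
        model = train_fn([X[i] for i in sample], [y[i] for i in sample])
        acc_test = acc_fn(model, [X[i] for i in oob], [y[i] for i in oob])
        acc_train = acc_fn(model, [X[i] for i in sample], [y[i] for i in sample])
        total += 0.632 * acc_test + 0.368 * acc_train      # weighting from the formula above
    return total / k                                       # average over the k repetitions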

Ensemble Methods: Increasing the Accuracy

 Ensemble methods
   Use a combination of models to increase accuracy
   Combine a series of k learned models, M1, M2, …, Mk, with the aim of creating an improved model M*
 Popular ensemble methods
   Bagging: averaging the prediction over a collection of classifiers
   Boosting: weighted vote with a collection of classifiers
   Ensemble: combining a set of heterogeneous classifiers
Bagging: Bootstrap Aggregation

 Analogy: diagnosis based on multiple doctors' majority vote
 Training
   Given a set D of d tuples, at each iteration i a training set Di of d tuples is sampled with replacement from D (i.e., a bootstrap sample)
   A classifier model Mi is learned for each training set Di
 Classification: classify an unknown sample X
   Each classifier Mi returns its class prediction
   The bagged classifier M* counts the votes and assigns the class with the most votes to X
 Prediction: can be applied to the prediction of continuous values by taking the average value of each prediction for a given test tuple
 Accuracy
   Often significantly better than a single classifier derived from D
   For noisy data: not considerably worse, more robust
   Proven improved accuracy in prediction

Boosting

 Analogy: consult several doctors, based on a combination of weighted diagnoses, where each weight is assigned based on the previous diagnosis accuracy
 How boosting works?
   Weights are assigned to each training tuple
   A series of k classifiers is iteratively learned
   After a classifier Mi is learned, the weights are updated to allow the subsequent classifier, Mi+1, to pay more attention to the training tuples that were misclassified by Mi
   The final M* combines the votes of each individual classifier, where the weight of each classifier's vote is a function of its accuracy
 The boosting algorithm can be extended for the prediction of continuous values
 Comparing with bagging: boosting tends to achieve greater accuracy, but it also risks overfitting the model to misclassified data
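A minimal sketch of the bagging procedure above (not from the original slides); train_fn stands in for any base learner that returns a callable classifier.

import random
from collections import Counter

def bagging(X, y, train_fn, k=10, seed=1):
    rng = random.Random(seed)
    d = len(X)
    models = []
    for _ in range(k):
        idx = [rng.randrange(d) for _ in range(d)]               # bootstrap sample Di
        models.append(train_fn([X[i] for i in idx], [y[i] for i in idx]))
    def classify(x):
        votes = Counter(m(x) for m in models)                    # each Mi casts one vote
        return votes.most_common(1)[0][0]                        # class with the most votes
    return classify

# For numeric prediction, return the average of the k predictions instead of voting.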
Adaboost (Freund and Schapire, 1997)

 Given a set of d class-labeled tuples, (X1, y1), …, (Xd, yd)
 Initially, all the weights of the tuples are set the same (1/d)
 Generate k classifiers in k rounds. At round i,
   Tuples from D are sampled (with replacement) to form a training set Di of the same size
   Each tuple's chance of being selected is based on its weight
   A classification model Mi is derived from Di
   Its error rate is calculated using Di as a test set
   If a tuple is misclassified, its weight is increased; otherwise it is decreased
 Error rate: err(Xj) is the misclassification error of tuple Xj. Classifier Mi's error rate is the sum of the weights of the misclassified tuples:
error(Mi) = Σj wj × err(Xj)
 The weight of classifier Mi's vote is
log( (1 − error(Mi)) / error(Mi) )
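The round structure above as a hedged sketch (not from the original slides); the base learner train_fn is a placeholder, and degenerate rounds are simply skipped rather than resampled.

import math, random

def adaboost(X, y, train_fn, k=10, seed=7):
    rng = random.Random(seed)
    d = len(X)
    w = [1.0 / d] * d                                    # all tuple weights start at 1/d
    models, alphas = [], []
    for _ in range(k):
        idx = rng.choices(range(d), weights=w, k=d)      # sample Di according to the weights
        Mi = train_fn([X[i] for i in idx], [y[i] for i in idx])
        miss = [Mi(X[i]) != y[i] for i in range(d)]
        error = sum(wi for wi, m in zip(w, miss) if m)   # sum of weights of misclassified tuples
        if error == 0 or error >= 0.5:
            continue                                     # skip degenerate rounds in this sketch
        alpha = math.log((1 - error) / error)            # weight of Mi's vote
        w = [wi * math.exp(alpha) if m else wi for wi, m in zip(w, miss)]
        s = sum(w)                                       # renormalize so the weights sum to 1
        w = [wi / s for wi in w]
        models.append(Mi)
        alphas.append(alpha)
    def classify(x):
        votes = {}
        for Mi, a in zip(models, alphas):
            label = Mi(x)
            votes[label] = votes.get(label, 0.0) + a     # weighted vote
        return max(votes, key=votes.get)
    return classify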
Summary (I)

 Classification and prediction are two forms of data analysis that can be used to extract models describing important data classes or to predict future data trends.
 Effective and scalable methods have been developed for decision tree induction, naive Bayesian classification, Bayesian belief networks, rule-based classifiers, backpropagation, Support Vector Machines (SVM), associative classification, nearest-neighbor classifiers, and case-based reasoning, as well as for other classification methods such as genetic algorithms, rough set and fuzzy set approaches.
 Linear, nonlinear, and generalized linear models of regression can be used for prediction. Many nonlinear problems can be converted to linear problems by performing transformations on the predictor variables. Regression trees and model trees are also used for prediction.
Summary (II)

 Stratified k-fold cross-validation is a recommended method for accuracy estimation. Bagging and boosting can be used to increase overall accuracy by learning and combining a series of individual models.
 Significance tests and ROC curves are useful for model selection.
 There have been numerous comparisons of the different classification and prediction methods, and the matter remains a research topic.
 No single method has been found to be superior over all others for all data sets.
 Issues such as accuracy, training time, robustness, interpretability, and scalability must be considered and can involve trade-offs, further complicating the quest for an overall superior method.

References (1)

 C. Apte and S. Weiss. Data mining with decision trees and decision rules. Future Generation Computer Systems, 13, 1997.
 C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, 1995.
 L. Breiman, J. Friedman, R. Olshen, and C. Stone. Classification and Regression Trees. Wadsworth International Group, 1984.
 C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery, 2(2): 121-168, 1998.
 P. K. Chan and S. J. Stolfo. Learning arbiter and combiner trees from partitioned data for scaling machine learning. KDD'95.
 W. Cohen. Fast effective rule induction. ICML'95.
 G. Cong, K.-L. Tan, A. K. H. Tung, and X. Xu. Mining top-k covering rule groups for gene expression data. SIGMOD'05.
 A. J. Dobson. An Introduction to Generalized Linear Models. Chapman and Hall, 1990.
 G. Dong and J. Li. Efficient mining of emerging patterns: Discovering trends and differences. KDD'99.
References (2)

 R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, 2nd ed. John Wiley and Sons, 2001.
 U. M. Fayyad. Branching on attribute values in decision tree generation. AAAI'94.
 Y. Freund and R. E. Schapire. A decision-theoretic generalization of on-line learning and an application to boosting. J. Computer and System Sciences, 1997.
 J. Gehrke, R. Ramakrishnan, and V. Ganti. RainForest: A framework for fast decision tree construction of large datasets. VLDB'98.
 J. Gehrke, V. Ganti, R. Ramakrishnan, and W.-Y. Loh. BOAT: Optimistic decision tree construction. SIGMOD'99.
 T. Hastie, R. Tibshirani, and J. Friedman. The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Springer-Verlag, 2001.
 D. Heckerman, D. Geiger, and D. M. Chickering. Learning Bayesian networks: The combination of knowledge and statistical data. Machine Learning, 1995.
 M. Kamber, L. Winstone, W. Gong, S. Cheng, and J. Han. Generalization and decision tree induction: Efficient classification in data mining. RIDE'97.
 B. Liu, W. Hsu, and Y. Ma. Integrating classification and association rule mining. KDD'98.
 W. Li, J. Han, and J. Pei. CMAR: Accurate and efficient classification based on multiple class-association rules. ICDM'01.

References (3)

 T.-S. Lim, W.-Y. Loh, and Y.-S. Shih. A comparison of prediction accuracy, complexity, and training time of thirty-three old and new classification algorithms. Machine Learning, 2000.
 J. Magidson. The CHAID approach to segmentation modeling: Chi-squared automatic interaction detection. In R. P. Bagozzi, editor, Advanced Methods of Marketing Research, Blackwell Business, 1994.
 M. Mehta, R. Agrawal, and J. Rissanen. SLIQ: A fast scalable classifier for data mining. EDBT'96.
 T. M. Mitchell. Machine Learning. McGraw Hill, 1997.
 S. K. Murthy. Automatic construction of decision trees from data: A multi-disciplinary survey. Data Mining and Knowledge Discovery, 2(4): 345-389, 1998.
 J. R. Quinlan. Induction of decision trees. Machine Learning, 1:81-106, 1986.
 J. R. Quinlan and R. M. Cameron-Jones. FOIL: A midterm report. ECML'93.
 J. R. Quinlan. C4.5: Programs for Machine Learning. Morgan Kaufmann, 1993.
 J. R. Quinlan. Bagging, boosting, and C4.5. AAAI'96.
References (4)
 R. Rastogi and K. Shim. PUBLIC: A decision tree classifier that integrates building and pruning. VLDB'98.
 J. Shafer, R. Agrawal, and M. Mehta. SPRINT: A scalable parallel classifier for data mining. VLDB'96.
 J. W. Shavlik and T. G. Dietterich. Readings in Machine Learning. Morgan Kaufmann, 1990.
 P. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison Wesley, 2005.
 S. M. Weiss and C. A. Kulikowski. Computer Systems that Learn: Classification and Prediction Methods from Statistics, Neural Nets, Machine Learning, and Expert Systems. Morgan Kaufmann, 1991.
 S. M. Weiss and N. Indurkhya. Predictive Data Mining. Morgan Kaufmann, 1997.
 I. H. Witten and E. Frank. Data Mining: Practical Machine Learning Tools and Techniques, 2nd ed. Morgan Kaufmann, 2005.
 X. Yin and J. Han. CPAR: Classification based on predictive association rules. SDM'03.
 H. Yu, J. Yang, and J. Han. Classifying large data sets using SVM with hierarchical clusters. KDD'03.