DM Module 4
CLASSIFICATION
Classification is a task in data mining that involves assigning a class label to each instance in a
dataset based on its features. The goal of classification is to build a model that accurately
predicts the class labels of new instances based on their features.
There are two main types of classification: binary classification and multi-class classification.
Binary classification involves classifying instances into two classes, such as “spam” or “not
spam”, while multi-class classification involves classifying instances into more than two
classes.
Classification is a widely used technique in data mining and is applied in a variety of domains,
such as email filtering, sentiment analysis, and medical diagnosis.
Classification: It is a data analysis task, i.e. the process of finding a model that describes and
distinguishes data classes and concepts. Classification is the problem of identifying to which of
a set of categories (subpopulations) a new observation belongs, on the basis of a training set
of data containing observations whose category membership is known.
Example: Before starting any project, we need to check its feasibility. In this case, a classifier
is required to predict class labels such as 'Safe' and 'Risky' before the project is adopted and
approved. Classification is a two-step process:
1. Learning Step (Training Phase): Construction of Classification Model
Different algorithms are used to build a classifier by making the model learn from the available
training set. The model has to be trained so that it can make accurate predictions.
2. Classification Step: The model is used to predict class labels; the constructed model is tested
on test data to estimate the accuracy of the classification rules.
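To make the two steps concrete, here is a minimal Python sketch, assuming scikit-learn is available; the tiny project-feasibility dataset and its feature names are invented purely for illustration.

# Minimal sketch of the two-step classification process (assumes scikit-learn).
# The toy 'Safe'/'Risky' project data below is invented for illustration only.
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hypothetical features: [budget, team_size, duration_in_months]
X = [[10, 5, 6], [2, 2, 12], [8, 6, 4], [1, 1, 18], [12, 8, 5], [3, 2, 15]]
y = ["Safe", "Risky", "Safe", "Risky", "Safe", "Risky"]  # class labels

# Hold out part of the data so the model can be tested later
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=0)

# 1. Learning step (training phase): construct the classification model
model = DecisionTreeClassifier()
model.fit(X_train, y_train)

# 2. Classification step: predict class labels for test data and estimate accuracy
predictions = model.predict(X_test)
print("Predicted labels:", predictions)
print("Accuracy:", accuracy_score(y_test, predictions))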
Classification Algorithms:
DECISION TREE INDUCTION
A decision tree is a supervised learning method used in data mining for classification and
regression tasks. It is a tree structure that supports decision making. The decision tree builds
classification or regression models in the form of a tree: it separates a data set into smaller
and smaller subsets while the tree is incrementally developed. The final result is a tree with
decision nodes and leaf nodes. A decision node has two or more branches, while a leaf node
represents a classification or decision and cannot be split any further. The uppermost decision
node in a tree, which corresponds to the best predictor, is called the root node.
Decision trees can deal with both categorical and numerical data.
Key factors:
Entropy:
Entropy is a common way to measure impurity. In a decision tree, it measures
the randomness or impurity of the data at a node.
Information Gain:
Information gain is the decline in entropy after the dataset is split on an attribute. It is
also called entropy reduction. Building a decision tree is all about discovering the attributes
that return the highest information gain.
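To make these two measures concrete, the following self-contained Python sketch computes the entropy of a node and the information gain of a hypothetical binary split; the class counts used are made up for illustration.

import math

def entropy(class_counts):
    # Entropy of a node, given the number of records in each class.
    total = sum(class_counts)
    ent = 0.0
    for count in class_counts:
        if count > 0:
            p = count / total        # proportion of records in this class
            ent -= p * math.log2(p)  # impurity contribution of this class
    return ent

# Hypothetical parent node: 9 records of one class and 5 of the other
parent = [9, 5]
# Hypothetical split of the parent into two child nodes
children = [[6, 1], [3, 4]]

parent_entropy = entropy(parent)
total = sum(parent)
# Weighted average entropy of the children after the split
children_entropy = sum(sum(c) / total * entropy(c) for c in children)
# Information gain = decline in entropy caused by the split
info_gain = parent_entropy - children_entropy

print("Parent entropy:", round(parent_entropy, 3))
print("Information gain:", round(info_gain, 3))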
We can say that a decision tree is a hierarchical tree structure that can be used to split an
extensive collection of records into smaller sets of the class by applying a sequence of simple
decision rules. A decision tree model comprises a set of rules for partitioning a large,
heterogeneous population into smaller, more homogeneous, mutually exclusive classes. The
attributes of the classes can be any type of variable, from nominal, ordinal, and binary to
quantitative values; in contrast, the classes must be of a qualitative type, such as categorical,
ordinal, or binary.
In brief, given data described by attributes together with their class labels, a decision tree
produces a set of rules that can be used to identify the class. One rule is applied after another,
resulting in a hierarchy of segments within a segment. The hierarchy is known as the tree, and
each segment is called a node.
The following decision tree is for the concept buy_computer that indicates whether a customer at
a company is likely to buy a computer or not. Each internal node represents a test on an attribute.
Each leaf node represents a class.
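Because the tree itself appears only as a figure, the sketch below shows what such a buy_computer tree might look like as code; the attributes (age, student, credit_rating) and the branch structure are assumed from the standard textbook example and may differ from the actual figure.

def buy_computer(age, student, credit_rating):
    # Sketch of a possible buy_computer decision tree (assumed structure).
    # Each if-test plays the role of an internal node; each return value is a leaf class.
    if age == "youth":                                     # root node: test on age
        return "yes" if student == "yes" else "no"
    elif age == "middle_aged":
        return "yes"                                       # leaf node: class 'yes'
    else:  # senior
        return "yes" if credit_rating == "fair" else "no"

print(buy_computer("youth", "yes", "fair"))  # -> yes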
The machine learning researcher J. Ross Quinlan developed a decision tree algorithm known as
ID3 (Iterative Dichotomiser) in 1980. Later, he presented C4.5, the successor of ID3. ID3 and
C4.5 adopt a greedy approach: there is no backtracking, and the trees are constructed in a
top-down, recursive, divide-and-conquer manner.
Algorithm: Generate_decision_tree(D, attribute_list)
Output:
A Decision Tree
Method
create a node N;
if the tuples in D are all of the same class C then
return N as a leaf node labeled with the class C;
if attribute_list is empty then
return N as a leaf node labeled with the majority class in D;
apply an attribute selection method to D and attribute_list to find the best splitting criterion;
label node N with the splitting criterion;
for each outcome j of the splitting criterion
let Dj be the set of tuples in D satisfying outcome j;
if Dj is empty then
attach a leaf labeled with the majority
class in D to node N;
else
attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
end for
return N;
BAYES CLASSIFICATION
Bayesian classification uses Bayes' theorem to predict the probability of an event. Bayesian
classifiers are statistical classifiers based on Bayesian probability. The theorem expresses how
a degree of belief, expressed as a probability, should be updated in the light of new evidence.
Bayes' theorem is named after Thomas Bayes, who first used conditional probability to provide
an algorithm that uses evidence to calculate limits on an unknown parameter.
Bayes' theorem is expressed mathematically by the following equation:
P(X|Y) = P(Y|X) P(X) / P(Y)
P(X|Y) is a conditional probability: the probability of event X occurring given that Y is true.
P(Y|X) is a conditional probability: the probability of event Y occurring given that X is true.
P(X) and P(Y) are the probabilities of observing X and Y independently of each other; each is
known as a marginal probability.
The theorem follows from the definition of conditional probability, because the joint probability
P(X⋂Y) of both X and Y being true can be written either as P(X|Y) P(Y) or as P(Y|X) P(X).
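As a small worked example, the following Python snippet applies Bayes' theorem to the spam setting used elsewhere in this module; all of the probabilities are invented for illustration.

# Worked example of Bayes' theorem with made-up numbers:
# X = "email is spam", Y = "email contains the word 'offer'"
p_spam = 0.20                   # P(X): prior probability that an email is spam
p_offer_given_spam = 0.60       # P(Y|X): 'offer' appears in 60% of spam emails
p_offer_given_not_spam = 0.05   # 'offer' appears in 5% of non-spam emails

# Marginal probability P(Y) via the law of total probability
p_offer = p_offer_given_spam * p_spam + p_offer_given_not_spam * (1 - p_spam)

# Bayes' theorem: P(X|Y) = P(Y|X) * P(X) / P(Y)
p_spam_given_offer = p_offer_given_spam * p_spam / p_offer
print("P(spam | contains 'offer') =", round(p_spam_given_offer, 3))  # 0.75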
RULE-BASED CLASSIFICATION
Rule-based classifiers are another type of classifier that makes the class decision by using
various "if…else" rules. These rules are easily interpretable, so such classifiers are often used
to generate descriptive models. The condition used with "if" is called the antecedent, and the
predicted class of each rule is called the consequent.
Properties of rule-based classifiers:
Coverage: The percentage of records which satisfy the antecedent conditions of a particular
rule.
The rules generated by the rule-based classifiers are generally not mutually exclusive, i.e.
many rules can cover the same record.
The rules generated by the rule-based classifiers may not be exhaustive, i.e. there may be
some records which are not covered by any of the rules.
The decision boundaries created by rule-based classifiers are linear, but the overall boundary
can be much more complex than that of a decision tree, because many rules may be triggered for
the same record.
IF-THEN Rules
A rule-based classifier makes use of a set of IF-THEN rules for classification. We can express a
rule in the following form:
IF condition THEN conclusion
Points to remember −
The IF part of the rule is called the rule antecedent or precondition, and the THEN part is
called the rule consequent.
If the condition holds true for a given tuple, then the antecedent is satisfied and the rule is
said to cover the tuple.
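A minimal Python sketch of a single IF-THEN rule and its coverage is given below; the small dataset and attribute values are invented purely for illustration.

# One IF-THEN rule evaluated on a tiny, invented dataset.
records = [
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
    {"age": "youth", "student": "no",  "buys_computer": "no"},
    {"age": "senior", "student": "yes", "buys_computer": "yes"},
    {"age": "youth", "student": "yes", "buys_computer": "yes"},
]

# Rule R1: IF age = youth AND student = yes THEN buys_computer = yes
def antecedent(record):
    return record["age"] == "youth" and record["student"] == "yes"

covered = [r for r in records if antecedent(r)]  # records satisfying the IF part
coverage = len(covered) / len(records)           # fraction of records covered by R1
print("Coverage of R1:", coverage)               # 2 / 4 = 0.5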
KNN ALGORITHM (LAZY LEARNER ALGORITHM)
The working of K-NN can be explained with the following steps. Suppose we have a new data point
and we need to assign it to one of two categories, A or B:
o Firstly, we choose the number of neighbors; here we choose k = 5.
o Next, we calculate the Euclidean distance between the new point and the existing data points.
The Euclidean distance is the distance between two points, which we have already studied in
geometry; for points (x1, y1) and (x2, y2) it is calculated as d = sqrt((x2 - x1)^2 + (y2 - y1)^2).
o By calculating the Euclidean distance we obtain the nearest neighbors: three nearest
neighbors in category A and two nearest neighbors in category B.
o Since the majority of the nearest neighbors (3 of the 5) are from category A, the new data
point is assigned to category A.
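The following self-contained Python sketch mirrors these steps for a made-up two-dimensional dataset; the points and the choice k = 5 are for illustration only.

import math
from collections import Counter

def euclidean(p, q):
    # Euclidean distance between two 2-D points
    return math.sqrt((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)

def knn_classify(training_data, new_point, k=5):
    # Assign new_point to the majority category among its k nearest neighbors
    distances = [(euclidean(point, new_point), label) for point, label in training_data]
    nearest = sorted(distances)[:k]                  # the k nearest neighbors
    votes = Counter(label for _, label in nearest)   # count categories among them
    return votes.most_common(1)[0][0]

# Invented training data: (x, y) points labeled with category A or B
training_data = [
    ((1, 2), "A"), ((2, 3), "A"), ((3, 3), "A"), ((2, 1), "A"),
    ((6, 7), "B"), ((7, 8), "B"), ((8, 6), "B"),
]

print(knn_classify(training_data, (3, 2), k=5))  # expected: 'A'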
Rule Extraction
Here we will learn how to build a rule-based classifier by extracting IF-THEN rules from a
decision tree.
One rule is created for each path from the root to the leaf node.
To form a rule antecedent, the splitting criteria along the path are logically ANDed.
The leaf node holds the class prediction, forming the rule consequent.
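If scikit-learn is available, the IF-THEN rules along each root-to-leaf path of a fitted tree can be printed directly; the toy dataset below is invented for illustration.

# Extracting rule-like paths from a fitted decision tree (assumes scikit-learn).
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented toy data: [age_in_years, is_student]
X = [[22, 1], [25, 0], [40, 0], [45, 1], [60, 0], [35, 1]]
y = ["yes", "no", "yes", "yes", "no", "yes"]  # buys_computer

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints one indented branch per split; each root-to-leaf path
# corresponds to one rule whose antecedent is the AND of the tests on the path
# and whose consequent is the class held by the leaf.
print(export_text(tree, feature_names=["age", "is_student"]))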
Confusion matrix
In binary classification, there are two possible target classes, which are typically labeled as
"positive" and "negative" or "1" and "0". In our spam example above, the target (positive class)
is "spam," and the negative class is "not spam."
When evaluating the accuracy, we looked at correct and wrong predictions disregarding the class
label. However, in binary classification, we can be "correct" and "wrong" in two different ways.
Correct predictions include so-called true positives and true negatives. This is how it unpacks
for our spam use case example:
True positive (TP): An email that is actually spam and is correctly classified by the
model as spam.
True negative (TN): An email that is actually not spam and is correctly classified by the
model as not spam.
Model errors include so-called false positives and false negatives. In our example:
False Positive (FP): An email that is actually not spam but is incorrectly classified by the
model as spam (a "false alarm").
False Negative (FN): An email that is actually spam but is incorrectly classified by the
model as not spam (a "missed spam").
Using the confusion matrix, you can visualize all 4 different outcomes in a single table.
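Here is a short Python sketch that tallies the four outcomes for the spam example; the actual and predicted labels are made up for illustration.

# Counting the four confusion-matrix cells for the spam example (invented labels).
actual    = ["spam", "spam", "not spam", "not spam", "spam", "not spam", "spam", "not spam"]
predicted = ["spam", "not spam", "not spam", "spam", "spam", "not spam", "spam", "not spam"]

tp = sum(1 for a, p in zip(actual, predicted) if a == "spam" and p == "spam")          # true positives
tn = sum(1 for a, p in zip(actual, predicted) if a == "not spam" and p == "not spam")  # true negatives
fp = sum(1 for a, p in zip(actual, predicted) if a == "not spam" and p == "spam")      # false alarms
fn = sum(1 for a, p in zip(actual, predicted) if a == "spam" and p == "not spam")      # missed spam

print("                 predicted spam   predicted not spam")
print(f"actual spam            {tp}                  {fn}")
print(f"actual not spam        {fp}                  {tn}")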
Accuracy
Accuracy is a metric that measures how often a machine learning model correctly predicts the
outcome. You can calculate accuracy by dividing the number of correct predictions by the
total number of predictions.
Precision
Precision is a metric that measures how often a machine learning model correctly predicts the
positive class. You can calculate precision by dividing the number of correct positive predictions
(true positives) by the total number of instances the model predicted as positive (both true and
false positives).
Recall
Recall is a metric that measures how often a machine learning model correctly identifies positive
instances (true positives) from all the actual positive samples in the dataset. You can calculate
recall by dividing the number of true positives by the number of positive instances. The latter
includes true positives (successfully identified cases) and false negative results (missed cases).
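Continuing from the confusion-matrix counts in the previous sketch, the three metrics can be computed as follows; the counts are the invented ones from that example.

# Accuracy, precision and recall from the confusion-matrix counts above.
tp, tn, fp, fn = 3, 3, 1, 1  # values from the invented spam example

accuracy = (tp + tn) / (tp + tn + fp + fn)  # correct predictions / all predictions
precision = tp / (tp + fp)                  # true positives / all predicted positives
recall = tp / (tp + fn)                     # true positives / all actual positives

print("Accuracy: ", round(accuracy, 2))   # 0.75
print("Precision:", round(precision, 2))  # 0.75
print("Recall:   ", round(recall, 2))     # 0.75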