Machine Learning
❖Machine Learning is the field of study that gives computers the capability to learn without
being explicitly programmed
❖The primary aim is to allow computers to learn automatically, without human intervention or
assistance, and to adjust their actions accordingly
❖It is a branch of Artificial Intelligence
❖Machine Learning Techniques
❖Supervised learning
❖Unsupervised learning
❖Semi-supervised learning
❖Reinforcement learning
Terms
❑Class label attribute
❑In a dataset, one attribute (column) is the class label, which assigns each tuple to a category
❑Examples of class labels: red/blue/green, bird/animal, safe/risky, yes/no
❑Labelled dataset - a dataset in which every tuple carries the class label attribute
❑Training data - the major portion of the dataset, fed to the classification algorithm. Because
this data is labelled, the algorithm can construct a model, or classifier
Terms
❑Test data (Validation set) - the remaining portion of the dataset, used to test the classifier.
The predicted results are compared with the true labels to find the accuracy of the classifier
❑Ground truth - a term used in statistics and machine learning for checking the results of
machine learning for accuracy against the real world
❑Training tuples - the individual tuples making up the training set. They are also referred to as
samples, examples, instances, data points or objects
Supervised vs. Unsupervised Learning
Supervised learning (classification/prediction)
◦ The training data (observations, measurements, etc.) are accompanied by labels indicating
the class of the observations
◦ New data is classified based on the training set
◦ Ground truth is available
◦ Techniques for Supervised Learning – Classification and Numeric Prediction
Unsupervised learning (clustering)
◦ The class labels of the training data are unknown
◦ Its goal is to infer the natural structure present within a set of data points
◦ Exploratory analysis
◦ In situations where it is either impossible or impractical for a human to propose trends in the
data, unsupervised learning can provide initial insights that can then be used to test individual
hypotheses.
Classification
A form of data analysis that extracts models describing important data classes
Such models are called classifiers; they are built to predict class labels
Applications
➢ A bank loan officer wants to analyze loan-application data in order to know which customers
(loan applicants) are risky and which are safe.
➢ A marketing manager at a company needs to predict whether a customer with a given profile
will buy a new computer.
➢ Image classification, Face recognition
➢ Spam filtering, Cancer diagnosis
➢ Sentiment analysis, Mood detection
➢ Prediction of rainfall
Definition - Classification
Classification is the problem of identifying which of a set of categories (subpopulations) a new
observation belongs to, on the basis of a training set of data containing observations whose
category membership is known.
Numeric Prediction
In this case, a model or predictor is constructed that predicts a continuous-valued function
or ordered value. Regression analysis is a statistical methodology that is most often used
for numeric prediction.
Examples/ applications of prediction:
✓Marketing manager needs to predict how much a given customer will spend during a sale at his
company.
✓Predict the agricultural yield
✓Predict the salary of the customer
✓Predict the number of runs by a cricketer
✓Predict stock market value
Classification—A Two-Step Process
Learning phase or Model construction: describing a set of predetermined classes
◦ Each tuple/sample is assumed to belong to a predefined class, as determined by the class
label attribute
◦ The set of tuples used for model construction is training set
◦ The model is represented as classification rules, decision trees, or mathematical formulae
Classification phase or Model usage: for classifying future or unknown objects
◦ Estimate accuracy of the model
◦ The known label of test sample is compared with the classified result from the model
◦ If the accuracy is acceptable, use the model to classify new data
Note: If the test set is used to select models, it is called validation set
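A minimal scikit-learn sketch of the two-step process above; the iris data, the decision-tree classifier (introduced later in these slides) and the single unseen tuple are illustrative assumptions, not taken from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)

# Step 1: model construction on the labelled training set
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
clf = DecisionTreeClassifier().fit(X_train, y_train)

# Step 2: model usage - estimate accuracy on the held-out test set,
# then classify a previously unseen tuple
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
print("predicted class:", clf.predict([[5.0, 3.4, 1.5, 0.2]]))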
Step (1): Model Construction
[Figure: the training data is fed to a classification algorithm, which produces a classifier (model).
Example of a learned rule: IF rank = 'professor' OR years > 6 THEN tenured = 'yes']
Step (2): Using the Model in Prediction
[Figure: the classifier is applied first to the test data, then to new, unseen data,
e.g. the tuple (Jeff, Professor, 4) - tenured?]
Classification Techniques
✓Decision tree based methods
✓Rule based methods
✓Memory based reasoning
✓Neural Networks
✓Genetic algorithms
✓Bayesian networks
✓Support vector machines
Decision Tree – Whether to play or not?
Outlook?
  Sunny    -> Humidity?
                High   -> No
                Normal -> Yes
  Overcast -> Yes
  Rain     -> Wind?
                Strong -> No
                Weak   -> Yes
Decision Tree : Another Example
❑ Training data set: Buys_computer
❑ Resulting tree:
age?
  <=30   -> student?
              no  -> no
              yes -> yes
  31..40 -> yes
  >40    -> credit rating?
              excellent -> no
              fair      -> yes
Decision Tree Induction
❑Decision tree induction is the learning of a decision tree from class-labelled training
tuples
❑In a decision tree, each internal node denotes a test on an attribute, each
branch represents an outcome of the test, and each leaf node holds a class label
❑For an unknown tuple X, a path is traced from the root to a leaf node to predict
its class
❑A decision tree can be converted to classification rules
Algorithms
❖ID3 (Iterative Dichotomiser 3)
❖C4.5 (Successor of ID3)
❖CART (Classification and Regression Tree)
❖SLIQ
❖SPRINT
ID3 Algorithm
Input :
Data partition D, which is a set of training tuples and their associated
class labels
Attribute_list: the set of candidate attributes
Attribute_selection_method: a procedure to determine the splitting criterion that
best partitions the data tuples into individual classes
Output: A decision tree
Method:
  create a node N;
  if the tuples in D are all of the same class C then
      return N as a leaf node labeled with class C;
  if attribute_list is empty then
      return N as a leaf node labeled with the majority class in D;
  apply Attribute_selection_method(D, attribute_list)
      to find the best splitting_criterion;
  label node N with the splitting_criterion;
  if splitting_attribute is discrete-valued and multiway splits are allowed then
      attribute_list = attribute_list - splitting_attribute;
  for each outcome j of splitting_criterion
      // partition the tuples and grow subtrees for each partition
      let Dj be the set of tuples in D satisfying outcome j;
      if Dj is empty then
          attach a leaf labeled with the majority class in D to node N;
      else
          attach the node returned by Generate_decision_tree(Dj, attribute_list) to node N;
  endfor
  return N;
End of algorithm

Time complexity: O(m * |D| * log |D|), where
  m = number of attributes
  |D| = number of records (tuples) in the dataset
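A compact Python sketch of the method above, assuming categorical attributes and information gain (introduced on the following slides) as the attribute selection measure; the function and variable names are illustrative, not the textbook's:

import math
from collections import Counter

def entropy(labels):
    # Info(D): expected information needed to classify a tuple
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def info_gain(rows, labels, attr):
    # Gain(A) = Info(D) - Info_A(D) for a categorical attribute attr
    total = len(labels)
    info_a = 0.0
    for value in set(row[attr] for row in rows):
        subset = [lab for row, lab in zip(rows, labels) if row[attr] == value]
        info_a += (len(subset) / total) * entropy(subset)
    return entropy(labels) - info_a

def generate_decision_tree(rows, labels, attributes):
    if len(set(labels)) == 1:           # all tuples of the same class -> leaf
        return labels[0]
    if not attributes:                  # attribute list empty -> majority-class leaf
        return Counter(labels).most_common(1)[0][0]
    # choose the splitting attribute with the highest information gain
    best = max(attributes, key=lambda a: info_gain(rows, labels, a))
    remaining = [a for a in attributes if a != best]
    node = {best: {}}
    for value in set(row[best] for row in rows):   # one branch per outcome
        part = [(r, l) for r, l in zip(rows, labels) if r[best] == value]
        sub_rows, sub_labels = zip(*part)
        node[best][value] = generate_decision_tree(list(sub_rows),
                                                   list(sub_labels), remaining)
    return node

On the play-golf data used later in these slides (rows as dicts keyed by Outlook, Temperature, Humidity and Wind), this sketch produces a tree rooted at Outlook, with Humidity under Sunny and Wind under Rain, matching the tree shown earlier.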
Attribute Selection
1. Attribute selection method specifies a heuristic procedure for selecting the attribute that
best discriminates the given tuples according to class
2. The measure so selected determines the splitting criterion. Splitting criterion tells us which
attribute to test at node N
3. It determines the best way to separate or partition tuples in D into individual classes
4. It indicates the splitting attribute such that the resulting partitions at each branch are as pure
as possible
5. A partition is pure if all the tuples in it belong to the same class
Cases
a) A is discrete-valued
b) A is continuous-valued
c) A is discrete-valued and a binary tree must be produced
Attribute selection measure
oInformation Gain
oGain Ratio
oGini Index
oChi-square test
oC-SEP
oMinimum description length
Attribute selection measures are also known as splitting rules because
they determine how the tuples at a given node are to be split
Information Gain
Based on information theory, which studies the information content of messages
Entropy -> a measure of disorder in the dataset
Information gain = a measure of the decrease in disorder achieved by partitioning the original
dataset
Information gain = difference between the original information requirement and the new
requirement after partitioning on a certain attribute
Information gain = entropy of the dataset - entropy after partitioning
The attribute with the highest information gain is chosen as the splitting attribute for a node
This attribute minimizes the information needed to classify the tuples in the resulting
partitions, and reflects the least impurity/randomness in these partitions
[Figure: Entropy - Example]
[Figure: Partitioning on color to reduce the disorder]
Attribute Selection Measure: Information Gain (ID3/C4.5)
▪Select the attribute with the highest information gain, reflecting the least impurity
▪Let pi be the probability that an arbitrary tuple in D belongs to class Ci, estimated by |Ci,D|/|D|
▪Let m = number of distinct values of the class label attribute
▪Expected information (entropy of D) needed to classify a tuple in D:
      Info(D) = - Σ i=1..m  pi log2(pi)
▪Information needed (after using A to split D into v partitions) to classify D:
      Info_A(D) = Σ j=1..v  (|Dj| / |D|) × Info(Dj)
▪Information gained by branching on attribute A:
      Gain(A) = Info(D) - Info_A(D)
Day Outlook Temperature Humidity Wind PlayGolf
D1 Sunny Hot High Weak No
D2 Sunny Hot High Strong No
D3 Overcast Hot High Weak Yes
D4 Rain Mild High Weak Yes
D5 Rain Cool Normal Weak Yes
D6 Rain Cool Normal Strong No
D7 Overcast Cool Normal Strong Yes
D8 Sunny Mild High Weak No
D9 Sunny Cool Normal Weak Yes
D10 Rain Mild Normal Weak Yes
D11 Sunny Mild Normal Strong Yes
D12 Overcast Mild High Strong Yes
D13 Overcast Hot Normal Weak Yes
D14 Rain Mild High Strong No
The class label PlayGolf has 9 Yes and 5 No tuples, so
Info(D) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 bits
Now, consider the attribute outlook.
For outlook, there are 3 categories: sunny, overcast and rain
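Using the class counts from the table above for each outlook value:
Sunny (5 tuples: 2 Yes, 3 No):     Info = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971
Overcast (4 tuples: 4 Yes, 0 No):  Info = 0
Rain (5 tuples: 3 Yes, 2 No):      Info = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971
Info_outlook(D) = (5/14)(0.971) + (4/14)(0) + (5/14)(0.971) = 0.694
Gain(outlook) = Info(D) - Info_outlook(D) = 0.940 - 0.694 ≈ 0.247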
Calculating entropy for each attribute
➢The dataset is split on the different attributes.
➢The entropy for each branch is calculated.
➢Then it is added proportionally, to get total entropy for the split.
➢The resulting entropy is subtracted from the entropy before the split.
➢The result is the Information Gain, or decrease in entropy.
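A small self-contained Python check of this procedure on the play-golf table (the tuple encoding and helper names below are my own, not from the slides):

import math
from collections import Counter

rows = [  # (Outlook, Temperature, Humidity, Wind, PlayGolf)
    ("Sunny","Hot","High","Weak","No"),        ("Sunny","Hot","High","Strong","No"),
    ("Overcast","Hot","High","Weak","Yes"),    ("Rain","Mild","High","Weak","Yes"),
    ("Rain","Cool","Normal","Weak","Yes"),     ("Rain","Cool","Normal","Strong","No"),
    ("Overcast","Cool","Normal","Strong","Yes"),("Sunny","Mild","High","Weak","No"),
    ("Sunny","Cool","Normal","Weak","Yes"),    ("Rain","Mild","Normal","Weak","Yes"),
    ("Sunny","Mild","Normal","Strong","Yes"),  ("Overcast","Mild","High","Strong","Yes"),
    ("Overcast","Hot","Normal","Weak","Yes"),  ("Rain","Mild","High","Strong","No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())

labels = [r[-1] for r in rows]
print("Info(D) =", round(entropy(labels), 3))          # 0.94

for i, attr in enumerate(attributes):
    info_a = 0.0
    for value in set(r[i] for r in rows):
        part = [r for r in rows if r[i] == value]       # one partition per value
        info_a += len(part) / len(rows) * entropy([p[-1] for p in part])
    print(f"Gain({attr}) = {entropy(labels) - info_a:.3f}")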
Gain when split on different attributes:
Gain(Outlook) = 0.247, Gain(Temperature) = 0.029, Gain(Humidity) = 0.152, Gain(Wind) = 0.048
Choose attribute with the largest information gain as the decision node, divide
the dataset by its branches and repeat the same process on every branch.
A branch with entropy of 0 is a leaf node. For outlook=overcast, the
entire partition has play golf = yes
A branch with entropy more than 0 needs further splitting.
The ID3 algorithm is run recursively on the non-leaf branches, until all data is
classified.
Final Decision Tree
[Figure: the final tree is the play/don't-play tree shown earlier - Outlook at the root;
Sunny -> Humidity (High: No, Normal: Yes); Overcast -> Yes; Rain -> Wind (Strong: No, Weak: Yes)]
Decision Tree to Decision Rules
Each path from the root to a leaf gives one rule:
IF outlook = sunny AND humidity = high THEN play = no
IF outlook = sunny AND humidity = normal THEN play = yes
IF outlook = overcast THEN play = yes
IF outlook = rain AND wind = strong THEN play = no
IF outlook = rain AND wind = weak THEN play = yes
Limitation
The information gain measure is biased toward tests with many outcomes.
That is, it prefers to select attributes having a large number of values.
For example, consider an attribute that acts as a unique identifier, such as product ID. A split on
product ID would result in a large number of partitions (as many as there are values), each one
containing just one tuple. Because each partition is pure, the information required to classify
data set D based on this partitioning would be Info_product_ID(D) = 0. Therefore, the information
gained by partitioning on this attribute is maximal, even though such a split is useless for
classification.
Gain Ratio
C4.5, a successor of ID3, uses an extension to information gain known as gain ratio, which
attempts to overcome this bias.
It applies a kind of normalization to information gain using a "split information" value, defined
analogously to Info(D) as:
      SplitInfo_A(D) = - Σ j=1..v  (|Dj| / |D|) × log2(|Dj| / |D|)
Gain Ratio
Gain Ratio is defined as:
      GainRatio(A) = Gain(A) / SplitInfo_A(D)
The attribute with the maximum gain ratio is selected as the splitting attribute.
Solution
Computation of the gain ratio for the attribute income.
❖A test on income splits the buys_computer training data (14 tuples) into three partitions, namely
low, medium, and high, containing four, six, and four tuples, respectively.
❖To compute the gain ratio of income, we first use the SplitInfo equation to obtain
      SplitInfo_income(D) = -(4/14) log2(4/14) - (6/14) log2(6/14) - (4/14) log2(4/14) = 1.557
❖We have Gain(income) = 0.029. Therefore, GainRatio(income) = 0.029/1.557 = 0.019.
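A quick numeric check of this example (the helper name split_info is my own):

import math

def split_info(partition_sizes):
    total = sum(partition_sizes)
    return -sum(n / total * math.log2(n / total) for n in partition_sizes)

si = split_info([4, 6, 4])          # income partitions: low, medium, high
gain_income = 0.029                 # taken from the slide above
print(round(si, 3))                 # 1.557
print(round(gain_income / si, 3))   # 0.019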
Gini Index
❖The Gini index is used in CART. Using the notation previously described, the Gini index measures the
impurity of D, a data partition or set of training tuples:
      Gini(D) = 1 - Σ i=1..m  pi^2
❖where pi is the probability that a tuple in D belongs to class Ci and is estimated by |Ci,D|/|D|. The sum
is computed over m classes.
❖The Gini index considers a binary split for each attribute.
❖For example, if income has three possible values, namely {low, medium, high}, then the possible subsets
are {low, medium, high}, {low, medium}, {low, high}, {medium, high}, {low}, {medium}, {high}, and {}.
❖We exclude the full set, {low, medium, high}, and the empty set from consideration since, conceptually,
they do not represent a split.
❖Therefore, there are 2^v - 2 possible ways to form two partitions of the data, D, based on a binary split
on A.
Gini Index
❖When considering a binary split, we compute a weighted sum of the impurity of each resulting
partition.
❖For example, if a binary split on A partitions D into D1 and D2, the Gini index of D given that
partitioning is
      Gini_A(D) = (|D1| / |D|) Gini(D1) + (|D2| / |D|) Gini(D2)
❖For a continuous-valued attribute, each possible split-point of A must be considered: D1 is the set
of tuples in D satisfying A ≤ split_point, and D2 is the set of tuples in D satisfying A > split_point.
❖The reduction in impurity that would be incurred by a binary split on a discrete- or continuous-
valued attribute A is
      ΔGini(A) = Gini(D) - Gini_A(D)
❖The attribute that maximizes the reduction in impurity (equivalently, gives the minimum Gini index)
is selected as the splitting attribute.
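A brief Python sketch of the Gini computation for one candidate binary split; the class counts used here are illustrative, not taken from the slides:

def gini(class_counts):
    total = sum(class_counts)
    return 1 - sum((c / total) ** 2 for c in class_counts)

def gini_split(d1_counts, d2_counts):
    n1, n2 = sum(d1_counts), sum(d2_counts)
    n = n1 + n2
    return n1 / n * gini(d1_counts) + n2 / n * gini(d2_counts)

parent = gini([9, 5])               # e.g. 9 "yes" and 5 "no" tuples in D
split = gini_split([6, 1], [3, 4])  # one candidate binary split into D1, D2
print(round(parent - split, 3))     # reduction in impurity, ΔGini(A)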
Tree Pruning
❖Tree pruning addresses the problem of overfitting the data
❖Tree overfitting can occur due to irrelevant attributes, noisy data or too
little training data.
❖As the depth of the tree increases, training error decreases but test error
may increase
❖Statistical measures are used to remove the least reliable branches
❖Approaches – prepruning and postpruning
❖Prepruning – construction halted early, node converted to leaf
❖Postpruning – removes subtrees from a fully grown tree
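An illustrative scikit-learn sketch of the two approaches (the dataset and parameter values are my own choices, not from the slides):

from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Prepruning: halt construction early via limits on tree depth / leaf size
pre = DecisionTreeClassifier(max_depth=4, min_samples_leaf=10).fit(X_tr, y_tr)

# Postpruning: grow the full tree, then prune with cost-complexity pruning
full = DecisionTreeClassifier(random_state=0)
alphas = full.cost_complexity_pruning_path(X_tr, y_tr).ccp_alphas
post = DecisionTreeClassifier(ccp_alpha=alphas[len(alphas) // 2],
                              random_state=0).fit(X_tr, y_tr)

for name, clf in [("prepruned", pre), ("postpruned", post)]:
    print(name, clf.get_n_leaves(), "leaves, test acc:", clf.score(X_te, y_te))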
Advantages of Decision Tree
1. Simple to understand and interpret. Trees can also be displayed graphically in a way
that is easy for non-experts to interpret
2. Able to handle both numerical and categorical data
3. Requires little data preparation. Other techniques often require data normalization
4. Uses a white box model. By contrast, in a black box model, the explanation for the
results is typically difficult to understand, for example with an artificial neural network
5. Possible to validate a model using statistical tests, hence reliable
6. Makes no assumptions about the distribution of the training data
7. Performs well with large datasets
Limitations of Decision Tree
1. A small change in the training data can result in a large change in the tree and consequently
the final predictions.
2. Decision-tree learners can create over-complex trees that do not generalize well from the
training data. (overfitting)
Implementations
R, Weka, Orange, scikit-learn (Python), IBM SPSS, SAS, RapidMiner, MATLAB
Bayes Classification Methods
❖Bayesian classifiers are statistical classifiers
❖They predict class membership probabilities – probability that a tuple belongs
to a particular class
❖Classification is based on Bayes’ theorem
❖High accuracy and speed
❖Naïve Bayesian classifiers assume that the effect of an attribute value on a given
class is independent of the values of the other attributes → class conditional
independence
❖Application: Chinese word segmentation
Bayes’ Theorem
X : data tuple or evidence (described by n attributes)
H : hypothesis (e.g., data tuple X belongs to class C)
P(H|X) : probability that hypothesis H holds given evidence X [the classification problem]
It is called the posterior probability of H conditioned on X
P(H) : probability of the hypothesis irrespective of any evidence (prior probability)
P(X|H) : posterior probability of X conditioned on H (the likelihood of X given H)
P(X) : prior probability of X
Bayes' Theorem :
P(H|X) = [ P(X|H) P(H) ] / P(X)
Suppose the fraction of UG students who play cricket is 15% and
fraction of graduate students playing cricket is 23%. If one-fifth of
the college students are graduate students and rest are UG, what is
the probability that a student who plays cricket is a graduate
student?
Let C = plays cricket, U = undergraduate (UG), G = graduate.
P(C|U) = 0.15
P(C|G) = 0.23
P(G) = 0.2
P(U) = 0.8
To find: P(G|C)
P(G|C) = [P(C|G) × P(G)] / P(C) = (0.23 × 0.2) / (0.15 × 0.8 + 0.23 × 0.2)
       = 0.046 / 0.166 = 0.277
Working of Naïve Bayesian Classifier
❑Let D be a training set of tuples and their associated class labels
❑Each tuple is represented by an attribute vector X = (x1, x2, …, xn)
❑There are m classes, C1, C2, … ,Cm
❑Given a tuple X, the classifier predicts that it belongs to the class having the highest posterior
probability, i.e.
❑ P(Ci|X) > P(Cj|X) for 1 ≤ j ≤ m, j ≠ i
Thus, we have to maximize P(Ci|X)
❑The class Ci for which P(Ci|X) is maximized is called the maximum posteriori hypothesis
❑ P(Ci|X) = [ P(X|Ci) P(Ci) ] / P(X) …… Bayes' theorem
Continued…
❑P(X) is constant for all classes, hence only P(X|Ci) P(Ci) needs to be
maximized.
❑If the class prior probabilities are not known, it is assumed that the classes
are equally likely, i.e. P(C1) = P(C2) = … = P(Cm)
Hence only P(X|Ci) needs to be maximized
❑With class conditional independence:
P(X|Ci) = ∏ P(xk|Ci) ….. k = 1 to n
        = P(x1|Ci) × P(x2|Ci) × … × P(xn|Ci)
RID Age Income Student Credit-rating Class: buys_computer
1 Youth High No Fair No
2 Youth High No Excellent No
3 Middle-aged High No Fair Yes
4 Senior Medium No Fair Yes
5 Senior Low Yes Fair Yes
6 Senior Low Yes Excellent No
7 Middle-aged Low Yes Excellent Yes
8 Youth Medium No Fair No
9 Youth Low Yes Fair Yes
10 Senior Medium Yes Fair Yes
11 Youth Medium Yes Excellent Yes
12 Middle-aged Medium No Excellent Yes
13 Middle-aged High Yes Fair Yes
14 Senior Medium No Excellent No
Problem statement
Given the training data, classify the tuple :
X =( age=youth, income=medium, student=yes, credit_rating=fair)
Let C1: buys_computer = yes, C2: buys_computer = no
We need to maximize P(X|Ci) P(Ci) for i = 1, 2
P(C1) = P(buys_computer=yes) = 9/14 = 0.643
P(C2) = P(buys_computer=no) = 5/14 = 0.357
Computing conditional probabilities:
P(age=youth|buys_computer=yes) = 2/9 = 0.222
P(age=youth|buys_computer=no) = 3/5 = 0.600
P(income=medium | buys_computer=yes) = 4/9 = 0.444
P(income=medium | buys_computer=no) = 2/5 = 0.400
P(student=yes | buys_computer=yes) = 6/9 = 0.667
P(student=yes | buys_computer=no) = 1/5 = 0.2
P(credit_rating=fair | buys_computer=yes) = 6/9 = 0.667
P(credit_rating=fair | buys_computer=no) = 2/5 = 0.4
P(X|buys_computer=yes) = P(age=youth|buys_computer=yes) *
P(income=medium|buys_computer=yes) *
P(student=yes|buys_computer=yes) *
P(credit_rating=fair|buys_computer=yes)
= 0.222 * 0.444 * 0.667 * 0.667
= 0.044
Similarly,
P (X|buys_computer=no) = 0.6 * 0.4 * 0.2 * 0.4 = 0.019
To find the class Ci that maximizes P(X|Ci).P(Ci):
P (X|buys_computer=yes).P(buys_computer=yes)= 0.044 * 0.643 = 0.028
P (X|buys_computer=no).P(buys_computer=no)= 0.019 * 0.357 = 0.007
Thus, Naïve Bayesian Classifier predicts buys_computer = YES for tuple X
If X =( age=youth, income=medium, student=yes, credit_rating=fair)
Then buys_computer = yes
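A compact Python sketch that reproduces the hand computation above (the dataset encoding and function name are my own):

# (age, income, student, credit_rating, buys_computer)
data = [
    ("youth","high","no","fair","no"),        ("youth","high","no","excellent","no"),
    ("middle","high","no","fair","yes"),      ("senior","medium","no","fair","yes"),
    ("senior","low","yes","fair","yes"),      ("senior","low","yes","excellent","no"),
    ("middle","low","yes","excellent","yes"), ("youth","medium","no","fair","no"),
    ("youth","low","yes","fair","yes"),       ("senior","medium","yes","fair","yes"),
    ("youth","medium","yes","excellent","yes"),("middle","medium","no","excellent","yes"),
    ("middle","high","yes","fair","yes"),     ("senior","medium","no","excellent","no"),
]

def naive_bayes_score(x, cls):
    rows = [r for r in data if r[-1] == cls]
    prior = len(rows) / len(data)                  # P(Ci)
    likelihood = 1.0
    for k, value in enumerate(x):                  # P(xk|Ci), independence assumption
        likelihood *= sum(r[k] == value for r in rows) / len(rows)
    return prior * likelihood                      # P(X|Ci) * P(Ci)

x = ("youth", "medium", "yes", "fair")
scores = {c: naive_bayes_score(x, c) for c in ("yes", "no")}
print(scores)                        # roughly {'yes': 0.028, 'no': 0.007}
print(max(scores, key=scores.get))   # predicted class: 'yes'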
Try it out on Weather Data
X = (outlook = sunny, temperature = cool, humidity = high, wind = strong, play = ?)
Zero probability value – Laplace correction
If there are no training tuples for a particular attribute value, its estimated probability is zero,
which makes the whole product zero
Thus we would have P(X|Ci) = 0, cancelling the effect of all the other probabilities
To avoid this problem, assume that D is large enough that adding one to each count makes only a
negligible difference, and add one to each count to avoid zero probabilities.
This technique is known as the Laplacian correction or Laplace estimator.
Example: In a training database D, for the class buys_computer = yes, there are
0 tuples with income = low, 990 tuples with income = medium and 10 tuples with income = high
Probabilities of these events without the Laplacian correction: 0, 0.990, 0.010
Using the Laplace correction, add 1 tuple of each income value to D
New probabilities: 1/1003, 991/1003, 11/1003, i.e., 0.001, 0.988, 0.011
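A quick check of the Laplace-corrected estimates above in Python:

counts = {"low": 0, "medium": 990, "high": 10}

plain = {k: v / sum(counts.values()) for k, v in counts.items()}
laplace = {k: (v + 1) / (sum(counts.values()) + len(counts))
           for k, v in counts.items()}
print(plain)    # {'low': 0.0, 'medium': 0.99, 'high': 0.01}
print(laplace)  # roughly {'low': 0.001, 'medium': 0.988, 'high': 0.011}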
Model Evaluation and Selection
Terminology
➢Positive tuples- tuples of main class of interest
➢Negative tuples – all other tuples
➢True Positives (TP) – Positive tuples that were correctly labelled by the classifier
➢True Negatives (TN) – Negative tuples that were correctly labelled by the classifier
➢False Positives (FP) – Negative tuples that were incorrectly labelled as positives
➢False Negatives (FN) – Positive tuples that were incorrectly labelled as negative
Confusion Matrix
Confusion Matrix - a method to analyze how well a classifier can recognize
tuples of different classes
                           Predicted: C1 (yes / +ve)    Predicted: C2 (no / -ve)
Actual: C1 (yes / +ve)     True positive (TP)           False negative (FN)
Actual: C2 (no / -ve)      False positive (FP)          True negative (TN)
[Figure: example of a confusion matrix]
Evaluation Measures - Accuracy
1. Accuracy - the accuracy of a classifier on a given test set is the percentage of test set tuples that
are correctly classified by the classifier. It is also termed the overall recognition rate
2. Class imbalance problem –
◦ In classification involving skewed or highly imbalanced data, e.g., network intrusion detection,
financial fraud detection or cancer detection, we are interested only in the minority class.
◦ A classifier can report high accuracy even if it detects no intrusions at all.
◦ Hence accuracy is not suitable in some applications
Classes          Cancer = yes   Cancer = no    Total
Cancer = yes          90            210          300
Cancer = no          140           9560         9700
Total                230           9770        10000

Find the overall accuracy of the classifier to detect cancer patients
Precision and Recall measures
      p = TP / (TP + FP)          r = TP / (TP + FN)
◼ Precision p is the number of correctly classified positive examples divided by the total
number of examples that are classified as positive. [Exactness]
◼ Recall r is the number of correctly classified positive examples divided by the total
number of actual positive examples in the test set. [Completeness]
o Find the precision and recall for the cancer patients in the previous slide
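A small Python helper implementing these measures; the confusion-matrix counts passed in are illustrative placeholders, not taken from the exercises:

def metrics(tp, fp, fn, tn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)   # overall recognition rate
    precision = tp / (tp + fp)                   # exactness
    recall = tp / (tp + fn)                      # completeness
    return accuracy, precision, recall

acc, p, r = metrics(tp=80, fp=20, fn=10, tn=90)
print(f"accuracy={acc:.3f}  precision={p:.3f}  recall={r:.3f}")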
Calculate the accuracy, precision and recall from the following confusion matrix:

                 Predicted Yes   Predicted No   Total
Actual Yes            100              5          105
Actual No              10             50           60
Total                 110             55          165
Other parameters to compare classifiers
1. Speed – computational costs involved in generating and using the classifier
2. Robustness – ability of classifier to make correct predictions by handling noise and missing
values
3. Scalability – ability to construct classifier efficiently given large amounts of data
4. Interpretability - the level of understanding and insight that the model provides