Modelling and Evaluation
Syllabus
Selecting a Model : Predictive/Descriptive, Training a Model for supervised learning, Model representation and interpretability, Evaluating performance of a model, Improving performance of a model.
Contents
3.1 Selecting a Model
3.2 Training a Model for Supervised Learning
3.3 Model Representation and Interpretability
3.4 Evaluating Performance of a Model
3.5 Improving Performance of a Model
3.6 Fill in the Blanks
3.7 Multiple Choice Questions
3.1 Selecting a Model
• A model is a structured representation of raw input data that captures a meaningful pattern. The model might take different forms : it might be a mathematical equation, a graph or tree structure, a computational block, etc.
• Given easy-to-use machine learning libraries like scikit-learn and Keras, it is straightforward to fit many different machine learning models on a given predictive modelling dataset.
• Model selection is the task of selecting a statistical model from a set of candidate models, given data.
• The decision regarding which model is to be selected for a specific data set is taken by the learning task, based on the problem to be solved and the type of data.
• The process of assigning a model, and fitting a specific model to a data set, is called model training.
• Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset.
• Model selection is a process that can be applied both across different types of models (e.g. logistic regression, SVM, k-NN, etc.) and across models of the same type configured with different hyperparameters.
• Fitting models is relatively straightforward; selecting among them is the true challenge of applied machine learning.
• All models have some predictive error, given the statistical noise in the data, the incompleteness of the data sample, and the limitations of each model type. Therefore, the notion of a perfect or best model is not useful. Instead, we must seek a model that is "good enough".
• The best approach to model selection requires "sufficient" data, which may be nearly infinite depending on the complexity of the problem.
• In this ideal situation, we would split the data into training, validation and test sets, then fit candidate models on the training set, evaluate and select them on the validation set, and report the performance of the final model on the test set.
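• As an illustration of this workflow, the short Python sketch below (using scikit-learn; the breast-cancer dataset, the three candidate models and the 60/20/20 split ratios are assumptions made purely for illustration) fits candidate models on a training set, selects one on a validation set and reports its accuracy on a test set :

# Illustrative sketch: select among candidate models on a validation set,
# then report the chosen model's performance on a held-out test set.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)

# 60 % train, 20 % validation, 20 % test (assumed ratios)
X_train, X_tmp, y_train, y_tmp = train_test_split(X, y, test_size=0.4, random_state=42)
X_val, X_test, y_val, y_test = train_test_split(X_tmp, y_tmp, test_size=0.5, random_state=42)

candidates = {
    "logistic_regression": LogisticRegression(max_iter=5000),
    "svm": SVC(),
    "knn": KNeighborsClassifier(n_neighbors=5),
}

# Fit every candidate on the training set and score it on the validation set
val_scores = {}
for name, model in candidates.items():
    model.fit(X_train, y_train)
    val_scores[name] = model.score(X_val, y_val)

best_name = max(val_scores, key=val_scores.get)
best_model = candidates[best_name]

# Report the final, unbiased estimate on the test set
print("validation scores:", val_scores)
print("selected model:", best_name, "test accuracy:", best_model.score(X_test, y_test))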
Predictive Models
• Predictive modelling is also called predictive analytics. It is a mathematical process that seeks to predict future events or outcomes by analyzing patterns that are likely to forecast future results.
• If you are trying to predict a continuous target, you will need a regression model. But if you are trying to predict a discrete target, you will need a classification model.
• Predictive models have a clear focus on what they want to learn and how they want to learn it.
• Predictive analytics answers queries about the future, using historical data as the chief basis for decisions.
• It involves the supervised learning functions used for the prediction of the target value. The methods that fall under this mining category are classification, time-series analysis and regression.
• Data modeling is a necessity of predictive analysis, which works by utilizing some variables to anticipate the unknown future values of other variables.
• It provides organizations with actionable insights based on data, and it provides an estimation regarding the likelihood of a future outcome.
• To do this, a variety of techniques are used, such as machine learning, data mining, modeling and game theory.
• Predictive modeling can, for example, help to identify risks or opportunities in the future.
• Predictive analytics can be used in all departments, from predicting customer behaviour in sales and marketing, to forecasting demand for operations or determining risk profiles for finance.
• A very well-known application of predictive analytics is credit scoring, used by financial services to determine the likelihood of customers making future credit payments on time. Determining such a risk profile requires a vast amount of data, including public and social data.
• Historical and transactional data are used to identify patterns, and statistical models and algorithms are used to capture relationships in various datasets.
• Predictive analytics has taken off in the big data era, and there are many tools available for organisations to predict future outcomes.
• In classification, the target feature is known as the class, and the categories into which it is divided are called levels. k-Nearest Neighbor, Naive Bayes and decision trees are popular classification models.
• Predictive models may also be used to predict numerical values of the target feature based on the predictor features. Popular regression models are Linear Regression and Logistic Regression.
Descriptive Models
• Descriptive models are used to derive insight from the data itself rather than to predict a target value; the patterns and groupings they uncover summarize what is present in the data.
• Descriptive modelling is based on unsupervised learning : we do not have any target or labelled output for the examples, so the model has to discover structure, such as groups of similar instances, hidden in the data itself.
• Descriptive analytics is the conventional form of business intelligence and data analysis. It seeks to provide a depiction or "summary view" of facts and figures in an understandable format.
• Descriptive analytics relies on two main techniques for reporting past events : data aggregation and data mining.
• It presents past data in an easily digestible format for the benefit of a wide business audience.
• A set of techniques for reviewing and examining the data set is needed to understand the data and analyse past performance.
• Descriptive analytics enables organisations to understand what happened in the past. It helps to understand the relationship between products and customers.
• The objective of this analysis is to understand what approach to take in the future : if we learn from past behaviour, it helps us to influence future outcomes.
• Company reports are an example of descriptive analytics; they simply provide a historic review of a company's operations, stakeholders, customers and financials.
• It also helps to describe and present data in a format which can be easily understood by a wide variety of business readers.
• The descriptive modeling task called pattern discovery is used to identify useful associations within data. Pattern discovery is often used for market basket analysis on retailers' transactional purchase data.
• Here, the goal is to identify items that are frequently purchased together, so that the learned information can be used to refine marketing tactics.
• For instance, if a retailer learns that swimming trunks are commonly purchased at the same time as sunglasses, the retailer might reposition the items more closely in the store or run a promotion to "up-sell" customers on associated items.
3.2 Training a Model for Supervised Learning
Holdout Method
• The data set is split into two parts, labelled as a training set and a testing set. This can be a 60/40, 70/30 or 80/20 split. This technique is called the hold-out validation technique.
• Suppose we have a dataset with house prices as the dependent variable and two independent variables showing the square footage of the house and the number of rooms.
• Now, imagine this dataset has 30 rows. The whole idea is that you build a model that can predict house prices accurately.
• To "train" your model, or see how well it performs, we randomly set aside 10 of those rows and fit the model on the remaining rows.
• The second step is to predict the values of those 10 rows that we excluded and measure how good our predictions were.
• As a rule of thumb, experts suggest randomly sampling 80 % of the data into the training set and 20 % into the test set.
• Training set : Used to train the classifier.
Fig. 3.2.1 Holdout method : the total number of examples is split into a training set and a test set
• The holdout method has two basic drawbacks :
1. It requires an extra dataset.
2. It is a single train-and-test experiment; the holdout estimate of the error rate will be misleading if we happen to get an "unfortunate" split.
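• A minimal hold-out sketch in Python is given below; the synthetic house-price data (square footage and number of rooms) and the 80/20 split are assumptions made for illustration :

# Hold-out validation sketch: 80 % train / 20 % test on synthetic house-price data
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
sqft = rng.uniform(500, 3500, size=50)
rooms = rng.integers(1, 6, size=50)
price = 50 * sqft + 10000 * rooms + rng.normal(0, 20000, size=50)  # noisy house prices

X = np.column_stack([sqft, rooms])
X_train, X_test, y_train, y_test = train_test_split(X, price, test_size=0.2, random_state=1)

model = LinearRegression().fit(X_train, y_train)          # fit on the 80 % training split
print("test MSE:", mean_squared_error(y_test, model.predict(X_test)))  # evaluate on the held-out 20 %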
Cross-Validation
• Cross-validation is a technique for estimating model performance by training several machine learning models on subsets of the available input data and evaluating them on the complementary subsets of the data. Cross-validation is used to detect overfitting, i.e., failing to generalize a pattern.
• In general, machine learning involves deriving models from data, with the aim of achieving some kind of desired behaviour, e.g., prediction or classification.
• Fig. 3.2.2 shows cross-validation.
Fig. 3.2.2 Cross-validation
• The basic idea is to set aside some portion of the data before training begins. When training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea for a whole class of model evaluation methods called cross-validation.
• Types of cross-validation methods are hold-out, k-fold and leave-one-out.
• The hold-out method is the simplest kind of cross-validation. The data set is separated into two sets, called the training set and the testing set. K-fold cross-validation is one way to improve over the hold-out method : the data set is divided into k subsets, and the hold-out method is repeated k times.
• Each time, one of the k subsets is used as the test set and the other k − 1 subsets are put together to form a training set. Then the average error across all k trials is computed.
• Leave-one-out cross-validation is k-fold cross-validation taken to its logical extreme, with k equal to N, the number of data points in the set.
• That means that N separate times, the function approximator is trained on all the data except for one point, and a prediction is made for that point.
• Cross-validation ensures non-overlapping test sets.
K-fold cross-validation :
• In this technique, k − 1 folds are used for training and the remaining one is used for testing, as shown in Fig. 3.2.3.
• The advantage is that the entire data is used for training and testing. The error rate of the model is the average of the error rates of each iteration.
• This technique can also be called a form of the repeated hold-out method. The error rate could be improved by using a stratification technique.
Fig. 3.2.3 K-fold cross-validation
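• The sketch below illustrates k-fold cross-validation with scikit-learn; the iris dataset, the decision tree learner and k = 5 are illustrative assumptions, and a stratified splitter is used to demonstrate the stratification technique mentioned above :

# k-fold cross-validation sketch: the average score over k folds estimates model performance.
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# StratifiedKFold preserves the class proportions in every fold (stratification)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(DecisionTreeClassifier(random_state=0), X, y, cv=cv)

print("per-fold accuracy:", scores)
print("mean accuracy:", scores.mean(), "error rate:", 1 - scores.mean())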
Bootstrap
• Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Although unsupervised models, such as clustering, do not directly generate label predictions for each individual, they provide useful constraints for the joint prediction of a set of related objects.
• Given a training set of size n, bootstrap sampling creates m samples of size n by drawing n examples from the original data, with replacement. Each bootstrap sample will on average contain 63.2 % of the unique training examples; the rest are replicates. The m resulting models are combined using a simple majority vote.
• In particular, on each round, the base learner is trained on what is often called a "bootstrap replicate" of the original training set. Suppose the training set consists of n examples.
• Then a bootstrap replicate is a new training set that also consists of n examples, and which is formed by repeatedly selecting uniformly at random and with replacement n examples from the original training set. This means that the same example may appear multiple times in the bootstrap replicate, or it may not appear at all.
• It also decreases error by decreasing the variance in the results due to unstable learners : algorithms (like decision trees) whose output can change dramatically when the training data is slightly changed.
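• The following small sketch (plain NumPy; the sample size and number of replicates are arbitrary) draws bootstrap replicates and confirms that, on average, about 63.2 % of the original examples appear in each replicate :

# Bootstrap sampling sketch: draw n examples with replacement and check how many
# of the original examples appear at least once (about 63.2 % on average).
import numpy as np

rng = np.random.default_rng(42)
n = 1000
indices = np.arange(n)

unique_fractions = []
for _ in range(200):                      # 200 bootstrap replicates
    sample = rng.choice(indices, size=n, replace=True)
    unique_fractions.append(len(np.unique(sample)) / n)

print("average fraction of unique examples:", np.mean(unique_fractions))  # close to 0.632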
Lazy vs. Eager Learner
• Eager learning : Given a training set, an eager learner constructs a classification model before receiving new data to classify. For example, decision tree induction, Bayesian classification, rule-based classification, etc.
• Lazy learning : A lazy learner simply stores the training data and waits until it is given a new instance. Lazy learners take less time in training but more time in predicting. For example, k-nearest-neighbor classifiers and case-based reasoning classifiers.
• Instance-based methods are also known as lazy learning because they do not generalize until needed.
• The eager learner must create a global approximation. The lazy learner can create many local approximations.
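• A small illustrative comparison in Python (the iris data and the particular learners are assumptions) : the eager decision tree builds its model during fit(), whereas the lazy k-NN essentially just stores the training data and defers the work to prediction time :

# Lazy vs. eager learner sketch: decision tree (eager) vs. k-NN (lazy)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

eager = DecisionTreeClassifier().fit(X_train, y_train)            # global model built up front
lazy = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)  # fit() only stores the data

print("decision tree accuracy:", eager.score(X_test, y_test))
print("k-NN accuracy:", lazy.score(X_test, y_test))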
3.3 Model Representation and Interpretability
• In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention.
• Interpretability has to do with how accurately a machine learning model can associate a cause with an effect.
• If a model can take the inputs and routinely produce the same outputs, the model is interpretable :
1. If you overeat at dinnertime and you always have trouble sleeping, the situation is interpretable.
2. If all 2019 polls showed an "ABC party" win and the "XYZ party" candidate took office, all those models showed low interpretability.
• Interpretability poses no issue in low-risk scenarios. If a model is recommending movies to watch, that is a low-risk task.
• The fitness of a target function approximated by a learning algorithm determines how correctly it is able to classify a set of data it has never seen.
Underfitting and Overfitting
* Training error can be reduced by making the hypothesis more sensitive to training
data, but this may lead to overfitting and poor generalization.
• Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is when a classifier fits the training data too tightly. Such a classifier works well on the training data but not on independent test data. It is a general problem that plagues all machine learning methods.
• Underfitting : If we put too few variables in the model, leaving out variables that could help explain the response, we are underfitting. Consequences :
1. The fitted model is not good for prediction of new data - the prediction is biased.
2. Regression coefficients are biased.
3. The estimate of the error variance is too large.
• Because of overfitting, we see low error on the training data and high error on the test data. Overfitting occurs when a model begins to memorize the training data rather than learning to generalize from the trend.
• The more difficult a criterion is to predict, the more noise exists in past information that needs to be ignored. The problem is determining which part to ignore.
• Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. We can determine whether a predictive model is underfitting or overfitting the training data by looking at the prediction error on the training data and the evaluation data.
• Fig. 3.3.1 shows underfitting and overfitting.
Fig. 3.3.1 Underfitting, balanced fit and overfitting
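• One way to see this in practice is sketched below : training error and validation error are compared for models of increasing complexity (a polynomial-regression pipeline on synthetic data is assumed purely as an illustration of the idea) :

# Sketch: diagnose under-/overfitting by comparing training error with validation error
# for models of increasing complexity (polynomial degree is the assumed complexity knob).
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
x = rng.uniform(-3, 3, size=80)
y = np.sin(x) + rng.normal(0, 0.3, size=80)            # nonlinear target with noise
X = x.reshape(-1, 1)
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

for degree in (1, 3, 12):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    model.fit(X_train, y_train)
    train_err = mean_squared_error(y_train, model.predict(X_train))
    val_err = mean_squared_error(y_val, model.predict(X_val))
    # degree 1: both errors high (underfitting); degree 12: low train / higher validation (overfitting)
    print(f"degree {degree:2d}  train MSE {train_err:.3f}  validation MSE {val_err:.3f}")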
• Reasons for overfitting :
1. Noisy data
2. Training set is too small
3. Large number of features
• In machine learning, the more complex model is said to show signs of overfitting, while the simpler model shows signs of underfitting. Several heuristics have been developed to avoid overfitting; for example, when designing neural networks one may :
1. Limit the number of hidden nodes,
2. Stop training early to avoid a perfect explanation of the training set, and
3. Apply weight decay to limit the size of the weights, and thus of the function class implemented by the network.
• In experimental practice we observe an important phenomenon called the bias-variance dilemma.
• In supervised learning, the class value assigned by the learning model built on the training data may differ from the actual class value. This error in learning can be of two types : errors due to 'bias' and errors due to 'variance'.
• Fig. 3.3.2 shows the bias-variance trade-off.
Fig. 3.3.2 Bias-variance trade-off (combinations of low/high bias and low/high variance)
• Given two classes of hypotheses (e.g. linear models and k-NNs) to fit to some training data set, we observe that the more flexible hypothesis class has a lower bias term but a higher variance term. If we have a parametric family of hypotheses, we can increase the flexibility of the hypothesis, but we still observe the increase in variance.
• The bias-variance dilemma is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set :
1. The bias is error from erroneous assumptions in the learning algorithm. High bias can cause an algorithm to miss the relevant relations between features and target outputs.
2. The variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting : modeling the random noise in the training data, rather than the intended outputs.
• In order to reduce the model error, the designer can aim at reducing either the bias or the variance, as the noise component is irreducible.
• As the model increases in complexity, its bias is likely to diminish. However, as the number of training examples is kept fixed, the parametric identification of the model may strongly vary from one training set to another. This will increase the variance term.
• At one stage, the decrease in bias will be inferior to the increase in variance, warning that the model should not be too complex. Conversely, to decrease the variance term, the designer has to simplify the model so that it is less sensitive to a specific training set. This simplification will lead to a higher bias.
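• The sketch below illustrates the variance side of this trade-off empirically : a simple and a flexible model (low- and high-degree polynomials on synthetic data, chosen only for illustration) are refitted on many different training sets, and the variance of their predictions is compared :

# Sketch: compare the variance of predictions across many training sets for a
# simple model (degree-1 polynomial) and a flexible model (degree-9 polynomial).
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
x_test = np.linspace(-3, 3, 50).reshape(-1, 1)

def prediction_variance(degree, n_rounds=100, n_train=40):
    preds = []
    for _ in range(n_rounds):
        x = rng.uniform(-3, 3, size=n_train)
        y = np.sin(x) + rng.normal(0, 0.3, size=n_train)   # fresh noisy training set each round
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        model.fit(x.reshape(-1, 1), y)
        preds.append(model.predict(x_test))
    return np.var(np.array(preds), axis=0).mean()          # average variance over the test points

print("variance, simple model   (degree 1):", prediction_variance(1))
print("variance, flexible model (degree 9):", prediction_variance(9))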
3.4 Evaluating Performance of a Model
Supervised Learning : Classification
• Classification is a major task of supervised learning. The responsibility of the classification model is to assign a class label to the target feature based on the values of the predictor features.
• When performing classification predictions, there are four types of outcomes that can occur. The evaluation measures in classification problems are defined from a matrix with the numbers of examples correctly and incorrectly classified for each class, named the confusion matrix.
• A confusion matrix is also called a contingency table.
1) True positives are when you predict an observation belongs to a class and it
actually does belong to that class.
2) True negatives are when you predict an observation does not belong to a class
and it actually does not belong to that class.
3) False positives occur when you predict an observation belongs to a class when
in reality it does not.
4) False negatives occur when you predict an observation does not belong to a
class when in fact it does.
• The confusion matrix goes deeper than classification accuracy by showing the correct and incorrect (i.e. true or false) predictions for each class. In the case of a binary classification task, a confusion matrix is a 2x2 matrix. If there are three different classes, it is a 3x3 matrix, and so on.
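• A confusion matrix can be computed directly, for example with scikit-learn as sketched below (the label vectors are made up for illustration) :

# Confusion matrix sketch for a binary classifier (illustrative labels).
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

# Rows = actual class, columns = predicted class; ravel() unpacks the 2x2 matrix
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(confusion_matrix(y_true, y_pred))
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)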
• For any classification model, model accuracy is given by the total number of correct classifications (True Positives or True Negatives) divided by the total number of classifications done.
Accuracy rate = (|True positives| + |True negatives|) / (|True positives| + |True negatives| + |False positives| + |False negatives|)
• The complement of the accuracy rate is the error rate, which evaluates a classifier by its percentage of incorrect predictions.
Error rate = (|False negatives| + |False positives|) / (|False negatives| + |False positives| + |True negatives| + |True positives|)
Error rate = 1 − Accuracy rate
• Recall measures the proportion of actual positive cases that are predicted as positive.
• The specificity is a statistical measure of how well a binary classification test correctly identifies the negative cases.
Recall = |True positives| / (|True positives| + |False negatives|)
Specificity = |True negatives| / (|False positives| + |True negatives|)
• True Positive Rate (TPR) is also called sensitivity, hit rate and recall.
Sensitivity = Number of true positives / (Number of true positives + Number of false negatives)
• Precision measures how good our model is when the prediction is positive.
Precision = |True positives| / (|True positives| + |False positives|)
• The focus of precision is positive predictions. It indicates how many positive predictions are true.
• F1 score is the weighted average (harmonic mean) of precision and recall.
F1 score = 2 × (Precision × Recall) / (Precision + Recall)
• F1 score is a more useful measure than accuracy for problems with uneven class distribution, because it takes into account both false positives and false negatives.
• The kappa value of a model indicates the model accuracy adjusted for the accuracy expected by chance.
Kappa = (Total accuracy − Random accuracy) / (1 − Random accuracy)
• Total accuracy is simply the sum of true positives and true negatives, divided by the total number of items, that is :
Total accuracy = (TP + TN) / (TP + TN + FP + FN)
• Random accuracy is defined as the sum of the products of reference likelihood and result likelihood for each class. That is,
Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
• In terms of true and false positives etc., random accuracy can be written as :
Random accuracy = ((TN + FP) × (TN + FN) + (FN + TP) × (FP + TP)) / (Total × Total)
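• All of the above measures are also available in scikit-learn; the sketch below (reusing the illustrative label vectors from the confusion-matrix example) computes accuracy, error rate, precision, recall, F1 score and kappa :

# Sketch: the classification metrics defined above, computed with scikit-learn.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score)

y_true = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]

print("accuracy  :", accuracy_score(y_true, y_pred))
print("error rate:", 1 - accuracy_score(y_true, y_pred))
print("precision :", precision_score(y_true, y_pred))
print("recall    :", recall_score(y_true, y_pred))
print("F1 score  :", f1_score(y_true, y_pred))
print("kappa     :", cohen_kappa_score(y_true, y_pred))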
Example : Consider the following three-class confusion matrix. Calculate the precision and recall for each class. Also calculate the weighted average precision and recall.
Solution : [The confusion matrix entries and most of the worked solution are not legible in the scanned original. The readable part shows that the diagonal (correctly classified) counts are 15, 15 and 45, so the classifier accuracy is (15 + 15 + 45) divided by the total number of examples; the per-class precision and recall then follow from the corresponding column and row totals.]
Example : Calculate the accuracy, precision, recall and true negative rate for the following confusion matrix.
                 Predicted +   Predicted −
   Actual +          50            25
   Actual −           5            20
Solution :
Accuracy = (50 + 20) / (50 + 25 + 5 + 20) = 70 / 100 = 0.70 = 70 %
Precision = 50 / (50 + 5) = 0.9090
Recall = 50 / (50 + 25) = 0.6667
• True negative rate is also called specificity.
Specificity = |True negatives| / (|False positives| + |True negatives|)
True negative rate = 20 / 25 = 0.8
ROC Curve :
• Receiver Operating Characteristics (ROC) graphs have long been used in signal detection theory to depict the trade-off between hit rates and false alarm rates over a noisy channel. Recent years have seen an increase in the use of ROC graphs in the machine learning community.
» ROC curve summarizes the performance of the model at different threshold values
by combining confusion matrices at all threshold values. ROC curves are typically
used in binary classification to study the output of a classifier.
* An ROC plot plots true positive rate on the Y-axis against false positive rate on
the X-axis; a single contingency table corresponds to a single point in an ROC
plot.
‘© The performance of a ranker can be assessed by drawing a piecewise linear curve
in an ROC plot, known as an ROC curve. The curve starts in (0, 0), finishes in (1, 1), and is monotonically non-decreasing in both axes.
• ROC curves are a useful technique for organizing classifiers and visualizing their performance. They are especially useful for domains with a skewed class distribution and unequal classification error costs.
• ROC analysis allows one to create an ROC curve and a complete sensitivity/specificity report. The ROC curve is a fundamental tool for diagnostic test evaluation.
• In an ROC curve the true positive rate (sensitivity) is plotted as a function of the false positive rate for different cut-off points of a parameter.
• Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve is a measure of how well a parameter can distinguish between two diagnostic groups.
• Each point on an ROC curve connecting two segments corresponds to the true and false positive rates achieved on the same test set by the classifier obtained from the ranker by splitting the ranking between those two segments.
• An ROC curve is convex if the slopes are monotonically non-increasing when moving along the curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e., two or more adjacent segments with increasing slopes, indicates a locally worse-than-random ranking.
True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows :
True Positive Rate TPR = TP / (TP + FN)
False Positive Rate (FPR) is defined as follows :
False Positive Rate FPR = FP / (FP + TN)
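• The sketch below computes an ROC curve and the area under it with scikit-learn (the breast-cancer dataset and the logistic regression classifier are assumptions made for illustration) :

# ROC curve sketch: TPR against FPR over all thresholds, plus the area under the curve.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]          # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)  # one (FPR, TPR) point per threshold
print("AUC:", roc_auc_score(y_test, scores))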
Supervised Learning : Regression
• A regression model which ensures that the difference between predicted and actual values is low can be considered a good model.
• For example, a regression model could be used to predict the values of a data warehouse based on web-marketing, number of data entries, size and other factors.
• A regression task begins with a data set in which the target values are known. Regression analysis is a good choice when all of the predictor variables are continuous valued as well.
• Fig. 3.4.1 shows a linear regression model.
Fig. 3.4.1 Linear regression model : value of the apartment unit (dependent variable, Y-axis) versus area in square feet (X-axis)
• If 'area' is the predictor variable (say x) and 'value' is the target variable (say y), the linear regression model can be represented in the form : y = c + b x
• In this equation :
1. y is the output variable. It is also called the target variable in machine learning, or the dependent variable in statistical modeling. It represents the continuous value that we are trying to predict.
2. x is the input variable. In machine learning, x is referred to as the feature, while in statistics it is called the independent variable. It represents the information given to us at any given time.
3. b is the regression coefficient or scale factor.
• Linear regression assumes that there exists a linear relationship between the dependent variable and the independent variable(s). The value of the dependent variable of a linear regression model is a continuous value, i.e. a real number.
Linear regression is a statistical tool that determines how well a straight line fits a
set of paired data. The straight line that best fits that data is called the least
squares regression line.
• The distance between the actual and predicted values is called the residual. If the observed points are far from the regression line, the residuals will be large and so the cost function will be high. If the scatter points are close to the regression line, the residuals will be small and hence so will the cost function.
• R-squared is a good measure to evaluate the model fitness. It is also known as the coefficient of determination. R-squared is the fraction by which the variance of the errors is less than the variance of the dependent variable.
• It is called R-squared because in a simple regression model it is just the square of the correlation between the dependent and independent variables, which is commonly denoted by 'r'.
• In a multiple regression model, R-squared is determined by pairwise correlations among all the variables, including correlations of the independent variables with each other as well as with the dependent variable.
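• A small illustrative sketch of fitting y = c + b x and evaluating it with R-squared is given below (the synthetic area/value data are assumed purely for illustration) :

# Sketch: fit y = c + b*x on synthetic "area vs. value" data and evaluate with R-squared.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

rng = np.random.default_rng(0)
area = rng.uniform(400, 2000, size=100).reshape(-1, 1)          # square feet (assumed data)
value = 150 * area.ravel() + 50000 + rng.normal(0, 30000, 100)  # apartment value with noise

model = LinearRegression().fit(area, value)
pred = model.predict(area)

print("intercept c:", model.intercept_, "slope b:", model.coef_[0])
print("residual for first point:", value[0] - pred[0])
print("R-squared:", r2_score(value, pred))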
Unsupervised Learning : Clustering
‘© Clustering groups data points based on their similarities. Each group is called a
cluster; it contains data points with high similarity to each other and low similarity to data points in other clusters.
© The objective of clustering is to segregate groups with similar traits and bundle
them together into different clusters.
© Silhouette analysis can be used to study the separation distance between the
resulting clusters. The silhouette plot displays a measure of how close each point
in one cluster is to points in the neighboring clusters. This measure has a range of [−1, 1].
© Silhouette coefficients near + 1 indicate that the sample is far away from the
neighboring clusters. A value of 0 indicates that the sample is on or very close to
the decision boundary between two neighboring clusters and negative values
indicate that those samples might have been assigned to the wrong cluster.
• Many clustering algorithms use distance measures to determine the similarity or dissimilarity between any pair of data points. A valid distance measure should be symmetric and obtains its minimum value (usually zero) in the case of identical data points. By computing the distance or (dis)similarity between each pair of observations, a dissimilarity or distance matrix is obtained.
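• The sketch below clusters synthetic data with k-means and scores each choice of k with the silhouette coefficient (the generated blobs and the candidate values of k are illustrative assumptions) :

# Sketch: k-means clustering evaluated with the silhouette coefficient (values near +1 are better).
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)   # synthetic, well-separated data

for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(f"k = {k}  silhouette = {silhouette_score(X, labels):.3f}")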
3.5 Improving Performance of a Model
• When we build a random forest classifier, we can tune the number of trees to build, the number of variables to consider for splitting, etc.
• Similarly, when we build a deep learning algorithm, we can specify how many layers we need, how many neurons we want in each layer, and which activation function to use. Tuning parameters enhances model performance if we use the right type of parameters in an algorithm.
• One effective way to improve model performance is by tuning model parameters. Model parameter tuning is the process of adjusting the model fitting options.
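• For example, the sketch below tunes the number of trees and the number of variables per split of a random forest with a grid search over cross-validated accuracy (the dataset and the parameter grid are assumptions made for illustration) :

# Sketch: tune random forest hyperparameters (trees, variables per split) with a grid search.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_breast_cancer(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100, 200],        # number of trees
    "max_features": ["sqrt", "log2"],      # variables considered at each split
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print("best parameters:", search.best_params_)
print("best cross-validated accuracy:", search.best_score_)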
3.6 Fill in the Blanks
Q.1 Structured representation of raw input data to the meaningful pattern is called a ____.
Q.2 The process of assigning a model, and fitting a specific model to a data set, is called model ____.
Q.3 In the bias-variance trade-off, when the value of 'k' is decreased, the model becomes simpler to fit and ____ increases.
Q.4 In the bias-variance trade-off, when the value of 'k' is increased, the variance ____.
Q.5 Both underfitting and overfitting result in poor classification quality, which is reflected by low classification ____.
Q.6 Overfitting refers to a situation where the model has been designed in such a way that it emulates the ____ data too closely.
Q.7 A typical case of underfitting may occur when trying to represent ____ data with a linear model.