ML Unit II Modelling Notes
Modelling and Evaluation

Syllabus
Selecting a Model: Predictive/Descriptive, Training a Model for supervised learning, Model representation and interpretability, Evaluating performance of a model, Improving performance of a model.

Contents
3.1 Selecting a Model
3.2 Training a Model for Supervised Learning
3.3 Model Representation and Interpretability
3.4 Evaluating Performance of a Model
3.5 Improving Performance of a Model
3.6 Fill in the Blanks
3.7 Multiple Choice Questions

3.1 Selecting a Model

• A structured representation of raw input data as a meaningful pattern is called a model. The model may take different forms: it might be a mathematical equation, a graph or tree structure, a computational block, and so on.
• Given easy-to-use machine learning libraries like scikit-learn and Keras, it is straightforward to fit many different machine learning models on a given predictive modelling dataset.
• Model selection is the task of selecting a statistical model from a set of candidate models, given data.
• The decision regarding which model is to be selected for a specific data set is driven by the learning task, based on the problem to be solved and the type of data.
• The process of assigning a model, and fitting that specific model to a data set, is called model training.
• Model selection is the process of selecting one final machine learning model from among a collection of candidate machine learning models for a training dataset.
• Model selection is a process that can be applied both across different types of models (e.g. logistic regression, SVM, k-NN, etc.) and across models of the same type configured with different hyperparameters. Fitting models is relatively straightforward; selecting among them is the true challenge of applied machine learning.
• All models have some predictive error, given the statistical noise in the data, the incompleteness of the data sample, and the limitations of each model type. Therefore, the notion of a perfect or best model is not useful; instead, we must seek a model that is "good enough".
• The best approach to model selection requires "sufficient" data, which may be nearly infinite depending on the complexity of the problem.
• In this ideal situation, we would split the data into training, validation and test sets, fit candidate models on the training set, evaluate and select among them on the validation set, and report the performance of the final model on the test set.

3.1.1 Predictive Models

• Predictive modelling is also called predictive analytics. It is a mathematical process that seeks to predict future events or outcomes by analyzing patterns that are likely to forecast future results.
• If you are trying to predict a continuous target, you need a regression model; if you are trying to predict a discrete target, you need a classification model.
• Predictive models have a clear focus on what they want to learn and how they want to learn it.
• Predictive analysis provides answers to queries about the future, using historical data as the chief basis for decisions.
• It involves the supervised learning functions used for prediction of the target value. The methods that fall under this category are classification, time-series analysis and regression.
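To make this distinction concrete, the following sketch (using scikit-learn) fits a classification model when the target is discrete and a regression model when it is continuous. The helper name, the synthetic data and the threshold on unique target values are assumptions made for illustration only, not part of the original notes.

```python
# A minimal, illustrative sketch: pick a classification or a regression model
# depending on whether the target is discrete or continuous.
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

def fit_predictive_model(X, y):
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
    # Heuristic (an assumption for this sketch): few unique values -> discrete target.
    if len(np.unique(y)) <= 10:
        model = DecisionTreeClassifier()   # classification model for a discrete target
    else:
        model = LinearRegression()         # regression model for a continuous target
    model.fit(X_train, y_train)
    return model, model.score(X_test, y_test)  # accuracy for classifiers, R-squared for regressors

# Example with synthetic data: a discrete 0/1 target, so a classifier is chosen.
X = np.random.rand(100, 2)
y = (X[:, 0] + X[:, 1] > 1).astype(int)
model, score = fit_predictive_model(X, y)
```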
• Data modelling is a necessity for predictive analysis, which works by utilizing some variables to anticipate unknown future values of other variables. It provides organizations with actionable insights based on data and gives an estimate of the likelihood of a future outcome. To do this, a variety of techniques are used, such as machine learning, data mining, modelling and game theory.
• Predictive modelling can, for example, help to identify risks or opportunities in the future. Predictive analytics can be used in all departments, from predicting customer behaviour in sales and marketing, to forecasting demand for operations or determining risk profiles for finance.
• A very well-known application of predictive analytics is credit scoring, used by financial services to determine the likelihood of customers making future credit payments on time. Determining such a risk profile requires a vast amount of data, including public and social data. Historical and transactional data are used to identify patterns, and statistical models and algorithms are used to capture relationships in various datasets. Predictive analytics has taken off in the big data era, and there are many tools available for organisations to predict future outcomes.
• The target feature is known as a class, and the categories into which a class is divided are called levels. k-Nearest Neighbour, Naive Bayes and decision trees are popular classification models.
• Predictive models may also be used to predict numerical values of the target feature based on the predictor features. Popular regression models are Linear Regression and Logistic Regression.

3.1.2 Descriptive Models

• Descriptive models are used for tasks in which no target feature is available; in unsupervised learning we do not have any target value known in advance for the problem.
• Descriptive analytics is the conventional form of business intelligence and data analysis. It seeks to provide a depiction or "summary view" of facts and figures in an understandable format, either to inform or to prepare data for further analysis. Past data is summarized and presented in an easily digestible format for the benefit of a wide business audience, using a set of techniques for reviewing and examining the data set.
• Descriptive analytics enables organisations to understand what happened in the past; it helps to understand the relationship between products and customers. The objective of this analysis is to understand what approach to take in the future: if we learn from past behaviour, it helps us to influence future outcomes.
• Company reports are an example of descriptive analytics; they simply provide a historic review of company operations, stakeholders, customers and financials. Descriptive analytics also helps to describe and present data in a format which can be easily understood by a wide variety of business readers.
• The descriptive modelling task called pattern discovery is used to identify useful associations within data. Pattern discovery is often used for market basket analysis on retailers' transactional purchase data. Here, the goal is to identify items that are frequently purchased together, so that the learned information can be used to refine marketing tactics.
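As a toy illustration of this idea (not a full association-rule algorithm such as Apriori), pair co-occurrences can be counted directly from a list of transactions. The item names and the transactions below are invented for the example.

```python
# Toy pattern discovery: count how often pairs of items are bought together.
from collections import Counter
from itertools import combinations

transactions = [                       # hypothetical purchase data
    {"swimming trunks", "sunglasses", "sunscreen"},
    {"sunglasses", "hat"},
    {"swimming trunks", "sunglasses"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Most frequently co-purchased pairs, e.g. ('sunglasses', 'swimming trunks'): 2
print(pair_counts.most_common(3))
```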
• For instance, if a retailer learns that swimming trunks are commonly purchased at the same time as sunglasses, the retailer might reposition the items more closely in the store or run a promotion to "up-sell" customers on associated items.

3.2 Training a Model for Supervised Learning

3.2.1 Holdout Method

• The data is split into two different datasets, labelled as a training dataset and a testing dataset. The split can be, for example, 60/40, 70/30 or 80/20. This technique is called the hold-out validation technique.
• Suppose we have a database with house price as the dependent variable and two independent variables showing the square footage of the house and the number of rooms. Now, imagine this dataset has 30 rows. The whole idea is that you build a model that can predict house prices accurately.
• To "train" your model, or to see how well it performs, we randomly take a subset of 20 of those rows and fit the model.
• The second step is to predict the values of the 10 rows that we excluded and measure how good our predictions were.
• As a rule of thumb, experts suggest randomly sampling 80 % of the data into the training set and 20 % into the test set.
• Training set: used to train the classifier.
(Fig. 3.2.1: The total number of examples is split into a training set and a test set.)
• The holdout method has two basic drawbacks:
1. It requires an extra dataset.
2. It is a single train-and-test experiment; the holdout estimate of the error rate will be misleading if we happen to get an "unfortunate" split.

3.2.2 Cross-Validation

• Cross-validation is a technique for estimating performance by training several machine learning models on subsets of the available input data and evaluating them on the complementary subsets of the data. Use cross-validation to detect overfitting, i.e., failing to generalize a pattern.
• In general, machine learning involves deriving models from data, with the aim of achieving some kind of desired behaviour, e.g., prediction or classification.
• Fig. 3.2.2 shows cross-validation.
• The basic idea is to remove some of the data before training begins. When training is done, the data that was removed can be used to test the performance of the learned model on "new" data. This is the basic idea behind a whole class of model evaluation methods called cross-validation.
• Types of cross-validation methods are hold-out, K-fold and leave-one-out.
• The hold-out method is the simplest kind of cross-validation. The data set is separated into two sets, called the training set and the testing set.
• K-fold cross-validation is one way to improve over the holdout method. The data is divided into k subsets, and the holdout method is repeated k times.
• Each time, one of the k subsets is used as the test set and the other k - 1 subsets are put together to form a training set. Then the average error across all k trials is computed.
• Leave-one-out cross-validation is K-fold cross-validation taken to its logical extreme, with K equal to N, the number of data points in the set.
• That means that, N separate times, the function approximator is trained on all the data except for one point and a prediction is made for that point.
• Cross-validation ensures non-overlapping test sets.
K-fold cross-validation:
• In this technique, k - 1 folds are used for training and the remaining one is used for testing, as shown in Fig. 3.2.3.
• The advantage is that the entire data is used for training and testing. The error rate of the model is the average of the error rates of the individual iterations.
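A minimal sketch of k-fold cross-validation using scikit-learn; the estimator, the dataset and the choice of k = 5 are illustrative assumptions, not prescribed by the notes.

```python
# k-fold cross-validation: average the score over k train/test iterations.
from sklearn.datasets import load_iris
from sklearn.model_selection import KFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
model = DecisionTreeClassifier(random_state=0)

# 5 folds: each round 4 folds are used for training and 1 for testing.
kfold = KFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=kfold)

print(scores)          # one accuracy value per fold
print(scores.mean())   # overall estimate = average over the k folds
```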
• This technique can also be seen as a form of the repeated hold-out method. The error rate can be improved further by using a stratification technique.
(Fig. 3.2.3: K-fold cross-validation.)

3.2.3 Bootstrap

• Ensemble classifiers such as bagging, boosting and model averaging are known to have improved accuracy and robustness over a single model. Although unsupervised models, such as clustering, do not directly generate a label prediction for each individual, they provide useful constraints for the joint prediction of a set of related objects.
• For a given training set of size n, create m samples of size n by drawing n examples from the original data, with replacement. Each bootstrap sample will on average contain 63.2 % of the unique training examples; the rest are replicates. The m resulting models are then combined using a simple majority vote.
• In particular, on each round, the base learner is trained on what is often called a "bootstrap replicate" of the original training set. Suppose the training set consists of n examples.
• Then a bootstrap replicate is a new training set that also consists of n examples, and which is formed by repeatedly selecting uniformly at random and with replacement n examples from the original training set. This means that the same example may appear multiple times in the bootstrap replicate, or it may not appear at all.
• It also decreases error by decreasing the variance in the results due to unstable learners, i.e. algorithms (like decision trees) whose output can change dramatically when the training data is slightly changed.

3.2.4 Lazy vs. Eager Learner

• Eager learning: given a training set, an eager learner constructs a classification model before receiving new data to classify. Examples include decision tree induction, Bayesian classification and rule-based classification.
• Lazy learning: a lazy learner simply stores the training data and waits until it is given a new instance. Lazy learners take less time in training but more time in predicting. Examples include k-nearest-neighbour classifiers and case-based reasoning classifiers.
• Instance-based methods are also known as lazy learning because they do not generalize until needed. The eager learner must create a global approximation; the lazy learner can create many local approximations.

3.3 Model Representation and Interpretability

• In addition to using models for prediction, the ability to interpret what a model has learned is receiving an increasing amount of attention. Interpretability has to do with how accurately a machine learning model can associate a cause with an effect.
• If a model can take the inputs and routinely get the same outputs, the model is interpretable:
1. If you overeat at dinner time and you always have trouble sleeping, the situation is interpretable.
2. If all the 2019 polls showed an "ABC party" win and the "XYZ party" candidate took office, all those models showed low interpretability.
• Interpretability poses no issue in low-risk scenarios. If a model is recommending movies to watch, that is a low-risk task.
• The fitness of a target function approximated by a learning algorithm determines how correctly it is able to classify a set of data it has never seen.
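One simple and commonly used way to peek at what a model has learned (a sketch of one possible approach, not the notes' prescribed method) is to inspect the coefficients of a linear model or the feature importances of a tree; larger magnitudes suggest features the model relies on more.

```python
# A simple form of model interpretation: inspect learned coefficients / importances.
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

linear = LogisticRegression(max_iter=1000).fit(X, y)
tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# One coefficient per (class, feature) for the linear model,
# one importance per feature for the tree.
print("logistic regression coefficients:\n", linear.coef_)
print("decision tree feature importances:", tree.feature_importances_)
```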
3.3.1 Underfitting and Overfitting

• Training error can be reduced by making the hypothesis more sensitive to the training data, but this may lead to overfitting and poor generalization.
• Overfitting occurs when a statistical model describes random error or noise instead of the underlying relationship. Overfitting is when a classifier fits the training data too tightly. Such a classifier works well on the training data but not on independent test data. It is a general problem that plagues all machine learning methods.
• Underfitting: if we put too few variables in the model, leaving out variables that could help explain the response, we are underfitting. Consequences:
1. The fitted model is not good for prediction of new data - the prediction is biased.
2. The regression coefficients are biased.
3. The estimate of the error variance is too large.
• Because of overfitting, we see low error on training data and high error on test data. Overfitting occurs when a model begins to memorize the training data rather than learning to generalize from a trend. The more difficult a criterion is to predict, the more noise exists in past information that needs to be ignored. The problem is determining which part to ignore.
• Overfitting generally occurs when a model is excessively complex, such as having too many parameters relative to the number of observations. We can determine whether a predictive model is underfitting or overfitting the training data by looking at the prediction error on the training data and on the evaluation data.
• Fig. 3.3.1 shows underfitting and overfitting.
(Fig. 3.3.1: Underfitting, balanced fit and overfitting.)
• Reasons for overfitting:
1. Noisy data.
2. The training set is too small.
3. A large number of features.
• In machine learning the more complex model is said to show signs of overfitting, while the simpler model shows underfitting. Several heuristics have been developed in order to avoid overfitting; for example, when designing neural networks one may:
1. Limit the number of hidden nodes,
2. Stop training early to avoid a perfect explanation of the training set, and
3. Apply weight decay to limit the size of the weights, and thus of the function class implemented by the network.
• In experimental practice we observe an important phenomenon called the bias-variance dilemma.
• In supervised learning, the class value assigned by the learning model built on the training data may differ from the actual class value. This error in learning can be of two types: errors due to "bias" and errors due to "variance".
• Fig. 3.3.2 shows the bias-variance trade-off.
(Fig. 3.3.2: Bias-variance trade-off - low/high bias against low/high variance.)
• Given two classes of hypotheses (e.g. linear models and k-NNs) to fit to some training data set, we observe that the more flexible hypothesis class has a low bias term but a higher variance term. If we have a parametric family of hypotheses, we can increase the flexibility of the hypothesis, but we still observe the increase in variance.
• The bias-variance dilemma is the problem of simultaneously minimizing two sources of error that prevent supervised learning algorithms from generalizing beyond their training set:
1. The bias is error from erroneous assumptions in the learning algorithm.
High bias can cause an algorithm to miss the relevant relations between features and target outputs.
2. The variance is error from sensitivity to small fluctuations in the training set. High variance can cause overfitting: modelling the random noise in the training data rather than the intended outputs.
• In order to reduce the model error, the designer can aim at reducing either the bias or the variance, as the noise component is irreducible.
• As the model increases in complexity, its bias is likely to diminish. However, as the number of training examples is kept fixed, the parametric identification of the model may strongly vary from one training set to another. This will increase the variance term.
• At one stage, the decrease in bias will be inferior to the increase in variance, warning that the model should not be too complex. Conversely, to decrease the variance term, the designer has to simplify the model so that it is less sensitive to a specific training set. This simplification will lead to a higher bias.

3.4 Evaluating Performance of a Model

3.4.1 Supervised Learning: Classification

• Classification is a major task of supervised learning. The responsibility of the classification model is to assign a class label to the target feature based on the values of the predictor features.
• When performing classification predictions, there are four types of outcomes that can occur. The evaluation measures in classification problems are defined from a matrix containing the numbers of examples correctly and incorrectly classified for each class, named the confusion matrix. The confusion matrix is also called a contingency table.
1) True positives are when you predict an observation belongs to a class and it actually does belong to that class.
2) True negatives are when you predict an observation does not belong to a class and it actually does not belong to that class.
3) False positives occur when you predict an observation belongs to a class when in reality it does not.
4) False negatives occur when you predict an observation does not belong to a class when in fact it does.
• The confusion matrix goes deeper than classification accuracy by showing the correct and incorrect (i.e. true or false) predictions for each class. In the case of a binary classification task, the confusion matrix is a 2x2 matrix. If there are three different classes, it is a 3x3 matrix, and so on.
• For any classification model, model accuracy is given by the total number of correct classifications (true positives or true negatives) divided by the total number of classifications done.
Accuracy rate = (|True positives| + |True negatives|) / (|True positives| + |True negatives| + |False positives| + |False negatives|)
• The complement of the accuracy rate is the error rate, which evaluates a classifier by its percentage of incorrect predictions.
Error rate = (|False negatives| + |False positives|) / (|False negatives| + |False positives| + |True negatives| + |True positives|)
Error rate = 1 - Accuracy rate
• The recall is the proportion of actual positive cases that the model predicts as positive.
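A minimal sketch of computing a confusion matrix and the associated accuracy, error rate and recall with scikit-learn; the label vectors are made-up examples.

```python
# Confusion matrix and accuracy for a toy binary classification result.
from sklearn.metrics import confusion_matrix, accuracy_score, recall_score

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # actual classes (hypothetical)
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]   # predicted classes (hypothetical)

# Rows are actual classes, columns are predicted classes.
cm = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm.ravel()

print(cm)
print("TP:", tp, "TN:", tn, "FP:", fp, "FN:", fn)
print("accuracy:", accuracy_score(y_true, y_pred))         # (TP + TN) / total
print("error rate:", 1 - accuracy_score(y_true, y_pred))
print("recall:", recall_score(y_true, y_pred))              # TP / (TP + FN)
```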
• The specificity is a statistical measure of how well a binary classification test correctly identifies negative cases.
Recall = |True positives| / (|True positives| + |False negatives|)
Specificity = |True negatives| / (|False positives| + |True negatives|)
• True Positive Rate (TPR) is also called sensitivity, hit rate and recall.
Sensitivity = Number of true positives / (Number of true positives + Number of false negatives)
• Precision measures how good our model is when the prediction is positive.
Precision = TP / (TP + FP)
The focus of precision is positive predictions. It indicates how many positive predictions are true.
• The F1 score is the weighted average of precision and recall.
F1 score = 2 × (Precision × Recall) / (Precision + Recall)
• The F1 score is a more useful measure than accuracy for problems with an uneven class distribution, because it takes into account both false positives and false negatives.
• The Kappa value of a model indicates the adjusted model accuracy.
Kappa = (Total accuracy - Random accuracy) / (1 - Random accuracy)
• Total accuracy is simply the sum of true positives and true negatives, divided by the total number of items, that is:
Total accuracy = (TP + TN) / (TP + TN + FP + FN)
• Random accuracy is defined as the sum of the products of reference likelihood and result likelihood for each class. That is,
Random accuracy = (Actual False × Predicted False + Actual True × Predicted True) / (Total × Total)
• In terms of true/false positives and negatives, random accuracy can be written as:
Random accuracy = ((TN + FP) × (TN + FN) + (FN + TP) × (FP + TP)) / (Total × Total)
• Example: Consider a three-class confusion matrix whose diagonal entries (the correctly classified examples) are 15, 15 and 45. Calculate the precision and recall for each class, and also the weighted average precision and recall. Solution: the classifier accuracy is the sum of the diagonal entries, 15 + 15 + 45, divided by the total number of classifications; the per-class precision and recall then follow from the corresponding column and row totals of the matrix.
• Example: Calculate the accuracy, precision and recall for a given two-class confusion matrix. In the worked solution, the accuracy is (20 + 25) divided by the total number of classifications, which equals 0.75 (75 %), and the precision works out to approximately 0.59.
• The true negative rate is also called specificity.
Specificity = |True negatives| / (|False positives| + |True negatives|)
In the worked example the true negative rate comes to 0.8.
• ROC Curve: Receiver Operating Characteristic (ROC) graphs have long been used in signal detection theory to depict the trade-off between hit rates and false alarm rates over a noisy channel. Recent years have seen an increase in the use of ROC graphs in the machine learning community.
• The ROC curve summarizes the performance of the model at different threshold values by combining the confusion matrices at all threshold values. ROC curves are typically used in binary classification to study the output of a classifier.
• An ROC plot plots the true positive rate on the Y-axis against the false positive rate on the X-axis; a single contingency table corresponds to a single point in an ROC plot.
• The performance of a ranker can be assessed by drawing a piecewise linear curve in an ROC plot, known as an ROC curve. The curve starts at (0, 0), finishes at (1, 1), and is monotonically non-decreasing in both axes.
• ROC curves are a useful technique for organizing classifiers and visualizing their performance.
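A minimal sketch of computing the points of an ROC curve and the area under it with scikit-learn; the true labels and classifier scores below are invented for illustration.

```python
# ROC curve points and AUC for a toy set of classifier scores.
from sklearn.metrics import roc_curve, roc_auc_score

y_true  = [0, 0, 1, 1, 0, 1, 1, 0]                        # actual classes (hypothetical)
y_score = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9, 0.6, 0.55]      # predicted probabilities (hypothetical)

fpr, tpr, thresholds = roc_curve(y_true, y_score)   # one (FPR, TPR) point per threshold
auc = roc_auc_score(y_true, y_score)

print("FPR:", fpr)
print("TPR:", tpr)
print("AUC:", auc)
```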
• ROC curves are especially useful for domains with a skewed class distribution and unequal classification error costs.
• ROC analysis allows one to create an ROC curve and a complete sensitivity/specificity report. The ROC curve is a fundamental tool for diagnostic test evaluation.
• In an ROC curve the true positive rate (sensitivity) is plotted as a function of the false positive rate for different cut-off points of a parameter.
• Each point on the ROC curve represents a sensitivity/specificity pair corresponding to a particular decision threshold. The area under the ROC curve is a measure of how well a parameter can distinguish between two diagnostic groups.
• Each point on an ROC curve connecting two segments corresponds to the true and false positive rates achieved on the same test set by the classifier obtained from the ranker by splitting the ranking between those two segments.
• An ROC curve is convex if the slopes are monotonically non-increasing when moving along the curve from (0, 0) to (1, 1). A concavity in an ROC curve, i.e. two or more adjacent segments with increasing slopes, indicates a locally worse-than-random ranking.
• True Positive Rate (TPR) is a synonym for recall and is therefore defined as follows:
TPR = TP / (TP + FN)
• False Positive Rate (FPR) is defined as follows:
FPR = FP / (FP + TN)

3.4.2 Supervised Learning: Regression

• A regression model which ensures that the difference between predicted and actual values is low can be considered a good model.
• For example, a regression model could be used to predict the value of a data warehouse based on web-marketing, number of data entries, size and other factors. A regression task begins with a data set in which the target values are known. Regression analysis is a good choice when all of the predictor variables are continuous valued as well.
• Fig. 3.4.1 shows a linear regression model.
(Fig. 3.4.1: Linear regression model - value of the apartment unit plotted against area in square feet.)
• If "area" is the predictor variable (say x) and "value" is the target variable (say y), the linear regression model can be represented in the form: y = c + βx
• In this equation:
1. y is the output variable. It is also called the target variable in machine learning, or the dependent variable in statistical modelling. It represents the continuous value that we are trying to predict.
2. x is the input variable. In machine learning, x is referred to as the feature, while in statistics it is called the independent variable. It represents the information given to us at any given time.
3. β is the regression coefficient or scale factor.
• Linear regression assumes that there exists a linear relationship between the dependent variable and the independent variable(s). The value of the dependent variable of a linear regression model is a continuous value, i.e. a real number.
• Linear regression is a statistical tool that determines how well a straight line fits a set of paired data. The straight line that best fits that data is called the least squares regression line.
• The distance between the actual value and the predicted value is called the residual. If the observed points are far from the regression line, the residuals will be high, and so the cost function will be high.
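A minimal sketch of fitting such a straight line with scikit-learn and inspecting the residuals and R-squared; the area and value figures are invented for illustration.

```python
# Fit a least-squares line, look at the residuals and the R-squared score.
import numpy as np
from sklearn.linear_model import LinearRegression

area  = np.array([[500], [750], [1000], [1250], [1500]])   # x: area in square feet (hypothetical)
value = np.array([60, 85, 118, 140, 170])                   # y: value of the unit (hypothetical)

model = LinearRegression().fit(area, value)
predicted = model.predict(area)
residuals = value - predicted          # distance between actual and predicted values

print("intercept c:", model.intercept_)
print("coefficient (slope):", model.coef_[0])
print("residuals:", residuals)
print("R-squared:", model.score(area, value))
```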
• If the scatter points are close to the regression line, the residuals will be small, and hence so is the cost function.
• R-squared is a good measure to evaluate model fitness. It is also known as the coefficient of determination. R-squared is the fraction by which the variance of the errors is less than the variance of the dependent variable. It is called R-squared because in a simple regression model it is just the square of the correlation between the dependent and independent variables, which is commonly denoted by "r".
• In a multiple regression model, R-squared is determined by the pairwise correlations among all the variables, including correlations of the independent variables with each other as well as with the dependent variable.

3.4.3 Unsupervised Learning: Clustering

• Clustering groups data points based on their similarities. Each group is called a cluster and contains data points with high similarity to each other and low similarity to data points in other clusters.
• The objective of clustering is to segregate groups with similar traits and bundle them together into different clusters.
• Silhouette analysis can be used to study the separation distance between the resulting clusters. The silhouette plot displays a measure of how close each point in one cluster is to points in the neighbouring clusters. This measure has a range of [-1, 1].
• Silhouette coefficients near +1 indicate that the sample is far away from the neighbouring clusters. A value of 0 indicates that the sample is on or very close to the decision boundary between two neighbouring clusters, and negative values indicate that those samples might have been assigned to the wrong cluster.
• Many clustering algorithms use distance measures to determine the similarity or dissimilarity between any pair of data points. A valid distance measure should be symmetric and should obtain its minimum value (usually zero) for identical data points. By computing the distance or (dis)similarity between each pair of observations, a dissimilarity or distance matrix is obtained.

3.5 Improving Performance of a Model

• When we build a random forest classifier we can tune the number of trees to build, the number of variables to choose for splitting, and so on.
• Similarly, when we build a deep learning algorithm we can specify how many layers we need, how many neurons we want in each layer, and which activation function we want. Tuning parameters enhances model performance if we use the right type of parameters in an algorithm.
• One effective way to improve model performance is by tuning model parameters. Model parameter tuning is the process of adjusting the model fitting options (a short sketch of such tuning is given after the review questions below).

3.6 Fill in the Blanks

Q.1 A structured representation of raw input data to a meaningful pattern is called ____.
Q.2 The process of assigning a model, and fitting a specific model to a data set, is called model ____.
Q.3 In the bias-variance trade-off, when the value of "k" is decreased, the model becomes simpler to fit and ____ increases.
Q.4 In the bias-variance trade-off, when the value of "k" is increased, the variance ____.
Q.5 Both underfitting and overfitting result in poor classification quality, which is reflected by low classification ____.
Q.6 Overfitting refers to a situation where the model has been designed in such a way that it emulates the ____ data too closely.
Q.7 A typical case of underfitting may occur when trying to represent ____ data with a linear model.
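As mentioned in Section 3.5, one common way to tune model parameters is a grid search over candidate values with cross-validation. The following is a minimal sketch using scikit-learn; the estimator, the dataset and the parameter grid are illustrative choices, not prescriptions from the notes.

```python
# Hyperparameter tuning: try each parameter combination with cross-validation
# and keep the best-scoring model.
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

param_grid = {
    "n_estimators": [50, 100, 200],      # number of trees in the forest
    "max_features": [1, 2, "sqrt"],      # variables considered at each split
}

search = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)

print(search.best_params_)   # best parameter combination found
print(search.best_score_)    # its mean cross-validated accuracy
```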