
UNIT V

Generalization error is the error a model makes on new, unseen data. It is often measured as the gap between training and testing error, and it becomes apparent when a model performs well on the training data but poorly on data it has not seen before.

Types of Generalization Errors:

1. Overfitting: Models are too complex and fit the noise in the training data.
2. Underfitting: Models are too simple and fail to capture important patterns.
Causes of Generalization Errors:
1. Insufficient training data
2. Model complexity
3. Noise in training data
4. Over-optimization
5. Poor feature selection

Overfitting occurs when a model is too complex and learns the noise in the training data, resulting in
poor performance on unseen data.
Overfitting Leads to:
1. High training accuracy
2. Low test accuracy
3. Model complexity (e.g., many features, deep neural networks)
Overfitting Prevention Techniques:
1. Regularization (L1, L2, dropout); a brief sketch follows this list
2. Cross-validation
3. Early stopping
4. Feature selection
5. Data augmentation
6. Ensemble methods (bagging, boosting)
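
As a quick illustration of the first technique, the following minimal sketch fits an L2-regularized logistic regression in scikit-learn. The synthetic dataset and the regularization strength C=0.1 are illustrative assumptions, not values taken from these notes.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Illustrative synthetic dataset (an assumption for this sketch)
X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# L2 regularization penalizes large weights; smaller C means a stronger penalty
clf = LogisticRegression(penalty='l2', C=0.1, max_iter=1000)
clf.fit(X_train, y_train)
print('Training accuracy:', clf.score(X_train, y_train))
print('Test accuracy:', clf.score(X_test, y_test))

Comparing the two accuracies gives a first indication of whether the model is overfitting.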

Examples:

1. Image Classification: A neural network with 1000 layers and 1000 features is trained on a dataset of
100 images. The model achieves 99% accuracy on the training data but only 50% accuracy on the test
data.

2. Stock Market Prediction: A linear regression model with 100 features is trained on a dataset of 1000
stock prices. The model achieves a high R-squared value on the training data but performs poorly on
unseen data.

Cross-validation is a statistical technique used to evaluate the performance of a machine learning model
by training and testing it on multiple subsets of the data. An alternative to predefined separate training
and testing data to validate generalizability is the cross-validation technique, sometimes called rotation
estimation.

There are different variations of cross-validation used in practice:


1. Holdout method

The simplest kind of cross-validation is the holdout method. The sample is separated into two disjoint sets, called the training set and the testing set. A model is built using the training set only, and this model is then asked to predict the output values for the test set. The errors in its predictions are accumulated into a single figure, often called the mean absolute test set error, which serves as an evaluation measure of the model.
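
A minimal sketch of the holdout method with scikit-learn follows; the diabetes dataset, the linear model, and the 70/30 split ratio are illustrative assumptions.

from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

X, y = load_diabetes(return_X_y=True)

# Separate the sample into two disjoint sets: training and testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = LinearRegression().fit(X_train, y_train)           # build the model on the training set only
mae = mean_absolute_error(y_test, model.predict(X_test))   # mean absolute test set error
print('Mean absolute test set error:', mae)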

2. K-Fold Cross-Validation

In this case, the dataset is divided into k subsets and the evaluation is done k times. Each time, one of the k subsets is kept aside for testing and the remaining k − 1 subsets are put together to form the training set. It can therefore be seen as the holdout method repeated k times. The average error across all k trials is computed and used as the overall performance estimate of the model.

The advantage of k-fold cross-validation is that the model is less biased from how the data gets divided
between training and test sets. Every data point gets a chance to be in a test set exactly once and in the
training set k − 1 times. Therefore, the variance of the resulting estimate is reduced as k is increased.

The disadvantage of this over the holdout method is that for k-fold cross-validation the training
algorithm has to be run k times compared to just once in the holdout method, and it takes k times as
much computation to make an evaluation.
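
A minimal k-fold sketch using scikit-learn's KFold and cross_val_score; the iris dataset, the logistic regression model, and k = 5 are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import KFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Each of the k = 5 folds is used once as the test set; the other k - 1 folds form the training set
kfold = KFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=kfold)
print('Accuracy per fold:', scores)
print('Average accuracy across all k trials:', scores.mean())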

3. Leave-One-Out Cross-Validation (LOOCV)

Leave-one-out cross-validation is the logical extreme of k-fold cross-validation, where k is equal to the number of data points (N) in the set. The function approximator is trained on all the data except for one point, which is kept aside for testing, and the process is repeated so that each data point is used exactly once as the test case, for a total of N runs.

The average error over the N evaluations is computed and used as the overall error of the model. The evaluation provided by the leave-one-out cross-validation method (LOO-XVE) is good: it has low bias, since nearly all the data is used for training in each iteration, and it is well suited to small datasets where the use of the available data for training must be maximized.

The disadvantage is that it is very expensive to compute for large datasets, since the model must be trained N times. It also gives a high-variance evaluation, since each fold is tested on only one data point, leading to potentially high variability in the results.
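
A minimal LOOCV sketch with scikit-learn's LeaveOneOut; the iris dataset and the logistic regression model are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

X, y = load_iris(return_X_y=True)

# One data point is held out per iteration, so the model is trained N times
loo = LeaveOneOut()
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=loo)
print('Number of fits (N):', len(scores))
print('Average accuracy over the N evaluations:', scores.mean())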

4. Leave-P-Out Cross-Validation (LPOCV)

The model is trained on the entire dataset except for p samples and tested on those p samples, and this is repeated for all possible combinations. The number of iterations is the number of ways p points can be chosen from a dataset of size n, i.e. C(n, p), which becomes very large even for moderately sized datasets and values of p.

The advantage is that it provides a more comprehensive evaluation of the model's performance by testing on all possible subsets of p points. The disadvantage is that it is computationally more expensive than LOOCV, especially for large p or large n, because the number of combinations C(n, p) grows combinatorially.
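
A minimal LPOCV sketch with scikit-learn's LeavePOut; the tiny synthetic dataset and p = 2 are illustrative assumptions chosen to keep the number of C(n, p) splits manageable.

from math import comb

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import LeavePOut, cross_val_score

# A tiny illustrative dataset: with n samples and p held out there are C(n, p) train/test splits
X, y = make_classification(n_samples=20, n_features=5, random_state=42)

p = 2
lpo = LeavePOut(p=p)
print('Number of train/test splits:', comb(len(X), p))   # C(20, 2) = 190

scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=lpo)
print('Average accuracy over all splits:', scores.mean())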
5. Stratified K-Fold Cross-Validation

In stratified K-fold cross-validation, the dataset is divided into K folds, and the folds are made
by preserving the percentage of samples for each class. For each of the K rounds, a different fold is used
as a validation set (or test set), while the remaining K-1 folds form the training set. The model is trained
on the training set and validated on the validation set. This process is repeated K times, with each of the
K folds used exactly once as the validation data.

After training, the model's performance is evaluated in each iteration, and the results are averaged out to
get a final score. This score is more robust than a simple train-test split, as it ensures that every observation
from the original dataset has the chance to appear in both the training and validation sets, mitigating any
potential bias in the model evaluation due to particular data splits.

Advantages: Stratified K-fold cross-validation is advantageous because it reduces the variance associated
with a single trial of train-test split. By preserving the distribution of classes in each fold, it ensures that
each training and validation set is representative of the overall dataset, which is especially important in
imbalanced datasets where some classes dominate over others.
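
A minimal stratified K-fold sketch with scikit-learn's StratifiedKFold; the imbalanced synthetic dataset (roughly a 90/10 class split), the logistic regression model, and the F1 scoring choice are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Illustrative imbalanced dataset: about 90% of samples belong to one class
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=42)

# Each fold preserves the roughly 90/10 class proportions of the full dataset
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=skf, scoring='f1')
print('F1 per fold:', scores)
print('Average F1:', scores.mean())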

Underfitting

Definition: Underfitting occurs when a model is too simple and fails to capture the underlying patterns in
the training data.

Symptoms:

1. Low training accuracy

2. Low test accuracy

3. Simple model (e.g., few features, shallow neural networks)

Causes:

1. Too few features

2. Insufficient model complexity

3. Inadequate training data

Consequences:

1. Poor predictive performance

2. Failure to capture important patterns

3. Inadequate insights from the model


Examples:

1. Customer Churn Prediction: A logistic regression model with only 2 features is trained on a dataset of
1000 customer records. The model achieves a low accuracy on both training and test data.

2. Speech Recognition: A shallow neural network with only 1 hidden layer is trained on a dataset of 1000
speech recordings. The model fails to recognize speech patterns accurately.

Prevention and Remedies:

Overfitting:

1. Regularization (L1, L2, dropout)

2. Cross-validation

3. Early stopping

4. Feature selection

5. Data augmentation

6. Ensemble methods (bagging, boosting)

Underfitting:

1. Increase model complexity (see the sketch after this list)

2. Add relevant features

3. Increase training data

4. Hyperparameter tuning

5. Feature engineering
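
As a quick illustration of the first two remedies, the minimal sketch below adds polynomial features so that a linear model can capture a non-linear pattern it would otherwise underfit. The synthetic quadratic data and the degree-2 expansion are illustrative assumptions.

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Illustrative non-linear (quadratic) data that a plain line underfits
rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(200, 1))
y = X[:, 0] ** 2 + rng.normal(scale=0.5, size=200)

underfit = LinearRegression().fit(X, y)   # too simple for the quadratic pattern
richer = make_pipeline(PolynomialFeatures(degree=2), LinearRegression()).fit(X, y)

print('Plain linear R^2:', underfit.score(X, y))
print('With polynomial features R^2:', richer.score(X, y))

The jump in R^2 after adding the squared feature shows how extra complexity or richer features can remove underfitting.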
Hyperparameter Tuning using Grid Search
Grid Search:

Grid Search is a method for finding the optimal combination of hyperparameters for a machine learning model. Implementing a grid search is a reliable way to identify the best hyperparameters for an algorithm. In the case of complex settings with multiple parameters, this means running hundreds, if not thousands, of slightly differently tuned models.

Grid searching is a systematic search method that combines all the possible combinations of the
hyperparameters into individual sets. It’s a time-consuming technique. However, grid searching provides
one of the best ways to optimize a machine learning application that could have many working
combinations, but just a single best one.

Key characteristics:

1. Exhaustive search

2. Pre-defined hyperparameter ranges

3. Evaluates all possible combinations

4. Computationally expensive

Hyperparameter Tuning:

Hyperparameter Tuning is the process of adjusting hyperparameters to optimize a model's performance.

Key characteristics:

1. Automated or manual adjustment

2. Goal-oriented (e.g., maximize accuracy)

3. May use various optimization techniques (e.g., Bayesian, gradient-based)

4. Can be computationally efficient or expensive

Grid Search is a systematic approach for hyperparameter optimization in machine learning. It is essential
for finding the best combination of hyperparameters that yields the most effective model performance.

Importance of Grid Search:

1. Systematic Search for Optimal Parameters: Grid Search allows for an exhaustive search over a
specified set of hyperparameters. By evaluating every possible combination within the defined
parameter grid, it ensures that the best parameter set is identified to maximize model
performance.

2. Enhancing Model Accuracy: The performance of a machine learning model heavily depends on
hyperparameters such as learning rate, regularization strength, number of layers, and kernel
type (in SVMs). Proper tuning can significantly improve the model’s accuracy, precision, recall,
and overall robustness.

3. Automated and Reproducible: Grid Search automates the process of hyperparameter selection,
making it more consistent and reproducible compared to manual tuning. This leads to a more
reliable model-building process, especially when scaling up to more complex models.

4. Cross-Validation for Reliable Results: Grid Search often incorporates cross-validation (e.g., k-fold
cross-validation), which ensures that the hyperparameter evaluation is robust against overfitting.
Each combination of hyperparameters is validated across multiple data splits, providing a more
accurate estimate of performance.

Steps to perform Grid Search:

1. Define the Hyperparameter Grid: Specify the hyperparameters and their potential values to
explore. For example:

param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [1, 0.1, 0.01]
}

 'C': This is a hyperparameter for Support Vector Machines (SVM). It controls the regularization
strength. Smaller values of C create a wider margin but may allow more misclassifications,
making the model more generalized. Larger values of C create a narrower margin, making the
model try to classify all training examples correctly, which can lead to overfitting.
 'kernel': This parameter specifies the type of kernel function to be used in the algorithm. The
kernel function determines how the input space is transformed.

o 'linear' uses a linear kernel, which is suitable for linearly separable data.

o 'rbf' (Radial Basis Function) is a non-linear kernel that is powerful for capturing complex
relationships.

 'gamma': This is another hyperparameter for SVM, specifically for non-linear kernels like 'rbf'. It
defines the influence of a single training example:

o Higher values of gamma mean the model will consider only points close to the decision boundary, making it more complex and potentially overfitting.

o Lower values mean that the model will consider points farther away, resulting in a simpler and smoother decision boundary.

2. Evaluate Each Combination: Train the model using each combination of hyperparameters and
evaluate its performance using a scoring metric (e.g., accuracy, F1-score).
3. Select the Best Combination: Identify the combination that results in the highest performance metric and use it to train the final model. A complete sketch putting these three steps together is shown below.
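
A minimal end-to-end sketch of these steps with scikit-learn's GridSearchCV and an SVC, using the param_grid from step 1; the iris dataset, the accuracy metric, and the 5-fold setting are illustrative assumptions.

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Step 1: the hyperparameter grid (3 * 2 * 3 = 18 combinations)
param_grid = {
    'C': [0.1, 1, 10],
    'kernel': ['linear', 'rbf'],
    'gamma': [1, 0.1, 0.01]
}

# Step 2: train and score every combination with 5-fold cross-validation
grid_search = GridSearchCV(SVC(), param_grid, scoring='accuracy', cv=5)
grid_search.fit(X, y)

# Step 3: the best combination and its cross-validated score
print(grid_search.best_params_)
print(grid_search.best_score_)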

Examples:

1. Support Vector Machine (SVM) Optimization: In a classification problem, using an SVM model
with C (regularization parameter) and gamma (kernel coefficient) as hyperparameters, a Grid
Search helps find the best pair for the problem. For instance, if a Grid Search is conducted over C
= [0.1, 1, 10] and gamma = [0.1, 0.01, 0.001], the method systematically tests each combination
(e.g., C=0.1, gamma=0.1, C=1, gamma=0.01, etc.) and selects the combination that achieves the
highest cross-validated performance.

2. Random Forest Parameter Tuning: In Random Forest models, parameters like the number of
trees (n_estimators), maximum depth of trees (max_depth), and the number of features to
consider for the best split (max_features) are often optimized using Grid Search.

For example:

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30],
    'max_features': ['auto', 'sqrt']
}

This grid search approach evaluates the different combinations and finds the optimal settings that improve model performance on validation data; a sketch of the full search appears after this list of examples.

3. Neural Network Hyperparameter Tuning: For more complex models like neural networks, Grid
Search can optimize hyperparameters such as learning rate, batch size, number of hidden layers,
and number of units per layer. By testing combinations like learning rate=[0.01, 0.001], batch
size=[32, 64, 128], and hidden units=[50, 100], practitioners can observe which settings yield the
most effective training and validation results.
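
A minimal sketch of the Random Forest search described in example 2, using scikit-learn's GridSearchCV; the synthetic dataset and the 5-fold setting are illustrative assumptions, and 'log2' is used in place of 'auto' because 'auto' is deprecated or removed in recent scikit-learn releases.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=500, n_features=20, random_state=42)

param_grid = {
    'n_estimators': [100, 200, 300],
    'max_depth': [10, 20, 30],
    'max_features': ['sqrt', 'log2']
}

# 3 * 3 * 2 = 18 combinations, each evaluated with 5-fold cross-validation
grid_search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5, n_jobs=-1)
grid_search.fit(X, y)
print(grid_search.best_params_)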

Grid Search used in a Classification example:

from sklearn.neighbors import KNeighborsClassifier


classifier = KNeighborsClassifier(n_neighbors=5, weights='uniform', metric='minkowski', p=2)

The K-neighbors classifier has quite a few hyperparameters that can be set for optimal performance:

 The number of neighbor points to consider in the estimate


 How to weight each of them
 What metric to use for finding the neighbors

Using a range of possible values for all the parameters yields exactly 40 combinations in this case:

import numpy as np

grid = {'n_neighbors': range(1, 11),
        'weights': ['uniform', 'distance'], 'p': [1, 2]}
print('Number of tested models: %i'
      % np.prod([len(grid[element]) for element in grid]))
score_metric = 'accuracy'

The code multiplies the number of values tested for each parameter and prints the result:

Number of tested models: 40

To set the instructions for the search, you build a Python dictionary whose keys are the names of the parameters and whose values are lists of the values to test. For instance, the example above records a range of 1 to 10 for the hyperparameter n_neighbors using the range(1, 11) iterator, which produces the sequence of numbers during the grid search.

After being instantiated with the learning algorithm, the search dictionary, the scoring metric, and the number of cross-validation folds, the GridSearchCV class operates through its fit method. Optionally, after the grid search has ended, it refits the model with the best found parameter combination (refit=True), so predictions can be made immediately through the GridSearchCV object itself. When the search is completed, it can be inspected using the best_params_ and best_score_ attributes.
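
A minimal sketch that completes the classification example above with GridSearchCV; the iris dataset and the choice of 10 cross-validation folds are illustrative assumptions.

import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

classifier = KNeighborsClassifier(n_neighbors=5, weights='uniform', metric='minkowski', p=2)
grid = {'n_neighbors': range(1, 11), 'weights': ['uniform', 'distance'], 'p': [1, 2]}
print('Number of tested models: %i' % np.prod([len(grid[element]) for element in grid]))

score_metric = 'accuracy'
search = GridSearchCV(estimator=classifier, param_grid=grid,
                      scoring=score_metric, cv=10, refit=True)
search.fit(X, y)

print(search.best_params_)   # best found parameter combination
print(search.best_score_)    # its cross-validated accuracy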

Grid Search used in a Regression example:

from sklearn.model_selection import train_test_split

X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.2,random_state=42)

from xgboost import XGBRegressor

model=XGBRegressor(random_state=42)

# Dictionary of hyperparameter values to search (the values shown here are illustrative)
search_grid = {'n_estimators': [100, 200], 'max_depth': [3, 5], 'learning_rate': [0.1, 0.01]}

from sklearn.model_selection import GridSearchCV
import pandas as pd

# GS scores each combination with 5-fold cross-validation; the 'r2' scorer produces the rank_test_r2 column used below
GS = GridSearchCV(estimator=model, param_grid=search_grid, scoring=['r2'], refit='r2', cv=5)
GS.fit(X_train, Y_train)

print(GS.best_estimator_)

df = pd.DataFrame(GS.cv_results_)
df = df.sort_values("rank_test_r2")
df.to_csv("GS_best_results.csv")

Practical Considerations:

Computational Cost: Grid Search can be computationally expensive, especially when dealing with large
datasets and complex models. To manage this, techniques like Randomized Search (which samples a
fixed number of parameter combinations) or Bayesian optimization can be used as more efficient
alternatives.

Parallel Computing: Many implementations of Grid Search, such as GridSearchCV in scikit-learn, support
parallel processing, which can help mitigate the time cost by utilizing multiple CPU cores.
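
A minimal sketch of the randomized alternative mentioned above, using scikit-learn's RandomizedSearchCV with parallel workers; the iris dataset, the sampling distributions, and n_iter=20 are illustrative assumptions.

from scipy.stats import loguniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# Sample a fixed number of parameter combinations instead of exhausting the full grid
param_distributions = {'C': loguniform(1e-2, 1e2), 'gamma': loguniform(1e-3, 1e1)}
random_search = RandomizedSearchCV(SVC(kernel='rbf'), param_distributions,
                                   n_iter=20, cv=5, random_state=42, n_jobs=-1)  # n_jobs=-1 uses all CPU cores
random_search.fit(X, y)
print(random_search.best_params_)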
