Lecture 1
Resampling Methods
Manoj Kumar
Machine Learning
YouTube
October 30, 2024
Goal: Learn y = w^T f(x) + b, where f(·) is a polynomial basis function.
x y
0.0000 1.4544
0.1579 2.1039
0.3158 1.6518
0.4737 1.5701
0.6316 2.1284
Goal: Learn y = w^T f(x) + b, where f(·) is a polynomial basis function.
y      x      x^2
1.4544 0.0000 0.0000
2.1039 0.1579 0.0249
1.6518 0.3158 0.0998
1.5701 0.4737 0.2244
2.1284 0.6316 0.3989
Goal: Learn y = w^T f(x) + b, where f(·) is a polynomial basis function.
y      x      x^2      x^3
1.4544 0.0000 0.0000 0.0000
2.1039 0.1579 0.0249 0.0039
1.6518 0.3158 0.0998 0.0315
1.5701 0.4737 0.2244 0.1063
2.1284 0.6316 0.3989 0.2519
Does a higher polynomial degree mean a better model?
Goal: Learn y = w^T f(x) + b, where f(·) is a polynomial basis function.
y      x      x^2      ···      x^9
1.4544 0.0000 0.0000 ··· 0.0000
2.1039 0.1579 0.0249 ··· 0.0000
1.6518 0.3158 0.0998 ··· 0.0003
1.5701 0.4737 0.2244 ··· 0.0013
2.1284 0.6316 0.3989 ··· 0.0101
Goal: Learn y = w^T f(x) + b, where f(·) is a polynomial basis function.
y      x      x^2      ···      x^19
1.4544 0.0000 0.0000 ··· 0.0000
2.1039 0.1579 0.0249 ··· 0.0000
1.6518 0.3158 0.0998 ··· 0.0000
1.5701 0.4737 0.2244 ··· 0.0001
2.1284 0.6316 0.3989 ··· 0.0004
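A minimal sketch (not from the lecture) of fitting the five (x, y) pairs above with polynomial bases of increasing degree. scikit-learn's PolynomialFeatures and LinearRegression are assumed tools; the training error drops to essentially zero once the degree is large enough, which is exactly what the question above is getting at.

```python
# Fit y = w^T f(x) + b for polynomial bases of increasing degree,
# using the five (x, y) pairs from the table above.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

x = np.array([0.0000, 0.1579, 0.3158, 0.4737, 0.6316]).reshape(-1, 1)
y = np.array([1.4544, 2.1039, 1.6518, 1.5701, 2.1284])

for degree in (1, 2, 3, 9, 19):
    # f(x): expand x into [x, x^2, ..., x^degree]; the intercept b is
    # learned by the linear model itself.
    features = PolynomialFeatures(degree=degree, include_bias=False)
    X = features.fit_transform(x)
    model = LinearRegression().fit(X, y)
    train_mse = np.mean((model.predict(X) - y) ** 2)
    print(f"degree {degree:2d}: training MSE = {train_mse:.4f}")
```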
Given a dataset, begin by splitting it into a training set (TRAIN) and a test set (TEST).
Model assessment: Use TEST to assess the accuracy of the model you output.
Never ever train or choose parameters based on the test data.
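A minimal sketch of this split, assuming scikit-learn's train_test_split and a synthetic dataset (both my own choices, not from the slides):

```python
# Hold out TEST once, up front; it is never used for training or tuning.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                       # toy features
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)    # fit on TRAIN only
test_mse = np.mean((model.predict(X_test) - y_test) ** 2)
print(f"test MSE (model assessment): {test_mse:.4f}")
```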
Validation set Approach
Goal: Estimate the test error for a supervised learning method.
Strategy:
Split the data into two parts
Train the method on the first part
Compute the error on the second part
Validation set approach
Estimates can vary considerably with different train-validation splits.
Only a subset of the points is used to evaluate the model.
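The variability is easy to see empirically. The sketch below uses a hypothetical setup (synthetic data, a degree-3 polynomial model) and repeats the split with different random seeds; each split gives a noticeably different validation error.

```python
# Different random train/validation splits give different error estimates.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * x[:, 0]) + rng.normal(scale=0.3, size=60)

for seed in range(5):
    x_tr, x_val, y_tr, y_val = train_test_split(
        x, y, test_size=0.5, random_state=seed)     # a different split each time
    model = make_pipeline(PolynomialFeatures(degree=3), LinearRegression())
    model.fit(x_tr, y_tr)
    val_mse = np.mean((model.predict(x_val) - y_val) ** 2)
    print(f"split {seed}: validation MSE = {val_mse:.3f}")
```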
Leave one out cross-validation
For every i = 1, . . . , n:
▶ Train the model on every point except i,
▶ Compute the test error on the held-out point.
Average the test errors.
For a regression problem:
CV(n) = (1/n) Σ_{i=1}^{n} (y_i − ŷ_i)^2
where ŷ_i is the prediction for the i-th sample, made by a model fit without the i-th sample.
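A minimal sketch of this computation for regression, assuming scikit-learn's LeaveOneOut splitter and a synthetic linear dataset (both assumptions, not from the slides):

```python
# CV(n) for regression: fit the model n times, each time leaving out one
# point, and average the squared errors on the held-out points.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(30, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=30)

squared_errors = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    y_hat_i = model.predict(X[test_idx])[0]          # prediction without sample i
    squared_errors.append((y[test_idx][0] - y_hat_i) ** 2)

cv_n = np.mean(squared_errors)                       # CV(n)
print(f"LOOCV estimate of test MSE: {cv_n:.4f}")
```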
CV(n) = (1/n) Σ_{i=1}^{n} 1(y_i ≠ ŷ_i)
... for a classification problem.
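The same loop with 0-1 loss: CV(n) becomes the fraction of held-out points the model misclassifies. Logistic regression on a toy dataset is my own choice here, not something from the slides.

```python
# LOOCV with 0-1 loss for a classification problem.
import numpy as np
from sklearn.model_selection import LeaveOneOut
from sklearn.linear_model import LogisticRegression
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=40, n_features=4, random_state=0)

mistakes = []
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LogisticRegression().fit(X[train_idx], y[train_idx])
    y_hat_i = model.predict(X[test_idx])[0]
    mistakes.append(int(y_hat_i != y[test_idx][0]))  # 1(y_i != ŷ_i)

print(f"LOOCV misclassification rate: {np.mean(mistakes):.3f}")
```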
Leave one out cross-validation
Computing CV(n) can be computationally expensive, since it involves fitting the model n times.
A single linear regression fit takes O(d^3 + nd^2) time.
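As a rough, machine-dependent illustration of the cost difference, the sketch below times 5-fold CV against LOOCV on synthetic data: LOOCV refits the model n times, 5-fold CV only 5 times.

```python
# Timings will vary by machine; the point is the ratio of model fits (n vs. k).
import time
import numpy as np
from sklearn.model_selection import cross_val_score, LeaveOneOut
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n, d = 2000, 20
X = rng.normal(size=(n, d))
y = X @ rng.normal(size=d) + rng.normal(size=n)

for name, cv in [("5-fold CV", 5), ("LOOCV", LeaveOneOut())]:
    start = time.perf_counter()
    cross_val_score(LinearRegression(), X, y,
                    scoring="neg_mean_squared_error", cv=cv)
    print(f"{name}: {time.perf_counter() - start:.2f} s")
```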
k-fold cross-validation
Split the data into k subsets or folds.
For every i = 1, . . . , k:
▶ Train the model on every fold except the i-th fold,
▶ Compute the test error on the i-th fold.
Average the test errors.
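A minimal sketch with k = 5, assuming scikit-learn's KFold splitter and synthetic data:

```python
# k-fold CV: train on k-1 folds, evaluate on the held-out fold, average.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(scale=0.2, size=100)

fold_errors = []
kf = KFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, test_idx in kf.split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])   # train on k-1 folds
    fold_errors.append(
        np.mean((model.predict(X[test_idx]) - y[test_idx]) ** 2))  # error on held-out fold

print(f"5-fold CV estimate of test MSE: {np.mean(fold_errors):.4f}")
```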
5-Fold Cross Validation Demo
Given k folds, to determine the quality of a particular hyperparameter (e.g., alpha = 0.1):
Pick a fold, which we'll call the validation fold. Train the model on all but this fold.
Compute the error on the validation fold.
Repeat the step above for all k possible choices of validation fold.
The quality of the hyperparameter is the average of the k validation-fold errors.
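A sketch of this procedure, where alpha is taken to be the regularization strength of ridge regression; the model choice and the synthetic data are assumptions, since the slide does not name them.

```python
# Pick the hyperparameter whose average validation-fold error is lowest.
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 10))
y = X @ rng.normal(size=10) + rng.normal(scale=0.5, size=100)

kf = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in (0.01, 0.1, 1.0, 10.0):
    fold_errors = []
    for train_idx, val_idx in kf.split(X):
        model = Ridge(alpha=alpha).fit(X[train_idx], y[train_idx])
        fold_errors.append(
            np.mean((model.predict(X[val_idx]) - y[val_idx]) ** 2))
    # Quality of this alpha = average of the k validation-fold errors.
    print(f"alpha={alpha:>5}: mean validation MSE = {np.mean(fold_errors):.4f}")
```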
Picking K
Typical choices of k are 5, 10, and N, where N is the number of data points.
k = N is also known as "leave-one-out cross-validation," and will typically give you the best results.
▶ In this approach, each validation set is only one point.
▶ Every point gets a chance to be used as the validation set.
k = N is also very expensive, requiring you to fit a huge number of models.
Ultimately, the choice of k trades off the quality of the error estimate against computation time.
LOOCV vs. k-fold cross-validation
The k-fold CV estimate depends on how the data happen to be split into folds.
In k-fold CV, we train the model on less data than is available. This introduces bias into the estimates of test error.
In LOOCV, the n training sets strongly resemble one another, so the fitted models are highly correlated. This increases the variance of the test error estimate.