How to do LeaveOneOut cross validation #15900
> (Would we be able to define r2 better if we could fit metrics on a dataset??)
I guess this is not a problem, because we'll get nan if we rely on the default r2 when using GridSearchCV to solve regression problems.

Though I think it's not good to return nan. I tried to google things like …
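A minimal sketch of why the default r2 produces nan with LeaveOneOut: each held-out fold contains a single sample, and R² is undefined when the true targets have zero variance. The numbers below are arbitrary illustrative values.

```python
import math
import warnings

from sklearn.metrics import r2_score

# A LeaveOneOut test fold has exactly one sample, so the variance of
# y_true in the fold is zero and R^2 is undefined.
with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # silence UndefinedMetricWarning
    fold_score = r2_score([3.0], [2.5])  # arbitrary single-sample fold

print(fold_score)  # nan
```

This is the per-fold score that GridSearchCV would average, so the averaged result is nan as well.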
Currently, we have two ways to do LOO CV in scikit-learn:
1. In GridSearchCV, we calculate the score of each fold (i.e., each sample) and then take the average.
2. In RidgeCV, we calculate the prediction of each fold (i.e., each sample), put the predictions together, and calculate a single score.
I think this inconsistency is annoying.
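The two strategies can be sketched with cross_val_score (fold-wise scoring, then averaging) versus cross_val_predict (pooling all held-out predictions, then scoring once). The dataset, estimator, and metric below are arbitrary choices for illustration; note that for a per-sample metric like MAE the two strategies happen to coincide, while for r2 only the pooled variant is well-defined under LOO.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.model_selection import LeaveOneOut, cross_val_predict, cross_val_score

X, y = make_regression(n_samples=30, n_features=4, noise=5.0, random_state=0)
model = Ridge(alpha=1.0)
loo = LeaveOneOut()

# Strategy 1 (GridSearchCV-style): score each one-sample fold, then average.
fold_scores = cross_val_score(model, X, y, cv=loo,
                              scoring="neg_mean_absolute_error")
avg_of_fold_scores = -fold_scores.mean()

# Strategy 2 (RidgeCV-style): pool all held-out predictions, score once.
pooled_pred = cross_val_predict(model, X, y, cv=loo)
score_of_pooled = mean_absolute_error(y, pooled_pred)

# For MAE the two numbers agree; r2 on the pooled predictions is finite,
# whereas per-fold r2 would be nan on every single-sample fold.
pooled_r2 = r2_score(y, pooled_pred)
print(avg_of_fold_scores, score_of_pooled, pooled_r2)
```

For metrics that do not decompose per sample (r2, AUC, median-based errors), the two strategies give genuinely different answers, which is the inconsistency described above.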
Another issue is whether we should consider sample_weight when averaging the scores in the first option, and when calculating the score in the second option. We do so in RidgeCV, but not in GridSearchCV.
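A small sketch of the difference sample_weight makes when aggregating per-fold LOO scores. The scores and weights below are made-up values purely to show that the weighted and unweighted averages diverge.

```python
import numpy as np

# Hypothetical per-fold scores from a LeaveOneOut run (one sample per fold)
# and the corresponding sample weights -- illustrative values only.
fold_scores = np.array([0.2, 0.5, 0.1])
sample_weight = np.array([1.0, 2.0, 0.5])

# Unweighted aggregation, as GridSearchCV currently does.
unweighted = fold_scores.mean()

# Weight-aware aggregation, in the spirit of what RidgeCV does.
weighted = np.average(fold_scores, weights=sample_weight)

print(unweighted, weighted)
```

Whenever the weights are not all equal, the two aggregations generally disagree, so the choice is user-visible.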
Related to the RidgeCV issues; ping @glemaitre