Should cross-validation scoring take sample-weights into account? #4632


Closed
amueller opened this issue Apr 24, 2015 · 22 comments · Fixed by ZEFR-INC/scikit-learn#1, ZEFR-INC/scikit-learn#2, #22083 or #26896

Comments

@amueller
Member

Currently many of our metrics allow for sample weights, but when using cross_val_score or GridSearchCV these are not taken into account. As I said on the mailing list, I'm not sure if this is a bug or a feature; it might depend on the domain whether we want to do this or not.
#4497 will make it possible to do this in a more consistent manner.
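
For context, a minimal sketch of the asymmetry being described (the data and weights below are made up for illustration): the metric functions accept sample_weight when called directly, but cross_val_score offers no way to forward weights to the scorer.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
sample_weight = np.random.RandomState(0).uniform(0.1, 10, size=len(y))

# Metrics accept weights when called directly ...
clf = LogisticRegression().fit(X, y, sample_weight=sample_weight)
print(accuracy_score(y, clf.predict(X), sample_weight=sample_weight))

# ... but cross_val_score has no argument to forward them to the scorer,
# so these per-fold scores are always unweighted.
print(cross_val_score(LogisticRegression(), X, y, cv=5))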

@amueller
Member Author

Ping @arjoly @jnothman @GaelVaroquaux ?

@arjoly
Member
arjoly commented Apr 25, 2015

We probably want to have both options.

@amueller
Member Author

Probably yes. But how to specify? When creating a scorer?

@jnothman
Member

This is one of the issues raised in #1574, wherein, iirc, @ndawe strongly believed weighted scoring should be used. But as raised here there are compatibility issues.

But how to specify? When creating a scorer?

I don't think so; too inaccessible, especially given that the default score methods support weights. We're better off with a parameter in each of the CV wrappers, even though I appreciate that's not modular and touches lots of code :s
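
To illustrate the point about the default score methods, a rough sketch (not a proposed API; the split and weights are just illustrative): the estimator's own score already accepts sample_weight, so a parameter on the CV wrappers would mostly be about slicing the test-fold weights and forwarding them to calls like these.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = make_classification(random_state=0)
sample_weight = np.random.RandomState(1).uniform(0.1, 10, size=len(y))

X_train, X_test, y_train, y_test, sw_train, sw_test = train_test_split(
    X, y, sample_weight, random_state=0)

clf = LogisticRegression().fit(X_train, y_train, sample_weight=sw_train)

# The default score method already supports weights on the test fold;
# a CV wrapper would mainly need to slice and forward them.
print(clf.score(X_test, y_test))                         # unweighted accuracy
print(clf.score(X_test, y_test, sample_weight=sw_test))  # weighted accuracy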

@ndawe
Member
ndawe commented Apr 25, 2015

Yes, weights should be accounted for. I suppose you already know I think that 😄

If they are not used, but your dataset is really supposed to be weighted, you can get a very different (suboptimal!) model.

I forgot I already have this implemented in my fork but it hasn't been merged yet. This was essential for my PhD research where sample weights could vary by orders of magnitude and just using a weight of 1 for all samples (anywhere in the learning process, including the GridSearchCV) would give poor results.

@ndawe
Member
ndawe commented Apr 25, 2015

But I'm in support of implementing all of this according to the discussion in #4497. I just desperately needed sample-weight handling in cross-validation for my research and implemented it for myself at the time.

@diego898

@ndawe I am in a similar situation. Could you share your implementation?

@ndawe
Member
ndawe commented May 10, 2015

@diego898 it's not guaranteed to still work against upstream changes in scikit-learn, but this is what I had: ndawe@3da7fb7

@alexpearce

Has anybody picked this up? There are a few half-baked PRs, but I'd like to push to get this in.

If no one is working on it and people think the feature isn't a bad idea in general, I'm happy to give it a go.

@amueller
Member Author

@alexpearce I think the issue is that we haven't come up with a good API for it.
I think we'd be happy to consider a proposal. It should be explicit and backward-compatible, so there should be some way for the user to specify whether the score should be weighted or not.

@csbrown
csbrown commented Jun 1, 2018

In case anybody needs a solution now, now, now and can't be bothered to install a custom fork of sklearn:

import numpy as np
import sklearn.base
import sklearn.linear_model
import sklearn.metrics
from sklearn.model_selection import KFold

def cross_val_scores_weighted(model, X, y, weights, cv=5, metrics=(sklearn.metrics.accuracy_score,)):
    # Cross-validate `model`, using `weights` both to fit each training fold
    # and to score each test fold.
    kf = KFold(n_splits=cv)
    scores = [[] for _ in metrics]
    for train_index, test_index in kf.split(X):
        model_clone = sklearn.base.clone(model)
        X_train, X_test = X[train_index], X[test_index]
        y_train, y_test = y[train_index], y[test_index]
        weights_train, weights_test = weights[train_index], weights[test_index]
        # Fit with the training-fold weights ...
        model_clone.fit(X_train, y_train, sample_weight=weights_train)
        y_pred = model_clone.predict(X_test)
        # ... and score with the test-fold weights.
        for i, metric in enumerate(metrics):
            scores[i].append(metric(y_test, y_pred, sample_weight=weights_test))
    return scores

X = np.array([[1], [2], [1]] * 30)
y = np.array([0, 1, 1] * 30)
weights = np.array([98, 1, 1] * 30)
no_weights = np.array([1, 1, 1] * 30)

logistic_model = sklearn.linear_model.LogisticRegression()
weighted_scores = cross_val_scores_weighted(logistic_model, X, y, weights, cv=5)
no_weighted_scores = cross_val_scores_weighted(logistic_model, X, y, no_weights, cv=5)

print(weighted_scores, no_weighted_scores)

THE TEST EXPLAINED (if you care):
In this quick-and-dirty test, 2/3 of the X values are 1s, for which y is 0 or 1 with 50% probability each (no information), and 1/3 are 2s, for which y is always 1. Unweighted, the logistic regression therefore predicts the 1s with 50% accuracy and the 2s with 100% accuracy, i.e. about 2/3 overall, which is what we get. With the weights, the 1s carry 99% of the total weight, and 98/99 of that weight has y=0, while the 1% of weight on the 2s all has y=1. A good weighted model predicts y=0 for x=1 and y=1 for x=2, getting everything right except the (x=1, y=1) samples, which carry 1% of the total weight, i.e. about 99% weighted accuracy. That is what gets printed, so I assume this works in all possible cases. :)

@lorentzenchr
Member

Has this issue been addressed lately? PR #16449 needs to choose between the following two options:

  1. Compute the unweighted MSE on each fold k, i.e. mse_k = 1/n_k * sum((y_i - predicted_i)^2), then the unweighted mse_cv = 1/(number of folds) * sum(mse_k).
  2. Compute the weighted MSE on each fold k, i.e. mse_k = 1/sum(sw_i) * sum(sw_i * (y_i - predicted_i)^2), then the weighted mse_cv = 1/sum(sw_k) * sum(sw_k * mse_k), where sw_k is the sum of sw over fold k.

You can replace MSE by most other scores/metrics. Like @ndawe, I'm in favour of always using the weighted version as the minimal implementation, though an additional option would be a nice-to-have.
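
To make the two options concrete, here is a small illustrative sketch (the data, weights and variable names are made up, not from any PR) computing both aggregations from per-fold predictions:

import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold

rng = np.random.RandomState(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(size=200)
sw = rng.uniform(0.1, 10, size=200)

fold_mse_unweighted, fold_mse_weighted, fold_sw_sums = [], [], []
for train, test in KFold(n_splits=5).split(X):
    model = LinearRegression().fit(X[train], y[train], sample_weight=sw[train])
    squared_errors = (y[test] - model.predict(X[test])) ** 2

    # Option 1: unweighted MSE per fold.
    fold_mse_unweighted.append(squared_errors.mean())
    # Option 2: weighted MSE per fold.
    fold_mse_weighted.append(np.average(squared_errors, weights=sw[test]))
    fold_sw_sums.append(sw[test].sum())

# Option 1: plain mean over folds.
mse_cv_unweighted = np.mean(fold_mse_unweighted)
# Option 2: folds averaged with weights proportional to their total sample weight.
mse_cv_weighted = np.average(fold_mse_weighted, weights=fold_sw_sums)
print(mse_cv_unweighted, mse_cv_weighted)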

@jnothman
Member
jnothman commented Feb 17, 2020 via email

@rth
Member
rth commented Feb 17, 2020

There seems to be overall agreement that sample_weight should be taken into account for scoring in grid search, but making this happen in GridSearchCV is technically challenging (see #13432 (comment) by Joel for more context).

I'm in favour of always using the weighted version as the minimal implementation, though an additional option would be a nice-to-have.

I think we cannot make it the default in LassoCV and ElasticNetCV (or other specialized implementations) as long as GridSearchCV doesn't support it. There are also cases where we wouldn't want to use sample weights in CV scoring, cf. #13432 (comment). Some metrics may also not implement sample_weight and would fail in that case.

+1 for these two estimators to allow this optionally. Maybe a weight_scoring init parameter defaulting to False? Wouldn't we first need regression metrics to support sample_weight, though? Currently only r2_score does.
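
As a quick diagnostic (just a sketch, not a proposed API), one can check which metric functions currently accept sample_weight by inspecting their signatures:

import inspect
from sklearn import metrics

for name in ["r2_score", "mean_squared_error", "mean_absolute_error",
             "median_absolute_error", "explained_variance_score"]:
    func = getattr(metrics, name)
    has_sw = "sample_weight" in inspect.signature(func).parameters
    print(name, "supports sample_weight:", has_sw)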

@jnothman
Member
jnothman commented Feb 17, 2020 via email

@CameronBieganek

I'm not 100% sure about this, but it seems to me that domain adaptation is one scenario where you do not want to weight the test samples when evaluating the test score. The idea is that the importance weights are used to adapt the train set to more accurately reflect the test set. No weights are needed for the test set since the test set already accurately reflects the test set. :)

Of course, it probably doesn't make sense to be doing randomly sampled KFold cross-validation and domain adaptation at the same time. In my case, I'm doing a time series cross-validation (actually a nested time series cross-validation), and there's a time-dependent sampling bias in the data that I'm trying to account for with domain adaptation.
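
A rough sketch of that use case, with made-up data and weights: the importance weights are applied only when fitting each training fold, while the held-out fold is scored unweighted.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import TimeSeriesSplit

rng = np.random.RandomState(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + rng.normal(scale=0.5, size=300) > 0).astype(int)
importance_weights = rng.uniform(0.2, 5, size=300)  # e.g. from a density-ratio estimate

scores = []
for train, test in TimeSeriesSplit(n_splits=5).split(X):
    clf = LogisticRegression()
    # The importance weights adapt the training folds toward the test distribution ...
    clf.fit(X[train], y[train], sample_weight=importance_weights[train])
    # ... while the held-out fold is scored unweighted, since it already
    # reflects the target distribution.
    scores.append(clf.score(X[test], y[test]))
print(scores)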

@gosuto-inzasheru

What would be the difference between cross_val_score accepting a sample_weight parameter directly and just passing the weights to the .fit function using fit_params={'sample_weight': weight_array}?

@Yuting-Kou

What would be the difference between cross_val_score accepting a sample_weight parameter directly and just passing the weights to the .fit function using fit_params={'sample_weight': weight_array}?

If GridSearchCV fits using sample_weight, then the correct way to pick the best estimator is to use the same sample_weight in the scorer. So I agree that adding sample_weight as a direct parameter of BaseSearchCV is a great idea (default None), much like fit_params = _check_fit_params(X, fit_params, train) in the _fit_and_score function.
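
For reference, a small sketch of the distinction discussed above, under the API available at the time of this thread: fit_params routes the weights to fit on each training fold, but the scorer never sees them, so the reported scores stay unweighted.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(random_state=0)
sample_weight = np.random.RandomState(2).uniform(0.1, 10, size=len(y))

# The weights reach fit() on every training fold (indexed to the fold),
# but the accuracy computed on each test fold ignores them.
scores = cross_val_score(
    LogisticRegression(), X, y, cv=5,
    fit_params={"sample_weight": sample_weight},
)
print(scores)

With this, only the fitting is weighted; weighting the scoring as well is exactly what this issue asks for.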

@mlondschien
Contributor
mlondschien commented Jun 3, 2021

It appears that eight years of discussion have not resolved this issue. Maybe a comic (and some humour) helps:

https://xkcd.com/1172/

@lorentzenchr
Member

While there is no theoretical result that I'm aware of to prefer CV with or without sample weights, the ability to use sample weights in CV will become available with #22083.

@GaelVaroquaux
Member
GaelVaroquaux commented Feb 25, 2022 via email

@jeroenvermunt

While there is no theoretical result that I'm aware of to prefer CV with or without sample weights, the ability to use sample weights in CV will become available with #22083.

Is this already available in the newest version? I cannot find a clear example in the release highlights of whether it is now possible or not.
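
If a recent scikit-learn release (roughly 1.4 or later) is available, this kind of weighted CV scoring can be expressed through metadata routing; here is a minimal sketch, assuming a version where enable_metadata_routing and the params argument of cross_validate exist.

import numpy as np
import sklearn
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, make_scorer
from sklearn.model_selection import cross_validate

sklearn.set_config(enable_metadata_routing=True)

X, y = make_classification(random_state=0)
sample_weight = np.random.RandomState(3).uniform(0.1, 10, size=len(y))

# Request the weights both for fitting and for scoring.
clf = LogisticRegression().set_fit_request(sample_weight=True)
scorer = make_scorer(accuracy_score).set_score_request(sample_weight=True)

results = cross_validate(clf, X, y, cv=5, scoring=scorer,
                         params={"sample_weight": sample_weight})
print(results["test_score"])  # weighted accuracy on each test fold

Whether the scorer should also request the weights remains a modelling choice, as discussed earlier in this thread.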
