Should cross-validation scoring take sample-weights into account? #4632
Comments
Ping @arjoly @jnothman @GaelVaroquaux?
We probably want to have both options.
Probably yes. But how to specify? When creating a scorer?
This is one of the issues raised in #1574, wherein, iirc, @ndawe strongly believed weighted scoring should be used. But as raised here, there are compatibility issues.
I don't think so; too inaccessible, especially given that the default score methods support weights. We're better off with a parameter in each of the CV wrappers, even though I appreciate that's not modular and touches lots of code :s
Yes, weights should be accounted for. I suppose you already know I think that 😄 If they are not used, but your dataset is really supposed to be weighted, you can get a very different (suboptimal!) model. I forgot I already have this implemented in my fork, but it hasn't been merged yet. This was essential for my PhD research, where sample weights could vary by orders of magnitude and just using a weight of 1 for all samples (anywhere in the learning process, including GridSearchCV) would give poor results.
But I'm in support of implementing all of this according to the discussion in #4497. I just desperately needed sample-weight handling in cross-validation for my research and implemented it myself at the time.
@ndawe I am in a similar situation. Could you share your implementation?
@diego898 it's not guaranteed to still work against upstream changes in scikit-learn, but this is what I had: ndawe@3da7fb7
Has anybody picked this up? There are a few half-baked PRs, but I'd like to push to get this in. If no one is working on it and people think the feature isn't a bad idea in general, I'm happy to give it a go.
@alexpearce I think the issue is that we haven't come up with a good API for it.
In case anybody needs a solution now, now, now and can't be bothered to install a custom fork of sklearn.
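A minimal sketch of such a workaround (not the original commenter's code): a hand-rolled CV loop that passes the fold's weights to both `fit` and the metric. The estimator and metric here (Ridge, MSE) are arbitrary stand-ins; any estimator whose `fit` accepts `sample_weight` and any metric with a `sample_weight` argument would do.

```python
import numpy as np
from sklearn.model_selection import KFold
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(200, 5)
y = X @ rng.randn(5) + rng.randn(200)
sample_weight = rng.uniform(0.1, 10.0, size=len(y))

fold_scores = []
for train_idx, test_idx in KFold(n_splits=5, shuffle=True, random_state=0).split(X):
    est = Ridge()
    # Use the training-fold weights when fitting...
    est.fit(X[train_idx], y[train_idx], sample_weight=sample_weight[train_idx])
    # ...and the test-fold weights when scoring.
    fold_scores.append(
        mean_squared_error(
            y[test_idx],
            est.predict(X[test_idx]),
            sample_weight=sample_weight[test_idx],
        )
    )
print(np.mean(fold_scores))
```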
Has this issue been addressed lately? PR #16449 needs to choose between the following two options:
You can replace MSE with most other scores/metrics. Like @ndawe, I'm in favour of always using the weighted version as the minimal implementation, though an additional option would be a nice-to-have.
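The two options are not quoted above; presumably they differ in whether each fold's score uses the weights. For MSE on a held-out fold of size $n$ with weights $w_i$, the unweighted and weighted variants would read (an assumption based on the surrounding discussion, not a quote from the PR):

$$
\mathrm{MSE}_{\text{unweighted}} = \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i-\hat{y}_i\bigr)^2,
\qquad
\mathrm{MSE}_{\text{weighted}} = \frac{\sum_{i=1}^{n} w_i\,\bigl(y_i-\hat{y}_i\bigr)^2}{\sum_{i=1}^{n} w_i}
$$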
No, we don't really have a ready solution for this yet.
There seems to be overall agreement that sample_weight should be taken into account for scoring in grid search, but making this happen in GridSearchCV is technically challenging (see #13432 (comment) by Joel for more context).
I think we cannot make it the default in LassoCV and ElasticNetCV (or other specialized implementations) as long as GridSearchCV doesn't support it. There are also cases where we wouldn't want to use sample weights in CV scoring, cf. #13432 (comment). Some metrics also don't implement sample_weight and would fail in that case. +1 for these two estimators allowing this optionally. Maybe a
> but making this happen in GridSearchCV is technically challenging

This is not particularly challenging if we want to just change the default behaviour, actually.
I'm not 100% sure about this, but it seems to me that domain adaptation is one scenario where you do not want to weight the test samples when evaluating the test score. The idea is that the importance weights are used to adapt the train set to more accurately reflect the test set. No weights are needed for the test set since the test set already accurately reflects the test set. :) Of course, it probably doesn't make sense to be doing randomly sampled KFold cross-validation and domain adaptation at the same time. In my case, I'm doing a time series cross-validation (actually a nested time series cross-validation), and there's a time-dependent sampling bias in the data that I'm trying to account for with domain adaptation.
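A hypothetical sketch of that pattern, assuming importance weights `iw` (e.g. density ratios of the test vs. train distributions) computed elsewhere: weight the training folds so they resemble the test distribution, but score each held-out fold unweighted, since it already follows that distribution. The estimator and metric are again arbitrary placeholders.

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error

rng = np.random.RandomState(0)
X = rng.randn(300, 4)
y = X @ rng.randn(4) + rng.randn(300)
iw = rng.uniform(0.5, 2.0, size=len(y))  # placeholder importance weights

fold_scores = []
for train_idx, test_idx in TimeSeriesSplit(n_splits=5).split(X):
    # Weight only the training data...
    model = Ridge().fit(X[train_idx], y[train_idx], sample_weight=iw[train_idx])
    # ...and evaluate the held-out fold without weights.
    fold_scores.append(mean_squared_error(y[test_idx], model.predict(X[test_idx])))
print(np.mean(fold_scores))
```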
What would be the difference with a direct acceptance of the
If gridCV
It appears that eight years of discussion have not resolved this issue. Maybe a comic (and some humour) helps:
While there is no theoretical result that I'm aware of to prefer CV with or without sample weights, the ability to use sample weights in CV will become available with #22083.
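A hedged sketch of what this could look like once metadata routing is in place. The names below follow the metadata-routing API of recent scikit-learn releases (roughly >= 1.4), not the code of #22083 itself: the estimator and scorer each request `sample_weight`, and `cross_validate` routes it to both.

```python
import numpy as np
import sklearn
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error, make_scorer
from sklearn.model_selection import cross_validate

sklearn.set_config(enable_metadata_routing=True)

rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randn(100)
w = rng.uniform(0.1, 10.0, size=100)

# Request that sample_weight be routed to fit() and to the scorer.
est = Ridge().set_fit_request(sample_weight=True)
scorer = make_scorer(mean_squared_error, greater_is_better=False).set_score_request(
    sample_weight=True
)

results = cross_validate(est, X, y, cv=5, scoring=scorer, params={"sample_weight": w})
print(results["test_score"])  # weighted negative MSE per fold
```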
> While there is no theoretical result that I'm aware of to prefer CV with or without sample weights,

In the context of causal inference, we are seeing a need for such practice (paper to come).
However, in practice, even with weights added to scikit-learn's cross-validation framework, I think the scheme we need would be too complex for the existing framework and that we should write our own (I am not suggesting adapting scikit-learn's framework: it would probably be unreasonably complex).
Is this already available in the newest version? I cannot seem to find a clear example in the release highlights of whether it is now possible or not.
Currently many of our metrics allow for sample weights, but when using cross_val_score or GridSearchCV these are not taken into account. As I said on the mailing list, I'm not sure if this is a bug or a feature. It might depend on the domain whether we want to do this or not. #4497 will allow doing this in a more consistent manner.