FIX Ignore zero sample weights in precision recall curve #18328
Conversation
Thanks a lot @albertvillanova! A few comments are below.
if len(np.unique(y_true)) != 2:
    raise ValueError("Only one class present in y_true. Detection error "
                     "tradeoff curve is not defined in that case.")
Why this change? I would imagine it's better to check for it before computing `_binary_clf_curve`?
As it was, if you passed a multiclass `y_true`, it would raise the "Only one class present..." ValueError. With the reorder, a multiclass ValueError is raised by `_binary_clf_curve`.
this needs to be in whats_new since it's changing a behavior.
@adrinjalali are you sure?
When passing a multiclass `y_true`, a ValueError is raised both before and after this change; that does not change. The only change is the error message:
- Before: "Only one class present...", which was wrong.
- Now: "multiclass format is not supported"
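For illustration only (not code from the PR), a minimal sketch of the behavior change, assuming the snippet above comes from `det_curve`:

```python
# Sketch: with the reordered check, a multiclass y_true reaches
# _binary_clf_curve first, which rejects it with the generic
# "multiclass format is not supported" message instead of the
# misleading "Only one class present" error.
import numpy as np
from sklearn.metrics import det_curve

y_true = np.array([0, 1, 2, 2])            # three classes: multiclass input
y_score = np.array([0.1, 0.4, 0.35, 0.8])

try:
    det_curve(y_true, y_score)
except ValueError as exc:
    print(exc)  # expected: "multiclass format is not supported"
```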
I've a few minor comments.
sklearn/metrics/_ranking.py (outdated)
sample_weight = column_or_1d(sample_weight)
_check_sample_weight(sample_weight, y_true)
Suggested change:
- sample_weight = column_or_1d(sample_weight)
- _check_sample_weight(sample_weight, y_true)
+ sample_weight = _check_sample_weight(sample_weight, y_true)
`_check_sample_weight` internally calls `check_array`. This proposed change is the only one I consider important rather than merely cosmetic.
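For reference, a minimal sketch of what `_check_sample_weight` does (it is a private helper in `sklearn.utils.validation`, so exact semantics may vary across versions):

```python
import numpy as np
from sklearn.utils.validation import _check_sample_weight

y_true = np.array([0, 1, 1, 0])

# None is expanded to unit weights of shape (n_samples,).
print(_check_sample_weight(None, y_true))          # [1. 1. 1. 1.]
# A list is validated against n_samples and returned as a float array.
print(_check_sample_weight([1, 2, 3, 4], y_true))  # [1. 2. 3. 4.]
```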
Done. Thank you for pointing this out.
I think you can remove the line `sample_weight = column_or_1d(sample_weight)`.
After that change, I think it'll be an LGTM from my side :smirk:
In this case, we can't call `_check_sample_weight`.
@scikit-learn/core-devs Could someone help out?
According to the docstrings, `_binary_clf_curve` accepts only 1d array-likes of shape `(n_samples,)`. Nevertheless, the code for `_binary_clf_curve`, dating back more than 7 years to #2251 d8f90d1, calls `column_or_1d` on `y_true`, `y_score` and `sample_weight`, and therefore also accepts shape `(n_samples, 1)`.
Shall we adapt the docstrings, or be stricter about the inputs and actually require shape `(n_samples,)`?
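A quick check of the permissive behavior in question (illustrative only):

```python
import numpy as np
from sklearn.utils import column_or_1d

# column_or_1d ravels a column vector, so shape (n_samples, 1) is
# silently accepted even though the docstrings say (n_samples,).
col = np.array([[0.1], [0.4], [0.8]])   # shape (3, 1)
print(column_or_1d(col))                # [0.1 0.4 0.8], shape (3,)
```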
I don't think we should change the behavior. When there's a discrepancy between the code and the docs, we tend to fix the docs. In this case, I think I'd be happy with having `column_or_1d`, since it makes it more convenient for people to work with the API, and it doesn't have much of a consequence. In terms of the docs, I'm not sure if we need to update them, but I wouldn't object to an update, I think. I'd need to think more about it.
@adrinjalali Thanks for your expert judgement. In this case, I'd leave the docs as they are.
The only open question is then whether to use `_check_sample_weight`, which enforces shape `(n_samples,)`, or `column_or_1d` for `sample_weight`, but not both.
@albertvillanova For me, the choice is yours :smirk:
Oh, I realize: the current solution, with `column_or_1d` before `_check_sample_weight`, works as expected.
I propose to open a new issue for the renaming and deprecation in …
LGTM. Thanks for your patience.
Thanks @albertvillanova
sample_weight = _check_sample_weight(sample_weight, y_true)
nonzero_weight_mask = sample_weight != 0
y_true = y_true[nonzero_weight_mask]
y_score = y_score[nonzero_weight_mask]
sample_weight = sample_weight[nonzero_weight_mask]
Would it not be better, instead of filtering out zero sample weights, to do the computation of the scores in a way that zero sample weights are not taken into account? This proposed change results in a difference between 0. and epsilon, which I don't think is desired.
For whatever reason you have some zeros in `sample_weight`, you always want to get arrays of the same length. Imagine `y_score = clf.predict_proba(X)` with `y_score.shape[0] == X.shape[0] == sample_weight.shape[0]`. Then, filtering out zero-weight samples here in `_binary_clf_curve` is quite convenient.

> This proposed change results in a difference between 0. and epsilon, which I don't think is desired.

Could you elaborate?
Sorry @adrinjalali, I do not understand your point either...
Let's say we want a weighted average of a bunch of numbers. One way is to first filter out data with zero weight; the other is to have something like `sum(w[i]*x[i]) / sum(w[i])` and ignore the fact that some of those weights are zero. The calculation itself takes care of the zero sample weights.
Now in this case, there's an issue with having samples with zero weight in the data. My question is: if we just filter out the zero weights, then what happens to a sample with weight equal to 1e-32, for instance? That's practically zero, and should be treated [almost] the same as the ones with zero sample weight.
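A toy comparison of the two approaches (illustrative only, not code from the PR):

```python
import numpy as np

x = np.array([1.0, 2.0, 10.0])
w = np.array([1.0, 1.0, 0.0])

# Approach 1: filter out zero-weight samples first.
keep = w != 0
filtered = np.sum(w[keep] * x[keep]) / np.sum(w[keep])

# Approach 2: let the weighted sum take care of zero weights.
direct = np.sum(w * x) / np.sum(w)

print(filtered, direct)  # both 1.5 -- equivalent for a plain mean

# With w = [1.0, 1.0, 1e-32] both approaches still give ~1.5; the
# 0-versus-epsilon concern is about quantities that are not smooth
# in the weights, such as which thresholds appear in a curve.
```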
AFAIK, `_binary_clf_curve` is only used here in _ranking.py. It calculates counts `cumsum(y * weight)`, which are the ingredients for precision, recall and the like. You never have `... / sum(weight)`; that would cancel out in ratios of counts anyway. Therefore, I would have expected that filtering out zero-weight samples is equivalent.
The real problem mentioned in #16065, however, seems to be the different values of the thresholds, i.e. thresholds with only zero weights should be dropped. This PR solves that problem (by removing those samples).
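A small sketch of that ingredient (modeled on, not copied from, `_binary_clf_curve`):

```python
import numpy as np

# Samples already sorted by decreasing score; the last has zero weight.
y_true = np.array([1, 0, 1])
weight = np.array([1.0, 1.0, 0.0])

tps = np.cumsum(y_true * weight)         # [1. 1. 1.]
fps = np.cumsum((1 - y_true) * weight)   # [0. 1. 1.]
print(tps, fps)

# The zero-weight sample leaves the cumulative counts flat, but it
# would still contribute a threshold entry unless filtered out first.
```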
Suggestion: should we just add a test with `weight1=[1, 1, 1e-30]` and `weight2=[1, 1, 0]` for several relevant scores/errors/losses?
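A sketch of such a test (hypothetical values, not taken from the PR):

```python
import numpy as np
from sklearn.metrics import precision_recall_curve

y_true = np.array([0, 1, 1])
y_score = np.array([0.2, 0.6, 0.9])

for w in (np.array([1.0, 1.0, 1e-30]), np.array([1.0, 1.0, 0.0])):
    precision, recall, thresholds = precision_recall_curve(
        y_true, y_score, sample_weight=w)
    print(w[-1], thresholds, precision, recall)

# With weight 0 the third sample is dropped and its threshold
# disappears; with 1e-30 the threshold is kept -- exactly the
# 0-versus-epsilon difference discussed above.
```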
Can we reach a consensus? I would like to get this one closed.
I played around a little bit, and I'm happy with this solution as is :)
@adrinjalali I guess you're busy with other important tasks. Do you still intend to review this one, or should we look out for another reviewer?
I did what I wanted to do (#18328 (comment)), I'm happy for this to be merged :)
@adrinjalali Thanks for confirming. A green check mark would have been enough :smirk:
@albertvillanova Could you move the whatsnew entry from 0.24 to 1.0 and thereby resolve the merge conflicts (after merging main)? After that, I'm eager to hit the green button.
Done @lorentzenchr 😉
…n#18328)
Co-authored-by: Alonso Silva Allende <alonsosilva@gmaiil.com>
Fixes #16065.
Supersedes and closes #16319.