FIX Ignore zero sample weights in precision recall curve #18328
lorentzenchr merged 25 commits into scikit-learn:main
Conversation
Thanks a lot @albertvillanova ! A few comments are below.
    if len(np.unique(y_true)) != 2:
        raise ValueError("Only one class present in y_true. Detection error "
                         "tradeoff curve is not defined in that case.")
Why this change? I would imagine it's better to check for it before computing _binary_clf_curve?
As it was, if you passed a multiclass y_true, it would raise the "Only one class present..." ValueError.
With the reorder, a multiclass ValueError is raised by _binary_clf_curve.
this needs to be in whats_new since it's changing a behavior.
@adrinjalali are you sure?
When passing a multiclass y_true, before and now a ValueError is raised; that does not change. The only change is the error message:
- Before: "Only one class present...", which was wrong.
- Now: "multiclass format is not supported"
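The reordering can be sketched in plain Python. This is a hypothetical stand-in, not scikit-learn's actual code: the function name is invented, and only the error-message strings mirror the discussion above. The point is that the format check (the one _binary_clf_curve performs) now runs before the "only one class" check, so a multiclass y_true gets the accurate message.

```python
def validate_det_curve_targets(y_true):
    """Illustrative sketch of the validation order discussed above."""
    classes = sorted(set(y_true))
    if any(c not in (0, 1) for c in classes):
        # what _binary_clf_curve's format check raises for multiclass input
        raise ValueError("multiclass format is not supported")
    if len(classes) != 2:
        # the det_curve-specific check, now performed afterwards
        raise ValueError(
            "Only one class present in y_true. Detection error "
            "tradeoff curve is not defined in that case."
        )
    return classes
```

With this order, [0, 1, 2] hits the multiclass branch and [1, 1] hits the one-class branch, matching the before/after behavior described above.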
I've a few minor comments.
sklearn/metrics/_ranking.py
Outdated
Current:
    sample_weight = column_or_1d(sample_weight)
    _check_sample_weight(sample_weight, y_true)

Suggested change:
    sample_weight = _check_sample_weight(sample_weight, y_true)
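To make the suggestion concrete, here is a rough pure-Python sketch of the behavior the helper adds over the bare two-line version: it supplies unit weights when sample_weight is None, coerces to float, and validates the length against y_true. This is illustrative only (the function name is made up and scikit-learn's real _check_sample_weight does more, e.g. dtype and scalar handling via check_array).

```python
def check_sample_weight_sketch(sample_weight, y_true):
    """Simplified stand-in for sklearn's _check_sample_weight helper."""
    n_samples = len(y_true)
    if sample_weight is None:
        # default: every sample gets unit weight
        return [1.0] * n_samples
    weights = [float(w) for w in sample_weight]
    if len(weights) != n_samples:
        raise ValueError(
            f"sample_weight has {len(weights)} entries, expected {n_samples}"
        )
    return weights
```

Assigning the return value back (rather than calling the helper for its side effects only) is what the suggested change above fixes.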
_check_sample_weight internally calls check_array. This proposed change is the only one I consider important and not cosmetic.
Done. Thank you for pointing this out.
I think you can remove the line sample_weight = column_or_1d(sample_weight).
After that change, I think it'll be a LGTM from my side:smirk:
In this case, we can't call _check_sample_weight.
@scikit-learn/core-devs Could someone help out?
According to the docstrings, _binary_clf_curve accepts only 1d array-likes of shape (n_samples,). Nevertheless, the code for _binary_clf_curve, dating back more than 7 years to #2251 d8f90d1, calls column_or_1d on y_true, y_score and sample_weight and therefore also accepts shape (n_samples, 1).
Shall we adapt the docstrings or be stricter with the inputs and actually require shape (n_samples,)?
I don't think we should change the behavior. When there's a discrepancy between the code and the docs, we tend to fix the docs. In this case, I think I'd be happy with having column_or_1d, since it makes it more convenient for people to work with the API, and it doesn't have much of a consequence. In terms of the docs, I'm not sure if we need to update them, but I wouldn't object to an update, I think. I'd need to think more about it.
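The shape tolerance under discussion can be illustrated with a small sketch. Again this is not scikit-learn's implementation, just an invented stand-in showing the semantics: both shape (n_samples,) and (n_samples, 1) inputs come out as a flat 1-d sequence, and anything wider is rejected.

```python
def column_or_1d_sketch(y):
    """Illustrative stand-in for sklearn's column_or_1d validator."""
    rows = list(y)
    if rows and isinstance(rows[0], (list, tuple)):
        # column-vector case: shape (n_samples, 1) is flattened
        if any(len(r) != 1 for r in rows):
            raise ValueError("y should be a 1d array or a column vector")
        return [r[0] for r in rows]
    # already 1-d
    return rows
```

This leniency is why the code accepts (n_samples, 1) even though the docstrings only advertise (n_samples,).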
@adrinjalali Thanks for your expert judgement. In this case, I'd leave the docs as is.
The only open question is then whether to use _check_sample_weight, which enforces shape (n_samples,), or column_or_1d for sample_weight—but not both.
@albertvillanova For me, the choice is yours:smirk:
Oh, I realize: the current solution with both column_or_1d before _check_sample_weight works as expected.
I propose to open a new issue for the renaming and deprecation.
    sample_weight = _check_sample_weight(sample_weight, y_true)
    nonzero_weight_mask = sample_weight != 0
    y_true = y_true[nonzero_weight_mask]
    y_score = y_score[nonzero_weight_mask]
    sample_weight = sample_weight[nonzero_weight_mask]
Would it not be better, instead of filtering out zero sample weights, to do the computation of the scores in a way that zero sample weights are not taken into account? This proposed change results in a difference between 0. and epsilon, which I don't think is desired.
For whatever reasons you have some zeros in sample_weight, you better always get the same length of arrays. Imagine y_score = clf.predict_proba(X) with y_score.shape[0] == X.shape[0] == sample_weight.shape[0]. Then, filtering out zero weight samples here in _binary_clf_curve is quite convenient.
This proposed change results in a difference between 0. and epsilon, which I don't think is desired.
Could you elaborate?
Sorry @adrinjalali I do not understand your point either...
Let's say we want a weighted average of a bunch of numbers. One way is to first filter out data with zero weight; the other is to have something like sum(w[i]*x[i]) / sum(w[i]) and ignore the fact that some of those weights are zero. The calculation itself takes care of zero sample weights.
Now in this case, there's an issue with having samples with zero weight in the data. My question is if we just filter out the zero weights, then what happens to the sample with weight equal to 1e-32 for instance? That's practically zero, and should be treated [almost] the same as the ones with zero sample weight.
AFAIK, _binary_clf_curve is only used here in _ranking.py. It calculates counts cumsum(y * weight) which are the ingredients for precision, recall and the like. You never have ... / sum(weight). This would cancel out in ratios of counts anyway. Therefore, I would have expected that filtering out zero weight samples is equivalent.
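The equivalence of the counts, and the spurious-threshold effect, can be seen in a toy example. This is a simplified stand-in for _binary_clf_curve's weighted counts (itertools.accumulate in place of np.cumsum, plain lists in place of arrays), not the actual scikit-learn code:

```python
from itertools import accumulate

y_true = [1, 0, 1, 1]           # samples sorted by decreasing score
weight = [1.0, 1.0, 0.0, 1.0]   # third sample has zero weight

# cumulative weighted true positives, as in cumsum(y_true * sample_weight)
tps_all = list(accumulate(y * w for y, w in zip(y_true, weight)))

# the same counts after filtering out zero-weight samples, as this PR does
kept = [(y, w) for y, w in zip(y_true, weight) if w != 0]
tps_filtered = list(accumulate(y * w for y, w in kept))

# tps_all == [1.0, 1.0, 1.0, 2.0]; tps_filtered == [1.0, 1.0, 2.0]
# The counts agree at every surviving threshold; the zero-weight sample
# only contributed a duplicated cumulative value, i.e. a spurious threshold.
```

So filtering never changes a count, it only drops the threshold entries that zero-weight samples would have introduced.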
The real problem mentioned in #16065, however, seems to be the different values of thresholds, i.e. thresholds with only zero weights should be dropped. This PR solves this problem (by removing those samples).
Suggestion: Should we just add a test with weight1=[1, 1, 1e-30] and weight2=[1, 1, 0] for several relevant scores/errors/losses?
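Such a test could be sketched along the following lines. The precision computation here is a simplified plain-Python stand-in (cumulative weighted TP/FP counts), not the test that was actually added to the suite:

```python
from itertools import accumulate

def precisions(y_true, weight):
    """Precision at each threshold from cumulative weighted TP/FP counts."""
    tps = list(accumulate(y * w for y, w in zip(y_true, weight)))
    fps = list(accumulate((1 - y) * w for y, w in zip(y_true, weight)))
    return [tp / (tp + fp) for tp, fp in zip(tps, fps)]

y_true = [1, 1, 0]  # samples sorted by decreasing score
p_tiny = precisions(y_true, [1, 1, 1e-30])
p_zero = precisions(y_true, [1, 1, 0])

# a weight of 1e-30 behaves like a weight of 0 up to floating-point noise
assert all(abs(a - b) < 1e-12 for a, b in zip(p_tiny, p_zero))
```

This addresses the epsilon-vs-zero concern raised above: the near-zero weight contributes a vanishing amount to the counts, so the resulting scores match the filtered case.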
Can we reach a consensus? I would like to get this one closed.
I played around a little bit, I'm happy with this solution as is :)
@adrinjalali I guess you're busy with other important tasks. Do you still intend to review this one, or should we look out for another reviewer?
I did what I wanted to do (#18328 (comment)), I'm happy for this to be merged :)
@adrinjalali Thanks for confirming. A green check mark would have been enough:smirk: @albertvillanova Could you move the whatsnew entry from 0.24 to 1.0 and thereby solve merge conflicts (after merging main)? After that, I'm eager to hit the green button.
Done @lorentzenchr 😉
Co-authored-by: Alonso Silva Allende <alonsosilva@gmaiil.com>
Fix #16065.
Supersede and close #16319.