-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+1] sample_weight support in metrics #3043
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I'm away from home and may take some time to review this, sorry. On 7 April 2014 09:53, Noel Dawe notifications@github.com wrote:
|
No problem. Maybe someone else can review? These are the same changes you reviewed in #1574 when you said it was ready to merge, so hopefully the review can be quick. |
@@ -428,6 +468,8 @@ def _average_binary_score(binary_metric, y_true, y_score, average): | |||
raise ValueError("{0} format is not supported".format(y_type)) | |||
|
|||
if y_type == "binary": | |||
if sample_weight is not None: | |||
return binary_metric(y_true, y_score, sample_weight=sample_weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why not just reuse the default case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh. Forgot binary_metric
may not support sample_weight
. Since _average_binary_score
is private, and new to this release, and all its uses now support sample_weight
, it's safe to require this as a property of binary_metric
, and I'd recommend doing so.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done in d4794c5
Apart from those minor comments, looks good to me. |
All changes are applied. Let's see if it's still green. |
Thanks for your comments. This PR is still green and should be ready to merge now. |
"not equal (%f != %f) for %s" % ( | ||
weighted_score, weighted_score_list, name)) | ||
|
||
if not name.startswith('samples'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I've not understood why this condition should be the case. Why does the per-sample scoring not still result in a weighted sum?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, that should not have been there. In fact, the sample weights were not correctly accounted for when average='samples'. This should now be fixed.
Thanks for all your work, Noel. This has my +1. |
I am not so familiar with the metrics module, but these changes look fine to me. Tests are thorough. |
Thanks @ndawe. I will try to have a look at this pr today. |
numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64) | ||
denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64) | ||
if sample_weight is not None: | ||
sample_weight = column_or_1d(sample_weight) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not factoring this king of validation in _check_reg_targets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe not since sample_weight isn't a target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Those _check_
function are general utility to decrease code duplication. Maybe the name is not the best.
I wasn't expecting that you do so before then, I just thought it would be a On 13 April 2014 10:35, Noel Dawe notifications@github.com wrote:
|
return np.mean(score) | ||
else: | ||
if sample_weight is not None: | ||
return np.sum(np.multiply(score, sample_weight)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: It seems equivalent to np.dot(score, sample_weight)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, done.
@ndawe This looks very good! Some little arrangements remains before it is finalised. |
whats_new.rst is updated. Any other comments? |
When the last comment is addressed about micro/macro averaging, LGTM ! |
…r precision, recall, and f-score
Great! Last comment addressed. |
👍. Great work! |
Glad to have this merged! Thanks a lot for the reviews. Should this PR now be closed? |
Yes, we can! One more time congratulation ! |
Awesome. Thanks for the patience and the hard work, @ndawe! On 22 April 2014 03:04, Arnaud Joly notifications@github.com wrote:
|
This PR is part of the larger #1574 and adds support for sample weights in the metrics.