8000 [MRG+1] sample_weight support in metrics by ndawe · Pull Request #3043 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] sample_weight support in metrics #3043

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 17 commits into from

Conversation

ndawe
Copy link
Member
@ndawe ndawe commented Apr 6, 2014

This PR is part of the larger #1574 and adds support for sample weights in the metrics.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 011cecf on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 714c82e on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

@ndawe
Copy link
Member Author
ndawe commented Apr 6, 2014

@jnothman @arjoly this should be ready to merge. I'll then move on to cross-validation and grid searching.

@jnothman
Copy link
Member
jnothman commented Apr 7, 2014

I'm away from home and may take some time to review this, sorry.

On 7 April 2014 09:53, Noel Dawe notifications@github.com wrote:

@jnothman https://github.com/jnothman @arjolyhttps://github.com/arjolythis should be ready to merge. I'll then move on to cross-validation and
grid searching.

Reply to this email directly or view it on GitHubhttps://github.com//pull/3043#issuecomment-39687407
.

@ndawe
Copy link
Member Author
ndawe commented Apr 7, 2014

No problem. Maybe someone else can review? These are the same changes you reviewed in #1574 when you said it was ready to merge, so hopefully the review can be quick.

@@ -428,6 +468,8 @@ def _average_binary_score(binary_metric, y_true, y_score, average):
raise ValueError("{0} format is not supported".format(y_type))

if y_type == "binary":
if sample_weight is not None:
return binary_metric(y_true, y_score, sample_weight=sample_weight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not just reuse the default case?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh. Forgot binary_metric may not support sample_weight. Since _average_binary_score is private, and new to this release, and all its uses now support sample_weight, it's safe to require this as a property of binary_metric, and I'd recommend doing so.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done in d4794c5

@jnothman
Copy link
Member
jnothman commented Apr 7, 2014

Apart from those minor comments, looks good to me.

@ndawe
Copy link
Member Author
ndawe commented Apr 7, 2014

All changes are applied. Let's see if it's still green.

@ndawe
Copy link
Member Author
ndawe commented Apr 7, 2014

Thanks for your comments. This PR is still green and should be ready to merge now.

"not equal (%f != %f) for %s" % (
weighted_score, weighted_score_list, name))

if not name.startswith('samples'):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I've not understood why this condition should be the case. Why does the per-sample scoring not still result in a weighted sum?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, that should not have been there. In fact, the sample weights were not correctly accounted for when average='samples'. This should now be fixed.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling 24c9340 on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

@jnothman
Copy link
Member
jnothman commented Apr 8, 2014

Thanks for all your work, Noel. This has my +1.

@jnothman jnothman changed the title [MRG] Weighted metrics [MRG+1] Weighted metrics Apr 8, 2014
@ndawe ndawe mentioned this pull request Apr 9, 2014
6 tasks
@ndawe ndawe changed the title [MRG+1] Weighted metrics [MRG+1] sample_weight support in metrics Apr 9, 2014
@glouppe
Copy link
Contributor
glouppe commented Apr 9, 2014

I am not so familiar with the metrics module, but these changes look fine to me. Tests are thorough.

@arjoly
Copy link
Member
arjoly commented Apr 9, 2014

Thanks @ndawe. I will try to have a look at this pr today.

numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
if sample_weight is not None:
sample_weight = column_or_1d(sample_weight)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not factoring this king of validation in _check_reg_targets?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not since sample_weight isn't a target.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those _check_ function are general utility to decrease code duplication. Maybe the name is not the best.

@ndawe
Copy link
Member Author
ndawe commented Apr 13, 2014

@jnothman @glouppe I would like to add an example once the grid search and cross-validation parts of #1574 are ready.

@jnothman
Copy link
Member

I wasn't expecting that you do so before then, I just thought it would be a
worthwhile addition to this contribution.

On 13 April 2014 10:35, Noel Dawe notifications@github.com wrote:

@jnothman https://github.com/jnothman @glouppehttps://github.com/glouppeI would like to add an example once the grid search and cross-validation
parts of #1574 https://github.com/scikit-learn/scikit-learn/pull/1574are ready.

Reply to this email directly or view it on GitHubhttps://github.com//pull/3043#issuecomment-40296337
.

return np.mean(score)
else:
if sample_weight is not None:
return np.sum(np.multiply(score, sample_weight))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: It seems equivalent to np.dot(score, sample_weight)?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, done.

@arjoly
Copy link
Member
arjoly commented Apr 13, 2014

@ndawe This looks very good! Some little arrangements remains before it is finalised.

@glouppe
Copy link
Contributor
glouppe commented Apr 13, 2014

@jnothman @glouppe I would like to add an example once the grid search and cross-validation parts of #1574 are ready.

Same as @jnothman. It doesn't need to be included in this PR.

@coveralls
Copy link

Coverage Status

Coverage remained the same when pulling c84d26e on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

@ndawe
Copy link
Member Author
ndawe commented Apr 20, 2014

whats_new.rst is updated. Any other comments?

@arjoly
Copy link
Member
arjoly commented Apr 20, 2014

When the last comment is addressed about micro/macro averaging, LGTM !

@ndawe
Copy link
Member Author
ndawe commented Apr 21, 2014

Great! Last comment addressed.

@arjoly
Copy link
Member
arjoly commented Apr 21, 2014

merge by rebase with some last cosmetic changes in 3c9e458 !

Thanks @ndawe ! 🍻

@GaelVaroquaux
Copy link
Member

Thanks @ndawe ! 🍻

👍. Great work!

@ndawe
Copy link
Member Author
ndawe commented Apr 21, 2014

Glad to have this merged! Thanks a lot for the reviews. Should this PR now be closed?

@arjoly
Copy link
Member
arjoly commented Apr 21, 2014

Yes, we can!

One more time congratulation !

@arjoly arjoly closed this Apr 21, 2014
@jnothman
Copy link
Member

Awesome. Thanks for the patience and the hard work, @ndawe!

On 22 April 2014 03:04, Arnaud Joly notifications@github.com wrote:

Closed #3043 #3043.


Reply to this email directly or view it on GitHubhttps://github.com//pull/3043
.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0