[MRG+1] sample_weight support in metrics #3043

ndawe · 2014-04-06T23:35:33Z

This PR is part of the larger #1574 and adds support for sample weights in the metrics.

coveralls · 2014-04-06T23:43:11Z

Coverage remained the same when pulling 011cecf on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

coveralls · 2014-04-06T23:51:06Z

Coverage remained the same when pulling 714c82e on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

ndawe · 2014-04-06T23:53:53Z

@jnothman @arjoly this should be ready to merge. I'll then move on to cross-validation and grid searching.

jnothman · 2014-04-07T04:49:40Z

I'm away from home and may take some time to review this, sorry.


ndawe · 2014-04-07T05:00:15Z

No problem. Maybe someone else can review? These are the same changes you reviewed in #1574 when you said it was ready to merge, so hopefully the review can be quick.

jnothman · 2014-04-07T09:39:53Z

sklearn/metrics/metrics.py

@@ -428,6 +468,8 @@ def _average_binary_score(binary_metric, y_true, y_score, average):
        raise ValueError("{0} format is not supported".format(y_type))

    if y_type == "binary":
+        if sample_weight is not None:
+            return binary_metric(y_true, y_score, sample_weight=sample_weight)


why not just reuse the default case?

Oh. Forgot binary_metric may not support sample_weight. Since _average_binary_score is private, and new to this release, and all its uses now support sample_weight, it's safe to require this as a property of binary_metric, and I'd recommend doing so.

Done in d4794c5
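For readers following along, the requirement discussed above can be sketched roughly as follows. This is a simplified illustration, not the code in sklearn/metrics/metrics.py: `weighted_accuracy` and `average_binary_score` are hypothetical names, and only the binary case and a plain per-label (macro-style) average are shown. The point is that once every `binary_metric` accepts `sample_weight`, the helper can forward it unconditionally with no special case for `None`.

```python
import numpy as np


def weighted_accuracy(y_true, y_pred, sample_weight=None):
    """Hypothetical binary metric that accepts sample_weight."""
    correct = (np.asarray(y_true) == np.asarray(y_pred)).astype(float)
    # np.average falls back to a plain mean when weights is None.
    return float(np.average(correct, weights=sample_weight))


def average_binary_score(binary_metric, y_true, y_score, sample_weight=None):
    """Simplified stand-in for the private helper discussed above."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    if y_true.ndim == 1:
        # Binary case: sample_weight is always forwarded, even when None.
        return binary_metric(y_true, y_score, sample_weight=sample_weight)
    # Label-indicator case: score each label column, then average the labels.
    per_label = [
        binary_metric(y_true[:, k], y_score[:, k], sample_weight=sample_weight)
        for k in range(y_true.shape[1])
    ]
    return float(np.mean(per_label))


unweighted = average_binary_score(weighted_accuracy, [1, 0, 1, 1], [1, 1, 1, 0])
weighted = average_binary_score(
    weighted_accuracy, [1, 0, 1, 1], [1, 1, 1, 0], sample_weight=[1, 0, 1, 0]
)
```

In this toy input the two misclassified samples carry zero weight, so the weighted score differs from the unweighted one.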

jnothman · 2014-04-07T13:05:25Z

Apart from those minor comments, looks good to me.

ndawe · 2014-04-07T20:50:31Z

All changes are applied. Let's see if it's still green.

ndawe · 2014-04-07T21:52:46Z

Thanks for your comments. This PR is still green and should be ready to merge now.

jnothman · 2014-04-07T22:05:49Z

sklearn/metrics/tests/test_metrics.py

+            "not equal (%f != %f) for %s" % (
+                weighted_score, weighted_score_list, name))
+
+    if not name.startswith('samples'):


Actually, I've not understood why this condition should be the case. Why does the per-sample scoring not still result in a weighted sum?

Sorry, that should not have been there. In fact, the sample weights were not correctly accounted for when average='samples'. This should now be fixed.
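To make the expectation concrete, here is a minimal sketch of how per-sample scores can still be combined into a weighted result. It assumes a toy per-sample score (the fraction of labels matched in each row) rather than any metric from this PR, and `samples_average_score` is a hypothetical name. With integer weights, the weighted average should match the unweighted average over a dataset in which each row is repeated as many times as its weight.

```python
import numpy as np


def samples_average_score(Y_true, Y_pred, sample_weight=None):
    """Toy per-sample score averaged over samples, optionally weighted."""
    Y_true = np.asarray(Y_true)
    Y_pred = np.asarray(Y_pred)
    # One score per row: fraction of labels predicted correctly.
    per_sample = (Y_true == Y_pred).mean(axis=1)
    # Weighted average over samples; plain mean when sample_weight is None.
    return float(np.average(per_sample, weights=sample_weight))


Y_true = np.array([[1, 0, 1], [0, 1, 0], [1, 1, 0]])
Y_pred = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 0]])
weights = np.array([1, 2, 3])

weighted = samples_average_score(Y_true, Y_pred, sample_weight=weights)
# Same data with each row repeated according to its integer weight.
repeated = samples_average_score(
    np.repeat(Y_true, weights, axis=0), np.repeat(Y_pred, weights, axis=0)
)
```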

coveralls · 2014-04-08T02:06:34Z

Coverage remained the same when pulling 24c9340 on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

jnothman · 2014-04-08T10:54:13Z

Thanks for all your work, Noel. This has my +1.

glouppe · 2014-04-09T09:03:56Z

I am not so familiar with the metrics module, but these changes look fine to me. Tests are thorough.

arjoly · 2014-04-09T09:17:48Z

Thanks @ndawe. I will try to have a look at this pr today.

arjoly · 2014-04-09T13:03:30Z

sklearn/metrics/metrics.py

-    numerator = ((y_true - y_pred) ** 2).sum(dtype=np.float64)
-    denominator = ((y_true - y_true.mean(axis=0)) ** 2).sum(dtype=np.float64)
+    if sample_weight is not None:
+        sample_weight = column_or_1d(sample_weight)


Why not factor this kind of validation into _check_reg_targets?

Maybe not since sample_weight isn't a target.

Those _check_ functions are general utilities meant to reduce code duplication. Maybe the name is not the best.
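As an illustration of what the hunk above is doing, here is a rough sketch of a weighted numerator and denominator, assuming the function being patched is an R²-style score (the hunk does not show its name). `weighted_r2` is a hypothetical name, `np.ravel` stands in for `column_or_1d`, and the shape check is only a stand-in for whatever validation ends up shared. A zero denominator is not handled here.

```python
import numpy as np


def weighted_r2(y_true, y_pred, sample_weight=None):
    """Sketch of an R^2-style score with optional sample weights."""
    y_true = np.asarray(y_true, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    if sample_weight is None:
        weight = np.ones_like(y_true)
    else:
        # Flatten the weights to 1d and check they line up with the targets.
        weight = np.asarray(sample_weight, dtype=float).ravel()
        if weight.shape != y_true.shape:
            raise ValueError("sample_weight and y_true have different lengths")
    # Weighted residual sum of squares.
    numerator = np.sum(weight * (y_true - y_pred) ** 2)
    # Weighted total sum of squares around the weighted mean of y_true.
    y_mean = np.average(y_true, weights=weight)
    denominator = np.sum(weight * (y_true - y_mean) ** 2)
    return float(1.0 - numerator / denominator)


y_true = [3.0, -0.5, 2.0, 7.0]
y_pred = [2.5, 0.0, 2.0, 8.0]
plain = weighted_r2(y_true, y_pred)
uniform = weighted_r2(y_true, y_pred, sample_weight=[2, 2, 2, 2])
dropped = weighted_r2(y_true, y_pred, sample_weight=[1, 1, 1, 0])
```

Uniform weights should leave the score unchanged, and a zero weight should behave like removing that sample.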

ndawe · 2014-04-13T00:35:24Z

@jnothman @glouppe I would like to add an example once the grid search and cross-validation parts of #1574 are ready.

jnothman · 2014-04-13T01:03:33Z

I wasn't expecting you to do so before then; I just thought it would be a
worthwhile addition to this contribution.


arjoly · 2014-04-13T14:48:45Z

sklearn/metrics/metrics.py

        return np.mean(score)
    else:
+        if sample_weight is not None:
+            return np.sum(np.multiply(score, sample_weight))


Nitpick: It seems equivalent to np.dot(score, sample_weight)?
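A quick check of the equivalence for 1d arrays, using made-up values for `score` and `sample_weight`:

```python
import numpy as np

# Made-up per-sample scores and weights, both 1d and of equal length.
score = np.array([0.5, 1.0, 0.25])
sample_weight = np.array([2.0, 1.0, 4.0])

# Element-wise product followed by a sum ...
via_multiply = np.sum(np.multiply(score, sample_weight))
# ... is the inner product of the two vectors.
via_dot = np.dot(score, sample_weight)
```

For 1d inputs both expressions compute the same inner product; `np.dot` simply avoids building the intermediate product array.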

arjoly · 2014-04-13T15:33:01Z

@ndawe This looks very good! A few small adjustments remain before it is finalised.

glouppe · 2014-04-13T20:13:35Z

@jnothman @glouppe I would like to add an example once the grid search and cross-validation parts of #1574 are ready.

Same as @jnothman. It doesn't need to be included in this PR.

coveralls · 2014-04-20T05:04:43Z

Coverage remained the same when pulling c84d26e on ndawe:weighted_metrics into 070d7a6 on scikit-learn:master.

ndawe · 2014-04-20T05:52:21Z

whats_new.rst is updated. Any other comments?

arjoly · 2014-04-20T09:23:17Z

Once the last comment about micro/macro averaging is addressed, LGTM!


ndawe · 2014-04-21T04:16:25Z

Great! Last comment addressed.

arjoly · 2014-04-21T09:43:01Z

Merged by rebase with some last cosmetic changes in 3c9e458!

Thanks @ndawe ! 🍻

GaelVaroquaux · 2014-04-21T10:16:06Z

Thanks @ndawe ! 🍻

👍. Great work!

ndawe · 2014-04-21T16:54:58Z

Glad to have this merged! Thanks a lot for the reviews. Should this PR now be closed?

arjoly · 2014-04-21T17:04:46Z

Yes, we can!

Once again, congratulations!

jnothman · 2014-04-22T09:57:05Z

Awesome. Thanks for the patience and the hard work, @ndawe!


ndawe added 3 commits April 6, 2014 14:29

add sample_weight to base score and weight_boosting staged_score

ddd1471

weight_boosting: unneeded np.copy

0a94946

weight_boosting: include sample_weight in test_staged_predict

0870997

metrics: add sample_weight support

714c82e

jnothman reviewed Apr 7, 2014

ndawe added 4 commits April 7, 2014 13:36

rm (default=None)

98fdc26

require sample_weight support for binary_metric

d4794c5

newline

c95ef51

atleast_2d.reshape -> reshape

1d5ba2a

jnothman reviewed Apr 7, 2014

weighted metrics: fix sample_weight handling for average=samples

24c9340

jnothman changed the title ~~[MRG] Weighted metrics~~ [MRG+1] Weighted metrics Apr 8, 2014

ndawe mentioned this pull request Apr 9, 2014

[WIP] sample_weight support #1574

Closed

6 tasks

ndawe changed the title ~~[MRG+1] Weighted metrics~~ [MRG+1] sample_weight support in metrics Apr 9, 2014

arjoly reviewed Apr 9, 2014

ndawe added 3 commits April 9, 2014 15:11

format

91e4c10

metrics tests

01b3c93

doc: fix default

0bc686a

arjoly reviewed Apr 13, 2014

ndawe added 2 commits April 19, 2014 21:56

weighted metrics tests fixes

4da5742

np.sum(np.multiply( -> np.dot(

c84d26e

ndawe added 2 commits April 19, 2014 22:25

update whats_new.rst

da10f4c

add test_base.test_score_sample_weight

eeef074

sample_weight metrics tests: add missing micro and macro averaging for precision, recall, and f-score

7dd8392

arjoly closed this Apr 21, 2014
