[WIP] FIX make sure sample_weight is taken into account by estimators by glemaitre · Pull Request #14246 · scikit-learn/scikit-learn · GitHub

[WIP] FIX make sure sample_weight is taken into account by estimators #14246


Closed
wants to merge 4 commits

Conversation

glemaitre
Member

closes #14191

It seems that we don't have a common test checking that over-sampling or under-sampling X gives results equivalent to using sample_weight. This PR introduces this new common test.

In addition, it fixes the estimators which do not follow this constraint even though they should.
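
For illustration only, here is a minimal sketch (not the actual common test) of the equivalence being checked; the estimator and the integer weights are arbitrary choices for the example:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=50, random_state=0)
rng = np.random.RandomState(0)

# integer weights play the role of repetition counts (no zero weight here)
sample_weight = rng.randint(1, 4, size=X.shape[0])

# build the equivalent "over-sampled" dataset by repeating each row
X_repeated = np.repeat(X, sample_weight, axis=0)
y_repeated = np.repeat(y, sample_weight)

tree_weighted = DecisionTreeClassifier(random_state=0).fit(
    X, y, sample_weight=sample_weight)
tree_repeated = DecisionTreeClassifier(random_state=0).fit(X_repeated, y_repeated)

# both models should give the same predictions on the original data
assert_allclose(tree_weighted.predict_proba(X), tree_repeated.predict_proba(X))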

@glemaitre glemaitre changed the title FIX make sure sample_weight is taken into account by estimators [WIP] FIX make sure sample_weight is taken into account by estimators Jul 3, 2019
Member
@jnothman jnothman left a comment


This tests a necessary condition of sample_weight support only in some situations. There will be cases where the weighted version is implemented differently... And cases like MLP where we might decide that the minibatch should be sampled without weights and the updates weighted.

A sufficient condition would also check that sample_weight had some effect at all on the model learnt.
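
For illustration, such an effect check could look roughly like the following sketch (not proposed test code; the estimator and the weighting scheme are arbitrary):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=100, random_state=0)

# put almost all the weight on class 0; an estimator that honours
# sample_weight should now behave differently from the unweighted fit
sample_weight = np.where(y == 0, 100.0, 0.01)

clf_plain = LogisticRegression().fit(X, y)
clf_weighted = LogisticRegression().fit(X, y, sample_weight=sample_weight)

# if sample_weight were silently ignored, the predictions would be identical
assert not np.array_equal(clf_plain.predict(X), clf_weighted.predict(X))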

@glemaitre
Member Author

This tests a necessary condition of sample_weight support only in some situations. There will be cases where the weighted version is implemented differently... And cases like MLP where we might decide that the minibatch should be sampled without weights and the updates weighted.

A sufficient condition would also check that sample_weight had some effect at all on the model learnt.

You are right. It was my comment in #14532 that this test can fail due to implementation details rather than bugs. However, it allowed me to find some bugs (at least one so far).

I will certainly implement the version that you are mentioning. However, depending on the follow-up, I would maybe propose implementing both tests with an additional estimator tag. But I need to investigate all the failing estimators and what the underlying reasons are.

# check that the estimator yields the same results for an over-sampled
# dataset built by repeating indices and for the equivalent sample_weight
if (has_fit_parameter(estimator_orig, "sample_weight") and
not (hasattr(estimator_orig, "_pairwise")
Member


why don't we use the pairwise tool to make it pairwise? Or do we not properly support sample weights then?

Member Author


No idea; the base test was taken from the other sample_weight common test.

@amueller
Member

I can't see the builds :(
Which estimators were failing? As I said in #15015, I agree with @jnothman that online algorithms might be problematic, and also that CV-based algorithms might be problematic.
Given all the bugs, I feel we do need a test like this, though.
Dare I say we need a tag :(
Or should we make sure that the batch algorithms actually do "the right thing"? That might be more costly and not worth it, though. For convex optimization we should end up in the same place if we train until convergence, but for MLPs it's probably tricky? Though the prediction results on the training set really shouldn't be that different when changing the sampling of the batches a bit.
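
As an aside, the "same place after convergence" argument for convex objectives can be illustrated with a batch solver and a tight tolerance; this is just a sketch, with LogisticRegression/lbfgs chosen arbitrarily:

import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=50, random_state=0)
rng = np.random.RandomState(0)
sample_weight = rng.randint(1, 4, size=X.shape[0])

X_rep = np.repeat(X, sample_weight, axis=0)
y_rep = np.repeat(y, sample_weight)

# both fits minimise the same objective (weighted sum of losses + penalty),
# so with a tight tolerance the solutions should agree
params = dict(solver="lbfgs", tol=1e-12, max_iter=10000)
coef_weighted = LogisticRegression(**params).fit(
    X, y, sample_weight=sample_weight).coef_
coef_repeated = LogisticRegression(**params).fit(X_rep, y_rep).coef_

assert_allclose(coef_weighted, coef_repeated, rtol=1e-4, atol=1e-8)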

@glemaitre
Member Author

adaboostregressor, baggingregressor, calibratedclassifiercv, gradientboostingregressor, isolationforest, linearsvc, linearsvr, minibatchkmeans, kmeans, nusvr, oneclasssvm, perceptron, ransacregressor, randomforestregressor, sgdclassifier, sgdregressor

@glemaitre
Member Author

I think that the ensemble failures could be due to the bootstrapping, which sees a different X and therefore generates different bootstrap samples. The SVM failures could be linked to what we discussed in the other PR.

Something is wrong with SGD (but this is stochastic) and with the linear SVMs (we should check the solver there).

@amueller
Member

LinearSVC just ignores the sample weights, see #10873

@glemaitre
Member Author

Digging into the code, it actually seems reasonable:

Only one configuration uses sample_weight (plus another one fixed in #15018), and these configurations are not triggered by the default parameters. The test passes for LinearSVC with dual=False (after merging #15018) and for LinearSVR with dual=False and loss='squared_epsilon_insensitive'.

So one clean way would be to raise a NotImplementedError for the other configurations in the meantime (in case somebody wants to implement them, if that is possible). Our test could then cover the supported configuration.
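
For reference, a sketch of the kind of check this would enable for the supported configuration (assuming #15018 is merged; the dataset and tolerances are ad hoc):

import numpy as np
from numpy.testing import assert_allclose
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=100, random_state=0)
rng = np.random.RandomState(0)
sample_weight = rng.randint(1, 4, size=X.shape[0])

X_rep = np.repeat(X, sample_weight, axis=0)
y_rep = np.repeat(y, sample_weight)

# dual=False is the liblinear configuration that actually uses sample_weight;
# the default dual=True path is the one discussed above
svc_weighted = LinearSVC(dual=False, tol=1e-8, max_iter=100000).fit(
    X, y, sample_weight=sample_weight)
svc_repeated = LinearSVC(dual=False, tol=1e-8, max_iter=100000).fit(X_rep, y_rep)

assert_allclose(svc_weighted.coef_, svc_repeated.coef_, rtol=1e-4, atol=1e-6)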

@jnothman jnothman modified the milestones: 0.22, 0.23 Oct 31, 2019
@thomasjpfan thomasjpfan modified the milestones: 0.23, 0.24 Apr 20, 2020
@cmarmo cmarmo removed this from the 0.24 milestone Oct 15, 2020
Base automatically changed from master to main January 22, 2021 10:51
@adrinjalali
Member

Should we close this and only do slep6 on it?

@adrinjalali
Member

Actually, we do have it for CalibratedClassifierCV. So I guess we can close this.

@adrinjalali adrinjalali closed this Feb 1, 2024
Development

Successfully merging this pull request may close these issues.

BaggingClassifier vs LinearSVC training slow
6 participants