8000 [MRG+1] explicit exception message for strict selectors by ogrisel · Pull Request #4206 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] explicit exception message for strict selectors #4206

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Feb 7, 2015

Conversation

ogrisel
Copy link
Member
@ogrisel ogrisel commented Feb 5, 2015

This is a fix for #4059 to raise a ValueError at transform time with an explicit error message instead of crashing when calling get_support() with a cryptic error message.

Note that for consistency, the behavior of SelectKBest(k=0) is also impacted by this change.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) to 95.0% when pulling 99f0044 on ogrisel:fix-strict-select-fdr into 1d08f08 on scikit-learn:master.

@jnothman
Copy link
Member
jnothman commented Feb 5, 2015

FWIW, I can imagine cases in which SelectKBest(k=0) is used, in the context of a grid search over FeatureUnion.

@amueller
Copy link
Member
amueller commented Feb 5, 2015

LGTM

@jnothman
Copy link
Member
jnothman commented Feb 5, 2015

In case some user code depends on the prior behaviour of SelectKBest and SelectPercentile -- which explicitly says percentile >= 0, I would rather raise a warning for now, perhaps saying "this will become an error in the future. Report to the scikit-learn mailing list if this will break your usage."

But this is more reasonable given the error_score feature in GridSearchCV.

@amueller
8000 Copy link
Member
amueller commented Feb 5, 2015

@jnothman in which case does that raise an error that didn't raise an error before?

@jnothman
Copy link
Member
jnothman commented Feb 6, 2015

SelectPercentile(percentile=0).fit_transform([[0]], [0]) raises an error with this PR

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

But passing an empty data array as input to the next step in the pipeline is extremely likely to cause hard to diagnose errors. I think it's better to fail fast with an explicit error message at the point where the problem is fixable.

I you find one legitimate case to transform data into an empty dataset I will reconsider this by ATM I don't see any.

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

This is the kind of weird downstream errors that might be caused by passing empty datasets to models:

>>> X = np.ones(0).reshape(10, 0)
>>> y = np.random.random_integers(0, 2, 10)
>>> SVC().fit(X, y)
Traceback (most recent call last):
  File "<ipython-input-20-0c9fbc689f4e>", line 1, in <module>
    SVC().fit(X, y)
  File "/Users/ogrisel/code/scikit-learn/sklearn/svm/base.py", line 164, in fit
    self._gamma = 1.0 / X.shape[1]
ZeroDivisionError: float division by zero

We should probably make our input validation check stricter for the case where X.shape[1] == 0 to raise better error messages in general but It think it's worth fixing it upstream in the selector mixin as well.

@amueller
Copy link
Member
amueller commented Feb 6, 2015

The use case @jnothman mentioned is where you want to grid-search over disabling one part of a FeatureUnion basically. That is somewhat vaild, and I feel we might not want to break it.

Two ways out:

  • Deprecate the percentile=0 case and possibly enrage users that used this
  • Return an empty array and warn that we returned no features, possibly leading to weird error messages (or fix the error message everywhere, which is an additional line in check_array ;) )

(FYI @jnothman's example gives a zero division warning on master)

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

Thanks for the clarification. I now get the FeatureUnion use case... I find it ugly to just issue a warning but maybe this is the best we can do in that case.

fix the error message everywhere, which is an additional line in check_array

This is good independently of the rest of the discussion, I will work on a PR for that.

@amueller
Copy link
Member
amueller commented Feb 6, 2015

Thanks @ogrisel :) Btw there are a couple more bug-fix PRs that could use your reviews ;) https://github.com/scikit-learn/scikit-learn/pulls?q=is%3Aopen+is%3Apr+milestone%3A0.16

@ogrisel ogrisel force-pushed the fix-strict-select-fdr branch from 99f0044 to bc5b403 Compare February 6, 2015 11:07
@ogrisel
8000 Copy link
Member Author
ogrisel commented Feb 6, 2015

I updated this PR based on your feedback to issue a UserWarning instead of raising a ValueError. Would you like me to introduce a new EmptyDataWarning class as a subclass of UserWarning to make it easier for users to silence this specific class of warnings when running grid search with FeatureUnion?

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

Thanks @ogrisel :) Btw there are a couple more bug-fix PRs that could use your reviews ;)

I know. I plan to dedicate this day to them (and maybe the BH t-SNE PR that deserves some love too).

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

BTW @amueller feel free to join http://gitter.im/scikit-learn/scikit-learn to speak about the priority of the bug fix issues to review.

@amueller
Copy link
Member
amueller commented Feb 6, 2015

Thanks and sorry for pestering ;) I'm travelling this week but will be more (re)active again next week.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.0%) to 95.0% when pulling bc5b403 on ogrisel:fix-strict-select-fdr into a168a6c on scikit-learn:master.

@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

@amueller @jnothman would you prefer this to issue a specific new class EmptyDataWarning(UserWarning) or do you prefer to keep it like that with a generic UserWarning? I don't have strong opinion. I would like to make it easy to silence specific warnings but I don't want to introduce warning types bloat.

@amueller
Copy link
Member
amueller commented Feb 6, 2015

I'm not a huge fan of creating new warnings in most situations, -0 from me ;)

@ogrisel ogrisel changed the title [MRG] explicit exception message for strict selectors [MRG+1] explicit exception message for strict selectors Feb 6, 2015
@ogrisel
Copy link
Member Author
ogrisel commented Feb 6, 2015

@amueller I take your thanks as a +1 and marked this in the title to attract more reviewers, feel free to re-edit the title if you have more comments or don't think this is ready for merge.

@ogrisel ogrisel added the Bug label Feb 6, 2015
@ogrisel ogrisel added this to the 0.16 milestone Feb 6, 2015
# rejects all the features
X = rng.rand(40, 10)
y = rng.randint(0, 4, size=40)
fdr = SelectFdr(alpha=0.00001).fit(X, y)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you should test all heuristics and also SelectKBest and SelectPercentile?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@coveralls
Copy link

Coverage Status

Coverage increased (+0.01%) to 95.02% when pulling 2ce9eac on ogrisel:fix-strict-select-fdr into a168a6c on scikit-learn:master.

@jnothman
Copy link
Member
jnothman commented Feb 7, 2015

Regarding the FeatureUnion case, I might remind that #1769 is open and provides an alternative means of disabling an element of a FeatureUnion (or Pipeline). Of course that won't be a recourse for users in the short-term (and remains syntactically clunky when using GridSearch in that it's not just an alternative value of k).

And this LGTM. Merging.

@jnothman
Copy link
Member
jnothman commented Feb 7, 2015

After saying LGTM, I've realised LearntSelectorMixin, etc aren't tested. Still, this isn't a real impediment.

jnothman added a commit that referenced this pull request Feb 7, 2015
[MRG] explicit warning message for strict selectors

Also fixes #4059
@jnothman jnothman merged commit 95681ee into scikit-learn:master Feb 7, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0