[MRG+1] MAINT Refactor univariate_selection module #3131
Conversation
Testing failed because coveralls went down (I can't say I've found coveralls especially useful). I'll try to take a look at this soon.
                             % (k, len(self.scores_)))
        else:
            scores = _clean_nans(self.scores_)
            # XXX This should be refactored; we're getting an array of indices
I think you can remove this comment. All our feature selectors output masks now (as they should, IMO). It will be turned back into indices only for sparse matrix indexing, but only until we require scipy >= 0.14 (?).
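A minimal sketch of the mask-vs-indices point above (illustration only, not the PR's code): selectors return a boolean support mask, but for sparse input the mask may need converting to integer indices, since old scipy releases did not support boolean column indexing on sparse matrices.

```python
import numpy as np
from scipy import sparse

X = sparse.csr_matrix(np.arange(12).reshape(3, 4))
mask = np.array([True, False, True, False])  # support mask over 4 features

# Convert the boolean mask to integer indices for sparse column selection.
indices = np.flatnonzero(mask)
X_selected = X[:, indices]
print(X_selected.toarray())
```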
This LGTM. Pity it doesn't actually reduce the number of lines as well as the number of classes in …
Is WIP intentional?
I want to write some tests for parameter checking and was waiting for travis.
Rebased on top of master. This is ready for review.
Still LGTM. +1
Thanks @jnothman for the review!
(150, 4)
>>> X_new = SelectKBest(chi2, k=2).fit_transform(X, y)
>>> X_new.shape
(150, 2)
While you are at it, I think it would be great to make it possible to pass the "chi2" string constant for the score_func argument, to spare the user an import statement:
>>> from sklearn.datasets import load_iris
>>> from sklearn.feature_selection import SelectKBest
>>> iris = load_iris()
>>> X, y = iris.data, iris.target
>>> X.shape
(150, 4)
>>> X_new = SelectKBest(score_func="chi2", k=2).fit_transform(X, y)
>>> X_new.shape
(150, 2)
The docstring will need to be updated to give the list of score_func values available by default.
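A hypothetical sketch of how the suggested string aliases could be resolved; scikit-learn does not actually accept strings for score_func, and the registry and resolve_score_func name below are illustrations, not library API.

```python
from sklearn.feature_selection import chi2, f_classif, f_regression

# Hypothetical registry mapping string aliases to built-in scoring functions.
_SCORE_FUNCS = {"chi2": chi2, "f_classif": f_classif, "f_regression": f_regression}

def resolve_score_func(score_func):
    """Accept either a callable or one of the registered string aliases."""
    if callable(score_func):
        return score_func
    try:
        return _SCORE_FUNCS[score_func]
    except KeyError:
        raise ValueError("unknown score_func %r; choose from %s"
                         % (score_func, sorted(_SCORE_FUNCS)))
```

SelectKBest(score_func="chi2", k=2) would then call such a resolver internally before fitting.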
Apart from my 2 comments, looks good to me.
Side note, unrelated to this PR, but the feature selection should really … To avoid a class explosion, it would be useful to explore setting the …
which the GenericUnivariateSelector already does. But I agree we can …
On 8 May 2014 02:17, Gael Varoquaux notifications@github.com wrote:
The class hierarchy could be rewritten to have only a … By the way, fpr, fdr and fwe are horrible names to say whether or not we correct for multiple tests (Bonferroni or Benjamini-Hochberg correction). Is there a reason to have a … when you might want to select only based on a p-value (statistical test) or a score (e.g. correlation, variance)?
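For context on what the fdr option does, here is a textbook sketch of Benjamini-Hochberg FDR control over a vector of p-values (written from the standard definition, not copied from sklearn's implementation):

```python
import numpy as np

def benjamini_hochberg_mask(pvalues, alpha=0.05):
    """Return a boolean support mask selecting features whose p-values
    survive Benjamini-Hochberg FDR control at level alpha."""
    pvalues = np.asarray(pvalues, dtype=float)
    n = len(pvalues)
    order = np.argsort(pvalues)
    sorted_p = pvalues[order]
    # Find the largest k with p_(k) <= alpha * k / n; reject hypotheses 1..k.
    below = sorted_p <= alpha * np.arange(1, n + 1) / n
    mask = np.zeros(n, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        mask[order[:k + 1]] = True
    return mask

print(benjamini_hochberg_mask([0.001, 0.02, 0.3, 0.04]))
```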
Good idea! But it's not worth doing if we deprecate most classes of the module.
I agree somewhat, in that I don't see why the same/similar facility … Yet, when you have some selectors that require a score, and others that …
On 8 May 2014 18:07, Coveralls notifications@github.com wrote:
@GaelVaroquaux says
But which defaults? The appropriate test depends on the distribution of feature values in …
I changed some of the classes previously to only look at the score, not the p-value, so that I could implement pointwise MI feature selection (which I never did) and because the probabilities from the chi² test were unstable in high-d input. Having one API for scoring functions keeps the code simple. When you don't need probabilities, just return … Also, I've never noticed anyone trying to define their own feature selection metric and reporting trouble. I interpret that as "it's not broken, so don't fix it".
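The score_func contract discussed above can be sketched as follows: a scoring function may return (scores, pvalues) like chi2 does, or just an array of scores when probabilities are not meaningful. The variance_score function is a made-up example (this is the behaviour of recent scikit-learn, which post-dates this PR's discussion):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.feature_selection import SelectKBest

def variance_score(X, y):
    # y is ignored: an unsupervised score, purely for illustration.
    return np.var(X, axis=0)

X, y = load_iris(return_X_y=True)
X_new = SelectKBest(variance_score, k=2).fit_transform(X, y)
print(X_new.shape)  # (150, 2)
```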
Merged from the command line as 494a91b.
That was to be compatible with functions in scipy.stats.
The refactoring: