-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
Fixes #4355: DictVectorizer.restrict docstring unclear #4356
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
"""Restrict the features to those in support. | ||
"""Gives support for feature selection by restricting the features to those in support. | ||
|
||
It modifies the estimator in-place. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@amueller - How can I improve this line?
7149bbe
to
fe36291
Compare
@amueller - Travis gives this [shown below], when I checked on python 2.7.9 on my system, it also gave what Travis is getting. One way to overcome this would be to direct the output to null I guess, what should I do? EDIT: Fixed it with what I suggested above, though is it the right way to go in an example? Failed example: v.restrict(support.get_support(), indices=False) Expected: DictVectorizer(dtype=, separator='=', sort=True, sparse=True) Got: DictVectorizer(dtype=, separator='=', sort=True, sparse=True) |
4ac8a94
to
6846744
Compare
>>> support = SelectKBest(chi2, k=2).fit(X, [0, 1]) | ||
>>> v.get_feature_names() | ||
['bar', 'baz', 'foo'] | ||
>>> v.restrict(support.get_support(), indices=False).stdout = os.devnull |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you need the indices=False
?
I would try to avoid the stdout
trick. What was your problem before? Did you try #doctest: +NORMALIZE_WHITESPACE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is different output for different versions, use # doctest: +ELLIPSIS
and insert "..." as wildcards (git grep ELLIPSIS
for examples)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it doesn't need indices=False
, I'll remove it.
I was getting different outputs for v.restrict(support.get_support())
in python 3.4.3 and python 2.7.9 on my box, because of which the Travis build was failing.
Python 3.4.3
DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sort=True, sparse=True)
Python 2.7.9
DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sparse=True)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try what you suggested.
@amueller - Thanks for telling me about ELLIPSIS, should I squash these commits together? |
>>> v.get_feature_names() | ||
['bar', 'baz', 'foo'] | ||
>>> v.restrict(support.get_support()) # doctest: +ELLIPSIS | ||
DictVectorizer(dtype=<... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please keep as much of the text as possible. The current one looks odd. I'm surprised you say there is a sort
argument on python 3.4 but not on 2.7..... That can't be right.
once we are happy with them, yes. I think you should only use them for the smallest possible part, so that the docstring remains as informative as possible. I think the dtype represenation is what changed, so you leave everything except for |
Oh, I thought we had to write it from the beginning of the output till the changed part, I'll fix it. |
If I remember correctly, I was getting sort argument earlier. I think I might have changed something while checking it this time. |
@amueller - Fixed the output, is this what you meant? |
1858152
to
568276c
Compare
yes, that is good :) |
@@ -312,7 +312,9 @@ def get_feature_names(self): | |||
return self.feature_names_ | |||
|
|||
def restrict(self, support, indices=False): | |||
"""Restrict the features to those in support. | |||
"""Gives support for feature selection by restricting the features to those in support. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line looks too long. Is it 79 characters?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is 84. Should the docstrings be less than 79 characters? Should I add a line break? Or perhaps, try to shorten this line and add that information in the estimator line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything should be less than 80 characters according to pep8. The first line of the docstring should be a single line, so please rather reformulate.
Sorry for the late response, had to go somewhere. |
@amueller - Does this one look good? Should I squash? |
416eb92
to
c653b64
Compare
updated docstring fixed build error fixed build issue occuring due to different python versions added doctest: +ELLIPSIS fixed output Reformulated docstring
c653b64
to
ef9cbcb
Compare
Looks good, merging as small doc improvement. |
Fixes #4355: DictVectorizer.restrict docstring unclear
#4355