8000 Fixes #4355: DictVectorizer.restrict docstring unclear by vinayak-mehta · Pull Request #4356 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

Fixes #4355: DictVectorizer.restrict docstring unclear #4356

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 1, 2015

Conversation

vinayak-mehta
Copy link
Contributor

"""Restrict the features to those in support.
"""Gives support for feature selection by restricting the features to those in support.

It modifies the estimator in-place.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@amueller - How can I improve this line?

@vinayak-mehta vinayak-mehta changed the title Fixes issue #4355: DictVectorizer.restrict docstring unclear Fixes issue: DictVectorizer.restrict docstring unclear Mar 7, 2015
@vinayak-mehta vinayak-mehta changed the title Fixes issue: DictVectorizer.restrict docstring unclear Fixes #4355: DictVectorizer.restrict docstring unclear Mar 7, 2015
@vinayak-mehta vinayak-mehta force-pushed the dict_vectorizer branch 2 times, most recently from 7149bbe to fe36291 Compare March 7, 2015 20:36
@vinayak-mehta
Copy link
Contributor Author

@amueller - Travis gives this [shown below], when I checked on python 2.7.9 on my system, it also gave what Travis is getting. One way to overcome this would be to direct the output to null I guess, what should I do?

EDIT: Fixed it with what I suggested above, though is it the right way to go in an example?

Failed example:
v.restrict(support.get_support(), indices=False)
Expected:
DictVectorizer(dtype=, separator='=', sort=True,
sparse=True)
Got:
DictVectorizer(dtype=, separator='=', sort=True,
sparse=True)

>>> support = SelectKBest(chi2, k=2).fit(X, [0, 1])
>>> v.get_feature_names()
['bar', 'baz', 'foo']
>>> v.restrict(support.get_support(), indices=False).stdout = os.devnull
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you need the indices=False?
I would try to avoid the stdout trick. What was your problem before? Did you try #doctest: +NORMALIZE_WHITESPACE

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is different output for different versions, use # doctest: +ELLIPSIS and insert "..." as wildcards (git grep ELLIPSIS for examples)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it doesn't need indices=False, I'll remove it.

I was getting different outputs for v.restrict(support.get_support()) in python 3.4.3 and python 2.7.9 on my box, because of which the Travis build was failing.

Python 3.4.3

DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sort=True,
        sparse=True)

Python 2.7.9

DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sparse=True)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try what you suggested.

@vinayak-mehta
Copy link
Contributor Author

@amueller - Thanks for telling me about ELLIPSIS, should I squash these commits together?

>>> v.get_feature_names()
['bar', 'baz', 'foo']
>>> v.restrict(support.get_support()) # doctest: +ELLIPSIS
DictVectorizer(dtype=<...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please keep as much of the text as possible. The current one looks odd. I'm surprised you say there is a sort argument on python 3.4 but not on 2.7..... That can't be right.

@amueller
Copy link
Member

once we are happy with them, yes. I think you should only use them for the smallest possible part, so that the docstring remains as informative as possible. I think the dtype represenation is what changed, so you leave everything except for dtype=...

@vinayak-mehta
Copy link
Contributor Author

Oh, I thought we had to write it from the beginning of the output till the changed part, I'll fix it.

@vinayak-mehta
Copy link
Contributor Author

If I remember correctly, I was getting sort argument earlier. I think I might have changed something while checking it this time.

@vinayak-mehta
Copy link
Contributor Author

@amueller - Fixed the output, is this what you meant?

@amueller
Copy link
Member

yes, that is good :)

@@ -312,7 +312,9 @@ def get_feature_names(self):
return self.feature_names_

def restrict(self, support, indices=False):
"""Restrict the features to those in support.
"""Gives support for feature selection by restricting the features to those in support.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line looks too long. Is it 79 characters?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is 84. Should the docstrings be less than 79 characters? Should I add a line break? Or perhaps, try to shorten this line and add that information in the estimator line?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything should be less than 80 characters according to pep8. The first line of the docstring should be a single line, so please rather reformulate.

@vinayak-mehta
Copy link
Contributor Author

Sorry for the late response, had to go somewhere.

@coveralls
Copy link

Coverage Status

Coverage remained the same at 95.12% when pulling 3d57517 on vortex-ape:dict_vectorizer into 588b3f7 on scikit-learn:master.

@vinayak-mehta
Copy link
Contributor Author

@amueller - Does this one look good? Should I squash?

@vinayak-mehta vinayak-mehta force-pushed the dict_vectorizer branch 3 times, most recently from 416eb92 to c653b64 Compare March 12, 2015 21:34
updated docstring

fixed build error

fixed build issue occuring due to different python versions

added doctest: +ELLIPSIS

fixed output

Reformulated docstring
@coveralls
Copy link

Coverage Status

Coverage remained the same at 95.11% when pulling ef9cbcb on vortex-ape:dict_vectorizer into 0779566 on scikit-learn:master.

@amueller
Copy link
Member
amueller commented Apr 1, 2015

Looks good, merging as small doc improvement.

amueller added a commit that referenced this pull request Apr 1, 2015
Fixes #4355: DictVectorizer.restrict docstring unclear
@amueller amueller merged commit 6e54079 into scikit-learn:master Apr 1, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0