Fixes #4355: DictVectorizer.restrict docstring unclear #4356

vinayak-mehta · 2015-03-07T05:59:19Z

vinayak-mehta · 2015-03-07T06:05:11Z

sklearn/feature_extraction/dict_vectorizer.py

-        """Restrict the features to those in support.
+        """Gives support for feature selection by restricting the features to those in support.
+
+        It modifies the estimator in-place.


@amueller - How can I improve this line?
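For context, the in-place behaviour that the new docstring line describes can be sketched with a toy class. This is a minimal illustration, not scikit-learn's implementation: `ToyVectorizer` is a hypothetical stand-in that tracks only feature names, and only the `support` and `indices` parameters are taken from the diff.

```python
class ToyVectorizer:
    """Toy stand-in (not scikit-learn code) that tracks feature names only."""

    def __init__(self, feature_names):
        self.feature_names_ = list(feature_names)
        self.vocabulary_ = {n: i for i, n in enumerate(self.feature_names_)}

    def restrict(self, support, indices=False):
        """Restrict the features to those in support, modifying self in place."""
        if not indices:
            # Turn a boolean mask into the positions of the kept features.
            support = [i for i, keep in enumerate(support) if keep]
        kept = [self.feature_names_[i] for i in support]
        self.feature_names_ = kept
        self.vocabulary_ = {n: i for i, n in enumerate(kept)}
        # Returning self means a doctest call echoes the estimator's repr.
        return self


v = ToyVectorizer(["bar", "baz", "foo"])
result = v.restrict([True, False, True])
print(v.feature_names_)  # ['bar', 'foo']
print(result is v)       # True
```

Because the object itself changes and is also returned, a docstring example that calls `restrict` on its own line has to show the estimator's repr as expected output.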

vinayak-mehta · 2015-03-07T21:43:07Z

@amueller - Travis reports the failure shown below. When I ran the doctest on Python 2.7.9 on my system, I got the same result as Travis. One way around it would be to redirect the output to null, I guess. What should I do?

EDIT: I fixed it with the approach I suggested above, but is that the right way to go in an example?

Failed example:
v.restrict(support.get_support(), indices=False)
Expected:
DictVectorizer(dtype=, separator='=', sort=True,
sparse=True)
Got:
DictVectorizer(dtype=, separator='=', sort=True,
sparse=True)

amueller · 2015-03-09T14:23:02Z

sklearn/feature_extraction/dict_vectorizer.py

+        >>> support = SelectKBest(chi2, k=2).fit(X, [0, 1])
+        >>> v.get_feature_names()
+        ['bar', 'baz', 'foo']
+        >>> v.restrict(support.get_support(), indices=False).stdout = os.devnull


Do you need the indices=False?
I would try to avoid the stdout trick. What was your problem before? Did you try # doctest: +NORMALIZE_WHITESPACE?

If the output differs between versions, use # doctest: +ELLIPSIS and insert "..." as wildcards (run git grep ELLIPSIS for examples).
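A minimal sketch of how the two directives behave, using only the standard library. The doctest text is hypothetical: the built-in `float` stands in for `numpy.float64`, since its repr also differs between Python 2 (`<type 'float'>`) and Python 3 (`<class 'float'>`).

```python
import doctest

# Hypothetical doctest text. The first example hides the version-dependent
# word with "..."; the second ignores differences in whitespace and wrapping.
EXAMPLE = r"""
>>> float  # doctest: +ELLIPSIS
<... 'float'>
>>> print("a  b\n   c")  # doctest: +NORMALIZE_WHITESPACE
a b c
"""

test = doctest.DocTestParser().get_doctest(EXAMPLE, {}, "example", "<example>", 0)
runner = doctest.DocTestRunner(verbose=False)
# Discard failure reports; only the counts are inspected here.
results = runner.run(test, out=lambda text: None)
print(results.failed, results.attempted)  # 0 2
```

NORMALIZE_WHITESPACE only helps when the difference is spacing or line wrapping; a change in the text itself, such as `type` versus `class`, needs ELLIPSIS.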

Yeah, it doesn't need indices=False, I'll remove it.

I was getting different outputs for v.restrict(support.get_support()) on Python 3.4.3 and Python 2.7.9 on my box, which is why the Travis build was failing.

Python 3.4.3

DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', sort=True, sparse=True)

Python 2.7.9

DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', sparse=True)

I'll try what you suggested.

vinayak-mehta · 2015-03-10T01:18:14Z

@amueller - Thanks for telling me about ELLIPSIS. Should I squash these commits together?

amueller · 2015-03-10T01:19:49Z

sklearn/feature_extraction/dict_vectorizer.py

+        >>> v.get_feature_names()
+        ['bar', 'baz', 'foo']
+        >>> v.restrict(support.get_support()) # doctest: +ELLIPSIS
+        DictVectorizer(dtype=<...


Please keep as much of the text as possible. The current one looks odd. I'm surprised you say there is a sort argument on Python 3.4 but not on 2.7. That can't be right.

amueller · 2015-03-10T01:30:14Z

Once we are happy with the commits, yes, squash them. I think you should use the ellipsis only for the smallest possible part, so that the docstring remains as informative as possible. I think the dtype representation is what changed, so you leave everything as is except for dtype=...
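The difference between a broad and a narrow wildcard can be sketched with `doctest.OutputChecker`. The repr strings are the ones quoted earlier in this thread, with `sort=True` assumed for Python 2 as well; the `changed` string is a hypothetical regression added for illustration.

```python
import doctest

checker = doctest.OutputChecker()

py3 = ("DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', "
       "sort=True, sparse=True)\n")
py2 = ("DictVectorizer(dtype=<type 'numpy.float64'>, separator='=', "
       "sort=True, sparse=True)\n")
# Hypothetical regression: an argument other than dtype changes.
changed = ("DictVectorizer(dtype=<class 'numpy.float64'>, separator='=', "
           "sort=False, sparse=True)\n")

# Broad: everything after "dtype=<" is a wildcard.
broad = "DictVectorizer(dtype=<...\n"
# Narrow: only the word that differs between Python versions is a wildcard.
narrow = ("DictVectorizer(dtype=<... 'numpy.float64'>, separator='=', "
          "sort=True, sparse=True)\n")


def matches(want, got):
    """Return True if got satisfies the expected output under ELLIPSIS."""
    return checker.check_output(want, got, doctest.ELLIPSIS)


print(matches(broad, py2), matches(broad, py3), matches(broad, changed))
# True True True
print(matches(narrow, py2), matches(narrow, py3), matches(narrow, changed))
# True True False
```

The narrow pattern still passes on both Python versions but catches the changed argument, which the broad pattern lets through.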

vinayak-mehta · 2015-03-10T01:32:03Z

Oh, I thought we had to write the output from the beginning up to the changed part. I'll fix it.

vinayak-mehta · 2015-03-10T01:33:31Z

If I remember correctly, I was getting the sort argument earlier. I think I might have changed something while checking it this time.

vinayak-mehta · 2015-03-10T02:06:51Z

@amueller - Fixed the output, is this what you meant?

amueller · 2015-03-10T02:29:11Z

yes, that is good :)

amueller · 2015-03-10T02:30:01Z

sklearn/feature_extraction/dict_vectorizer.py

@@ -312,7 +312,9 @@ def get_feature_names(self):
        return self.feature_names_

    def restrict(self, support, indices=False):
-        """Restrict the features to those in support.
+        """Gives support for feature selection by restricting the features to those in support.


This line looks too long. Is it 79 characters?

I think it is 84. Should the docstrings be less than 79 characters? Should I add a line break? Or perhaps, try to shorten this line and add that information in the estimator line?

Everything should be less than 80 characters according to PEP 8. The first line of the docstring should be a single line, so please reformulate it rather than add a line break.
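A quick way to check the length, as a minimal sketch. It assumes the docstring sits at the usual 8-space indentation of a method body; `too_long` is a hypothetical helper, not part of any linter.

```python
MAX_LINE_LENGTH = 79  # PEP 8 limit for a line of code

# First docstring line from the diff, with the assumed 8-space indent.
line = '        """Gives support for feature selection by restricting the features to those in support.'


def too_long(text, limit=MAX_LINE_LENGTH):
    """Return True when a source line exceeds the limit."""
    return len(text) > limit


print(len(line.strip().lstrip('"')))  # 84: the summary text alone
print(len(line))                      # 95: with indentation and quotes
print(too_long(line))                 # True
```

The summary text alone is already over the limit, so shortening the sentence is the only fix that keeps the first line of the docstring on a single line.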

vinayak-mehta · 2015-03-10T08:09:26Z

Sorry for the late response, had to go somewhere.

coveralls · 2015-03-10T19:50:47Z

Coverage remained the same at 95.12% when pulling 3d57517 on vortex-ape:dict_vectorizer into 588b3f7 on scikit-learn:master.

vinayak-mehta · 2015-03-11T05:11:39Z

@amueller - Does this one look good? Should I squash?

coveralls · 2015-03-17T17:41:03Z

Coverage remained the same at 95.11% when pulling ef9cbcb on vortex-ape:dict_vectorizer into 0779566 on scikit-learn:master.

amueller · 2015-04-01T21:44:39Z

Looks good, merging as small doc improvement.

Fixes #4355: DictVectorizer.restrict docstring unclear

vinayak-mehta mentioned this pull request Mar 7, 2015

DictVectorizer.restrict docstring unclear #4355

Closed

vinayak-mehta reviewed Mar 7, 2015

vinayak-mehta changed the title ~~Fixes issue #4355: DictVectorizer.restrict docstring unclear~~ Fixes issue: DictVectorizer.restrict docstring unclear Mar 7, 2015

vinayak-mehta changed the title ~~Fixes issue: DictVectorizer.restrict docstring unclear~~ Fixes #4355: DictVectorizer.restrict docstring unclear Mar 7, 2015

vinayak-mehta force-pushed the dict_vectorizer branch 2 times, most recently from 7149bbe to fe36291 Compare March 7, 2015 20:36

vinayak-mehta force-pushed the dict_vectorizer branch from 4ac8a94 to 6846744 Compare March 9, 2015 12:45

amueller reviewed Mar 9, 2015

amueller reviewed Mar 10, 2015

vinayak-mehta force-pushed the dict_vectorizer branch from 1858152 to 568276c Compare March 10, 2015 02:13

amueller reviewed Mar 10, 2015

vinayak-mehta force-pushed the dict_vectorizer branch 3 times, most recently from 416eb92 to c653b64 Compare March 12, 2015 21:34

added docstrings for restrict

ef9cbcb

updated docstring; fixed build error; fixed build issue occurring due to different Python versions; added doctest: +ELLIPSIS; fixed output; reformulated docstring

vinayak-mehta force-pushed the dict_vectorizer branch from c653b64 to ef9cbcb Compare March 17, 2015 16:53

amueller added a commit that referenced this pull request Apr 1, 2015

Merge pull request #4356 from vortex-ape/dict_vectorizer

6e54079

Fixes #4355: DictVectorizer.restrict docstring unclear

amueller merged commit 6e54079 into scikit-learn:master Apr 1, 2015

vinayak-mehta deleted the dict_vectorizer branch April 1, 2015 21:46

vinayak-mehta mentioned this pull request Apr 6, 2015

[MRG + 2 -.5] Listed valid metrics for neighbors algorithms #4525

Closed
