-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
[MRG+1] Implementing fix for fetching 20newsgroup dataset #4600
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this is the same fix as #4435, right? Did you find out what was wrong with the bunch? |
Yes this is the same fix as #4435 but made on top of current master (manual rebase) and with a test. The bunch is just a 3-lines hack is not really meant to be "read / write" but only "read only". The problem is that this specific data loader is modifying the attributes of a |
@@ -38,6 +38,19 @@ def test_20news(): | |||
entry2 = data.data[np.where(data.target == label)[0][0]] | |||
assert_equal(entry1, entry2) | |||
|
|||
def test_20news_length(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be more explicit to name that test test_20news_length_consistency
.
@@ -18,6 +18,7 @@ | |||
from os.path import isdir | |||
from os import listdir | |||
from os import makedirs | |||
import re |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should not have been committed.
Adding set/getattr methods that fill/query the same thing as `bunch[key]`. Add test for a non-regression bug in fetch_20newsgroups.
The above suggestions have been implemented. Bunch "bug" has been fixed. |
Looks good to me. Thanks for the fix and the test! (if travis is green) |
thanks ! |
[MRG+1] Implementing fix for fetching 20newsgroup dataset
Thanks for the fix. with the old implementation, I did bunch = Bunch(x="x")
bunch.x = "y"
bunch['x']
Why did this work? Why is the current addition necessary? |
Maybe is related to pickle (the bunch is loaded from pickle), I will do some tests... |
Indeed, the feature comes with pickle:
I will add a test to test_base.py and make a PR |
"interesting" ^^ |
Thanks for the fix / test :) |
Updated Bunch getattr definition to raise the appropriate error (should raise Attribute error not KeyError) See issue scikit-learn#4600
Updated Bunch getattr definition to raise the appropriate error (should raise Attribute error not KeyError) See issue scikit-learn#4600
Fix (sub)issue #4600 adding test to Bunch class
Fixing issue #4435 :
the fetched bunch behaved differently for
.data
and['data']
(similary for target and filenames)
Fixed by updating both in the fetching (same as was suggested in the issue)
I copied the (three lines) changes from commit e3c9294
instead of doing a rebase.
Also test has been added to check that both
.data
and['data']
have the same length(testing also target and filenames)