-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
When fetching both training and test set of the 20newsgroup dataset, #4435
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
When fetching both training and test set of the 20newsgroup dataset, #4435
Conversation
the data['data'], data['target'], data['filenames'] should be updated.
These are automatically updated for bunch = Bunch(x="x")
bunch.x = "y"
bunch['x']
Did you run into any issues? |
Please see the simple test below. Before fixing: In [1]: import sklearn.datasets In [2]: len(sklearn.datasets.fetch_20newsgroups(subset='all')['data']) In [3]: len(sklearn.datasets.fetch_20newsgroups(subset='train')['data']) In [4]: len(sklearn.datasets.fetch_20newsgroups(subset='test')['data']) After: In [1]: import sklearn.datasets In [2]: len(sklearn.datasets.fetch_20newsgroups(subset='all')['data']) In [3]: len(sklearn.datasets.fetch_20newsgroups(subset='train')['data']) In [4]: len(sklearn.datasets.fetch_20newsgroups(subset='test')['data']) |
I can reproduce, but I do not understand. Your fix is odd as the current version should be working. |
Sorry I just broke your PR by doing a small fix. |
Ah, I got it. At the place where this assignment happens |
It seems to be more subtle than that. While at line 98 |
I'm pretty sure |
Did I misunderstand anything here? |
Oh, you are right. |
Looks like something is wrong with our bunch implementation. |
This one should be closed now that #4600 has been merged. |
Thanks :) |
When fetching both training and test set of the 20newsgroup dataset, the data['data'], data['target'], data['filenames'] should be updated.