CountVectorizer sets self.vocabulary_ in transform #14559
Comments
Hmmm... Not ideal. Should it just validate and use `vocabulary` if fit was
never called? It should probably then have the tag `requires_fit=False` if
`vocabulary` is provided...
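Here is a minimal sketch of the "validate and use `vocabulary` if fit was never called" idea. It is plain Python so it runs without scikit-learn. `ToyCountVectorizer` and `_validated_vocabulary` are hypothetical names, and whitespace tokenization is an assumption. The `requires_fit=False` tag is not modeled because it depends on scikit-learn's estimator-tag machinery.

```python
# Hypothetical toy vectorizer, NOT scikit-learn's CountVectorizer: it only
# sketches "validate and use `vocabulary` if fit was never called" without
# assigning a learned attribute outside fit().


class ToyCountVectorizer:
    def __init__(self, vocabulary=None):
        self.vocabulary = vocabulary

    def _validated_vocabulary(self):
        """Return the user-supplied vocabulary as a term -> column dict."""
        vocab = self.vocabulary
        if vocab is None:
            raise ValueError("Vocabulary not fitted or provided")
        if not isinstance(vocab, dict):
            terms = list(vocab)
            if len(set(terms)) != len(terms):
                raise ValueError("Duplicate term in vocabulary")
            vocab = {term: i for i, term in enumerate(terms)}
        if sorted(vocab.values()) != list(range(len(vocab))):
            raise ValueError("Vocabulary indices must be exactly 0..n-1")
        return vocab

    def fit(self, documents):
        # fit() is the only place that sets the learned attribute vocabulary_.
        if self.vocabulary is not None:
            self.vocabulary_ = self._validated_vocabulary()
        else:
            terms = sorted({tok for doc in documents for tok in doc.split()})
            self.vocabulary_ = {term: i for i, term in enumerate(terms)}
        return self

    def transform(self, documents):
        # Use the fitted vocabulary if there is one; otherwise validate the
        # constructor argument into a local variable, leaving self unchanged.
        vocab = getattr(self, "vocabulary_", None)
        if vocab is None:
            vocab = self._validated_vocabulary()
        rows = []
        for doc in documents:
            row = [0] * len(vocab)
            for tok in doc.split():
                if tok in vocab:
                    row[vocab[tok]] += 1
            rows.append(row)
        return rows


vec = ToyCountVectorizer(vocabulary=["apple", "banana"])
print(vec.transform(["apple banana apple", "cherry"]))  # [[2, 1], [0, 0]]
print(hasattr(vec, "vocabulary_"))  # False
```

The design choice is that `transform` re-validates on every call when unfitted. This costs a little work, but it keeps `vocabulary_` a fit-only attribute.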
Yes, both of these sound reasonable to me.

@thomasjpfan do you remember someone (you? sprint?) working on this? I have some vague memories.

We may have spoken about this IRL regarding

Hm, OK, might have also been me. Either way, it seems to still be open?
This issue doesn't seem to be relevant anymore? There are two lines where
scikit-learn/sklearn/feature_extraction/text.py, lines 1286 to 1354 at ee5fddc
and one is inside
scikit-learn/sklearn/feature_extraction/text.py, lines 463 to 491 at ee5fddc
It's set in

Hmm, I'm not sure, because it seems to create
Right now `CountVectorizer` sometimes sets `self.vocabulary_` outside of `fit`. We usually prohibit this, but the common tests haven't reached the vectorizers yet.
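To make the reported behavior concrete, here is a plain-Python sketch, with no scikit-learn needed, of a check in the spirit of those common tests. It snapshots the estimator's attributes, calls `transform`, and reports what changed. `EagerVocabVectorizer` and `attrs_changed_by_transform` are hypothetical stand-ins that mimic the behavior described in the issue; they are not scikit-learn code. scikit-learn's common tests contain a related check, `check_dict_unchanged`.

```python
import copy

_MISSING = object()  # sentinel for "attribute not present"


class EagerVocabVectorizer:
    """Hypothetical stand-in mimicking the reported behavior: transform()
    sets self.vocabulary_ when fit() was never called."""

    def __init__(self, vocabulary=None):
        self.vocabulary = vocabulary

    def transform(self, documents):
        if not hasattr(self, "vocabulary_"):
            if self.vocabulary is None:
                raise ValueError("Vocabulary not fitted or provided")
            # The side effect under discussion: a learned attribute set outside fit().
            self.vocabulary_ = {t: i for i, t in enumerate(self.vocabulary)}
        # One count per vocabulary term, in column order (whitespace tokens).
        return [[doc.split().count(t) for t in self.vocabulary_] for doc in documents]


def attrs_changed_by_transform(estimator, documents):
    """Return sorted names of attributes that transform() added, removed or changed."""
    before = copy.deepcopy(vars(estimator))
    estimator.transform(documents)
    after = vars(estimator)
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name, _MISSING) != after.get(name, _MISSING)
    )


vec = EagerVocabVectorizer(vocabulary=["apple", "banana"])
print(attrs_changed_by_transform(vec, ["apple banana"]))  # ['vocabulary_']
print(attrs_changed_by_transform(vec, ["apple banana"]))  # []
```

Only the first `transform` on an unfitted instance shows the change. Once `vocabulary_` exists, later calls leave the state alone.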