[MRG+1] Issue w/ tf-idf computation #5900
Branch updated from ba5dd42 to 5efb349.
Looks good to me. This makes it possible to use the canonical tf-idf without changing the current behaviour. +1
Could you please add a test and update the documentation to discuss this topic?
@larsmans I think this is a reasonable fix, but I would also appreciate your opinion :)
LGTM, but this needs a test, e.g., one that compares the vectorizer output to a straightforward textbook calculation of tf-idf.
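As a sketch of the kind of textbook calculation such a test could check against, the hypothetical helper `textbook_tfidf` below computes raw term counts times idf = ln(n / df), with no smoothing and no +1 offset. It assumes whitespace tokenization and the natural log; it is an illustration, not scikit-learn code.

```python
import math
from collections import Counter


def textbook_tfidf(docs):
    """Hypothetical helper: textbook tf-idf with tf = raw count, idf = ln(n / df).

    Assumes whitespace tokenization; no smoothing, no +1 offset, no normalization.
    Returns (per-document weight dicts, idf dict).
    """
    n = len(docs)
    tokenized = [doc.split() for doc in docs]
    # Document frequency: number of documents containing each term at least once.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    idf = {term: math.log(n / count) for term, count in df.items()}
    weights = [
        {term: tf * idf[term] for term, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]
    return weights, idf


docs = ["the cat sat", "the dog sat", "the cat ran"]
weights, idf = textbook_tfidf(docs)
print(idf)
print(weights)
```

Here "the" occurs in all three documents, so its textbook idf and weight are exactly zero. If a vectorizer uses idf = ln(n / df) + 1 instead (the non-smoothed formula current scikit-learn documentation describes), then with no normalization each weight would come out larger than these textbook values by exactly the raw term count, which is one concrete difference a test could assert.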
@@ -990,7 +995,7 @@ def fit(self, X, y=None):
# log+1 instead of log makes sure terms with zero idf don't get
This comment should be updated.
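The effect the "log+1 instead of log" comment describes can be seen with a hypothetical helper `idf`. A term that occurs in every document has df = n, so ln(n / df) = 0 and its tf-idf weight is zero whatever its term frequency. Adding 1 keeps such a term at weight tf × 1. This is a minimal sketch, assuming the natural log and no smoothing.

```python
import math


def idf(n_docs, df, add_one):
    """Hypothetical helper: inverse document frequency ln(n_docs / df), optionally shifted by +1."""
    value = math.log(n_docs / df)
    return value + 1.0 if add_one else value


n_docs = 4
# Compare a term found in all 4 documents (df = 4) with one found in a single document (df = 1).
for df in (4, 1):
    print(df, idf(n_docs, df, add_one=False), idf(n_docs, df, add_one=True))
```

With the plain log, the df = 4 term is suppressed entirely. With log+1, every idf is at least 1, so no term drops out, at the cost of no longer matching the textbook definition.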
I don't think either the name
This looks stalled, and I'd argue #7015 fixes it.
Hello,
See the
@jnothman Looking at the other issue, no, I don't think that parameter actually controls that, and there is no way to compute the original tf-idf, IIRC.
Fixes #4391