DOC be more explicit about tfidf-formula #6996
Comments
Seems like there was a PR addressing part of the issue at #6839, but it was closed by the author.
Yeah, the tf-idf is a tricky one ... I tried to reproduce the results ~1 year ago and got different numbers, so I looked a bit closer at the code at that time. I just found an older write-up based on that, which may be useful as a template (for the equations) -- given that nothing has changed since then: http://nbviewer.jupyter.org/github/rasbt/pattern_classification/blob/master/machine_learning/scikit-learn/tfidf_scikit-learn.ipynb#Tf-idf-in-scikit-learn
Thanks, yeah, that's a good reference. I would add a shorter version of that to the docstring and the user guide.
Okay, let me work on this one then ...
Hello @amueller - looks like a PR was submitted weeks ago; is this issue closed or does it need more work?
The actual formula used in our tf-idf formulation is somewhat non-obvious (see #2998 and #4391). For now, I think it would be good to be more explicit about what is actually computed, both in the user guide and in the docstring itself.
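
To make "what is actually computed" concrete, here is a minimal pure-Python sketch of the formula as I understand it from the documented defaults of `TfidfTransformer` (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`): raw term counts are multiplied by `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, and each row is then scaled to unit Euclidean length. The function name `tfidf_sketch` and the toy count matrix are illustrative assumptions, not part of scikit-learn, and this is not the library's implementation.

```python
import math


def tfidf_sketch(counts, smooth_idf=True):
    """Hypothetical helper: tf-idf over a dense list-of-lists count matrix.

    Assumes raw counts as tf and L2 row normalization. With
    smooth_idf=False, every term must occur in at least one document,
    otherwise the idf would divide by zero.
    """
    n_docs = len(counts)
    n_terms = len(counts[0])

    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]

    # Smoothing adds one to numerator and denominator, as if one extra
    # document contained every term exactly once.
    s = 1 if smooth_idf else 0
    idf = [math.log((n_docs + s) / (df[j] + s)) + 1.0 for j in range(n_terms)]

    result = []
    for row in counts:
        weighted = [tf * w for tf, w in zip(row, idf)]
        norm = math.sqrt(sum(v * v for v in weighted))
        # Leave all-zero rows untouched instead of dividing by zero.
        result.append([v / norm for v in weighted] if norm > 0 else weighted)
    return result


# Toy count matrix: 6 documents, 3 terms (illustrative data).
counts = [
    [3, 0, 1],
    [2, 0, 0],
    [3, 0, 0],
    [4, 0, 0],
    [3, 2, 0],
    [3, 0, 2],
]
tfidf = tfidf_sketch(counts)
print([round(v, 4) for v in tfidf[0]])
```

The `+ 1` added to the idf is the non-obvious part: it means a term that occurs in every document still gets weight 1 rather than 0, which differs from the textbook `ln(n / df(t))` definition.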