DOC be more explicit about tfidf-formula · Issue #6996 · scikit-learn/scikit-learn · GitHub


DOC be more explicit about tfidf-formula #6996


Closed
amueller opened this issue Jul 15, 2016 · 6 comments · Fixed by #7015
Labels
Documentation, Easy (well-defined and straightforward way to resolve), Sprint

Comments

@amueller
Member

The actual formula used in our tf-idf implementation is somewhat non-obvious (see #2998 and #4391). For now, I think it would be good to be more explicit about what is actually computed, both in the user guide and in the docstring itself.
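For reference, here is a sketch of what the current scikit-learn user guide describes as the computation under the default settings (`smooth_idf=True`, `sublinear_tf=False`, `norm='l2'`). Here $n$ is the number of documents and $\mathrm{df}(t)$ is the number of documents containing term $t$. Whether the code at the time of this issue matched this in every detail is an assumption, not something stated in the thread.

```latex
\begin{align}
\mathrm{tf}(t, d) &= \text{raw count of } t \text{ in } d \\
\mathrm{idf}(t) &= \ln\frac{1 + n}{1 + \mathrm{df}(t)} + 1 && (\texttt{smooth\_idf=True}) \\
\mathrm{idf}(t) &= \ln\frac{n}{\mathrm{df}(t)} + 1 && (\texttt{smooth\_idf=False}) \\
w(t, d) &= \mathrm{tf}(t, d) \cdot \mathrm{idf}(t) \\
\hat{w}(t, d) &= \frac{w(t, d)}{\sqrt{\sum_{t'} w(t', d)^2}} && (\texttt{norm='l2'})
\end{align}
```

The "+1" outside the logarithm means a term that occurs in every document keeps weight 1 instead of being zeroed out, which differs from the textbook $\log(n/\mathrm{df}(t))$ definition.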

@amueller added the Easy, Documentation, and Sprint labels Jul 15, 2016
@nelson-liu
Contributor

Seems like there was a PR addressing part of the issue at #6839, but it was closed by the author.

@rasbt
Contributor
rasbt commented Jul 16, 2016

Yeah, tf-idf is a tricky one... About a year ago I tried to reproduce the results by hand and got different values, so I looked more closely at the code at that time. I just found an older write-up based on that, which may be useful as a template for the equations, assuming nothing has changed since then: http://nbviewer.jupyter.org/github/rasbt/pattern_classification/blob/master/machine_learning/scikit-learn/tfidf_scikit-learn.ipynb#Tf-idf-in-scikit-learn
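To illustrate why hand-computed values often disagree with the library, the sketch below recomputes tf-idf from raw counts using the smoothed idf and L2 row normalization that scikit-learn documents as its defaults. The helper name `manual_tfidf` and its parameters are hypothetical, and the lowercase whitespace tokenization is an assumption: scikit-learn's default `token_pattern` (`r"(?u)\b\w\w+\b"`) drops one-character tokens, so results should only be expected to match `TfidfVectorizer` on corpora where that difference does not matter.

```python
import math
from collections import Counter


def manual_tfidf(docs, smooth_idf=True, use_l2_norm=True):
    """Hypothetical helper: tf-idf with raw counts, (optionally smoothed) idf,
    and L2 row normalization, following the formula scikit-learn documents."""
    # Assumption: lowercase + whitespace split instead of sklearn's token_pattern.
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for doc in tokenized for tok in doc})
    n = len(tokenized)

    # df(t): number of documents that contain term t at least once.
    df = {t: sum(1 for doc in tokenized if t in doc) for t in vocab}

    if smooth_idf:
        # idf(t) = ln((1 + n) / (1 + df(t))) + 1
        idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in vocab}
    else:
        # idf(t) = ln(n / df(t)) + 1
        idf = {t: math.log(n / df[t]) + 1 for t in vocab}

    rows = []
    for doc in tokenized:
        counts = Counter(doc)  # tf(t, d) = raw count of t in d
        row = [counts[t] * idf[t] for t in vocab]
        if use_l2_norm:
            # Divide each row by its Euclidean length (norm='l2').
            length = math.sqrt(sum(v * v for v in row))
            if length > 0:
                row = [v / length for v in row]
        rows.append(row)
    return vocab, idf, rows


if __name__ == "__main__":
    corpus = ["the cat sat", "the dog sat", "the cat ran"]
    vocab, idf, rows = manual_tfidf(corpus)
    print(vocab)
    for term in vocab:
        print(f"{term}: idf = {idf[term]:.4f}")
```

With this toy corpus, "the" appears in all three documents and still gets an idf of exactly 1 rather than 0, which is the kind of detail that makes naive reproductions come out different.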

@amueller
Member Author

Thanks, yeah, that's a good reference. I would add a shorter version of it to the docstring and the user guide.

@rasbt
Copy link
Contributor
rasbt commented Jul 16, 2016

Okay, let me work on this one then ...

@vharavu
Contributor
vharavu commented Aug 8, 2016

Hello @amueller, it looks like a PR was submitted weeks ago; is this issue resolved, or does it need more work?
Thanks.

@nelson-liu
Copy link
Contributor

@vharavu I think @rasbt's current pull request is sufficient for now.

4 participants