DOC be more explicit about tfidf-formula #6996

amueller · 2016-07-15T19:20:18Z

The actual formula used in our tf-idf implementation is somewhat non-obvious (see #2998 and #4391). For now, I think it would be good to be more explicit about what is actually computed, both in the user guide and in the docstring itself.
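For context, here is a minimal sketch of the computation in question, assuming the default `TfidfTransformer` settings (`smooth_idf=True`, `norm='l2'`, `sublinear_tf=False`) and the idf definition given in the scikit-learn user guide: `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, or `ln(n / df(t)) + 1` without smoothing. The function name `tfidf` and the toy `counts` matrix are illustrative only, not part of the library, and the code is plain Python rather than the library's own implementation.

```python
import math


def tfidf(counts, smooth_idf=True):
    """Illustrative tf-idf on a list of per-document term-count rows.

    Assumes every term occurs in at least one document when
    smooth_idf=False (otherwise df would be 0 and the division fails).
    """
    n_docs = len(counts)
    n_terms = len(counts[0])

    # Document frequency: number of documents containing each term.
    df = [sum(1 for row in counts if row[j] > 0) for j in range(n_terms)]

    if smooth_idf:
        # Acts as if one extra document containing every term was seen.
        idf = [math.log((1 + n_docs) / (1 + d)) + 1 for d in df]
    else:
        idf = [math.log(n_docs / d) + 1 for d in df]

    result = []
    for row in counts:
        # Raw term count times idf; the "+ 1" in idf means terms that
        # occur in every document are not zeroed out.
        weighted = [tf * w for tf, w in zip(row, idf)]
        # Scale each document vector to unit Euclidean (L2) length.
        norm = math.sqrt(sum(v * v for v in weighted))
        result.append([v / norm if norm > 0 else 0.0 for v in weighted])
    return result


# Hypothetical toy corpus: 6 documents, 3 terms.
counts = [
    [3, 0, 1],
    [2, 0, 0],
    [3, 0, 0],
    [4, 0, 0],
    [3, 2, 0],
    [3, 0, 2],
]

for row in tfidf(counts):
    print([round(v, 4) for v in row])
```

The two points that tend to surprise people are the `+ 1` added to the idf term and the smoothing inside the logarithm; both differ from the textbook `log(n / df(t))` definition, which is why hand-computed values often fail to match.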

nelson-liu · 2016-07-15T19:24:17Z

It seems there was a PR addressing part of this issue (#6839), but it was closed by its author.

rasbt · 2016-07-16T00:25:57Z

Yeah, tf-idf is a tricky one ... I tried to reproduce the results about a year ago and got different numbers, so I looked a bit closer at the code at that time. I just found an older write-up based on that, which may be useful as a template (for the equations), given that nothing has changed since then: http://nbviewer.jupyter.org/github/rasbt/pattern_classification/blob/master/machine_learning/scikit-learn/tfidf_scikit-learn.ipynb#Tf-idf-in-scikit-learn

amueller · 2016-07-16T15:14:22Z

Thanks, yeah, that's a good reference. I would add a shorter version of that to the docstring and the user guide.

rasbt · 2016-07-16T16:23:12Z

Okay, let me work on this one then ...

vharavu · 2016-08-08T23:37:02Z

Hello @amueller - looks like a PR was submitted weeks ago; is this issue closed or does it need more work?
thx.

nelson-liu · 2016-08-08T23:46:00Z

@vharavu I think @rasbt's current pull request is sufficient for now...

amueller added the Easy, Documentation, and Sprint labels Jul 15, 2016

amueller added the Need Contributor label Jul 16, 2016

amueller removed the Need Contributor label Jul 16, 2016

rasbt mentioned this issue Jul 16, 2016

[MRG + 2] extending tfidf documentation #7015

Merged

jnothman closed this as completed in #7015 Aug 12, 2016
