10000 [MRG+1] Issue w/ tf-idf computation by ermakovpetr · Pull Request #5900 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

[MRG+1] Issue w/ tf-idf computation #5900

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ermakovpetr
Copy link

Fixes #4391

< 8000 /div>
@ermakovpetr ermakovpetr force-pushed the #4391 branch 3 times, most recently from ba5dd42 to 5efb349 Compare November 22, 2015 22:29
@glouppe
Copy link
Contributor
glouppe commented Feb 11, 2016

Looks good to me. This makes it possible to use the canonical tf-idf, without changing the current behaviour. +1

@glouppe glouppe changed the title Issue w/ tf-idf computation [MRG+1] Issue w/ tf-idf computation Feb 11, 2016
@ogrisel
Copy link
Member
ogrisel commented Feb 18, 2016

Could you please add a test and update the documentation to discuss this topic?

@ogrisel
Copy link
Member
ogrisel commented Feb 18, 2016

@larsmans I think this is a reasonable fix but I would also appreciate your opinion :)

@larsmans
Copy link
Member

LGTM but needs a test, e.g., one that compares vectorizer output to a straightforward textbook calculation of tf-idf.

@@ -990,7 +995,7 @@ def fit(self, X, y=None):

# log+1 instead of log makes sure terms with zero idf don't get
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This comment should be updated.

@jnothman
Copy link
Member

I don't think either the name additional_idf nor the current parameter description enlightens to the user that this is how much tf should be weighted independent of df. Can we come up with a better parameter name?

@amueller
Copy link
Member

This looks stalled and I'd argue #7015 fixes it.

@amueller amueller closed this Oct 11, 2016
@AliceGab
Copy link
AliceGab commented Oct 23, 2018

Hello,
Was this implemented in the end ? It seems that the current code doesn't support tf-idf = tf x idf instead of tf-idf = tf + tf x idf...
Otherwise, is there any way to go around it by calculating tf and idf separately ?
Thanks !

@jnothman
Copy link
Member

See the smooth_idf parameter

@amueller
Copy link
Member

@jnothman looking at the other issue, no I don't think that parameter actually controls that and there is no way to compute the original tf-idf iirc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants
0