[MRG+1] Issue w/ tf-idf computation #5900

ermakovpetr · 2015-11-22T14:47:16Z

Fixes #4391


glouppe · 2016-02-11T10:08:19Z

Looks good to me. This makes it possible to use the canonical tf-idf, without changing the current behaviour. +1

ogrisel · 2016-02-18T14:05:54Z

Could you please add a test and update the documentation to discuss this topic?

ogrisel · 2016-02-18T14:06:25Z

@larsmans I think this is a reasonable fix but I would also appreciate your opinion :)

larsmans · 2016-02-18T14:38:36Z

LGTM but needs a test, e.g., one that compares vectorizer output to a straightforward textbook calculation of tf-idf.
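A rough sketch of the "straightforward textbook calculation" such a test could compare against, assuming raw term counts for tf, a natural-log idf of ln(N / df), and no normalization. The helper name `textbook_tfidf` and the whitespace tokenization are illustrative assumptions, not part of this PR or of scikit-learn.

```python
import math
from collections import Counter


def textbook_tfidf(docs):
    """Return one {term: weight} dict per document, using tf * ln(N / df).

    Hypothetical reference helper: raw counts for tf, no smoothing,
    no normalization, whitespace tokenization.
    """
    n_docs = len(docs)
    tokenized = [doc.lower().split() for doc in docs]
    # Document frequency: number of documents that contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # raw term counts within this document
        weights.append(
            {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        )
    return weights


docs = ["the cat sat", "the dog sat down", "the bird"]
w = textbook_tfidf(docs)
# "the" occurs in every document, so its textbook weight is exactly zero.
```

A test along these lines would fit the vectorizer on the same documents with normalization turned off and check each entry against these reference weights.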

jnothman · 2016-05-29T13:04:02Z

sklearn/feature_extraction/text.py

@@ -990,7 +995,7 @@ def fit(self, X, y=None):

            # log+1 instead of log makes sure terms with zero idf don't get


This comment should be updated.

jnothman · 2016-05-29T13:50:58Z

I don't think either the name additional_idf or the current parameter description makes clear to the user that this is how much tf should be weighted independent of df. Can we come up with a better parameter name?

amueller · 2016-10-11T02:31:38Z

This looks stalled and I'd argue #7015 fixes it.

AliceGab · 2018-10-23T01:46:15Z

Hello,
Was this implemented in the end? It seems that the current code doesn't support tf-idf = tf x idf instead of tf-idf = tf + tf x idf...
Otherwise, is there any way to work around it by calculating tf and idf separately?
Thanks !
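To make the two formulas in this question concrete, here is a minimal sketch that computes tf and idf separately and combines them both ways. It assumes an unsmoothed natural-log idf, and the function names are hypothetical, chosen only for illustration. It is not the library's implementation.

```python
import math


def idf_plain(n_docs, df):
    """Unsmoothed idf: ln(N / df)."""
    return math.log(n_docs / df)


def weight_textbook(tf, n_docs, df):
    """Canonical weighting: tf * idf."""
    return tf * idf_plain(n_docs, df)


def weight_plus_one(tf, n_docs, df):
    """'+1' weighting: tf * (idf + 1), which equals tf + tf * idf."""
    return tf * (idf_plain(n_docs, df) + 1.0)


# A term present in all 10 documents: the textbook weight vanishes,
# while the '+1' variant keeps the raw term frequency.
everywhere_textbook = weight_textbook(3, 10, 10)
everywhere_plus_one = weight_plus_one(3, 10, 10)
```

The difference between the two weightings is always exactly tf, which is why terms that occur in every document are dropped by the first and kept by the second.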

jnothman · 2018-10-27T20:51:04Z

See the smooth_idf parameter

amueller · 2018-10-28T21:32:20Z

@jnothman looking at the other issue, no I don't think that parameter actually controls that and there is no way to compute the original tf-idf iirc.

ermakovpetr force-pushed the #4391 branch 3 times, most recently from ba5dd42 to 5efb349 Compare November 22, 2015 22:29

Issue w/ tf-idf computation

8957461

ermakovpetr force-pushed the #4391 branch from 5efb349 to 8957461 Compare November 22, 2015 22:41

This was referenced Nov 22, 2015

Issue w/ tf-idf computation #4391 #4715

Closed

Issue w/ tf-idf computation #4391

Closed

amueller added the Waiting for Reviewer label Jan 15, 2016

glouppe changed the title ~~Issue w/ tf-idf computation~~ [MRG+1] Issue w/ tf-idf computation Feb 11, 2016

krishnakalyan3 mentioned this pull request May 28, 2016

TfidfVectorizer documentation doesn't track TfidfTransformer #6766

Closed

jnothman reviewed May 29, 2016

amueller closed this Oct 11, 2016
