Curious for IDF calculations #4544

l1th1um · 2015-04-08T03:58:38Z

Hi,

I'm curious about the IDF calculation in sklearn/feature_extraction/text.py, line 978.

In 0.9, the formula for calculating IDF was log(n_samples / (df + 1)); in 0.16.0 the formula became log((n_samples + 1) / (df + 1)) + 1.

According to the Cornell SMART system, the formula for IDF is log((n_samples + 1) / (df + 1)). So, may I know the reason, or a reference paper, for adding another +1 at the end of the formula?

*Assuming smooth_idf = True
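For concreteness, here is a minimal sketch that evaluates the three formulas quoted above side by side. It is plain Python rather than the library code, and the function names (`idf_v09`, `idf_smart`, `idf_plus_one`) and the toy value `n_samples = 4` are made up for illustration only.

```python
import math


def idf_v09(n_samples, df):
    """Formula quoted above for 0.9: log(n_samples / (df + 1))."""
    return math.log(n_samples / (df + 1))


def idf_smart(n_samples, df):
    """Formula attributed above to SMART: log((n_samples + 1) / (df + 1))."""
    return math.log((n_samples + 1) / (df + 1))


def idf_plus_one(n_samples, df):
    """Formula quoted above for the newer release: the smoothed log plus 1."""
    return math.log((n_samples + 1) / (df + 1)) + 1


n_samples = 4  # hypothetical corpus size
for df in (1, 2, 4):
    # Print each variant for a term that appears in `df` documents.
    print(
        f"df={df}: "
        f"v0.9={idf_v09(n_samples, df):.3f}  "
        f"smart={idf_smart(n_samples, df):.3f}  "
        f"plus_one={idf_plus_one(n_samples, df):.3f}"
    )
```

One observable difference: for a term that occurs in every document (df == n_samples), the smoothed formula without the trailing +1 evaluates to log(1) = 0, so that term's tf-idf weight is zeroed out, whereas the version with the trailing +1 gives it a weight of 1. Whether that is the motivation behind the change is what the question asks.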

jnothman · 2015-04-08T05:34:25Z

See #4391, #2998

l1th1um · 2015-04-08T06:27:25Z

thx

l1th1um closed this as completed Apr 8, 2015
