I'm curious about the IDF calculation in sklearn/feature_extraction/text.py (line 978).
In 0.9, the formula for calculating IDF was log(n_samples / (df + 1)); in 0.16.0 it became log((n_samples + 1) / (df + 1)) + 1.
According to the Cornell SMART system, the IDF formula is log((n_samples + 1) / (df + 1)). So, may I know the reason, or a reference paper, for adding another +1 at the end of the formula?
*This is with smooth_idf=True.
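
To make the comparison concrete, here is a minimal sketch of the three variants mentioned above, written in plain Python. The function names are hypothetical and chosen only for illustration; this is not scikit-learn's actual code, just the formulas as quoted in this post.

```python
import math

# Hypothetical helper names; each one transcribes a formula quoted in the post.

def idf_old(n_samples, df):
    # Formula attributed to 0.9: log(n_samples / (df + 1))
    return math.log(n_samples / (df + 1))

def idf_smart_smoothed(n_samples, df):
    # Formula attributed to the SMART system: log((n_samples + 1) / (df + 1))
    return math.log((n_samples + 1) / (df + 1))

def idf_new(n_samples, df):
    # Formula attributed to 0.16.0: the smoothed form plus a trailing 1
    return math.log((n_samples + 1) / (df + 1)) + 1.0

if __name__ == "__main__":
    n_samples = 4
    # df is the number of documents that contain the term
    for df in (1, 2, 4):
        print(df, idf_old(n_samples, df),
              idf_smart_smoothed(n_samples, df),
              idf_new(n_samples, df))
```

One thing the sketch makes visible: for a term that appears in every document (df equal to n_samples), the smoothed form without the trailing +1 evaluates to exactly 0, the form with the trailing +1 evaluates to 1, and the older form is negative. Whether that is the intended motivation for the change is exactly what this issue is asking.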