TfidfVectorizer documentation doesn't track TfidfTransformer #6766
Comments
Actually, I'm not entirely happy with the documentation of TfidfTransformer.
and neither documents the default value of "norm".
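For reference, a quick check of the `norm` default, assuming a recent scikit-learn release. The three-document corpus is made up for illustration. The check confirms that both classes default to `norm="l2"` and that, with this default, every document row has unit Euclidean length.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfTransformer, TfidfVectorizer

# Toy corpus (made up for this example)
docs = ["the cat sat", "the dog sat", "the cat ran away"]

vec = TfidfVectorizer()
X = vec.fit_transform(docs)

# Both classes default to norm="l2"
print(vec.norm, TfidfTransformer().norm)  # l2 l2

# With norm="l2", each row (one document) is scaled to Euclidean length 1
row_norms = np.linalg.norm(X.toarray(), axis=1)
print(np.allclose(row_norms, 1.0))  # True
```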
slightly unrelated: the
@amueller May I take up this task? As far as I understand, the scope of this task is to update the TfidfVectorizer docstring. Please let me know in case I have missed any details.
@krishnakalyan3 go ahead, but @amueller suggested
Apart from not being sure what distinction you make between 'count' and 'frequency', "document frequency" is jargon in Information Retrieval, meaning specifically "the number of documents a particular term appears in".
Shouldn't the default value of analyzer be given in the comments as well?
@HimaVarsha94 yes, but that seems to be a different issue.
From the looks of it, someone can take this up from #6839.
I'll take a look at this today. The plan is to start by making the TfidfVectorizer docstring more detailed and possibly to wordsmith the docstrings for TfidfVectorizer and TfidfTransformer.
thanks @y3l2n :)
btw @y3l2n there was an earlier attempt at #6839 but I'm not sure what happened to it.
@amueller @y3l2n I was not exactly sure how to go about it; that's why I closed this PR. I am not working on this issue anymore. Please feel free to work on it.
I am working on this!
There are important details of the tf-idf transformation in the TfidfTransformer that are not reflected in the TfidfVectorizer's docstring. I think they should be added there.
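One way to see why the TfidfTransformer details apply to TfidfVectorizer too: with the same tf-idf options, TfidfVectorizer gives the same matrix as CountVectorizer followed by TfidfTransformer. A minimal check, assuming a recent scikit-learn release and a made-up corpus; the options passed are the defaults, spelled out explicitly.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus (made up for this example)
docs = ["the cat sat", "the dog sat", "the cat ran away"]

# The tf-idf options shared by both classes (these are their defaults)
opts = dict(norm="l2", use_idf=True, smooth_idf=True, sublinear_tf=False)

# One step: tokenize, count and weight in a single estimator
direct = TfidfVectorizer(**opts).fit_transform(docs).toarray()

# Two steps: raw counts first, then the tf-idf weighting
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer(**opts).fit_transform(counts).toarray()

print(np.allclose(direct, two_step))  # True
```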
Also, I think the formula that is used should be added to the narrative documentation.
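For reference, with the default `smooth_idf=True` scikit-learn computes `idf(t) = ln((1 + n) / (1 + df(t))) + 1`, where `n` is the number of documents and `df(t)` the document frequency of term `t` (with `smooth_idf=False` it is `ln(n / df(t)) + 1`). The weight is `tf(t, d) * idf(t)` using raw counts as `tf`, and each row is then L2-normalized. The sketch below recomputes this by hand on a made-up corpus and compares it with TfidfVectorizer's defaults, assuming a recent scikit-learn release.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus (made up for this example)
docs = ["the cat sat", "the dog sat", "the cat ran away"]

# Raw term counts tf(t, d); columns follow the same sorted vocabulary
tf = CountVectorizer().fit_transform(docs).toarray().astype(float)
n = tf.shape[0]            # number of documents
df = (tf > 0).sum(axis=0)  # document frequency of each term

# Smoothed idf (smooth_idf=True): as if one extra document contained every term once
idf = np.log((1 + n) / (1 + df)) + 1

# tf-idf weights, then L2-normalize each document row (norm="l2")
raw = tf * idf
manual = raw / np.linalg.norm(raw, axis=1, keepdims=True)

vec = TfidfVectorizer()
expected = vec.fit_transform(docs).toarray()
print(np.allclose(idf, vec.idf_), np.allclose(manual, expected))  # True True
```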