DOC TfidfVectorizer documentation details by krishnakalyan3 · Pull Request #6839 · scikit-learn/scikit-learn · GitHub

DOC TfidfVectorizer documentation details #6839


Closed
wants to merge 2 commits into from

krishnakalyan3
Contributor
@krishnakalyan3 krishnakalyan3 commented May 29, 2016

Reference Issue

Fixes #6766

What does this implement/fix? Explain your changes.

Added important details of the tf-idf transformation to TfidfVectorizer's docstring.
Added default values of norm.

Any other comments?


Tf is "n" (natural) by default, "l" (logarithmic) when sublinear_tf=True.
Idf is "t" when use_idf is given, "n" (none) otherwise.
Normalization is "c" (cosine) when norm='l2', "n" (none) when norm=None.
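A minimal sketch of how the three settings quoted above map onto `TfidfVectorizer` parameters. The toy corpus and variable names are invented for illustration, and the behaviour shown assumes a reasonably recent scikit-learn release rather than the exact version under review here.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, made up for this illustration only.
docs = ["the cat sat", "the cat sat on the mat", "dogs chase the cat"]

# Defaults: natural tf ("n"), idf weighting on ("t"), cosine norm ("c").
default = TfidfVectorizer()
X_default = default.fit_transform(docs)

# sublinear_tf=True switches tf to the logarithmic variant ("l").
X_sublinear = TfidfVectorizer(sublinear_tf=True).fit_transform(docs)

# use_idf=False and norm=None turn off idf and normalization ("n", "n"),
# which leaves the raw term counts as floats.
raw = TfidfVectorizer(use_idf=False, norm=None)
X_raw = raw.fit_transform(docs)

# With norm='l2' (the default) every non-empty row has unit Euclidean length.
row_norms = np.linalg.norm(X_default.toarray(), axis=1)
print(row_norms)
print(X_raw[1, raw.vocabulary_["the"]])  # count of "the" in the second document
```

Only the second document contains a repeated term, so that is the only row where the sublinear setting changes the result.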
Member

I feel these 3 lines can be omitted. The parameters below state the same thing.

@jnothman
Member
jnothman commented Jun 3, 2016

@MechCoder all (or is it only most) of this text is copied from TfidfTransformer
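For context on why the two docstrings overlap: the scikit-learn documentation describes `TfidfVectorizer` as equivalent to `CountVectorizer` followed by `TfidfTransformer`. The sketch below checks that on an invented toy corpus, assuming default parameters and a recent scikit-learn release.

```python
import numpy as np
from sklearn.feature_extraction.text import (
    CountVectorizer,
    TfidfTransformer,
    TfidfVectorizer,
)

# Toy corpus, made up for this illustration only.
docs = ["the cat sat", "the cat sat on the mat", "dogs chase the cat"]

# Two-step route: raw counts first, then the tf-idf transformation.
counts = CountVectorizer().fit_transform(docs)
two_step = TfidfTransformer().fit_transform(counts)

# One-step route: the vectorizer performs both stages internally.
one_step = TfidfVectorizer().fit_transform(docs)

same = np.allclose(one_step.toarray(), two_step.toarray())
print(same)
```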

@jnothman
Member
jnothman commented Jun 3, 2016

At #6766, @amueller felt there were other aspects that needed further explication in TfidfTransformer and the related narrative documentation (i.e. the user guide in the doc folder), as well as in TfidfVectorizer. Other than making the formula more explicit (I don't know what precise fault he sees in the current version), I'm not sure what they are. Do you, @krishnakalyan3, feel that between the docstrings and the narrative documentation it's clear what TfidfTransformer is doing (and why)?

@krishnakalyan3
Contributor Author

@jnothman @MechCoder you are right, I copied the text from TfidfTransformer. I understood the narrative after going through related issues like #2998 (there is an extra tf to boost the term frequencies, which apparently gives better results for classification and clustering). I am not entirely sure what more should be added to the documentation. I would be grateful if you could guide me. Thanks.
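One way to read the "extra tf" remark is as the `+ 1` that scikit-learn adds to the idf term, so that tf * (idf + 1) = tf * idf + tf. The sketch below checks this numerically. It assumes the current defaults (`smooth_idf=True`, natural log), which may not match every historical release, and it uses an invented toy corpus with `norm=None` so the unnormalized weights are visible.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Toy corpus, made up for this illustration only.
docs = ["the cat sat", "the cat sat on the mat", "dogs chase the cat"]

# Raw term counts (tf) and document frequencies (df).
counts = CountVectorizer().fit_transform(docs).toarray()
n_docs = counts.shape[0]
df = (counts > 0).sum(axis=0)

# Smoothed idf without the added constant: ln((1 + n) / (1 + df)).
plain_idf = np.log((1 + n_docs) / (1 + df))

# norm=None disables normalization so the weights can be compared directly.
vec = TfidfVectorizer(norm=None)
weights = vec.fit_transform(docs).toarray()

# The fitted idf_ should be plain_idf + 1, so each weight is tf * idf + tf.
idf_matches = np.allclose(vec.idf_, plain_idf + 1)
extra_tf_matches = np.allclose(weights, counts * plain_idf + counts)
print(idf_matches, extra_tf_matches)
```

A term that occurs in every document has `plain_idf` equal to zero, so without the added constant its weight would vanish; with it, the weight falls back to the raw count.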

@krishnakalyan3 krishnakalyan3 deleted the 6766 branch June 18, 2016 19:22
@MechCoder
Member

@krishnakalyan3 Why did you close this?


Successfully merging this pull request may close these issues.

TfidfVectorizer documentation doesn't track TfidfTransformer
3 participants