DOC Rework plot_hashing_vs_dict_vectorizer.py example by ArturoAmorQ · Pull Request #23266 · scikit-learn/scikit-learn · GitHub

DOC Rework plot_hashing_vs_dict_vectorizer.py example #23266

Merged
merged 29 commits into from
May 30, 2022

Conversation

ArturoAmorQ
Member

Reference Issues/PRs

Related to #22928

What does this implement/fix? Explain your changes.

In #22928 we remove the use of HashingVectorizer from the plot_document_classification_20newsgroups.py example for the sake of simplicity.
A comparison of the performance of hashers and vectorizers can be moved to this existing example.

Any other comments?

Side effect: Implements notebook style as intended in #22406

@lesteve lesteve added the Quick Review For PRs that are quick to review label May 11, 2022
@lesteve lesteve removed the Quick Review For PRs that are quick to review label May 13, 2022
Member
@ogrisel ogrisel left a comment

Thanks for the PR, here is a batch of feedback.

ArturoAmorQ and others added 3 commits May 19, 2022 16:53
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@ArturoAmorQ ArturoAmorQ changed the title from "[WIP] DOC Rework plot_hashing_vs_dict_vectorizer.py example" to "DOC Rework plot_hashing_vs_dict_vectorizer.py example" May 20, 2022
Member
@ogrisel ogrisel left a comment

Thanks very much @ArturoAmorQ, this notebook is much nicer than the original benchmark script.

Here is a final batch of suggestions for improvement:

ArturoAmorQ and others added 3 commits May 23, 2022 17:58
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
ArturoAmorQ and others added 3 commits May 25, 2022 10:48
Member
@jjerphan jjerphan left a comment

Thank you, @ArturoAmorQ.

I think one should use other terms to make this example more accurate.

This is for instance the case of:

  • "frequency" which can be replaced by "occurrence (counts)" (to respect the definition)
  • "speed" which can be replaced by "data processing rate" (to respect the unit (bytes/sec))
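
To make the two terms concrete, here is a minimal, standard-library-only sketch. It is not code from this PR or from the example under review: the helper names `token_counts` and `processing_rate_mb_per_s`, the regular-expression tokenizer, and the toy documents are all assumptions made for illustration. It counts token occurrences (raw counts, not frequencies normalized by document length) and reports a data processing rate in MB/s, that is, bytes of input processed per second of wall-clock time.

```python
import re
import time
from collections import Counter


def token_counts(doc):
    """Return the number of occurrences of each lowercase token in doc."""
    # Hypothetical tokenizer: runs of word characters, lowercased.
    return Counter(re.findall(r"\w+", doc.lower()))


def processing_rate_mb_per_s(docs, func):
    """Apply func to every document and return the throughput in MB/s."""
    # Size of the raw input in bytes, assuming UTF-8 encoding.
    n_bytes = sum(len(doc.encode("utf-8")) for doc in docs)
    start = time.perf_counter()
    for doc in docs:
        func(doc)
    elapsed = time.perf_counter() - start
    # Guard against a zero duration on very small inputs.
    return n_bytes / 1e6 / max(elapsed, 1e-9)


# Toy corpus, made up for this sketch.
docs = ["the cat sat on the mat", "the dog chased the cat"] * 1000
counts = token_counts(docs[0])
rate = processing_rate_mb_per_s(docs, token_counts)
print(counts["the"], f"{rate:.1f} MB/s")
```

The count for "the" in the first document is an occurrence count (2), whereas a frequency would divide it by the number of tokens. The measured rate depends on the machine, so no particular value should be expected.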

Here are some comments and formatting fixes.


Edit: not related to this PR, but #23004 might come with new changes for this example then.

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@ArturoAmorQ
Member Author

Thanks @ogrisel and @jjerphan. This notebook is much clearer thanks to your comments.

Member
@jjerphan jjerphan left a comment

Thank you, @ArturoAmorQ.

Edit: I'll let @ogrisel merge if everything LGTM.

Member
@ogrisel ogrisel left a comment

LGTM again, just a final batch of nitpicks + a formatting fix.

@ogrisel ogrisel merged commit 6ff214c into scikit-learn:main May 30, 2022
@ogrisel
Member
ogrisel commented May 30, 2022

Merged, thank you very much for the nice contribution @ArturoAmorQ!

@ArturoAmorQ ArturoAmorQ deleted the compare_vectorizers branch June 9, 2022 13:29
ogrisel added a commit to ogrisel/scikit-learn that referenced this pull request Jul 11, 2022
…3266)


Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Aug 4, 2022
…3266)


Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
glemaitre pushed a commit that referenced this pull request Aug 5, 2022
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
4 participants