10000 DOC Rework plot_grid_search_text_feature_extraction.py example by ArturoAmorQ · Pull Request #23740 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

DOC Rework plot_grid_search_text_feature_extraction.py example #23740

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 51 commits into from
Sep 15, 2022

Conversation

ArturoAmorQ
Copy link
Member
@ArturoAmorQ ArturoAmorQ commented Jun 23, 2022

Reference Issues/PRs

Related to #22928.

What does this implement/fix? Explain your changes.

Yet another release of the revamped examples on text analysis. This is a complement to the plot_document_classification_20newsgroups.py example, where a GridSearchCV is used for text classification but is not shown to keep the example simple.

Any other comments?

Side effect: Implements notebook style as intended in #22406.

Adds plotly to DOC dependencies.

@ogrisel
Copy link
Member
ogrisel commented Jun 28, 2022

Please try to run build_tools/update_environments_and_lock_files.py for the doc builds and commit the resulting lock files to see if it can make the documentation generation work on this PR.

python build_tools/update_environments_and_lock_files.py --select-build doc

@ogrisel
Copy link
Member
ogrisel commented Jun 28, 2022

Ok the build is fixed but the plotly output does not show-up. There is some additional configuration required, see for instance:

https://sphinx-gallery.github.io/dev/auto_examples/plot_9_plotly.html

Copy link
Member
@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something is wrong with the doc-min-dependencies builder. It's been running for more than 2h. I tried to cancel it.

Have you tried to run the example in a local env with the 5.9.0 version of plotly?

Besides here is some feedback on the content of the PR itself.

@ogrisel
Copy link
Member
ogrisel commented Jun 28, 2022

Also it would be interesting to do a scatter plot of the result of the hyper-parameter search that compares mean test accuracy vs mean score duration. This can be done with plotly as in https://nbviewer.org/github/ogrisel/notebooks/blob/master/sklearn_demos/ames_housing.ipynb

Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this qualitative improvement, @ArturoAmorQ!

Adding plotly as a doc dependency works for me.

We need to make sure that HTTP links for the previous example (examples/model_selection/grid_search_text_feature_extraction.py) redirect to the new example (examples/model_selection/plot_grid_search_text_feature_extraction.py).

Apart from that, here are minor comments.

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some small changes.

@glemaitre
Copy link
Member

We need to make sure that HTTP links for the previous example (examples/model_selection/grid_search_text_feature_extraction.py) redirect to the new example (examples/model_selection/plot_grid_search_text_feature_extraction.py).

We should have something done for that in the conf.py.

@glemaitre
Copy link
Member

To redirect the examples, it is in doc/conf.py and check the following:

# redirects dictionary maps from old links to new links
redirects = {
    "documentation": "index",
    "auto_examples/feature_selection/plot_permutation_test_for_classification": (
        "auto_examples/model_selection/plot_permutation_tests_for_classification"
    ),
    "modules/model_persistence": "model_persistence",
    "auto_examples/linear_model/plot_bayesian_ridge": (
        "auto_examples/linear_model/plot_ard"
    ),
}
html_context["redirects"] = redirects
for old_link in redirects:
    html_additional_pages[old_link] = "redirects.html"

ArturoAmorQ and others added 10 commits September 13, 2022 11:43
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
Copy link
Member
@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thank you, @ArturoAmorQ.

@ArturoAmorQ
Copy link
Member Author

LGTM. Thank you, @ArturoAmorQ.

As always, thank you all for your comments!

Copy link
Member
@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it was the only change, I directly pushed it and I am going to merge. Thanks @ArturoAmorQ

@glemaitre glemaitre merged commit 851d02f into scikit-learn:main Sep 15, 2022
@ArturoAmorQ ArturoAmorQ deleted the text_pipeline branch November 17, 2022 13:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0