DOC use notebook-style for plot_train_error_vs_test_error #22440

brendo-k · 2022-02-10T18:34:08Z

Reference Issues/PRs

Update examples/model_selection/plot_train_error_vs_test_error.py to notebook style, Issue #22406

What does this implement/fix? Explain your changes.

Split the example into:

Generate Sample Data
Compute train and test errors
Plot results functions

Any other comments?

glemaitre · 2022-02-11T11:12:28Z

examples/model_selection/plot_train_error_vs_test_error.py

@@ -12,27 +12,30 @@

 """

+# %%
+# Generate sample data
+# --------------------


You can move this section below the author/license
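For readers unfamiliar with the notebook-style convention, the layout being asked for can be sketched roughly as follows: the module docstring and the author/license comments come first, and each `# %%` marker followed by an underlined comment title opens a new section. The section titles are taken from the PR description; the author/license placeholders, the `ElasticNet` estimator, its `l1_ratio`, and the `alphas` grid are assumptions made for illustration and are not shown in this conversation. Plotting is left as a comment to keep the sketch dependency-light.

```python
"""
Train error vs. test error (layout sketch only)
"""

# Author: <placeholder>
# License: <placeholder>

# %%
# Generate sample data
# --------------------
import numpy as np

n_samples_train, n_samples_test, n_features = 75, 150, 500
np.random.seed(0)
coef = np.random.randn(n_features)
coef[50:] = 0.0  # only the first 50 features influence the target
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = X @ coef

# Split into train and test sets without shuffling
X_train, X_test = X[:n_samples_train], X[n_samples_train:]
y_train, y_test = y[:n_samples_train], y[n_samples_train:]

# %%
# Compute train and test errors
# -----------------------------
from sklearn.linear_model import ElasticNet  # assumed estimator

alphas = np.logspace(-3, 1, 10)  # hypothetical regularization grid
enet = ElasticNet(l1_ratio=0.7, max_iter=10_000)
train_scores, test_scores = [], []
for alpha in alphas:
    enet.set_params(alpha=alpha)
    enet.fit(X_train, y_train)
    # score() returns R^2, so higher is better
    train_scores.append(enet.score(X_train, y_train))
    test_scores.append(enet.score(X_test, y_test))

# %%
# Plot results functions
# ----------------------
# Plotting code would go here, e.g. train_scores and test_scores against alphas.
```

When sphinx-gallery renders such a file, each `# %%` block becomes a text cell followed by the code beneath it, which is why the first header should sit after the author/license comments rather than directly after the docstring.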

brendo-k · 2022-02-11T13:51:02Z

Hi @glemaitre, I moved the section as you suggested

glemaitre · 2022-02-11T14:36:24Z

examples/model_selection/plot_train_error_vs_test_error.py

 n_samples_train, n_samples_test, n_features = 75, 150, 500
 np.random.seed(0)
 coef = np.random.randn(n_features)
 coef[50:] = 0.0  # only the first 50 features are impacting the model
 X = np.random.randn(n_samples_train + n_samples_test, n_features)
 y = np.dot(X, coef)

+# %%


I find the code above quite complex. You can replace it with our available helper function from scikit-learn:

    from sklearn.datasets import make_regression
    from sklearn.model_selection import train_test_split

    n_samples_train, n_samples_test, n_features = 75, 150, 500
    X, y, coef = make_regression(
        n_samples=n_samples_train + n_samples_test,
        n_features=n_features,
        n_informative=50,
        shuffle=False,
        noise=1.0,
        coef=True,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
    )
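As a quick sanity check, here is a minimal sketch showing that the suggested helpers reproduce the structure of the hand-written code: 75 training rows, 150 test rows, and a coefficient vector whose only non-zero entries are the first 50. The `random_state=0` argument is an assumption added here for reproducibility and is not part of the suggestion above.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

n_samples_train, n_samples_test, n_features = 75, 150, 500
X, y, coef = make_regression(
    n_samples=n_samples_train + n_samples_test,
    n_features=n_features,
    n_informative=50,
    shuffle=False,  # keep the informative features in the first columns
    noise=1.0,
    coef=True,  # also return the ground-truth coefficients
    random_state=0,  # assumption: fixed seed, not in the original suggestion
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
)

# Count the informative coefficients and check where they sit
n_nonzero = int(np.count_nonzero(coef))
tail_is_zero = bool(np.all(coef[50:] == 0))
print(X_train.shape, X_test.shape, n_nonzero, tail_is_zero)
```

Because `shuffle=False` is passed to both helpers, the informative features stay in the leading columns and the train/test split is a plain slice of the first 75 rows, which mirrors what the original NumPy code did by hand.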

brendo-k · 2022-02-11T16:57:23Z

Hi @glemaitre, I updated the code for generating data. Let me know if I should change anything else!

glemaitre · 2022-02-14T10:47:12Z

Thanks @brendo-k LGTM.

It is always nice to see a documentation improvement that replaces code from 11 years ago with helpers that have been added to the library since then.

updated notebook style for plot_train_error_vs_test_error

20e4355

github-actions bot added the Documentation label Feb 10, 2022

glemaitre reviewed Feb 11, 2022

moved generate sample data header

a0644bf

glemaitre reviewed Feb 11, 2022

Generate sample data using scikit-learn helper functions

3c0780b

glemaitre merged commit 835904a into scikit-learn:main Feb 14, 2022

lesteve mentioned this pull request Feb 17, 2022

Fix notebook-style examples #22406

Closed

47 tasks

brendo-k deleted the example-notebook-style branch February 19, 2022 22:26

thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022

DOC use notebook-style for plot_train_error_vs_test_error (scikit-lea…

c97e032

…rn#22440)
