DOC use notebook-style for plot_train_error_vs_test_error by brendo-k · Pull Request #22440 · scikit-learn/scikit-learn · GitHub

DOC use notebook-style for plot_train_error_vs_test_error #22440

Merged
merged 3 commits into from
Feb 14, 2022

brendo-k
Contributor

Reference Issues/PRs

Update examples/model_selection/plot_train_error_vs_test_error.py to notebook style, Issue #22406

What does this implement/fix? Explain your changes.

Split the example into:

  • Generate Sample Data
  • Compute train and test errors
  • Plot results functions
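For readers unfamiliar with the notebook style, the three sections above might be laid out roughly as follows. This is a minimal sketch and not the merged file: the `# %%` separators and underlined titles follow the cell convention visible in the diff, while the `alphas` grid, the `ElasticNet` settings, `random_state=0`, and the variable names are assumptions chosen for illustration. Plotting is left as a placeholder cell.

```python
"""
Train error vs test error (illustrative layout only, not the merged example)
"""

# %%
# Generate sample data
# --------------------
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

X, y = make_regression(
    n_samples=225, n_features=500, n_informative=50, noise=1.0, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=75, test_size=150, shuffle=False
)

# %%
# Compute train and test errors
# -----------------------------
import numpy as np
from sklearn.linear_model import ElasticNet

alphas = np.logspace(-2, 1, 8)  # hypothetical regularization grid
enet = ElasticNet(l1_ratio=0.7, max_iter=10_000)  # assumed settings
train_scores, test_scores = [], []
for alpha in alphas:
    enet.set_params(alpha=alpha)
    enet.fit(X_train, y_train)
    # score() returns R^2, so higher means lower error
    train_scores.append(enet.score(X_train, y_train))
    test_scores.append(enet.score(X_test, y_test))

# Regularization strength with the best held-out score
best_alpha = alphas[int(np.argmax(test_scores))]

# %%
# Plot results
# ------------
# Plotting is omitted in this sketch; the scores above would be drawn
# against `alphas` in this cell.
```

Each `# %%` block is meant to render as its own cell, so the text title and the code below it stay together in the generated page.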

Any other comments?

@@ -12,27 +12,30 @@

"""

# %%
# Generate sample data
# --------------------
Member

You can move this section below the author/license

@brendo-k
Contributor Author

Hi @glemaitre, I moved the section as you suggested

n_samples_train, n_samples_test, n_features = 75, 150, 500
np.random.seed(0)
coef = np.random.randn(n_features)
coef[50:] = 0.0  # only the first 50 features have an impact on the model
X = np.random.randn(n_samples_train + n_samples_test, n_features)
y = np.dot(X, coef)

# %%
Member

I find the code above quite complex. You can replace it with our available helper function from scikit-learn:

from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

n_samples_train, n_samples_test, n_features = 75, 150, 500
X, y, coef = make_regression(
    n_samples=n_samples_train + n_samples_test,
    n_features=n_features,
    n_informative=50,
    shuffle=False,
    noise=1.0,
    coef=True,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
)
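As a quick sanity check, and not something that appears in the PR, the sketch below confirms that the suggested call keeps the structure of the hand-written version: with `shuffle=False`, the non-zero coefficients sit in the first `n_informative` columns, and the split yields 75 training and 150 test rows. The `random_state=0` argument is an assumption added here only to make the check reproducible.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.model_selection import train_test_split

n_samples_train, n_samples_test, n_features = 75, 150, 500
X, y, coef = make_regression(
    n_samples=n_samples_train + n_samples_test,
    n_features=n_features,
    n_informative=50,
    shuffle=False,
    noise=1.0,
    coef=True,
    random_state=0,  # assumption: added for reproducibility, not in the suggestion
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, train_size=n_samples_train, test_size=n_samples_test, shuffle=False
)

# With shuffle=False the informative features stay in the leading columns,
# mirroring `coef[50:] = 0.0` in the original code.
n_nonzero_head = np.count_nonzero(coef[:50])
n_nonzero_tail = np.count_nonzero(coef[50:])

# y is the linear signal plus Gaussian noise of scale `noise`.
residual_std = np.std(y - X @ coef)

print(X_train.shape, X_test.shape, n_nonzero_head, n_nonzero_tail)
```

One difference from the original code is that `noise=1.0` adds Gaussian noise to `y`, whereas `np.dot(X, coef)` produced a noise-free target.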

@brendo-k
Contributor Author

Hi @glemaitre, I updated the code for generating data. Let me know if I should change anything else!

@glemaitre glemaitre merged commit 835904a into scikit-learn:main Feb 14, 2022
@glemaitre
Member

Thanks @brendo-k LGTM.

It is always nice to see a documentation improvement that replaces code from 11 years ago with helpers that have been introduced along the way.

@lesteve lesteve mentioned this pull request Feb 17, 2022
@brendo-k brendo-k deleted the example-notebook-style branch February 19, 2022 22:26
thomasjpfan pushed a commit to thomasjpfan/scikit-learn that referenced this pull request Mar 1, 2022