DOC Rework Importance of Feature Scaling example #25012
Weird that the CIs did not start. I merged
…arn into scaling_importance
…nto scaling_importance
…nto scaling_importance
My 5 cents.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
…arn into scaling_importance
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…arn into scaling_importance
I think all of your comments have been addressed, @glemaitre and @lorentzenchr.
I will have to check the rendering but I think that the proposal is already an improvement.
LGTM, some nitpicks.
)
scaled_X_train = scaler.fit_transform(X_train)
Optional: We could show the mean value of each feature, or min and max.
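A minimal sketch of what such a summary could look like, written in plain Python rather than with the example's actual scikit-learn code. The toy data and the `standardize` and `summarize` helpers are hypothetical and only illustrate the idea; `pstdev` is used so that scaling relies on the population standard deviation, as `StandardScaler` does.

```python
from statistics import mean, pstdev

# Hypothetical toy data: two features on very different scales.
X_train = [
    [1.0, 1000.0],
    [2.0, 3000.0],
    [3.0, 2000.0],
    [4.0, 6000.0],
]


def standardize(rows):
    """Center each column to zero mean and scale it to unit variance."""
    columns = list(zip(*rows))
    means = [mean(col) for col in columns]
    stds = [pstdev(col) for col in columns]
    return [
        [(value - m) / s for value, m, s in zip(row, means, stds)]
        for row in rows
    ]


def summarize(rows):
    """Return a (mean, min, max) tuple for each column."""
    return [(mean(col), min(col), max(col)) for col in zip(*rows)]


scaled_X_train = standardize(X_train)
raw_summary = summarize(X_train)
scaled_summary = summarize(scaled_X_train)

for name, summary in [("raw", raw_summary), ("scaled", scaled_summary)]:
    for j, (mu, lo, hi) in enumerate(summary):
        print(f"{name} feature {j}: mean={mu:.3f} min={lo:.3f} max={hi:.3f}")
```

Printing the per-feature mean next to min and max makes it easy to see that the scaled features are centered at zero and share a comparable range, while the raw features do not.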
# %%
# The need for regularization is higher (lower values of `C`) for the data
# that was not scaled before applying PCA. From the plot we can confirm that
Which plot? Is it over- or underfitting?
By plotting the validation curves I realized that the training and test accuracy overlap too much to make a proper statement about over- or underfitting for the scenario with no standardization.
I think that it is better to avoid mentioning over-/underfitting to keep the example as simple as possible.
Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
…nto scaling_importance
Only nitpicks. Otherwise LGTM.
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
I certainly broke the linter with my suggestion. Sorry @ArturoAmorQ
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com>
…nto scaling_importance
Co-authored-by: Guillaume Lemaitre <g.lemaitre58@gmail.com> Co-authored-by: Christian Lorentzen <lorentzen.ch@gmail.com>
Reference Issues/PRs
Fixes #12282.
What does this implement/fix? Explain your changes.
This example can benefit from a "tutorialization". In particular, this PR adds a section regarding how nearest neighbors is sensitive to scaling.
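To illustrate the kind of sensitivity meant here, below is a rough sketch of a 1-nearest-neighbor prediction with and without standardization. It is plain Python, not the code added by this PR; the toy data, the query point, and the helpers `fit_scaler`, `transform`, and `predict_1nn` are all hypothetical.

```python
from math import dist
from statistics import mean, pstdev

# Hypothetical toy data: the class follows the first (small-scale) feature,
# while the second (large-scale) feature is unrelated to the class.
X_train = [[0.1, 1000.0], [0.9, 1100.0], [0.15, 2000.0], [0.85, 2100.0]]
y_train = [0, 1, 0, 1]
query = [0.1, 1100.0]


def fit_scaler(rows):
    """Compute per-column mean and population standard deviation."""
    columns = list(zip(*rows))
    return [mean(col) for col in columns], [pstdev(col) for col in columns]


def transform(row, means, stds):
    """Standardize one sample with previously fitted statistics."""
    return [(value - m) / s for value, m, s in zip(row, means, stds)]


def predict_1nn(rows, labels, point):
    """Return the label of the training sample closest in Euclidean distance."""
    nearest = min(range(len(rows)), key=lambda i: dist(rows[i], point))
    return labels[nearest]


# Fit the scaling statistics on the training data only, then reuse them
# for the query point.
means, stds = fit_scaler(X_train)
scaled_X_train = [transform(row, means, stds) for row in X_train]
scaled_query = transform(query, means, stds)

pred_raw = predict_1nn(X_train, y_train, query)
pred_scaled = predict_1nn(scaled_X_train, y_train, scaled_query)
print(f"unscaled prediction: {pred_raw}, scaled prediction: {pred_scaled}")
```

On this toy data the unscaled distances are dominated by the large-scale feature, so the query is assigned class 1; after standardization both features contribute comparably and the query is assigned class 0.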
Any other comments?
Side effect: Implements notebook style as intended in #22406.