FIX Allow input validation by pass in MLPClassifier #24873
With arrays created inside the estimator we do not need to perform validation again. This change allows the estimator to compute the score during fitting, without having to provide column names for the validation dataset.
Hi @betatim, thanks for the pull request ( :) ).
TIL about
This is ready for review. No Changelog entry because I don't think this is needed for this change.
Thank you for the PR @betatim !
All comments addressed.
I think this is a big enough change to have a change log entry as a bug fix.
Added a what's new entry, targeting v1.3. I'm not sure how/when things still get added to the upcoming v1.2, but I guess we can do that in addition?
LGTM!
ebde1d2 to 300a827
Thanks for the review @glemaitre. I've implemented your changes.
LGTM. Thanks @betatim
With arrays created inside the estimator we do not need to perform validation again. This change allows the estimator to compute the score during fitting, without having to provide column names for the validation dataset.
This solves a problem where fitting data with feature names generates warnings. The reason for the warnings is that the input validation strips the column names from the input data, which is then split into a train and validation set. When the validation set is passed to `score`, it performs input validation and raises a warning because the input data doesn't have feature names.
Reference Issues/PRs
closes #24846
What does this implement/fix? Explain your changes.
This adds a private score method which is used during fitting with early stopping. The private score method uses a `_predict` method that does not perform input data validation. This is OK because the validation dataset is derived from a `train_test_split` of "input validated data".
Alternatives
We could turn `X_val` into a dataframe when it is created by using `self.feature_names_in_`. Feels awkward.
We could delay setting `feature_names_in_` until the fitting is complete, but that would require reworking the validation functions.
What do people think?
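For reviewers who want to see the pattern in isolation, here is a minimal pure-Python sketch of the approach described under "What does this implement/fix?". It is not the scikit-learn code: `NamedTable` and `ToyClassifier` are hypothetical stand-ins for a dataframe and for the estimator, and the warning text is only illustrative. It assumes nothing beyond the standard library. The point is that the held-out rows come from already validated input, so the private `_score`/`_predict` path can skip the feature-name check, while the public `score` still validates.

```python
import warnings


class NamedTable:
    """Hypothetical stand-in for a dataframe: rows plus column names."""

    def __init__(self, rows, columns):
        self.rows = [list(r) for r in rows]
        self.columns = list(columns)


class ToyClassifier:
    """Toy threshold classifier with validated and unvalidated paths."""

    def _validate(self, X, reset=False):
        # Mimics input validation: record or check feature names, then
        # return plain rows with the names stripped.
        names = getattr(X, "columns", None)
        if reset:
            self.feature_names_in_ = names
        elif self.feature_names_in_ is not None and names is None:
            warnings.warn("X does not have valid feature names", UserWarning)
        return X.rows if names is not None else [list(r) for r in X]

    def fit(self, X, y):
        rows = self._validate(X, reset=True)
        # Hold out the last quarter of the already validated rows.
        n_val = max(1, len(rows) // 4)
        X_train = rows[:-n_val]
        X_val, y_val = rows[-n_val:], y[-n_val:]
        # "Training": threshold on the mean of the first feature.
        self.threshold_ = sum(r[0] for r in X_train) / len(X_train)
        # Private score: the held-out rows are not validated a second time.
        self.validation_score_ = self._score(X_val, y_val)
        return self

    def _predict(self, rows):
        # No validation here; callers must pass validated rows.
        return [int(r[0] > self.threshold_) for r in rows]

    def predict(self, X):
        return self._predict(self._validate(X))

    def _score(self, rows, y):
        pred = self._predict(rows)
        return sum(p == t for p, t in zip(pred, y)) / len(y)

    def score(self, X, y):
        return self._score(self._validate(X), y)


X = NamedTable([[0.0], [1.0], [2.0], [3.0], [0.5], [2.5], [0.2], [2.8]], ["f0"])
y = [0, 0, 1, 1, 0, 1, 0, 1]

# Fitting named input goes through the private path for the held-out rows.
with warnings.catch_warnings(record=True) as fit_warnings:
    warnings.simplefilter("always")
    clf = ToyClassifier().fit(X, y)

# The public score still validates, so nameless input triggers the warning.
with warnings.catch_warnings(record=True) as score_warnings:
    warnings.simplefilter("always")
    public_score = clf.score(X.rows, y)
```

In this sketch `fit_warnings` stays empty, whereas calling the public `score` with the bare rows records one warning, which mirrors the behaviour the private method is meant to avoid during early stopping.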