-
-
Notifications
You must be signed in to change notification settings - Fork 25.9k
FIX permutation_importance
with polars dataframe raises warning on feature names
#28513
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@glemaitre Here are some similar issues to what you were working on. WDYT?
Not sure if I'm using the adapters correctly with this new commit? (Haven't added tests for them yet.) |
Seems like it could be useful and could potentially fix #28488. There are some conflicts to fix and I guess this may need a bit more work as well? |
Sorry I totally forgot about this PR; will revisit in a few days. |
I've resolved the merge conflicts and passed the previously failing checks. But note that it is merely a "working" version and I'm not sure whether I'm using adapters in the correct way. @adrinjalali @glemaitre @lesteve do you have bandwidth to review this one :) |
I'll have a look. |
ping @glemaitre |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll like to have the thoughts on @thomasjpfan regarding this PR: we are stretching the _set_output
scope because now, we just using it as a "dispatcher" for the different container library. This is quite different from the original purpose.
sklearn/utils/_set_output.py
Outdated
# pandas may match the indices of `X` and `col` on certain platforms, but we | ||
# want the column replacement to behave as if `col` is an array without index, | ||
# so we reset the index of `col` in the first place | ||
col.index = X.index |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I find the trick annoying. What I find weird is that this was trigger in the debian 32 bits that should not run with pandas
. So I'm not really sure what is the reason for the bug. I'll probably try to debug more by pushing a couple of commits.
At a glance, I do not like it. I originally intended We have been hiding the complexity of our dataframe operations in functions such as |
Indeed having a nice |
But I can't think of a better way of doing |
I believe I'm making I think this currently is a "working" version, but in a sense of numerical arrays/dataframes. When it comes to object or categorical data something could be broken especially for |
We might also consider https://github.com/narwhals-dev/narwhals soon here (cc @MarcoGorelli ) 😉 |
Looks pretty promising (seems like dataframe version of |
Reference Issues/PRs
Closes #28488.
What does this implement/fix? Explain your changes.
In
permutation_importance
we forcefully convert polars dataframes to arrays, causingUserWarning: X does not have valid feature names, but XXX was fitted with feature names
.I added no extra tests, except for extending all tests on
pandas
to also includepolars
.