Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
BUG ensure that parallel/sequential give the same permutation importances #15933
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG ensure that parallel/sequential give the same permutation importances #15933
Changes from all commits
a47ebe2
ab95fd5
d58272d
7211a53
70e1ef4
a6909ca
a24b9d5
7ad0b26
fb69870
f80dc69
775e986
0631299
1a21a98
023eca2
e213236
be8f1c1
910ef4f
723bf03
f5bda8c
e9770cf
9cdc7b8
03ab3a1
0c25e61
fe4cac6
7bdb93a
d399d96
bdaffb5
42e8cb5
1834fca
74a1c54
5cf37f6
51f7467
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note for the reviewers: the fact that we always make a copy here also fixes the issue with read-only memmaps as reported in #15810.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just by chance, you are not aware of a way to avoid to re-allocating the full dataframe, and instead make a view for most columns except for the one we want to change inplace?
For instance of one makes a slice of a dataframe, and then tries to modify to a column, pandas will raise a warning about a view being modified, but I'm not sure if it will actually change the original dataframe inplace or not in this case...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on the internal block structure of the dataframe but this is considered private API and is likely to change in future versions of pandas. I would rather stay safe for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In the future, I can see this being possible when pandas switches to using a columnar data structure to hold its data as detailed in the pandas roadmap.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Speaking with @jorisvandenbossche, this is the case where one should use
values
, to discard the index.X_permuted.iloc[:, col_idx] = X_permuted[shuffle_idx, col_idx].values
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But
.values
would also discard dtype information forExtendedArray
columns, no?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No it will keep the dtype info of the
ExtendedArray
:That's why you need to use
.to_numpy()
to explicitly try to convert it to a numpy dtype.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Alright, good point. Feel free to submit a new PR to simplify the code then :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The
.values
is a bit of a historical mix. For categorical it indeed already returns the categorical array (so preserving the dtype), but eg for periods or datetime with timezone, it does not (now in practice I suppose sklearn will care less about those dtypes).To solve this mix, in more recent versions of pandas there is
.array
to.values
for this use case, but so that's only available for recent versions.