-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
API / CoW: Copy NumPy arrays by default in DataFrame constructor #51731
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
563257e
f3161a3
3a95311
17cf5ae
07aa26d
8e84d85
f3ccf0f
5cdc6ad
3e384ea
49ee53f
fcc7be2
9223836
a474bf5
d5a0268
0be7fc6
293f8a5
3376d06
265d9e3
9ac1bae
db92ce4
be9cb04
4bf3ee8
e2eceec
65965c6
ecb756c
8e837d9
177fbbc
2bbff3b
5bef4ba
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -14,6 +14,8 @@ | |
import numpy as np | ||
from numpy import ma | ||
|
||
from pandas._config import using_copy_on_write | ||
|
||
from pandas._libs import lib | ||
from pandas._typing import ( | ||
ArrayLike, | ||
|
@@ -289,6 +291,14 @@ def ndarray_to_mgr( | |
if values.ndim == 1: | ||
values = values.reshape(-1, 1) | ||
|
||
elif ( | ||
using_copy_on_write() | ||
and isinstance(values, np.ndarray) | ||
and (dtype is None or is_dtype_equal(values.dtype, dtype)) | ||
): | ||
values = np.array(values, order="F", copy=copy_on_sanitize) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. is the order="F" thing something we should be doing in general in cases with copy=True? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. We are operating on the transposed array when copying a DataFrame object, so not needed in that case There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. xref #44871 took a look at preserving order when doing copy. there are some tradeoffs here |
||
values = _ensure_2d(values) | ||
|
||
elif isinstance(values, (np.ndarray, ExtensionArray, ABCSeries, Index)): | ||
# drop subclass info | ||
values = np.array(values, copy=copy_on_sanitize) | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the order="F" specifically needed for CoW? (and can you add a comment about it)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is about #50756
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I understand that it's related to that, but I don't understand why we would only do it for CoW? The default is not to copy arrays (without CoW) at the moment, which creates this inefficient layout; but so if the user manually specifies copy=True in the constructor, why not directly copy it to the desired layout in all cases without the if/else check for CoW?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah did not think about this, will add
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on @jbrockmendel s comment above I think we should leave it out for now
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Leave it out in general (from this PR), or you mean not do it for the default mode for now?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely default mode and maybe also split the copy and the layout change into two PRs?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
reiterating preference for this to be separate. The two of you are much more familiar with the CoW logic than most of the rest of us; i get antsy when i see using_copy_on_write checks appearing in new places
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I removed all order relevant changes, is more straightforward now