API: honor copy=True when passing dict to DataFrame by jbrockmendel · Pull Request #38939 · pandas-dev/pandas · GitHub

API: honor copy=True when passing dict to DataFrame #38939

Merged 69 commits on Mar 31, 2021

Commits
cbc97f0
ENH: allow non-consolidation in constructors
jbrockmendel Oct 5, 2020
a135d96
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Oct 5, 2020
5c94129
mypy fixup
jbrockmendel Oct 5, 2020
d653c54
ENH: allow non-consolidation in constructors
jbrockmendel Oct 5, 2020
e71e319
Merge branch 'myway-init-no-consolidate' of github.com:jbrockmendel/p…
jbrockmendel Jan 3, 2021
cafa718
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 3, 2021
c706ad6
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 3, 2021
3f9195e
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 4, 2021
1cba671
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 4, 2021
396daba
BUG: respect copy=False in constructing DataFrame from dict
jbrockmendel Jan 4, 2021
11ae1c9
whatsnew
jbrockmendel Jan 4, 2021
b505267
clean test
jbrockmendel Jan 4, 2021
a1e9b68
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 4, 2021
b70d997
fixed xfail
jbrockmendel Jan 4, 2021
09213e0
update whatsnew
jbrockmendel Jan 4, 2021
e895de0
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 4, 2021
31bda58
de-kludge
jbrockmendel Jan 4, 2021
37a2c0c
remove no-longer-used msg
jbrockmendel Jan 4, 2021
b93d7d5
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 10, 2021
aa667a6
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 12, 2021
185cd99
fix broken test
jbrockmendel Jan 12, 2021
701356f
Consolidate in tm, update whatsnew
jbrockmendel Jan 20, 2021
948ac67
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 21, 2021
46f2fcf
always copy when data is None
jbrockmendel Jan 21, 2021
0bbfec0
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Jan 23, 2021
590c820
update exception message
jbrockmendel Jan 23, 2021
17a693c
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Jan 27, 2021
fb8f32d
update exception message
jbrockmendel Jan 27, 2021
7835184
typo fixup
jbrockmendel Jan 28, 2021
2136289
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 2, 2021
187499c
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 2, 2021
6f30beb
Merge branch 'master' of https://github.com/pandas-dev/pandas into my…
jbrockmendel Feb 4, 2021
e80d57c
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 11, 2021
b0a6abd
CI: fix broken asv
jbrockmendel Feb 11, 2021
bf942ae
revert
jbrockmendel Feb 11, 2021
95e30a5
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 11, 2021
510f697
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 13, 2021
48e359e
Default to copy=True for dict data
jbrockmendel Feb 13, 2021
048e826
troubleshoot docbuild
jbrockmendel Feb 16, 2021
6a9c9f0
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 17, 2021
a17c728
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 18, 2021
5b3d419
update whatsnew
jbrockmendel Feb 18, 2021
f961378
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 18, 2021
fcee44b
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 27, 2021
0c60ae8
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 27, 2021
1468e59
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Feb 27, 2021
b6d8b70
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 3, 2021
8b66b11
skip for ArrayManager
jbrockmendel Mar 3, 2021
54cacfc
Update doc/source/whatsnew/v1.3.0.rst
jbrockmendel Mar 6, 2021
5ea7a75
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 6, 2021
e11ea68
requested edits
jbrockmendel Mar 6, 2021
65d01c7
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 8, 2021
41c4e7a
test test_df_mod_zero_df with and without copy
jbrockmendel Mar 8, 2021
7260a72
collect copy-adjusting code in one place
jbrockmendel Mar 9, 2021
52344bb
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 14, 2021
3ddc3d3
update docstring
jbrockmendel Mar 14, 2021
e6bae0f
whatsnew, comment
jbrockmendel Mar 14, 2021
7cab084
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 15, 2021
e8e3d84
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 15, 2021
5c44953
mypy fixup
jbrockmendel Mar 15, 2021
e32f630
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 16, 2021
abd890a
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 16, 2021
1b7f7ca
Update pandas/core/frame.py
jbrockmendel Mar 16, 2021
b326b5f
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 22, 2021
b7aed5d
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 26, 2021
4d20fe7
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 30, 2021
ad5485a
add versionchanged
jbrockmendel Mar 30, 2021
6bed6ac
Merge branch 'master' into myway-init-no-consolidate
jbrockmendel Mar 30, 2021
98b6dff
update bc .values has changed to DTA/TDA
jbrockmendel Mar 30, 2021
ENH: allow non-consolidation in constructors
jbrockmendel committed Nov 30, 2020
commit d653c5435729c6bb4a9e6d51445d85932ec70e9a
84 changes: 72 additions & 12 deletions pandas/core/frame.py
@@ -421,6 +421,8 @@ class DataFrame(NDFrame, OpsMixin):
         Data type to force. Only a single dtype is allowed. If None, infer.
     copy : bool, default False
         Copy data from inputs. Only affects DataFrame / 2d ndarray input.
+    consolidate : bool or None, default None
+        Whether to consolidate the arrays in the new DataFrame.

     See Also
     --------
@@ -508,12 +510,16 @@ def __init__(
         columns: Optional[Axes] = None,
         dtype: Optional[Dtype] = None,
         copy: bool = False,
+        consolidate=None,
     ):
         if data is None:
             data = {}
         if dtype is not None:
             dtype = self._validate_dtype(dtype)

+        if consolidate is None:
+            consolidate = not copy
+
         if isinstance(data, DataFrame):
             data = data._mgr

@@ -528,7 +534,7 @@ def __init__(
             )

         elif isinstance(data, dict):
-            mgr = init_dict(data, index, columns, dtype=dtype)
+            mgr = init_dict(data, index, columns, dtype=dtype, consolidate=consolidate)
         elif isinstance(data, ma.MaskedArray):
             import numpy.ma.mrecords as mrecords

@@ -545,19 +551,41 @@ def __init__(
                     data[mask] = fill_value
                 else:
                     data = data.copy()
-                mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
+                mgr = init_ndarray(
+                    data,
+                    index,
+                    columns,
+                    dtype=dtype,
+                    copy=copy,
+                    consolidate=consolidate,
+                )

         elif isinstance(data, (np.ndarray, Series, Index)):
             if data.dtype.names:
                 data_columns = list(data.dtype.names)
                 data = {k: data[k] for k in data_columns}
                 if columns is None:
                     columns = data_columns
-                mgr = init_dict(data, index, columns, dtype=dtype)
+                mgr = init_dict(
+                    data, index, columns, dtype=dtype, consolidate=consolidate
+                )
             elif getattr(data, "name", None) is not None:
-                mgr = init_dict({data.name: data}, index, columns, dtype=dtype)
+                mgr = init_dict(
+                    {data.name: data},
+                    index,
+                    columns,
+                    dtype=dtype,
+                    consolidate=consolidate,
+                )
             else:
-                mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
+                mgr = init_ndarray(
+                    data,
+                    index,
+                    columns,
+                    dtype=dtype,
+                    copy=copy,
+                    consolidate=consolidate,
+                )

         # For data is list-like, or Iterable (will consume into list)
         elif isinstance(data, abc.Iterable) and not isinstance(data, (str, bytes)):
@@ -581,11 +609,27 @@ def __init__(
                         else:
                             index = ibase.default_index(len(data))

-                    mgr = arrays_to_mgr(arrays, columns, index, columns, dtype=dtype)
+                    mgr = arrays_to_mgr(
+                        arrays,
+                        columns,
+                        index,
+                        columns,
+                        dtype=dtype,
+                        consolidate=consolidate,
+                    )
                 else:
-                    mgr = init_ndarray(data, index, columns, dtype=dtype, copy=copy)
+                    mgr = init_ndarray(
+                        data,
+                        index,
+                        columns,
+                        dtype=dtype,
+                        copy=copy,
+                        consolidate=consolidate,
+                    )
             else:
-                mgr = init_dict({}, index, columns, dtype=dtype)
+                mgr = init_dict(
+                    {}, index, columns, dtype=dtype, consolidate=consolidate
+                )
         # For data is scalar
         else:
             if index is None or columns is None:
@@ -601,7 +645,9 @@ def __init__(
                     construct_1d_arraylike_from_scalar(data, len(index), dtype)
                     for _ in range(len(columns))
                 ]
-                mgr = arrays_to_mgr(values, columns, index, columns, dtype=None)
+                mgr = arrays_to_mgr(
+                    values, columns, index, columns, dtype=None, consolidate=consolidate
+                )
             else:
                 # Attempt to coerce to a numpy array
                 try:
@@ -621,7 +667,12 @@ def __init__(
                 )

                 mgr = init_ndarray(
-                    values, index, columns, dtype=values.dtype, copy=False
+                    values,
+                    index,
+                    columns,
+                    dtype=values.dtype,
+                    copy=False,
+                    consolidate=consolidate,
                 )

         NDFrame.__init__(self, mgr)
@@ -1733,6 +1784,7 @@ def from_records(
         columns=None,
         coerce_float=False,
         nrows=None,
+        consolidate: bool = True,
     ) -> DataFrame:
         """
         Convert structured or record ndarray to DataFrame.
@@ -1760,6 +1812,8 @@ def from_records(
             decimal.Decimal) to floating point, useful for SQL result sets.
         nrows : int, default None
             Number of rows to read if data is an iterator.
+        consolidate: bool, default True
+            Whether to consolidate the arrays in the new DataFrame.

         Returns
         -------
@@ -1895,7 +1949,9 @@ def from_records(
             arr_columns = arr_columns.drop(arr_exclude)
             columns = columns.drop(exclude)

-        mgr = arrays_to_mgr(arrays, arr_columns, result_index, columns)
+        mgr = arrays_to_mgr(
+            arrays, arr_columns, result_index, columns, consolidate=consolidate
+        )

         return cls(mgr)

@@ -2074,6 +2130,7 @@ def _from_arrays(
         index,
         dtype: Optional[Dtype] = None,
         verify_integrity: bool = True,
+        consolidate: bool = True,
     ) -> DataFrame:
         """
         Create DataFrame from a list of arrays corresponding to the columns.
@@ -2094,6 +2151,8 @@ def _from_arrays(
             stored in a block (numpy ndarray or ExtensionArray), have the same
             length as and are aligned with the index, and that `columns` and
             `index` are ensured to be an Index object.
+        consolidate: bool, default True
+            Whether to consolidate the passed arrays in the new DataFrame.

         Returns
         -------
@@ -2109,6 +2168,7 @@ def _from_arrays(
             columns,
             dtype=dtype,
             verify_integrity=verify_integrity,
+            consolidate=consolidate,
         )
         return cls(mgr)

@@ -6047,7 +6107,7 @@ def _dispatch_frame_op(self, right, func, axis: Optional[int] = None):
             raise NotImplementedError(right)

         return type(self)._from_arrays(
-            arrays, self.columns, self.index, verify_integrity=False
+            arrays, self.columns, self.index, verify_integrity=False, consolidate=False
         )

     def _combine_frame(self, other: DataFrame, func, fill_value=None):
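The frame.py changes above thread a `consolidate` flag from `DataFrame.__init__` down to the manager constructors, and default it to `not copy` when the caller leaves it as `None`. A minimal pure-Python sketch of that default resolution follows. `resolve_consolidate` is a hypothetical helper name used only for illustration; in the diff the logic is inlined in `__init__`, not factored out.

```python
from typing import Optional


def resolve_consolidate(copy: bool, consolidate: Optional[bool] = None) -> bool:
    """Mirror the `if consolidate is None: consolidate = not copy` step.

    Hypothetical helper, not part of pandas.
    """
    if consolidate is None:
        # No explicit request: consolidate unless the caller asked for a copy.
        return not copy
    # An explicit True/False always wins over the copy-derived default.
    return consolidate


print(resolve_consolidate(copy=False))                   # True
print(resolve_consolidate(copy=True))                    # False
print(resolve_consolidate(copy=True, consolidate=True))  # True
```

Using `None` as the sentinel at the public entry point lets the lower-level helpers keep a plain `bool` default of `True`, so existing internal callers that never pass the flag keep their current behavior.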
42 changes: 34 additions & 8 deletions pandas/core/internals/construction.py
@@ -66,6 +66,7 @@ def arrays_to_mgr(
     columns,
     dtype: Optional[DtypeObj] = None,
     verify_integrity: bool = True,
+    consolidate: bool = True,
 ):
     """
     Segregate Series based on type and coerce into matrices.
@@ -92,7 +93,9 @@ def arrays_to_mgr(
     # from BlockManager perspective
     axes = [columns, index]

-    return create_block_manager_from_arrays(arrays, arr_names, axes)
+    return create_block_manager_from_arrays(
+        arrays, arr_names, axes, consolidate=consolidate
+    )


 def masked_rec_array_to_mgr(
@@ -131,7 +134,9 @@ def masked_rec_array_to_mgr(
     if columns is None:
         columns = arr_columns

-    mgr = arrays_to_mgr(arrays, arr_columns, index, columns, dtype)
+    mgr = arrays_to_mgr(
+        arrays, arr_columns, index, columns, dtype, consolidate=True
+    )  # FIXME: dont hardcode

     if copy:
         mgr = mgr.copy()
@@ -142,7 +147,14 @@ def masked_rec_array_to_mgr(
 # DataFrame Constructor Interface


-def init_ndarray(values, index, columns, dtype: Optional[DtypeObj], copy: bool):
+def init_ndarray(
+    values,
+    index,
+    columns,
+    dtype: Optional[DtypeObj],
+    copy: bool,
+    consolidate: bool = True,
+):
     # input must be a ndarray, list, Series, index

     if isinstance(values, ABCSeries):
@@ -171,7 +183,9 @@ def init_ndarray(values, index, columns, dtype: Optional[DtypeObj], copy: bool):
             values = values.copy()

         index, columns = _get_axes(len(values), 1, index, columns)
-        return arrays_to_mgr([values], columns, index, columns, dtype=dtype)
+        return arrays_to_mgr(
+            [values], columns, index, columns, dtype=dtype, consolidate=consolidate
+        )
     elif is_extension_array_dtype(values) or is_extension_array_dtype(dtype):
         # GH#19157

@@ -185,7 +199,9 @@ def init_ndarray(values, index, columns, dtype: Optional[DtypeObj], copy: bool):
         if columns is None:
             columns = Index(range(len(values)))

-        return arrays_to_mgr(values, columns, index, columns, dtype=dtype)
+        return arrays_to_mgr(
+            values, columns, index, columns, dtype=dtype, consolidate=consolidate
+        )

     # by definition an array here
     # the dtypes will be coerced to a single dtype
@@ -235,10 +251,18 @@ def init_ndarray(values, index, columns, dtype: Optional[DtypeObj], copy: bool):
     else:
         block_values = [values]

-    return create_block_manager_from_blocks(block_values, [columns, index])
+    return create_block_manager_from_blocks(
+        block_values, [columns, index], consolidate=consolidate
+    )


-def init_dict(data: Dict, index, columns, dtype: Optional[DtypeObj] = None):
+def init_dict(
+    data: Dict,
+    index,
+    columns,
+    dtype: Optional[DtypeObj] = None,
+    consolidate: bool = True,
+):
     """
     Segregate Series based on type and coerce into matrices.
     Needs to handle a lot of exceptional cases.
@@ -284,7 +308,9 @@ def init_dict(data: Dict, index, columns, dtype: Optional[DtypeObj] = None):
         arrays = [
             arr if not is_datetime64tz_dtype(arr) else arr.copy() for arr in arrays
         ]
-    return arrays_to_mgr(arrays, data_names, index, columns, dtype=dtype)
+    return arrays_to_mgr(
+        arrays, data_names, index, columns, dtype=dtype, consolidate=consolidate
+    )


 # ---------------------------------------------------------------------
17 changes: 12 additions & 5 deletions pandas/core/internals/managers.py
@@ -1648,7 +1648,9 @@ def fast_xs(self, loc):
 # Constructor Helpers


-def create_block_manager_from_blocks(blocks, axes: List[Index]) -> BlockManager:
+def create_block_manager_from_blocks(
+    blocks, axes: List[Index], consolidate: bool = True
+) -> BlockManager:
     try:
         if len(blocks) == 1 and not isinstance(blocks[0], Block):
             # if blocks[0] is of length 0, return empty blocks
@@ -1665,7 +1667,8 @@ def create_block_manager_from_blocks(blocks, axes: List[Index]) -> BlockManager:
                 ]

         mgr = BlockManager(blocks, axes)
-        mgr._consolidate_inplace()
+        if consolidate:
+            mgr._consolidate_inplace()
         return mgr

     except ValueError as e:
@@ -1675,7 +1678,10 @@ def create_block_manager_from_blocks(blocks, axes: List[Index]) -> BlockManager:


 def create_block_manager_from_arrays(
-    arrays, names: Index, axes: List[Index]
+    arrays,
+    names: Index,
+    axes: List[Index],
+    consolidate: bool = True,
 ) -> BlockManager:
     assert isinstance(names, Index)
     assert isinstance(axes, list)
@@ -1687,10 +1693,11 @@ def create_block_manager_from_arrays(
     try:
         blocks = _form_blocks(arrays, names, axes)
         mgr = BlockManager(blocks, axes)
-        mgr._consolidate_inplace()
-        return mgr
     except ValueError as e:
         raise construction_error(len(arrays), arrays[0].shape, axes, e)
+    if consolidate:
+        mgr._consolidate_inplace()
+    return mgr


 def construction_error(tot_items, block_shape, axes, e=None):
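To make the effect of the new `if consolidate:` guard in managers.py concrete, here is a toy, pure-Python sketch of what consolidation does conceptually: blocks that share a dtype are merged into one multi-column block. Everything in it (`form_blocks`, `consolidate_blocks`, `create_manager`, and the tuple-based `Block` type) is a hypothetical stand-in, not pandas' `BlockManager`. It assumes every column is non-empty and homogeneous.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Toy "block": (dtype tag, list of column arrays). Not the pandas Block class.
Block = Tuple[str, List[list]]


def form_blocks(columns: Dict[str, list]) -> List[Block]:
    """Start with one block per column, tagged by the type of its first value."""
    return [(type(values[0]).__name__, [values]) for values in columns.values()]


def consolidate_blocks(blocks: List[Block]) -> List[Block]:
    """Merge blocks that share a dtype tag into a single multi-column block."""
    grouped: Dict[str, List[list]] = defaultdict(list)
    for dtype, arrays in blocks:
        grouped[dtype].extend(arrays)
    return list(grouped.items())


def create_manager(columns: Dict[str, list], consolidate: bool = True) -> List[Block]:
    """Build the block list, consolidating only when asked (as in the diff)."""
    blocks = form_blocks(columns)
    if consolidate:
        blocks = consolidate_blocks(blocks)
    return blocks


data = {"a": [1, 2], "b": [3, 4], "c": [0.5, 1.5]}
print(len(create_manager(data, consolidate=True)))   # 2
print(len(create_manager(data, consolidate=False)))  # 3
```

In this sketch the unconsolidated result keeps the caller's original list objects, one per block. That mirrors the motivation stated in the commit titles: skipping consolidation is what lets the constructor avoid rearranging the input arrays when a copy is not wanted.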