REF: remove Block.concat_same_type by jbrockmendel · Pull Request #33486 · pandas-dev/pandas · GitHub

REF: remove Block.concat_same_type #33486


Merged 43 commits on Apr 15, 2020
Changes from 1 commit (43 commits total)
cb8f6c6
REF: reshape.concat operate on arrays, not SingleBlockManagers
jbrockmendel Mar 29, 2020
5fe3348
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Mar 30, 2020
2a2c9e7
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Mar 30, 2020
2e774f2
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Mar 31, 2020
e008f40
xfail more selectively
jbrockmendel Mar 31, 2020
a244f15
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 4, 2020
9d52e7e
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 6, 2020
3f0ee1b
Revert PandasArray.astype patch
jbrockmendel Apr 6, 2020
2da47de
DOC: Fix examples in `pandas/core/strings.py` (#33328)
ShaharNaveh Apr 6, 2020
9585a41
DOC: do not include type hints in signature in html docs (#33312)
jorisvandenbossche Apr 6, 2020
ed862c0
BUG: DataFrame fail to construct when data is list and columns is nes…
charlesdong1991 Apr 6, 2020
c57f6e7
API/CLN: simplify CategoricalBlock.replace (#33279)
jbrockmendel Apr 6, 2020
2b322d2
REF: BlockManager.delete -> idelete (#33332)
jbrockmendel Apr 6, 2020
d4d7538
TST: Don't use 'is' on strings to avoid SyntaxWarning (#33322)
rebecca-palmer Apr 6, 2020
e3eb29c
CLN: remove fill_tuple kludge (#33310)
jbrockmendel Apr 6, 2020
fcfa7c4
TST: misplaced reduction/indexing tests (#33307)
jbrockmendel Apr 6, 2020
7a468b0
BUG: Don't raise on value_counts for empty Int64 (#33339)
dsaxton Apr 6, 2020
0a2b9cd
REGR: Fix bug when replacing categorical value with self (#33292)
dsaxton Apr 6, 2020
5a38119
Pass method in __finalize__ (#33273)
TomAugspurger Apr 6, 2020
4f1fb46
DOC: Added an example for each series.dt field accessor (#33259)
ShaharNaveh Apr 6, 2020
8150c11
BUG: Timestamp+- ndarray[td64] (#33296)
jbrockmendel Apr 6, 2020
9585ae4
BUG: 2D indexing on DTA/TDA/PA (#33290)
jbrockmendel Apr 6, 2020
c05d28b
REF: dispatch TDBlock.to_native_types to TDA._format_native_types (#3…
jbrockmendel Apr 6, 2020
047e5d7
REF: put concatenate_block_managers in internals.concat (#33231)
jbrockmendel Apr 6, 2020
0e382f2
TST: Add tests for duplicated and drop_duplicates (#32575)
mproszewska Apr 6, 2020
717662b
Ods loses spaces 32207 (#33233)
detrout Apr 6, 2020
9c1984c
PERF: masked ops for reductions (min/max) (#33261)
jorisvandenbossche Apr 6, 2020
efce8fc
REF: do concat on values, avoid blocks
jbrockmendel Apr 7, 2020
362e86c
CLN: Clean nanops.get_corr_func (#33244)
dsaxton Apr 7, 2020
3ad2110
[DOC]: Mention default behaviour of index_col in readcsv (#32977)
bharatr21 Apr 7, 2020
629d7c5
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 7, 2020
859327d
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 8, 2020
87c1006
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 10, 2020
3a84357
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 10, 2020
3ee8363
Remove Block.concat_same_type
jbrockmendel Apr 11, 2020
9e6c7ed
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 11, 2020
fd7c72e
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 11, 2020
41d6da0
use concat_compat
jbrockmendel Apr 11, 2020
5d567f0
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 12, 2020
2e070ca
combine cases
jbrockmendel Apr 12, 2020
9b6d3ac
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 13, 2020
858658a
Merge branch 'master' of https://github.com/pandas-dev/pandas into no…
jbrockmendel Apr 14, 2020
675a948
Dummy commit to force CI
jbrockmendel Apr 14, 2020
PERF: masked ops for reductions (min/max) (#33261)
jorisvandenbossche authored and jbrockmendel committed Apr 7, 2020
commit 9c1984c5ce7648eb5a613637791492030801d43a
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.1.0.rst
@@ -276,7 +276,7 @@ Performance improvements
sparse values from ``scipy.sparse`` matrices using the
:meth:`DataFrame.sparse.from_spmatrix` constructor (:issue:`32821`,
:issue:`32825`, :issue:`32826`, :issue:`32856`, :issue:`32858`).
- Performance improvement in :meth:`Series.sum` for nullable (integer and boolean) dtypes (:issue:`30982`).
- Performance improvement in reductions (sum, min, max) for nullable (integer and boolean) dtypes (:issue:`30982`, :issue:`33261`).


.. ---------------------------------------------------------------------------
41 changes: 41 additions & 0 deletions pandas/core/array_algos/masked_reductions.py
@@ -45,3 +45,44 @@ def sum(
return np.sum(values[~mask])
else:
return np.sum(values, where=~mask)


def _minmax(func, values: np.ndarray, mask: np.ndarray, skipna: bool = True):
"""
Reduction for 1D masked array.

Parameters
----------
func : np.min or np.max
values : np.ndarray
Numpy array with the values (can be of any dtype that support the
operation).
mask : np.ndarray
Boolean numpy array (True values indicate missing values).
skipna : bool, default True
Whether to skip NA.
"""
if not skipna:
if mask.any():
return libmissing.NA
else:
if values.size:
return func(values)
else:
# min/max with empty array raise in numpy, pandas returns NA
return libmissing.NA
else:
subset = values[~mask]
if subset.size:
return func(subset)
else:
# min/max with empty array raise in numpy, pandas returns NA
return libmissing.NA


def min(values: np.ndarray, mask: np.ndarray, skipna: bool = True):
return _minmax(np.min, values=values, mask=mask, skipna=skipna)


def max(values: np.ndarray, mask: np.ndarray, skipna: bool = True):
return _minmax(np.max, values=values, mask=mask, skipna=skipna)
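The `_minmax` helper above can be exercised outside pandas with a small self-contained sketch. Here `NA` is a plain-object stand-in for `libmissing.NA` (an assumption of this sketch, not the real singleton), but the branch logic mirrors the diff:

```python
import numpy as np

NA = object()  # stand-in for libmissing.NA (an assumption of this sketch)

def minmax(func, values, mask, skipna=True):
    """Masked min/max reduction mirroring the _minmax logic above."""
    if not skipna:
        if mask.any() or not values.size:
            # any masked entry (or empty input) propagates NA
            return NA
        return func(values)
    subset = values[~mask]
    if subset.size:
        return func(subset)
    # min/max of an empty array raises in numpy; pandas returns NA instead
    return NA

vals = np.array([3.0, 1.0, 7.0, 5.0])
mask = np.array([False, True, False, True])  # positions 1 and 3 are missing

print(minmax(np.min, vals, mask))                # masked entries ignored -> 3.0
print(minmax(np.min, vals, mask, skipna=False))  # mask present -> NA
```

Note that with `skipna=False` a single masked entry is enough to short-circuit to `NA`, so the numpy reduction is never reached on invalid data.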
8 changes: 3 additions & 5 deletions pandas/core/arrays/boolean.py
@@ -696,8 +696,9 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs):
data = self._data
mask = self._mask

if name == "sum":
return masked_reductions.sum(data, mask, skipna=skipna, **kwargs)
if name in {"sum", "min", "max"}:
op = getattr(masked_reductions, name)
return op(data, mask, skipna=skipna, **kwargs)

# coerce to a nan-aware float if needed
if self._hasna:
@@ -715,9 +716,6 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs):
if int_result == result:
result = int_result

elif name in ["min", "max"] and notna(result):
result = np.bool_(result)

return result

def _maybe_mask_result(self, result, mask, other, op_name: str):
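The `_reduce` change above replaces per-name `if name == ...` branches with a `getattr` dispatch into the `masked_reductions` namespace. A minimal sketch of that pattern, using a `SimpleNamespace` stand-in for the real module (an assumption of this sketch):

```python
import numpy as np
from types import SimpleNamespace

# stand-in for pandas.core.array_algos.masked_reductions (assumption of this sketch)
masked_reductions = SimpleNamespace(
    sum=lambda data, mask, skipna=True: np.sum(data[~mask]),
    min=lambda data, mask, skipna=True: np.min(data[~mask]),
    max=lambda data, mask, skipna=True: np.max(data[~mask]),
)

def reduce_masked(name, data, mask, skipna=True):
    # dispatch by name instead of a chain of if/elif branches
    if name in {"sum", "min", "max"}:
        op = getattr(masked_reductions, name)
        return op(data, mask, skipna=skipna)
    raise NotImplementedError(name)

data = np.array([1, 2, 3, 4])
mask = np.array([False, False, True, False])
print(reduce_masked("sum", data, mask))  # 1 + 2 + 4 = 7
```

The dispatch keeps `_reduce` short and means adding a new masked reduction only requires a new function in the namespace plus one name in the set.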
7 changes: 4 additions & 3 deletions pandas/core/arrays/integer.py
@@ -562,8 +562,9 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs):
data = self._data
mask = self._mask

if name == "sum":
return masked_reductions.sum(data, mask, skipna=skipna, **kwargs)
if name in {"sum", "min", "max"}:
op = getattr(masked_reductions, name)
return op(data, mask, skipna=skipna, **kwargs)

# coerce to a nan-aware float if needed
# (we explicitly use NaN within reductions)
@@ -582,7 +583,7 @@ def _reduce(self, name: str, skipna: bool = True, **kwargs):

# if we have a preservable numeric op,
# provide coercion back to an integer type if possible
elif name in ["min", "max", "prod"]:
elif name == "prod":
# GH#31409 more performant than casting-then-checking
result = com.cast_scalar_indexer(result)

2 changes: 1 addition & 1 deletion pandas/tests/arrays/integer/test_dtypes.py
@@ -34,7 +34,7 @@ def test_preserve_dtypes(op):

# op
result = getattr(df.C, op)()
if op == "sum":
if op in {"sum", "min", "max"}:
assert isinstance(result, np.int64)
else:
assert isinstance(result, int)
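The updated test expects `np.int64` for min/max as well as sum because the masked reductions now call the numpy function directly on the underlying int64 data, and numpy reductions return numpy scalars rather than Python ints. A sketch of why:

```python
import numpy as np

# underlying data/mask of a hypothetical Int64 array (names are illustrative)
data = np.array([1, 2, 3], dtype=np.int64)
mask = np.array([False, True, False])

# the masked reduction operates on the valid values only
result = np.min(data[~mask])

print(type(result).__name__)  # np.min on int64 data yields a numpy int64 scalar
```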
62 changes: 45 additions & 17 deletions pandas/tests/reductions/test_reductions.py
@@ -65,27 +65,58 @@ def test_ops(self, opname, obj):
assert result.value == expected

@pytest.mark.parametrize("opname", ["max", "min"])
def test_nanops(self, opname, index_or_series):
@pytest.mark.parametrize(
"dtype, val",
[
("object", 2.0),
("float64", 2.0),
("datetime64[ns]", datetime(2011, 11, 1)),
("Int64", 2),
("boolean", True),
],
)
def test_nanminmax(self, opname, dtype, val, index_or_series):
# GH#7261
klass = index_or_series
arg_op = "arg" + opname if klass is Index else "idx" + opname

obj = klass([np.nan, 2.0])
assert getattr(obj, opname)() == 2.0
if dtype in ["Int64", "boolean"] and klass == pd.Index:
pytest.skip("EAs can't yet be stored in an index")

obj = klass([np.nan])
assert pd.isna(getattr(obj, opname)())
assert pd.isna(getattr(obj, opname)(skipna=False))
def check_missing(res):
if dtype == "datetime64[ns]":
return res is pd.NaT
elif dtype == "Int64":
return res is pd.NA
else:
return pd.isna(res)

obj = klass([], dtype=object)
assert pd.isna(getattr(obj, opname)())
assert pd.isna(getattr(obj, opname)(skipna=False))
obj = klass([None], dtype=dtype)
assert check_missing(getattr(obj, opname)())
assert check_missing(getattr(obj, opname)(skipna=False))

obj = klass([pd.NaT, datetime(2011, 11, 1)])
# check DatetimeIndex monotonic path
assert getattr(obj, opname)() == datetime(2011, 11, 1)
assert getattr(obj, opname)(skipna=False) is pd.NaT
obj = klass([], dtype=dtype)
assert check_missing(getattr(obj, opname)())
assert check_missing(getattr(obj, opname)(skipna=False))

if dtype == "object":
# generic test with object only works for empty / all NaN
return

obj = klass([None, val], dtype=dtype)
assert getattr(obj, opname)() == val
assert check_missing(getattr(obj, opname)(skipna=False))

obj = klass([None, val, None], dtype=dtype)
assert getattr(obj, opname)() == val
assert check_missing(getattr(obj, opname)(skipna=False))

@pytest.mark.parametrize("opname", ["max", "min"])
def test_nanargminmax(self, opname, index_or_series):
# GH#7261
klass = index_or_series
arg_op = "arg" + opname if klass is Index else "idx" + opname

obj = klass([pd.NaT, datetime(2011, 11, 1)])
assert getattr(obj, arg_op)() == 1
result = getattr(obj, arg_op)(skipna=False)
if klass is Series:
@@ -95,9 +126,6 @@ def test_nanops(self, opname, index_or_series):

obj = klass([pd.NaT, datetime(2011, 11, 1), pd.NaT])
# check DatetimeIndex non-monotonic path
assert getattr(obj, opname)(), datetime(2011, 11, 1)
assert getattr(obj, opname)(skipna=False) is pd.NaT

assert getattr(obj, arg_op)() == 1
result = getattr(obj, arg_op)(skipna=False)
if klass is Series: