BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna by arw2019 · Pull Request #35078 · pandas-dev/pandas · GitHub


Merged
Changes from 1 commit (of 42 commits)
23c05f7
add values.dtype.kind==f branch to array_with_unit_datetime
arw2019 Jun 27, 2020
f564d48
revert pandas/_libs/tslib.pyx
arw2019 Jun 29, 2020
afe1869
added cast_from_unit definition for float
arw2019 Jun 27, 2020
c2594e0
revert accidental changes
arw2019 Jun 29, 2020
ef36084
revert changes
arw2019 Aug 7, 2020
cd92bc7
revert accidental changes
arw2019 Jun 29, 2020
077bd8e
update Grouping.indicies to return for nan values
arw2019 Jul 1, 2020
72f66d4
updated _GroupBy._get_index to return for nan values
arw2019 Jul 1, 2020
c5c3a28
revert accidental changes
arw2019 Jul 1, 2020
0214470
revert accidental changes
arw2019 Jul 1, 2020
2a3a86b
revert accidental changes
arw2019 Jul 1, 2020
cf71b75
styling change
arw2019 Jul 1, 2020
9051166
added tests
arw2019 Jul 2, 2020
84e04c0
fixed groupby/groupby.py's _get_indicies
arw2019 Jul 6, 2020
86ce781
removed debug statement
arw2019 Jul 6, 2020
7090e2d
fixed naming error in test
arw2019 Jul 7, 2020
68d9b78
remove type coercion block
arw2019 Jul 7, 2020
9caba29
added missing values handing for _GroupBy.get_group method
arw2019 Jul 7, 2020
8e3a460
updated indicies for case dropna=True
arw2019 Jul 7, 2020
b2ed957
cleaned up syntax
arw2019 Jul 7, 2020
6d87151
cleaned up syntax
arw2019 Jul 7, 2020
c67e072
removed print statements
arw2019 Jul 7, 2020
ed338e1
_transform_general: add a check that we don't accidentally upcast
arw2019 Jul 7, 2020
9ffef45
_transform_general: add int32, float32 to upcasting check
arw2019 Jul 7, 2020
736ac69
rewrite for loop as list comprehension
arw2019 Jul 7, 2020
68902eb
rewrote if statement as dict comp + ternary
arw2019 Jul 7, 2020
c6668f0
fixed small bug in list comp in groupby/groupby.py
arw2019 Jul 7, 2020
46949ea
deleted debug statement in groupby/groupby.py
arw2019 Jul 7, 2020
e16a495
rewrite _get_index using next_iter to set default value
arw2019 Jul 7, 2020
e00d71d
update exepcted test_groupby_nat_exclude for new missing values handling
arw2019 Jul 7, 2020
6d5a441
remove print statement
arw2019 Jul 7, 2020
9c24cf2
reworked solution
arw2019 Jul 9, 2020
5637c3e
fixed PEP8 issue
arw2019 Jul 9, 2020
29c13f6
run pre-commit checks
arw2019 Jul 9, 2020
2ea68af
styling fix
arw2019 Jul 9, 2020
3f5c6d6
update whatnew + styling improvements
arw2019 Jul 9, 2020
10147b0
move NaN handling to _get_indicies
arw2019 Jul 22, 2020
c9f6f7e
removed 1.1 release note
arw2019 Jul 22, 2020
9b536dd
redo solution - modify SeriesGroupBy._transform_general only
arw2019 Jul 22, 2020
2c9de8e
Merge remote-tracking branch 'upstream/master' into groupby-getitem-d…
arw2019 Aug 7, 2020
8d991d5
rewrite case + rewrite tests w fixtures
arw2019 Aug 7, 2020
570ce21
fix mypy error
arw2019 Aug 7, 2020
redo solution - modify SeriesGroupBy._transform_general only
arw2019 committed Aug 7, 2020
commit 9b536dd2d281e564474a8de91b38f672cea3c4f5
14 changes: 0 additions & 14 deletions doc/source/whatsnew/v1.1.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -1017,7 +1017,6 @@ Indexing

Missing
^^^^^^^
<<<<<<< HEAD
- Calling :meth:`fillna` on an empty :class:`Series` now correctly returns a shallow copied object. The behaviour is now consistent with :class:`Index`, :class:`DataFrame` and a non-empty :class:`Series` (:issue:`32543`).
- Bug in :meth:`Series.replace` when argument ``to_replace`` is of type dict/list and is used on a :class:`Series` containing ``<NA>`` was raising a ``TypeError``. The method now handles this by ignoring ``<NA>`` values when doing the comparison for the replacement (:issue:`32621`)
- Bug in :meth:`~Series.any` and :meth:`~Series.all` incorrectly returning ``<NA>`` for all ``False`` or all ``True`` values using the nulllable Boolean dtype and with ``skipna=False`` (:issue:`33253`)
Expand All @@ -1026,19 +1025,6 @@ Missing
- Bug in :meth:`DataFrame.interpolate` when called on a :class:`DataFrame` with column names of string type was throwing a ValueError. The method is now independent of the type of the column names (:issue:`33956`)
- Passing :class:`NA` into a format string using format specs will now work. For example ``"{:.1f}".format(pd.NA)`` would previously raise a ``ValueError``, but will now return the string ``"<NA>"`` (:issue:`34740`)
- Bug in :meth:`Series.map` not raising on invalid ``na_action`` (:issue:`32815`)
=======
- Calling :meth:`fillna` on an empty Series now correctly returns a shallow copied object. The behaviour is now consistent with :class:`Index`, :class:`DataFrame` and a non-empty :class:`Series` (:issue:`32543`).
- Bug in :meth:`replace` when argument ``to_replace`` is of type dict/list and is used on a :class:`Series` containing ``<NA>`` was raising a ``TypeError``. The method now handles this by ignoring ``<NA>`` values when doing the comparison for the replacement (:issue:`32621`)
- Bug in :meth:`~Series.any` and :meth:`~Series.all` incorrectly returning ``<NA>`` for all ``False`` or all ``True`` values using the nulllable boolean dtype and with ``skipna=False`` (:issue:`33253`)
- Clarified documentation on interpolate with method =akima. The ``der`` parameter must be scalar or None (:issue:`33426`)
- :meth:`DataFrame.interpolate` uses the correct axis convention now. Previously interpolating along columns lead to interpolation along indices and vice versa. Furthermore interpolating with methods ``pad``, ``ffill``, ``bfill`` and ``backfill`` are identical to using these methods with :meth:`fillna` (:issue:`12918`, :issue:`29146`)
- Bug in :meth:`DataFrame.interpolate` when called on a DataFrame with column names of string type was throwing a ValueError. The method is no independing of the type of column names (:issue:`33956`)
- passing :class:`NA` will into a format string using format specs will now work. For example ``"{:.1f}".format(pd.NA)`` would previously raise a ``ValueError``, but will now return the string ``"<NA>"`` (:issue:`34740`)
<<<<<<< HEAD
- Bug in :meth:`SeriesGroupBy.transform` now correctly handles missing values for `dropna=False` (:issue:`35014`)
>>>>>>> 90e9b6a10... update whatnew + styling improvements
=======
>>>>>>> 8c11b6072... removed 1.1 release note

MultiIndex
^^^^^^^^^^
Expand Down
2 changes: 1 addition & 1 deletion doc/source/whatsnew/v1.2.0.rst
Original file line number Diff line number Diff line change
Expand Up @@ -109,7 +109,7 @@ Indexing
Missing
^^^^^^^

-
- Bug in :meth:`SeriesGroupBy.transform` now correctly handles missing values for `dropna=False` (:issue:`35014`)
-

MultiIndex
Expand Down
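The v1.2.0 release note above can be exercised directly. A minimal sketch of the fixed behaviour (assuming pandas >= 1.1, where the `dropna` keyword was introduced):

```python
import numpy as np
import pandas as pd

# With dropna=False the NaN key forms its own group, and transform must
# return a result aligned with the original three-row index (GH35014).
df = pd.DataFrame({"key": ["a", "a", np.nan], "val": [1, 2, 3]})
out = df.groupby("key", dropna=False)["val"].transform(lambda s: s.sum())
print(out.tolist())  # [3, 3, 3]
```

Before the fix, the NaN-keyed rows were dropped from the transform result even with `dropna=False`, misaligning it with the caller's index.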
20 changes: 0 additions & 20 deletions pandas/_libs/tslib.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -246,31 +246,11 @@ def array_with_unit_to_datetime(
if ((fvalues < Timestamp.min.value).any()
or (fvalues > Timestamp.max.value).any()):
raise OutOfBoundsDatetime(f"cannot convert input with unit '{unit}'")
<<<<<<< HEAD
<<<<<<< HEAD
=======
>>>>>>> 7df44d10f... revert accidental changes
result = (iresult * m).astype('M8[ns]')
iresult = result.view('i8')
iresult[mask] = NPY_NAT
return result, tz

<<<<<<< HEAD
=======
# GH20445
if values.dtype.kind == 'i':
result = (iresult * m).astype('M8[ns]')
iresult = result.view('i8')
iresult[mask] = NPY_NAT
return result, tz
elif values.dtype.kind == 'f':
result = (fresult * m_as_float).astype('M8[ns]')
fresult = result.view('f8')
fresult[mask] = NPY_NAT
return result, tz
>>>>>>> f1ae8f562... _libs/tslib.pyx added comments
=======
>>>>>>> 7df44d10f... revert accidental changes
result = np.empty(n, dtype='M8[ns]')
iresult = result.view('i8')

Expand Down
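The tslib.pyx hunk above only reverts stray conflict markers, but for context: `array_with_unit_to_datetime` is the machinery behind `pd.to_datetime(..., unit=...)`, where float inputs take the fractional branch these early commits touched and later reverted. A quick illustration:

```python
import pandas as pd

# A float value with an explicit unit routes through the fractional-unit
# path: 1.5 seconds since the epoch.
out = pd.to_datetime([1.5], unit="s")
print(out[0])  # 1970-01-01 00:00:01.500000
```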
4 changes: 0 additions & 4 deletions pandas/_libs/tslibs/conversion.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -120,11 +120,7 @@ cdef inline int64_t cast_from_unit(object ts, str unit) except? -1:
return <int64_t>(base * m) + <int64_t>(frac * m)


<<<<<<< HEAD
cpdef inline (int64_t, int) precision_from_unit(str unit):
=======
cpdef inline object precision_from_unit(str unit):
>>>>>>> 6b9d4de82... revert changes
"""
Return a casting of the unit represented to nanoseconds + the precision
to round the fractional part.
Expand Down
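`cast_from_unit` and `precision_from_unit` convert a scalar-plus-unit into an integer nanosecond count; the public `Timedelta` API exposes the same conversion, so a rough sketch of what the cast produces:

```python
import pandas as pd

# 1.5 seconds expressed as nanoseconds: the integer representation that
# cast_from_unit computes internally (base * m + frac * m in the diff).
ns = pd.Timedelta(1.5, unit="s").value
print(ns)  # 1500000000
```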
20 changes: 9 additions & 11 deletions pandas/core/groupby/generic.py
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,6 @@
maybe_cast_result_dtype,
maybe_convert_objects,
maybe_downcast_numeric,
maybe_downcast_to_dtype,
)
from pandas.core.dtypes.common import (
ensure_int64,
Expand Down Expand Up @@ -535,26 +534,25 @@ def _transform_general(
if isinstance(res, (ABCDataFrame, ABCSeries)):
res = res._values

indexer = self._get_index(name)
ser = klass(res, indexer)
results.append(ser)
results.append(klass(res, index=group.index))

# check for empty "results" to avoid concat ValueError
if results:
from pandas.core.reshape.concat import concat

result = concat(results).sort_index()
concatenated = concat(results)
result = self._set_result_index_ordered(concatenated)
else:
result = self.obj._constructor(dtype=np.float64)

# we will only try to coerce the result type if
# we have a numeric dtype, as these are *always* user-defined funcs
# the cython take a different path (and casting)
# make sure we don't accidentally upcast (GH35014)
types = ["bool", "int32", "int64", "float32", "float64"]
dtype = self._selected_obj.dtype
if is_numeric_dtype(dtype) and types.index(dtype) < types.index(result.dtype):
result = maybe_downcast_to_dtype(result, dtype)
if is_numeric_dtype(result.dtype):
common_dtype = np.find_common_type(
[self._selected_obj.dtype, result.dtype], []
)
if common_dtype is result.dtype:
result = maybe_downcast_numeric(result, self._selected_obj.dtype)

result.name = self._selected_obj.name
result.index = self._selected_obj.index
Expand Down
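The replacement guard in `_transform_general` drops the hard-coded dtype ladder and instead asks NumPy for the common dtype of the input and the result. A standalone sketch of the same idea — note it uses `np.result_type` rather than the diff's `np.find_common_type`, which is deprecated in recent NumPy; the function name here is illustrative, not from the patch:

```python
import numpy as np

def can_try_downcast(original_dtype: np.dtype, result_dtype: np.dtype) -> bool:
    # Only attempt a downcast when the result dtype is already the common
    # dtype of the pair, i.e. the transform merely upcast the input and
    # casting back cannot silently change unrelated values.
    return np.result_type(original_dtype, result_dtype) == result_dtype

# int64 input whose transform came back float64: float64 is the common
# dtype, so a downcast back to int64 may be attempted.
print(can_try_downcast(np.dtype("int64"), np.dtype("float64")))  # True
# float64 input with an int64 result: int64 is not the common dtype.
print(can_try_downcast(np.dtype("float64"), np.dtype("int64")))  # False
```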
10 changes: 2 additions & 8 deletions pandas/core/groupby/groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -54,7 +54,6 @@ class providing the base-class of operations.
)
from pandas.core.dtypes.missing import isna, notna

import pandas as pd
from pandas.core import nanops
import pandas.core.algorithms as algorithms
from pandas.core.arrays import Categorical, DatetimeArray
Expand Down Expand Up @@ -624,12 +623,7 @@ def get_converter(s):
converter = get_converter(index_sample)
names = (converter(name) for name in names)

return [
self.indices.get(name, [])
if not isna(name)
else self.indices.get(pd.NaT, [])
for name in names
]
return [self.indices.get(name, []) for name in names]

def _get_index(self, name):
"""
Expand Down Expand Up @@ -813,7 +807,7 @@ def get_group(self, name, obj=None):
if obj is None:
obj = self._selected_obj

inds = self._get_index(pd.NaT) if pd.isna(name) else self._get_index(name)
inds = self._get_index(name)
if not len(inds):
raise KeyError(name)

Expand Down
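After this change `_get_indices` falls back to a plain `self.indices.get(name, [])` with no NaT special case. The `indices` mapping it consults is public API, so its shape is easy to see:

```python
import pandas as pd

ser = pd.Series([1, 2, 3], index=["a", "a", "b"])
gb = ser.groupby(level=0)
# .indices maps each group label to the positional indexer of its rows;
# _get_indices simply looks names up here, returning [] for misses.
print(gb.indices["a"].tolist())  # [0, 1]
print(gb.indices["b"].tolist())  # [2]
```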
12 changes: 1 addition & 11 deletions pandas/core/groupby/grouper.py
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,6 @@
)
from pandas.core.dtypes.generic import ABCSeries

import pandas as pd
import pandas.core.algorithms as algorithms
from pandas.core.arrays import Categorical, ExtensionArray
import pandas.core.common as com
Expand Down Expand Up @@ -558,16 +557,7 @@ def indices(self):
return self.grouper.indices

values = Categorical(self.grouper)

# GH35014
reverse_indexer = values._reverse_indexer()
if not self.dropna and any(pd.isna(v) for v in values):
return {
**reverse_indexer,
pd.NaT: np.array([i for i, v in enumerate(values) if pd.isna(v)]),
}
else:
return reverse_indexer
return values._reverse_indexer()

@property
def codes(self) -> np.ndarray:
Expand Down
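`Categorical._reverse_indexer` is internal, but its output shape — one entry per category mapping to the positions where it occurs — can be reproduced with public API. A minimal sketch of what `Grouping.indices` returns after this revert:

```python
import pandas as pd

values = pd.Categorical(["a", "b", "a"])
# Equivalent of a reverse indexer built by hand: each category mapped to
# the positions at which it appears in the original ordering.
reverse = {
    cat: [i for i, v in enumerate(values) if v == cat]
    for cat in values.categories
}
print(reverse)  # {'a': [0, 2], 'b': [1]}
```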