BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna by arw2019 · Pull Request #35078 · pandas-dev/pandas · GitHub
BUG: DataFrameGroupBy.__getitem__ fails to propagate dropna #35078

Merged
Changes from 1 commit
Commits
42 commits
23c05f7
add values.dtype.kind==f branch to array_with_unit_datetime
arw2019 Jun 27, 2020
f564d48
revert pandas/_libs/tslib.pyx
arw2019 Jun 29, 2020
afe1869
added cast_from_unit definition for float
arw2019 Jun 27, 2020
c2594e0
revert accidental changes
arw2019 Jun 29, 2020
ef36084
revert changes
arw2019 Aug 7, 2020
cd92bc7
revert accidental changes
arw2019 Jun 29, 2020
077bd8e
update Grouping.indicies to return for nan values
arw2019 Jul 1, 2020
72f66d4
updated _GroupBy._get_index to return for nan values
arw2019 Jul 1, 2020
c5c3a28
revert accidental changes
arw2019 Jul 1, 2020
0214470
revert accidental changes
arw2019 Jul 1, 2020
2a3a86b
revert accidental changes
arw2019 Jul 1, 2020
cf71b75
styling change
arw2019 Jul 1, 2020
9051166
added tests
arw2019 Jul 2, 2020
84e04c0
fixed groupby/groupby.py's _get_indicies
arw2019 Jul 6, 2020
86ce781
removed debug statement
arw2019 Jul 6, 2020
7090e2d
fixed naming error in test
arw2019 Jul 7, 2020
68d9b78
remove type coercion block
arw2019 Jul 7, 2020
9caba29
added missing values handing for _GroupBy.get_group method
arw2019 Jul 7, 2020
8e3a460
updated indicies for case dropna=True
arw2019 Jul 7, 2020
b2ed957
cleaned up syntax
arw2019 Jul 7, 2020
6d87151
cleaned up syntax
arw2019 Jul 7, 2020
c67e072
removed print statements
arw2019 Jul 7, 2020
ed338e1
_transform_general: add a check that we don't accidentally upcast
arw2019 Jul 7, 2020
9ffef45
_transform_general: add int32, float32 to upcasting check
arw2019 Jul 7, 2020
736ac69
rewrite for loop as list comprehension
arw2019 Jul 7, 2020
68902eb
rewrote if statement as dict comp + ternary
arw2019 Jul 7, 2020
c6668f0
fixed small bug in list comp in groupby/groupby.py
arw2019 Jul 7, 2020
46949ea
deleted debug statement in groupby/groupby.py
arw2019 Jul 7, 2020
e16a495
rewrite _get_index using next_iter to set default value
arw2019 Jul 7, 2020
e00d71d
update exepcted test_groupby_nat_exclude for new missing values handling
arw2019 Jul 7, 2020
6d5a441
remove print statement
arw2019 Jul 7, 2020
9c24cf2
reworked solution
arw2019 Jul 9, 2020
5637c3e
fixed PEP8 issue
arw2019 Jul 9, 2020
29c13f6
run pre-commit checks
arw2019 Jul 9, 2020
2ea68af
styling fix
arw2019 Jul 9, 2020
3f5c6d6
update whatnew + styling improvements
arw2019 Jul 9, 2020
10147b0
move NaN handling to _get_indicies
arw2019 Jul 22, 2020
c9f6f7e
removed 1.1 release note
arw2019 Jul 22, 2020
9b536dd
redo solution - modify SeriesGroupBy._transform_general only
arw2019 Jul 22, 2020
2c9de8e
Merge remote-tracking branch 'upstream/master' into groupby-getitem-d…
arw2019 Aug 7, 2020
8d991d5
rewrite case + rewrite tests w fixtures
arw2019 Aug 7, 2020
570ce21
fix mypy error
arw2019 Aug 7, 2020
reworked solution
arw2019 committed Aug 7, 2020
commit 9c24cf286fd9e24374bc38f23ab91dd1d3eea8e5
15 changes: 5 additions & 10 deletions pandas/core/groupby/groupby.py
@@ -624,21 +624,16 @@ def get_converter(s):
             converter = get_converter(index_sample)
             names = (converter(name) for name in names)
 
-        res = [
-            [v for k, v in self.indices.items() if isna(k)]
-            if isna(name)
-            else [self.indices.get(name, [])]
-            for name in names
-        ]
-
-        return res
+        return [self.indices.get(name, []) for name in names]
 
     def _get_index(self, name):
         """
         Safe get index, translate keys for datelike to underlying repr.
         """
-        return next(iter(self._get_indices([name])), [])
-
+        if isna(name):
+            return self._get_indices([pd.NaT])[0]

Contributor (review comment): we would want _get_indices to handle a null rather than this way

Member Author (reply): ok! moved this

+        else:
+            return self._get_indices([name])[0]
     @cache_readonly
     def _selected_obj(self):
         # Note: _selected_obj is always just `self.obj` for SeriesGroupBy
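The review comment above asks for the null handling to live in `_get_indices` rather than in `_get_index`. The following is a minimal, pandas-free sketch of that idea, not the code that was merged. The names `is_null`, `get_indices`, and `get_index` are hypothetical stand-ins, and a plain `dict` plays the role of `self.indices`. It also shows why a plain `dict.get` is not enough: two distinct NaN objects never compare equal, so a NaN key has to be matched with an explicit null check.

```python
import math


def is_null(key):
    """Hypothetical stand-in for pandas.isna on scalars: None or a float NaN."""
    return key is None or (isinstance(key, float) and math.isnan(key))


def get_indices(indices, names):
    """Look up each name; a null name matches the first null key, if any."""
    result = []
    for name in names:
        if is_null(name):
            # NaN != NaN, so scan the keys instead of relying on dict lookup.
            matches = [pos for key, pos in indices.items() if is_null(key)]
            result.append(matches[0] if matches else [])
        else:
            result.append(indices.get(name, []))
    return result


def get_index(indices, name):
    """Thin wrapper: all null handling is delegated to get_indices."""
    return get_indices(indices, [name])[0]


# Toy mapping from group key to row positions (an assumption for illustration).
indices = {"a": [0, 2], float("nan"): [1, 3]}

found = get_index(indices, float("nan"))   # null-aware lookup
naive = indices.get(float("nan"), [])      # plain lookup with a new NaN object
```

With this layout the wrapper stays a one-liner, and callers that pass several names get the same null semantics as callers that pass one.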
8 changes: 3 additions & 5 deletions pandas/core/groupby/grouper.py
@@ -561,11 +561,9 @@ def indices(self):
 
         # GH35014
         reverse_indexer = values._reverse_indexer()
-        return (
-            {**reverse_indexer, pd.NaT: [i for i, v in enumerate(values) if pd.isna(v)]}
-            if not self.dropna and any(pd.isna(v) for v in values)
-            else reverse_indexer
-        )
+        res = {**reverse_indexer, pd.NaT: np.array([i for i, v in enumerate(values) if pd.isna(v)])} if not self.dropna and any(pd.isna(v) for v in values) else reverse_indexer
+        print(f"grouper.py Grouping.indices returns {res}")
+        return res
 
     @property
     def codes(self) -> np.ndarray:
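The conditional in this hunk adds one extra entry for the null positions only when `dropna` is false and at least one value is null. A rough standalone sketch of that rule follows. It assumes plain Python lists in place of the categorical `values`, uses `None` as a stand-in for the `pd.NaT` key, and the names `build_indices`, `is_null`, and `NULL_KEY` are hypothetical.

```python
import math

NULL_KEY = None  # stand-in for the pd.NaT key used in the diff (assumption)


def is_null(value):
    """Hypothetical stand-in for pd.isna on scalars."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def build_indices(values, dropna=True):
    """Map each non-null value to its positions; optionally add a null group."""
    indices = {}
    null_positions = []
    for pos, value in enumerate(values):
        if is_null(value):
            null_positions.append(pos)
        else:
            indices.setdefault(value, []).append(pos)
    # Mirrors the diff: only add the null entry when dropna is False
    # and at least one null value is present.
    if not dropna and null_positions:
        indices[NULL_KEY] = null_positions
    return indices


values = ["a", None, "a", float("nan"), "b"]
with_nulls = build_indices(values, dropna=False)
without_nulls = build_indices(values, dropna=True)
```

Collecting the null positions in the same pass avoids the two separate scans over `values` that the one-line conditional in the diff performs.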
5 changes: 2 additions & 3 deletions pandas/tests/groupby/test_groupby.py
@@ -1260,15 +1260,14 @@ def test_groupby_nat_exclude():
     }
 
     for k in grouped.indices:
-        if pd.isna(k):
-            continue  # GH 35014
         tm.assert_numpy_array_equal(grouped.indices[k], expected[k])
 
     tm.assert_frame_equal(grouped.get_group(Timestamp("2013-01-01")), df.iloc[[1, 7]])
     tm.assert_frame_equal(grouped.get_group(Timestamp("2013-02-01")), df.iloc[[3, 5]])
 
     # GH35014
-    grouped.get_group(pd.NaT)
+    with pytest.raises(KeyError):
+        grouped.get_group(pd.NaT)
 
     nan_df = DataFrame(
         {"nan": [np.nan, np.nan, np.nan], "nat": [pd.NaT, pd.NaT, pd.NaT]}
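For context, the user-facing behavior named in the PR title can be checked with a small script like the one below. It is an illustrative sketch, assuming a pandas release that includes this fix; the frame and the column names `key` and `val` are made up for the example and do not come from the PR's test suite.

```python
import numpy as np
import pandas as pd

# Toy frame with one missing group key (illustrative data only).
df = pd.DataFrame({"key": ["a", np.nan, "a", "b"], "val": [1, 2, 3, 4]})

# Selecting a column via __getitem__ should keep the dropna setting
# of the parent DataFrameGroupBy.
kept = df.groupby("key", dropna=False)["val"].sum()
dropped = df.groupby("key", dropna=True)["val"].sum()

# Number of groups in each result, and the sum for the NaN group.
n_kept = len(kept)
n_dropped = len(dropped)
nan_group_sum = kept[kept.index.isna()].iloc[0]
```

If `dropna` were not propagated through `__getitem__`, `kept` would silently lose the NaN group and match `dropped`.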