8000 BUG: Adding skipna as an option to groupby cumsum and cumprod by shangyian · Pull Request #19914 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

BUG: Adding skipna as an option to groupby cumsum and cumprod #19914

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 10 commits into from
Mar 1, 2018
Prev Previous commit
Next Next commit
Fixing pep8 issues
  • Loading branch information
shangyian committed Feb 26, 2018
commit 75d2870857c4c67697f236042338389d9750d56d
18 changes: 12 additions & 6 deletions pandas/tests/groupby/test_groupby.py
Original file line number Diff line number Diff line change
Expand Up @@ -2523,16 +2523,22 @@ def test_groupby_cumprod(self):

# make sure that skipna works
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you split this into a new test? Make sure to add a reference to the Github issue as a comment.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd find this a bit easier to read as

df = pd.DataFrame({"x": [...], "gp": ['a'] * 6 + ['b'] * 6})

instead of concat.

Copy link
8000
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And maybe have a group with all-NA to ensure that is handled properly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a new test (test_groupby_cum_skipna) for all of the skipna-related cases. And added a groupby+cumsum/prod on all-NA values.

df = pd.concat(
[pd.DataFrame({'x': [1.0, 2.0, np.nan, np.nan, 3.0, 2.0], 'gp': 'a'}),
pd.DataFrame({'x': [2.0, 5.0, 6.0, 1.0, np.nan, 1.0], 'gp': 'b'})])
[pd.DataFrame({'x': [1.0, 2.0, np.nan, np.nan, 3.0, 2.0],
'gp': 'a'}),
pd.DataFrame({'x': [2.0, 5.0, 6.0, 1.0, np.nan, 1.0],
'gp': 'b'})])
result = df.groupby('gp')['x'].cumprod(skipna=False)
expected = pd.Series([1.0, 2.0, np.nan, np.nan, np.nan, np.nan, 2.0, 10.0, 60.0, 60.0, np.nan, np.nan],
name='x', index=(0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5))
expected = pd.Series([1.0, 2.0, np.nan, np.nan, np.nan, np.nan,
2.0, 10.0, 60.0, 60.0, np.nan, np.nan],
name='x', index=(0, 1, 2, 3, 4, 5,
0, 1, 2, 3, 4, 5))
tm.assert_series_equal(result, expected)

result = df.groupby('gp')['x'].cumprod()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think you need this test. skipna=False should be adequately tested as is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, removed it.

expected = pd.Series([1.0, 2.0, np.nan, np.nan, 6.0, 12.0, 2.0, 10.0, 60.0, 60.0, np.nan, 60.0],
name='x', index=(0, 1, 2, 3, 4, 5, 0, 1, 2, 3, 4, 5))
expected = pd.Series([1.0, 2.0, np.nan, np.nan, 6.0, 12.0,
2.0, 10.0, 60.0, 60.0, np.nan, 60.0],
name='x', index=(0, 1, 2, 3, 4, 5,
0, 1, 2, 3, 4, 5))
tm.assert_series_equal(result, expected)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a test for cumsum with skipna=False`?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should be covered by test_groupby_cum_skipna now.

def test_ops_general(self):
Expand Down
0