-
-
Notifications
You must be signed in to change notification settings - Fork 1.8k
[REVIEW] Add dropna argument to groupby #5579
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Is this the first case of building something in dask.dataframe which is not supported by pandas ? My guess is yes, though could be wrong. It's interesting that dask is now supporting dataframe operations which pandas does not yet have |
Right - #5565 does seem to specify an explicit |
@TomAugspurger if you're comfortable I would prefer to leave reviewing this to you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The implementation looks good, thanks. Just one requested change on the test.
Merging in. Thanks @rjzamora for the fix and @TomAugspurger for the review. |
This PR allows
cudf
-backed groupby operations to include (or exclude) null values in the index. The changes are directly motivated by cudf#3302.Since pandas does not yet support the
dropna
argument (see: pandas#3729 and pandas#21669), the newtest_groupby_dropna_pandas
test only checks for an unknown-argumentTypeError
. A more-thoroughtest_groupby_dropna_cudf
test was also added here, but this was mostly to illustrate the value of these changes. The cudf-based test should most-likely be removed before merging, because it will be skipped by CI anyway.Closes #5565 (Although not a permanent solution -- Cleanup will be necessary once pandas supports the
dropna
argument).black dask
/flake8 dask