8000 BUG: fixes pd.Grouper for non-datetimelike groupings #8866 by jamalsenouci · Pull Request #8964 · pandas-dev/pandas · GitHub
[go: up one dir, main page]

Skip to content

BUG: fixes pd.Grouper for non-datetimelike groupings #8866 #8964

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

jamalsenouci
Copy link

closes #8866

The bug was caused by a couple of things, firstly when specifying only a level, the axis default was None. This meant it failed to create the internal grouper breaking on line 254 of groupby.py:

ax = obj._get_axis(self.axis)

Secondly, the aggregate function is set up to take a basegrouper class whereas for datetime resampling we need to pass a datetime index. I split the _set_grouper method to return a datetime index when a resampling frequency is specified and a BaseGrouper class otherwise. I opted to split that out into two functions to reduce complexity with much of the _set_basegrouper function being borrowed from the _get_grouper function. Let me know if there's a better way to fix this.

returns basegrouper class for non-datetime groupings while still returning a datetime index for datetime grouping
@jreback
Copy link
Contributor
jreback commented Dec 2, 2014

this needs a test. This should be a much simpler change.

@jamalsenouci
Copy link
Author

I thought it was going to be simpler, maybe I've missed something but I couldn't get around that I needed to pass a basegrouper to aggregate and a datetime index to resample.

@jamalsenouci jamalsenouci force-pushed the MultiIndexGroupby branch 2 times, most recently from dbdc3f3 to 2162586 Compare December 2, 2014 13:02
@jreback
Copy link
Contributor 8000
jreback commented Dec 2, 2014

no, this just needs a small change in TimeGrouper to handle the level arg.

when pd.Grouper(freq='....'....) happens, it is immeditately returned as a TimeGrouper. (see __new__). Its not the best design, but TimeGrouper is a valid class.

@jreback jreback added this to the 0.16.0 milestone Dec 2, 2014
@jamalsenouci
Copy link
Author

Ah I see, I presumed TimeGrouper should only be used for datetimes, (was that originally the case?). So anything that gets passed into groupby as a grouper will be handled by the TimeGrouper then. I guess that simplifies things, I'll look into making that change instead.

@jamalsenouci
Copy link
Author

I've had a look again and it's easy enough to return a TimeGrouper when level is specified, however the problem is that the groupby calls the TimeGrouper._get_grouper function which returns a BinGrouper. I'm not sure it makes sense to convert a non-datetime index to a BinGrouper does it? Only BinGrouper and BaseGrouper are currently valid groupers for aggregating.

@jreback
Copy link
Contributor
jreback commented Dec 3, 2014

no this should be virtually identical to setting the index then resampling (in fact that can be w test)

@jamalsenouci
Copy link
Author

how can we resample on a non-datetime index?

@jreback
Copy link
Contributor
jreback commented Dec 4, 2014

I mean groupby.

@jreback
Copy link
Contributor
jreback commented Dec 5, 2014

turns out this was a trivial issue, see #9008

thanks for the effort

@jreback jreback closed this Dec 5, 2014
@jamalsenouci jamalsenouci deleted the MultiIndexGroupby branch December 5, 2014 05:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

BUG: pd.Grouper specification broken for non-datetimelike when level specified
2 participants
0