-
-
Notifications
You must be signed in to change notification settings - Fork 18.7k
Cythonized GroupBy Fill #19673
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cythonized GroupBy Fill #19673
Changes from 1 commit
c58123c
2bc8023
3cb25c0
7fecc11
3c2fb36
a52b8c4
16c1823
bd3d5e0
cae65af
0266514
50dc690
9fa8e25
5da06d8
2fe91a4
825ba17
127c71c
3a23cd6
a363146
fd513c8
776d1b7
33f0d06
662008a
27e24fa
6f72476
eff6603
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1882,10 +1882,10 @@ def _get_cythonized_result(self, how, grouper, needs_mask=False, | |
base_func = getattr(libgroupby, how) | ||
|
||
for name, obj in self._iterate_slices(): | ||
indexer = np.zeros_like(labels) | ||
indexer = np.zeros_like(labels, dtype=np.int64) | ||
func = partial(base_func, indexer, labels) | ||
if needs_mask: | ||
mask = isnull(obj.values).astype(np.uint8, copy=False) | ||
mask = isnull(obj.values).view(np.uint8) | ||
func = partial(func, mask) | ||
|
||
if needs_ngroups: | ||
|
@@ -4633,12 +4633,11 @@ def _apply_to_column_groupbys(self, func): | |
keys=self._selected_obj.columns, axis=1) | ||
|
||
def _fill(self, direction, limit=None): | ||
"""Overriden method to concat grouped columns in output""" | ||
"""Overriden method to join grouped columns in output""" | ||
res = super()._fill(direction, limit=limit) | ||
output = collections.OrderedDict() | ||
for grp in self.grouper.groupings: | ||
ser = grp.group_index.take(grp.labels) | ||
output[ser.name] = ser.values | ||
output = collections.OrderedDict( | ||
(grp.name, grp.group_index.take(grp.labels)) for grp in | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. maybe missing something obvious, but isn't the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Certainly could be a better way to do it - first time working this far into the API so if there's any easier way you know of that I missed let me know. FWIW I think that instead of the There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Sorry for the delay - was working from my memory and I honestly get lost on all the levels of groupby, groupers, and groupings. What you want here is In [40]: df = pd.DataFrame({'k': [1, 1, 2, 2, 1, 1], 'v': [1., np.nan, 2., np.nan, np.nan, 3.]})
In [41]: gb = df.groupby('k')
In [42]: [grp.grouper for grp in gb.grouper.groupings]
Out[42]: [array([1, 1, 2, 2, 1, 1], dtype=int64)] |
||
self.grouper.groupings) | ||
|
||
return self._wrap_transformed_output(output).join(res) | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. here, rather than joining, could't the entries of There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. As is the super call already wraps it's There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. you can do a concat no? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. concat should work just as well. Is that out of performance reasons we prefer that to join? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. just idiomatic |
||
|
||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dont need to use .values here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing
.values
for this returns aSeries
object so I do think we need this to get the ndarray before passing to CythonThere was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right this is ok