BUG: Maintain column order with groupby.nth by reidy-p · Pull Request #22811 · pandas-dev/pandas · GitHub

BUG: Maintain column order with groupby.nth #22811

Merged
merged 4 commits into from
Nov 20, 2018
BUG: Maintain column order with groupby.nth
reidy-p committed Nov 10, 2018
commit bc68c371c255e64802d4e30cf9682943cbb99761
1 change: 1 addition & 0 deletions doc/source/whatsnew/v0.24.0.txt
@@ -1323,6 +1323,7 @@ Groupby/Resample/Rolling
 - Bug in :meth:`DataFrame.resample` and :meth:`Series.resample` when resampling by a weekly offset (``'W'``) across a DST transition (:issue:`9119`, :issue:`21459`)
 - Bug in :meth:`DataFrame.expanding` in which the ``axis`` argument was not being respected during aggregations (:issue:`23372`)
 - Bug in :meth:`pandas.core.groupby.DataFrameGroupBy.transform` which caused missing values when the input function can accept a :class:`DataFrame` but renames it (:issue:`23455`).
+- Bug in :func:`pandas.core.groupby.GroupBy.nth` where column order was not always preserved (:issue:`20760`)

 Reshaping
 ^^^^^^^^^
3 changes: 2 additions & 1 deletion pandas/core/groupby/groupby.py
@@ -493,7 +493,8 @@ def _set_group_selection(self):

         if len(groupers):
             # GH12839 clear selected obj cache when group selection changes
-            self._group_selection = ax.difference(Index(groupers)).tolist()
+            self._group_selection = ax.difference(Index(groupers),
+                                                  sort=False).tolist()
Contributor Author

Index.difference tries to sort its result by default, which means that the column order was sometimes changed from that of the original DataFrame. I added a new sort parameter to Index.difference, with a default of True, to control this.

             self._reset_cache('_selected_obj')

     def _set_result_index_ordered(self, result):
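
For illustration, the sorting behaviour described in the review comment above can be reproduced directly on an Index. This is a minimal sketch, assuming a pandas release that includes the `sort` parameter (0.24.0 or later, per the `versionadded` note in this PR). The variable names are illustrative only.

```python
import pandas as pd

# Same values as the docstring example added in this PR.
idx1 = pd.Index([2, 1, 3, 4])
idx2 = pd.Index([3, 4, 5, 6])

# Default behaviour: the result is sorted when sorting is possible.
sorted_diff = idx1.difference(idx2).tolist()

# sort=False keeps the elements in the order they appear in idx1.
unsorted_diff = idx1.difference(idx2, sort=False).tolist()

print(sorted_diff)    # [1, 2]
print(unsorted_diff)  # [2, 1]
```

`_set_group_selection` uses the second form so that the non-grouping columns keep the order they have in the original DataFrame.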
20 changes: 13 additions & 7 deletions pandas/core/indexes/base.py
@@ -2910,17 +2910,20 @@ def intersection(self, other):
         taken.name = None
         return taken

-    def difference(self, other):
+    def difference(self, other, sort=True):
         """
         Return a new Index with elements from the index that are not in
         `other`.

         This is the set difference of two Index objects.
         It's sorted if sorting is possible.

         Parameters
         ----------
         other : Index or array-like
+        sort : bool, default True
+            Sort the resulting index if possible

+            .. versionadded:: 0.24.0
Contributor

Can you make sure this is added to all subclasses as well (multi, interval)? I think they have their own implementations.

Contributor

Can you do this in this PR? Ideally, also update the tests for .difference for all types, parameterizing them where appropriate.


         Returns
         -------
@@ -2929,10 +2932,12 @@ def difference(self, other):
         Examples
         --------

-        >>> idx1 = pd.Index([1, 2, 3, 4])
+        >>> idx1 = pd.Index([2, 1, 3, 4])
         >>> idx2 = pd.Index([3, 4, 5, 6])
         >>> idx1.difference(idx2)
         Int64Index([1, 2], dtype='int64')
+        >>> idx1.difference(idx2, sort=False)
+        Int64Index([2, 1], dtype='int64')

         """
         self._assert_can_do_setop(other)
@@ -2951,10 +2956,11 @@ def difference(self, other):
         label_diff = np.setdiff1d(np.arange(this.size), indexer,
                                   assume_unique=True)
         the_diff = this.values.take(label_diff)
-        try:
-            the_diff = sorting.safe_sort(the_diff)
-        except TypeError:
-            pass
Contributor

Can you add some tests in the index tests to exercise this (probably just parameterize the sort parameter in the tests)?

+        if sort:
+            try:
+                the_diff = sorting.safe_sort(the_diff)
+            except TypeError:
+                pass

         return this._shallow_copy(the_diff, name=result_name, freq=None)
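
The control flow of this hunk can be modelled in plain Python. The sketch below is not the pandas implementation: `difference_sketch` is a hypothetical helper, it assumes hashable values, and the built-in `sorted` stands in for `sorting.safe_sort`, which may treat mixed types differently.

```python
def difference_sketch(values, other, sort=True):
    """Return the items of `values` that are not in `other`, optionally sorted."""
    excluded = set(other)
    # Keep the surviving items in their original order.
    the_diff = [v for v in values if v not in excluded]
    if sort:
        try:
            the_diff = sorted(the_diff)
        except TypeError:
            # Items are not mutually comparable: fall back to the original order.
            pass
    return the_diff


print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6]))              # [1, 2]
print(difference_sketch([2, 1, 3, 4], [3, 4, 5, 6], sort=False))  # [2, 1]
print(difference_sketch([2, "a", 1], []))                         # [2, 'a', 1]
```

Wrapping only the sort step in `if sort:` leaves the set-difference logic untouched, so callers that pass `sort=False` get the same elements in input order.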

24 changes: 24 additions & 0 deletions pandas/tests/groupby/test_nth.py
@@ -390,3 +390,27 @@ def test_nth_empty():
                                           names=['a', 'b']),
                          columns=['c'])
     assert_frame_equal(result, expected)
+
+
+def test_nth_column_order():
+    # GH 20760
+    # Check that nth preserves column order
+    df = DataFrame([[1, 'b', 100],
+                    [1, 'a', 50],
+                    [1, 'a', np.nan],
+                    [2, 'c', 200],
+                    [2, 'd', 150]],
+                   columns=['A', 'C', 'B'])
+    result = df.groupby('A').nth(0)
+    expected = DataFrame([['b', 100.0],
+                          ['c', 200.0]],
+                         columns=['C', 'B'],
+                         index=Index([1, 2], name='A'))
+    assert_frame_equal(result, expected)
+
+    result = df.groupby('A').nth(-1, dropna='any')
+    expected = DataFrame([['a', 50.0],
+                          ['d', 150.0]],
+                         columns=['C', 'B'],
+                         index=Index([1, 2], name='A'))
+    assert_frame_equal(result, expected)
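
The behaviour covered by this test can also be checked interactively. The sketch below reuses the test's data and assumes a pandas release that contains this fix. Because the placement of the grouping key in the result of `nth` may differ between pandas versions (an assumption made here for robustness, not something stated in this PR), it compares only the non-key columns.

```python
import numpy as np
import pandas as pd

# Same data as test_nth_column_order: column 'C' deliberately precedes 'B'.
df = pd.DataFrame(
    [[1, "b", 100], [1, "a", 50], [1, "a", np.nan], [2, "c", 200], [2, "d", 150]],
    columns=["A", "C", "B"],
)

result = df.groupby("A").nth(0)

# Drop the grouping key (if present) and inspect the remaining column order.
value_columns = [col for col in result.columns if col != "A"]
print(value_columns)  # ['C', 'B']
```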