DEPR: deprecate relableling dicts in groupby.agg by jreback · Pull Request #15931 · pandas-dev/pandas · GitHub
DEPR: deprecate relableling dicts in groupby.agg #15931

Merged
merged 4 commits into from
Apr 13, 2017
docs & fix window test
jreback committed Apr 12, 2017
commit 7262515a3ef245ab7d9eeb75616f7362d8532d66
31 changes: 19 additions & 12 deletions doc/source/whatsnew/v0.20.0.txt
@@ -429,7 +429,6 @@ Using ``.iloc``. Here we will get the location of the 'A' column, then use *posi
df.iloc[[0, 2], df.columns.get_loc('A')]

<<<<<<< c25fbde09272f369f280212e5216441d5975687c
.. _whatsnew_0200.api_breaking.deprecate_panel:

Deprecate Panel
@@ -462,33 +461,41 @@ Convert to an xarray DataArray
Deprecate groupby.agg() with a dictionary when renaming
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

The ``.groupby(..).agg(..)`` syntax can accept a variable of inputs, including scalars, list, and a dictionary of column names to scalars or lists.
This provides a useful syntax for constructing multiple (potentially different) aggregations for a groupby.
The ``.groupby(..).agg(..)``, ``.rolling(..).agg(..)``, and ``.resample(..).agg(..)`` syntax can accept a variable of inputs, including scalars,
list, and a dict of column names to scalars or lists. This provides a useful syntax for constructing multiple
(potentially different) aggregations.

1) We are deprecating passing a dictionary to a grouped ``Series``. This allowed one to ``rename`` the resulting aggregation, but this had a completely different
meaning that passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
2) We are deprecating passing a dict-of-dict to a grouped ``DataFrame`` in a similar manner.
However, ``.agg(..)`` can *also* accept a dict that allows 'renaming' of the result columns. This is a complicated and confusing syntax, as well as not consistent
between ``Series`` and ``DataFrame``. We are deprecating this 'renaming' functionarility.
Member
typo in functionarility

Contributor Author
done


Here's an example of 1), passing a dict to a grouped ``Series``:
1) We are deprecating passing a dict to a grouped/rolled/resampled ``Series``. This allowed
one to ``rename`` the resulting aggregation, but this had a completely different
meaning than passing a dictionary to a grouped ``DataFrame``, which accepts column-to-aggregations.
2) We are deprecating passing a dict-of-dict to a grouped/rolled/resampled ``DataFrame`` in a similar manner.
Contributor
dict-of-dicts

Contributor Author
done


This is an illustrative example:

.. ipython:: python

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
'B': range(5),
'C':range(5)})
'C': range(5)})
df

Aggregating a DataFrame with column selection.
Here is a typical useful syntax for computing different aggregations for different columns. This
is a natural (and useful) syntax. We aggregate from the dict-to-list by taking the specified
columns and applying the list of functions. This returns a ``MultiIndex`` for the columns.

.. ipython:: python

df.groupby('A').agg({'B': ['sum', 'max'],
'C': ['count', 'min']})
Member
You might do the simpler thing of a dict of scalars instead of list of lists (to not complicate the example).

Eg

In [29]: df.groupby('A').agg({'B': 'sum','C': 'min'})
Out[29]: 
   C  B
A      
1  0  3
2  3  7

Then it contrasts even more with the Series one, where the example is also a dict of scalars.

Contributor Author
sure
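
As a minimal sketch of the supported (non-deprecated) dict forms discussed in this thread, the following reuses the `df` from the whatsnew example and assumes a reasonably recent pandas. The names `simple` and `multi` are illustrative only and do not appear in the PR.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

# Dict of column name -> single function: keys select existing columns,
# so nothing is renamed and the result keeps flat column labels.
simple = df.groupby('A').agg({'B': 'sum', 'C': 'min'})

# Dict of column name -> list of functions: the result columns become
# a MultiIndex of (column, function) pairs.
multi = df.groupby('A').agg({'B': ['sum', 'max'],
                             'C': ['count', 'min']})

print(simple)
print(multi.columns.tolist())
```

In both calls the dict keys are existing column names, which is what distinguishes them from the renaming forms being deprecated.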



We are deprecating the following
Here's an example of the first deprecation (1), passing a dict to a grouped ``Series``. This
is a combination aggregation & renaming:

.. code-block:: ipython. Which is a combination aggregation & renaming.
.. code-block:: ipython

In [6]: df.groupby('A').B.agg({'foo': 'count'})
FutureWarning: using a dictionary on a Series for aggregation
Contributor
Reminder to update this with the new FutureWarning if we change the message

Contributor Author
done

@@ -507,7 +514,7 @@ You can accomplish the same operation, more idiomatically by:
df.groupby('A').B.agg(['count']).rename({'count': 'foo'})
Member
df.groupby('A').B.agg('count').rename('foo') would actually be simpler .. (but is not exactly equivalent to the dict case)

Contributor Author
yes, there are a myriad of options!
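
The two alternatives mentioned in this thread can be compared side by side in a short sketch. It assumes the `df` from the whatsnew example; `as_frame` and `as_series` are illustrative names. Note that `DataFrame.rename` with a bare mapping targets the index labels by default, so the sketch passes `columns=` explicitly to rename the result column.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 1, 1, 2, 2],
                   'B': range(5),
                   'C': range(5)})

grouped = df.groupby('A').B

# List form: returns a DataFrame with one column per function,
# which is then renamed through the columns mapping.
as_frame = grouped.agg(['count']).rename(columns={'count': 'foo'})

# Scalar form: returns a Series, so rename() with a scalar sets its name.
as_series = grouped.agg('count').rename('foo')

print(as_frame)
print(as_series)
```

The first result is a one-column `DataFrame` and the second is a `Series`, which is why the two are close but not exactly equivalent.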



Here's an example of 2), passing a dict-of-dict to a grouped ``DataFrame``:
Here's an example of the second deprecation (2), passing a dict-of-dict to a grouped ``DataFrame``:

.. code-block:: python

4 changes: 3 additions & 1 deletion pandas/core/base.py
@@ -497,6 +497,7 @@ def _aggregate(self, arg, *args, **kwargs):
'dictionary'.format(k))

# deprecation of nested renaming
# GH 15931
warnings.warn(
("using a dict with renaming "
"is deprecated and will be removed in a future "
@@ -506,7 +507,8 @@ def _aggregate(self, arg, *args, **kwargs):
arg = new_arg

else:
# we may have renaming keys
# deprecation of renaming keys
# GH 15931
keys = list(compat.iterkeys(arg))
if (isinstance(obj, ABCDataFrame) and
len(obj.columns.intersection(keys)) != len(keys)):
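
The condition at the end of this hunk can be read as: if some dict keys are not existing column names, the dict is being used for renaming. A rough standalone sketch of that idea follows. `has_renaming_keys` is a hypothetical helper written for illustration and is not part of pandas.

```python
import pandas as pd


def has_renaming_keys(columns, arg):
    """Return True if any key of `arg` is not one of `columns`.

    Hypothetical helper mirroring the intersection-length comparison
    shown in the diff; not the pandas implementation.
    """
    keys = list(arg.keys())
    # If every key names an existing column, the intersection has
    # exactly as many entries as there are keys.
    return len(pd.Index(columns).intersection(keys)) != len(keys)


cols = ['A', 'B', 'C']
print(has_renaming_keys(cols, {'B': 'sum', 'C': 'min'}))  # keys are columns
print(has_renaming_keys(cols, {'foo': 'count'}))          # 'foo' is a new label
```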
1 change: 1 addition & 0 deletions pandas/core/groupby.py
@@ -2837,6 +2837,7 @@ def _aggregate_multiple_funcs(self, arg, _level):

# show the deprecation, but only if we
# have not shown a higher level one
# GH 15931
if isinstance(self._selected_obj, Series) and _level <= 1:
warnings.warn(
("using a dict on a Series for aggregation\n"
22 changes: 13 additions & 9 deletions pandas/tests/test_window.py
@@ -134,16 +134,18 @@ def test_agg(self):
expected.columns = ['mean', 'sum']
tm.assert_frame_equal(result, expected)

result = r.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}})
with catch_warnings(record=True):
result = r.aggregate({'A': {'mean': 'mean', 'sum': 'sum'}})
expected = pd.concat([a_mean, a_sum], axis=1)
expected.columns = pd.MultiIndex.from_tuples([('A', 'mean'),
('A', 'sum')])
tm.assert_frame_equal(result, expected, check_like=True)

result = r.aggregate({'A': {'mean': 'mean',
'sum': 'sum'},
'B': {'mean2': 'mean',
'sum2': 'sum'}})
with catch_warnings(record=True):
result = r.aggregate({'A': {'mean': 'mean',
'sum': 'sum'},
'B': {'mean2': 'mean',
'sum2': 'sum'}})
expected = pd.concat([a_mean, a_sum, b_mean, b_sum], axis=1)
exp_cols = [('A', 'mean'), ('A', 'sum'), ('B', 'mean2'), ('B', 'sum2')]
expected.columns = pd.MultiIndex.from_tuples(exp_cols)
@@ -195,12 +197,14 @@ def f():
r['B'].std()], axis=1)
expected.columns = pd.MultiIndex.from_tuples([('ra', 'mean'), (
'ra', 'std'), ('rb', 'mean'), ('rb', 'std')])
result = r[['A', 'B']].agg({'A': {'ra': ['mean', 'std']},
'B': {'rb': ['mean', 'std']}})
with catch_warnings(record=True):
result = r[['A', 'B']].agg({'A': {'ra': ['mean', 'std']},
'B': {'rb': ['mean', 'std']}})
tm.assert_frame_equal(result, expected, check_like=True)

result = r.agg({'A': {'ra': ['mean', 'std']},
'B': {'rb': ['mean', 'std']}})
with catch_warnings(record=True):
result = r.agg({'A': {'ra': ['mean', 'std']},
'B': {'rb': ['mean', 'std']}})
expected.columns = pd.MultiIndex.from_tuples([('A', 'ra', 'mean'), (
'A', 'ra', 'std'), ('B', 'rb', 'mean'), ('B', 'rb', 'std')])
tm.assert_frame_equal(result, expected, check_like=True)
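
The test changes above wrap each deprecated call in `catch_warnings(record=True)` so that the new `FutureWarning` is captured instead of leaking into the test output. A minimal standard-library sketch of that pattern follows. `deprecated_agg` is a hypothetical stand-in for the deprecated code path, not a pandas function, and the `simplefilter("always")` call is an addition here to make the recording deterministic.

```python
import warnings


def deprecated_agg(spec):
    """Hypothetical stand-in that warns the way a deprecated path would."""
    warnings.warn("using a dict with renaming is deprecated",
                  FutureWarning, stacklevel=2)
    return sorted(spec)


with warnings.catch_warnings(record=True) as caught:
    # Make sure the warning is recorded even if it was already shown once.
    warnings.simplefilter("always")
    result = deprecated_agg({'A': {'mean': 'mean', 'sum': 'sum'}})

# The call still returns its result; the warning is available for inspection.
print(result)
print([w.category.__name__ for w in caught])
```

Recording the warning keeps the existing assertions on `result` unchanged while leaving room to assert on the warning category if a test wants to.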