BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 by topper-123 · Pull Request #21224 · pandas-dev/pandas

BUG: df.agg, df.transform and df.apply use different methods when axis=1 than when axis=0 #21224


Merged
merged 7 commits into from
Jul 28, 2018
correct apply(axis=1) and related bugs
tp authored and topper-123 committed Jul 26, 2018
commit 2be37475ffa04e08d9901b493378813fd851b1df
6 changes: 5 additions & 1 deletion doc/source/whatsnew/v0.24.0.txt
@@ -475,7 +475,11 @@ Numeric
 - Bug in :class:`Series` ``__rmatmul__`` doesn't support matrix vector multiplication (:issue:`21530`)
 - Bug in :func:`factorize` fails with read-only array (:issue:`12813`)
 - Fixed bug in :func:`unique` handled signed zeros inconsistently: for some inputs 0.0 and -0.0 were treated as equal and for some inputs as different. Now they are treated as equal for all inputs (:issue:`21866`)
--

Contributor

Use double backticks around TypeError. You don't need the second sentence; just list the issue.

+- Bug in :meth:`DataFrame.agg`, :meth:`DataFrame.transform` and :meth:`DataFrame.apply` when ``axis=1``.
+  Using ``apply`` with a list of functions and axis=1 (e.g. ``df.apply(['abs'], axis=1)``)
+  previously gave a TypeError. This fixes that issue.
+  As ``agg`` and ``transform`` in some cases delegate to ``apply``, this also
+  fixed this issue for them (:issue:`16679`).
 -

 Strings
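As a reader's illustration of the row-wise call pattern the changelog entry refers to, here is a minimal sketch. It assumes a pandas release that already contains this fix (0.24 or later); the frame `df` and its values are made up for the example and are not part of the PR.

```python
import pandas as pd

# Hypothetical two-row frame; values chosen so results are easy to check by hand.
df = pd.DataFrame({"a": [1.0, -2.0], "b": [3.0, 4.0]})

# A list of reducers applied per row (axis=1): one output column per function,
# indexed by the original row labels.
row_stats = df.agg(["sum", "max"], axis=1)
print(row_stats)
```

With `axis=0` the same call yields the transposed layout (one row per function), which is the symmetry this PR is aiming for.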
14 changes: 5 additions & 9 deletions pandas/core/apply.py
@@ -105,6 +105,11 @@ def agg_axis(self):
     def get_result(self):
         """ compute the results """

+        # dispatch to agg
+        if isinstance(self.f, (list, dict)):

Contributor

Can you use is_list_like and is_dict_like?

+            return self.obj.aggregate(self.f, axis=self.axis,
+                                      *self.args, **self.kwds)
+
         # all empty
         if len(self.columns) == 0 and len(self.index) == 0:
             return self.apply_empty_result()
@@ -308,15 +313,6 @@ def wrap_results(self):
 class FrameRowApply(FrameApply):
     axis = 0

-    def get_result(self):
-
-        # dispatch to agg
-        if isinstance(self.f, (list, dict)):
-            return self.obj.aggregate(self.f, axis=self.axis,
-                                      *self.args, **self.kwds)

Contributor Author

The isinstance(self.f, (list, dict)) check should also run when axis=1, so I moved it up to FrameApply.get_result.

-
-        return super(FrameRowApply, self).get_result()
-
     def apply_broadcast(self):
         return super(FrameRowApply, self).apply_broadcast(self.obj)

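The refactoring in this file (hoisting the list/dict dispatch from the row-wise subclass into the shared base class) can be sketched without pandas. The class names below are hypothetical stand-ins, not the real pandas classes, and the return values are placeholders that only show which branch was taken.

```python
class BaseApply:
    """Hypothetical stand-in for a shared base class such as FrameApply."""

    axis = None

    def __init__(self, func):
        self.func = func

    def get_result(self):
        # The list/dict dispatch lives in the base class, so row-wise and
        # column-wise subclasses take exactly the same path.
        if isinstance(self.func, (list, dict)):
            return ("aggregate", self.axis)
        return ("apply", self.axis)


class RowApply(BaseApply):
    axis = 0


class ColumnApply(BaseApply):
    axis = 1


print(RowApply([abs]).get_result())
print(ColumnApply([abs]).get_result())
print(ColumnApply(abs).get_result())
```

Before the move, only the axis=0 subclass performed the check, so a list passed with axis=1 fell through to the generic code path.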
13 changes: 11 additions & 2 deletions pandas/core/frame.py
@@ -6080,11 +6080,20 @@ def aggregate(self, func, axis=0, *args, **kwargs):
         return result

     def _aggregate(self, arg, axis=0, *args, **kwargs):
-        obj = self.T if axis == 1 else self
-        return super(DataFrame, obj)._aggregate(arg, *args, **kwargs)
+        if axis == 1:
+            result, how = (super(DataFrame, self.T)
+                           ._aggregate(arg, *args, **kwargs))
+            result = result.T if result is not None else result
+            return result, how
+        return super(DataFrame, self)._aggregate(arg, *args, **kwargs)

     agg = aggregate

+    def transform(self, func, axis=0, *args, **kwargs):

Contributor

Does this docstring get updated somewhere?

Contributor Author

I'll add the Appender.

+        if axis == 1:
+            return super(DataFrame, self.T).transform(func, *args, **kwargs).T
+        return super(DataFrame, self).transform(func, *args, **kwargs)
+
     def apply(self, func, axis=0, broadcast=None, raw=False, reduce=None,
               result_type=None, args=(), **kwds):
         """
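The axis=1 branches in this hunk share one idea: transpose, reuse the axis=0 implementation, then transpose the result back. Below is a library-free sketch of that pattern on nested lists. The helper names (`transform_columns`, `transform_rows`, `running_total`) are invented for illustration and do not exist in pandas.

```python
from itertools import accumulate


def transform_columns(table, func):
    """Apply func to every column of a row-major table and return new rows."""
    new_columns = [func(list(column)) for column in zip(*table)]
    return [list(row) for row in zip(*new_columns)]


def transform_rows(table, func):
    """Row-wise variant: transpose, reuse the column-wise helper, transpose back."""
    transposed = [list(column) for column in zip(*table)]
    result = transform_columns(transposed, func)
    return [list(column) for column in zip(*result)]


def running_total(values):
    """Cumulative sum of a list, used here as a shape-preserving transform."""
    return list(accumulate(values))


table = [[1, 2, 3],
         [4, 5, 6]]

print(transform_columns(table, running_total))  # [[1, 2, 3], [5, 7, 9]]
print(transform_rows(table, running_total))     # [[1, 3, 6], [4, 9, 15]]
```

Forgetting the final transpose is the kind of mistake the old `_aggregate` made: the computation ran on the transposed frame, but the result came back in the transposed orientation.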
16 changes: 7 additions & 9 deletions pandas/core/generic.py
@@ -9193,16 +9193,14 @@ def ewm(self, com=None, span=None, halflife=None, alpha=None,

         cls.ewm = ewm

-        @Appender(_shared_docs['transform'] % _shared_doc_kwargs)
-        def transform(self, func, *args, **kwargs):
-            result = self.agg(func, *args, **kwargs)
-            if is_scalar(result) or len(result) != len(self):
-                raise ValueError("transforms cannot produce "
-                                 "aggregated results")
+    @Appender(_shared_docs['transform'] % _shared_doc_kwargs)

Contributor

I think you need to add this Appender in frame.py as well.

Contributor Author

Yes, I'll do that.

+    def transform(self, func, *args, **kwargs):
+        result = self.agg(func, *args, **kwargs)
+        if is_scalar(result) or len(result) != len(self):
+            raise ValueError("transforms cannot produce "
+                             "aggregated results")

-            return result
-
-        cls.transform = transform

Contributor Author (topper-123, Jul 16, 2018)

When transform was added by calling _add_series_or_dataframe_operations, that method shadowed a transform method on DataFrame. As transform doesn't need to be added in any special way, I just moved it to be a normal instance method.

+        return result

     # ----------------------------------------------------------------------
     # Misc methods
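The shadowing problem described in the comment above can be reproduced in a few lines of plain Python. This is a simplified sketch with hypothetical class and method names (`Generic`, `Frame`, `_add_operations`); it only mirrors the mechanism, not the actual pandas code.

```python
class Generic:
    @classmethod
    def _add_operations(cls):
        """Hypothetical analogue of a helper that attaches methods to a class."""

        def transform(self):
            return "generic transform"

        # Assigning on cls overwrites any method of the same name that the
        # class body already defined.
        cls.transform = transform


class Frame(Generic):
    def transform(self):
        return "frame transform"


before = Frame().transform()
Frame._add_operations()
after = Frame().transform()
print(before, "->", after)
```

Defining `transform` as an ordinary method on the base class instead lets normal method resolution pick the subclass override, which is the approach taken in this hunk.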
98 changes: 60 additions & 38 deletions pandas/tests/frame/test_apply.py
@@ -5,6 +5,7 @@
import pytest

import operator
from collections import OrderedDict
from datetime import datetime
from itertools import chain

@@ -846,58 +847,74 @@ def test_consistency_for_boxed(self, box):
assert_frame_equal(result, expected)


def zip_frames(*frames):
def zip_frames(frames, axis=1):
"""
take a list of frames, zip the columns together for each
assume that these all have the first frame columns
take a list of frames, zip them together under the
assumption that these all have the first frames' index/columns.

return a new frame
Returns
-------
new_frame : DataFrame
"""
columns = frames[0].columns
zipped = [f[c] for c in columns for f in frames]
return pd.concat(zipped, axis=1)
if axis == 1:
columns = frames[0].columns
zipped = [f.loc[:, c] for c in columns for f in frames]
return pd.concat(zipped, axis=1)
else:
index = frames[0].index
zipped = [f.loc[i, :] for i in index for f in frames]
return pd.DataFrame(zipped)


class TestDataFrameAggregate(TestData):

def test_agg_transform(self):
def test_agg_transform(self, axis):
other_axis = abs(axis - 1)

with np.errstate(all='ignore'):

f_sqrt = np.sqrt(self.frame)
f_abs = np.abs(self.frame)
f_sqrt = np.sqrt(self.frame)
Contributor Author

Having "absolute" come before "sqrt" maintains alphabetical ordering and makes creating MultiIndexes easier below.


# ufunc
result = self.frame.transform(np.sqrt)
result = self.frame.transform(np.sqrt, axis=axis)
expected = f_sqrt.copy()
assert_frame_equal(result, expected)

result = self.frame.apply(np.sqrt)
result = self.frame.apply(np.sqrt, axis=axis)
assert_frame_equal(result, expected)

result = self.frame.transform(np.sqrt)
result = self.frame.transform(np.sqrt, axis=axis)
assert_frame_equal(result, expected)

# list-like
result = self.frame.apply([np.sqrt])
result = self.frame.apply([np.sqrt], axis=axis)
expected = f_sqrt.copy()
expected.columns = pd.MultiIndex.from_product(
[self.frame.columns, ['sqrt']])
if axis == 0:
expected.columns = pd.MultiIndex.from_product(
[self.frame.columns, ['sqrt']])
else:
expected.index = pd.MultiIndex.from_product(
[self.frame.index, ['sqrt']])
assert_frame_equal(result, expected)

result = self.frame.transform([np.sqrt])
result = self.frame.transform([np.sqrt], axis=axis)
assert_frame_equal(result, expected)

# multiple items in list
# these are in the order as if we are applying both
# functions per series and then concatting
expected = zip_frames(f_sqrt, f_abs)
expected.columns = pd.MultiIndex.from_product(
[self.frame.columns, ['sqrt', 'absolute']])
result = self.frame.apply([np.sqrt, np.abs])
result = self.frame.apply([np.abs, np.sqrt], axis=axis)
expected = zip_frames([f_abs, f_sqrt], axis=other_axis)
if axis == 0:
expected.columns = pd.MultiIndex.from_product(
[self.frame.columns, ['absolute', 'sqrt']])
else:
expected.index = pd.MultiIndex.from_product(
[self.frame.index, ['absolute', 'sqrt']])
assert_frame_equal(result, expected)

result = self.frame.transform(['sqrt', np.abs])
result = self.frame.transform([np.abs, 'sqrt'], axis=axis)
assert_frame_equal(result, expected)

def test_transform_and_agg_err(self, axis):
@@ -985,46 +1002,51 @@ def test_agg_dict_nested_renaming_depr(self):

def test_agg_reduce(self, axis):
other_axis = abs(axis - 1)
name1, name2 = self.frame.axes[other_axis].unique()[:2]
name1, name2 = self.frame.axes[other_axis].unique()[:2].sort_values()

# all reducers
expected = zip_frames(self.frame.mean(axis=axis).to_frame(),
self.frame.max(axis=axis).to_frame(),
self.frame.sum(axis=axis).to_frame()).T
expected.index = ['mean', 'max', 'sum']
expected = pd.concat([self.frame.mean(axis=axis),
self.frame.max(axis=axis),
self.frame.sum(axis=axis),
], axis=1)
expected.columns = ['mean', 'max', 'sum']
expected = expected.T if axis == 0 else expected

result = self.frame.agg(['mean', 'max', 'sum'], axis=axis)
assert_frame_equal(result, expected)

# dict input with scalars
func = {name1: 'mean', name2: 'sum'}
func = OrderedDict([(name1, 'mean'), (name2, 'sum')])
result = self.frame.agg(func, axis=axis)
expected = Series([self.frame.loc(other_axis)[name1].mean(),
self.frame.loc(other_axis)[name2].sum()],
index=[name1, name2])
assert_series_equal(result.reindex_like(expected), expected)
assert_series_equal(result, expected)

# dict input with lists
func = {name1: ['mean'], name2: ['sum']}
func = OrderedDict([(name1, ['mean']), (name2, ['sum'])])
result = self.frame.agg(func, axis=axis)
expected = DataFrame({
name1: Series([self.frame.loc(other_axis)[name1].mean()],
index=['mean']),
name2: Series([self.frame.loc(other_axis)[name2].sum()],
index=['sum'])})
assert_frame_equal(result.reindex_like(expected), expected)
expected = expected.T if axis == 1 else expected
assert_frame_equal(result, expected)

# dict input with lists with multiple
func = {name1: ['mean', 'sum'],
name2: ['sum', 'max']}
func = OrderedDict([(name1, ['mean', 'sum']), (name2, ['sum', 'max'])])
result = self.frame.agg(func, axis=axis)
expected = DataFrame({
name1: Series([self.frame.loc(other_axis)[name1].mean(),
expected = DataFrame(OrderedDict([
(name1, Series([self.frame.loc(other_axis)[name1].mean(),
self.frame.loc(other_axis)[name1].sum()],
index=['mean', 'sum']),
name2: Series([self.frame.loc(other_axis)[name2].sum(),
index=['mean', 'sum'])),
(name2, Series([self.frame.loc(other_axis)[name2].sum(),
self.frame.loc(other_axis)[name2].max()],
index=['sum', 'max'])})
assert_frame_equal(result.reindex_like(expected), expected)
index=['sum', 'max'])),
]))
expected = expected.T if axis == 1 else expected
assert_frame_equal(result, expected)

def test_nuiscance_columns(self):

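To see what the revised zip_frames test helper produces in each direction, the sketch below copies the new version from the diff and runs it on two small made-up frames. The frames `left` and `right` are assumptions for illustration only.

```python
import pandas as pd


def zip_frames(frames, axis=1):
    """Interleave frames that share the first frame's index/columns."""
    if axis == 1:
        columns = frames[0].columns
        zipped = [f.loc[:, c] for c in columns for f in frames]
        return pd.concat(zipped, axis=1)
    else:
        index = frames[0].index
        zipped = [f.loc[i, :] for i in index for f in frames]
        return pd.DataFrame(zipped)


# Hypothetical inputs: the second frame is the first scaled by 10.
left = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
right = left * 10

by_columns = zip_frames([left, right], axis=1)  # columns interleaved: A, A, B, B
by_rows = zip_frames([left, right], axis=0)     # rows interleaved: 0, 0, 1, 1

print(by_columns)
print(by_rows)
```

The tests then overwrite the duplicated labels with a MultiIndex built from the original labels and the function names, which is why the interleaving order has to match the order of the functions passed to apply or transform.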