describe_by(groups) (and describe(axis, groups)) gives wrong result for std and percentiles · Issue #1124 · larray-project/larray · GitHub

describe_by(groups) (and describe(axis, groups)) gives wrong result for std and percentiles #1124
Open
@gdementen

Description

>>> from larray import ndtest
>>> arr = ndtest((3, 4))
>>> arr.describe('a', 'b0,b1 >> b01;b1,b2 >> b12')
>>> arr.describe_by('b0,b1 >> b01;b1,b2 >> b12')
b\statistic  count  mean  std  min   25%  50%   75%   max
        b01    6.0   4.5  0.0  0.0  2.25  4.5  6.75   9.0
        b12    6.0   5.5  0.0  1.0  3.25  5.5  7.75  10.0

The correct result should be:

b\statistic  count  mean                 std  min   25%  50%   75%   max
        b01    6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25   9.0
        b12    6.0   5.5  3.6193922141707713  1.0  2.75  5.5  8.25  10.0

For example, describing a single group directly gives the expected values:

>>> arr['b0,b1'].describe()
statistic  count  mean                 std  min   25%  50%   75%  max
             6.0   4.5  3.6193922141707713  0.0  1.75  4.5  7.25  9.0

I think this is related to #1118. In fact, I think these are two different symptoms of the same bug: group aggregates and axis aggregates are not computed at the same time.
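To illustrate the suspected cause, here is a minimal NumPy-only sketch (no larray involved). It assumes `ndtest((3, 4))` holds the values 0..11 in row-major order, and it assumes the wrong numbers come from aggregating along `a` first and then aggregating those per-column results over each group. That ordering is a guess that happens to reproduce the reported table, not a confirmed description of larray's internals. The helper names `joint_stats` and `sequential_stats` are hypothetical.

```python
import numpy as np

# Assumption: same values as ndtest((3, 4)), axis a = rows, axis b = columns
data = np.arange(12).reshape(3, 4)

# Column positions of the two groups: b0,b1 >> b01 and b1,b2 >> b12
groups = {"b01": [0, 1], "b12": [1, 2]}


def joint_stats(block):
    """Aggregate over all cells of the group at once (expected behaviour)."""
    flat = block.ravel()
    return {
        "std": float(np.std(flat, ddof=1)),
        "25%": float(np.percentile(flat, 25)),
        "75%": float(np.percentile(flat, 75)),
    }


def sequential_stats(block):
    """Aggregate along axis a first, then over the columns of the group.

    This two-step order is only a hypothesis about what goes wrong.
    """
    return {
        "std": float(np.std(np.std(block, axis=0, ddof=1), ddof=1)),
        "25%": float(np.percentile(np.percentile(block, 25, axis=0), 25)),
        "75%": float(np.percentile(np.percentile(block, 75, axis=0), 75)),
    }


for name, cols in groups.items():
    block = data[:, cols]
    print(name, "joint:     ", joint_stats(block))
    print(name, "sequential:", sequential_stats(block))
```

Under these assumptions, the joint computation matches the expected table (std ≈ 3.619, 25% = 1.75 and 75% = 7.25 for `b01`), while the sequential one matches the incorrect output (std = 0.0, 25% = 2.25, 75% = 6.75). Count, mean, min and max are not affected by the two-step order here because the groups have equal-sized columns, which may be why only std and the percentiles look wrong.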
