GroupbyRolling aggregate error introduced by bottleneck · Issue #26156 · pandas-dev/pandas · GitHub
@dataders

Code Sample, a copy-pastable example if possible

import pandas as pd

df = pd.read_json('{"Month":{"0":"2018-01-01","1":"2018-02-01","2":"2018-03-01","3":"2018-04-01","4":"2018-05-01","5":"2018-01-01","6":"2018-02-01","7":"2018-03-01","8":"2018-04-01","9":"2018-05-01","10":"2018-01-01","11":"2018-02-01","12":"2018-03-01","13":"2018-04-01","14":"2018-05-01"},"Person":{"0":"A","1":"A","2":"A","3":"A","4":"A","5":"B","6":"B","7":"B","8":"B","9":"B","10":"C","11":"C","12":"C","13":"C","14":"C"},"Foo":{"0":2,"1":3,"2":4,"3":4,"4":3,"5":10,"6":8,"7":6,"8":4,"9":8,"10":5,"11":6,"12":5,"13":6,"14":5},"Bar":{"0":10,"1":30,"2":5,"3":40,"4":20,"5":80,"6":70,"7":60,"8":50,"9":40,"10":50,"11":50,"12":50,"13":50,"14":50}}'
, convert_dates = ['Month'])

df_rolls = (df
    .sort_values(by=['Month', 'Person'], ascending=True)
    .set_index(['Month'])
    .groupby(['Person'])
    .rolling(3, min_periods=3)
)

The call below works without bottleneck installed, but with bottleneck installed it raises AttributeError: 'float' object has no attribute 'round'

df_rolls.agg([lambda x: x.mean().round(4)])
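
The error message says that `.round` was called on a built-in Python `float`, which has no such method (NumPy scalars such as `numpy.float64` do). A minimal sketch of a rounding helper that does not depend on which of the two types `x.mean()` returns; the name `mean_round` is hypothetical and not part of the original report:

```python
def mean_round(x, ndigits=4):
    """Return the mean of ``x`` rounded to ``ndigits`` decimal places.

    ``mean_round`` is a hypothetical helper name. ``x`` is any object with a
    ``.mean()`` method, for instance a Series or ndarray window.
    """
    # The built-in round() accepts Python floats and NumPy scalars alike,
    # so nothing here relies on the result having a .round method.
    return round(float(x.mean()), ndigits)


# A plain Python float has no .round attribute, which is what the error reports.
print(hasattr(1.5, "round"))  # False
```

Assuming the rest of the pipeline is unchanged, it could be passed in place of the lambda, e.g. `df_rolls.agg([mean_round])`.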

Problem description

My intention is to compute a rolling mean and a rolling sum on multiple columns at once (see below). I was using agg because it accepts multiple functions at once.

df_rolls.agg([MeanRound, 'sum'])
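
One possible alternative, not taken from the report, is to avoid the custom function and combine the built-in rolling `mean` and `sum` results. The sketch below uses a small stand-in frame (values copied from persons A and B above) and selects the numeric columns explicitly, which is an assumption made to keep the example independent of the pandas version:

```python
import pandas as pd

# Stand-in data: the first four months for persons A and B from the report.
df = pd.DataFrame({
    "Month": pd.to_datetime(["2018-01-01", "2018-02-01",
                             "2018-03-01", "2018-04-01"] * 2),
    "Person": ["A"] * 4 + ["B"] * 4,
    "Foo": [2, 3, 4, 4, 10, 8, 6, 4],
    "Bar": [10, 30, 5, 40, 80, 70, 60, 50],
})

rolls = (df
    .set_index("Month")
    .groupby("Person")[["Foo", "Bar"]]
    .rolling(3, min_periods=3)
)

# Built-in rolling aggregations return DataFrames, so rounding happens on the
# DataFrame and never on a scalar. The dict keys become the top column level.
combined = pd.concat({"mean": rolls.mean().round(4), "sum": rolls.sum()}, axis=1)
print(combined)
```

The column layout differs from `agg([...])`: here the statistic is the outer column level and the original column name the inner one.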

Expected Output

df_rolls.agg([MeanRound])

df_rolls.agg([MeanRound, 'sum'])

I was able to get a workaround with apply (even though .transform() isn't implemented for RollingGroupby?)

df_rolls.apply(MeanRound, raw = True)
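
A self-contained sketch of this workaround follows. The report never shows the definition of `MeanRound`; the version below assumes it is a named form of the lambda used earlier. The frame is a reduced stand-in for the original data, and the explicit column selection is an added assumption:

```python
import pandas as pd

# Stand-in data: the first four months for persons A and B from the report.
df = pd.DataFrame({
    "Month": pd.to_datetime(["2018-01-01", "2018-02-01",
                             "2018-03-01", "2018-04-01"] * 2),
    "Person": ["A"] * 4 + ["B"] * 4,
    "Foo": [2, 3, 4, 4, 10, 8, 6, 4],
    "Bar": [10, 30, 5, 40, 80, 70, 60, 50],
})


def MeanRound(x):
    # Assumed definition. With raw=True each window arrives as a
    # numpy.ndarray, so x.mean() is a NumPy scalar that has a .round method.
    return x.mean().round(4)


df_rolls = (df
    .set_index("Month")
    .groupby("Person")[["Foo", "Bar"]]
    .rolling(3, min_periods=3)
)

result = df_rolls.apply(MeanRound, raw=True)
print(result)
```

Windows with fewer than three observations yield NaN because of `min_periods=3`.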

Output of pd.show_versions()

before bottleneck

INSTALLED VERSIONS
------------------
commit: d04fe2a3f27f84b91e4df800cd8b0836bd8b0dfc
python: 3.6.7.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: 4.4.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.14.6
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: None
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None

after bottleneck

INSTALLED VERSIONS
------------------
commit: d04fe2a3f27f84b91e4df800cd8b0836bd8b0dfc
python: 3.6.7.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 9, GenuineIntel
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: None.None

pandas: 0.24.1
pytest: 4.4.1
pip: 18.1
setuptools: 40.6.2
Cython: 0.29
numpy: 1.14.6
scipy: 1.1.0
pyarrow: None
xarray: None
IPython: 7.2.0
sphinx: None
patsy: 0.5.1
dateutil: 2.7.5
pytz: 2018.7
blosc: None
bottleneck: 1.2.1
tables: None
numexpr: 2.6.8
feather: None
matplotlib: 3.0.2
openpyxl: None
xlrd: 1.2.0
xlwt: None
xlsxwriter: None
lxml.etree: None
bs4: None
html5lib: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.10
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None
gcsfs: None
