DatetimeIndex plot converter performance #17479
Comments
Could you edit your example to just use the It looks like something buggy in our converter we register with matplotlib: https://github.com/pandas-dev/pandas/blob/master/pandas/plotting/_converter.py, if you want to dig into that (see if we're calling it too often maybe?) As a workaround there's
I profiled
In [4]: import pandas as pd
   ...: import time
   ...: import numpy
   ...:
   ...:
   ...: # This is how it is in my code, but you don't have it.
   ...: # data1 = load('../170907/data/1504817777.npy')
   ...:
   ...: # You could just as easily do
   ...: data1 = numpy.random.randn(432000)
   ...:
   ...: date_index = pd.date_range(start=1504817777*1e9, periods=len(data1), freq='100 ms', tz='UTC')\
   ...:     .tz_convert('America/Los_Angeles')\
   ...:     .tz_localize(None)
   ...: voltage = pd.Series(data1, date_index)
   ...:
In [5]: import cProfile, pstats
In [6]: cProfile.run('voltage[::10000].plot()', 'pandas.prof')
In [8]: p = pstats.Stats('pandas.prof')
In [12]: p.sort_stats('time').print_stats(10)
Tue Nov 7 18:36:36 2017    pandas.prof

         258887 function calls (249954 primitive calls) in 39.934 seconds

   Ordered by: internal time
   List reduced from 2650 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        8   36.270    4.534   36.270    4.534 {pandas._libs.period.get_period_field_arr}
        2    1.057    0.529    1.057    0.529 {built-in method _operator.mod}
        2    0.628    0.314    0.628    0.314 {method 'compress' of 'numpy.ndarray' objects}
        1    0.382    0.382   38.957   38.957 /home/kris/projects/pandas/pandas/plotting/_converter.py:491(_daily_finder)
        5    0.379    0.076    0.379    0.076 {method 'nonzero' of 'numpy.ndarray' objects}
        4    0.363    0.091    0.363    0.091 {built-in method _operator.sub}
        4    0.254    0.063    0.289    0.072 /home/kris/projects/pandas/pandas/core/indexes/period.py:713(shift)
        2    0.064    0.032    0.064    0.032 {built-in method _operator.eq}
       67    0.060    0.001    0.060    0.001 {built-in method numpy.core.multiarray.arange}
       96    0.054    0.001    0.054    0.001 {built-in method marshal.loads}
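For anyone who wants to repeat this kind of measurement, here is a minimal sketch that collects the same sort of report in memory instead of writing a `pandas.prof` file. The helper name `profile_top` and the stand-in workload `busy_work` are hypothetical and not part of pandas or of the session above; only standard-library `cProfile`, `pstats`, and `io` calls are used.

```python
import cProfile
import io
import pstats


def profile_top(func, limit=10, sort_key="tottime"):
    """Run func() under cProfile and return the top `limit` rows as text."""
    profiler = cProfile.Profile()
    profiler.enable()
    try:
        func()
    finally:
        profiler.disable()
    # Send the pstats report to a string buffer instead of stdout.
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    stats.sort_stats(sort_key).print_stats(limit)
    return buffer.getvalue()


def busy_work():
    # Hypothetical stand-in workload so the sketch runs without pandas.
    return sum(i * i for i in range(200_000))


if __name__ == "__main__":
    print(profile_top(busy_work))
```

With the `voltage` series from the session above, the call would presumably be `profile_top(lambda: voltage[::10000].plot())`.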
Yeah, this is a bit tricky; the calls to @tacaswell will you be at PyData NYC? Maybe we can sketch out a plan for finally getting these converters into matplotlib.
I will be! This PR matplotlib/matplotlib#9779 also just went in which deals with datetime64.
Locally this now runs pretty fast on main, so I think the performance issue has been addressed in the meantime. Going to close.
Code Sample, a copy-pastable example if possible
I have a detector with 432000 data points sampled at 10 Hz (12 hours of data). I want to plot the time trace using pandas.Series.
Problem description
I love that pandas handles a lot of the date handling automatically. However, it's not practical to spend 30 seconds waiting for this plot to render. I've tried striding over the data in order to decrease the amount of time (I don't need to see all 432000 points), but this doesn't seem to improve rendering time.
Am I missing something?
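A possible way around the slow path, offered as a sketch rather than a confirmed fix: the profile in this thread points at the period-based tick finder in pandas' plotting converter, so plotting without it should avoid that cost. The example assumes that passing `x_compat=True` to `Series.plot` (a documented pandas option that turns off the tick resolution adjustment), or handing plain datetimes to matplotlib directly, skips that code path. The random data, the `Agg` backend, and the variable names are stand-ins for illustration.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so the sketch runs without a display

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Stand-in for the detector data: 12 hours sampled at 10 Hz.
data = np.random.randn(432000)
index = pd.date_range(
    start=pd.Timestamp(1504817777, unit="s"), periods=len(data), freq="100ms"
)
voltage = pd.Series(data, index=index)

# Keep every 10000th sample, as in the report.
thinned = voltage[::10000]

fig, (ax1, ax2) = plt.subplots(2, 1)

# Option 1: pandas plotting with the tick resolution adjustment disabled.
thinned.plot(ax=ax1, x_compat=True)

# Option 2: bypass pandas plotting and give matplotlib plain datetimes.
ax2.plot(thinned.index.to_pydatetime(), thinned.values)

fig.autofmt_xdate()
fig.canvas.draw()  # force a render so tick locators actually run
```

Whether either option is fast enough depends on the pandas and matplotlib versions installed, so it is worth timing both against the plain `thinned.plot()` call.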
Output of pd.show_versions()
INSTALLED VERSIONS
commit: None
python: 3.5.2.final.0
python-bits: 64
OS: Darwin
OS-release: 15.6.0
machine: x86_64
processor: i386
byteorder: little
LC_ALL: None
LANG: en_US.UTF-8
LOCALE: en_US.UTF-8
pandas: 0.19.1
nose: None
pip: 9.0.1
setuptools: 36.2.7
Cython: None
numpy: 1.11.2
scipy: 0.18.1
statsmodels: 0.6.1
xarray: None
IPython: 5.1.0
sphinx: 1.6.2
patsy: 0.4.1
dateutil: 2.6.1
pytz: 2017.2
blosc: None
bottleneck: None
tables: 3.3.0
numexpr: 2.6.1
matplotlib: 1.5.3
openpyxl: None
xlrd: 1.0.0
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: 1.0b10
httplib2: 0.10.3
apiclient: None
sqlalchemy: None
pymysql: None
psycopg2: None
jinja2: 2.9.6
boto: None
pandas_datareader: None