-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
[RFC] Speed up Axes.bar #20092
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[RFC] Speed up Axes.bar #20092
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for opening your first PR into Matplotlib!
If you have not heard from us in a while, please feel free to ping @matplotlib/developers
or anyone who has commented on the PR. Most of our reviewers are volunteers and sometimes things fall through the cracks.
You can also join us on gitter for real-time discussion.
For details on testing, writing docs, and our review process, please see the developer guide
We strive to be a welcoming and open project. Please follow our Code of Conduct.
Thanks! I think this idea has been discussed before. I'm not sure of the technical issues, but in general it seems reasonable. Some of your image changes are quite large according to the tests so it would be good to see before and after and decide what is wrong. The way to do that is to check in the new images as part of the PR (in tests/baseline_images) and we can see the difference. |
FWIW this will also break all of the seaborn tests that assert the correctness of bar/hist plots by checking the attributes of the patch artists. I don't know if there is some way to keep a "dummy" reference to artists in their original place that does not trigger a duplicate draw ... could be useful? |
@jklymak thanks for the quick look! I've added a TODO to the top of the PR with the problems I've noticed in the test plots. Currently looking at legends for # If None, make all labels None (via zip_longest below); otherwise,
# cast each element to str, but keep a single str as it.
labels = [] if label is None else np.atleast_1d(np.asarray(label, str))
for patch, lbl in itertools.zip_longest(patches, labels):
if patch:
p = patch[0]
p.update(kwargs)
if lbl is not None:
p.set_label(lbl) And instead adding a new something ( example labels = [] if label is None else np.atleast_1d(np.asarray(label, str))
legend_patches = []
for patch, lbl in itertools.zip_longest(patches, labels):
if patch:
p = patch[0]
p.update(kwargs)
if lbl is not None:
p.set_label(lbl)
legend_patches.append(p)
for p in patch[1:]:
p.update(kwargs)
p.set_label('_nolegend_')
self.add_somehow(legend_patches) Is there a more elegant way to do this? @mwaskom, is it possible to expand the collections into lists of patches? Maybe a helper around this? |
Changing the return type is definitively an API change. Any user who is trying to do something with the returned is going to have their code broken. You might be able to sub-class the Collection and make it look enough like the current return type, but it will be a lot of work as things like
all should keep working. If your problem is primarily histograms @ivirshup you may do better either passing |
maybe we could have |
@tacaswell, my thing was mainly that I didn't want to have to consider the size of my data and choose the keyword arguments. Particularly as a library author, having changes to the plots around arbitrary data size cutoffs is not going to be fun to maintain. Also I think people generally are expecting something that looks like bars when the bins are wide enough to be able to tell. I think your point is totally fair. I wasn't really sure what was considered to be key parts of the API though, since arguably any changes to public attributes of the object is breaking. I wasn't planning on changing the return type, just it arguably wouldn't be a particularly useful value. Basically, my thinking was if changing the contents of TBH, if this got rejected here, my plan was to see if @mwaskom would be up for replacing the use of From looking at the code a bit, I was actually wondering if the plan was to deprecate the matplotlib/lib/matplotlib/backend_tools.py Lines 1008 to 1009 in a94acb3
My point here being, maybe this would be worth it for 4.0? It can be quite slow at the moment, and the machinery for making it fast is all here. Using a Collection instead of a Container would also bring this function more in line with the rest of package. |
Every change that can break or change the result of currently valid user code is a breaking change to us.
Unfortunately, it is. This is the downside of a very large user base. We have to be extermely careful with these. Using a Collection for the bars is reasonable. Probably the way forward is similar to A clever in between way may be returning a proxy in |
@ivirshup are you still interested in working on this (under the constraints of backward compatibility and the paths laid out in my last comment)? |
@timhoffm, I don't think so. Many of the solutions suggested above got fairly involved, and I don't think I'm familiar enough with matplotlib internals to do this in a reasonable timeframe. Feel free to close, though I think the conversation here definitely has value. |
PR Summary
Currently Axes.bar calls
self.add_patch
for every bar. This can be quite slow for dense histograms. This PR instead puts the patches into a collection and adds that. This seems to give a large performance increase.Details
Evidence of a speed up
On
master
:On this branch:
More thoughts
I don't think this counts as an API change. Of the failing tests, all the image comparison ones I've looked at so far don't look different by eye. Some tests fail because they are checking for artists which are now in a different place.
I thought it would be good to get feedback on whether this was a reasonable approach before making more changes. Any thoughts?
TODO
Based on the failing tests
hist
PR Checklist
pytest
passes).flake8
on changed files to check).flake8-docstrings
and runflake8 --docstring-convention=all
).doc/users/next_whats_new/
(follow instructions in README.rst there).doc/api/next_api_changes/
(follow instructions in README.rst there).