8000 Fix TkAgg memory leaks and test for memory growth regressions by richardsheridan · Pull Request #22002 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

Fix TkAgg memory leaks and test for memory growth regressions #22002

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 15 commits into from
Apr 28, 2022

Conversation

richardsheridan
Copy link
Contributor
@richardsheridan richardsheridan commented Dec 18, 2021

PR Summary

This PR is an attempt to fix #20490. Basically it allows a normally-forbidden self.window.update() if we detect that the tkinter mainloop is not running. The new test currently still fails though because of some additional leak related to the HiDPI stuff in #19167.

I took the liberty of introducing a new dependency in the test suite. Let me know if I should insert it into any other files or if you would rather find a different way to observe the leak.

PR Checklist

Tests and Styling

  • Has pytest style unit tests
  • (and pytest passes).
  • Is Flake 8 compliant (install flake8-docstrings and run flake8 --docstring-convention=all).

Documentation

  • New features are documented, with examples if plot related.
  • New features have an entry in doc/users/next_whats_new/ (follow instructions in README.rst there).
  • API changes documented in doc/api/next_api_changes/ (follow instructions in README.rst there).
  • Documentation is sphinx and numpydoc compliant (the docs should build without error).

@tacaswell tacaswell added this to the v3.5.2 milestone Dec 18, 2021
@tacaswell
Copy link
Member

Hopefully adding a test dependency in a patch release will not be too troublesome for the packagers. Assume the escape hatch to patch out the test.

@richardsheridan richardsheridan changed the title Maybe update Allow TkAgg backend to flush events if mainloop is not detected Dec 18, 2021
@tacaswell
Copy link
Member

@richardsheridan Thank you for tracking this down and 👍🏻 on adding that test!

@richardsheridan
Copy link
Contributor Author

@tacaswell should I XFAIL the test for now or just let it fail entirely?

@tacaswell
Copy link
Member

Can you uses pytest.importorskip https://docs.pytest.org/en/6.2.x/reference.html#pytest-importorskip ? We already use it in the tests to skip the tests that require pandas.

@richardsheridan
Copy link
Contributor Author

This PR is blocked by the hidpi leak that maybe only @QuLogic can debug. Otherwise we could adjust the memory leak threshold higher so it passes for now. That way the hidpi leak can be fixed later and th threshold can be lowered again at that time. Thoughts?

@QuLogic
Copy link
Member
QuLogic commented Apr 6, 2022

I don't have Windows to check the HiDPI code right now. But I'm not sure how it could be leaking? The only HiDPI variable is attached to the window (the figure manager does window_dpi = tk.IntVar(master=window, ...) and self._window_dpi = window_dpi), so it should go away with it.

@QuLogic
Copy link
Member
QuLogic commented Apr 6, 2022

I suppose there's also the additional Font object on the toolbar as well, but adding an explicit del self.toolbar._label_font to delayed_destroy doesn't do anything.

@tacaswell
Copy link
Member

If I run memleak.py with the following patch:

$ git diff
diff --git a/tools/memleak.py b/tools/memleak.py
index 9b9da912b7..d61d0b57c2 100755
--- a/tools/memleak.py
+++ b/tools/memleak.py
@@ -79,10 +79,10 @@ def run_memleak_test(bench, iterations, report):
     ax3.plot(open_files_arr)
     ax3.set_ylabel('open file handles')
 
-    if not report.endswith('.pdf'):
-        report = report + '.pdf'
+    if not report.endswith('.png'):
+        report = report + '.png'
     fig.tight_layout()
-    fig.savefig(report, format='pdf')
+    fig.savefig(report, format='png')
 
 
 class MemleakTest:
@@ -115,7 +115,7 @@ class MemleakTest:
             ax.pcolor(10 * np.random.rand(50, 50))
 
         fig.savefig(BytesIO(), dpi=75)
-        fig.canvas.flush_events()
+        # fig.canvas.flush_events()
         plt.close(1)

Main branch

tkagg_main_no_flush

this branch

tkagg_branch_no_flush


This clearly shows that this branch fixes something, but that the warm up time is in the hundreds of figures which is not practical for the test suite.

@QuLogic
Copy link
Member
QuLogic commented Apr 8, 2022

I took the test and put it in a standalone script, and ran it through scalene. Unfortunately, it seems to break Tk threading with explicit gc.collect calls. With garbage collection, it seemed to think that FigureCanvasTk._tkphoto and FigureCanvasTk._tkcanvas.create_image were the biggest leak. However, I don't know how much we can trust that without garbage collection.

That being said, @tacaswell noticed that we explicitly deleted the image created from the photo on the canvas during resize:

self._tkcanvas.delete(self._tkphoto)
self._tkphoto = tk.PhotoImage(
master=self._tkcanvas, width=int(width), height=int(height))
self._tkcanvas.create_image(
int(width / 2), int(height / 2), image=self._tkphoto)

So I wonder if we're somehow missing deletion of that image when destroying the figure?

tkinter variables get cleaned up with normal `destroy` and `gc` semantics but tkinter's implementation of trace is effectively global and keeps the callback object alive until the trace is removed.
we know we have created a bunch of cycles with heavyweight GUI objects, but they won't be in gen 1 yet
@richardsheridan
Copy link
Contributor Author
richardsheridan commented Apr 8, 2022

I made a bit of progress on this thanks to learning about memleak.py. The ever-increasing not-garbage objects before this PR was due to tkinter hanging on to various callbacks. I identified them using objgraph.show_growth and following objgraph.show_backrefs from them. Confusingly they lead to nowhere, and they also show up with objgraph.get_leaking_objects. They aren't actually leaked, though... they are being legitimately tracked by the tkinter c module. these are mostly cleared via running the event loop in the patch.

I say mostly, but actually all were gone but one; the CallWrapper associated with FigureManagerTk._update_window_dpi. This CallWrapper is created by the trace_add command which registers the callback, and, much to my surprise, does not die with the IntVar either on destroy. It might be cleaned up on normal python GC, but the entire web of figures and backend python objects is being pinned alive by that callback, in a sort of un-collectable reference cycle.

I manually unregister the trace in a3016af, and accellerate the garbage collection in e6e4cad. This gave me flat graphs for memory and object behavior in memleak.py. Yay! But tkagg still fails my new test. Boo!

I updated the test to flush events after both creating and closing the figure and adjusted the thresholds. It passes on my machine and we'll see if CI accepts it.

@richardsheridan
Copy link
Contributor Author

If I change memleak.py a bit more:

$  git diff
diff --git a/tools/memleak.py b/tools/memleak.py
index 9b9da912b7..6dbc9dc505 100755
--- a/tools/memleak.py
+++ b/tools/memleak.py

@@ -115,8 +121,9 @@ class MemleakTest:
             ax.pcolor(10 * np.random.rand(50, 50))

         fig.savefig(BytesIO(), dpi=75)
-        fig.canvas.flush_events()
+       plt.pause(.1)
         plt.close(1)


 if __name__ == '__main__':

I see continued growth in RSS in the tkagg backend, but constant object count which is disturbing.

memleakpy

Nevertheless, the growth is muted compared to before the patches.

@richardsheridan
Copy link
Contributor Author

I cannot reproduce the error in CI that has to do with a missing update command locally, but it must have something to do with the difference between flush_events and pause after the figure is destroyed. Pause seems more leaky than flush_events so I reverted my earlier tweak, but the test will still fail on tkagg at the moment.

< 67E6 div class="Details-content--hidden mt-2">
based on suggestion from code review
Based on suggestion from code review
Without it, checking the 'external' memory may indicate a leak even if
we've gotten rid of all our objects.
@QuLogic
Copy link
Member
QuLogic commented Apr 28, 2022

I pushed a couple of updates to the tests that should hopefully fix them. At least, they appear fine locally.

Sorry, something went wrong.

As apparently dropping this was needed for Tk.
@richardsheridan
Copy link
Contributor Author

thanks @QuLogic I'm happy with your changes fwiw.

this one is a definite squash merge when the time comes

@richardsheridan richardsheridan changed the title Allow TkAgg backend to flush events if mainloop is not detected Fix TkAgg memory leaks and test for memory growth regressions Apr 28, 2022
@QuLogic
Copy link
Member
QuLogic commented Apr 28, 2022

For reference, using the example from #20490 (comment) (+ a gc.collect), I managed to run memray on it, and it went from:
newplot(1)
to:
newplot
There's a bit of a spinup, which I haven't quite investigated (i.e., how many figures, or if it's just initial imports and such). The peaking behaviour doesn't exhibit in the RSS when you run the script by itself, so I assume that's just memray periodically writing out stats (or maybe it's slowed down the gc a bit.)

@tacaswell tacaswell merged commit 1a016f0 into matplotlib:main Apr 28, 2022
@tacaswell
Copy link
Member

Thank you @richardsheridan and @QuLogic !

@tacaswell
Copy link
Member

@meeseeksdev backport to v3.5.x

@lumberbot-app
Copy link
lumberbot-app bot commented Apr 28, 2022

Owee, I'm MrMeeseeks, Look at me.

There seem to be a conflict, please backport manually. Here are approximate instructions:

  1. Checkout backport branch and update it.
git checkout v3.5.x
git pull
  1. Cherry pick the first parent branch of the this PR on top of the older branch:
git cherry-pick -x -m1 1a016f0395c3a8d87b632a47d813db2491863164
  1. You will likely have some merge/cherry-pick conflict here, fix them and commit:
git commit -am 'Backport PR #22002: Fix TkAgg memory leaks and test for memory growth regressions'
  1. Push to a named branch:
git push YOURFORK v3.5.x:auto-backport-of-pr-22002-on-v3.5.x
  1. Create a PR against branch v3.5.x, I would have named this PR:

"Backport PR #22002 on branch v3.5.x (Fix TkAgg memory leaks and test for memory growth regressions)"

And apply the correct labels and milestones.

Congratulations — you did some good work! Hopefully your backport PR will be tested by the continuous integration and merged soon!

Remember to remove the Still Needs Manual Backport label once the PR gets merged.

If these instructions are inaccurate, feel free to suggest an improvement.

tacaswell pushed a commit to tacaswell/matplotlib that referenced this pull request Apr 29, 2022
…ory growth regressions

FIX: TkAgg memory leaks and test for memory growth regressions (matplotlib#22002)

tkinter variables get cleaned up with normal `destroy` and `gc` semantics but tkinter's implementation of trace is effectively global and keeps the callback object alive until the trace is removed.

Additionally extend and clean up the tests.

Closes matplotlib#20490

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
(cherry picked from commit 1a016f0)
tacaswell pushed a commit to tacaswell/matplotlib that referenced this pull request Apr 29, 2022
…ory growth regressions

FIX: TkAgg memory leaks and test for memory growth regressions (matplotlib#22002)

tkinter variables get cleaned up with normal `destroy` and `gc` semantics but tkinter's implementation of trace is effectively global and keeps the callback object alive until the trace is removed.

Additionally extend and clean up the tests.

Closes matplotlib#20490

Co-authored-by: Elliott Sales de Andrade <quantum.analyst@gmail.com>
(cherry picked from commit 1a016f0)
@richardsheridan richardsheridan deleted the maybe_update branch April 29, 2022 11:50
QuLogic added a commit that referenced this pull request Apr 29, 2022
…-v3.5.x

Backport PR #22002: Fix TkAgg memory leaks and test for memory growth regressions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[Bug]: FigureCanvasTkAgg call creates memory leak Memory leaks on matplotlib 3.4.2 (and 3.4.0)
3 participants
0