10000 bpo-37788: Fix a reference leak if a thread is not joined. by shihai1991 · Pull Request #25226 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-37788: Fix a reference leak if a thread is not joined. #25226

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

shihai1991
Copy link
Member
@shihai1991 shihai1991 commented Apr 6, 2021

Add threading.Thread._discard_shutdown_lock() method to ensure that the thread state
lock is removed from the _shutdown_locks list when a thread
completes.

Co-authored-by: Victor Stinner vstinner@python.org

https://bugs.python.org/issue37788

@shihai1991
Copy link
Member Author
shihai1991 commented Apr 6, 2021

Those test cases have been passed in my local:
./python -m test -v test_threading -R 3:3
./python -m test -v test_threading -m test_threads_join_2 -F

@shihai1991 shihai1991 requested a review from vstinner April 6, 2021 18:04
@shihai1991
Copy link
Member Author

Hi, Victor. Can you take a look when you have free time.? @vstinner
I copied your code from #15228. ;)

@shihai1991
Copy link
Member Author

It seems like the failed test case(test_ssl) of gate have no releation with this PR.

@shihai1991 shihai1991 closed this Apr 6, 2021
@shihai1991 shihai1991 reopened this Apr 6, 2021
Lib/threading.py Outdated
return
lock = self._tstate_lock
if lock is not None and not self.daemon:
if lock.acquire(False):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the lock is not acquired, then the lock still remains in the global list and leaks.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You are right. Running threads will still in interpreter.

import sys
import random
import threading
import time

def f():
    seconds = random.random() * 0.01
    time.sleep(seconds)

for _ in range(10):
    threading.Thread(target=f).start()
    print(sys.gettotalrefcount())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wait. this test case can not measure the refleaks. It will raise the refleaks too when I apply #15228.

@shihai1991 shihai1991 changed the title bpo-37788: Fix a reference leak if a thread is not joined. [WIP]bpo-37788: Fix a reference leak if a thread is not joined. Apr 8, 2021
@shihai1991
Copy link
Member Author

Hm, there have an error in my local:

$ ./python -m test test_threading -m test_is_alive_after_fork -F -v
== CPython 3.10.0a6+ (heads/bpo_37788-dirty:5fdb00d3a5b, Apr 9 2021, 02:47:24) [GCC 9.3.0]
== Linux-3.10.0-957.10.1.el7.x86_64-x86_64-with-glibc2.17 little-endian
== cwd: /home/shihai/cpython/build/test_python_6807æ
== CPU count: 4
== encodings: locale=UTF-8, FS=utf-8
0:00:00 load avg: 0.19 Run tests sequentially
0:00:00 load avg: 0.19 [  1] test_threading
test_is_alive_after_fork (test.test_threading.ThreadTests) ... Warning -- threading_cleanup() failed to cleanup 0 threads (count: 0, dangling: 2)
Warning -- Dangling thread: <_MainThread(MainThread, started 140160670410560)>
Warning -- Dangling thread: <Thread(Thread-7 (<lambda>), stopped 140160456398592)>
FAIL

======================================================================
FAIL: test_is_alive_after_fork (test.test_threading.ThreadTests)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/shihai/cpython/Lib/test/test_threading.py", line 546, in test_is_alive_after_fork
    support.wait_process(pid, exitcode=10)
  File "/home/shihai/cpython/Lib/test/support/__init__.py", line 1943, in wait_process
    raise AssertionError(f"process {pid} is still running "
AssertionError: process 6822 is still running after 30.1 seconds

----------------------------------------------------------------------

@shihai1991
Copy link
Member Author

Using if lock not in _shutdown_locks:return in _discard_shutdown_lock() avoids to acquire the unnecessary _shutdown_locks lock.

And ./python -m test test_threading -m test_is_alive_after_fork -F -v have been passed in my local.

@shihai1991 shihai1991 changed the title [WIP]bpo-37788: Fix a reference leak if a thread is not joined. bpo-37788: Fix a reference leak if a thread is not joined. Apr 11, 2021
@shihai1991
Copy link
Member Author

Hi, Antoine, Serhiy. Can you take a look when you have free time? I try to fix the refleak of threading after victor reverted his PR: #15228 @pitrou @serhiy-storchaka

@zhangyangyu zhangyangyu requested a review from pitrou April 14, 2021 00:27
@pitrou
Copy link
Member
pitrou commented Apr 16, 2021

I don't like the idea of adding C code for this. You could try to make _shutdown_locks a WeakSet.

@shihai1991
Copy link
Member Author
shihai1991 commented Apr 16, 2021

I don't like the idea of adding C code for this. You could try to make _shutdown_locks a WeakSet.

Hm. When I use _shutdown_locks as a WeakSet, this test case will failed: ./python -m test test_threading -m test_threads_join_2 -v -F.

The error info:

0:00:00 load avg: 0.14 [  2] test_threading
test_threads_join_2 (test.test_threading.SubinterpThreadingTests) ... Fatal Python error: Py_EndInterpreter: not the last thread
Python runtime state: initialized

Thread 0x00007f20c330e700 (most recent call first):
  File "<string>", line 9 in random_sleep
  File "<string>", line 13 in __del__

The reason is that the interpreter can't use the weakref of shutdown_lock to check the thread status when interpreter is finished.

@pitrou
Copy link
Member
pitrou commented Apr 16, 2021

Ah, right. The problem, though, is this comment at the top of release_sentinel:

Tricky: this function is called when the current thread state is being deleted. Therefore, only simple C code can safely execute here.

Your code is adding the execution of arbitrary Python code, which violates the requirement expressed in the comment. It seems to work, but it doesn't seem like a good idea to bet that it will always work.

@pitrou
Copy link
Member
pitrou commented Apr 16, 2021

This patch is not the prettiest solution, but it seems to work:
(note that Github auto-markdowned the BPO reference in the comment :-/)

diff --git a/Lib/threading.py b/Lib/threading.py
index 4dcf84715c..e44e13666a 100644
--- a/Lib/threading.py
+++ b/Lib/threading.py
@@ -780,12 +780,27 @@ def _newname(name_template):
 _active = {}    # maps thread id to Thread object
 _limbo = {}
 _dangling = WeakSet()
+
 # Set of Thread._tstate_lock locks of non-daemon threads used by _shutdown()
 # to wait until all Python thread states get deleted:
 # see Thread._set_tstate_lock().
 _shutdown_locks_lock = _allocate_lock()
 _shutdown_locks = set()
 
+def _maintain_shutdown_locks():
+    """
+    Drop any shutdown locks that don't correspond to running threads anymore.
+
+    Calling this from time to time avoids an ever-growing _shutdown_locks
+    set when Thread objects are not joined explicitly.  See [bpo-37788](https://bugs.python.org/issue37788).
+
+    This must be called with _shutdown_locks_lock acquired.
+    """
+    # If a lock was released, the corresponding thread has exited
+    to_remove = [lock for lock in _shutdown_locks if not lock.locked()]
+    _shutdown_locks.difference_update(to_remove)
+
+
 # Main class for threads
 
 class Thread:
@@ -968,6 +983,7 @@ def _set_tstate_lock(self):
 
         if not self.daemon:
             with _shutdown_locks_lock:
+                _maintain_shutdown_locks()
                 _shutdown_locks.add(self._tstate_lock)
 
     def _bootstrap_inner(self):
@@ -1023,7 +1039,8 @@ def _stop(self):
         self._tstate_lock = None
         if not self.daemon:
             with _shutdown_locks_lock:
-                _shutdown_locks.discard(lock)
+                # Remove our lock and other released locks from _shutdown_locks
+                _maintain_shutdown_locks()
 
     def _delete(self):
         "Remove current thread from the dict of currently running threads."

@pitrou
Copy link
Member
pitrou commented Apr 16, 2021

Also cc @tim-one for advice.

@shihai1991
Copy link
Member Author

This patch is not the prettiest solution, but it seems to work

Nice patch. I have tested in my local and it can work :)

@shihai1991
Copy link
Member Author
shihai1991 commented Apr 18, 2021

How about maintain _shutdown_locks in __del__ of thread? something like:

diff --git a/Lib/threading.py b/Lib/threading.py
index 4dcf84715c4..dd5d07ca1bd 100644
--- a/Lib/threading.py
+++ b/Lib/threading.py
@@ -857,6 +857,16 @@ class is implemented.
         # For debugging and _after_fork()
         _dangling.add(self)

+    def __del__(self):
+        if not self._initialized:
+            return
+        lock = self._tstate_lock
+        if lock is not None and not self.daemon:
+            with _shutdown_locks_lock:
+                to_remove = [lock for lock in _shutdown_locks
+                             if not lock.locked()]
+                _shutdown_locks.difference_update(to_remove)

@pitrou
Copy link
Member
pitrou commented Apr 19, 2021

__del__ methods make finalization order non-deterministic in reference cycles, so it's a good practice to avoid them for public stdlib objects such as Thread.

@shihai1991
Copy link
Member Author
shihai1991 commented May 13, 2021

@pitrou Hi, Antoine. Would you mind to create a new PR for your patch?

@shihai1991
Copy link
Member Author

Antoine's PR is more better than this PR, so I closed this PR ;)

@shihai1991 shihai1991 closed this May 13, 2021
@pitrou
Copy link
Member
pitrou commented May 13, 2021

Ok, I've created PR #26103 for it.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0