We have an issue for fancier deadlock detection, and API support to make it more useful (#182). This is about a simpler issue: detecting when the entire program has deadlocked, i.e. no tasks are runnable or will ever be runnable again. This is not nearly as fancy, but it would catch lots of real-world deadlock cases (e.g. in tests), and is potentially wayyy simpler. In particular, I believe a Trio program has deadlocked if:
- There are no runnable tasks
- There are no registered timeouts
- There are no tasks waiting on the `IOManager`
- No-one is blocked in `wait_all_tasks_blocked`

(Did I miss anything?)
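The conditions above can be sketched as a single predicate. This is purely illustrative: the attribute names (`runq`, `deadlines`, `io_manager`, `waiting_for_idle`) are modeled on Trio's internal `Runner`, but `looks_deadlocked` itself is a hypothetical helper, not a real Trio API.

```python
from types import SimpleNamespace

# Hypothetical sketch of the program-wide deadlock predicate described
# above.  The attribute names are modeled on Trio's internal Runner,
# but this helper is illustrative, not real Trio code.
def looks_deadlocked(runner):
    return (
        not runner.runq                        # no runnable tasks
        and not runner.deadlines               # no registered timeouts
        and not runner.io_manager.has_waits()  # nothing waiting on the IOManager
        and not runner.waiting_for_idle        # no wait_all_tasks_blocked callers
    )

# Tiny stand-in runner to exercise the predicate:
idle = SimpleNamespace(
    runq=[],
    deadlines={},
    io_manager=SimpleNamespace(has_waits=lambda: False),
    waiting_for_idle=[],
)
print(looks_deadlocked(idle))  # True: nothing can ever run again
```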
However, there is one practical problem: the `EntryQueue` task is always blocked in the `IOManager`, waiting for someone to call `run_sync_soon`.
Practical example of why this is important: from the Trio scheduler's point of view, `run_sync_in_worker_thread` puts a task to sleep, and then later a call to `reschedule(...)` magically appears through `run_sync_soon`. So... it's entirely normal to be in a state where the whole program looks deadlocked except for the possibility of getting a `run_sync_soon`, and the program actually isn't deadlocked. But, of course, 99% of the time, there is absolutely and definitely no `run_sync_soon` call coming. There's just no way for Trio to know that.
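To make the scheduler's-eye view concrete, here is a toy model built from the plain stdlib (deliberately not Trio): an "event loop" with no runnable work and no timeouts that is nevertheless not deadlocked, because a foreign thread will eventually inject a callback, the moral equivalent of `run_sync_soon`.

```python
import threading
import queue

# Toy model (stdlib only, not Trio) of the situation described above.
calls = queue.Queue()
results = []

def worker_thread():
    # ...pretend to do some blocking work, then "reschedule" the task
    # by injecting a callback into the loop...
    calls.put(lambda: results.append("rescheduled"))

threading.Thread(target=worker_thread).start()

# From the loop's point of view nothing is runnable; it just blocks
# here hoping a callback arrives.  A naive deadlock check would fire
# right now, and it would be wrong.
callback = calls.get()
callback()
print(results)  # ['rescheduled']
```

The problem is that the loop cannot distinguish this case from the case where no thread exists and `calls.get()` will block forever.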
So I guess to make this viable, we would need some way to recognize the 99% of cases where there is no chance of a `run_sync_soon`. I think that means we need to refactor `TrioToken` so that it uses an acquire/release pattern: you acquire the token only if you plan to call `run_sync_soon`, and then when you're done with it you explicitly close it.

This will break the other usage of `TrioToken`, which is that you can compare them with `is` to check if two calls to `trio.run` are in fact the same. Maybe this is not even that useful? If it is, though, then we should split it off into a separate class, so that the only reason to acquire the `run_sync_soon`-object is because you're going to call `run_sync_soon`.
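A rough sketch of what the acquire/release shape might look like. None of these names exist in Trio (`RunSyncSoonToken`, `tokens_outstanding`, and `FakeRunner` are all hypothetical); the point is just that the run loop can count outstanding tokens, and when the count hits zero it knows no `run_sync_soon` call can ever arrive.

```python
# Hypothetical sketch of the acquire/release refactor proposed above.
# None of these names exist in Trio; they only illustrate the shape.
class FakeRunner:
    def __init__(self):
        self.tokens_outstanding = 0  # checked by the deadlock detector
        self.entry_queue = []

class RunSyncSoonToken:
    def __init__(self, runner):
        self._runner = runner
        self._closed = False
        runner.tokens_outstanding += 1

    def run_sync_soon(self, fn, *args):
        if self._closed:
            raise RuntimeError("token has been closed")
        self._runner.entry_queue.append((fn, args))

    def close(self):
        # Releasing the token is the holder's promise that no further
        # run_sync_soon calls are coming from it.
        if not self._closed:
            self._closed = True
            self._runner.tokens_outstanding -= 1

runner = FakeRunner()
token = RunSyncSoonToken(runner)
token.run_sync_soon(print, "hello from the entry queue")
token.close()
print(runner.tokens_outstanding)  # 0
```

In real code you'd presumably want this to be a context manager as well, so a forgotten `close()` doesn't permanently disable the deadlock detector.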
Given that, I think we could implement this by extending the code at the top of the event loop like:
```diff
 if runner.runq:
     timeout = 0
 elif runner.deadlines:
     deadline, _ = runner.deadlines.keys()[0]
     timeout = runner.clock.deadline_to_sleep_time(deadline)
 else:
-    timeout = _MAX_TIMEOUT
+    if not runner.io_manager.has_waits() and not runner.tokens_outstanding and not runner.waiting_for_idle:
+        # Deadlock detected! Dump a stack tree and crash, maybe...?
+        ...
+    else:
+        timeout = _MAX_TIMEOUT
```
This is probably super-cheap too, because we only do the extra checks when there are no runnable tasks or deadlines. No runnable tasks means we're either about to go to sleep for a while, so taking some extra time here is "free", or else that we're about to detect I/O, but if there's outstanding I/O then you should probably have a deadline set...