bpo-46771: Implement asyncio context managers for handling timeouts #31394 8000

asvetlov · 2022-02-17T21:51:25Z

https://bugs.python.org/issue46771

asvetlov

@Tinche this is the first sketch.
It has no tests yet, sorry.

The idea is: let's use CancelScope class.
The class has a state.
If timeout occurs, the state is switched to cancelled and task.cancel() is called.
The fact of timeout occurrence is detected by the state check (self._state == _State.CANCELLED).
The explicit state machine is very useful, it simplifies both designing and debugging.

move_on_* family is renamed with cancel_scope + cancel_spoce_at.

fail_* family is renamed with timeout + timeout_at

Sync context managers are used to visually guarantee no suspension points.

I didn't implement CancelScope.cancel() because it is barely equal to task.cancel(). Can be added if we decide to support this.

Ironically, the cancellation message is not needed for implementation because the cancellation flag is set by call_at() callback. If something swallows raised CancelledError -- the flag is still set. We can add two flags actually: cancellation was raised and caught; I'm not sure if it is really needed. If we decide 'yes' .cancelling() flag can be added along with '.cancelled()'.

I'll try to add tests later.
Sorry, tomorrow I'll be traveling and inactive most likely.

@gvanrossum you may be interested in this preliminary design. It is a mix of trio-styled cancellation scopes and asyncio specific.

The code is pretty simple and understandable, at least from my point of view.
Sure, need to write a comprehensive test suite.
I'll gather tests from both async-timeout and quattro, plus add tests for mentioned edge cases.

Lib/asyncio/timeouts.py

Tinche · 2022-02-18T16:17:17Z

@asvetlov Thanks for starting this, I will dedicate some time over the weekend. How can I contribute tests to this though? You invited me to asvetlov/cpython, but this PR is from python/cpython. Sorry, I have not done this before :)

asvetlov · 2022-02-18T16:37:36Z

Clone my repo locally git clone git@github.com:asvetlov/cpython.git
Use issue-46771 branch to work on.
When you push into this remote branch, the upstream python/cpython reflects all changes.

BTW, I use GitHub CLI (https://cli.github.com/), it allows to checkout forks very easily.
Look on the top of this PR page for line asvetlov wants to merge 9 commits into main from issue-46771.

Click 'copy-to-clipboard' button right to the branch name.
Run gh pr checkout <paste-from-clipboard>.

The tool setups remote, checkouts branch, and switches to it.
git commit and git push updates the PR even if it is from a fork if you have access (admins and maintainers can edit contributors' PRs). Super easy.

asvetlov · 2022-02-18T16:50:13Z

Sorry, I was over-optimistic :(
A cancellation message is still required if CancelScope wants to swallow self-initiated cancellations (as Quatro does).

Tinche · 2022-02-18T17:03:58Z

Thanks for the explanation. The asvetlov repo has no issue-46771 branch but gh pr checkout did the trick.

Can CancelScope be a dataclass?

Tinche · 2022-02-18T17:14:59Z

@asvetlov Ok, for the discussion about more sophisticated nonce handling. Would you like me to do a writeup here, or open an issue on the asvetlov repo? I will try to explain the issue and a proposed solution.

You were right saying on the mailing list that asyncio doesn't have priorities, but I think with a little logic we can do better and maybe save our users from a potential footgun.

1st1 · 2022-02-18T17:26:05Z

@asvetlov

A cancellation message is still required if CancelScope wants to swallow self-initiated cancellations (as Quatro does).

Can you add a test that can only be fixed with a nonce? I'd like to take a look, because I'm still -1 on the nonce idea. We should figure out another way.

gvanrossum

Looking at the API only.

Lib/asyncio/timeouts.py

gvanrossum · 2022-02-18T17:33:11Z

I'm not sure if the discussion about cancel edge cases should move here, but my comments on that so far are:

Yes, there really is something that we need to solve.
Yes, the cancel-message solution works (I have a version for TaskGroup in a branch).
Solutions introducing a new "nonce" field seem to just add complications (e.g. the CancelledError exception would have to be modified to accept the nonce arg and store it, or if we make it positional, we'd end up with the awkward CancelledError(None, nonce) if there's a nonce but no message -- and usually there isn't a message.
If you make the nonce a float and order them, it's no longer a nonce. And despite using time.monotonic(), using timestamps to deal with races feels wrong.

Tinche · 2022-02-18T18:51:05Z

@1st1 Here's a test that fails without nonces:

    async def test_nested_timeouts_concurrent(self):
        with self.assertRaises(TimeoutError):
            with asyncio.timeout(0.002):
                try:
                    with asyncio.timeout(0.003):
                        # Pretend we crunch some numbers.
                        time.sleep(0.005)
                        await asyncio.sleep(1)
                except asyncio.TimeoutError:
                    pass

Both timeouts are marked as cancelled, but the inner timeout swallows the cancellation for the outer timeout. So the outer timeout never triggers.

asvetlov · 2022-02-18T18:55:23Z

@Tinche up to you.
I wanted to share access to work on timeouts/cancellation scopes work with you.
As the cpython core, I can write to any of your pull requests if you don't clear the corresponding check box.
You are not core dev (yet), so I granted you the write access to my clone. You cannot write to the main repo but can apply changes without PR-over-PR, apply, and merge workflow.
I believe it reduces friction.
If you want to demonstrate an alternative approach and a separate branch and PR works better for you -- please go ahead.
If you have a feeling that direct commit to this PR is more comfortable -- I'm fine with it.
For example, adding a new test is a perfect example for the second case.

gvanrossum · 2022-02-18T20:50:54Z

@1st1 Here's a test that fails without nonces:

    async def test_nested_timeouts_concurrent(self):
        with self.assertRaises(TimeoutError):
            with asyncio.timeout(0.002):
                try:
                    with asyncio.timeout(0.003):
                        # Pretend we crunch some numbers.
                        time.sleep(0.005)
                        await asyncio.sleep(1)
                except asyncio.TimeoutError:
                    pass

Both timeouts are marked as cancelled, but the inner timeout swallows the cancellation for the outer timeout. So the outer timeout never triggers.

Thinking about what happens here: When we reach the await asyncio.sleep(1), both callbacks run, both try to cancel the task, the inner __exit__() receives CancelledError, uncancels the task, and raises TimeoutError, which is called by the user code's except clause. Then the outer cancel scope's __exit__() is entered without an exception, and overall the task completes successfully.

The expectation is that the outer cancel scope should also raise TimeoutError, because its deadline is also exceeded. So perhaps the outer __exit__() should check whether its deadline is exceeded, and raise TimeoutError even though it did not see a CancelledError?

But it looks like there's a variant of the example that isn't fixed by such a check -- for example, we could add

        await asyncio.sleep(2)

after the try/except block (or even in the except clause), inside the outer cancel scope. Since the CancelledError exception has been wholly swallowed by the inner __exit__() this second await is not interrupted.

I'm guessing the fix for that would be to use a nonce (either using an explicit nonce API or via the cancel message). However, that still depends on the order in which the callbacks run. With the new cancel semantics (where extra cancels are ignored) whoever runs first wins, while with the old cancel semantics (where the most recent cancel gets to set the cancel message / nonce) whoever runs last wins -- but we don't want to rely on the order in which the callbacks run (since I have no idea in which order the current implementation runs them, given that they both become "ready" simultaneously, and we shouldn't depend on that). So that's why Tin is proposing to use timestamps (or a cancel stack, or some other mechanism that lets us arbitrate between the two cancel calls).

Note that this case is slightly different from the "web server cancels" problem, which was solvable without resorting to timestamps (the cancel scope callback would have to check whether the task was already being cancelled, using t.cancelling()).

I'm not sure yet what the right solution would be, but I'm glad that we're thingking about this scenario carefully.

Tinche · 2022-02-18T22:43:17Z

@gvanrossum thanks for the great write up, I believe you hit the nail on the head. I think to get this 100% correct, we need a way to throw the highest priority cancellation into the task on its iteration. The highest priority is the cancel scope that was entered first. It doesn't have to be a timestamp, we could have a global (or thread-local, but that's overthinking it) counter that the cancel scope takes from on __enter__.

The cancellation data could also be in a context var instead of on the CancelledError, I suppose? Would that be cleaner?

gvanrossum · 2022-02-19T15:36:41Z

I have a new proposal. Thinking about @asvetlov's diagrams some more, I think we can make everything work if the cancel state of a task kept a count of cancellations. This would replace the t._cancel_requested flag -- we could just make that flag an int instead of a bool. t.cancel() increments the flag. t.uncancel() decrements it (asserting that it stays >= 0). t.cancelling() returns the value of the flag, i.e. the number of pending cancellations.

Now any context manager cleanup handler, like CancelScope.__exit__() or TaskGroup.__aexit__(), when it has to decide whether to let a CancellationError bubble out or swallow it, can just look at the number of pending cancellations to decide. Such context managers should still keep track of whether they cancelled the task themselves or not (at least TaskGroup needs this to avoid cancelling multiple times), and if they did, they should always call t.uncancel(). If after that's done there's still a nonzero number of pending cancellations (as reported by t.cancelling()), they should let the cancellation bubble out, other wise they can swallow or replace it.

The only thing that's painful with this scheme is how to handle cancel messages. Hoever, (this is for @cjerdonek to disagree with :-) I think cancel messages were a bad idea to begin with -- if I read the bpo where it started, it came from a user who misunderstood how to print exceptions. It seems they are not always preserved regardless. I believe that any scheme that allows for multiple pending cancellations will have a hard time to keep track of which cancel message should be allowed to bubbled out. I'm guessing it could be solved by making t._cancel_requested a list, and giving t.uncancel() an optional argument that indicates which cancel message to remove from that list (None by default), but that feels like a complication I'd much rather avoid.

Tinche · 2022-02-19T16:29:13Z

@gvanrossum what happens to the count when the exception is thrown? Does it reset to 0 (I guess this would have to happen after the entire iteration is done) or does it stay, and only t.uncancel() can decrement it?

If it stays, I think essentially you have level-triggered cancellation then. Which I like, but it would be a compatibility break, right?

gvanrossum · 2022-02-19T17:00:28Z

It stays, but this is not the flag that causes the exception to be thrown -- for that we have _must_cancel, which remains a bool and is reset when the exception is actually thrown (this reset logic is here, the actual throw here.

So I don't think it's quite the same as level-triggered, IIUC (that would keep interrupting every await governed by the cancel scope until its __exit__() is reached, right?).

Tinche · 2022-02-19T21:26:39Z

Cool, I'm on board with this proposal then. (I got a little confused on _must_cancel vs _cancel_requested.) Want me to tinker around and try implementing this?

Your understanding of level-based cancellation is correct. In practice it helps remove a few footguns, especially around finally clauses. I feel like this counter approach might open the door to enable it (on an opt-in basis) in a future version of Python, but that's not a concern for the matter at hand.

gvanrossum · 2022-02-19T21:28:46Z

Sure, go ahead and send a draft PR!

Tinche · 2022-02-20T01:15:43Z

Got a working version here: #31434.

bedevere-bot · 2022-03-09T22:33:01Z

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit ac6f8c8 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

… code

gvanrossum

LGTM. Once we're confident that the buldbot tests will pass let's land this!

bedevere-bot · 2022-03-10T00:15:17Z

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit e8c67ce 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

F438

bedevere-bot · 2022-03-10T03:30:04Z

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit e65d766 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

asvetlov · 2022-03-10T11:55:46Z

buildbot failures are unrelated:

11 tests failed:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile
12 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_startfile
test_tix test_tk test_ttk_guionly test_winconsoleio test_winreg
test_winsound test_zipfile64
11 re-run tests:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile
test_unicode leaked [1373, 1371, 1373] references, sum=4117
11 tests failed again:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile

The same on other boxes, test_asyncio is never affected

bedevere-bot · 2022-03-10T11:56:00Z

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit e65d766 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

gvanrossum · 2022-03-10T16:02:30Z

Looks like it's all the refleaks buildbots, leaking in unicode. Let's ignore that; I'll merge this. Thanks for this significant improvement to asyncio usability!

Tinche · 2022-03-10T16:03:42Z

I've been quietly following along, thanks a ton @asvetlov !

Tinche · 2022-03-10T16:04:41Z

(And thanks to @gvanrossum for managing this of course).

asvetlov · 2022-03-10T16:06:17Z

Thanks, guys!

bedevere-bot added the awaiting core review label Feb 17, 2022

the-knights-who-say-ni added the CLA signed label Feb 17, 2022

asvetlov marked this pull request as ready for review February 17, 2022 21:56

asvetlov requested a review from 1st1 as a code owner February 17, 2022 21:56

asvetlov marked this pull request as draft February 17, 2022 21:56

asvetlov commented Feb 18, 2022

View reviewed changes

Lib/asyncio/timeouts.py Outdated Show resolved Hide resolved

asvetlov commented Feb 18, 2022

View reviewed changes

Lib/asyncio/timeouts.py Outdated Show resolved Hide resolved

asvetlov commented Feb 18, 2022

View reviewed changes

Lib/asyncio/timeouts.py Outdated Show resolved Hide resolved

gvanrossum reviewed Feb 18, 2022

View reviewed changes

This comment was marked as resolved.

Sign in to view

the-knights-who-say-ni added CLA not signed and removed CLA signed labels Feb 18, 2022

the-knights-who-say-ni removed the CLA not signed label Feb 20, 2022

asvetlov force-pushed the issue-46771 branch from c1be697 to 8790e49 Compare March 9, 2022 21:07

Increase delay

ac6f8c8

asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 9, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 9, 2022

Don't raise TimeoutError if the CancelledError was swallowed by inner…

e8c67ce

… code

gvanrossum approved these changes Mar 10, 2022

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Mar 10, 2022

gvanrossum added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

Don't duplicate py/c tests, timeout has no C accelerators

e65d766

asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022

gvanrossum merged commit f537b2a into main Mar 10, 2022

gvanrossum deleted the issue-46771 branch March 10, 2022 16:05

bedevere-bot removed the awaiting merge label Mar 10, 2022

gvanrossum mentioned this pull request May 24, 2022

Implement asyncio.timeout() context manager #90927

Closed

kumaraditya303 mentioned this pull request Oct 22, 2022

GH-96764: rewrite asyncio.wait_for to use asyncio.timeout #98518

Merged

Uh oh!

bpo-46771: Implement asyncio context managers for handling timeouts #31394 8000

bpo-46771: Implement asyncio context managers for handling timeouts #31394

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

This comment was marked as resolved.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!