8000 bpo-46771: Implement asyncio context managers for handling timeouts by asvetlov · Pull Request #31394 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-46771: Implement asyncio context managers for handling timeouts #31394

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 31 commits into from
Mar 10, 2022

Conversation

asvetlov
Copy link
Contributor
@asvetlov asvetlov commented Feb 17, 2022

@asvetlov asvetlov marked this pull request as ready for review February 17, 2022 21:56
@asvetlov asvetlov requested a review from 1st1 as a code owner February 17, 2022 21:56
@asvetlov asvetlov marked this pull request as draft February 17, 2022 21:56
Copy link
Contributor Author
@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tinche this is the first sketch.
It has no tests yet, sorry.

The idea is: let's use CancelScope class.
The class has a state.
If timeout occurs, the state is switched to cancelled and task.cancel() is called.
The fact of timeout occurrence is detected by the state check (self._state == _State.CANCELLED).
The explicit state machine is very useful, it simplifies both designing and debugging.

move_on_* family is renamed with cancel_scope + cancel_spoce_at.

fail_* family is renamed with timeout + timeout_at

Sync context managers are used to visually guarantee no suspension points.

I didn't implement CancelScope.cancel() because it is barely equal to task.cancel(). Can be added if we decide to support this.

Ironically, the cancellation message is not needed for implementation because the cancellation flag is set by call_at() callback. If something swallows raised CancelledError -- the flag is still set. We can add two flags actually: cancellation was raised and caught; I'm not sure if it is really needed. If we decide 'yes' .cancelling() flag can be added along with '.cancelled()'.

I'll try to add tests later.
Sorry, tomorrow I'll be traveling and inactive most likely.

@gvanrossum you may be interested in this preliminary design. It is a mix of trio-styled cancellation scopes and asyncio specific.

The code is pretty simple and understandable, at least from my point of view.
Sure, need to write a comprehensive test suite.
I'll gather tests from both async-timeout and quattro, plus add tests for mentioned edge cases.

@Tinche
Copy link
Contributor
Tinche commented Feb 18, 2022

@asvetlov Thanks for starting this, I will dedicate some time over the weekend. How can I contribute tests to this though? You invited me to asvetlov/cpython, but this PR is from python/cpython. Sorry, I have not done this before :)

@asvetlov
Copy link
Contributor Author

Clone my repo locally git clone git@github.com:asvetlov/cpython.git
Use issue-46771 branch to work on.
When you push into this remote branch, the upstream python/cpython reflects all changes.

BTW, I use GitHub CLI (https://cli.github.com/), it allows to checkout forks very easily.
Look on the top of this PR page for line asvetlov wants to merge 9 commits into main from issue-46771.

Click 'copy-to-clipboard' button right to the branch name.
Run gh pr checkout <paste-from-clipboard>.

The tool setups remote, checkouts branch, and switches to it.
git commit and git push updates the PR even if it is from a fork if you have access (admins and maintainers can edit contributors' PRs). Super easy.

@asvetlov
Copy link
Contributor Author

Sorry, I was over-optimistic :(
A cancellation message is still required if CancelScope wants to swallow self-initiated cancellations (as Quatro does).

@Tinche
Copy link
Contributor
Tinche commented Feb 18, 2022

Thanks for the explanation. The asvetlov repo has no issue-46771 branch but gh pr checkout did the trick.

Can CancelScope be a dataclass?

@Tinche
Copy link
Contributor
Tinche commented Feb 18, 2022

@asvetlov Ok, for the discussion about more sophisticated nonce handling. Would you like me to do a writeup here, or open an issue on the asvetlov repo? I will try to explain the issue and a proposed solution.

You were right saying on the mailing list that asyncio doesn't have priorities, but I think with a little logic we can do better and maybe save our users from a potential footgun.

@1st1
Copy link
Member
1st1 commented Feb 18, 2022

@asvetlov

A cancellation message is still required if CancelScope wants to swallow self-initiated cancellations (as Quatro does).

Can you add a test that can only be fixed with a nonce? I'd like to take a look, because I'm still -1 on the nonce idea. We should figure out another way.

Copy link
Member
@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at the API only.

@gvanrossum
Copy link
Member

I'm not sure if the discussion about cancel edge cases should move here, but my comments on that so far are:

  • Yes, there really is something that we need to solve.
  • Yes, the cancel-message solution works (I have a version for TaskGroup in a branch).
  • Solutions introducing a new "nonce" field seem to just add complications (e.g. the CancelledError exception would have to be modified to accept the nonce arg and store it, or if we make it positional, we'd end up with the awkward CancelledError(None, nonce) if there's a nonce but no message -- and usually there isn't a message.
  • If you make the nonce a float and order them, it's no longer a nonce. And despite using time.monotonic(), using timestamps to deal with races feels wrong.

@Tinche
Copy link
Contributor
Tinche commented Feb 18, 2022

@1st1 Here's a test that fails without nonces:

    async def test_nested_timeouts_concurrent(self):
        with self.assertRaises(TimeoutError):
            with asyncio.timeout(0.002):
                try:
                    with asyncio.timeout(0.003):
                        # Pretend we crunch some numbers.
                        time.sleep(0.005)
                        await asyncio.sleep(1)
                except asyncio.TimeoutError:
                    pass

Both timeouts are marked as cancelled, but the inner timeout swallows the cancellation for the outer timeout. So the outer timeout never triggers.

@asvetlov
Copy link
Contributor Author

@Tinche up to you.
I wanted to share access to work on timeouts/cancellation scopes work with you.
As the cpython core, I can write to any of your pull requests if you don't clear the corresponding check box.
You are not core dev (yet), so I granted you the write access to my clone. You cannot write to the main repo but can apply changes without PR-over-PR, apply, and merge workflow.
I believe it reduces friction.
If you want to demonstrate an alternative approach and a separate branch and PR works better for you -- please go ahead.
If you have a feeling that direct commit to this PR is more comfortable -- I'm fine with it.
For example, adding a new test is a perfect example for the second case.

@gvanrossum
Copy link
Member

@1st1 Here's a test that fails without nonces:

    async def test_nested_timeouts_concurrent(self):
        with self.assertRaises(TimeoutError):
            with asyncio.timeout(0.002):
                try:
                    with asyncio.timeout(0.003):
                        # Pretend we crunch some numbers.
                        time.sleep(0.005)
                        await asyncio.sleep(1)
                except asyncio.TimeoutError:
                    pass

Both timeouts are marked as cancelled, but the inner timeout swallows the cancellation for the outer timeout. So the outer timeout never triggers.

Thinking about what happens here: When we reach the await asyncio.sleep(1), both callbacks run, both try to cancel the task, the inner __exit__() receives CancelledError, uncancels the task, and raises TimeoutError, which is called by the user code's except clause. Then the outer cancel scope's __exit__() is entered without an exception, and overall the task completes successfully.

The expectation is that the outer cancel scope should also raise TimeoutError, because its deadline is also exceeded. So perhaps the outer __exit__() should check whether its deadline is exceeded, and raise TimeoutError even though it did not see a CancelledError?

But it looks like there's a variant of the example that isn't fixed by such a check -- for example, we could add

        await asyncio.sleep(2)

after the try/except block (or even in the except clause), inside the outer cancel scope. Since the CancelledError exception has been wholly swallowed by the inner __exit__() this second await is not interrupted.

I'm guessing the fix for that would be to use a nonce (either using an explicit nonce API or via the cancel message). However, that still depends on the order in which the callbacks run. With the new cancel semantics (where extra cancels are ignored) whoever runs first wins, while with the old cancel semantics (where the most recent cancel gets to set the cancel message / nonce) whoever runs last wins -- but we don't want to rely on the order in which the callbacks run (since I have no idea in which order the current implementation runs them, given that they both become "ready" simultaneously, and we shouldn't depend on that). So that's why Tin is proposing to use timestamps (or a cancel stack, or some other mechanism that lets us arbitrate between the two cancel calls).

Note that this case is slightly different from the "web server cancels" problem, which was solvable without resorting to timestamps (the cancel scope callback would have to check whether the task was already being cancelled, using t.cancelling()).

I'm not sure yet what the right solution would be, but I'm glad that we're thingking about this scenario carefully.

@the-knights-who-say-ni

This comment was marked as resolved.

@Tinche
Copy link
Contributor
Tinche commented Feb 18, 2022

@gvanrossum thanks for the great write up, I believe you hit the nail on the head. I think to get this 100% correct, we need a way to throw the highest priority cancellation into the task on its iteration. The highest priority is the cancel scope that was entered first. It doesn't have to be a timestamp, we could have a global (or thread-local, but that's overthinking it) counter that the cancel scope takes from on __enter__.

The cancellation data could also be in a context var instead of on the CancelledError, I suppose? Would that be cleaner?

@gvanrossum
Copy link
Member

I have a new proposal. Thinking about @asvetlov's diagrams some more, I think we can make everything work if the cancel state of a task kept a count of cancellations. This would replace the t._cancel_requested flag -- we could just make that flag an int instead of a bool. t.cancel() increments the flag. t.uncancel() decrements it (asserting that it stays >= 0). t.cancelling() returns the value of the flag, i.e. the number of pending cancellations.

Now any context manager cleanup handler, like CancelScope.__exit__() or TaskGroup.__aexit__(), when it has to decide whether to let a CancellationError bubble out or swallow it, can just look at the number of pending cancellations to decide. Such context managers should still keep track of whether they cancelled the task themselves or not (at least TaskGroup needs this to avoid cancelling multiple times), and if they did, they should always call t.uncancel(). If after that's done there's still a nonzero number of pending cancellations (as reported by t.cancelling()), they should let the cancellation bubble out, other wise they can swallow or replace it.

The only thing that's painful with this scheme is how to handle cancel messages. Hoever, (this is for @cjerdonek to disagree with :-) I think cancel messages were a bad idea to begin with -- if I read the bpo where it started, it came from a user who misunderstood how to print exceptions. It seems they are not always preserved regardless. I believe that any scheme that allows for multiple pending cancellations will have a hard time to keep track of which cancel message should be allowed to bubbled out. I'm guessing it could be solved by making t._cancel_requested a list, and giving t.uncancel() an optional argument that indicates which cancel message to remove from that list (None by default), but that feels like a complication I'd much rather avoid.

@Tinche
Copy link
Contributor
Tinche commented Feb 19, 2022

@gvanrossum what happens to the count when the exception is thrown? Does it reset to 0 (I guess this would have to happen after the entire iteration is done) or does it stay, and only t.uncancel() can decrement it?

If it stays, I think essentially you have level-triggered cancellation then. Which I like, but it would be a compatibility break, right?

@gvanrossum
Copy link
Member

It stays, but this is not the flag that causes the exception to be thrown -- for that we have _must_cancel, which remains a bool and is reset when the exception is actually thrown (this reset logic is here, the actual throw here.

So I don't think it's quite the same as level-triggered, IIUC (that would keep interrupting every await governed by the cancel scope until its __exit__() is reached, right?).

@Tinche
Copy link
Contributor
Tinche commented Feb 19, 2022

Cool, I'm on board with this proposal then. (I got a little confused on _must_cancel vs _cancel_requested.) Want me to tinker around and try implementing this?

Your understanding of level-based cancellation is correct. In practice it helps remove a few footguns, especially around finally clauses. I feel like this counter approach might open the door to enable it (on an opt-in basis) in a future version of Python, but that's not a concern for the matter at hand.

@gvanrossum
Copy link
Member

Sure, go ahead and send a draft PR!

@Tinche
Copy link
Contributor
Tinche commented Feb 20, 2022

Got a working version here: #31434.

@asvetlov asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 9, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit ac6f8c8 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 9, 2022
Copy link
Member
@gvanrossum gvanrossum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Once we're confident that the buldbot tests will pass let's land this!

@gvanrossum gvanrossum added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @gvanrossum for commit e8c67ce 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@asvetlov asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit e65d766 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@asvetlov
Copy link
Contributor Author

buildbot failures are unrelated:

11 tests failed:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile
12 tests skipped:
test_devpoll test_ioctl test_kqueue test_msilib test_startfile
test_tix test_tk test_ttk_guionly test_winconsoleio test_winreg
test_winsound test_zipfile64
11 re-run tests:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile
test_unicode leaked [1373, 1371, 1373] references, sum=4117
11 tests failed again:
test_codecmaps_cn test_codecmaps_hk test_codecmaps_jp
test_codecmaps_kr test_codecmaps_tw test_descr test_pickle
test_pickletools test_sax test_unicode test_zipfile

The same on other boxes, test_asyncio is never affected

@asvetlov asvetlov added the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @asvetlov for commit e65d766 🤖

If you want to schedule another build, you need to add the ":hammer: test-with-buildbots" label again.

@bedevere-bot bedevere-bot removed the 🔨 test-with-buildbots Test PR w/ buildbots; report in status section label Mar 10, 2022
@gvanrossum
Copy link
Member

Looks like it's all the refleaks buildbots, leaking in unicode. Let's ignore that; I'll merge this. Thanks for this significant improvement to asyncio usability!

@Tinche
Copy link
Contributor
Tinche commented Mar 10, 2022

I've been quietly following along, thanks a ton @asvetlov !

@Tinche
Copy link
Contributor
Tinche commented Mar 10, 2022

(And thanks to @gvanrossum for managing this of course).

@gvanrossum gvanrossum merged commit f537b2a into main Mar 10, 2022
@gvanrossum gvanrossum deleted the issue-46771 branch March 10, 2022 16:05
@asvetlov
Copy link
Contributor Author

Thanks, guys!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants
0