8000 bpo-32734: Fix asyncio.Lock multiple acquire safety issue by bharel · Pull Request #5466 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-32734: Fix asyncio.Lock multiple acquire safety issue #5466

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Feb 2, 2018

Conversation

bharel
Copy link
Contributor
@bharel bharel commented Jan 31, 2018

As promised, found and fixed in the same day due to the high importance.
This will cause almost non-existing overhead (one try statement) and eliminate the bug.

https://bugs.python.org/issue32734

Copy link
Contributor
@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the change is correct.

Please fix minor issues for test.

t1.cancel()
lock.release()

asyncio.Task(lockandtrigger(), loop=self.loop)
Copy link
Contributor
8000

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use loop.create_task() for a task creation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Keep in mind asyncio.Task was used in the rest of the tests so I used it in order to be consistent.


# Make sure only one lock was taken
self.assertEqual(lock_count, 1)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please explicitly make sure that all four tasks are finished at the end of test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

@bedevere-bot
Copy link

GH-5477 is a backport of this pull request to the 3.6 branch.

@bedevere-bot
Copy link

GH-5479 is a backport of this pull request to the 3.7 branch.

@bharel
Copy link
Contributor Author
bharel commented Feb 1, 2018

Alright. Done with changes and backports. Anything else I can help with? :-)

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@asvetlov: please review the changes made to this pull request.

@@ -181,18 +181,22 @@ def locked(self):
self._locked = True
return True

_waiters = self._waiters
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: just use self._waiters. No need for a new variable.

Copy link
Contributor Author
@bharel bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New variable was created in order to avoid the double attribute lookup, I'll remove it if you think it's more readable though.

await fut
self._locked = True
return True
try:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is very subtle. Can you add a comment why we use two nested 'try' statements here? I.e. that it's important that 'finally' block is called before 'except CancelledError'

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing :-)

if not fut.done():
fut.set_result(True)
break
"""Wake up the first waiter if it isn't done."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if it is done? Can there be a situation where you have [fut, fut] in the _waiters list?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be indeed, when 2 or more locks are waiting on acquire.
If the first future is done, it means that later on it will either get the lock, or it was cancelled in which case it will wake up the next waiter upon loop's next cycle.
The patch fixes the bug by avoiding multiple done waiters at once and cleaning up the waiters list asap.
The only way multiple done waiters can happen is if 2 futures get cancelled in which case only 1 waiter will wake up, and the second cancelled future will not wake up another one as lock would have already been taken (if not self._locked: self._wake_up_first()) or it wasn't taken yet it which case it's still sitting .done() in the waiter's list.

In summary:
.done necessarily means either the lock is immediately taken next loop or if it isn't, (.done occurred because of cancellation) next one wakes up. So if we have a done future not including the one currently running we shouldn't wake up another one as it will cause a double wakeup, hence, the almighty bug.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, makes sense. Could you please add this all as a comment to _wake_up_first? :)

Copy link
Contributor Author
@bharel bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Phew, now that's a long comment, lemme see how can I shorten it while still managing to fully explain it. Challenge accepted :P

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, if you have time to write a short comment :)

Sorry for being anal about this. Every once in a while we find a new bug in asyncio.Lock or asyncio.Queue and it's always a challenge to wrap your head around how they are actually working. Every line of code is significant, sometimes it's not safe to do a simple refactoring because things break in very non-obvious ways.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nah, all good mate. It took some time to wrap my head around it as well.
Luckily I didn't actually encounter the bug at production but rather found it by reading the implementation which took a couple of hours ;-)

How's this comment? Attempting to be short and to the point as much as you can with synchronization.

Copy link
Member
@1st1 1st1 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've left a few review comments

@bedevere-bot
Copy link

A Python core developer has requested some changes be made to your pull request before we can consider merging it. If you could please address their requests along with any other requests in other reviews from core developers that would be appreciated.

Once you have made the requested changes, please leave a comment on this pull request containing the phrase I have made the requested changes; please review again. I will then notify any core developers who have left a review that you're ready for them to take another look at this pull request.

And if you don't make the requested changes, you will be put in the comfy chair!

@@ -181,18 +181,22 @@ def locked(self):
self._locked = True
return True

_waiters = self._waiters
Copy link
Contributor Author
@bharel bharel Feb 2, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

New variable was created in order to avoid the double attribute lookup, I'll remove it if you think it's more readable though.

if not fut.done():
fut.set_result(True)
break
"""Wake up the first waiter if it isn't done."""
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There can be indeed, when 2 or more locks are waiting on acquire.
If the first future is done, it means that later on it will either get the lock, or it was cancelled in which case it will wake up the next waiter upon loop's next cycle.
The patch fixes the bug by avoiding multiple done waiters at once and cleaning up the waiters list asap.
The only way multiple done waiters can happen is if 2 futures get cancelled in which case only 1 waiter will wake up, and the second cancelled future will not wake up another one as lock would have already been taken (if not self._locked: self._wake_up_first()) or it wasn't taken yet it which case it's still sitting .done() in the waiter's list.

In summary:
.done necessarily means either the lock is immediately taken next loop or if it isn't, (.done occurred because of cancellation) next one wakes up. So if we have a done future not including the one currently running we shouldn't wake up another one as it will cause a double wakeup, hence, the almighty bug.

try:
await fut
finally:
_waiters.remove(fut)
except futures.CancelledError:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@1st1 @asvetlov I'll be honest, the only problem I have (which originates from the original code anyway) is what will happen if the exception isn't a CancelledError (such as task.__step(self, exc=OutOfCoconutsException)). Apart from Monty Python having no horse, the code will enter a deadlock as no one will wake up the next waiter. I'm not entirely sure what exceptions can pass into the task's __step as task doesn't allow set_exception but still better be safe than sorry no?

await fut
self._locked = True
return True
try:
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure thing :-)

@bharel
Copy link
Contributor Author
bharel commented Feb 2, 2018

I have made the requested changes; please review again

@bedevere-bot
Copy link

Thanks for making the requested changes!

@1st1, @asvetlov: please review the changes made to this pull request.

Copy link
Contributor
@asvetlov asvetlov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.
@1st1 please review again

@bharel
Copy link
Contributor Author
bharel commented Feb 2, 2018

Alright, I guess that's a green light for my backports?

@1st1
Copy link
Member
1st1 commented Feb 2, 2018

Alright, I guess that's a green light for my backports?

Let's see if our bots can do them automagically first ;)

@1st1 1st1 merged commit 2f79c01 into python:master Feb 2, 2018
@miss-islington
Copy link
Contributor

Thanks @bharel for the PR, and @1st1 for merging it 🌮🎉.. I'm working now to backport this PR to: 3.6, 3.7.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Feb 2, 2018
)

(cherry picked from commit 2f79c01)

Co-authored-by: Bar Harel <bzvi7919@gmail.com>
@bedevere-bot
Copy link

GH-5501 is a backport of this pull request to the 3.7 branch.

@miss-islington
Copy link
Contributor

Sorry, @bharel and @1st1, I could not cleanly backport this to 3.6 due to a conflict.
Please backport using cherry_picker on command line.
cherry_picker 2f79c014931cbb23b08a7d16c534a3cc9607ae14 3.6

@bedevere-bot
Copy link

GH-5502 is a backport of this pull request to the 3.6 branch.

@bharel
Copy link
Contributor Author
bharel commented Feb 2, 2018

Thanks @1st1 :-)
Done with 3.6 backport.
Btw, doesn't it apply to 3.5 as well or did we stop supporting it?

@1st1
Copy link
Member
1st1 commented Feb 2, 2018

3.5 is in security-only fixes mode IIRC.

1st1 pushed a commit that referenced this pull request Feb 2, 2018
…5501)

(cherry picked from commit 2f79c01)

Co-authored-by: Bar Harel <bzvi7919@gmail.com>
1st1 pushed a commit that referenced this pull request Feb 2, 2018
@bharel bharel deleted the bpo-32734 branch December 23, 2019 12:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0