8000 bpo-39847: win32: don't over-wait for mutex after tickcount overflow by bobince · Pull Request #18780 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

bpo-39847: win32: don't over-wait for mutex after tickcount overflow #18780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Mar 11, 2020

Conversation

bobince
Copy link
Contributor
@bobince bobince commented Mar 4, 2020

The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx returns, causing the loop to think it needs to wait another 49 days. This is most likely to happen when the machine is hibernated during WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module appear to never timeout.

Replace with GetTickCount64. This is OK now we no longer support WinXP which lacks it; it is already in use for time.monotonic.

Approaches not taken:

  • forcing signed arithmetic for the now/target comparison (this would only partially solve the problem with GetTickCount as you could still break it by hibernating for 24 days; it's unnecessary for 64-bit which takes however many hundred million years to roll over);

  • doing anything to cope with the possibility of the new milliseconds being out of range for DWORD; this shouldn't be possible given the target-now time should never be more than the original milliseconds DWORD passed in, and PyCOND_TIMEDWAIT can't accept a longer time anyway;

  • I haven't been able to add a representative test, as it would mean waiting 49 days or some pretty nasty syscall hooking which seems like overkill. Hopefully change is fairly transparently harmless.

https://bugs.python.org/issue39847

The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx returns, causing the loop to think it needs to wait another 49 days. This is most likely to happen when the machine is hibernated during WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module appear to never timeout.

Replace with GetTickCount64 - this is OK now we no longer support XP which lacks it, and is in use for time.monotonic.

Approaches not taken:

- forcing signed arithmetic for the now/target comparison (this would only partially solve the problem with GetTickCount as you could still break it by hibernating for 24 days; it's unnecessary for 64-bit which takes however many hundred million years to roll over)
@bobince bobince force-pushed the bpo39847-mutex-wait-dword-overflow branch from 181d6de to 1026d8d Compare March 4, 2020 17:06
Copy link
Member
@vstinner vstinner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@zooba: Do you want to have a look?

@vstinner
Copy link
Member
vstinner commented Mar 4, 2020

cc @pitrou who loves lock if I recall correctly :-D

@miss-islington
Copy link
Contributor

Thanks @bobince for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Thanks @bobince for the PR, and @vstinner for merging it 🌮🎉.. I'm working now to backport this PR to: 3.8.
🐍🍒⛏🤖

@miss-islington
Copy link
Contributor

Sorry @bobince and @vstinner, I had trouble checking out the 3.7 backport branch.
Please backport using cherry_picker on command line.
cherry_picker 64838ce7172c7a92183b39b22504b433a33a884d 3.7

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Mar 11, 2020
…8780)

The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow
in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx
returns, causing the loop to think it needs to wait another 49 days. This is
most likely to happen when the machine is hibernated during
WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module
appear to never timeout.

Replace with GetTickCount64 - this is OK now Python no longer supports XP which
lacks it, and is in use for time.monotonic().

Co-authored-by: And Clover <and.clover@bromium.com>
(cherry picked from commit 64838ce)

Co-authored-by: bobince <and+github@doxdesk.com>
@bedevere-bot
Copy link

GH-18945 is a backport of this pull request to the 3.8 branch.

@vstinner
Copy link
Member

@bobince: There is a conflict on the backport to 3.7. Can you try to backport manually the fix to 3.7? Either use "git cherry-pick -x 64838ce" or try "cherry_picker 64838ce 3.7".

miss-islington added a commit that referenced this pull request Mar 11, 2020
The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow
in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx
returns, causing the loop to think it needs to wait another 49 days. This is
most likely to happen when the machine is hibernated during
WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module
appear to never timeout.

Replace with GetTickCount64 - this is OK now Python no longer supports XP which
lacks it, and is in use for time.monotonic().

Co-authored-by: And Clover <and.clover@bromium.com>
(cherry picked from commit 64838ce)

Co-authored-by: bobince <and+github@doxdesk.com>
bobince added a commit to bobince/cpython that referenced this pull request Mar 12, 2020
…onGH-18780)

The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow
in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx
returns, causing the loop to think it needs to wait another 49 days. This is
most likely to happen when the machine is hibernated during
WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module
appear to never timeout.

Replace with GetTickCount64 - this is OK now Python no longer supports XP which
lacks it, and is in use for time.monotonic().

Co-authored-by: And Clover <and.clover@bromium.com>
(cherry picked from commit 64838ce)

Co-authored-by: bobince <and+github@doxdesk.com>
@bedevere-bot
Copy link

GH-18959 is a backport of this pull request to the 3.7 branch.

@bobince
Copy link
Contributor Author
bobince commented Mar 12, 2020

Yep, ta. I couldn't see any conflict with my human eyes but the cherry-pick doesn't care.

miss-islington pushed a commit that referenced this pull request Mar 12, 2020
…8780) (GH-18959)

The 32-bit (49-day) TickCount relied on in EnterNonRecursiveMutex can overflow
in the gap between the 'target' time and the 'now' time WaitForSingleObjectEx
returns, causing the loop to think it needs to wait another 49 days. This is
most likely to happen when the machine is hibernated during
WaitForSingleObjectEx.

This makes acquiring a lock/event/etc from the _thread or threading module
appear to never timeout.

Replace with GetTickCount64 - this is OK now Python no longer supports XP which
lacks it, and is in use for time.monotonic().

Co-authored-by: And Clover <and.clover@bromium.com>
(cherry picked from commit 64838ce)

Co-authored-by: bobince <and+github@doxdesk.com>
@vstinner
Copy link
Member

Yep, ta. I couldn't see any conflict with my human eyes but the cherry-pick doesn't care.

Thanks. I merged your 3.7 backport.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0