8000 GH-100573: Fix server hang caused by os.stat() on named pipe (Windows) by gvanrossum · Pull Request #100959 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

GH-100573: Fix server hang caused by os.stat() on named pipe (Windows) #100959

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 13, 2023

Conversation

gvanrossum
Copy link
Member
@gvanrossum gvanrossum commented Jan 11, 2023

@gvanrossum
Copy link
Member Author

@kumaraditya303 Any chance you have time to review this? And maybe @eryksun too?

The only possibly controversial thing is that I added self.call_soon(loop_accept_pipe) to the OSError except clause as well, so that other OS errors don't do the same thing and disable the server.

Comment on lines 270 to 271
async def stat_it():
os.stat(ADDRESS)
Copy link
Contributor
@eryksun eryksun Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case the implementation of os.stat() changes, what basically needs to be tested is whether a pipe connection that's closed immediately causes the server to die, such as os.close(os.open(ADDRESS, os.O_RDONLY)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried it and it does not reproduces the error. Note that I am using Windows 11.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I found the same problem. I also had to add two await asyncio.sleep(0) calls after server.close() to make the final close(open()) call fail as expected.

So I think I'd rather merge this as is, using os.stat().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All that os.stat() is doing that's even remotely relevant here is calling h = CreateFileW(ADDRESS, ...), doing a bit of work that takes a little time, and then calling CloseHandle(h).

os.stat() won't be relevant to the error at all if Microsoft provides a better way to implement os.stat() that doesn't require opening a handle to the pipe, or if we decide that we should just inspect a "\\.\PIPE\" path as a string in os.stat() instead of opening a handle.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then how would you fix the test? os.close(os.open()) doesn’t fail without the fix.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe os.open() actually isn't fast enough to trigger the race condition reliably. Try _winapi.CloseHandle(_overlapped.ConnectPipe(ADDRESS)).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I forgot that os.stat() also calls GetFileAttributesW() on the pipe while the handle is open. This tries to open a second pipe connection before the first one has been closed. The following reproduces the scenario:

            h = _overlapped.ConnectPipe(ADDRESS)
            try:
                _winapi.CloseHandle(_overlapped.ConnectPipe(ADDRESS))
            except OSError as e:
                if e.winerror != _overlapped.ERROR_PIPE_BUSY:
                    raise
            finally:
                _winapi.CloseHandle(h)

In the test, 2/5 attempts at opening a second pipe connection fail with ERROR_PIPE_BUSY, so this error is intentionally ignored.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we don’t need a test. All this is irrelevant for the fix. I can either delete the test, keep the PR as is, or close it unresolved. (Someone else can try again.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code in my last comment reliably triggers the problem for me because it ensures that the client connects and disconnects from the server's next pipe instance, leaving the pipe in a broken state before the server calls ConnectNamedPipe().

If it comes to closing the PR or deleting the test, I vote for deleting the test. It should have been designed from the beginning to handle the broken pipe error. Not every aspect of a correct design has to be tested.

server.close()

with self.assertRaises(FileNotFoundError):
os.stat(ADDRESS)
Copy link
Contributor
@eryksun eryksun Jan 12, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similarly here, os.close(os.open(ADDRESS, os.O_RDONLY)) will raise FileNotFoundError after the server has closed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For that we need to wait for closing the server so asyncio.sleep(0) is required.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't look like that's needed with the latest probe code.

@kumaraditya303 kumaraditya303 added type-bug An unexpected behavior, bug, or error topic-asyncio needs backport to 3.10 only security fixes needs backport to 3.11 only security fixes labels Jan 12, 2023
Copy link
Contributor
@kumaraditya303 kumaraditya303 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@gvanrossum
Copy link
Member Author

Please re-review, I've added Eryk's code. I've confirmed that it reproduces the bug and refactored the test so the probe code only needs to be written once.

@gvanrossum gvanrossum merged commit 1bc7a73 into python:main Jan 13, 2023
@miss-islington
Copy link
Contributor

Thanks @gvanrossum for the PR 🌮🎉.. I'm working now to backport this PR to: 3.10, 3.11.
🐍🍒⛏🤖

@gvanrossum gvanrossum deleted the asyncio-winpipe branch January 13, 2023 21:25
@bedevere-bot
Copy link

GH-101019 is a backport of this pull request to the 3.11 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.11 only security fixes label Jan 13, 2023
@bedevere-bot
Copy link

GH-101020 is a backport of this pull request to the 3.10 branch.

@bedevere-bot bedevere-bot removed the needs backport to 3.10 only security fixes label Jan 13, 2023
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 13, 2023
…indows) (pythonGH-100959)

(cherry picked from commit 1bc7a73)

Co-authored-by: Guido van Rossum <guido@python.org>
miss-islington pushed a commit to miss-islington/cpython that referenced this pull request Jan 13, 2023
…indows) (pythonGH-100959)

(cherry picked from commit 1bc7a73)

Co-authored-by: Guido van Rossum <guido@python.org>
gvanrossum pushed a commit that referenced this pull request Jan 13, 2023
gvanrossum pushed a commit that referenced this pull request Jan 13, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
topic-asyncio type-bug An unexpected behavior, bug, or error
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0