8000 gh-93821: Handle connection resets on Windows by Klaar-pretzel · Pull Request #124779 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-93821: Handle connection resets o 8000 n Windows #124779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

Klaar-pretzel
Copy link
@Klaar-pretzel Klaar-pretzel commented Sep 30, 2024

Refactor OSError re-throwing and wrap all instances of 'ov.getresult()' with the decorator so that client-side disconnects can be gracefully handled. Update the Proactor server to handle the connection and continue looping and serving (client disconnects are not a fatal error).

I discovered that, especially under heavy server load, the asycio Windows server could slow down and could end up throwing [WinError 64] The specified network name is no longer available errors. This occurs when clients connect and disconnect before the asyncio code has time to service the connection, especially if the client side disconnects erroneously (for example, if it's process was immediately killed before it could gracefully close the connection). These errors would bubble up into the proactor server loop and cause the socket there close - and never get re-opened. The result of this is that my servers would appear to hang, requiring a force-restart to restore their functionality.

The solution here catches all potential connection reset errors in the IocpProactor class by clients. It appears that the sending-side was previously capturing the errors, but likely the tight timing of the client connection sequence meant errors in these steps weren't noticed previously.

Refactor OSError re-throwing and wrap all instances of 'ov.getresult()'
with the decorator so that client-side disconnects can be gracefully
handled. Update the Proactor server to handle the connection and
continue looping and serving (client disconnects are not a fatal error).
@ghost
Copy link
ghost commented Sep 30, 2024

All commit authors signed the Contributor License Agreement.
CLA signed

@bedevere-app
Copy link
bedevere-app bot commented Sep 30, 2024

Most changes to Python require a NEWS entry. Add one using the blurb_it web app or the blurb command-line tool.

If this change has little impact on Python users, wait for a maintainer to apply the skip news label instead.

@Klaar-pretzel
Copy link
Author

Similar to this one #124032, however this is a little bit more complete by capturing more potential problems that were observed in the original patch.

@@ -0,0 +1 @@
Fix error handling in windows events where clients terminating connections could result in an asyncio server using Proactor event loops to hang indefinitely.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About NEWS format please to reading this: https://devguide.python.org/documentation/markup/
for example:

:mod:`asyncio`

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there something I need to change, or is the comment informational?

Copy link
Contributor
@rruuaanng rruuaanng Sep 30, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, sorry. I meant that your NEWS format seems incorrect. Maybe you should check out the NEWS written by others in the Misc section, that might be better.

Edit: For the module, you should use the example I provided.

@@ -0,0 +1 @@
Fix error handling in windows events where clients terminating connections could result in an asyncio server using Proactor event loops to hang indefinitely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

think this is what they meant:

Suggested change
Fix error handling in windows events where clients terminating connections could result in an asyncio server using Proactor event loops to hang indefinitely.
Fix error handling in windows events where clients terminating connections could result in an :mod:`asyncio` server using Proactor event loops to hang indefinitely.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it! Thanks for clarification.

Klaar-pretzel and others added 2 commits October 2, 2024 21:55
Keep docs better organized by linking the module the change was destined
for.
except exceptions.CancelledError:
# Effectively ignore connections that throw a cancelled error
# during setup, loop back around and continue serving.
sock.close()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suppressing CancelledError is a huge red flag and should only be done in desperate times. Can you write an elaborate comment what's going on here? I don't understand why CancelledError needs suppressing neither from this comment nor from the PR description.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like this is supposed to be:

except ConnectionResetError:
    self.call_soon(loop)

https://github.com/python/cpython/pull/124032/files#diff-15176d64608861034db7168743f4f116f4cb33422cf685a2b911481c55826aa2R861-R862

We have reproduced this: the regular uvicorn+fastapi server receives a connection from a process that is forcefully terminated afterward.
What _loop needs to do: determine if an error = OSError: [WinError 64] The specified network name is no longer available; in that case, it should just run the next iteration of the loop.

Our working fix looks like that: https://gist.github.com/cybergrind/9cbbdc94503548d74dc4d5d3ed99248c

@python-cla-bot
Copy link

The following commit authors need to sign the Contributor License Agreement:

CLA signed

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0