8000 Sentry background worker is chronically blocking async event loop when many exceptions are raised · Issue #2824 · getsentry/sentry-python · GitHub
[go: up one dir, main page]

Skip to content

Sentry background worker is chronically blocking async event loop when many exceptions are raised #2824

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
cpvandehey opened this issue Mar 14, 2024 · 17 comments
Labels
Better async support Component: SDK Core Dealing with the core of the SDK Component: Transport Dealing with the transport Improvement Python SDK

Comments

@cpvandehey
Copy link

How do you use Sentry?

Self-hosted/on-premise

Version

1.40.6

Steps to Reproduce

Hello! And thanks for reading my ticket :)

The python sentry client is a synchronous client library that is retrofitted to fit the async model (by spinning off separate threads to avoid disrupting the event loop thread -- see background worker (1) for thread usage).

Under healthy conditions, the sentry client doesn’t need to make many web requests. However, if conditions become rocky and exceptions are frequently raised (caught or uncaught), the sentry client may become an extreme inhibitor to the app event loop (assuming high sample rate). This is due to the necessary OS thread context switching that effectively pauses/blocks the event loop to work on other threads (i.e the background worker (1)). This is not a recommended pattern (obviously) due to the costs of switching threads, but can be useful for quickly/lazily retrofitting sync code.

Relevant flow - in short:
Every time an exception is raised (caught or uncaught) in my code, a web request is immediately made to dump the data to sentry when sampled. Since sentry’s background worker is thread based (1), this will trigger an thread context switch and then a synchronous web request to dump the data to sentry. When applications receive many exceptions in a short period of time, this becomes a context switching nightmare.

Suggestion:
In an ideal world, sentry would asyncify its Background worker to use a task (1) and its transport layer (2) would use aiohttp. I don't think this is of super high complexity, but I could be wrong.

An immediate workaround could be made with more background worker control. If sentry’s background worker made web requests to dump data at configurable intervals, it would behave far more efficiently for event loops apps. At the moment, the background worker always dumps data immediately with regards to exceptions. In my opinion, if sentry is flushing data at app exit, having a 60 second timer to dump data would alleviate most of the symptoms I described above without ever losing data (albeit it would be up to 60 seconds slower).

(1) -

class BackgroundWorker(object):

(2) -

response = self._pool.request(

Expected Result

I expect to have less thread context switching when using sentry.

Actual Result

I see a lot of thread context switching when there are high exception rates.

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Mar 14, 2024
@antonpirker
Copy link
Member

Hey @cpvandehey ! Thanks for the great ticket!

@sentrivana
Copy link
Contributor

Hey @cpvandehey, thanks for writing in. Definitely agree with you that our async support could definitely use some improvements (see e.g. #1735, #2007, #2184, and multiple other issues).

Using an aiohttp client and an asyncio task both sounds doable and would go a long way in making the SDK more async friendly.

@sentrivana sentrivana added the Component: Transport Dealing with the transport label Mar 18, 2024
@antonpirker
Copy link
Member

We could detect if aiohttp is in the project and based on this enable the new async support automatically. (have not thought long about this if this could lead to problems though..)

@sentrivana sentrivana added this to the Better async support milestone Mar 18, 2024
@cpvandehey
Copy link
Author
cpvandehey commented Apr 29, 2024

Hey @sentrivana / @antonpirker , any update on the progress for this? Happy to help

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 2 Apr 29, 2024
@sentrivana
Copy link
Contributor

Hey @cpvandehey, no news on this, but PRs are always welcome if you feel like giving this a shot.

@antonpirker antonpirker removed this from the Better async support milestone Jun 20, 2024
@cpvandehey
Copy link
Author

I see the milestone for this task was removed. @antonpirker, should we still consider writing our own attempt?

@sentrivana
Copy link
Contributor

Hey @cpvandehey, sorry for the confusion regarding the milestone. Previously we were (mis)using milestones to group issues together, but have now decided to abandon that system. Nothing has changed priority wise.

@cpvandehey
Copy link
Author

Alright, I think im going to start to implement this. Stay tuned.

@cpvandehey
Copy link
Author

Hey Sentry folks.

Currently we are in the middle of doing a big refactor, where we try to use OpenTelementry (OTel) under the hood for our performance instrumentation.

Just bumping this ticket again. I assume the repo is in a better state to start this effort?

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Sep 23, 2024
@BYK
Copy link
Member
BYK commented Sep 23, 2024

Hi @cpvandehey, thanks for the bump! I do agree the repo is in a better shape. Moreover I've started working on an experimental HTTP/2 transport using httpcore and looks like it has native async support for that part.

Since I'm lending some of my time to the Python SDK nowadays and working in a similar area, I think we can work together on the async support too.

I don't think I'm as well-versed as you are when it comes to async game in Python so I can use some of the code you say you've already written such as the async background worker (I wonder if we actually need it with async as all we need is a queue and the event loops should handle the worker logic, right?).

Anyway, this is hopefully coming and your involvement is much appreciated!

sentrivana added a commit that referenced this issue Oct 4, 2024
All our ingest endpoints support HTTP/2 and some even HTTP/3 which are significantly more efficient compared to HTTP/1.1 with multiplexing and, header compression, connection reuse and 0-RTT TLS.

This patch adds an experimental HTTP2Transport with the help of httpcore library. It makes minimal changes to the original HTTPTransport that said with httpcore we should be able to implement asyncio support easily and remove the worker logic (see #2824).

This should also open the door for future HTTP/3 support (see encode/httpx#275).


---------

Co-authored-by: Ivana Kellyer <ivana.kellyer@sentry.io>
@cpvandehey
Copy link
Author
cpvandehey commented Feb 12, 2025

Hey @BYK , I see your code was merged to add in httpcore (1)

Does that mean this would be a good time to start the work on the above? Would love to help in any way that I can

(1) - #3588

@getsantry getsantry bot moved this to Waiting for: Product Owner in GitHub Issues with 👀 3 Feb 12, 2025
@BYK
Copy link
Member
BYK commented Feb 13, 2025

@cpvandehey yeah! Sorry this side stalled a bit, juggling too many things at once 😅

Now that we have httpcore with HTTP/2 support, I think we can try to re-imagine the worker thread-based transport as a completely asyncio-based one. I'd say it should be a new, experimental transport.

Would love to help in any way that I can

If you have any time, trying to take a jab at this new transport would be nice. Second best would be to try this new transport once it lands and see if it actually solves your issues and is stable.

@oktavlachs
Copy link

@cpvandehey

Could you give us an update on your plan? Have you started yet? Do you need support?
Thanks for your efforts taken so far! :)

@BYK
Copy link
Member
BYK commented Apr 24, 2025

Hi @oktavlachs -- apologies for the radio silence. I'm planning to dive into this issue in the coming 1-2 weeks. I'm guessing the changes will require a major version bump which is also cooking.

No work has started yet so if you wanna jump in or provide any directions on where we should go, I'd definitely take those into account.

@stephanie-anderson stephanie-anderson added the Python SDK label Apr 25, 2025 — with Linear
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Better async support Component: SDK Core Dealing with the core of the SDK Component: Transport Dealing with the transport Improvement Python SDK
Projects
Status: No status
Status: No status
Development

No branches or pull requests

7 participants
0