10000 gh-133885: Use locks instead of critical sections for _zstd by emmatyping · Pull Request #134289 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-133885: Use locks instead of critical sections for _zstd #134289

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 18 commits into from
May 23, 2025

Conversation

emmatyping
Copy link
Member
@emmatyping emmatyping commented May 19, 2025

Move from using critical sections to locks for the (de)compression methods. Since the methods allow other threads to run, we should use a lock rather than a critical section.

I'm not totally sure about the removal of the critical sections in the _get_(C|D)Dict function, but it should be safe to yield to other threads in an initializer as the values cannot be modified (ZstdDicts are immutable).

Also remove critical sections around code only run in initializer.
Copy link
Member
@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a downgrade. Though rare, functions like PyMem_Malloc can call arbitrary Python code, so we lose our protection against deadlocks. What's the rationale here?

@colesbury colesbury self-requested a review May 20, 2025 17:52
@colesbury
Copy link
Contributor

@ZeroIntensity - PyMem_Malloc is not allowed to call back into Python.

@ZeroIntensity
Copy link
Member

Oh, that's news to me. When did that change?

@colesbury
Copy link
Contributor

@ZeroIntensity - that's always been the case, as far as I know. A lot of things would break if PyMem_Malloc were allowed to call back into Python. PyObject_GC_Malloc used to be able to call back into Python (via triggering GC), but that hasn't been the case since 3.12.

@ZeroIntensity
Copy link
Member

PyObject_GC_Malloc used to be able to call back into Python (via triggering GC), but that hasn't been the case since 3.12.

Ah, that's what I was thinking of.

We do need to document this contract somewhere. I'm imagining a case where an embedder sets their own allocator that calls Python code, because they have an attached thread state. The documentation doesn't say you can't do that, as far as I can tell.

Copy link
Member
@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I understand this a little better now, sorry!

I'm a bit iffy about the lock-ordering here. I think the lock is only held under non-reentrant calls? PyErr_SetString and friends seem to be safe, but it looks like they might be able to re-enter if someone did weird things to the exception state. I wouldn't worry about it, really, but it's worth bringing up.

@dpdani
Copy link
Contributor
dpdani commented May 21, 2025

Are we sure about the implications of implicit locking in de/compressor contexts?
I believe we might carefully implement this now only to discourage it later.

If two threads concurrently write into a compressor context with implicit locking they might succeed in corrupting their data. Say:

  1. thread 1 writes <html>
  2. thread 2 writes a png file
  3. thread 1 writes </html>

Or pick whichever other interleaving of the two threads.
The corruption is only evident after decompressing back the data: from zstd's point of view there is nothing wrong with the above.

This is just one example, but I think there's reasonable expectation that users won't want to share de/compressor contexts anyways.

I think raising an exception when detecting concurrent use is a more desirable outcome.

@emmatyping
Copy link
Member Author
8000

If two threads concurrently write into a compressor context with implicit locking they might succeed in corrupting their data. Say:

thread 1 writes
thread 2 writes a png file
thread 1 writes

I think I would say don't write different data to the same stream with threading. I can append things to a shared list that makes it invalid for what I want but that is a programmer error. I don't particularly see a reason to share a compression context between threads, but I do think we might see people move a compression context into a thread to do all the compression in another thread, and we should try to support that.

This also reverts the setdefault usage which we don't need anymore and refactors the _zstd_load_(c,d)_dict functions to not use goto/properly lock around dictionary usage.
@dpdani
Copy link
Contributor
dpdani commented May 21, 2025

Oh yes, I agree.

Let me explain a bit better what my intention was with python-zstandard.

Instead of using a lock-based approach, I used a flag to signify that a de/compressor context was in use by some thread.
If another thread tried to set the same de/compressor's flag (with an atomic compare-and-set operation) while it was still in use, then it would raise an exception.

The differences with using locks are:

  1. a ConcurrentUseException is raised when appropriate, which should
  2. make users be aware that data corruption may occur, caused by multithreading.

And, in both approaches:

  1. no segmentation fault can be issued by the underlying zstd implementation.

IIUC, you alluded to an ownership transfer model. Note that this approach is not in contrast with that model. If a de/compressor object is created by one thread and then transferred to a different thread for use, no exception is raised. The only problematic usage that the exception would signal is when a thread calls into a context method while another thread was already using the context.

In fact, this is not even in contrast with other thread synchronization mechanisms that don't use ownership transfer (e.g. barriers): if the program makes sure that there is no concurrent access, then the exception is never raised.

Of course, the choice is up to you. I just wanted to make sure we considered both approaches.

@ZeroIntensity
Copy link
Member

Hmm, I don't think we should be inventing object ownership models here. That makes sense for a third-party, but that sort of thing should get its own proposal for the stdlib as a whole.

Locking should be sufficient, right? If torn writes at the method-level are an issue, we should just document that users need to have an additional layer of synchronization when using compressors concurrently.

- Assert locks are held in `_get_(C/D)Dict`
- Update self->d_dict outside `Py_BEGIN_ALLOW_THREADS`
- removed critical section AC I missed
@ZeroIntensity
Copy link
Member

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @ZeroIntensity for commit d142aa8 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134289%2Fmerge

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 CentOS9 NoGIL Refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • s390x Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR

Copy link
Member
@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yay, buildbots passed! LGTM too, apart from one (speculative) concern about re-entrancy.

@emmatyping
Copy link
Member Author

!buildbot nogil refleak

@bedevere-bot
Copy link

🤖 New build scheduled with the buildbot fleet by @emmatyping for commit 69a755b 🤖

Results will be shown at:

https://buildbot.python.org/all/#/grid?branch=refs%2Fpull%2F134289%2Fmerge

The command will test the builders whose names match following regular expression: nogil refleak

The builders matched are:

  • AMD64 CentOS9 NoGIL Refleaks PR
  • PPC64LE Fedora Rawhide NoGIL refleaks PR
  • AMD64 Fedora Rawhide NoGIL refleaks PR
  • s390x Fedora Rawhide NoGIL refleaks PR
  • aarch64 Fedora Rawhide NoGIL refleaks PR

Copy link
Member
@ZeroIntensity ZeroIntensity left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM assuming buildbots pass

@emmatyping
Copy link
Member Author

Hm, that CentOS 9 buildbot doesn't have zstd installed but it is failing due to unrelated issues.

@emmatyping
Copy link
Member Author

(FWIW, I ran the refleak hunter locally on a nogil build and it passed)

@emmatyping
Copy link
Member Author

Opened #134557 for the free threaded buildbot refleak

@colesbury colesbury merged commit 8dbc119 into python:main May 23, 2025
46 of 49 checks passed
@miss-islington-app
Copy link

Thanks @emmatyping for the PR, and @colesbury for merging it 🌮🎉.. I'm working now to backport this PR to: 3.14.
🐍🍒⛏🤖

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 23, 2025
…thongh-134289)

Move from using critical sections to locks for the (de)compression methods.
Since the methods allow other threads to run, we should use a lock rather
than a critical section.
(cherry picked from commit 8dbc119)

Co-authored-by: Emma Smith <emma@emmatyping.dev>
@bedevere-app
Copy link
bedevere-app bot commented May 23, 2025

GH-134560 is a backport of this pull request to the 3.14 branch.

@bedevere-app bedevere-app bot removed the needs backport to 3.14 bugs and security fixes label May 23, 2025
@emmatyping
Copy link
Member Author

Thanks all for the reviews! Thinking through free threading reentrancy at the C level is something I'm still getting a feel for. Chatting about this fix has definitely helped with my understanding :)

colesbury pushed a commit that referenced this pull request May 23, 2025
…h-134289) (gh-134560)

Move from using critical sections to locks for the (de)compression methods.
Since the methods allow other threads to run, we should use a lock rather
than a critical section.
(cherry picked from commit 8dbc119)

Co-authored-by: Emma Smith <emma@emmatyping.dev>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants
0