CI: temporarily disable free-threaded job with parallel-threads by rgommers · Pull Request #22997 · scipy/scipy

CI: temporarily disable free-threaded job with parallel-threads #22997


Merged
merged 1 commit into scipy:main May 16, 2025

Conversation

rgommers (Member)

Reasons:

  1. A segfault in `stats/_distribution_infrastructure.py` for about a week, see CI: Failures in pytest-run-parallel CI job #22758 (comment)
  2. We'd prefer not to have this not-so-stable job running on the 1.16.x branch

Re-enable on main after 1.16.x has branched and (1) is investigated.

@rgommers rgommers added this to the 1.16.0 milestone May 16, 2025
@rgommers rgommers added the free-threading and CI labels May 16, 2025
@tylerjereddy (Contributor) left a comment


Makes sense. I tried to reproduce locally as described at #22758 (comment), but ran into other issues rather than the segfault... anyway, let's put this in so the release branch has sensible CI for now, and we can consider backports/reactivating later.

@tylerjereddy tylerjereddy merged commit 08cf2f3 into scipy:main May 16, 2025
39 checks passed
@tylerjereddy (Contributor)

thanks Ralf

@rgommers rgommers deleted the ci-disable-parallel branch May 16, 2025 18:48
@mdhaber (Contributor) commented May 16, 2025

For future reference, would putting @pytest.mark.thread_unsafe on the revealing test do the trick if we don't want to disable the whole element of the job?

Also, I've tried to reproduce this locally, but it is outrageously slow to collect and even slower to run. gh-23002 tries to reproduce this in a stripped-down CI, but I've run it four times and everything has passed. Maybe I need to run all tests (not just stats) to see it? Would running both elements of the matrix change anything? Please feel free to commit to that PR.

@tylerjereddy (Contributor)

Ya, my experience described at #22758 (comment) was almost identical to yours I think. As I note there:

In the full testsuite via spin test -- --parallel-threads=4 I was seeing crazy slow timing just to collect the tests for some reason... slow enough that I gave up for the moment.

Similar to you, I also pondered running stats stuff in isolation vs. full suite, but the latter was intractably slow to collect tests for me as well.

I wonder if using slightly-outdated versions of 3.13.x or the supporting tools/libs can lead to pretty bad experiences still (possibly/probably?).

For future reference, would putting @pytest.mark.thread_unsafe on the revealing test do the trick if we don't want to disable the whole element of the job?

I would have thought so; that's why I was trying to reproduce it (probably you had a similar motivation). I suspect we will want to backport a fix at some point and perhaps reactivate that job eventually.

I suspect the team is just noticing that things are a bit behind schedule and trying to reduce the barriers as much as possible, which makes sense for now I suppose.

@rgommers (Member, Author)

For future reference, would putting @pytest.mark.thread_unsafe on the revealing test do the trick if we don't want to disable the whole element of the job?

Yes, it avoids the problem. However, please only put a thread_unsafe marker on something if you actually understand why the test is failing. If it's a known issue then we skip it, but if the code under test is actually unsafe, it should be fixed rather than having the problem hidden. At a minimum, we then need a bug report to track the problem so it can be investigated/fixed later.
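For illustration, a minimal sketch of what such a targeted opt-out could look like (the test name and the tracking-issue comment are hypothetical, not actual SciPy code; the marker is the one pytest-run-parallel interprets):

```python
import pytest


# Hypothetical example, not actual SciPy code: the marker tells
# pytest-run-parallel to run only this test in a single thread instead of
# disabling the whole free-threaded CI job.
@pytest.mark.thread_unsafe  # segfaults under --parallel-threads; tracked in a follow-up issue
def test_distribution_infrastructure_smoke():
    ...
```

With such a marker in place, the rest of the suite still runs in parallel, so the job would not have to be disabled wholesale.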

Similar to you, I also pondered running stats stuff in isolation vs. full suite, but the latter was intractably slow to collect tests for me as well.

Running only the submodule (or test file/class/function) that has a problem is always more efficient. Full test suite collection takes about 45 seconds at the moment for the default config, and I think with --parallel-threads=4 it may get 4x slower. Is that what you saw, or is it way more than that?

@mdhaber (Contributor) commented May 18, 2025

Yes, it avoids the problem

I think I see. So same as standard practice for at least the past eight years, right?

Running only the submodule (or test file/class/function) that has a problem is always more efficient.

That's what I did in gh-23002, but it doesn't seem to be reproducing the problem.

Collection takes less than a second when I run spin test -t scipy.stats.tests.test_continuous and at least 45 seconds when I run spin test -t scipy.stats.tests.test_continuous -- --parallel-threads=4. I'm using an editable install on Windows, if it makes a difference.

@rgommers (Member, Author)

Collection takes less than a second when I run spin test -t scipy.stats.tests.test_continuous and at least 45 seconds when I run spin test -t scipy.stats.tests.test_continuous -- --parallel-threads=4

That's bad indeed. The problem is a recent regression in pytest-run-parallel 0.4.0: Quansight-Labs/pytest-run-parallel#56. For now, downgrading to 0.3.1 will avoid the problem; that may then run into some failures related to not auto-detecting pytest.warns, eager_warns & co, but those can be ignored, so it may still be a helpful workaround.

So same as standard practice for at least the past eight years, right?

Yep.

I wonder if using slightly-outdated versions of 3.13.x or the supporting tools/libs can lead to pretty bad experiences still (possibly/probably?).

Perhaps, yes. The difference between CPython 3.13.2 and 3.13.3 is not large anymore; the flood of backports in the RCs and .0/.1 releases has slowed down a lot. And NumPy 2.2.6 is also fine. Some other libraries/tools with more recent support may show larger differences.
