-
-
Notifications
You must be signed in to change notification settings - Fork 11k
ENH: Add ARM64 (aarch64) CI testing #28263
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@r-devulap or maybe @seiko2plus do either of you know what's up with the failing CI run? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @abhishek-iitmadras, thanks for working on improving aarch64 testing! Before looking in detail at the diff here, let's discuss what is actually needed. There are too many jobs here, that isn't needed nor sustainable. We already have Linux aarch64 CI: https://github.com/numpy/numpy/blob/main/tools/ci/cirrus_arm.yml. So rather than adding extra jobs, let's remove the Cirrus CI one and migrate that to linux.yml
indeed. We want a single job like on Cirrus, not repeated for multiple Python versions. We also don't need the smoke test - let's simply do one job. It's okay (but not essential) for that to run the full rather than the fast test suite.
We don't want more OpenBLAS job, but it'd be okay to switch one nightly job over to arm64
.
The compiler sanitizer job isn't needed either in my opinion, that's a fiddly/expensive one and it's not super stable, nor does it add a whole lot over running it on x86-64.
For linux_simd.yml
, I think it may make sense to add one job, but I'm not quite sure. Could you please check which -Dcpu-baseline
value(s) are not duplicate with what is anyway run by default in the other jobs on arm64
?
c590d2a
to
bdd95ed
Compare
bdd95ed
to
c41fd10
Compare
Migrate
SGTM, Now there is only one configuration as
Removed. IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not
All four configurations listed below are unique and required:
Thanks @rgommers |
Fix done in #28294 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to look closer, thanks @abhishek-iitmadras.
All four [SIMD] configurations listed below are unique and required:
Thanks. This looks correct to me. I think effectively one is going to overlap with the default config, but it's fine to leave that since the actual build options used are still different.
The new compiler_sanitizers.yml jobs, i.e clang_TSAN and clang_ASAN, have high build times, exceeding 10 min for ASAN and 20 min for TSAN.
IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not
@ngoldbaum just migrated it to macOS, so I'll leave it to him to comment on this.
Let’s do that in a followup so it doesn’t hold up turning on aarch64 linux CI. I don’t have a problem with it and would have chosen aarch64 linux if it was easily available for numpy a couple weeks ago. |
9ad38bb
to
e7202b3
Compare
Thanks @ngoldbaum Yes, I need the sanitizer jobs to be run on an ARM64 Linux runner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is starting to look close to ready. A few more small comments. Also, the SIMD-native job shows a build failure:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: In function ‘vec_f64 lut_16_f64(const double*, vec_u64)’:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src:155:27: error: call to non-‘constexpr’ function ‘size_t hwy::N_SVE::Lanes(hwy::N_SVE::Simd<T, N, kPow2>) [with T = double; long unsigned int N = 32; int kPow2 = 0; size_t = long unsigned int]’
155 | if constexpr(hn::Lanes(f64) == 8){
| ~~~~~~~~~^~~~~
|
9a2d320
to
7effd03
Compare
@abhishek-iitmadras please rebase. |
7effd03
to
0b8aab4
Compare
0b8aab4
to
24514a5
Compare
Please help with review and if all ok then merge |
24514a5
to
75d40fe
Compare
a gentle reminder, thanks |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @abhishek-iitmadras, this looks really close now. I have a few minor inline comments. My one larger one is that the NEON and ASIMD configs look identical:
Baseline:
Message:
CPU Optimization Options
baseline:
Requested : min
Enabled : NEON NEON_FP16 NEON_VFPV4 ASIMD
dispatch:
Requested : none
Enabled :
NEON:
Message:
CPU Optimization Options
baseline:
Requested : neon
Enabled : NEON NEON_FP16 NEON_VFPV4 ASIMD
dispatch:
Requested : max -xop -fma4
Enabled : ASIMDHP ASIMDFHM SVE
ASIMD:
Message:
CPU Optimization Options
baseline:
Requested : asimd
Enabled : NEON NEON_FP16 NEON_VFPV4 ASIMD
dispatch:
Requested : max -xop -fma4
Enabled : ASIMDHP ASIMDFHM SVE
Native:
Message:
CPU Optimization Options
baseline:
Requested : native
Enabled : NEON NEON_FP16 NEON_VFPV4 ASIMD ASIMDHP
dispatch:
Requested : none
Enabled :
You said these were unique - am I missing it, or do they end up being the same?
75d40fe
to
0083708
Compare
Theoretically, ASIMD is an advanced version of NEON. However, since both run on an SVE128 ARMV8 machine, they ended up with the same configurations. So, Removed Neon one for now |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM now, in it goes. Thanks a lot @abhishek-iitmadras!
Add support for testing NumPy on ARM64 architecture