8000 ENH: Add ARM64 (aarch64) CI testing by abhishek-iitmadras · Pull Request #28263 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Add ARM64 (aarch64) CI testing #28263

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Mar 12, 2025
Merged

Conversation

abhishek-iitmadras
Copy link
Contributor

Add support for testing NumPy on ARM64 architecture

@abhishek-iitmadras abhishek-iitmadras marked this pull request as ready for review February 2, 2025 21:07
@ngoldbaum
Copy link
Member

@r-devulap or maybe @seiko2plus do either of you know what's up with the failing CI run?

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @abhishek-iitmadras, thanks for working on improving aarch64 testing! Before looking in detail at the diff here, let's discuss what is actually needed. There are too many jobs here, that isn't needed nor sustainable. We already have Linux aarch64 CI: https://github.com/numpy/numpy/blob/main/tools/ci/cirrus_arm.yml. So rather than adding extra jobs, let's remove the Cirrus CI one and migrate that to linux.yml indeed. We want a single job like on Cirrus, not repeated for multiple Python versions. We also don't need the smoke test - let's simply do one job. It's okay (but not essential) for that to run the full rather than the fast test suite.

We don't want more OpenBLAS job, but it'd be okay to switch one nightly job over to arm64.

The compiler sanitizer job isn't needed either in my opinion, that's a fiddly/expensive one and it's not super stable, nor does it add a whole lot over running it on x86-64.

For linux_simd.yml, I think it may make sense to add one job, but I'm not quite sure. Could you please check which -Dcpu-baseline value(s) are not duplicate with what is anyway run by default in the other jobs on arm64?

@abhishek-iitmadras
Copy link
Contributor Author
abhishek-iitmadras commented Feb 7, 2025

Hi @abhishek-iitmadras, thanks for working on improving aarch64 testing! Before looking in detail at the diff here, let's discuss what is actually needed. There are too many jobs here, that isn't needed nor sustainable. We already have Linux aarch64 CI: https://github.com/numpy/numpy/blob/main/tools/ci/cirrus_arm.yml. So rather than adding extra jobs, let's remove the Cirrus CI one and migrate that to linux.yml indeed. We want a single job like on Cirrus, not repeated for multiple Python versions. We also don't need the smoke test - let's simply do one job. It's okay (but not essential) for that to run the full rather than the fast test suite.

Migrate cirrus_arm.yml into linux.yml, ensuring the use of efficient caching. Additionally, remove the smoke test, use Python 3.11 only, and run the full test suite.

We don't want more OpenBLAS job, but it'd be okay to switch one nightly job over to arm64.

SGTM, Now there is only one configuration as {name: "OpenBLAS 64-bit nightly", use_nightly: true, bits: 64}

The compiler sanitizer job isn't needed either in my opinion, that's a fiddly/expensive one and it's not super stable, nor does it add a whole lot over running it on x86-64.

Removed.
But The new compiler_sanitizers.yml jobs, i.e clang_TSAN and clang_ASAN, have high build times, exceeding 10 min for ASAN and 20 min for TSAN (see Clang_sanitizers jobs).

IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not

For linux_simd.yml, I think it may make sense to add one job, but I'm not quite sure. Could you please check which -Dcpu-baseline value(s) are not duplicate with what is anyway run by default in the other jobs on arm64?

All four configurations listed below are unique and required:

config:
          - name: "baseline only"
            args: "-Dallow-noblas=true -Dcpu-dispatch=none"
          - name: "with NEON"
            args: "-Dallow-noblas=true -Dcpu-baseline=neon"
          - name: "with ASIMD"
            args: "-Dallow-noblas=true -Dcpu-baseline=asimd"
          - name: "native"
            args: "-Dallow-noblas=true -Dcpu-baseline=native -Dcpu-dispatch=none"

Thanks @rgommers

@abhishek-iitmadras
Copy link
Contributor Author

@r-devulap or maybe @seiko2plus do either of you know what's up with the failing CI run?

Fix done in #28294
Please have a look into #28294 first and then to this patch.

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look closer, thanks @abhishek-iitmadras.

All four [SIMD] configurations listed below are unique and required:

Thanks. This looks correct to me. I think effectively one is going to overlap with the default config, but it's fine to leave that since the actual build options used are still different.

The new compiler_sanitizers.yml jobs, i.e clang_TSAN and clang_ASAN, have high build times, exceeding 10 min for ASAN and 20 min for TSAN.
IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not

@ngoldbaum just migrated it to macOS, so I'll leave it to him to comment on this.

@ngoldbaum
Copy link
Member
8000

just migrated it to macOS, so I'll leave it to him to comment on this.

Let’s do that in a followup so it doesn’t hold up turning on aarch64 linux CI. I don’t have a problem with it and would have chosen aarch64 linux if it was easily available for numpy a couple weeks ago.

@abhishek-iitmadras abhishek-iitmadras force-pushed the abhishek-ci branch 2 times, most recently from 9ad38bb to e7202b3 Compare February 13, 2025 07:21
@abhishek-iitmadras
Copy link
Contributor Author

just migrated it to macOS, so I'll leave it to him to comment on this.

Let’s do that in a followup so it doesn’t hold up turning on aarch64 linux CI. I don’t have a problem with it and would have chosen aarch64 linux if it was easily available for numpy a couple weeks ago.

Thanks @ngoldbaum

Yes, I need the sanitizer jobs to be run on an ARM64 Linux runner.

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is starting to look close to ready. A few more small comments. Also, the SIMD-native job shows a build failure:

../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: In function ‘vec_f64 lut_16_f64(const double*, vec_u64)’:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src:155:27: error: call to non-‘constexpr’ function ‘size_t hwy::N_SVE::Lanes(hwy::N_SVE::Simd<T, N, kPow2>) [with T = double; long unsigned int N = 32; int kPow2 = 0; size_t = long unsigned int]’
  155 |     if constexpr(hn::Lanes(f64) == 8){
      |                  ~~~~~~~~~^~~~~

@abhishek-iitmadras
Copy link
Contributor Author

This is starting to look close to ready. A few more small comments. Also, the SIMD-native job shows a build failure:

../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: In function ‘vec_f64 lut_16_f64(const double*, vec_u64)’:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src:155:27: error: call to non-‘constexpr’ function ‘size_t hwy::N_SVE::Lanes(hwy::N_SVE::Simd<T, N, kPow2>) [with T = double; long unsigned int N = 32; int kPow2 = 0; size_t = long unsigned int]’
  155 |     if constexpr(hn::Lanes(f64) == 8){
      |                  ~~~~~~~~~^~~~~

Fix in #28294, wait for @Mousius to review and merge

@abhishek-iitmadras abhishek-iitmadras force-pushed the abhishek-ci branch 2 times, most recently from 9a2d320 to 7effd03 Compare February 13, 2025 09:59
@r-devulap
Copy link
Member

@abhishek-iitmadras please rebase.

@abhishek-iitmadras
Copy link
Contributor Author
abhishek-iitmadras commented Feb 24, 2025

Please help with review and if all ok then merge

@rgommers @ngoldbaum @charris

@abhishek-iitmadras
Copy link
Contributor Author
abhishek-iitmadras commented Mar 10, 2025

a gentle reminder, thanks

@rgommers

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @abhishek-iitmadras, this looks really close now. I have a few minor inline comments. My one larger one is that the NEON and ASIMD configs look identical:

Baseline:

Message: 
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : none
    Enabled   : 

NEON:

Message: 
CPU Optimization Options
  baseline:
    Requested : neon
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

ASIMD:

Message: 
CPU Optimization Options
  baseline:
    Requested : asimd
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

Native:

 Message: 
CPU Optimization Options
  baseline:
    Requested : native
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD ASIMDHP
  dispatch:
    Requested : none
    Enabled   : 

You said these were unique - am I missing it, or do they end up being the same?

@abhishek-iitmadras
Copy link
Contributor Author
abhishek-iitmadras commented Mar 12, 2025

Thanks @abhishek-iitmadras, this looks really close now. I have a few minor inline comments. My one larger one is that the NEON and ASIMD configs look identical:

Baseline:

Message: 
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : none
    Enabled   : 

NEON:

Message: 
CPU Optimization Options
  baseline:
    Requested : neon
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

ASIMD:

Message: 
CPU Optimization Options
  baseline:
    Requested : asimd
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

Native:

 Message: 
CPU Optimization Options
  baseline:
    Requested : native
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD ASIMDHP
  dispatch:
    Requested : none
    Enabled   : 

You said these were unique - am I missing it, or do they end up being the same?

Theoretically, ASIMD is an advanced version of NEON. However, since both run on an SVE128 ARMV8 machine, they ended up with the same configurations.

So, Removed Neon one for now

Copy link
Member
@rgommers rgommers left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM now, in it goes. Thanks a lot @abhishek-iitmadras!

@rgommers rgommers merged commit eedb1f8 into numpy:main Mar 12, 2025
72 checks passed
@rgommers rgommers added this to the 2.3.0 release milestone Mar 12, 2025
@abhishek-iitmadras abhishek-iitmadras deleted the abhishek-ci branch April 7, 2025 20:50
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0