ENH: Add ARM64 (aarch64) CI testing #28263

abhishek-iitmadras · 2025-02-02T20:51:15Z

Add support for testing NumPy on ARM64 architecture

ngoldbaum · 2025-02-04T23:30:54Z

@r-devulap or maybe @seiko2plus do either of you know what's up with the failing CI run?

rgommers

Hi @abhishek-iitmadras, thanks for working on improving aarch64 testing! Before looking in detail at the diff here, let's discuss what is actually needed. There are too many jobs here, that isn't needed nor sustainable. We already have Linux aarch64 CI: https://github.com/numpy/numpy/blob/main/tools/ci/cirrus_arm.yml. So rather than adding extra jobs, let's remove the Cirrus CI one and migrate that to linux.yml indeed. We want a single job like on Cirrus, not repeated for multiple Python versions. We also don't need the smoke test - let's simply do one job. It's okay (but not essential) for that to run the full rather than the fast test suite.

We don't want more OpenBLAS job, but it'd be okay to switch one nightly job over to arm64.

The compiler sanitizer job isn't needed either in my opinion, that's a fiddly/expensive one and it's not super stable, nor does it add a whole lot over running it on x86-64.

For linux_simd.yml, I think it may make sense to add one job, but I'm not quite sure. Could you please check which -Dcpu-baseline value(s) are not duplicate with what is anyway run by default in the other jobs on arm64?

.github/workflows/linux.yml

abhishek-iitmadras · 2025-02-07T19:33:41Z

Hi @abhishek-iitmadras, thanks for working on improving aarch64 testing! Before looking in detail at the diff here, let's discuss what is actually needed. There are too many jobs here, that isn't needed nor sustainable. We already have Linux aarch64 CI: https://github.com/numpy/numpy/blob/main/tools/ci/cirrus_arm.yml. So rather than adding extra jobs, let's remove the Cirrus CI one and migrate that to linux.yml indeed. We want a single job like on Cirrus, not repeated for multiple Python versions. We also don't need the smoke test - let's simply do one job. It's okay (but not essential) for that to run the full rather than the fast test suite.

Migrate cirrus_arm.yml into linux.yml, ensuring the use of efficient caching. Additionally, remove the smoke test, use Python 3.11 only, and run the full test suite.

We don't want more OpenBLAS job, but it'd be okay to switch one nightly job over to arm64.

SGTM, Now there is only one configuration as {name: "OpenBLAS 64-bit nightly", use_nightly: true, bits: 64}

The compiler sanitizer job isn't needed either in my opinion, that's a fiddly/expensive one and it's not super stable, nor does it add a whole lot over running it on x86-64.

Removed.
But The new compiler_sanitizers.yml jobs, i.e clang_TSAN and clang_ASAN, have high build times, exceeding 10 min for ASAN and 20 min for TSAN (see Clang_sanitizers jobs).

IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not

For linux_simd.yml, I think it may make sense to add one job, but I'm not quite sure. Could you please check which -Dcpu-baseline value(s) are not duplicate with what is anyway run by default in the other jobs on arm64?

All four configurations listed below are unique and required:

config:
          - name: "baseline only"
            args: "-Dallow-noblas=true -Dcpu-dispatch=none"
          - name: "with NEON"
            args: "-Dallow-noblas=true -Dcpu-baseline=neon"
          - name: "with ASIMD"
            args: "-Dallow-noblas=true -Dcpu-baseline=asimd"
          - name: "native"
            args: "-Dallow-noblas=true -Dcpu-baseline=native -Dcpu-dispatch=none"

Thanks @rgommers

abhishek-iitmadras · 2025-02-07T19:36:33Z

@r-devulap or maybe @seiko2plus do either of you know what's up with the failing CI run?

Fix done in #28294
Please have a look into #28294 first and then to this patch.

rgommers

This is starting to look closer, thanks @abhishek-iitmadras.

All four [SIMD] configurations listed below are unique and required:

Thanks. This looks correct to me. I think effectively one is going to overlap with the default config, but it's fine to leave that since the actual build options used are still different.

The new compiler_sanitizers.yml jobs, i.e clang_TSAN and clang_ASAN, have high build times, exceeding 10 min for ASAN and 20 min for TSAN.
IMO, We should attempt to change the runner from macos-latest to ubuntu-22.04-arm with efficient cache support to determine if it improves build times. or not

@ngoldbaum just migrated it to macOS, so I'll leave it to him to comment on this.

.github/workflows/linux.yml

.github/workflows/linux_blas.yml

.github/workflows/linux_simd.yml

ngoldbaum · 2025-02-11T14:39:51Z

8000

just migrated it to macOS, so I'll leave it to him to comment on this.

Let’s do that in a followup so it doesn’t hold up turning on aarch64 linux CI. I don’t have a problem with it and would have chosen aarch64 linux if it was easily available for numpy a couple weeks ago.

abhishek-iitmadras · 2025-02-13T08:06:23Z

just migrated it to macOS, so I'll leave it to him to comment on this.

Let’s do that in a followup so it doesn’t hold up turning on aarch64 linux CI. I don’t have a problem with it and would have chosen aarch64 linux if it was easily available for numpy a couple weeks ago.

Thanks @ngoldbaum

Yes, I need the sanitizer jobs to be run on an ARM64 Linux runner.

rgommers

This is starting to look close to ready. A few more small comments. Also, the SIMD-native job shows a build failure:

../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: In function ‘vec_f64 lut_16_f64(const double*, vec_u64)’:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src:155:27: error: call to non-‘constexpr’ function ‘size_t hwy::N_SVE::Lanes(hwy::N_SVE::Simd<T, N, kPow2>) [with T = double; long unsigned int N = 32; int kPow2 = 0; size_t = long unsigned int]’
  155 |     if constexpr(hn::Lanes(f64) == 8){
      |                  ~~~~~~~~~^~~~~

.github/workflows/linux.yml

abhishek-iitmadras · 2025-02-13T09:13:52Z

This is starting to look close to ready. A few more small comments. Also, the SIMD-native job shows a build failure:

../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src: In function ‘vec_f64 lut_16_f64(const double*, vec_u64)’:
../numpy/_core/src/umath/loops_hyperbolic.dispatch.cpp.src:155:27: error: call to non-‘constexpr’ function ‘size_t hwy::N_SVE::Lanes(hwy::N_SVE::Simd<T, N, kPow2>) [with T = double; long unsigned int N = 32; int kPow2 = 0; size_t = long unsigned int]’
  155 |     if constexpr(hn::Lanes(f64) == 8){
      |                  ~~~~~~~~~^~~~~

Fix in #28294, wait for @Mousius to review and merge

r-devulap · 2025-02-13T18:24:32Z

@abhishek-iitmadras please rebase.

abhishek-iitmadras · 2025-02-24T11:39:26Z

Please help with review and if all ok then merge

@rgommers @ngoldbaum @charris

abhishek-iitmadras · 2025-03-10T15:25:16Z

a gentle reminder, thanks

@rgommers

rgommers

Thanks @abhishek-iitmadras, this looks really close now. I have a few minor inline comments. My one larger one is that the NEON and ASIMD configs look identical:

Baseline:

Message: 
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : none
    Enabled   :

NEON:

Message: 
CPU Optimization Options
  baseline:
    Requested : neon
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

ASIMD:

Message: 
CPU Optimization Options
  baseline:
    Requested : asimd
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

Native:

 Message: 
CPU Optimization Options
  baseline:
    Requested : native
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD ASIMDHP
  dispatch:
    Requested : none
    Enabled   :

You said these were unique - am I missing it, or do they end up being the same?

.github/workflows/linux.yml

.github/workflows/linux_simd.yml

abhishek-iitmadras · 2025-03-12T12:21:36Z

Thanks @abhishek-iitmadras, this looks really close now. I have a few minor inline comments. My one larger one is that the NEON and ASIMD configs look identical:

Baseline:

Message: 
CPU Optimization Options
  baseline:
    Requested : min
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : none
    Enabled   :

NEON:

Message: 
CPU Optimization Options
  baseline:
    Requested : neon
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

ASIMD:

Message: 
CPU Optimization Options
  baseline:
    Requested : asimd
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD
  dispatch:
    Requested : max -xop -fma4
    Enabled   : ASIMDHP ASIMDFHM SVE

Native:

 Message: 
CPU Optimization Options
  baseline:
    Requested : native
    Enabled   : NEON NEON_FP16 NEON_VFPV4 ASIMD ASIMDHP
  dispatch:
    Requested : none
    Enabled   :

You said these were unique - am I missing it, or do they end up being the same?

Theoretically, ASIMD is an advanced version of NEON. However, since both run on an SVE128 ARMV8 machine, they ended up with the same configurations.

So, Removed Neon one for now

rgommers

LGTM now, in it goes. Thanks a lot @abhishek-iitmadras!

github-actions bot added the 01 - Enhancement label Feb 2, 2025

abhishek-iitmadras marked this pull request as ready for review February 2, 2025 21:07

rgommers requested changes Feb 5, 2025

View reviewed changes

rgommers added the component: CI label Feb 5, 2025

abhishek-iitmadras force-pushed the abhishek-ci branch from c590d2a to bdd95ed Compare February 6, 2025 21:50

ngoldbaum reviewed Feb 6, 2025

View reviewed changes

.github/workflows/linux.yml Outdated Show resolved Hide resolved

abhishek-iitmadras force-pushed the abhishek-ci branch from bdd95ed to c41fd10 Compare February 6, 2025 22:05

rgommers requested changes Feb 11, 2025

View reviewed changes

abhishek-iitmadras force-pushed the abhishek-ci branch 2 times, most recently from 9ad38bb to e7202b3 Compare February 13, 2025 07:21

rgommers reviewed Feb 13, 2025

View reviewed changes

.github/workflows/linux.yml Outdated Show resolved Hide resolved

.github/workflows/linux.yml Show resolved Hide resolved

abhishek-iitmadras force-pushed the abhishek-ci branch 2 times, most recently from 9a2d320 to 7effd03 Compare February 13, 2025 09:59

abhishek-iitmadras force-pushed the abhishek-ci branch from 7effd03 to 0b8aab4 Compare February 13, 2025 18:37

abhishek-iitmadras mentioned this pull request Feb 17, 2025

MAINT: Enable building tanh on vector length agnostic architectures #28294

Merged

abhishek-iitmadras force-pushed the abhishek-ci branch from 0b8aab4 to 24514a5 Compare February 24, 2025 10:54

abhishek-iitmadras mentioned this pull request Feb 24, 2025

Arm64 CI: Run debug and full builds CI on arm64 runners too #28381

Closed

mayeut mentioned this pull request Feb 26, 2025

CI: build Linux aarch64 wheels on GitHub Actions #28392

Merged

abhishek-iitmadras force-pushed the abhishek-ci branch from 24514a5 to 75d40fe Compare March 10, 2025 15:22

rgommers reviewed Mar 12, 2025

View reviewed changes

.github/workflows/linux.yml Outdated Show resolved Hide resolved

.github/workflows/linux.yml Outdated Show resolved Hide resolved

.github/workflows/linux.yml Outdated Show resolved Hide resolved

.github/workflows/linux_simd.yml Outdated Show resolved Hide resolved

add aarch64 CI

17050cf

change as per request

0083708

abhishek-iitmadras force-pushed the abhishek-ci branch from 75d40fe to 0083708 Compare March 12, 2025 11:55

rgommers approved these changes Mar 12, 2025

View reviewed changes

rgommers merged commit eedb1f8 into numpy:main Mar 12, 2025
72 checks passed

rgommers added this to the 2.3.0 release milestone Mar 12, 2025

abhishek-iitmadras mentioned this pull request Mar 12, 2025

CI: aarch64_test fails on main branch #28487

Closed

abhishek-iitmadras deleted the abhishek-ci branch April 7, 2025 20:50

Uh oh!

ENH: Add ARM64 (aarch64) CI testing #28263

ENH: Add ARM64 (aarch64) CI testing #28263

Uh oh!

Conversation

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!