From e40ef29cbc7592312494a108e2071a56680f8a51 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Thu, 4 Feb 2021 00:11:13 -0500
Subject: [PATCH 01/12] TST: Try to set up a test environment on Cygwin.

Edit: Fix spelling of "pull request"

TST: Follow workflow for other OSs using commands available on Cygwin.

Other GitHub Actions workflows set fetch-depth to zero and use
./.github/actions for the actual testing.  Let's see if this works.

Edit: Drop shell from the last step in the action.

Apparently "shell" isn't valid for steps.  I hope the action knows to
use bash rather than PowerShell or cmd.exe, or this is going to get
very confused.

TST: Adjust for cygwin

The action assumes `sudo` exists, which is a bit of a problem.

TST: Fetch with Cygwin git

Make sure Cygwin git can access the repository.  Versioneer depends on
git, so this is kind of important.

Edit: Include closing quote.

Edit: Install Cygwin git.

Versioneer needs this in order to function.

Edit: FIXUP: build_src option is not spelled --verbose-config

It's --verbose-cfg.

TST: Check which version of Python is being used.

I want to be sure it's a Cygwin one, not a Windows one.

TST: Build a wheel for Cygwin.

I may need to install in a virtualenv, and installing from a wheel is
much faster than installing from a directory.

TST: Use runtests.py to build the extensions.

It couldn't find them during the tests, which is not a problem I've
run into locally.  Hopefully this fixes that.

TST: Drop separate build step.

TST: Cygwin CI: Install numpy and run tests from that.

Running with runtests.py didn't work, but this should.

TST: Add cygwin bin dir to path, not root dir.

TST: Make sure Cygwin build fails when tests fail

The tests managed to pass despite NumPy not importing.  This is not a good thing.

TST: Avoid steps after tests in Cygwin CI.

I really need this to fail if NumPy doesn't import.

TST: Ensure Cygwin CI is running in Cygwin.

Kinda defeats the point if it's not.

TST: Check that pip installed the C extensions.

This really shouldn't need to be checked, but the CI runs have been
failing due to a failed import of a C extension.  I also printed the
version, which will make me feel better about the right version being
found and used.

TST: Include closing quote in test command

TST: Report loaded modules

Still trying to find why the C extension module import fails.

TST: Check permissions on NumPy C extensions

TST: Ask python which files it's trying to load.

Hopefully this gives me a general direction for where to look for
"Cannot find file or directory"

TST: Work around PowerShell syntax weirdness

Drop a level of quoting and just write the command to a file, then run that.  It should still show DLL permissions.

TST: Fix line endings in script.

bash expects \n
PowerShell creates \r\n
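For illustration only, here is a rough Python equivalent of the
`tr -d '\r'` step the workflow runs on the generated script.  It deletes
every carriage return, so dash sees plain `\n` line endings.  The helper
name `strip_carriage_returns` is hypothetical; the workflow itself uses
`tr`.

```python
def strip_carriage_returns(script: bytes) -> bytes:
    """Mimic `tr -d '\\r'`: delete every CR byte, leaving LF line endings."""
    # bytes.translate(None, delete) removes every byte listed in `delete`
    return script.translate(None, b"\r")


# A script as PowerShell would write it, with \r\n line endings
dos_script = b"#!/bin/dash\r\necho hello\r\n"
unix_script = strip_carriage_returns(dos_script)
print(unix_script)  # b'#!/bin/dash\necho hello\n'
```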

TST: Check for DLLs required by C extensions in Cygwin CI.

It's still not working; hopefully this shows why.

TST: Add import checks to the dll testing.

I have a project where `python -c 'import module'` worked but
`pytest --pyargs module` did not.  Let's see if that happens here.

Allow for global installs

The runners have global install privileges, so pip will install NumPy there.  I need to account for that in my script.

Fix sed regex.

Lots of leaning-toothpick syndrome, but it actually does what it's supposed to locally.  Hopefully the CI agrees.

Stop trying to import NumPy modules from sourcedir

This doesn't work and hasn't for a while.

TST: Simplify PATH and make sure lapack is on it.

Most recent CI run said it couldn't find lapack.
Reduce PATH to just Cygwin directories, make sure /usr/lib/lapack is included (after /usr/bin), and try again.

Shortening PATH occasionally fixes some other problems.  Let's see if that works here.

TST: Change Cygwin CI python from 3.8 to 3.7

Let's see if this solves the problems.  There was a change in DLL load path handling in 3.8 that might be causing the "cannot load numpy.linalg.lapack_lite" errors.

TST: Stop running CI on PR branch.

STY: Wrap long lines in Cygwin workflow file.

TST: Update name of "main" branch used to trigger workflow.

I forgot this changed a while back.

TST: Specify full paths for commands.

Also use dash in more places.

TST: Change the newline-escape mechanism in Cygwin workflow.

Backslash line continuations are apparently difficult.  This reduces
that step to a single command and tells YAML the string is to be
interpreted as a single line.

TST: Move the test command onto one line so GitHub can read it.

Apparently the GitHub Actions YAML parser is incomplete.  It should be
reading what was there as a single-line string with no linebreak at
the end and all linebreaks in the middle replaced by spaces.  It seems
to have parsed this as two lines.

Does GitHub have a document saying which subset of YAML they actually
recognize?

TST: Add dependency on importlib-metadata.

Pytest didn't declare it.

TST: Move importlib-metadata earlier in install list.

I hope this means it actually gets installed.  The package manager is
ignoring it right now.

TST: Add zipp to the package install list.

Also tell pip to install the test_requirements, so I don't keep
running into this problem.

TST: Specify full path to python for testing.

I apparently missed this earlier.

TST: Add CFFI and pytz to Cygwin test environment.

TST: Set global path to include /usr/lib/lapack.

I'd forgotten there was a line for this already.

TST: Shorten and unpin the Cygwin test requirements.

I don't want to build more modules than I have to.  It works fine on
my machine with the system setuptools, pytz, and CFFI as well as
system-ish Cython, so it should work fine on the CI runners.

TST: Make sure test requirements are actually installed.

Apparently
pytest->importlib_metadata->"typing_extensions; python_version <= 3.7"
isn't a declared dependency chain.  Mentioning the last two
explicitly should get them installed anyway.

TST: Fix quoting in command line.
---
 .github/workflows/cygwin.yml | 84 ++++++++++++++++++++++++++++++++++++
 1 file changed, 84 insertions(+)
 create mode 100644 .github/workflows/cygwin.yml

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
new file mode 100644
index 000000000000..75a8a0818d88
--- /dev/null
+++ b/.github/workflows/cygwin.yml
@@ -0,0 +1,84 @@
+name: Test on Cygwin
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    branches:
+      - main
+jobs:
+  cygwin_build_test:
+    runs-on: windows-latest
+    steps:
+      - uses: actions/checkout@v2
+        with:
+          submodules: recursive
+          fetch-depth: 0
+      - name: Install Cygwin
+        uses: egor-tensin/setup-cygwin@v3
+        with:
+          platform: x64
+          install-dir: 'C:\tools\cygwin'
+          packages: >
+            python37-devel python37-zipp python37-importlib-metadata
+            python37-cython python37-pip python37-wheel python37-cffi
+            python37-pytz python37-setuptools python37-pytest
+            python37-hypothesis liblapack-devel libopenblas
+            gcc-fortran git dash
+      - name: Set Windows PATH
+        uses: egor-tensin/cleanup-path@v1
+        with:
+          dirs: 'C:\tools\cygwin\bin;C:\tools\cygwin\lib\lapack'
+      - name: Verify that bash is Cygwin bash
+        run: |
+          command bash
+          bash -c "uname -svrmo"
+      - name: Update with Cygwin git
+        # fetch-depth=0 above should make this short.
+        run: |
+          dash -c "which git; /usr/bin/git fetch --all -p"
+      - name: Verify python version
+        # Make sure it's the Cygwin one, not a Windows one
+        run: |
+          dash -c "which python3.7; /usr/bin/python3.7 --version -V"
+      - name: Build NumPy wheel
+        run: |
+          dash -c "/usr/bin/python3.7 -m pip install 'setuptools<49.2.0' pytest pytz cffi pickle5 importlib_metadata typing_extensions"
+          dash -c "/usr/bin/python3.7 setup.py bdist_wheel"
+      - name: Install new NumPy
+        run: |
+          bash -c "/usr/bin/python3.7 -m pip install dist/numpy-*cp37*.whl"
+      - name: Run NumPy test suite
+        run: >-
+          dash -c "/usr/bin/python3.7 runtests.py -n -vv"
+      - name: Upload wheel if tests fail
+        uses: actions/upload-artifact@v2
+        if: failure()
+        with:
+          name: numpy-cygwin-wheel
+          path: dist/numpy-*cp37*.whl
+      - name: On failure check the extension modules
+        if: failure()
+        run: |
+          dash -c "/usr/bin/python3.7 -m pip show numpy"
+          dash -c "/usr/bin/python3.7 -m pip show -f numpy | grep .dll"
+          echo >list_dlls_dos.sh 'site_packages=$(python3.7 -m pip show numpy | \
+              grep Location | cut -d " " -f 2 -);
+          dll_list=$(for name in $(python3.7 -m pip show -f numpy | \
+              grep -F .dll); do echo ${site_packages}/${name}; done)
+          ls -l ${dll_list}
+          file ${dll_list}
+          ldd ${dll_list}
+          cygcheck ${dll_list}
+          cd dist/
+          for name in ${dll_list};
+          do
+              echo ${name}
+              python3.7 -c "import "$(echo ${name} | \
+                  sed -E \
+                  -e "s/\/+(home|usr).*?site-packages\/+//g" -e "s/\//./g" \
+                  -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll//g")
+          done
+          '
+          dash -c "/bin/tr -d '\r' <list_dlls_dos.sh >list_dlls_unix.sh"
+          dash "list_dlls_unix.sh"

From 9ad49e47bebed493431e3cd27f1c5d902ba990b1 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Thu, 15 Apr 2021 10:23:27 -0400
Subject: [PATCH 02/12] TST: Clean up the output from the extension module
 check.

Also serves to document the script a bit.
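The sed pipeline in the script turns each installed extension path into
a dotted module name that can be imported.  The Python sketch below does
roughly the same thing.  `dll_path_to_module` is a hypothetical name.
Unlike the sed version, these regexes escape the literal dots and
require digits after `cpython-3`.

```python
import re


def dll_path_to_module(path: str) -> str:
    """Map an installed extension path to its dotted import name."""
    # Drop everything up to and including .../site-packages/
    path = re.sub(r"^/+(home|usr).*?site-packages/+", "", path)
    # Drop the Cygwin extension suffix, e.g. .cpython-37m-x86_64-cygwin.dll
    path = re.sub(r"\.cpython-3\d+m?-x86(_64)?-cygwin\.dll$", "", path)
    # Path separators become package separators
    return path.replace("/", ".")


print(dll_path_to_module(
    "/usr/lib/python3.7/site-packages/numpy/linalg/"
    "lapack_lite.cpython-37m-x86_64-cygwin.dll"
))  # numpy.linalg.lapack_lite
```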
---
 .github/workflows/cygwin.yml | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 75a8a0818d88..6e21acec386c 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -62,22 +62,30 @@ jobs:
         run: |
           dash -c "/usr/bin/python3.7 -m pip show numpy"
           dash -c "/usr/bin/python3.7 -m pip show -f numpy | grep .dll"
-          echo >list_dlls_dos.sh 'site_packages=$(python3.7 -m pip show numpy | \
+          echo >list_dlls_dos.sh '#!/bin/dash
+          site_packages=$(python3.7 -m pip show numpy | \
               grep Location | cut -d " " -f 2 -);
           dll_list=$(for name in $(python3.7 -m pip show -f numpy | \
               grep -F .dll); do echo ${site_packages}/${name}; done)
+          echo "Checks for existence, permissions and file type"
           ls -l ${dll_list}
           file ${dll_list}
-          ldd ${dll_list}
-          cygcheck ${dll_list}
+          echo "Dependency checks"
+          ldd ${dll_list} | grep -F -e " => not found" && exit 1
+          cygcheck ${dll_list} >cygcheck_dll_list 2>cygcheck_missing_deps
+          grep -F -e "cygcheck: track_down: could not find " cygcheck_missing_deps && exit 1
+          echo "Import tests"
+          mkdir -p dist/
           cd dist/
           for name in ${dll_list};
           do
               echo ${name}
-              python3.7 -c "import "$(echo ${name} | \
+              ext_module=$(echo ${name} | \
                   sed -E \
-                  -e "s/\/+(home|usr).*?site-packages\/+//g" -e "s/\//./g" \
-                  -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll//g")
+                      -e "s/^\/+(home|usr).*?site-packages\/+//" \
+                      -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll$//" \
+                      -e "s/\//./g")
+              python3.7 -c "import ${ext_module}"
           done
           '
           dash -c "/bin/tr -d '\r' <list_dlls_dos.sh >list_dlls_unix.sh"

From 952b89fdfe1842b292bba2848db3f2ea294b1e94 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Sun, 25 Apr 2021 10:19:53 -0400
Subject: [PATCH 03/12] BLD: Tell NumPy about functions that cause test
 failures.

Several of the test failures still happen.  I'm not entirely sure why.

BLD: Add more functions to the Cygwin replace list.

These functions are mentioned in test failures, so I mark them for
replacement, along with the more obvious functions that might get
called by functions that continue to fail.

TST: List more functions to be replaced on Cygwin.

casin and casinf don't pass the branch cut tests.  Let's see if
replacing them lets the test pass on CI.

TST: Mark more functions to be replaced on Cygwin.

I tried to note the tests each function fails by group, but I don't
remember all of them anymore.

Let's see if casin{,f,l} gets replaced on the CI runner.

BLD: Mark more functions for replacement on Cygwin.

I probably need to run `git clean` to see improvements locally.

BLD: List more functions to be replaced on Cygwin.

This is nearly the last of them.

There are still a few failures I don't know how to deal with.
The cabsl/hypotl overflows will be gone next Cygwin update.
I may have an idea for cpowl (if cpowl(x, n) == cexpl(n*clogl(x))).
I don't understand timezones, CFFI, or LAPACK,
and that's the rest of the failures.
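The cpowl idea can be sanity-checked in double precision with Python's
`cmath`.  This is only a sketch: `cmath` has no long double, and
`cpow_via_exp_log` is a hypothetical helper.  For integer n,
exp(n*log(z)) equals z**n regardless of which branch of log is used.
`cmath.exp` raises `OverflowError` when the result overflows, which is
roughly the Python analogue of the overflow flag the np.power tests
expect.

```python
import cmath


def cpow_via_exp_log(z: complex, n: complex) -> complex:
    """Compute z**n as exp(n * log(z)) on the principal branch (needs z != 0)."""
    # cmath.log(0) raises ValueError, so z must be nonzero
    return cmath.exp(n * cmath.log(z))


z, n = 1.5 - 2.0j, 3
direct = z ** n
via_identity = cpow_via_exp_log(z, n)
# Relative difference should be at rounding-error level
print(abs(direct - via_identity) / abs(direct))

try:
    cpow_via_exp_log(1e200 + 0j, 2)  # exp(~921) is far past DBL_MAX
except OverflowError as exc:
    print("overflow reported:", exc)
```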

BLD: Change list of functions marked for replacement on Cygwin.

I hoped this would convince `cpowl` to flag its overflows, but that
doesn't appear to have happened.
---
 numpy/core/src/common/npy_config.h | 45 ++++++++++++++++++++++++++++++
 1 file changed, 45 insertions(+)

diff --git a/numpy/core/src/common/npy_config.h b/numpy/core/src/common/npy_config.h
index 61cc3c7f18d4..c6de0cd30794 100644
--- a/numpy/core/src/common/npy_config.h
+++ b/numpy/core/src/common/npy_config.h
@@ -96,6 +96,51 @@
 #undef HAVE_POWL
 #endif
 
+#ifdef __CYGWIN__
+/* Loss of precision */
+#undef HAVE_CASINHL
+#undef HAVE_CASINH
+#undef HAVE_CASINHF
+
+/* Loss of precision */
+#undef HAVE_CATANHL
+#undef HAVE_CATANH
+#undef HAVE_CATANHF
+
+/* Loss of precision and branch cuts */
+#undef HAVE_CATANL
+#undef HAVE_CATAN
+#undef HAVE_CATANF
+
+/* Branch cuts */
+#undef HAVE_CACOSHF
+#undef HAVE_CACOSH
+
+/* Branch cuts */
+#undef HAVE_CSQRTF
+#undef HAVE_CSQRT
+
+/* Branch cuts and loss of precision */
+#undef HAVE_CASINF
+#undef HAVE_CASIN
+#undef HAVE_CASINL
+
+/* Branch cuts */
+#undef HAVE_CACOSF
+#undef HAVE_CACOS
+
+/* log2(exp2(i)) off by a few eps */
+#undef HAVE_LOG2
+
+/* np.power(..., dtype=np.complex256) doesn't report overflow */
+#undef HAVE_CPOWL
+#undef HAVE_CEXPL
+
+/* Builtin abs reports overflow */
+#undef HAVE_CABSL
+#undef HAVE_HYPOTL
+#endif
+
 /* Disable broken gnu trig functions */
 #if defined(HAVE_FEATURES_H)
 #include <features.h>

From 66c15c0835d6efd433c106c7a0fc9eb3149bef7c Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Wed, 5 May 2021 07:54:20 -0400
Subject: [PATCH 04/12] TST: Return to requirements in test_requirements.txt

There will probably be compilation involved for coverage (from
pytest-cov), maybe also Cython, but the rest look like pure Python and
should be fast.
---
 .github/workflows/cygwin.yml | 1 +
 1 file changed, 1 insertion(+)

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 6e21acec386c..8fd4babd24f0 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -44,6 +44,7 @@ jobs:
       - name: Build NumPy wheel
         run: |
           dash -c "/usr/bin/python3.7 -m pip install 'setuptools<49.2.0' pytest pytz cffi pickle5 importlib_metadata typing_extensions"
+          dash -c "/usr/bin/python3.7 -m pip install -r test_requirements.txt"
           dash -c "/usr/bin/python3.7 setup.py bdist_wheel"
       - name: Install new NumPy
         run: |

From 8f473b15b8c4e15b6c2a889b32dade8c15280833 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Wed, 5 May 2021 09:46:25 -0400
Subject: [PATCH 05/12] BLD: Export random distribution functions on Cygwin.

CFFI should be able to find them now.  The Cygwin runtime DLL loader
is the Windows one, and the linker also shares most of the same
semantics.
---
 numpy/core/include/numpy/random/distributions.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/numpy/core/include/numpy/random/distributions.h b/numpy/core/include/numpy/random/distributions.h
index c58024605ff5..554198174919 100644
--- a/numpy/core/include/numpy/random/distributions.h
+++ b/numpy/core/include/numpy/random/distributions.h
@@ -28,7 +28,7 @@ extern "C" {
 #define RAND_INT_MAX INT64_MAX
 #endif
 
-#ifdef _MSC_VER
+#if defined(_MSC_VER) || defined(__CYGWIN__)
 #define DECLDIR __declspec(dllexport)
 #else
 #define DECLDIR extern

From aa9fd3c7cb7c535355996c591b210beeec71a700 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Wed, 12 May 2021 09:43:35 -0400
Subject: [PATCH 06/12] BUG: Set default hypotl to use npy_longdouble
 arithmetic.

The implementation is already there, and the tests require
npy_longdouble arithmetic, so I set up the boilerplate to make it so.

It seems to fix only np.abs(npy_clongdouble), not
abs(npy_clongdouble), for reasons I don't understand.
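npy_hypot avoids overflow with a scaling trick.  It makes |x| >= |y|
and then returns |x|*sqrt(1 + (y/x)^2), so y/x stays at or below 1.
The naive sqrt(x*x + y*y) overflows much earlier.  The Python sketch
below shows the same idea in double precision only.  Python floats have
no long double, so this illustrates the algorithm, not npy_hypotl
itself.  `scaled_hypot` is a hypothetical name.

```python
import math


def scaled_hypot(x: float, y: float) -> float:
    """Overflow-avoiding hypot: |x| * sqrt(1 + (y/x)**2) with |x| >= |y|."""
    if math.isinf(x) or math.isinf(y):
        return math.inf
    if math.isnan(x) or math.isnan(y):
        return math.nan
    x, y = abs(x), abs(y)
    if x < y:  # ensure |x| >= |y| so that y/x <= 1
        x, y = y, x
    if x == 0.0:  # both zero: avoid 0/0
        return 0.0
    yx = y / x
    return x * math.sqrt(1.0 + yx * yx)


print(math.sqrt(1e200 * 1e200 + 1e200 * 1e200))  # inf: x*x overflows
print(scaled_hypot(1e200, 1e200))  # finite, about 1.414e+200
```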
---
 .../core/src/npymath/npy_math_internal.h.src  | 27 ++++++++++++-------
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/numpy/core/src/npymath/npy_math_internal.h.src b/numpy/core/src/npymath/npy_math_internal.h.src
index 1e46a23031fe..ab1f4557bdad 100644
--- a/numpy/core/src/npymath/npy_math_internal.h.src
+++ b/numpy/core/src/npymath/npy_math_internal.h.src
@@ -188,10 +188,15 @@ NPY_INPLACE double npy_atan2(double y, double x)
 
 #endif
 
-#ifndef HAVE_HYPOT
-NPY_INPLACE double npy_hypot(double x, double y)
+/**begin repeat
+ * #type = npy_float, npy_double, npy_longdouble#
+ * #c = f, , l#
+ * #C = F, , L#
+ */
+#ifndef HAVE_HYPOT@C@
+NPY_INPLACE @type@ npy_hypot@c@(@type@ x, @type@ y)
 {
-    double yx;
+    @type@ yx;
 
     if (npy_isinf(x) || npy_isinf(y)) {
         return NPY_INFINITY;
@@ -201,10 +206,11 @@ NPY_INPLACE double npy_hypot(double x, double y)
         return NPY_NAN;
     }
 
-    x = npy_fabs(x);
-    y = npy_fabs(y);
+    x = npy_fabs@c@(x);
+    y = npy_fabs@c@(y);
+    /* Ensure |x| >= |y|, switching if needed */
     if (x < y) {
-        double temp = x;
+        @type@ temp = x;
         x = y;
         y = temp;
     }
@@ -213,10 +219,11 @@ NPY_INPLACE double npy_hypot(double x, double y)
     }
     else {
         yx = y/x;
-        return x*npy_sqrt(1.+yx*yx);
+        return x*npy_sqrt@c@(1.+yx*yx);
     }
 }
 #endif
+/**end repeat**/
 
 #ifndef HAVE_ACOSH
 NPY_INPLACE double npy_acosh(double x)
@@ -361,7 +368,7 @@ NPY_INPLACE double npy_log2(double x)
  * asin, acos, atan,
  * asinh, acosh, atanh
  *
- * hypot, atan2, pow, fmod, modf
+ * atan2, pow, fmod, modf
  * ldexp, frexp
  *
  * We assume the above are always available in their double versions.
@@ -398,8 +405,8 @@ NPY_INPLACE @type@ npy_@kind@@c@(@type@ x)
 /**end repeat1**/
 
 /**begin repeat1
- * #kind = atan2,hypot,pow,fmod,copysign#
- * #KIND = ATAN2,HYPOT,POW,FMOD,COPYSIGN#
+ * #kind = atan2,pow,fmod,copysign#
+ * #KIND = ATAN2,POW,FMOD,COPYSIGN#
  */
 #ifdef @kind@@c@
 #undef @kind@@c@

From 5b8cb3af6fe8eeff01effbea4b0818636ac6bce4 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Thu, 24 Jun 2021 17:53:57 -0400
Subject: [PATCH 07/12] Go back to old npy_hypotl and mark the failing test
 case.

---
 .../core/src/npymath/npy_math_internal.h.src  | 27 +++++++-----------
 numpy/core/tests/test_scalarmath.py           | 28 +++++++++++++++++--
 2 files changed, 36 insertions(+), 19 deletions(-)

diff --git a/numpy/core/src/npymath/npy_math_internal.h.src b/numpy/core/src/npymath/npy_math_internal.h.src
index ab1f4557bdad..1e46a23031fe 100644
--- a/numpy/core/src/npymath/npy_math_internal.h.src
+++ b/numpy/core/src/npymath/npy_math_internal.h.src
@@ -188,15 +188,10 @@ NPY_INPLACE double npy_atan2(double y, double x)
 
 #endif
 
-/**begin repeat
- * #type = npy_float, npy_double, npy_longdouble#
- * #c = f, , l#
- * #C = F, , L#
- */
-#ifndef HAVE_HYPOT@C@
-NPY_INPLACE @type@ npy_hypot@c@(@type@ x, @type@ y)
+#ifndef HAVE_HYPOT
+NPY_INPLACE double npy_hypot(double x, double y)
 {
-    @type@ yx;
+    double yx;
 
     if (npy_isinf(x) || npy_isinf(y)) {
         return NPY_INFINITY;
@@ -206,11 +201,10 @@ NPY_INPLACE @type@ npy_hypot@c@(@type@ x, @type@ y)
         return NPY_NAN;
     }
 
-    x = npy_fabs@c@(x);
-    y = npy_fabs@c@(y);
-    /* Ensure |x| >= |y|, switching if needed */
+    x = npy_fabs(x);
+    y = npy_fabs(y);
     if (x < y) {
-        @type@ temp = x;
+        double temp = x;
         x = y;
         y = temp;
     }
@@ -219,11 +213,10 @@ NPY_INPLACE @type@ npy_hypot@c@(@type@ x, @type@ y)
     }
     else {
         yx = y/x;
-        return x*npy_sqrt@c@(1.+yx*yx);
+        return x*npy_sqrt(1.+yx*yx);
     }
 }
 #endif
-/**end repeat**/
 
 #ifndef HAVE_ACOSH
 NPY_INPLACE double npy_acosh(double x)
@@ -368,7 +361,7 @@ NPY_INPLACE double npy_log2(double x)
  * asin, acos, atan,
  * asinh, acosh, atanh
  *
- * atan2, pow, fmod, modf
+ * hypot, atan2, pow, fmod, modf
  * ldexp, frexp
  *
  * We assume the above are always available in their double versions.
@@ -405,8 +398,8 @@ NPY_INPLACE @type@ npy_@kind@@c@(@type@ x)
 /**end repeat1**/
 
 /**begin repeat1
- * #kind = atan2,pow,fmod,copysign#
- * #KIND = ATAN2,POW,FMOD,COPYSIGN#
+ * #kind = atan2,hypot,pow,fmod,copysign#
+ * #KIND = ATAN2,HYPOT,POW,FMOD,COPYSIGN#
  */
 #ifdef @kind@@c@
 #undef @kind@@c@
diff --git a/numpy/core/tests/test_scalarmath.py b/numpy/core/tests/test_scalarmath.py
index 5981225c1f05..8ddebcd6f914 100644
--- a/numpy/core/tests/test_scalarmath.py
+++ b/numpy/core/tests/test_scalarmath.py
@@ -678,11 +678,35 @@ def _test_abs_func(self, absfunc, test_dtype):
         x = test_dtype(np.finfo(test_dtype).min)
         assert_equal(absfunc(x), -x.real)
 
-    @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
+    @pytest.mark.parametrize(
+        "dtype",
+        [
+            pytest.param(
+                dtype,
+                marks=pytest.mark.xfail(
+                    sys.platform == "cygwin" and dtype == np.clongdouble,
+                    reason="npy_cabsl calls npy_hypotl, which is npy_hypot",
+                ),
+            )
+            for dtype in floating_types + complex_floating_types
+        ],
+    )
     def test_builtin_abs(self, dtype):
         self._test_abs_func(abs, dtype)
 
-    @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
+    @pytest.mark.parametrize(
+        "dtype",
+        [
+            pytest.param(
+                dtype,
+                marks=pytest.mark.xfail(
+                    sys.platform == "cygwin" and dtype == np.clongdouble,
+                    reason="npy_cabsl calls npy_hypotl, which is npy_hypot",
+                ),
+            )
+            for dtype in floating_types + complex_floating_types
+        ],
+    )
     def test_numpy_abs(self, dtype):
         self._test_abs_func(np.abs, dtype)
 

From bbd571a0f2b6a81eb69fca6183070cdd4676c5b1 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Tue, 20 Jul 2021 06:24:03 -0400
Subject: [PATCH 08/12] Undo the remaining changes from "SIMD: Force inlining
 all functions that accept AVX registers"

These changes are not present in `main`.  I see no commits likely to
have specifically changed whether these SIMD functions are inlined.
Adding these back to `main` is left for another PR.  The symptoms I
saw were segfaults, basically because function calls do not preserve
alignment information.
---
 .../src/umath/loops_arithm_fp.dispatch.c.src  |  18 +--
 .../umath/loops_exponent_log.dispatch.c.src   |  70 ++++++------
 numpy/core/src/umath/simd.inc.src             | 106 +++++++++---------
 3 files changed, 97 insertions(+), 97 deletions(-)

diff --git a/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src b/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
index 51b167844097..d8c8fdc9e41e 100644
--- a/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
+++ b/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
@@ -565,36 +565,36 @@ NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
 #endif
 
 #ifdef AVX512F_NOMSVC
-NPY_FINLINE __mmask16
+static NPY_INLINE __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-NPY_FINLINE __mmask8
+static NPY_INLINE __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
@@ -613,18 +613,18 @@ avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem
  *  #INF = NPY_INFINITYF, NPY_INFINITY#
  *  #NAN = NPY_NANF, NPY_NAN#
  */
-NPY_FINLINE @vtype@
+static @vtype@
 avx512_hadd_@vsub@(const @vtype@ x)
 {
     return _mm512_add_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-NPY_FINLINE @vtype@
+static @vtype@
 avx512_hsub_@vsub@(const @vtype@ x)
 {
     return _mm512_sub_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
-NPY_FINLINE @vtype@
+static NPY_INLINE @vtype@
 avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
 {
     // x1 = r1, i1
diff --git a/numpy/core/src/umath/loops_exponent_log.dispatch.c.src b/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
index b17643d23c29..9970ad2ea994 100644
--- a/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
+++ b/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
@@ -45,19 +45,19 @@
 
 #ifdef SIMD_AVX2_FMA3
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_get_full_load_mask_ps(void)
 {
     return _mm256_set1_ps(-1.0);
 }
 
-NPY_FINLINE __m256i
+static NPY_INLINE __m256i
 fma_get_full_load_mask_pd(void)
 {
     return _mm256_castpd_si256(_mm256_set1_pd(-1.0));
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
 {
     float maskint[16] = {-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,
@@ -66,7 +66,7 @@ fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_ps(addr);
 }
 
-NPY_FINLINE __m256i
+static NPY_INLINE __m256i
 fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
 {
     npy_int maskint[16] = {-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1};
@@ -74,7 +74,7 @@ fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_si256((__m256i*) addr);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_masked_gather_ps(__m256 src,
                      npy_float* addr,
                      __m256i vindex,
@@ -83,7 +83,7 @@ fma_masked_gather_ps(__m256 src,
     return _mm256_mask_i32gather_ps(src, addr, vindex, mask, 4);
 }
 
-NPY_FINLINE __m256d
+static NPY_INLINE __m256d
 fma_masked_gather_pd(__m256d src,
                      npy_double* addr,
                      __m128i vindex,
@@ -92,49 +92,49 @@ fma_masked_gather_pd(__m256d src,
     return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_masked_load_ps(__m256 mask, npy_float* addr)
 {
     return _mm256_maskload_ps(addr, _mm256_cvtps_epi32(mask));
 }
 
-NPY_FINLINE __m256d
+static NPY_INLINE __m256d
 fma_masked_load_pd(__m256i mask, npy_double* addr)
 {
     return _mm256_maskload_pd(addr, mask);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_set_masked_lanes_ps(__m256 x, __m256 val, __m256 mask)
 {
     return _mm256_blendv_ps(x, val, mask);
 }
 
-NPY_FINLINE __m256d
+static NPY_INLINE __m256d
 fma_set_masked_lanes_pd(__m256d x, __m256d val, __m256d mask)
 {
     return _mm256_blendv_pd(x, val, mask);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_blend(__m256 x, __m256 y, __m256 ymask)
 {
     return _mm256_blendv_ps(x, y, ymask);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_invert_mask_ps(__m256 ymask)
 {
     return _mm256_andnot_ps(ymask, _mm256_set1_ps(-1.0));
 }
 
-NPY_FINLINE __m256i
+static NPY_INLINE __m256i
 fma_invert_mask_pd(__m256i ymask)
 {
     return _mm256_andnot_si256(ymask, _mm256_set1_epi32(0xFFFFFFFF));
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_get_exponent(__m256 x)
 {
     /*
@@ -165,7 +165,7 @@ fma_get_exponent(__m256 x)
     return _mm256_blendv_ps(exp, denorm_exp, denormal_mask);
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_get_mantissa(__m256 x)
 {
     /*
@@ -195,7 +195,7 @@ fma_get_mantissa(__m256 x)
                         _mm256_castps_si256(x), mantissa_bits), exp_126_bits));
 }
 
-NPY_FINLINE __m256
+static NPY_INLINE __m256
 fma_scalef_ps(__m256 poly, __m256 quadrant)
 {
     /*
@@ -238,31 +238,31 @@ fma_scalef_ps(__m256 poly, __m256 quadrant)
 
 #ifdef SIMD_AVX512F
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-NPY_FINLINE __mmask8
+static NPY_INLINE __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
 
-NPY_FINLINE __mmask16
+static NPY_INLINE __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-NPY_FINLINE __mmask8
+static NPY_INLINE __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_masked_gather_ps(__m512 src,
                         npy_float* addr,
                         __m512i vindex,
@@ -271,7 +271,7 @@ avx512_masked_gather_ps(__m512 src,
     return _mm512_mask_i32gather_ps(src, kmask, vindex, addr, 4);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_masked_gather_pd(__m512d src,
                         npy_double* addr,
                         __m256i vindex,
@@ -280,67 +280,67 @@ avx512_masked_gather_pd(__m512d src,
     return _mm512_mask_i32gather_pd(src, kmask, vindex, addr, 8);
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_set_masked_lanes_ps(__m512 x, __m512 val, __mmask16 mask)
 {
     return _mm512_mask_blend_ps(mask, x, val);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_set_masked_lanes_pd(__m512d x, __m512d val, __mmask8 mask)
 {
     return _mm512_mask_blend_pd(mask, x, val);
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_blend(__m512 x, __m512 y, __mmask16 ymask)
 {
     return _mm512_mask_mov_ps(x, ymask, y);
 }
 
-NPY_FINLINE __mmask16
+static NPY_INLINE __mmask16
 avx512_invert_mask_ps(__mmask16 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-NPY_FINLINE __mmask8
+static NPY_INLINE __mmask8
 avx512_invert_mask_pd(__mmask8 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_get_exponent(__m512 x)
 {
     return _mm512_add_ps(_mm512_getexp_ps(x), _mm512_set1_ps(1.0f));
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_get_mantissa(__m512 x)
 {
     return _mm512_getmant_ps(x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
 }
 
-NPY_FINLINE __m512
+static NPY_INLINE __m512
 avx512_scalef_ps(__m512 poly, __m512 quadrant)
 {
     return _mm512_scalef_ps(poly, quadrant);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_permute_x4var_pd(__m512d t0,
                         __m512d t1,
                         __m512d t2,
@@ -355,7 +355,7 @@ avx512_permute_x4var_pd(__m512d t0,
     return _mm512_mask_blend_pd(lut_mask, res1, res2);
 }
 
-NPY_FINLINE __m512d
+static NPY_INLINE __m512d
 avx512_permute_x8var_pd(__m512d t0, __m512d t1, __m512d t2, __m512d t3,
                         __m512d t4, __m512d t5, __m512d t6, __m512d t7,
                         __m512i index)
@@ -401,7 +401,7 @@ avx512_permute_x8var_pd(__m512d t0, __m512d t1, __m512d t2, __m512d t3,
  * 3) x* = x - y*c3
  * c1, c2 are exact floating points, c3 = C - c1 - c2 simulates higher precision
  */
-NPY_FINLINE @vtype@
+static NPY_INLINE @vtype@
 simd_range_reduction(@vtype@ x, @vtype@ y, @vtype@ c1, @vtype@ c2, @vtype@ c3)
 {
     @vtype@ reduced_x = @fmadd@(y, c1, x);
diff --git a/numpy/core/src/umath/simd.inc.src b/numpy/core/src/umath/simd.inc.src
index 654ab81cc370..b535599c6b80 100644
--- a/numpy/core/src/umath/simd.inc.src
+++ b/numpy/core/src/umath/simd.inc.src
@@ -399,7 +399,7 @@ run_unary_simd_@kind@_BOOL(char **args, npy_intp const *dimensions, npy_intp con
 * # VOP = min, max#
 */
 
-NPY_FINLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
+static NPY_INLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
 {
     npy_float r;
     __m128 tmp = _mm_movehl_ps(v, v);                   /* c     d     ... */
@@ -409,7 +409,7 @@ NPY_FINLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
     return r;
 }
 
-NPY_FINLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
+static NPY_INLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
 {
     npy_double r;
     __m128d tmp = _mm_unpackhi_pd(v, v);    /* b     b */
@@ -440,7 +440,7 @@ NPY_FINLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
  * the last vector is passed as a pointer as MSVC 2010 is unable to ignore the
  * calling convention leading to C2719 on 32 bit, see #4795
  */
-NPY_FINLINE void
+static NPY_INLINE void
 sse2_compress4_to_byte_@TYPE@(@vtype@ r1, @vtype@ r2, @vtype@ r3, @vtype@ * r4,
                               npy_bool * op)
 {
@@ -557,7 +557,7 @@ sse2_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, npy_intp n)
 */
 
 /* sets invalid fpu flag on QNaN for consistency with packed compare */
-NPY_FINLINE int
+static NPY_INLINE int
 sse2_ordered_cmp_@kind@_@TYPE@(const @type@ a, const @type@ b)
 {
     @vtype@ one = @vpre@_set1_@vsuf@(1);
@@ -733,19 +733,19 @@ sse2_@kind@_@TYPE@(@type@ * ip, @type@ * op, const npy_intp n)
 /* bunch of helper functions used in ISA_exp/log_FLOAT*/
 
 #if defined HAVE_ATTRIBUTE_TARGET_AVX2_WITH_INTRINSICS
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_get_full_load_mask_ps(void)
 {
     return _mm256_set1_ps(-1.0);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_get_full_load_mask_pd(void)
 {
     return _mm256_castpd_si256(_mm256_set1_pd(-1.0));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
 {
     float maskint[16] = {-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,
@@ -754,7 +754,7 @@ fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_ps(addr);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
 {
     npy_int maskint[16] = {-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1};
@@ -762,7 +762,7 @@ fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_si256((__m256i*) addr);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_masked_gather_ps(__m256 src,
                      npy_float* addr,
                      __m256i vindex,
@@ -771,7 +771,7 @@ fma_masked_gather_ps(__m256 src,
     return _mm256_mask_i32gather_ps(src, addr, vindex, mask, 4);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_masked_gather_pd(__m256d src,
                      npy_double* addr,
                      __m128i vindex,
@@ -780,43 +780,43 @@ fma_masked_gather_pd(__m256d src,
     return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_masked_load_ps(__m256 mask, npy_float* addr)
 {
     return _mm256_maskload_ps(addr, _mm256_cvtps_epi32(mask));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_masked_load_pd(__m256i mask, npy_double* addr)
 {
     return _mm256_maskload_pd(addr, mask);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_set_masked_lanes_ps(__m256 x, __m256 val, __m256 mask)
 {
     return _mm256_blendv_ps(x, val, mask);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_set_masked_lanes_pd(__m256d x, __m256d val, __m256d mask)
 {
     return _mm256_blendv_pd(x, val, mask);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_blend(__m256 x, __m256 y, __m256 ymask)
 {
     return _mm256_blendv_ps(x, y, ymask);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_invert_mask_ps(__m256 ymask)
 {
     return _mm256_andnot_ps(ymask, _mm256_set1_ps(-1.0));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_invert_mask_pd(__m256i ymask)
 {
     return _mm256_andnot_si256(ymask, _mm256_set1_epi32(0xFFFFFFFF));
@@ -826,37 +826,37 @@ fma_invert_mask_pd(__m256i ymask)
  *  #vsub = ps, pd#
  *  #vtype = __m256, __m256d#
  */
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_abs_@vsub@(@vtype@ x)
 {
     return _mm256_andnot_@vsub@(_mm256_set1_@vsub@(-0.0), x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_reciprocal_@vsub@(@vtype@ x)
 {
     return _mm256_div_@vsub@(_mm256_set1_@vsub@(1.0f), x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_rint_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_NEAREST_INT);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_floor_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_NEG_INF);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_ceil_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_POS_INF);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_trunc_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_ZERO);
@@ -865,31 +865,31 @@ fma_trunc_@vsub@(@vtype@ x)
 #endif
 
 #if defined HAVE_ATTRIBUTE_TARGET_AVX512F_WITH_INTRINSICS
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_masked_gather_ps(__m512 src,
                         npy_float* addr,
                         __m512i vindex,
@@ -898,7 +898,7 @@ avx512_masked_gather_ps(__m512 src,
     return _mm512_mask_i32gather_ps(src, kmask, vindex, addr, 4);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_masked_gather_pd(__m512d src,
                         npy_double* addr,
                         __m256i vindex,
@@ -907,43 +907,43 @@ avx512_masked_gather_pd(__m512d src,
     return _mm512_mask_i32gather_pd(src, kmask, vindex, addr, 8);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_set_masked_lanes_ps(__m512 x, __m512 val, __mmask16 mask)
 {
     return _mm512_mask_blend_ps(mask, x, val);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_set_masked_lanes_pd(__m512d x, __m512d val, __mmask8 mask)
 {
     return _mm512_mask_blend_pd(mask, x, val);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_blend(__m512 x, __m512 y, __mmask16 ymask)
 {
     return _mm512_mask_mov_ps(x, ymask, y);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_invert_mask_ps(__mmask16 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_invert_mask_pd(__mmask8 ymask)
 {
     return _mm512_knot(ymask);
@@ -963,56 +963,56 @@ avx512_invert_mask_pd(__mmask8 ymask)
  *  #INF = NPY_INFINITYF, NPY_INFINITY#
  *  #NAN = NPY_NANF, NPY_NAN#
  */
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_abs_@vsub@(@vtype@ x)
 {
     return (@vtype@) _mm512_and_@epi_vsub@((__m512i) x,
 				    _mm512_set1_@epi_vsub@ (@and_const@));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_reciprocal_@vsub@(@vtype@ x)
 {
     return _mm512_div_@vsub@(_mm512_set1_@vsub@(1.0f), x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_rint_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x08);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_floor_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x09);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_ceil_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x0A);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_trunc_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x0B);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_hadd_@vsub@(const @vtype@ x)
 {
     return _mm512_add_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_hsub_@vsub@(const @vtype@ x)
 {
     return _mm512_sub_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_cabsolute_@vsub@(const @vtype@ x1,
                         const @vtype@ x2,
                         const __m512i re_indices,
@@ -1057,7 +1057,7 @@ avx512_cabsolute_@vsub@(const @vtype@ x1,
     return _mm512_mul_@vsub@(hypot, larger);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_conjugate_@vsub@(const @vtype@ x)
 {
     /*
@@ -1070,7 +1070,7 @@ avx512_conjugate_@vsub@(const @vtype@ x)
     return _mm512_castsi512_@vsub@(res);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
 {
     // x1 = r1, i1
@@ -1083,7 +1083,7 @@ avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
     return _mm512_mask_blend_@vsub@(@cmpx_img_mask@, outreal, outimg);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_csquare_@vsub@(@vtype@ x)
 {
     return avx512_cmul_@vsub@(x, x);
@@ -1106,25 +1106,25 @@ avx512_csquare_@vsub@(@vtype@ x)
 
 #if defined @CHK@
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
 @isa@_sqrt_ps(@vtype@ x)
 {
     return _mm@vsize@_sqrt_ps(x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
 @isa@_sqrt_pd(@vtype@d x)
 {
     return _mm@vsize@_sqrt_pd(x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
 @isa@_square_ps(@vtype@ x)
 {
     return _mm@vsize@_mul_ps(x,x);
 }
 
-NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
+static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
 @isa@_square_pd(@vtype@d x)
 {
     return _mm@vsize@_mul_pd(x,x);
@@ -1615,7 +1615,7 @@ AVX512F_absolute_@TYPE@(@type@ * op,
  * you never know
  */
 #if !@and@
-NPY_FINLINE @vtype@ byte_to_true(@vtype@ v)
+static NPY_INLINE @vtype@ byte_to_true(@vtype@ v)
 {
     const @vtype@ zero = @vpre@_setzero_@vsuf@();
     const @vtype@ truemask = @vpre@_set1_epi8(1 == 1);

From f5a2c39c99777f93ca10fe077dec6c8331200822 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Tue, 20 Jul 2021 13:43:46 -0400
Subject: [PATCH 09/12] Revert "Undo the remaining changes from "SIMD: Force
 inlining all functions that accept AVX registers""

This reverts commit bbd571a0f2b6a81eb69fca6183070cdd4676c5b1.

That commit originated on this branch but has since landed in main, so
undoing it here is a bad idea.
---
 .../src/umath/loops_arithm_fp.dispatch.c.src  |  18 +--
 .../umath/loops_exponent_log.dispatch.c.src   |  70 ++++++------
 numpy/core/src/umath/simd.inc.src             | 106 +++++++++---------
 3 files changed, 97 insertions(+), 97 deletions(-)

diff --git a/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src b/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
index d8c8fdc9e41e..51b167844097 100644
--- a/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
+++ b/numpy/core/src/umath/loops_arithm_fp.dispatch.c.src
@@ -565,36 +565,36 @@ NPY_NO_EXPORT void NPY_CPU_DISPATCH_CURFX(@TYPE@_@kind@)
 #endif
 
 #ifdef AVX512F_NOMSVC
-static NPY_INLINE __mmask16
+NPY_FINLINE __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-static NPY_INLINE __mmask8
+NPY_FINLINE __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
@@ -613,18 +613,18 @@ avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem
  *  #INF = NPY_INFINITYF, NPY_INFINITY#
  *  #NAN = NPY_NANF, NPY_NAN#
  */
-static @vtype@
+NPY_FINLINE @vtype@
 avx512_hadd_@vsub@(const @vtype@ x)
 {
     return _mm512_add_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-static @vtype@
+NPY_FINLINE @vtype@
 avx512_hsub_@vsub@(const @vtype@ x)
 {
     return _mm512_sub_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
-static NPY_INLINE @vtype@
+NPY_FINLINE @vtype@
 avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
 {
     // x1 = r1, i1
diff --git a/numpy/core/src/umath/loops_exponent_log.dispatch.c.src b/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
index 9970ad2ea994..b17643d23c29 100644
--- a/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
+++ b/numpy/core/src/umath/loops_exponent_log.dispatch.c.src
@@ -45,19 +45,19 @@
 
 #ifdef SIMD_AVX2_FMA3
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_get_full_load_mask_ps(void)
 {
     return _mm256_set1_ps(-1.0);
 }
 
-static NPY_INLINE __m256i
+NPY_FINLINE __m256i
 fma_get_full_load_mask_pd(void)
 {
     return _mm256_castpd_si256(_mm256_set1_pd(-1.0));
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
 {
     float maskint[16] = {-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,
@@ -66,7 +66,7 @@ fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_ps(addr);
 }
 
-static NPY_INLINE __m256i
+NPY_FINLINE __m256i
 fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
 {
     npy_int maskint[16] = {-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1};
@@ -74,7 +74,7 @@ fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_si256((__m256i*) addr);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_masked_gather_ps(__m256 src,
                      npy_float* addr,
                      __m256i vindex,
@@ -83,7 +83,7 @@ fma_masked_gather_ps(__m256 src,
     return _mm256_mask_i32gather_ps(src, addr, vindex, mask, 4);
 }
 
-static NPY_INLINE __m256d
+NPY_FINLINE __m256d
 fma_masked_gather_pd(__m256d src,
                      npy_double* addr,
                      __m128i vindex,
@@ -92,49 +92,49 @@ fma_masked_gather_pd(__m256d src,
     return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_masked_load_ps(__m256 mask, npy_float* addr)
 {
     return _mm256_maskload_ps(addr, _mm256_cvtps_epi32(mask));
 }
 
-static NPY_INLINE __m256d
+NPY_FINLINE __m256d
 fma_masked_load_pd(__m256i mask, npy_double* addr)
 {
     return _mm256_maskload_pd(addr, mask);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_set_masked_lanes_ps(__m256 x, __m256 val, __m256 mask)
 {
     return _mm256_blendv_ps(x, val, mask);
 }
 
-static NPY_INLINE __m256d
+NPY_FINLINE __m256d
 fma_set_masked_lanes_pd(__m256d x, __m256d val, __m256d mask)
 {
     return _mm256_blendv_pd(x, val, mask);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_blend(__m256 x, __m256 y, __m256 ymask)
 {
     return _mm256_blendv_ps(x, y, ymask);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_invert_mask_ps(__m256 ymask)
 {
     return _mm256_andnot_ps(ymask, _mm256_set1_ps(-1.0));
 }
 
-static NPY_INLINE __m256i
+NPY_FINLINE __m256i
 fma_invert_mask_pd(__m256i ymask)
 {
     return _mm256_andnot_si256(ymask, _mm256_set1_epi32(0xFFFFFFFF));
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_get_exponent(__m256 x)
 {
     /*
@@ -165,7 +165,7 @@ fma_get_exponent(__m256 x)
     return _mm256_blendv_ps(exp, denorm_exp, denormal_mask);
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_get_mantissa(__m256 x)
 {
     /*
@@ -195,7 +195,7 @@ fma_get_mantissa(__m256 x)
                         _mm256_castps_si256(x), mantissa_bits), exp_126_bits));
 }
 
-static NPY_INLINE __m256
+NPY_FINLINE __m256
 fma_scalef_ps(__m256 poly, __m256 quadrant)
 {
     /*
@@ -238,31 +238,31 @@ fma_scalef_ps(__m256 poly, __m256 quadrant)
 
 #ifdef SIMD_AVX512F
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-static NPY_INLINE __mmask8
+NPY_FINLINE __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
 
-static NPY_INLINE __mmask16
+NPY_FINLINE __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-static NPY_INLINE __mmask8
+NPY_FINLINE __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_masked_gather_ps(__m512 src,
                         npy_float* addr,
                         __m512i vindex,
@@ -271,7 +271,7 @@ avx512_masked_gather_ps(__m512 src,
     return _mm512_mask_i32gather_ps(src, kmask, vindex, addr, 4);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_masked_gather_pd(__m512d src,
                         npy_double* addr,
                         __m256i vindex,
@@ -280,67 +280,67 @@ avx512_masked_gather_pd(__m512d src,
     return _mm512_mask_i32gather_pd(src, kmask, vindex, addr, 8);
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_set_masked_lanes_ps(__m512 x, __m512 val, __mmask16 mask)
 {
     return _mm512_mask_blend_ps(mask, x, val);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_set_masked_lanes_pd(__m512d x, __m512d val, __mmask8 mask)
 {
     return _mm512_mask_blend_pd(mask, x, val);
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_blend(__m512 x, __m512 y, __mmask16 ymask)
 {
     return _mm512_mask_mov_ps(x, ymask, y);
 }
 
-static NPY_INLINE __mmask16
+NPY_FINLINE __mmask16
 avx512_invert_mask_ps(__mmask16 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-static NPY_INLINE __mmask8
+NPY_FINLINE __mmask8
 avx512_invert_mask_pd(__mmask8 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_get_exponent(__m512 x)
 {
     return _mm512_add_ps(_mm512_getexp_ps(x), _mm512_set1_ps(1.0f));
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_get_mantissa(__m512 x)
 {
     return _mm512_getmant_ps(x, _MM_MANT_NORM_p5_1, _MM_MANT_SIGN_src);
 }
 
-static NPY_INLINE __m512
+NPY_FINLINE __m512
 avx512_scalef_ps(__m512 poly, __m512 quadrant)
 {
     return _mm512_scalef_ps(poly, quadrant);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_permute_x4var_pd(__m512d t0,
                         __m512d t1,
                         __m512d t2,
@@ -355,7 +355,7 @@ avx512_permute_x4var_pd(__m512d t0,
     return _mm512_mask_blend_pd(lut_mask, res1, res2);
 }
 
-static NPY_INLINE __m512d
+NPY_FINLINE __m512d
 avx512_permute_x8var_pd(__m512d t0, __m512d t1, __m512d t2, __m512d t3,
                         __m512d t4, __m512d t5, __m512d t6, __m512d t7,
                         __m512i index)
@@ -401,7 +401,7 @@ avx512_permute_x8var_pd(__m512d t0, __m512d t1, __m512d t2, __m512d t3,
  * 3) x* = x - y*c3
  * c1, c2 are exact floating points, c3 = C - c1 - c2 simulates higher precision
  */
-static NPY_INLINE @vtype@
+NPY_FINLINE @vtype@
 simd_range_reduction(@vtype@ x, @vtype@ y, @vtype@ c1, @vtype@ c2, @vtype@ c3)
 {
     @vtype@ reduced_x = @fmadd@(y, c1, x);
diff --git a/numpy/core/src/umath/simd.inc.src b/numpy/core/src/umath/simd.inc.src
index b535599c6b80..654ab81cc370 100644
--- a/numpy/core/src/umath/simd.inc.src
+++ b/numpy/core/src/umath/simd.inc.src
@@ -399,7 +399,7 @@ run_unary_simd_@kind@_BOOL(char **args, npy_intp const *dimensions, npy_intp con
 * # VOP = min, max#
 */
 
-static NPY_INLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
+NPY_FINLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
 {
     npy_float r;
     __m128 tmp = _mm_movehl_ps(v, v);                   /* c     d     ... */
@@ -409,7 +409,7 @@ static NPY_INLINE npy_float sse2_horizontal_@VOP@___m128(__m128 v)
     return r;
 }
 
-static NPY_INLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
+NPY_FINLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
 {
     npy_double r;
     __m128d tmp = _mm_unpackhi_pd(v, v);    /* b     b */
@@ -440,7 +440,7 @@ static NPY_INLINE npy_double sse2_horizontal_@VOP@___m128d(__m128d v)
  * the last vector is passed as a pointer as MSVC 2010 is unable to ignore the
  * calling convention leading to C2719 on 32 bit, see #4795
  */
-static NPY_INLINE void
+NPY_FINLINE void
 sse2_compress4_to_byte_@TYPE@(@vtype@ r1, @vtype@ r2, @vtype@ r3, @vtype@ * r4,
                               npy_bool * op)
 {
@@ -557,7 +557,7 @@ sse2_@kind@_@TYPE@(npy_bool * op, @type@ * ip1, npy_intp n)
 */
 
 /* sets invalid fpu flag on QNaN for consistency with packed compare */
-static NPY_INLINE int
+NPY_FINLINE int
 sse2_ordered_cmp_@kind@_@TYPE@(const @type@ a, const @type@ b)
 {
     @vtype@ one = @vpre@_set1_@vsuf@(1);
@@ -733,19 +733,19 @@ sse2_@kind@_@TYPE@(@type@ * ip, @type@ * op, const npy_intp n)
 /* bunch of helper functions used in ISA_exp/log_FLOAT*/
 
 #if defined HAVE_ATTRIBUTE_TARGET_AVX2_WITH_INTRINSICS
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_get_full_load_mask_ps(void)
 {
     return _mm256_set1_ps(-1.0);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_get_full_load_mask_pd(void)
 {
     return _mm256_castpd_si256(_mm256_set1_pd(-1.0));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
 {
     float maskint[16] = {-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,-1.0,
@@ -754,7 +754,7 @@ fma_get_partial_load_mask_ps(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_ps(addr);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
 {
     npy_int maskint[16] = {-1,-1,-1,-1,-1,-1,-1,-1,1,1,1,1,1,1,1,1};
@@ -762,7 +762,7 @@ fma_get_partial_load_mask_pd(const npy_int num_elem, const npy_int num_lanes)
     return _mm256_loadu_si256((__m256i*) addr);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_masked_gather_ps(__m256 src,
                      npy_float* addr,
                      __m256i vindex,
@@ -771,7 +771,7 @@ fma_masked_gather_ps(__m256 src,
     return _mm256_mask_i32gather_ps(src, addr, vindex, mask, 4);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_masked_gather_pd(__m256d src,
                      npy_double* addr,
                      __m128i vindex,
@@ -780,43 +780,43 @@ fma_masked_gather_pd(__m256d src,
     return _mm256_mask_i32gather_pd(src, addr, vindex, mask, 8);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_masked_load_ps(__m256 mask, npy_float* addr)
 {
     return _mm256_maskload_ps(addr, _mm256_cvtps_epi32(mask));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_masked_load_pd(__m256i mask, npy_double* addr)
 {
     return _mm256_maskload_pd(addr, mask);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_set_masked_lanes_ps(__m256 x, __m256 val, __m256 mask)
 {
     return _mm256_blendv_ps(x, val, mask);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256d
 fma_set_masked_lanes_pd(__m256d x, __m256d val, __m256d mask)
 {
     return _mm256_blendv_pd(x, val, mask);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_blend(__m256 x, __m256 y, __m256 ymask)
 {
     return _mm256_blendv_ps(x, y, ymask);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256
 fma_invert_mask_ps(__m256 ymask)
 {
     return _mm256_andnot_ps(ymask, _mm256_set1_ps(-1.0));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA __m256i
 fma_invert_mask_pd(__m256i ymask)
 {
     return _mm256_andnot_si256(ymask, _mm256_set1_epi32(0xFFFFFFFF));
@@ -826,37 +826,37 @@ fma_invert_mask_pd(__m256i ymask)
  *  #vsub = ps, pd#
  *  #vtype = __m256, __m256d#
  */
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_abs_@vsub@(@vtype@ x)
 {
     return _mm256_andnot_@vsub@(_mm256_set1_@vsub@(-0.0), x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_reciprocal_@vsub@(@vtype@ x)
 {
     return _mm256_div_@vsub@(_mm256_set1_@vsub@(1.0f), x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_rint_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_NEAREST_INT);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_floor_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_NEG_INF);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_ceil_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_POS_INF);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_FMA @vtype@
 fma_trunc_@vsub@(@vtype@ x)
 {
     return _mm256_round_@vsub@(x, _MM_FROUND_TO_ZERO);
@@ -865,31 +865,31 @@ fma_trunc_@vsub@(@vtype@ x)
 #endif
 
 #if defined HAVE_ATTRIBUTE_TARGET_AVX512F_WITH_INTRINSICS
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_full_load_mask_ps(void)
 {
     return 0xFFFF;
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_full_load_mask_pd(void)
 {
     return 0xFF;
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_get_partial_load_mask_ps(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x0001 << num_elem) - 0x0001;
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_get_partial_load_mask_pd(const npy_int num_elem, const npy_int total_elem)
 {
     return (0x01 << num_elem) - 0x01;
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_masked_gather_ps(__m512 src,
                         npy_float* addr,
                         __m512i vindex,
@@ -898,7 +898,7 @@ avx512_masked_gather_ps(__m512 src,
     return _mm512_mask_i32gather_ps(src, kmask, vindex, addr, 4);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_masked_gather_pd(__m512d src,
                         npy_double* addr,
                         __m256i vindex,
@@ -907,43 +907,43 @@ avx512_masked_gather_pd(__m512d src,
     return _mm512_mask_i32gather_pd(src, kmask, vindex, addr, 8);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_masked_load_ps(__mmask16 mask, npy_float* addr)
 {
     return _mm512_maskz_loadu_ps(mask, (__m512 *)addr);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_masked_load_pd(__mmask8 mask, npy_double* addr)
 {
     return _mm512_maskz_loadu_pd(mask, (__m512d *)addr);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_set_masked_lanes_ps(__m512 x, __m512 val, __mmask16 mask)
 {
     return _mm512_mask_blend_ps(mask, x, val);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512d
 avx512_set_masked_lanes_pd(__m512d x, __m512d val, __mmask8 mask)
 {
     return _mm512_mask_blend_pd(mask, x, val);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __m512
 avx512_blend(__m512 x, __m512 y, __mmask16 ymask)
 {
     return _mm512_mask_mov_ps(x, ymask, y);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask16
 avx512_invert_mask_ps(__mmask16 ymask)
 {
     return _mm512_knot(ymask);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F __mmask8
 avx512_invert_mask_pd(__mmask8 ymask)
 {
     return _mm512_knot(ymask);
@@ -963,56 +963,56 @@ avx512_invert_mask_pd(__mmask8 ymask)
  *  #INF = NPY_INFINITYF, NPY_INFINITY#
  *  #NAN = NPY_NANF, NPY_NAN#
  */
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_abs_@vsub@(@vtype@ x)
 {
     return (@vtype@) _mm512_and_@epi_vsub@((__m512i) x,
 				    _mm512_set1_@epi_vsub@ (@and_const@));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_reciprocal_@vsub@(@vtype@ x)
 {
     return _mm512_div_@vsub@(_mm512_set1_@vsub@(1.0f), x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_rint_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x08);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_floor_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x09);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_ceil_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x0A);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_trunc_@vsub@(@vtype@ x)
 {
     return _mm512_roundscale_@vsub@(x, 0x0B);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_hadd_@vsub@(const @vtype@ x)
 {
     return _mm512_add_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_hsub_@vsub@(const @vtype@ x)
 {
     return _mm512_sub_@vsub@(x, _mm512_permute_@vsub@(x, @perm_@));
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_cabsolute_@vsub@(const @vtype@ x1,
                         const @vtype@ x2,
                         const __m512i re_indices,
@@ -1057,7 +1057,7 @@ avx512_cabsolute_@vsub@(const @vtype@ x1,
     return _mm512_mul_@vsub@(hypot, larger);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_conjugate_@vsub@(const @vtype@ x)
 {
     /*
@@ -1070,7 +1070,7 @@ avx512_conjugate_@vsub@(const @vtype@ x)
     return _mm512_castsi512_@vsub@(res);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
 {
     // x1 = r1, i1
@@ -1083,7 +1083,7 @@ avx512_cmul_@vsub@(@vtype@ x1, @vtype@ x2)
     return _mm512_mask_blend_@vsub@(@cmpx_img_mask@, outreal, outimg);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_AVX512F @vtype@
 avx512_csquare_@vsub@(@vtype@ x)
 {
     return avx512_cmul_@vsub@(x, x);
@@ -1106,25 +1106,25 @@ avx512_csquare_@vsub@(@vtype@ x)
 
 #if defined @CHK@
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
 @isa@_sqrt_ps(@vtype@ x)
 {
     return _mm@vsize@_sqrt_ps(x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
 @isa@_sqrt_pd(@vtype@d x)
 {
     return _mm@vsize@_sqrt_pd(x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@
 @isa@_square_ps(@vtype@ x)
 {
     return _mm@vsize@_mul_ps(x,x);
 }
 
-static NPY_INLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
+NPY_FINLINE NPY_GCC_OPT_3 NPY_GCC_TARGET_@ISA@ @vtype@d
 @isa@_square_pd(@vtype@d x)
 {
     return _mm@vsize@_mul_pd(x,x);
@@ -1615,7 +1615,7 @@ AVX512F_absolute_@TYPE@(@type@ * op,
  * you never know
  */
 #if !@and@
-static NPY_INLINE @vtype@ byte_to_true(@vtype@ v)
+NPY_FINLINE @vtype@ byte_to_true(@vtype@ v)
 {
     const @vtype@ zero = @vpre@_setzero_@vsuf@();
     const @vtype@ truemask = @vpre@_set1_epi8(1 == 1);

From 2d905c3114713da11cbead5588b1d4f9537c1750 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Tue, 20 Jul 2021 18:55:47 -0400
Subject: [PATCH 10/12] TST: Move the DLL checker to a separate script

Instead of writing the script out on every run, add it to the
repository and use that.  I'm not sure how Windows git handles line
endings; I suspect it converts them to \r\n, which is not what I want.
I'm on Cygwin, not MSYS, so everything expects \n.
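For illustration only, here is a rough Python equivalent of the
`tr -d '\r'` step that the workflow still runs on the checked-in script.
The function name is made up for this sketch.  Note that `tr -d '\r'`
deletes every carriage return, not only the ones that come before a
newline.

```python
def strip_carriage_returns(data: bytes) -> bytes:
    """Delete every carriage-return byte, like `tr -d '\\r'`."""
    # bytes.replace removes all occurrences, so a stray "\r" in the
    # middle of a line is dropped too, just as tr would drop it.
    return data.replace(b"\r", b"")


crlf_script = b"#!/bin/dash\r\necho hello\r\n"
lf_script = strip_carriage_returns(crlf_script)
print(lf_script)  # b'#!/bin/dash\necho hello\n'
```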
---
 .github/workflows/cygwin.yml                  | 28 +-------------
 .../list_installed_dll_dependencies_cygwin.sh | 38 +++++++++++++++++++
 2 files changed, 39 insertions(+), 27 deletions(-)
 create mode 100644 tools/list_installed_dll_dependencies_cygwin.sh

diff --git a/.github/workflows/cygwin.yml b/.github/workflows/cygwin.yml
index 8fd4babd24f0..b5d61c592fda 100644
--- a/.github/workflows/cygwin.yml
+++ b/.github/workflows/cygwin.yml
@@ -63,31 +63,5 @@ jobs:
         run: |
           dash -c "/usr/bin/python3.7 -m pip show numpy"
           dash -c "/usr/bin/python3.7 -m pip show -f numpy | grep .dll"
-          echo >list_dlls_dos.sh '#!/bin/dash
-          site_packages=$(python3.7 -m pip show numpy | \
-              grep Location | cut -d " " -f 2 -);
-          dll_list=$(for name in $(python3.7 -m pip show -f numpy | \
-              grep -F .dll); do echo ${site_packages}/${name}; done)
-          echo "Checks for existence, permissions and file type"
-          ls -l ${dll_list}
-          file ${dll_list}
-          echo "Dependency checks"
-          ldd ${dll_list} | grep -F -e " => not found" && exit 1
-          cygcheck ${dll_list} >cygcheck_dll_list 2>cygcheck_missing_deps
-          grep -F -e "cygcheck: track_down: could not find " cygcheck_missing_deps && exit 1
-          echo "Import tests"
-          mkdir -p dist/
-          cd dist/
-          for name in ${dll_list};
-          do
-              echo ${name}
-              ext_module=$(echo ${name} | \
-                  sed -E \
-                      -e "s/^\/+(home|usr).*?site-packages\/+//" \
-                      -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll$//" \
-                      -e "s/\//./g")
-              python3.7 -c "import ${ext_module}"
-          done
-          '
-          dash -c "/bin/tr -d '\r' <list_dlls_dos.sh >list_dlls_unix.sh"
+          dash -c "/bin/tr -d '\r' <tools/list_installed_dll_dependencies_cygwin.sh >list_dlls_unix.sh"
           dash "list_dlls_unix.sh"
diff --git a/tools/list_installed_dll_dependencies_cygwin.sh b/tools/list_installed_dll_dependencies_cygwin.sh
new file mode 100644
index 000000000000..5b81998dbcca
--- /dev/null
+++ b/tools/list_installed_dll_dependencies_cygwin.sh
@@ -0,0 +1,38 @@
+#!/bin/dash
+# Check permissions and dependencies on installed DLLs
+# DLLs need execute permissions to be used
+# DLLs must be able to find their dependencies
+# This checks both of those, then does a direct test
+# The best way of checking whether a C extension module is importable
+# is trying to import it.  The rest is trying to give reasons why it
+# isn't importing.
+#
+# cygcheck and the .dll extension for shared libraries are
+# Cygwin-specific, but the rest should work on most platforms with
+# /bin/sh
+
+py_ver=3.7
+site_packages=$(python${py_ver} -m pip show numpy | \
+		    grep Location | cut -d " " -f 2 -);
+dll_list=$(for name in $(python${py_ver} -m pip show -f numpy | \
+			     grep -F .dll); do echo ${site_packages}/${name}; done)
+echo "Checks for existence, permissions and file type"
+ls -l ${dll_list}
+file ${dll_list}
+echo "Dependency checks"
+ldd ${dll_list} | grep -F -e " => not found" && exit 1
+cygcheck ${dll_list} >cygcheck_dll_list 2>cygcheck_missing_deps
+grep -F -e "cygcheck: track_down: could not find " cygcheck_missing_deps && exit 1
+echo "Import tests"
+mkdir -p dist/
+cd dist/
+for name in ${dll_list};
+do
+    echo ${name}
+    ext_module=$(echo ${name} | \
+                     sed -E \
+			 -e "s/^\/+(home|usr).*?site-packages\/+//" \
+			 -e "s/.cpython-3.m?-x86(_64)?-cygwin.dll$//" \
+			 -e "s/\//./g")
+    python${py_ver} -c "import ${ext_module}"
+done

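Before importing each installed DLL, the `sed` pipeline in
tools/list_installed_dll_dependencies_cygwin.sh turns its path into a
dotted module name.  Below is a rough Python sketch of that mapping and
of the import check, assuming every extension lives under a
`site-packages` directory.  Both function names are hypothetical.
Unlike the script, the suffix pattern escapes its literal dots and
accepts multi-digit minor versions such as `310`.

```python
import importlib
import re


def dll_path_to_module(path: str) -> str:
    """Map an installed extension-module path to a dotted module name."""
    # Drop the leading "/usr/..." or "/home/..." prefix up to site-packages/.
    rel = re.sub(r"^/+(home|usr).*site-packages/+", "", path)
    # Drop the Cygwin suffix, e.g. ".cpython-37m-x86_64-cygwin.dll".
    rel = re.sub(r"\.cpython-3\d+m?-x86(_64)?-cygwin\.dll$", "", rel)
    # Remaining path separators become package separators.
    return rel.replace("/", ".")


def import_error_for(module_name: str):
    """Return the ImportError raised by importing module_name, or None."""
    try:
        importlib.import_module(module_name)
    except ImportError as exc:
        return exc
    return None


example = ("/usr/lib/python3.7/site-packages/numpy/core/"
           "_multiarray_umath.cpython-37m-x86_64-cygwin.dll")
print(dll_path_to_module(example))  # numpy.core._multiarray_umath
```

As in the script, the import itself is the real test.  The permission
and dependency checks only help explain why an import fails.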
From 065434155bb59df8cc6e9da5f5492bef31e7c1c8 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Tue, 20 Jul 2021 19:00:59 -0400
Subject: [PATCH 11/12] TST: Prettify the cabsl test xfail declaration.

Marking this in the parametrize call makes the decorator really big.
Calling pytest.xfail() inside the function is much shorter, even
though it ends the test immediately and so won't report an XPASS when
a new Cygwin release makes this obsolete.  Since this only needs to
wait for one Cygwin release, I suppose that's not much of a loss.
---
 numpy/core/tests/test_scalarmath.py | 32 ++++++-----------------------
 1 file changed, 6 insertions(+), 26 deletions(-)

diff --git a/numpy/core/tests/test_scalarmath.py b/numpy/core/tests/test_scalarmath.py
index 8ddebcd6f914..4a9b342c191e 100644
--- a/numpy/core/tests/test_scalarmath.py
+++ b/numpy/core/tests/test_scalarmath.py
@@ -678,36 +678,16 @@ def _test_abs_func(self, absfunc, test_dtype):
         x = test_dtype(np.finfo(test_dtype).min)
         assert_equal(absfunc(x), -x.real)
 
-    @pytest.mark.parametrize(
-        "dtype",
-        [
-            pytest.param(
-                dtype,
-                marks=pytest.mark.xfail(
-                    sys.platform == "cygwin" and dtype == np.clongdouble,
-                    reason="npy_cabsl calls npy_hypotl, which is npy_hypot",
-                ),
-            )
-            for dtype in floating_types + complex_floating_types
-        ],
-    )
+    @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
     def test_builtin_abs(self, dtype):
+        if dtype == np.clongdouble and sys.platform == "cygwin":
+            pytest.xfail(reason="npy_cabsl calls npy_hypotl, which is npy_hypot")
         self._test_abs_func(abs, dtype)
 
-    @pytest.mark.parametrize(
-        "dtype",
-        [
-            pytest.param(
-                dtype,
-                marks=pytest.mark.xfail(
-                    sys.platform == "cygwin" and dtype == np.clongdouble,
-                    reason="npy_cabsl calls npy_hypotl, which is npy_hypot",
-                ),
-            )
-            for dtype in floating_types + complex_floating_types
-        ],
-    )
+    @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
     def test_numpy_abs(self, dtype):
+        if dtype == np.clongdouble and sys.platform == "cygwin":
+            pytest.xfail(reason="npy_cabsl calls npy_hypotl, which is npy_hypot")
         self._test_abs_func(np.abs, dtype)
 
 class TestBitShifts:

From 52fa14eeebc6ad917eaf87e4e5185ed7a29bcfc1 Mon Sep 17 00:00:00 2001
From: DWesl <22566757+DWesl@users.noreply.github.com>
Date: Tue, 20 Jul 2021 19:16:57 -0400
Subject: [PATCH 12/12] TST: Wrap long lines and change condition order.

I want the `sys.platform == "cygwin"` check to run before I access
`np.clongdouble`, so that `and` short-circuits and the check doesn't
crash the test on platforms where `np.clongdouble` doesn't exist.
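Here is a minimal sketch of why the order matters.  It uses a
hypothetical stand-in object with no `clongdouble` attribute, and all
names below are made up for illustration.  `and` evaluates its left
operand first and skips the right one when the left one is false.

```python
from types import SimpleNamespace


def is_cygwin_clongdouble(platform, dtype, np_like):
    # Platform check first: `and` stops at the first false operand,
    # so np_like.clongdouble is only looked up on Cygwin.
    return platform == "cygwin" and dtype == np_like.clongdouble


def is_cygwin_clongdouble_swapped(platform, dtype, np_like):
    # Attribute lookup first: this raises AttributeError whenever
    # np_like has no clongdouble, whatever the platform is.
    return dtype == np_like.clongdouble and platform == "cygwin"


# Hypothetical stand-in for a module without clongdouble.
np_without_clongdouble = SimpleNamespace(float64="float64")

print(is_cygwin_clongdouble("linux", "float64", np_without_clongdouble))  # False
try:
    is_cygwin_clongdouble_swapped("linux", "float64", np_without_clongdouble)
except AttributeError as exc:
    print("swapped order fails:", exc)
```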
---
 numpy/core/tests/test_scalarmath.py | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/numpy/core/tests/test_scalarmath.py b/numpy/core/tests/test_scalarmath.py
index 4a9b342c191e..9d1d514fbb9e 100644
--- a/numpy/core/tests/test_scalarmath.py
+++ b/numpy/core/tests/test_scalarmath.py
@@ -680,14 +680,18 @@ def _test_abs_func(self, absfunc, test_dtype):
 
     @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
     def test_builtin_abs(self, dtype):
-        if dtype == np.clongdouble and sys.platform == "cygwin":
-            pytest.xfail(reason="npy_cabsl calls npy_hypotl, which is npy_hypot")
+        if sys.platform == "cygwin" and dtype == np.clongdouble:
+            pytest.xfail(
+                reason="absl is computed in double precision on cygwin"
+            )
         self._test_abs_func(abs, dtype)
 
     @pytest.mark.parametrize("dtype", floating_types + complex_floating_types)
     def test_numpy_abs(self, dtype):
-        if dtype == np.clongdouble and sys.platform == "cygwin":
-            pytest.xfail(reason="npy_cabsl calls npy_hypotl, which is npy_hypot")
+        if sys.platform == "cygwin" and dtype == np.clongdouble:
+            pytest.xfail(
+                reason="absl is computed in double precision on cygwin"
+            )
         self._test_abs_func(np.abs, dtype)
 
 class TestBitShifts: