ENH, SIMD: Initial implementation of Highway wrapper by seiko2plus · Pull Request #28622 · numpy/numpy · GitHub

ENH, SIMD: Initial implementation of Highway wrapper #28622

Merged: 7 commits, Jun 5, 2025
SIMD: Update wrapper with improved docs and type support
  - Fix hardware/platform terminology in documentation for clarity
  - Add support for long double in template specializations
  - Add kMaxLanes constant to expose maximum vector width information
  - Follow clang formatting style for consistency with the NumPy codebase
seiko2plus committed May 19, 2025
commit 5d48ec32b7e4f17e9c5934f1e029c25f683dc840
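
The two type-support additions named in the commit message (the `long double` specialization and `kMaxLanes`) follow a common variable-template pattern. The sketch below illustrates that pattern in plain C++14 without Highway. Everything in it is a hypothetical stand-in: `DEMO_HAVE_FLOAT64` and `DEMO_VECTOR_BYTES` replace the real `HWY_HAVE_FLOAT64` and `HWY_MAX_LANES_D`, and `kSupportLaneDemo`, `kMaxLanesDemo`, and `UsesSimdPath` are illustrative names, not part of the wrapper.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical stand-ins for values the real wrapper takes from Highway.
#define DEMO_HAVE_FLOAT64 1   // pretend the target has 64-bit float lanes
#define DEMO_VECTOR_BYTES 16  // pretend the widest register is 128 bits

// Primary template: lanes are assumed supported unless specialized.
template <typename TLane>
constexpr bool kSupportLaneDemo = true;

template <>
constexpr bool kSupportLaneDemo<double> = DEMO_HAVE_FLOAT64 != 0;

// long double qualifies only when it has the same size as double,
// mirroring the sizeof comparison used in the commit.
template <>
constexpr bool kSupportLaneDemo<long double> =
        DEMO_HAVE_FLOAT64 != 0 && sizeof(long double) == sizeof(double);

// Compile-time upper bound on lane count for a fixed-width register.
template <typename TLane>
constexpr std::size_t kMaxLanesDemo = DEMO_VECTOR_BYTES / sizeof(TLane);

// SFINAE: exactly one overload is viable for any given lane type.
template <typename TLane, std::enable_if_t<kSupportLaneDemo<TLane>, int> = 0>
constexpr bool
UsesSimdPath(TLane)
{
    return true;
}

template <typename TLane, std::enable_if_t<!kSupportLaneDemo<TLane>, int> = 0>
constexpr bool
UsesSimdPath(TLane)
{
    return false;
}
```

On platforms where `long double` is wider than `double` (typically x86-64 Linux), `UsesSimdPath(1.0L)` selects the scalar overload. Where the two types have the same size, it selects the SIMD one.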
9 changes: 7 additions & 2 deletions numpy/_core/src/common/simd/README.md
@@ -75,6 +75,11 @@ The wrapper provides type constraints to help with SFINAE (Substitution Failure
   constexpr bool kSupportLane<double> = NPY_SIMDX_F64 != 0;
   ```

+- `kMaxLanes<TLane>`: Maximum number of lanes supported by the SIMD extension for the specified lane type.
+  ```cpp
+  template <typename TLane>
+  constexpr size_t kMaxLanes = HWY_MAX_LANES_D(_Tag<TLane>);
+  ```

 ```cpp
 #include "simd/simd.hpp"
@@ -175,13 +180,13 @@ The SIMD wrapper automatically disables SIMD operations when optimizations are d

 3. **Feature Detection Constants vs. Highway Constants**
    - NumPy-specific constants (`NPY_SIMDX_F16`, `NPY_SIMDX_F64`, `NPY_SIMDX_FMA`) provide additional safety beyond raw Highway constants
-   - Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check hardware capabilities but don't consider NumPy's build configuration
+   - Highway constants (e.g., `HWY_HAVE_FLOAT16`) only check platform capabilities but don't consider NumPy's build configuration
    - Our constants combine both checks:
      ```cpp
      #define NPY_SIMDX_F16 (NPY_SIMDX && HWY_HAVE_FLOAT16)
      ```
    - This ensures SIMD features won't be used when:
-     - Hardware supports it but NumPy optimization is disabled via meson option:
+     - Platform supports it but NumPy optimization is disabled via meson option:
        ```
        option('disable-optimization', type: 'boolean', value: false,
               description: 'Disable CPU optimized code (dispatch,simd,unroll...)')
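
The combined-constant idea in the README text above can be sketched without NumPy or Highway. In this minimal illustration every `DEMO_*` macro and the `UseF16Kernel` function are hypothetical stand-ins. They mimic the case where the platform reports float16 support but the build has optimizations disabled.

```cpp
// Hypothetical stand-ins: the real values come from NumPy's build
// configuration (NPY_SIMDX) and from Highway (HWY_HAVE_FLOAT16).
#define DEMO_SIMDX 0             // build-level switch: optimizations disabled
#define DEMO_HWY_HAVE_FLOAT16 1  // platform-level capability: float16 available

// The feature is usable only when both conditions hold.
#define DEMO_SIMDX_F16 (DEMO_SIMDX && DEMO_HWY_HAVE_FLOAT16)

constexpr bool
UseF16Kernel()
{
#if DEMO_SIMDX_F16
    return true;   // vectorized float16 kernel would be compiled in
#else
    return false;  // scalar fallback is used instead
#endif
}
```

Checking only the platform macro would compile the vectorized branch here. The combined macro keeps it out, which is the safety property the README describes.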
70 changes: 47 additions & 23 deletions numpy/_core/src/common/simd/simd.inc.hpp
@@ -1,10 +1,10 @@
 #ifndef NPY_SIMDX
 #error "This is not a standalone header. Include simd.hpp instead."
+#define NPY_SIMDX 1 // Prevent editors from graying out the happy branch
 #endif

 // NOTE: This file is included by simd.hpp multiple times with different namespaces
 // so avoid including any headers here
-// #define NPY_SIMDX 1 // uncomment to enable Highlighting

 /**
  * Determines whether the specified lane type is supported by the SIMD extension.
@@ -21,6 +21,13 @@ template <>
 constexpr bool kSupportLane<hwy::float16_t> = HWY_HAVE_FLOAT16 != 0;
 template <>
 constexpr bool kSupportLane<double> = HWY_HAVE_FLOAT64 != 0;
+template <>
+constexpr bool kSupportLane<long double> =
+        HWY_HAVE_FLOAT64 != 0 && sizeof(long double) == sizeof(double);
+
+/// Maximum number of lanes supported by the SIMD extension for the specified lane type.
+template <typename TLane>
+constexpr size_t kMaxLanes = HWY_MAX_LANES_D(_Tag<TLane>);

 /// Represents an N-lane vector based on the specified lane type.
 /// @tparam TLane The scalar type for each vector lane
@@ -34,69 +41,86 @@ using Mask = hn::Mask<_Tag<TLane>>;

 /// Unaligned load of a vector from memory.
 template <typename TLane>
-HWY_API Vec<TLane> LoadU(const TLane* ptr) {
+HWY_API Vec<TLane>
+LoadU(const TLane *ptr)
+{
     return hn::LoadU(_Tag<TLane>(), ptr);
 }

 /// Unaligned store of a vector to memory.
 template <typename TLane>
-HWY_API void StoreU(const Vec<TLane>& a, TLane* ptr) {
+HWY_API void
+StoreU(const Vec<TLane> &a, TLane *ptr)
+{
     hn::StoreU(a, _Tag<TLane>(), ptr);
 }

 /// Returns the number of vector lanes based on the lane type.
 template <typename TLane>
-HWY_API constexpr size_t Lanes(TLane tag = 0) {
+HWY_API constexpr size_t
+Lanes(TLane tag = 0)
+{
     return hn::Lanes(_Tag<TLane>());
 }

 /// Returns an uninitialized N-lane vector.
 template <typename TLane>
-HWY_API Vec<TLane> Undefined(TLane tag = 0) {
+HWY_API Vec<TLane>
+Undefined(TLane tag = 0)
+{
     return hn::Undefined(_Tag<TLane>());
 }

 /// Returns N-lane vector with all lanes equal to zero.
 template <typename TLane>
-HWY_API Vec<TLane> Zero(TLane tag = 0) {
+HWY_API Vec<TLane>
+Zero(TLane tag = 0)
+{
     return hn::Zero(_Tag<TLane>());
 }

 /// Returns N-lane vector with all lanes equal to the given value of type `TLane`.
 template <typename TLane>
-HWY_API Vec<TLane> Set(TLane val) {
+HWY_API Vec<TLane>
+Set(TLane val)
+{
     return hn::Set(_Tag<TLane>(), val);
 }

 /// Converts a mask to a vector based on the specified lane type.
 template <typename TLane, typename TMask>
-HWY_API Vec<TLane> VecFromMask(const TMask &m) {
+HWY_API Vec<TLane>
+VecFromMask(const TMask &m)
+{
     return hn::VecFromMask(_Tag<TLane>(), m);
 }

-/// Convert (Reinterpret) an N-lane vector to a different type without modifying the underlying data.
+/// Convert (Reinterpret) an N-lane vector to a different type without modifying the
+/// underlying data.
 template <typename TLaneTo, typename TVec>
-HWY_API Vec<TLaneTo> BitCast(const TVec &v) {
+HWY_API Vec<TLaneTo>
+BitCast(const TVec &v)
+{
     return hn::BitCast(_Tag<TLaneTo>(), v);
 }

 // Import common Highway intrinsics
-using hn::Eq;
-using hn::Le;
-using hn::Lt;
-using hn::Gt;
-using hn::Ge;
+using hn::Abs;
+using hn::Add;
 using hn::And;
-using hn::Or;
-using hn::Xor;
 using hn::AndNot;
-using hn::Sub;
-using hn::Add;
-using hn::Mul;
 using hn::Div;
-using hn::Min;
+using hn::Eq;
+using hn::Ge;
+using hn::Gt;
+using hn::Le;
+using hn::Lt;
 using hn::Max;
-using hn::Abs;
+using hn::Min;
+using hn::Mul;
+using hn::Or;
 using hn::Sqrt;
+using hn::Sub;
+using hn::Xor;

-#endif // NPY_SIMDX
+#endif // NPY_SIMDX
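
Several wrappers in the diff above (`Lanes`, `Undefined`, `Zero`) take a defaulted `TLane tag = 0` parameter, so the lane type can be given either as an explicit template argument or deduced from a value. The sketch below shows that calling convention in isolation. It is a simplified model under stated assumptions, not the wrapper itself: `kDemoVectorBytes`, `DemoVec`, `DemoLanes`, and `DemoSet` are hypothetical names, and a fixed 16-byte array stands in for a real Highway vector.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical fixed register width; real lane counts come from Highway.
constexpr std::size_t kDemoVectorBytes = 16;

// Toy vector type: a plain array sized to fill one 16-byte register.
template <typename TLane>
struct DemoVec {
    TLane lanes[kDemoVectorBytes / sizeof(TLane)];
};

// The defaulted tag lets callers write DemoLanes<float>() or DemoLanes(x);
// only the tag's type matters, never its value.
template <typename TLane>
constexpr std::size_t
DemoLanes(TLane /*tag*/ = 0)
{
    return kDemoVectorBytes / sizeof(TLane);
}

// Broadcasts one value to every lane; the lane type is deduced from val.
template <typename TLane>
constexpr DemoVec<TLane>
DemoSet(TLane val)
{
    DemoVec<TLane> v{};
    for (std::size_t i = 0; i < DemoLanes<TLane>(); ++i) {
        v.lanes[i] = val;
    }
    return v;
}
```

Both `DemoLanes<float>()` and `DemoLanes(1.0f)` resolve to the same instantiation, which lets call sites stay free of explicit tag objects.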