added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) #24420
… method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.
Looks like the mentioned features are supported in
Link: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
There is a header of actual auxv flags / values: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/uapi/asm/hwcap.h#L106 Also some libraries parse
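To make the auxv suggestion concrete, here is a minimal sketch of how the hwcap bits from the linked header could be read on Linux aarch64. It is not the code from this PR: the names `NeonExtensions`, `decodeHwcaps` and `queryNeonExtensions` are hypothetical, and the bit positions are copied from the arm64 `hwcap.h` on the assumption that they stay stable. On any other platform the sketch simply reports that nothing was detected.

```cpp
#include <cstdint>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP, AT_HWCAP2
#endif

// Bit positions mirrored from arch/arm64/include/uapi/asm/hwcap.h
// (assumption: copied by hand for illustration, not taken from OpenCV).
constexpr std::uint64_t kHwcapAsimdHp = 1ull << 10;  // HWCAP_ASIMDHP: FP16 SIMD arithmetic
constexpr std::uint64_t kHwcapAsimdDp = 1ull << 20;  // HWCAP_ASIMDDP: SIMD dot product
constexpr std::uint64_t kHwcap2Bf16   = 1ull << 14;  // HWCAP2_BF16: BF16 instructions

// Hypothetical result type for this sketch.
struct NeonExtensions {
    bool fp16;
    bool dotprod;
    bool bf16;
};

// Pure decoding step: turns raw AT_HWCAP / AT_HWCAP2 words into flags.
NeonExtensions decodeHwcaps(std::uint64_t hwcap, std::uint64_t hwcap2) {
    NeonExtensions ext;
    ext.fp16    = (hwcap  & kHwcapAsimdHp) != 0;
    ext.dotprod = (hwcap  & kHwcapAsimdDp) != 0;
    ext.bf16    = (hwcap2 & kHwcap2Bf16)   != 0;
    return ext;
}

// Runtime query: reads the auxiliary vector where it is available,
// otherwise reports "nothing detected".
NeonExtensions queryNeonExtensions() {
#if defined(__linux__) && defined(__aarch64__) && defined(AT_HWCAP2)
    return decodeHwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return decodeHwcaps(0, 0);
#endif
}
```

Keeping the decoding separate from the `getauxval` call means the bit logic can be checked on any host, while the platform-specific part stays behind a single preprocessor guard.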
@@ -343,12 +345,17 @@ elseif(ARM OR AARCH64)
endif()
ocv_update(CPU_FP16_IMPLIES "NEON")
else()
ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD")
ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD;FP16_SIMD;BF16_SIMD")
FP16_SIMD;BF16_SIMD
How does the scope of these instructions correlate with other platforms, and with the available universal intrinsics?
Until that is clear, it is better to add an ARM_ prefix.
fixed
…ions without signal() * renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
@asmorkalov, @opencv-alalek, thank you for the comments and for the links! All your concerns have been addressed :)
ocv_update(CPU_NEON_FLAGS_ON "")
ocv_update(CPU_FP16_IMPLIES "NEON")
ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")
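As a rough illustration of what these `-march` flags change for the compiled code, the sketch below reports which extensions a translation unit was built with, using the ACLE feature-test macros that compilers define when the corresponding `+dotprod`, `+fp16` or `+bf16` options are active. The function name `compiledNeonExtensions` is hypothetical and is not part of OpenCV; the returned labels merely reuse the option names from the diff.

```cpp
#include <string>

// Returns a semicolon-separated list of the NEON extensions enabled at
// compile time for this translation unit (empty on non-ARM targets).
std::string compiledNeonExtensions() {
    std::string out;
    // Small helper that appends a name, inserting ';' between entries.
    auto add = [&out](const char* name) {
        if (!out.empty()) out += ';';
        out += name;
    };
#if defined(__ARM_NEON)
    add("NEON");
#if defined(__ARM_FEATURE_DOTPROD)
    add("NEON_DOTPROD");  // set by e.g. -march=armv8.2-a+dotprod
#endif
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
    add("NEON_FP16");     // set by e.g. -march=armv8.2-a+fp16
#endif
#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
    add("NEON_BF16");     // set by e.g. -march=armv8.2-a+fp16+bf16
#endif
#endif
    (void)add;  // silences the unused-variable warning on non-ARM builds
    return out;
}
```

These macros describe only what the compiler was allowed to emit. Whether the CPU that runs the binary actually supports the instructions still has to be checked at run time before dispatching to such code.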
…FP16, NEON_BF16) (opencv#24420)
* added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.
* hopefully fixed compile errors
* continue to fix CI
* another attempt to fix build on Linux aarch64
* reverted to the original method to detect special arm neon instructions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
* removed extra whitespaces
Currently, platform-specific (NEON) code is required to make use of these instructions. Universal intrinsics for FP16 and BF16 arithmetic may be added later. Note that even modern ARM platforms do not provide a full set of BF16 operations: the available instructions mostly serve to implement BF16 x BF16 to FP32 matrix multiplication.
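For readers unfamiliar with the BF16 x BF16 to FP32 pattern, the following scalar sketch shows the arithmetic involved: BF16 values are stored as the upper 16 bits of an IEEE float32, products are formed in FP32, and the sum is accumulated in FP32. It is a portable reference only, not NEON code and not taken from this PR. The names `bf16_t`, `floatToBf16`, `bf16ToFloat` and `dotBf16` are hypothetical, and the float-to-BF16 conversion here truncates, whereas hardware conversions and accumulation order may differ.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Raw BF16 storage: the upper 16 bits of an IEEE-754 binary32 value.
using bf16_t = std::uint16_t;

// Converts float to BF16 by dropping the low 16 mantissa bits
// (simple truncation; an assumption made to keep the sketch short).
bf16_t floatToBf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof bits);
    return static_cast<bf16_t>(bits >> 16);
}

// Widens BF16 back to float; this direction is always exact.
float bf16ToFloat(bf16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof x);
    return x;
}

// Dot product with BF16 inputs and an FP32 accumulator, the building
// block of a BF16 x BF16 -> FP32 matrix multiplication.
float dotBf16(const bf16_t* a, const bf16_t* b, std::size_t n) {
    float acc = 0.0f;
    for (std::size_t i = 0; i < n; ++i)
        acc += bf16ToFloat(a[i]) * bf16ToFloat(b[i]);
    return acc;
}
```

A matrix multiplication built on this would call `dotBf16` once per output element, with one row of the left matrix and one column of the right matrix as inputs.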
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.