added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) by vpisarev · Pull Request #24420 · opencv/opencv · GitHub

added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16) #24420

Merged
merged 6 commits into from
Oct 18, 2023

@vpisarev (Contributor) commented Oct 17, 2023

Currently, platform-specific (NEON) code is required to make use of these instructions. Universal intrinsics for FP16 and BF16 arithmetic may be added later. Note that even modern ARM platforms don't have a full set of BF16 operations; what is available is mostly the instructions needed to implement BF16 x BF16 to FP32 matrix multiplication.
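
To make the "BF16 x BF16 to FP32" remark concrete, here is a minimal scalar sketch of the arithmetic involved. It is not code from this PR and does not use NEON; the names `bf16_t`, `floatToBf16`, `bf16ToFloat` and `matmulBf16` are hypothetical, and the rounding of real hardware instructions may differ from the round-to-nearest-even conversion assumed here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// bfloat16 keeps the sign bit, the 8 exponent bits and the top 7 mantissa
// bits of an IEEE-754 float32, i.e. the upper 16 bits of its bit pattern.
using bf16_t = std::uint16_t;  // hypothetical storage type for this sketch

// float32 -> bf16 with round-to-nearest-even (NaN handling omitted).
inline bf16_t floatToBf16(float x) {
    std::uint32_t bits;
    std::memcpy(&bits, &x, sizeof(bits));
    bits += 0x7FFFu + ((bits >> 16) & 1u);
    return static_cast<bf16_t>(bits >> 16);
}

// bf16 -> float32 is exact: the 16 bits become the high half of a float32.
inline float bf16ToFloat(bf16_t h) {
    std::uint32_t bits = static_cast<std::uint32_t>(h) << 16;
    float x;
    std::memcpy(&x, &bits, sizeof(x));
    return x;
}

// C (M x N, fp32) = A (M x K, bf16) * B (K x N, bf16), all row-major.
// Inputs are stored in bf16, products are accumulated in fp32.
std::vector<float> matmulBf16(const std::vector<bf16_t>& A,
                              const std::vector<bf16_t>& B,
                              int M, int K, int N) {
    assert(A.size() == static_cast<std::size_t>(M) * K);
    assert(B.size() == static_cast<std::size_t>(K) * N);
    std::vector<float> C(static_cast<std::size_t>(M) * N, 0.0f);
    for (int i = 0; i < M; ++i)
        for (int j = 0; j < N; ++j)
            for (int k = 0; k < K; ++k)
                C[i * N + j] += bf16ToFloat(A[i * K + k]) * bf16ToFloat(B[k * N + j]);
    return C;
}
```

The point of the format is visible in the signatures: storage is 16 bits per element, while the accumulator stays in FP32.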

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

… method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.
@vpisarev changed the title from "added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect of various modern NEON instructions" to "added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect various modern NEON instructions" on Oct 18, 2023
@vpisarev changed the title from "added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect various modern NEON instructions" to "added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect some modern NEON instructions" on Oct 18, 2023
@asmorkalov (Contributor)

Looks like the mentioned features are supported in /proc/self/auxv. No need to rework the approach.
See profiles from ARM: https://developer.arm.com/downloads/-/exploration-tools/feature-names-for-a-profile and constants for HWCAP: https://www.kernel.org/doc/html/v6.1/arm64/elf_hwcaps.html
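
As a rough illustration of the auxv-based approach suggested here, the sketch below reads the hardware capability words with `getauxval` on Linux/AArch64 and decodes three bits. It is not the code merged in this PR; `NeonFeatures`, `decodeHwcaps` and `queryNeonFeatures` are hypothetical names, and the bit positions are written out by hand on the assumption that they match `HWCAP_ASIMDHP`, `HWCAP_ASIMDDP` and `HWCAP2_BF16` in the arm64 `asm/hwcap.h` header linked in this thread.

```cpp
#include <cstdint>
#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>  // getauxval, AT_HWCAP, AT_HWCAP2
#endif

// Bit positions assumed to match the arm64 uapi header asm/hwcap.h.
constexpr std::uint64_t kHwcapAsimdHp = 1ull << 10;  // HWCAP_ASIMDHP: FP16 SIMD arithmetic
constexpr std::uint64_t kHwcapAsimdDp = 1ull << 20;  // HWCAP_ASIMDDP: SIMD dot product
constexpr std::uint64_t kHwcap2Bf16   = 1ull << 14;  // HWCAP2_BF16:   BF16 instructions

struct NeonFeatures {
    bool fp16;
    bool dotprod;
    bool bf16;
};

// Pure decoding step, kept separate so it can be checked on any platform.
constexpr NeonFeatures decodeHwcaps(std::uint64_t hwcap, std::uint64_t hwcap2) {
    return NeonFeatures{(hwcap & kHwcapAsimdHp) != 0,
                        (hwcap & kHwcapAsimdDp) != 0,
                        (hwcap2 & kHwcap2Bf16) != 0};
}

// Queries the running CPU; reports "nothing available" where auxv is not usable.
inline NeonFeatures queryNeonFeatures() {
#if defined(__linux__) && defined(__aarch64__)
    return decodeHwcaps(getauxval(AT_HWCAP), getauxval(AT_HWCAP2));
#else
    return NeonFeatures{false, false, false};
#endif
}
```

Compared with a signal-based probe, this only reads values the kernel already provides, so no handler has to be installed and no instruction is executed speculatively.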

@asmorkalov (Contributor)
The SIGILL and SIGTERM signals aren't generated under Windows. They're included for ANSI compatibility. Therefore, you can set signal handlers for these signals by using signal, and you can also explicitly generate these signals by calling [raise](https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/raise?view=msvc-170).

Link: https://learn.microsoft.com/en-us/cpp/c-runtime-library/reference/signal?view=msvc-170
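
For context, the signal()-based mechanism named in the earlier titles of this PR can be sketched roughly as follows. This is an assumption-laden illustration, not the code that was proposed: `instructionProbeSucceeds` and `onSigill` are hypothetical names, the sketch is POSIX-only (it relies on `sigsetjmp`/`siglongjmp`), it is not thread-safe, and a real probe would execute the instruction under test rather than an arbitrary callback.

```cpp
#include <cassert>
#include <setjmp.h>  // sigsetjmp, siglongjmp (POSIX)
#include <signal.h>  // signal, raise, SIGILL

// Jump target used by the handler; a single global, so not thread-safe.
static sigjmp_buf g_probeEnv;

// Invoked when the probed code triggers SIGILL: unwind back to the probe.
extern "C" void onSigill(int) {
    siglongjmp(g_probeEnv, 1);
}

// Runs `probe` and returns false if it raised SIGILL, true otherwise.
bool instructionProbeSucceeds(void (*probe)()) {
    void (*previous)(int) = signal(SIGILL, onSigill);
    assert(previous != SIG_ERR);
    if (sigsetjmp(g_probeEnv, 1) != 0) {
        // Reached via siglongjmp from the handler: the probe faulted.
        signal(SIGILL, previous);
        return false;
    }
    probe();
    signal(SIGILL, previous);
    return true;
}
```

Because the quoted documentation says SIGILL is not generated under Windows, a probe of this kind cannot report an unsupported instruction there, which is one reason the discussion moved to auxv.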

@asmorkalov added the labels "category: core", "platform: arm" (ARM boards related issues: RPi, NVIDIA TK/TX, etc) and "pr: Discussion Required" on Oct 18, 2023
@opencv-alalek (Contributor)

/proc/self/auxv

There is a header of actual auxv flags / values: https://elixir.bootlin.com/linux/latest/source/arch/arm64/include/uapi/asm/hwcap.h#L106

Also some libraries parse /proc/cpuinfo (but auxv is preferable)

@@ -343,12 +345,17 @@ elseif(ARM OR AARCH64)
 endif()
 ocv_update(CPU_FP16_IMPLIES "NEON")
 else()
-ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD")
+ocv_update(CPU_KNOWN_OPTIMIZATIONS "NEON;FP16;NEON_DOTPROD;FP16_SIMD;BF16_SIMD")
Contributor

FP16_SIMD;BF16_SIMD

How does the scope of these instructions correlate with other platforms, and with the available universal intrinsics?

Until that is clear, it is better to add an ARM_ prefix.

Contributor Author

fixed

…ions without signal()

* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively
@vpisarev changed the title from "added more or less cross-platform (based on POSIX signal() semantics) mechanism to detect some modern NEON instructions" to "added detection & dispatching of some modern NEON instructions (NEON_FP16, NEON_BF16)" on Oct 18, 2023
@vpisarev (Contributor Author)

@asmorkalov, @opencv-alalek, thank you for the comments and for the links! All your concerns have been addressed :)

@vpisarev vpisarev merged commit ba4d6c8 into opencv:4.x Oct 18, 2023
@opencv-alalek opencv-alalek added this to the 4.9.0 milestone Oct 18, 2023
@asmorkalov asmorkalov mentioned this pull request Nov 3, 2023
@vpisarev vpisarev deleted the fp16bf16_arithm branch November 20, 2023 01:03
ocv_update(CPU_NEON_FLAGS_ON "")
ocv_update(CPU_FP16_IMPLIES "NEON")
ocv_update(CPU_NEON_DOTPROD_FLAGS_ON "-march=armv8.2-a+dotprod")
ocv_update(CPU_NEON_DOTPROD_IMPLIES "NEON")
ocv_update(CPU_NEON_FP16_FLAGS_ON "-march=armv8.2-a+fp16")
ocv_update(CPU_NEON_FP16_IMPLIES "NEON")
ocv_update(CPU_NEON_BF16_FLAGS_ON "-march=armv8.2-a+fp16+bf16")
Contributor

Do we really need to combine +bf16 and +fp16? @vpisarev
I'm discussing this on #24588 and I'd like to hear your original thoughts about this line.
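
One way to see what a given `-march=...` flag set actually enables is to look at the ACLE feature macros the compiler defines for the translation unit. The sketch below does that and pairs the result with a runtime check; it is only an illustration under assumptions, not OpenCV's dispatch code, and `kBuiltWithNeonFp16`, `kBuiltWithNeonBf16`, `kBuiltWithNeonDotprod` and `mayDispatch` are hypothetical names. It does not answer whether `+fp16` has to accompany `+bf16`.

```cpp
// Compile-time side: ACLE feature macros defined by the compiler when the
// corresponding extension is enabled for this translation unit.
#if defined(__ARM_FEATURE_FP16_VECTOR_ARITHMETIC)
constexpr bool kBuiltWithNeonFp16 = true;
#else
constexpr bool kBuiltWithNeonFp16 = false;
#endif

#if defined(__ARM_FEATURE_BF16_VECTOR_ARITHMETIC)
constexpr bool kBuiltWithNeonBf16 = true;
#else
constexpr bool kBuiltWithNeonBf16 = false;
#endif

#if defined(__ARM_FEATURE_DOTPROD)
constexpr bool kBuiltWithNeonDotprod = true;
#else
constexpr bool kBuiltWithNeonDotprod = false;
#endif

// A kernel compiled with an extension enabled should only be called when the
// running CPU also reports that extension; both conditions must hold.
constexpr bool mayDispatch(bool builtWithFeature, bool cpuHasFeature) {
    return builtWithFeature && cpuHasFeature;
}
```

Compiling this file once with `-march=armv8.2-a+bf16` and once with `-march=armv8.2-a+fp16+bf16` and comparing the three constants would show how the two flag sets differ for a particular compiler.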

IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
…FP16, NEON_BF16) (opencv#24420)

* added more or less cross-platform (based on POSIX signal() semantics) method to detect various NEON extensions, such as FP16 SIMD arithmetics, BF16 SIMD arithmetics, SIMD dotprod etc. It could be propagated to other instruction sets if necessary.

* hopefully fixed compile errors

* continue to fix CI

* another attempt to fix build on Linux aarch64

* * reverted to the original method to detect special arm neon instructions without signal()
* renamed FP16_SIMD & BF16_SIMD to NEON_FP16 and NEON_BF16, respectively

* removed extra whitespaces
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
4 participants