Speed up line merging in INTER_AREA #24412
Running:
Here are the results ("real" INTER_AREA is only used for non-integer scale division):
My results for AMD Ryzen 7 2700X:
Less obvious result for Jetson TK1 (Armv7 with NEON):
The OpenCV team is migrating from static-size vector code to scalable vector code in order to support RISC-V RVV and other platforms. Could you adapt the patch accordingly? Example: #24166
Thx, I added taskset -c 33 python3 .//modules/ts/misc/run.py ......
modules/imgproc/src/resize.cpp
const v_int32 tmp0 = v_round(vx_load(src + 0 * v_float32::nlanes));
const v_int32 tmp1 = v_round(vx_load(src + 1 * v_float32::nlanes));
const v_int32 tmp2 = v_round(vx_load(src + 2 * v_float32::nlanes));
const v_int32 tmp3 = v_round(vx_load(src + 3 * v_float32::nlanes));
v_float32::nlanes -> VTraits<v_float32>::vlanes()
Thx, done. BTW, it could be nice to change that API to make it constexpr.
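Editorial sketch: the review thread above replaces a compile-time lane count with a runtime query. The plain C++ below illustrates the loop shape that follows from that, without using OpenCV. `runtimeLanes()` and `roundToInt()` are hypothetical names, and the inner scalar loop merely stands in for one vector load, round, and store.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>

// Hypothetical stand-in for a runtime lane-count query such as
// VTraits<v_float32>::vlanes(). On scalable ISAs the vector width is only
// known at run time, so it cannot be used as a compile-time constant.
static std::size_t runtimeLanes() { return 4; }

// Rounds n floats to the nearest int, one "vector" of lanes at a time,
// then finishes the remaining elements with a scalar tail loop.
void roundToInt(const float* src, int* dst, std::size_t n) {
    const std::size_t lanes = runtimeLanes();
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes) {
        // Stands in for: load one vector, round it, store it.
        for (std::size_t k = 0; k < lanes; ++k)
            dst[i + k] = static_cast<int>(std::lrint(src[i + k]));
    }
    for (; i < n; ++i)  // scalar tail
        dst[i] = static_cast<int>(std::lrint(src[i]));
}
```

Because the step is a runtime value, buffer offsets such as `src + k * lanes` have to be computed at run time as well, which is presumably why a constexpr lane count is not available on every target.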
@mshabunin, it seems the CI is hanging for RISC-V.
@vrabaud, thank you for the patch! I wonder why double-precision floating-point arithmetic is used in several places of the patch. Isn't FP32 enough for this algorithm (unless resize is applied to FP64 images)? I mean, it's fine to use double precision to construct the interpolation tables, but when we do the actual interpolation and accumulation, FP32 should probably be enough, right?
@vpisarev , indeed float32 is used for all integer types and float32. Double is only used for doubles: opencv/modules/imgproc/src/resize.cpp Line 3797 in 2f1d529
This is the current behavior so I kept it. BTW, any idea as to why int and schar are disabled?
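Editorial sketch: the exchange above says that float32 is the working type for all integer types and for float32, while double is only used for double images. The plain C++ below shows one way such a mapping and a row accumulation step could look. `AreaWorkType` and `accumulateRow` are hypothetical names and are not taken from resize.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <type_traits>

// Hypothetical trait: float is the working type for every pixel type
// except double, which keeps double precision.
template <typename T> struct AreaWorkType { using type = float; };
template <> struct AreaWorkType<double> { using type = double; };

// Merges one weighted source row into the running sum:
// sum[i] += weight * row[i], computed in the working type.
template <typename T>
void accumulateRow(const T* row,
                   typename AreaWorkType<T>::type* sum,
                   typename AreaWorkType<T>::type weight,
                   std::size_t n) {
    using WT = typename AreaWorkType<T>::type;
    for (std::size_t i = 0; i < n; ++i)
        sum[i] += weight * static_cast<WT>(row[i]);
}
```

With this layout, only the `double` specialization pays for 64-bit accumulation; 8-bit, 16-bit, and float images accumulate in FP32.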
RISC-V builds are broken in CI now. Manual build error messages:
@vrabaud you need to use v_add, v_mul and other functions instead of the overloaded +, * and other operators.
Thx @asmorkalov, I believe I fixed it using
👍
Tested RISC-V config manually.
In summary:
0.1 and 0.25 are integer ratios (a tenth and a quarter of the original image), applied to images whose dimensions are divisible by 10 or 4. In that case, a different algorithm (areafast) is used, which this pull request does not speed up, so it is normal to see no change there. 0.81 is a non-integer scale: the regular INTER_AREA path is used, and that is where the speed-up shows.
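Editorial sketch: the distinction above between integer and non-integer scale factors can be expressed as a small check. The plain C++ below is an assumption about how such a test could be written; the exact condition OpenCV uses to select the areafast path may differ. `usesIntegerAreaPath` is a hypothetical name.

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// Returns true when both downscale factors (source size / destination size)
// are whole numbers, i.e. the case described above as using the separate
// integer-ratio algorithm. Upscaling never qualifies in this sketch.
bool usesIntegerAreaPath(int srcW, int srcH, int dstW, int dstH) {
    const double scaleX = static_cast<double>(srcW) / dstW;
    const double scaleY = static_cast<double>(srcH) / dstH;
    if (scaleX < 1.0 || scaleY < 1.0)
        return false;
    return std::abs(scaleX - std::round(scaleX)) < DBL_EPSILON &&
           std::abs(scaleY - std::round(scaleY)) < DBL_EPSILON;
}
```

For a 1000x1000 source, a 100x100 or 250x250 destination gives whole-number factors (10 and 4), whereas an 810x810 destination gives a factor of about 1.23 and would fall through to the general path that this pull request targets.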
Speed up line merging in INTER_AREA opencv#24412 This provides a 10 to 20% speed-up. Related perf test fix: opencv#24417 This is a split of opencv#23525 that will be updated to only deal with column merging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
This provides a 10 to 20% speed-up.
Related perf test fix: #24417
This is a split of #23525 that will be updated to only deal with column merging.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.