-
-
Notifications
You must be signed in to change notification settings - Fork 56.2k
WIP: Vectorize cv::resize for INTER_AREA #23525
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
6d8841d
to
c0c11f1
Compare
QR code decoding pipeline uses resize INTER_AREA internally. The python test crash may be caused by the optimization. |
192bbe0
to
c5ffc7d
Compare
@asmorkalov , I am trying to get accurate perf results with "modules/ts/misc/run.py" following https://github.com/opencv/opencv/wiki/HowToUsePerfTests. What options can I use to get more accurate results ? Like @vpisarev , to speed-up loading and float conversion, I copied code from opencv/modules/core/src/convert.hpp Line 16 in 6dbc5e0
|
be4331c
to
2c2b901
Compare
BTW, I believe this is ready to be reviewed. Here are the speed-ups I get:
|
Hello @vrabaud I made some research of your patch and result is very controversial. For my AMD Ryzen7 2700X it does not improve performance at all. The difference is comparable with statistics fluctuations. For Jetson NANO (arm v8 x64 by NVIDIA) I see performance speedup for small images. Looks like the speedup is related to more efficient cache reuse. HD images and larger with 3-4 channels become slower. For Jetson tk1 (armv7) I see the same behavior, but size threshold is even lower than for Jetson NANO. Most of resolutions have degradation. I attached archive with all experiments (xml). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR does not demonstrate stable performance improvement.
Thx for checking. Let me try two things:
|
@vrabaud do you plan to work on it or I may close the PR? |
I renamed it as WIP to avoid confusion. I can close it and re-open later if you prefer. I thought this might get others interested in the meantime. |
@vrabaud BTW, there is
|
Speed up line merging in INTER_AREA #24412 This provides a 10 to 20% speed-up. Related perf test fix: #24417 This is a split of #23525 that will be updated to only deal with column merging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
Actually no: the other PR is just about line merging. I will now specialize this one for column merging. Almost done :) |
Speed up line merging in INTER_AREA opencv#24412 This provides a 10 to 20% speed-up. Related perf test fix: opencv#24417 This is a split of opencv#23525 that will be updated to only deal with column merging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
Speed up line merging in INTER_AREA opencv#24412 This provides a 10 to 20% speed-up. Related perf test fix: opencv#24417 This is a split of opencv#23525 that will be updated to only deal with column merging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
Speed up line merging in INTER_AREA opencv#24412 This provides a 10 to 20% speed-up. Related perf test fix: opencv#24417 This is a split of opencv#23525 that will be updated to only deal with column merging. ### Pull Request Readiness Checklist See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request - [x] I agree to contribute to the project under Apache 2 License. - [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV - [x] The PR is proposed to the proper branch - [x] There is a reference to the original bug report and related work - [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable Patch to opencv_extra has the same branch name. - [x] The feature is well documented and sample code can be built with the project CMake
@vrabaud Is the PR still relevant? |
#24412 actually got most of the speed-up. Doing the transpose here is too slow. A last solution is to first pack the image vertically, then process the columns in parallel. But the load/store are slow as the memory is not contiguous, for operations that are already partially parallel for 2,3,4 channels. parts of the registers are not used in those cases but that does not bring any measurable speed-up. |
This is just a vectorization of the original code. I'll make speed tests once this PR is reviewed.
Pull Request Readiness Checklist
See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request
Patch to opencv_extra has the same branch name.