WIP: Vectorize cv::resize for INTER_AREA by vrabaud · Pull Request #23525 · opencv/opencv · GitHub
WIP: Vectorize cv::resize for INTER_AREA #23525

Closed
wants to merge 9 commits into from

Conversation

vrabaud
Contributor
@vrabaud vrabaud commented Apr 21, 2023

This is just a vectorization of the original code. I'll make speed tests once this PR is reviewed.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@asmorkalov
Contributor

The QR code decoding pipeline uses resize with INTER_AREA internally. The Python test crash may be caused by this optimization.

@asmorkalov asmorkalov added this to the 4.8.0 milestone Apr 22, 2023
@vrabaud vrabaud force-pushed the avif branch 7 times, most recently from 192bbe0 to c5ffc7d Compare April 27, 2023 09:34
@vrabaud
Contributor Author
vrabaud commented Apr 27, 2023

@asmorkalov, I am trying to get accurate perf results with "modules/ts/misc/run.py" following https://github.com/opencv/opencv/wiki/HowToUsePerfTests. What options can I use to get more accurate results? Something like --perf_force_samples?

@vpisarev, to speed up loading and float conversion, I copied the code from

static inline void vx_load_as(const uchar* ptr, v_float32& a)

Should that be made part of HAL?

@vrabaud vrabaud force-pushed the avif branch 2 times, most recently from be4331c to 2c2b901 Compare April 27, 2023 14:52
@vrabaud
Contributor Author
vrabaud commented Apr 28, 2023

BTW, I believe this is ready to be reviewed. Here are the speed-ups I get:

Geometric mean (ms)

Name of Test                                                imgproc imgproc  new vs old
                                                              old     new    (x-factor)
CreateHanningWindow::CreateHanningWindowFixture::640x480    0.177   0.177     1.00   
CreateHanningWindow::CreateHanningWindowFixture::1920x1080  1.066   1.060     1.01   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.3)         0.900   0.491     1.83   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.5)         0.019   0.019     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.6)         1.230   0.219     5.61   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.3)        0.501   0.416     1.21   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.5)        0.063   0.056     1.13   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.6)        0.405   0.185     2.18   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.3)         2.084   0.579     3.60   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.5)         0.134   0.135     0.99   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.6)         4.189   0.576     7.27   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.3)        0.797   0.442     1.80   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.5)        0.756   0.757     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.6)        0.632   0.630     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.3)         2.849   0.750     3.80   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.5)         0.098   0.099     0.99   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.6)         4.795   0.799     6.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.3)        1.255   0.542     2.32   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.5)        0.236   0.236     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.6)        0.990   0.898     1.10   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.3)        2.679   0.535     5.00   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.5)        0.053   0.138     0.38   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.6)        2.449   0.260     9.42   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.3)       1.507   0.410     3.67   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.5)       0.105   0.131     0.80   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.6)       0.468   0.260     1.80   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.3)        6.095   1.726     3.53   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.5)        0.307   0.189     1.63   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.6)        5.023   0.954     5.26   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.3)       2.379   1.812     1.31   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.5)       0.658   0.683     0.96   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.6)       0.852   1.414     0.60   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.3)        8.442   2.121     3.98   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.5)        0.150   0.131     1.14   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.6)        6.653   1.670     3.98   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.3)       4.122   2.461     1.67   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.5)       0.243   0.253     0.96   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.6)       1.630   2.636     0.62   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.3)       3.461   0.495     7.00   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.5)       0.029   0.076     0.38   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.6)       3.029   0.379     7.99   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.3)      1.504   0.484     3.11   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.5)      0.141   0.155     0.92   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.6)      0.687   0.407     1.69   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.3)       6.281   3.112     2.02   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.5)       0.190   0.235     0.81   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.6)       6.523   2.035     3.21   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.3)      3.496   3.736     0.94   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.5)      0.854   0.874     0.98   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.6)      1.294   3.213     0.40   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.3)       6.389   3.846     1.66   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.5)       0.200   0.135     1.49   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.6)       9.901   2.996     3.30   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.3)      4.487   4.302     1.04   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.5)      0.435   0.438     0.99   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.6)      2.919   3.718     0.79   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.3)       4.462   0.987     4.52   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.5)       0.058   0.055     1.06   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.6)       5.996   1.144     5.24   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.3)      3.119   1.562     2.00   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.5)      0.390   0.391     1.00   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.6)      2.273   1.701     1.34   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.3)      10.888   4.802     2.27   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.5)       0.381   0.348     1.09   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.6)      13.213   6.481     2.04   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.3)      4.749   6.773     0.70   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.5)      2.764   3.308     0.84   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.6)      4.155   8.645     0.48   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.3)      14.787   7.452     1.98   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.5)       0.443   0.539     0.82   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.6)      17.201   9.058     1.90   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.3)      6.319   8.625     0.73   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.5)      3.057   3.578     0.85   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.6)      6.035  10.741     0.56   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 1.3)   0.097   0.142     0.69   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 2.4)   0.237   0.227     1.04   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 3.4)   0.239   0.347     0.69   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 1.3)   0.117   0.183     0.64   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 2.4)   0.221   0.340     0.65   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 3.4)   0.449   0.444     1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 1.3)  0.178   0.205     0.87   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 2.4)  0.324   0.431     0.75   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 3.4)  0.688   0.654     1.05   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 1.3)   0.312   0.243     1.28   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 2.4)   0.499   0.458     1.09   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 3.4)   0.725   0.629     1.15   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 1.3)   0.291   0.311     0.93   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 2.4)   0.630   0.594     1.06   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 3.4)   1.012   1.009     1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 1.3)  0.527   0.465     1.13   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 2.4)  0.991   1.080     0.92   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 3.4)  1.746   1.856     0.94   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 640x480, 2)      0.018   0.018     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 960x540, 2)      0.032   0.035     0.90   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1280x720, 2)     0.036   0.033     1.08   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1920x1080, 2)    0.044   0.037     1.18   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 640x480, 2)     0.035   0.034     1.03   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 960x540, 2)     0.048   0.048     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1280x720, 2)    0.049   0.038     1.29   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1920x1080, 2)   0.057   0.058     0.98   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 640x480, 2)      0.133   0.135     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 960x540, 2)      0.131   0.134     0.97   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1280x720, 2)     0.125   0.123     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1920x1080, 2)    0.149   0.159     0.94   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 640x480, 2)     0.127   0.132     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 960x540, 2)     0.126   0.127     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1280x720, 2)    0.117   0.123     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1920x1080, 2)   0.151   0.156     0.97   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 640x480, 2)      0.097   0.098     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 960x540, 2)      0.084   0.101     0.84   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1280x720, 2)     0.094   0.095     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1920x1080, 2)    0.117   0.118     1.00   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 640x480, 2)     0.133   0.144     0.93   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 960x540, 2)     0.131   0.137     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1280x720, 2)    0.125   0.126     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1920x1080, 2)   0.177   0.188     0.95   
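
As a reading aid for the table above: the last column appears consistent with the old time divided by the new time, so values above 1 mean the patched code is faster and values below 1 mean it is slower. The helper names below (`xFactor`, `geometricMean`) are hypothetical and are not part of the OpenCV perf tooling; this is only a minimal sketch of the arithmetic, assuming the timings are positive values in milliseconds.

```cpp
#include <cmath>
#include <vector>

// Geometric mean of positive timing samples (in ms).
// Computed in log space to avoid overflow on long sample lists.
double geometricMean(const std::vector<double>& samples) {
    double logSum = 0.0;
    for (double s : samples) logSum += std::log(s);
    return std::exp(logSum / static_cast<double>(samples.size()));
}

// Ratio of old time to new time: above 1 means the new code is faster.
double xFactor(double oldMs, double newMs) {
    return oldMs / newMs;
}
```

For example, `xFactor(0.900, 0.491)` is roughly 1.83, in line with the (640x480, 8UC1, 0.3) row. Small differences in the last digit of other rows are expected, since the table values are rounded.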

@asmorkalov asmorkalov self-requested a review May 23, 2023 10:12
@asmorkalov
Contributor

Hello @vrabaud, I investigated your patch and the results are mixed. On my AMD Ryzen 7 2700X it does not improve performance at all; the difference is comparable to statistical fluctuations. On Jetson Nano (ARMv8, 64-bit, by NVIDIA) I see a speed-up for small images. It looks like the speed-up is related to more efficient cache reuse. HD and larger images with 3-4 channels become slower. On Jetson TK1 (ARMv7) I see the same behavior, but the size threshold is even lower than on Jetson Nano. Most resolutions show a degradation.

I attached an archive with all experiments (XML):
resize_perf.zip

Contributor
@asmorkalov asmorkalov left a comment

The PR does not demonstrate a stable performance improvement.

@asmorkalov asmorkalov removed this from the 4.8.0 milestone May 23, 2023
@vrabaud
Contributor Author
vrabaud commented May 23, 2023

Thx for checking. Let me try two things:

  • a load/store using a step for the second pass so that we do not do two transpositions
  • a vectorization of the multiplication but not the summation. That should beat the loop unrolling at least.

@asmorkalov
Contributor

@vrabaud do you plan to work on it or I may close the PR?

@vrabaud vrabaud changed the title Vectorize cv::resize for INTER_AREA WIP: Vectorize cv::resize for INTER_AREA Sep 20, 2023
@vrabaud
Contributor Author
vrabaud commented Sep 20, 2023

I renamed it as WIP to avoid confusion. I can close it and re-open later if you prefer. I thought this might get others interested in the meantime.

@opencv-alalek
Contributor

@vrabaud BTW, there is a Draft feature for PRs:

Still in progress? Convert to draft (under the "Reviewers" section)

@vrabaud vrabaud marked this pull request as draft September 22, 2023 09:37
asmorkalov pushed a commit that referenced this pull request Oct 19, 2023
Speed up line merging in INTER_AREA #24412

This provides a 10 to 20% speed-up.

Related perf test fix: #24417
This is a split of #23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake
@asmorkalov
Contributor

@vrabaud Should I close this after #24412 is merged?

@vrabaud
Contributor Author
vrabaud commented Oct 20, 2023

Actually no: the other PR is just about line merging. I will now specialize this one for column merging. Almost done :)

IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
@asmorkalov
Contributor

@vrabaud Is the PR still relevant?

@vrabaud
Contributor Author
vrabaud commented Mar 20, 2025

#24412 actually got most of the speed-up. Doing the transpose here is too slow. A last solution is to first pack the image vertically, then process the columns in parallel. But the loads/stores are slow because the memory is not contiguous, and the operations are already partially parallel for 2, 3, and 4 channels. Parts of the registers are not used in those cases, but this approach does not bring any measurable speed-up.
Stopping here.

@vrabaud vrabaud closed this Mar 20, 2025