WIP: Vectorize cv::resize for INTER_AREA #23525

vrabaud · 2023-04-21T22:24:13Z

This is just a vectorization of the original code. I'll make speed tests once this PR is reviewed.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

I agree to contribute to the project under Apache 2 License.
To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
The PR is proposed to the proper branch
There is a reference to the original bug report and related work
There is accuracy test, performance test and test data in opencv_extra repository, if applicable
Patch to opencv_extra has the same branch name.
The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2023-04-22T05:50:45Z

The QR code decoding pipeline uses resize with INTER_AREA internally. The Python test crash may be caused by the optimization.

vrabaud · 2023-04-27T11:07:19Z

@asmorkalov, I am trying to get accurate perf results with "modules/ts/misc/run.py" following https://github.com/opencv/opencv/wiki/HowToUsePerfTests. What options can I use to get more accurate results? Something like --perf_force_samples?

@vpisarev, to speed up loading and float conversion, I copied code from opencv/modules/core/src/convert.hpp (line 16 in 6dbc5e0):

`static inline void vx_load_as(const uchar* ptr, v_float32& a)`

Should that be made part of HAL?

vrabaud · 2023-04-28T12:24:50Z

BTW, I believe this is ready to be reviewed. Here are the speed-ups I get:

Geometric mean (ms)

                       Name of Test                        imgproc imgproc  imgproc  
                                                             old     new      new    
                                                                               vs    
                                                                            imgproc  
                                                                              old    
                                                                           (x-factor)
CreateHanningWindow::CreateHanningWindowFixture::640x480    0.177   0.177     1.00   
CreateHanningWindow::CreateHanningWindowFixture::1920x1080  1.066   1.060     1.01   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.3)         0.900   0.491     1.83   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.5)         0.019   0.019     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC1, 0.6)         1.230   0.219     5.61   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.3)        0.501   0.416     1.21   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.5)        0.063   0.056     1.13   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC1, 0.6)        0.405   0.185     2.18   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.3)         2.084   0.579     3.60   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.5)         0.134   0.135     0.99   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC3, 0.6)         4.189   0.576     7.27   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.3)        0.797   0.442     1.80   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.5)        0.756   0.757     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC3, 0.6)        0.632   0.630     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.3)         2.849   0.750     3.80   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.5)         0.098   0.099     0.99   
Resize::OCL_ResizeAreaFixture::(640x480, 8UC4, 0.6)         4.795   0.799     6.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.3)        1.255   0.542     2.32   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.5)        0.236   0.236     1.00   
Resize::OCL_ResizeAreaFixture::(640x480, 32FC4, 0.6)        0.990   0.898     1.10   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.3)        2.679   0.535     5.00   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.5)        0.053   0.138     0.38   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC1, 0.6)        2.449   0.260     9.42   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.3)       1.507   0.410     3.67   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.5)       0.105   0.131     0.80   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC1, 0.6)       0.468   0.260     1.80   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.3)        6.095   1.726     3.53   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.5)        0.307   0.189     1.63   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC3, 0.6)        5.023   0.954     5.26   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.3)       2.379   1.812     1.31   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.5)       0.658   0.683     0.96   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC3, 0.6)       0.852   1.414     0.60   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.3)        8.442   2.121     3.98   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.5)        0.150   0.131     1.14   
Resize::OCL_ResizeAreaFixture::(1280x720, 8UC4, 0.6)        6.653   1.670     3.98   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.3)       4.122   2.461     1.67   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.5)       0.243   0.253     0.96   
Resize::OCL_ResizeAreaFixture::(1280x720, 32FC4, 0.6)       1.630   2.636     0.62   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.3)       3.461   0.495     7.00   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.5)       0.029   0.076     0.38   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC1, 0.6)       3.029   0.379     7.99   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.3)      1.504   0.484     3.11   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.5)      0.141   0.155     0.92   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC1, 0.6)      0.687   0.407     1.69   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.3)       6.281   3.112     2.02   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.5)       0.190   0.235     0.81   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC3, 0.6)       6.523   2.035     3.21   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.3)      3.496   3.736     0.94   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.5)      0.854   0.874     0.98   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC3, 0.6)      1.294   3.213     0.40   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.3)       6.389   3.846     1.66   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.5)       0.200   0.135     1.49   
Resize::OCL_ResizeAreaFixture::(1920x1080, 8UC4, 0.6)       9.901   2.996     3.30   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.3)      4.487   4.302     1.04   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.5)      0.435   0.438     0.99   
Resize::OCL_ResizeAreaFixture::(1920x1080, 32FC4, 0.6)      2.919   3.718     0.79   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.3)       4.462   0.987     4.52   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.5)       0.058   0.055     1.06   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC1, 0.6)       5.996   1.144     5.24   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.3)      3.119   1.562     2.00   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.5)      0.390   0.391     1.00   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC1, 0.6)      2.273   1.701     1.34   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.3)      10.888   4.802     2.27   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.5)       0.381   0.348     1.09   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC3, 0.6)      13.213   6.481     2.04   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.3)      4.749   6.773     0.70   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.5)      2.764   3.308     0.84   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC3, 0.6)      4.155   8.645     0.48   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.3)      14.787   7.452     1.98   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.5)       0.443   0.539     0.82   
Resize::OCL_ResizeAreaFixture::(3840x2160, 8UC4, 0.6)      17.201   9.058     1.90   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.3)      6.319   8.625     0.73   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.5)      3.057   3.578     0.85   
Resize::OCL_ResizeAreaFixture::(3840x2160, 32FC4, 0.6)      6.035  10.741     0.56   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 1.3)   0.097   0.142     0.69   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 2.4)   0.237   0.227     1.04   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 3.4)   0.239   0.347     0.69   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 1.3)   0.117   0.183     0.64   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 2.4)   0.221   0.340     0.65   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 3.4)   0.449   0.444     1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 1.3)  0.178   0.205     0.87   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 2.4)  0.324   0.431     0.75   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 3.4)  0.688   0.654     1.05   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 1.3)   0.312   0.243     1.28   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 2.4)   0.499   0.458     1.09   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 3.4)   0.725   0.629     1.15   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 1.3)   0.291   0.311     0.93   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 2.4)   0.630   0.594     1.06   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 3.4)   1.012   1.009     1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 1.3)  0.527   0.465     1.13   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 2.4)  0.991   1.080     0.92   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 3.4)  1.746   1.856     0.94   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 640x480, 2)      0.018   0.018     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 960x540, 2)      0.032   0.035     0.90   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1280x720, 2)     0.036   0.033     1.08   
ResizeAreaFast::MatInfo_Size_Scale::(8UC1, 1920x1080, 2)    0.044   0.037     1.18   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 640x480, 2)     0.035   0.034     1.03   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 960x540, 2)     0.048   0.048     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1280x720, 2)    0.049   0.038     1.29   
ResizeAreaFast::MatInfo_Size_Scale::(16UC1, 1920x1080, 2)   0.057   0.058     0.98   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 640x480, 2)      0.133   0.135     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 960x540, 2)      0.131   0.134     0.97   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1280x720, 2)     0.125   0.123     1.01   
ResizeAreaFast::MatInfo_Size_Scale::(8UC3, 1920x1080, 2)    0.149   0.159     0.94   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 640x480, 2)     0.127   0.132     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 960x540, 2)     0.126   0.127     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1280x720, 2)    0.117   0.123     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC3, 1920x1080, 2)   0.151   0.156     0.97   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 640x480, 2)      0.097   0.098     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 960x540, 2)      0.084   0.101     0.84   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1280x720, 2)     0.094   0.095     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(8UC4, 1920x1080, 2)    0.117   0.118     1.00   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 640x480, 2)     0.133   0.144     0.93   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 960x540, 2)     0.131   0.137     0.96   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1280x720, 2)    0.125   0.126     0.99   
ResizeAreaFast::MatInfo_Size_Scale::(16UC4, 1920x1080, 2)   0.177   0.188     0.95
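
For reading the table above: the last column appears to be the old time divided by the new time, so values above 1 are speed-ups and values below 1 are regressions. A minimal sketch of that arithmetic follows. The helper names `xFactor` and `geometricMean` are hypothetical and are not part of OpenCV or of run.py; the assumption that each timing is a geometric mean over samples comes only from the "Geometric mean (ms)" caption.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Speed-up ("x-factor") of one test: old time divided by new time.
// Values above 1 mean the new build is faster.
inline double xFactor(double old_ms, double new_ms)
{
    assert(old_ms > 0.0 && new_ms > 0.0);
    return old_ms / new_ms;
}

// Geometric mean of positive timings, computed in log space
// to avoid overflow when many samples are multiplied together.
inline double geometricMean(const std::vector<double>& samples_ms)
{
    assert(!samples_ms.empty());
    double log_sum = 0.0;
    for (double s : samples_ms)
    {
        assert(s > 0.0);
        log_sum += std::log(s);
    }
    return std::exp(log_sum / static_cast<double>(samples_ms.size()));
}
```

For example, the row `(640x480, 8UC1, 0.3)` has 0.900 ms old and 0.491 ms new, which gives roughly 1.83, while `(1280x720, 8UC1, 0.5)` has 0.053 ms old and 0.138 ms new, which is below 1 and therefore a regression.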

asmorkalov · 2023-05-23T10:23:08Z

Hello @vrabaud, I investigated your patch and the results are mixed. On my AMD Ryzen 7 2700X it does not improve performance at all; the difference is comparable to statistical fluctuation. On Jetson Nano (64-bit ARMv8 by NVIDIA) I see a speed-up for small images, which looks related to more efficient cache reuse, but HD and larger images with 3-4 channels become slower. On Jetson TK1 (ARMv7) I see the same behavior, but the size threshold is even lower than on Jetson Nano, and most resolutions show a degradation.

I attached an archive with all experiments (XML): resize_perf.zip

asmorkalov

The PR does not demonstrate stable performance improvement.

vrabaud · 2023-05-23T14:05:13Z

Thx for checking. Let me try two things:

- a load/store using a step for the second pass, so that we do not do two transpositions
- a vectorization of the multiplication but not the summation. That should beat the loop unrolling at least.
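
To make the second idea concrete, here is a scalar sketch of one horizontal INTER_AREA-style pass on a single-channel row, split into a multiplication stage and a summation stage. This is an illustration under stated assumptions, not the code in this PR or in OpenCV: `AreaTap`, `buildAreaTaps`, and `shrinkRow` are hypothetical names, and the weights are plain normalized overlaps between source and destination pixels.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// One contribution: source pixel `src` adds `weight` of its value to `dst`.
struct AreaTap { int src; int dst; float weight; };

// Build the taps for shrinking src_size pixels down to dst_size pixels.
// The weights of each destination pixel sum to 1.
std::vector<AreaTap> buildAreaTaps(int src_size, int dst_size)
{
    assert(dst_size > 0 && src_size >= dst_size);
    const double scale = static_cast<double>(src_size) / dst_size;
    std::vector<AreaTap> taps;
    for (int d = 0; d < dst_size; ++d)
    {
        const double lo = d * scale;
        const double hi = std::min(lo + scale, static_cast<double>(src_size));
        for (int s = static_cast<int>(std::floor(lo)); s < hi; ++s)
        {
            // Length of the overlap between source pixel [s, s+1) and [lo, hi).
            const double overlap = std::min<double>(s + 1, hi) - std::max<double>(s, lo);
            if (overlap > 1e-9)
                taps.push_back({s, d, static_cast<float>(overlap / (hi - lo))});
        }
    }
    return taps;
}

// Shrink one single-channel row in two stages.
std::vector<float> shrinkRow(const std::vector<float>& src,
                             const std::vector<AreaTap>& taps, int dst_size)
{
    // Stage 1: multiplications only. Iterations are independent of each
    // other, so this loop is the part that could be vectorized.
    std::vector<float> products(taps.size());
    for (std::size_t i = 0; i < taps.size(); ++i)
    {
        assert(taps[i].src >= 0 && static_cast<std::size_t>(taps[i].src) < src.size());
        products[i] = taps[i].weight * src[taps[i].src];
    }
    // Stage 2: summation into the destination pixels, kept scalar here
    // because several taps write to the same dst index.
    std::vector<float> dst(dst_size, 0.f);
    for (std::size_t i = 0; i < taps.size(); ++i)
        dst[taps[i].dst] += products[i];
    return dst;
}
```

Shrinking the row `{1, 2, 3, 4}` to two pixels with this sketch averages each pair and yields `{1.5, 3.5}`.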

asmorkalov · 2023-09-20T12:30:02Z

@vrabaud do you plan to work on it, or may I close the PR?

vrabaud · 2023-09-20T12:58:55Z

I renamed it as WIP to avoid confusion. I can close it and re-open later if you prefer. I thought this might get others interested in the meantime.

opencv-alalek · 2023-09-21T22:44:31Z

@vrabaud BTW, there is Draft feature for PRs:

Still in progress? Convert to draft under "Reviewers" section

Speed up line merging in INTER_AREA #24412

This provides a 10 to 20% speed-up. Related perf test fix: #24417

This is a split of #23525 that will be updated to only deal with column merging.

### Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

- [x] I agree to contribute to the project under Apache 2 License.
- [x] To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
- [x] The PR is proposed to the proper branch
- [x] There is a reference to the original bug report and related work
- [x] There is accuracy test, performance test and test data in opencv_extra repository, if applicable
      Patch to opencv_extra has the same branch name.
- [x] The feature is well documented and sample code can be built with the project CMake

asmorkalov · 2023-10-20T11:23:36Z

@vrabaud Should I close this after #24412 merge?

vrabaud · 2023-10-20T12:10:28Z

Actually no: the other PR is just about line merging. I will now specialize this one for column merging. Almost done :)

asmorkalov · 2025-03-14T06:20:49Z

@vrabaud Is the PR still relevant?

vrabaud · 2025-03-20T09:10:09Z

#24412 actually got most of the speed-up. Doing the transpose here is too slow. A last solution is to first pack the image vertically, then process the columns in parallel. But the loads/stores are slow as the memory is not contiguous, for operations that are already partially parallel for 2, 3, or 4 channels. Parts of the registers are not used in those cases, but that does not bring any measurable speed-up.
Stopping here.
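
For readers following the memory-layout argument, the two access patterns can be sketched as below. This is a simplified illustration, not the code from #24412 or from this PR; `accumulateRow` and `gatherColumn` are hypothetical names. Merging lines walks contiguous memory, whereas processing one column across several rows has to read elements that are a full row stride apart.

```cpp
#include <cassert>
#include <cstddef>

// Line merging: add a weighted source row into an accumulator row.
// Both arrays are walked contiguously in x, which suits SIMD loads/stores.
void accumulateRow(const float* row, float weight, float* acc, std::size_t width)
{
    assert(row != nullptr && acc != nullptr);
    for (std::size_t x = 0; x < width; ++x)
        acc[x] += weight * row[x];
}

// Column processing across rows: collect the values of column x from
// `lanes` consecutive rows. Consecutive reads are `stride` floats apart,
// so they cannot be served by a single contiguous vector load.
void gatherColumn(const float* img, std::size_t stride, std::size_t x,
                  std::size_t lanes, float* out)
{
    assert(img != nullptr && out != nullptr && x < stride);
    for (std::size_t k = 0; k < lanes; ++k)
        out[k] = img[k * stride + x];
}
```

Averaging the rows `{2, 4, 6}` and `{4, 8, 10}` with weight 0.5 each through `accumulateRow` gives `{3, 6, 8}`, using only sequential reads and writes.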

vrabaud force-pushed the avif branch 3 times, most recently from 6d8841d to c0c11f1 Compare April 21, 2023 22:58

asmorkalov added optimization category: imgproc labels Apr 22, 2023

asmorkalov added this to the 4.8.0 milestone Apr 22, 2023

vrabaud force-pushed the avif branch 7 times, most recently from 192bbe0 to c5ffc7d Compare April 27, 2023 09:34

vrabaud force-pushed the avif branch 2 times, most recently from be4331c to 2c2b901 Compare April 27, 2023 14:52

vrabaud added 7 commits April 28, 2023 10:33

Vectorize cv::resize for INTER_AREA

5622ffe

Revert last minute bad dst copy.

7bdfd94

Remove the old INTER_AREA implementation.

b8f4105

Speed up saturate_cast copies.

59c140c

Remove the need for an extra buffer + conversion.

40342e4

Small clean-ups

fdbd16e

Save a memcpy if the type is the same.

2284049

vrabaud force-pushed the avif branch from dd72b8a to d28c59f Compare April 28, 2023 08:35

Use FMA and do not multiply by cn in tabs computation.

ec59870

vrabaud force-pushed the avif branch from d28c59f to ec59870 Compare April 28, 2023 08:41

Fix some compilation issues.

13918bc

asmorkalov self-requested a review May 23, 2023 10:12

asmorkalov requested changes May 23, 2023

asmorkalov removed this from the 4.8.0 milestone May 23, 2023

vrabaud changed the title ~~Vectorize cv::resize for INTER_AREA~~ WIP: Vectorize cv::resize for INTER_AREA Sep 20, 2023

vrabaud marked this pull request as draft September 22, 2023 09:37

vrabaud mentioned this pull request Oct 16, 2023

Speed up line merging in INTER_AREA #24412

Merged

6 tasks

vrabaud closed this Mar 20, 2025
