Speed up line merging in INTER_AREA by vrabaud · Pull Request #24412 · opencv/opencv · GitHub

Speed up line merging in INTER_AREA #24412


Merged: 5 commits merged on Oct 19, 2023


@vrabaud (Contributor) commented Oct 16, 2023

This provides a 10 to 20% speed-up.

Related perf test fix: #24417
This is a split of #23525 that will be updated to only deal with column merging.

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or another license that is incompatible with OpenCV
  • The PR is proposed to the proper branch
  • There is a reference to the original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake

@vrabaud (Contributor, Author) commented Oct 17, 2023

Running:

taskset -c 33 python3 ./modules/ts/misc/run.py ./build/bin/ -t imgproc --gtest_filter=*MatInfo_Size_Scale_Area* --perf_min_samples=500 --perf_force_samples=500

Here are the results ("real" INTER_AREA is only used for non-integer downscaling ratios):

Columns: (1) imgproc 20231017-104246 (baseline), (2) imgproc patch, (3) imgproc patch vs imgproc 20231017-104246 (x-factor). Times are in ms.

                        Name of Test                             (1)         (2)         (3)
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)         0.107       0.099       1.07      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)        0.164       0.164       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)        1.285       1.119       1.15      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)         0.167       0.167       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)        0.276       0.276       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)        2.169       1.890       1.15      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)        0.296       0.297       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)       0.491       0.492       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)       3.898       3.420       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)       0.673       0.673       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)      1.121       1.121       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)      8.704       7.642       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)       2.643       2.644       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)      4.434       4.433       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)     35.254      30.796       1.14      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)         0.296       0.296       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)        0.487       0.487       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)        2.990       2.477       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)         0.505       0.505       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)        0.843       0.843       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)        5.000       4.189       1.19      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)        0.889       0.887       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)       1.475       1.475       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)       9.054       7.559       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)       1.996       1.990       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)      3.289       3.289       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)     20.178      16.801       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)       7.963       7.961       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)     13.174      13.190       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)     81.505      67.946       1.20      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)         0.400       0.400       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)        0.671       0.671       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)        3.868       3.206       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)         0.666       0.666       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)        1.112       1.111       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)        6.553       5.399       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)        1.186       1.186       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)       1.962       1.962       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)      11.697       9.631       1.21      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)       2.652       2.647       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)      4.411       4.410       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)     26.304      21.632       1.22      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)      10.567      10.563       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)     17.683      17.692       1.00      
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)     106.167     87.211       1.22
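
For readers comparing the tables, the x-factor column can be checked by hand. Judging from the numbers (for example 35.254 / 30.796 ≈ 1.14 in the last 8UC1 row above), it is the baseline time divided by the patched time, so values above 1.00 mean the patch is faster and values below 1.00 mean it is slower. A minimal C++ sketch of that arithmetic; `xFactor` and `roundTo2` are hypothetical names, not part of OpenCV or summary.py:

```cpp
#include <cassert>
#include <cmath>

// x-factor as it appears in the tables: baseline time / patched time.
// > 1.0 means the patched build is faster, < 1.0 means it is slower.
double xFactor(double baselineMs, double patchedMs) {
    assert(baselineMs > 0.0 && patchedMs > 0.0);
    return baselineMs / patchedMs;
}

// Round to two decimals, the precision printed in the tables.
double roundTo2(double value) {
    return std::round(value * 100.0) / 100.0;
}
```

For the last 8UC1 row of this run, `roundTo2(xFactor(35.254, 30.796))` gives 1.14, matching the table.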

@asmorkalov (Contributor) commented Oct 17, 2023

My results for AMD Ryzen 7 2700X:

ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)             0.091       0.091         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)            0.125       0.125         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)            0.579       0.371         1.56   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)             0.148       0.149         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)            0.204       0.392         0.52   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)            0.986       0.554         1.78   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)            0.260       0.257         1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)           0.359       0.358         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)           1.000       0.756         1.32   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)           0.590       0.591         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)          0.416       0.423         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)          1.517       1.378         1.10   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)           2.395       2.444         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)          0.720       0.759         0.95   
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)          4.486       4.018         1.12   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)             0.273       0.273         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)            0.371       0.371         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)            0.974       0.903         1.08   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)             0.440       0.443         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)            0.607       0.607         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)            1.197       1.184         1.01   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)            0.782       0.782         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)           1.076       1.075         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)           1.890       1.642         1.15   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)           1.754       1.785         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)          1.216       1.227         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)          3.192       3.043         1.05   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)           7.130       7.174         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)          2.201       1.928         1.14   
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)          9.497       8.959         1.06   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)             0.361       0.361         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)            0.496       0.499         1.00   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)            1.247       1.057         1.18   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)             0.596       0.605         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)            0.812       0.834         0.97   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)            1.298       1.103         1.18   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)            1.060       1.075         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)           1.465       1.492         0.98   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)           2.253       2.042         1.10   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)           2.786       2.394         1.16   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)          1.634       1.644         0.99   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)          4.238       3.883         1.09   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)           9.363       9.626         0.97   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)          2.429       2.637         0.92   
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)         12.587      11.660         1.08 

@asmorkalov (Contributor) commented Oct 17, 2023

Less clear-cut results for Jetson TK1 (ARMv7 with NEON):

ubuntu@jetson1:~/Projects/perf-resize$ ../opencv/modules/ts/misc/summary.py ./perf_imgproc-4.x-2.xml ./perf_imgproc-patched-2.xml | grep MatInfo_Size_Scale_Area
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.1)           0.351      0.351   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.25)          0.057      0.056   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 640x480, 0.81)          5.853      5.547   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.1)           0.593      0.594   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.25)          0.088      0.087   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 960x540, 0.81)         10.575      9.523   1.11  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.1)          1.087      1.067   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.25)         0.151      0.152   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1280x720, 0.81)        19.225     18.115   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.1)         3.074      3.122   0.98  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.25)        0.384      0.389   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 1920x1080, 0.81)       47.033     44.744   1.05  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.1)        16.553     16.479   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.25)        2.205      2.171   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC1, 3840x2160, 0.81)       192.111    183.870  1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.1)           1.068      1.064   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.25)          0.223      0.223   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 640x480, 0.81)         11.450     10.327   1.11  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.1)           2.035      1.957   1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.25)          0.390      0.404   0.96  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 960x540, 0.81)         19.645     18.686   1.05  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.1)          4.919      4.907   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.25)         0.959      0.940   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1280x720, 0.81)        36.524     33.644   1.09  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.1)        12.503     12.508   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.25)        2.478      2.431   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 1920x1080, 0.81)       83.945     76.746   1.09  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.1)        50.541     50.496   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.25)        9.283      9.418   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC3, 3840x2160, 0.81)       331.895    314.355  1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.1)           1.460      1.445   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.25)          0.140      0.142   0.99  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 640x480, 0.81)         13.958     13.407   1.04  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.1)           3.063      3.128   0.98  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.25)          0.271      0.283   0.96  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 960x540, 0.81)         24.871     23.424   1.06  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.1)          7.147      7.103   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.25)         0.758      0.744   1.02  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1280x720, 0.81)        44.767     43.367   1.03  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.1)        16.615     16.530   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.25)        2.009      1.992   1.01  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 1920x1080, 0.81)       102.265    98.891   1.03  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.1)        66.154     66.484   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.25)        7.917      7.922   1.00  
ResizeArea::MatInfo_Size_Scale_Area::(8UC4, 3840x2160, 0.81)       410.325    390.888  1.05 

@asmorkalov self-requested a review October 17, 2023 12:57

@asmorkalov (Contributor) left a comment

The OpenCV team is migrating from fixed-size vector code to scalable vector code to support RISC-V RVV and other platforms. Could you adapt the patch accordingly? Example: #24166

@vrabaud (Contributor, Author) commented Oct 17, 2023

Thx, I added CV_SIMD_SCALABLE.
To get more precise results, may I suggest you run on a specific core? E.g.

taskset -c 33 python3 ./modules/ts/misc/run.py ......

@vrabaud vrabaud requested a review from asmorkalov October 17, 2023 14:43
Comment on lines 3025 to 3028
const v_int32 tmp0 = v_round(vx_load(src + 0 * v_float32::nlanes));
const v_int32 tmp1 = v_round(vx_load(src + 1 * v_float32::nlanes));
const v_int32 tmp2 = v_round(vx_load(src + 2 * v_float32::nlanes));
const v_int32 tmp3 = v_round(vx_load(src + 3 * v_float32::nlanes));
Contributor

v_float32::nlanes -> VTraits<v_float32>::vlanes()

Contributor (Author)

Thx, done. BTW, it would be nice to change that API to make it constexpr.
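
The lane-count change matters because on scalable backends such as RISC-V RVV the number of lanes is only known at run time, which is presumably also why `vlanes()` is a function rather than a compile-time constant. The sketch below uses nothing from OpenCV: it emulates a runtime lane count with a plain `lanes` argument and shows the usual blocked loop (four vectors per iteration, as in the reviewed lines) followed by a scalar tail. `roundToInt` is a hypothetical name, and `std::lrint` only stands in for `v_round`:

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// `lanes` plays the role of a runtime lane query such as
// VTraits<v_float32>::vlanes(); it is not a compile-time constant here.
std::vector<int> roundToInt(const std::vector<float>& src, int lanes) {
    assert(lanes > 0);
    const int n = static_cast<int>(src.size());
    const int step = 4 * lanes;  // four "registers" per iteration
    std::vector<int> dst(src.size());
    int x = 0;
    for (; x + step <= n; x += step) {
        // Stands in for four v_round(vx_load(src + k * lanes)) calls.
        for (int k = 0; k < step; ++k)
            dst[x + k] = static_cast<int>(std::lrint(src[x + k]));
    }
    for (; x < n; ++x)  // scalar tail for the leftover elements
        dst[x] = static_cast<int>(std::lrint(src[x]));
    return dst;
}
```

The result does not depend on `lanes`; only how many elements go through the blocked loop versus the tail changes.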

@vrabaud (Contributor, Author) commented Oct 18, 2023

@mshabunin, it seems the CI is hanging on RISC-V.

@vpisarev (Contributor) commented Oct 18, 2023

@vrabaud, thank you for the patch! I wonder why double-precision floating-point arithmetic is used in several places in the patch. Isn't FP32 enough for this algorithm (unless resize is applied to FP64 images)?

I mean, it's fine to use double precision to construct the interpolation tables, but when we do actual interpolation and accumulation, FP32 should probably be enough, right?

@vrabaud (Contributor, Author) commented Oct 18, 2023

@vpisarev, indeed float32 is the working type for all integer types and for float32; double is only used for double images:

resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,

This is the current behavior so I kept it.
BTW, any idea as to why int and schar are disabled?
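
The `resizeArea_` entries quoted here (and again in the build log below) form a table indexed by image depth, with `0` marking an unsupported depth. A hypothetical C++ sketch of the same layout, mapping each depth to its working (accumulation) type; it assumes the standard OpenCV depth order (CV_8U = 0 through CV_16F = 7) and that the trailing `0` is the CV_16F slot:

```cpp
#include <cassert>
#include <string>

// Modeled on the entries quoted in this thread:
//   resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
//   resizeArea_<short, float>, 0, resizeArea_<float, float>,
//   resizeArea_<double, double>, 0
// nullptr marks a depth with no INTER_AREA implementation in the table.
const char* const kWorkType[] = {
    "float",   // CV_8U  (uchar)
    nullptr,   // CV_8S  (schar): disabled
    "float",   // CV_16U (ushort)
    "float",   // CV_16S (short)
    nullptr,   // CV_32S (int): disabled
    "float",   // CV_32F (float)
    "double",  // CV_64F (double): the only depth accumulated in double
    nullptr,   // CV_16F (assumed meaning of the last slot)
};

// Returns the working type for a depth, or "" if the table entry is empty.
std::string workTypeForDepth(int depth) {
    const int size = static_cast<int>(sizeof(kWorkType) / sizeof(kWorkType[0]));
    assert(depth >= 0 && depth < size);
    const char* wt = kWorkType[depth];
    return wt ? wt : "";
}
```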

@asmorkalov (Contributor)

RISC-V builds are currently broken in CI. Error messages from a manual build:

[ 80%] Building CXX object modules/imgproc/CMakeFiles/opencv_imgproc.dir/src/samplers.cpp.o
/opencv/modules/imgproc/src/resize.cpp:3103:44: error: invalid operands to binary expression ('v_float32' (aka '__rvv_float32m1_t') and 'v_float32')
        vx_store(sum + dx, vx_setall(beta) * vx_load(buf + dx));
                           ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3229:29: note: in instantiation of function template specialization 'cv::inter_area::mul<float>' requested here
                inter_area::mul(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned char, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:9: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned char, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3117:64: error: invalid operands to binary expression ('v_float32' (aka '__rvv_float32m1_t') and 'v_float32')
        vx_store(sum + dx, vx_load(sum + dx) + vx_setall(beta) * vx_load(buf + dx));
                                               ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3234:29: note: in instantiation of function template specialization 'cv::inter_area::muladd<float>' requested here
                inter_area::muladd(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned char, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:9: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned char, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<unsigned short, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3892:39: note: in instantiation of function template specialization 'cv::resizeArea_<unsigned short, float>' requested here
        resizeArea_<uchar, float>, 0, resizeArea_<ushort, float>,
                                      ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<short, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3893:9: note: in instantiation of function template specialization 'cv::resizeArea_<short, float>' requested here
        resizeArea_<short, float>, 0, resizeArea_<float, float>,
        ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3229:17: error: no matching function for call to 'mul'
                inter_area::mul(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<float, float>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3893:39: note: in instantiation of function template specialization 'cv::resizeArea_<float, float>' requested here
        resizeArea_<short, float>, 0, resizeArea_<float, float>,
                                      ^
/opencv/modules/imgproc/src/resize.cpp:3098:13: note: candidate template ignored: substitution failure [with WT = float]
inline void mul(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3234:17: error: no matching function for call to 'muladd'
                inter_area::muladd(buf, dsize.width, beta, sum);
                ^~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3112:13: note: candidate template ignored: substitution failure [with WT = float]
inline void muladd(const WT* buf, int width, WT beta, WT* sum) {
            ^
/opencv/modules/imgproc/src/resize.cpp:3103:44: error: invalid operands to binary expression ('v_float64' (aka '__rvv_float64m1_t') and 'v_float64')
        vx_store(sum + dx, vx_setall(beta) * vx_load(buf + dx));
                           ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3229:29: note: in instantiation of function template specialization 'cv::inter_area::mul<double>' requested here
                inter_area::mul(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<double, double>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3894:9: note: in instantiation of function template specialization 'cv::resizeArea_<double, double>' requested here
        resizeArea_<double, double>, 0
        ^
/opencv/modules/imgproc/src/resize.cpp:3117:64: error: invalid operands to binary expression ('v_float64' (aka '__rvv_float64m1_t') and 'v_float64')
        vx_store(sum + dx, vx_load(sum + dx) + vx_setall(beta) * vx_load(buf + dx));
                                               ~~~~~~~~~~~~~~~ ^ ~~~~~~~~~~~~~~~~~
/opencv/modules/imgproc/src/resize.cpp:3234:29: note: in instantiation of function template specialization 'cv::inter_area::muladd<double>' requested here
                inter_area::muladd(buf, dsize.width, beta, sum);
                            ^
/opencv/modules/imgproc/src/resize.cpp:3258:18: note: in instantiation of member function 'cv::ResizeArea_Invoker<double, double>::operator()' requested here
                 ResizeArea_Invoker<T, WT>(src, dst, xtab, xtab_size, ytab, ytab_size, tabofs),
                 ^
/opencv/modules/imgproc/src/resize.cpp:3894:9: note: in instantiation of function template specialization 'cv::resizeArea_<double, double>' requested here
        resizeArea_<double, double>, 0
        ^
10 errors generated.

@asmorkalov (Contributor)

@vrabaud, you need to use v_add, v_mul and the other named functions instead of the overloaded +, * and other operators.
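
The build log above shows what the two failing helpers compute: `mul` stores `beta * buf[dx]` into `sum`, and `muladd` adds `beta * buf[dx]` to `sum`. A scalar C++ reference of those two loops, as a sketch for checking the SIMD versions against; the `sketch` namespace is hypothetical, and the SIMD lines in the comments are the expressions from the log rewritten with `v_mul`/`v_add` as suggested, not the merged code:

```cpp
#include <cassert>

namespace sketch {

// sum[dx] = beta * buf[dx]
// SIMD form without operators:
//   vx_store(sum + dx, v_mul(vx_setall(beta), vx_load(buf + dx)));
template <typename WT>
void mul(const WT* buf, int width, WT beta, WT* sum) {
    assert(width >= 0);
    for (int dx = 0; dx < width; ++dx)
        sum[dx] = beta * buf[dx];
}

// sum[dx] += beta * buf[dx]
// SIMD form without operators:
//   vx_store(sum + dx, v_add(vx_load(sum + dx),
//                            v_mul(vx_setall(beta), vx_load(buf + dx))));
template <typename WT>
void muladd(const WT* buf, int width, WT beta, WT* sum) {
    assert(width >= 0);
    for (int dx = 0; dx < width; ++dx)
        sum[dx] += beta * buf[dx];
}

}  // namespace sketch
```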

@vrabaud (Contributor, Author) commented Oct 19, 2023

Thx @asmorkalov, I believe I fixed it by using v_add and v_mul as suggested.

@asmorkalov (Contributor) left a comment

👍
Tested RISC-V config manually.

@asmorkalov (Contributor)

In summary:
I see a 10-20% speed-up in the cases where the scale coefficient is closer to 1, e.g. 0.81 in our perf test. The cases with scales 0.1 and 0.25 perform the same as before. The effect is even more stable in a single-threaded configuration (--perf_threads=1).

@asmorkalov asmorkalov merged commit c96f48e into opencv:4.x Oct 19, 2023
@vrabaud (Contributor, Author) commented Oct 19, 2023

Scales 0.1 and 0.25 correspond to integer ratios (a tenth and a quarter of the original image), and the test image dimensions are divisible by 10 and 4. In that case a different algorithm (areafast) is used, which this pull request does not speed up, so no change is expected there. 0.81 is a non-integer ratio, so the regular INTER_AREA path is used, and that is where the speed-up shows.
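
A simplified sketch of the dispatch described in this comment, assuming the destination size is the source size times the scale, rounded to the nearest integer: the area-fast path applies when both source/destination ratios are integers, and everything else goes through the regular INTER_AREA code. `usesAreaFast` is a hypothetical name, and the actual check in resize.cpp may differ in detail:

```cpp
#include <cassert>
#include <cfloat>
#include <cmath>

// True when src/dst is an integer in both directions (area-fast case),
// false when the generic INTER_AREA path would be taken.
bool usesAreaFast(int srcW, int srcH, double scale) {
    assert(srcW > 0 && srcH > 0 && scale > 0.0 && scale <= 1.0);
    const long dstW = std::lround(srcW * scale);  // assumed rounding rule
    const long dstH = std::lround(srcH * scale);
    assert(dstW > 0 && dstH > 0);
    const double ratioX = static_cast<double>(srcW) / dstW;
    const double ratioY = static_cast<double>(srcH) / dstH;
    return std::fabs(ratioX - std::round(ratioX)) < DBL_EPSILON &&
           std::fabs(ratioY - std::round(ratioY)) < DBL_EPSILON;
}
```

With the perf-test sizes, 0.1 and 0.25 always give integer ratios (10 and 4), while 0.81 never does (the ratio is about 1.23).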

@vrabaud vrabaud deleted the inter_area1 branch October 19, 2023 15:24
@asmorkalov asmorkalov mentioned this pull request Nov 3, 2023
IskXCr pushed a commit to Haosonn/opencv that referenced this pull request Dec 20, 2023
thewoz pushed a commit to thewoz/opencv that referenced this pull request Jan 4, 2024
thewoz pushed a commit to thewoz/opencv that referenced this pull request May 29, 2024
4 participants