GAPI Fluid: Dynamic dispatching for Split3 kernel by alexgiving · Pull Request #21441 · opencv/opencv · GitHub

GAPI Fluid: Dynamic dispatching for Split3 kernel #21441

Merged
opencv-pushbot merged 1 commit into opencv:4.x from atrutnev/split3_simd_fluid
Jan 25, 2022

Conversation

alexgiving
Member
@alexgiving alexgiving commented Jan 13, 2022

Split3 SIMD.xlsx

Pull Request Readiness Checklist

See details at https://github.com/opencv/opencv/wiki/How_to_contribute#making-a-good-pull-request

  • I agree to contribute to the project under Apache 2 License.
  • To the best of my knowledge, the proposed patch is not based on a code under GPL or other license that is incompatible with OpenCV
  • The PR is proposed to proper branch
  • There is reference to original bug report and related work
  • There is accuracy test, performance test and test data in opencv_extra repository, if applicable
    Patch to opencv_extra has the same branch name.
  • The feature is well documented and sample code can be built with the project CMake
force_builders=Linux AVX2,Custom,Custom Win,Custom Mac
build_gapi_standalone:Linux x64=ade-0.1.1f
build_gapi_standalone:Win64=ade-0.1.1f
Xbuild_gapi_standalone:Mac=ade-0.1.1f
build_gapi_standalone:Linux x64 Debug=ade-0.1.1f

build_image:Custom=centos:7
buildworker:Custom=linux-1
build_gapi_standalone:Custom=ade-0.1.1f

Xbuild_image:Custom=ubuntu-openvino-2021.3.0:20.04
build_image:Custom Win=openvino-2021.4.1
build_image:Custom Mac=openvino-2021.2.0

buildworker:Custom Win=windows-3

test_modules:Custom=gapi,python2,python3,java
test_modules:Custom Win=gapi,python2,python3,java
test_modules:Custom Mac=gapi,python2,python3,java

buildworker:Custom=linux-1
# disabled due high memory usage: test_opencl:Custom=ON
Xtest_opencl:Custom=OFF
Xtest_bigdata:Custom=1
Xtest_filter:Custom=*

CPU_BASELINE:Custom Win=AVX512_SKX
CPU_BASELINE:Custom=SSE4_2

@@ -207,6 +207,18 @@ ABSDIFFC_SIMD(float)

#undef ABSDIFFC_SIMD

#define SPLIT3_SIMD(SRC, DST) \
int split3_simd(const SRC in[], DST out1[], DST out2[], \
DST out3[], const int width) \
Member
@anna-khakimova anna-khakimova Jan 21, 2022

It makes no sense to define this function through a macro for this kernel. Please make it a normal function.

}
#endif
#if CV_SIMD
w = split3_simd(in, out1, out2, out3, width);

Member

Please remove the empty line.

//-------------------------

#define SPLIT3_SIMD(SRC, DST) \
int split3_simd(const SRC in[], DST out1[], DST out2[], \
Member
@anna-khakimova anna-khakimova Jan 21, 2022

It makes no sense to define this function through a macro for this kernel. Please make it a normal function with the uchar type for all parameters.
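
As a rough illustration of the request above, the macro could be replaced by an ordinary function whose parameters are all `uchar`. The sketch below is a scalar stand-in, not the code from this PR: the name `split3_plain` is hypothetical, `uchar` is aliased locally instead of coming from OpenCV headers, and no SIMD intrinsics are used.

```cpp
// Local alias so the sketch compiles without OpenCV headers
// (assumption: OpenCV's uchar is an unsigned char).
using uchar = unsigned char;

// Hypothetical plain-function form: one source type, one destination type,
// no macro. Splits interleaved 3-channel data into three planes and
// returns the number of pixels processed.
int split3_plain(const uchar in[], uchar out1[], uchar out2[],
                 uchar out3[], const int width)
{
    int x = 0;
    for (; x < width; ++x)
    {
        out1[x] = in[3 * x];
        out2[x] = in[3 * x + 1];
        out3[x] = in[3 * x + 2];
    }
    return x;
}
```

With a single element type there is nothing left for the macro parameters to vary, which is the point of the review comment.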

@@ -184,6 +184,14 @@ ABSDIFFC_SIMD(float)

#undef ABSDIFFC_SIMD

#define SPLIT3_SIMD(DST1, DST2, DST3, SRC) \
int split3_simd(const SRC in[], DST1 out1[], DST2 out2[], \
Member
@anna-khakimova anna-khakimova Jan 21, 2022

Also, there is only one DST type here, so three different DST types are redundant.
This construct must be a forward declaration for this function, so this macro would be the same as this

NOTE:
It makes no sense to define this function through a macro for this kernel. Please make it a regular function.

int x = 0; \
for (; x <= width - nlanes; x += nlanes) \
{ \
v_uint8 a, b, c; \
Member

It would be better to move the creation and initialization of these variables out of the loop.

@alexgiving alexgiving force-pushed the atrutnev/split3_simd_fluid branch from 2d19f3f to 0b8d606 Compare January 21, 2022 12:10
Member
@anna-khakimova anna-khakimova left a comment

LGTM 👍

{
constexpr int nlanes = v_uint8::nlanes;
int x = 0;
v_uint8 a, b, c;
Member

v_uint8 a, b, c;

Why is it outside of the loop body?

Keep variable declarations near their usage.

Member Author

Is it necessary to create vectors at each iteration?

Member

Is it necessary to keep vector values between iterations?

Member
@anna-khakimova anna-khakimova Jan 25, 2022

Is it necessary to keep vector values between iterations?

@alalek
There is no need to call setzero() on each iteration of the loop, since the vectors are initialized by the v_deinterleave() function before any calculations. So their creation and first initialization with zeroes can be moved out of the loop.

Member

Where do you see "setzero()" here?

A good rule of thumb for local variables is to declare them where they are used.
There is no need to hoist them across a for-loop boundary or any other scope, especially for SIMD register wrappers.
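
The rule of thumb above can be sketched as follows. This is a minimal, self-contained illustration and not the merged kernel: `Vec16`, `load_deinterleave3`, `store` and `split3_blocks` are hypothetical stand-ins for a SIMD register wrapper and its load/store helpers, written in scalar C++ so that the example builds without OpenCV.

```cpp
#include <cstring>

using uchar = unsigned char;

// Hypothetical stand-in for a SIMD register wrapper such as v_uint8;
// a real wrapper would hold a hardware vector register.
struct Vec16
{
    static constexpr int nlanes = 16;
    uchar lane[nlanes];
};

// Hypothetical helper: reads 3 * nlanes interleaved bytes into three vectors.
inline void load_deinterleave3(const uchar* src, Vec16& a, Vec16& b, Vec16& c)
{
    for (int i = 0; i < Vec16::nlanes; ++i)
    {
        a.lane[i] = src[3 * i];
        b.lane[i] = src[3 * i + 1];
        c.lane[i] = src[3 * i + 2];
    }
}

// Hypothetical helper: writes one vector to memory.
inline void store(uchar* dst, const Vec16& v)
{
    std::memcpy(dst, v.lane, Vec16::nlanes);
}

// Processes whole vectors only and returns how many pixels were handled;
// the caller is expected to finish the remaining width - x pixels.
int split3_blocks(const uchar in[], uchar out1[], uchar out2[],
                  uchar out3[], const int width)
{
    constexpr int nlanes = Vec16::nlanes;
    int x = 0;
    for (; x <= width - nlanes; x += nlanes)
    {
        // Declared at the point of use: each iteration fully overwrites
        // a, b and c, so nothing needs to survive across iterations.
        Vec16 a, b, c;
        load_deinterleave3(&in[3 * x], a, b, c);
        store(&out1[x], a);
        store(&out2[x], b);
        store(&out3[x], c);
    }
    return x;
}
```

Because the three vectors are plain locals with no initializer, declaring them inside the loop body adds no per-iteration work in this sketch; it only narrows their scope.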

Member

Ok.

Member Author

Done

@dmatveev dmatveev added this to the 4.6.0 milestone Jan 24, 2022
Contributor
@dmatveev dmatveev left a comment

👍

@alexgiving alexgiving force-pushed the atrutnev/split3_simd_fluid branch 2 times, most recently from 8bb494b to 5e89b9a Compare January 25, 2022 12:03
@anna-khakimova anna-khakimova requested a review from alalek January 25, 2022 13:44
@opencv-pushbot opencv-pushbot merged commit 9238316 into opencv:4.x Jan 25, 2022
@alalek alalek mentioned this pull request Feb 22, 2022