[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unit stride access of blocked Coefficients #713

Open
michalhabera opened this issue Sep 11, 2024 · 3 comments
Open

Unit stride access of blocked Coefficients #713

michalhabera opened this issue Sep 11, 2024 · 3 comments

Comments

@michalhabera
Copy link
Contributor

Running some benchmarks with @jhale we've noticed that there was a performance regression introduced in #645.

More specifically, before the above PR any blocked Coeffcient (e.g. a vector-valued material property) would be first copied into a temporary outside of the quadrature loop such that in the hot loop it is accessed with unit stride, see removed logic #645

For higher order quadrature rules (but otherwise simple scalar trees) this could lead to >30% performance regression.

@IgorBaratta
Copy link
Member

I can add the logic back, but I think that the best place to do this is when packing coefficients in dolfinx.
Otherwise we repeat the packing step (pack coefficients in a given order, then pack them again in a different order).

More specifically I think ffcx kernels should receive data in XXXXXYYYYYZZZZ format and not XYZXYZXYZ...

@IgorBaratta
Copy link
Member

When I introduced temporaries to avoid strided memory accesses, it was intended as a short-term solution, as it only works with vector elements.

@michalhabera
Copy link
Contributor Author

Agree that this should be done in dolfinx when packing coefficients. I do not remember if coeff packing is templated based on the block size, so there could be small performance drawback when packing is done over non-compile-time constant loop.

I'd suggest to keep this issue open, it will require changes in the coefficient access in FFCx too.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants