Description
I am mostly creating this issue in the hope of a short discussion. Maybe @ahaldane you have an opinion on this?
Currently, we have the `alignment` on dtypes, which works fine. But the dtype copy code actually uses the full itemsize sometimes. This is relevant only for complex data types as far as I can tell, because `np.complex64` has a 32-bit alignment, since that is the alignment of the embedded floats.
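To make the mismatch concrete, here is a small sketch that prints the item size and alignment NumPy reports for the dtypes involved. It only uses the public `itemsize` and `alignment` attributes of `np.dtype`; the exact alignment values depend on the platform and compiler, so no output is shown.

```python
import numpy as np

c64 = np.dtype(np.complex64)
f32 = np.dtype(np.float32)
u64 = np.dtype(np.uint64)

# complex64 is two float32 values, so it is 8 bytes wide but
# only needs the alignment of a single float32.
print("complex64:", c64.itemsize, c64.alignment)
print("float32:  ", f32.itemsize, f32.alignment)

# uint64 has the same width as complex64, which is why it is a
# tempting type for a generic 8-byte copy loop.
print("uint64:   ", u64.itemsize, u64.alignment)
```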
The copy code in some cases will use `uint64` to copy a `complex64` currently, which is convenient, because we can reuse this for all types. But, it breaks alignment and requires us to check against both alignments (although in most cases they match).
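The case where the two alignments disagree can be reproduced from Python. The sketch below places a `complex64` view at an address that is a multiple of 4 but not of 8, by slicing a byte buffer at a computed offset. The variable names are only illustrative, and it assumes the dtype alignment of `complex64` is 4 bytes, as described above.

```python
import numpy as np

raw = np.zeros(64, dtype=np.uint8)

# Choose an offset so that the view starts at an address equal to 4 mod 8.
offset = (4 - raw.ctypes.data) % 8
arr = raw[offset:offset + 16].view(np.complex64)  # two complex64 elements

addr = arr.ctypes.data

# Aligned as far as the dtype is concerned (assumes alignment == 4) ...
ok_for_dtype = addr % arr.dtype.alignment == 0
# ... but not aligned to the full 8-byte itemsize a uint64 copy would want.
ok_for_itemsize = addr % arr.dtype.itemsize == 0

print(ok_for_dtype, ok_for_itemsize, arr.flags.aligned)
```

An array like this satisfies the dtype's own requirement, so any code that copies it through `uint64` has to perform a second, stricter check.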
It would be nice to solve this in the long run. My main issue is that it is confusing that alignment is usually 32-bit but sometimes 64-bit here. Possible solutions:
- Define the alignment as 64-bit for this complex dtype, even though that is usually much stricter than necessary.
- Specialize complex copy code so that it cannot have any alignment issues (does not use `uint64` internally).
- Signal the alignment specific to a certain functionality. This is possible, but annoying, since we need the alignment requirement before getting the final inner-loop function, because we would like to set up buffering first.
There might be one other "middle" ground: Require complex to provide a 32-bit aligned copy function, but actually use the current `uint64` copy function as a fast-path (because at that point, we already know that just copying the data is sufficient).
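As a rough illustration of that "middle" ground, the following Python sketch dispatches between a 64-bit fast path and a 32-bit fallback based on the actual addresses. It is not NumPy's C copy code: `copy_complex64`, `is_aligned_to`, and `complex64_at` are hypothetical helpers, and the sketch is restricted to 1-D contiguous arrays to keep it short.

```python
import numpy as np

def is_aligned_to(arr, nbytes):
    """True if the first element of arr starts at a multiple of nbytes."""
    return arr.ctypes.data % nbytes == 0

def copy_complex64(dst, src):
    """Copy src into dst (1-D contiguous complex64) and report the path used."""
    assert src.dtype == np.complex64 and dst.dtype == np.complex64
    assert src.ndim == 1 and src.shape == dst.shape
    assert src.flags.c_contiguous and dst.flags.c_contiguous
    if is_aligned_to(src, 8) and is_aligned_to(dst, 8):
        # Fast path: one 64-bit move per element.
        dst.view(np.uint64)[...] = src.view(np.uint64)
        return "uint64 fast path"
    # Fallback: two 32-bit moves per element, needs only 4-byte alignment.
    dst.view(np.uint32)[...] = src.view(np.uint32)
    return "32-bit path"

def complex64_at(n, mod8):
    """Hypothetical helper: n complex64 elements at an address equal to mod8 (mod 8)."""
    raw = np.zeros(n * 8 + 16, dtype=np.uint8)
    offset = (mod8 - raw.ctypes.data) % 8
    return raw[offset:offset + n * 8].view(np.complex64)

src = complex64_at(3, 0)
src[:] = [1 + 2j, 3 - 4j, 5j]

path_aligned = copy_complex64(complex64_at(3, 0), src)

dst_misaligned = complex64_at(3, 4)
path_misaligned = copy_complex64(dst_misaligned, src)

print(path_aligned, "|", path_misaligned)
```

The point of the design is that the dtype only ever advertises the 32-bit requirement, while the stricter 64-bit check stays a private detail of the copy routine.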