Description
Is your feature request related to a problem? Please describe.
cuDF informally states that we follow the Arrow physical memory layout spec. However, we're a little loose on some of the particulars, especially around alignment/padding.

Arrow requires that all allocations be 8B aligned and padded to a multiple of 8B. It recommends expanding this to 64B alignment and 64B padding.

cuDF isn't (formally) following this. Implicitly, we know all allocations made through RMM will be 256B aligned. As it happens, RMM will currently pad out to 8B (though it shouldn't do that). However, due to zero-copy slicing, we can't assume the `column.data<T>()` pointer will be 256B aligned; we can only expect the pointer alignment to be at least `alignof(T)`.

Specifically for null masks, we expect/allocate the null mask of a column to always be padded to 64B.
Describe the solution you'd like
At the very least, we should write down our current expectations. These requirements can best be captured as requirements on the `column_view` class, as `column_view` is the arbiter of how we interact with all memory consumed by libcudf.
- `data` must be aligned to at least the alignment of the underlying type
- It must be safe to dereference any address in the range `[data, data + num_elements * size_of(data_type))`
  - Note: This is currently incompatible with the Arrow spec. e.g., libcudf would currently allow an `INT8` allocation to be any size, whereas Arrow would require it to be padded to a multiple of 8 bytes.
- `null_mask` must be aligned to `alignof(bitmask_type)`
- `null_mask` must point to an allocation padded to a multiple of 64 bytes such that it is safe to dereference any address in the range `[null_mask, null_mask + bitmask_allocation_size_bytes(num_elements))`
Additionally, we should consider expanding the allocation padding requirement such that all allocations are padded out to a multiple of 64B. This could be achieved easily enough with a `device_memory_resource` adapter that rounds allocations up to at least 64B.
Additional Considerations
As best I can tell, these requirements are technically incompatible with the CUDA Array Interface (CAI), because the CAI doesn't say anything about padding/alignment. This seems like something we should push to be addressed in the CAI.