[FEA] Document cuDF allocation alignment/padding requirements #9389
@jrhemstad

Description

Is your feature request related to a problem? Please describe.

cuDF informally states that we follow the Arrow physical memory layout spec. However, we're a little loose on some of the particulars, especially around alignment/padding.

Arrow requires that all allocations be 8B aligned and padded to a multiple of 8B. It recommends expanding this to 64B alignment and 64B padding.

cuDF isn't (formally) following this. Implicitly, we know all allocations made through RMM will be 256B aligned. As it happens, RMM currently pads allocations out to 8B (though it shouldn't do that). However, due to zero-copy slicing, we can't assume the column.data<T>() pointer will be 256B aligned; we can only expect the pointer alignment to be at least alignof(T).
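
For illustration, here is a minimal sketch of why slicing weakens the alignment guarantee (the sliced_data helper and the INT32 element type are just for the example):

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/copying.hpp>
#include <cudf/types.hpp>

#include <vector>

// Zero-copy slicing returns a view whose data<T>() is the parent's base
// pointer plus an element offset. Even if the parent allocation is 256B
// aligned, the sliced pointer is only guaranteed to be alignof(T) aligned.
int32_t const* sliced_data(cudf::column_view const& parent)
{
  std::vector<cudf::size_type> indices{3, 10};  // elements [3, 10)
  auto const slice = cudf::slice(parent, indices).front();
  // For an INT32 parent whose base pointer is 256B aligned, this pointer is
  // base + 3 elements == base + 12 bytes: 4B aligned, not 256B aligned.
  return slice.data<int32_t>();
}
```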

Specifically for null masks, we both allocate and expect the null mask of a column to always be padded to 64B.
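
For reference, libcudf exposes this computation as cudf::bitmask_allocation_size_bytes, whose default padding boundary is 64 bytes:

```cpp
#include <cudf/null_mask.hpp>

#include <cstddef>

// A 100-row column needs ceil(100 / 8) == 13 bytes of mask bits, but the
// mask allocation is rounded up to the 64B padding boundary.
std::size_t const mask_bytes = cudf::bitmask_allocation_size_bytes(100);  // == 64
```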

Describe the solution you'd like

At the very least, we should write down our current expectations. These requirements are best captured as requirements on the column_view class, as column_view is the arbiter of how we interact with all memory consumed by libcudf. (A sketch expressing these requirements as code follows the list below.)

  • data must be aligned to at least the alignment of the underlying type
  • It must be safe to dereference any address in the range [data, data + num_elements * size_of(data_type))
    • Note: This is currently incompatible with the Arrow spec. e.g., libcudf would currently allow an INT8 allocation to be any size, whereas Arrow would require it to be padded to a multiple of 8 bytes.
  • null_mask must be aligned to alignof(bitmask_type)
  • null_mask must point to an allocation padded to a multiple of 64 bytes such that it is safe to dereference any address in the range [null_mask, null_mask + bitmask_allocation_size_bytes(num_elements))
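
As a sketch, the alignment requirements above could be expressed as runtime checks along these lines (is_aligned and satisfies_alignment_requirements are hypothetical helpers, not part of libcudf):

```cpp
#include <cudf/column/column_view.hpp>
#include <cudf/types.hpp>

#include <cstddef>
#include <cstdint>

// True if pointer p is aligned to `alignment` bytes.
bool is_aligned(void const* p, std::size_t alignment)
{
  return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}

// Checks the data and null mask alignment requirements for a fixed-width
// column of element type T.
template <typename T>
bool satisfies_alignment_requirements(cudf::column_view const& col)
{
  bool const data_ok = is_aligned(col.data<T>(), alignof(T));
  bool const mask_ok =
    not col.nullable() or is_aligned(col.null_mask(), alignof(cudf::bitmask_type));
  return data_ok and mask_ok;
}
```

Note the padding requirements can't be verified this way: a view doesn't know the size of the underlying allocation.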

Additionally, we should consider expanding the padding requirement such that all allocations are padded out to 64B. This could be achieved easily enough with a device_memory_resource adaptor that rounds allocation sizes up to a multiple of 64B.
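
A minimal sketch of such an adaptor, assuming an RMM release where only do_allocate/do_deallocate are pure virtual (some releases also require overriding do_get_mem_info and the supports_* queries); padding_resource_adaptor is hypothetical, though similar in spirit to RMM's aligned_resource_adaptor:

```cpp
#include <rmm/cuda_stream_view.hpp>
#include <rmm/mr/device/device_memory_resource.hpp>

#include <cstddef>

// Hypothetical adaptor that rounds every allocation size up to a multiple
// of 64B before forwarding to the upstream resource. Deallocation rounds
// the same way so the upstream sees matching sizes.
class padding_resource_adaptor final : public rmm::mr::device_memory_resource {
 public:
  explicit padding_resource_adaptor(rmm::mr::device_memory_resource* upstream)
    : upstream_{upstream}
  {
  }

 private:
  static constexpr std::size_t padding = 64;

  static std::size_t padded_size(std::size_t bytes)
  {
    return ((bytes + padding - 1) / padding) * padding;
  }

  void* do_allocate(std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    return upstream_->allocate(padded_size(bytes), stream);
  }

  void do_deallocate(void* ptr, std::size_t bytes, rmm::cuda_stream_view stream) override
  {
    upstream_->deallocate(ptr, padded_size(bytes), stream);
  }

  rmm::mr::device_memory_resource* upstream_;
};
```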

Additional Considerations

As best I can tell, these requirements are technically incompatible with the CUDA Array Interface because the CAI doesn't say anything about padding/alignment. This seems like something we should push to be addressed in the CAI.

Labels

Performance (Performance related issue) · Python (Affects Python cuDF API) · doc (Documentation) · feature request (New feature or request) · libcudf (Affects libcudf (C++/CUDA) code) · proposal (Change current process or code)
