Use buffer pools for reads from disk + in-memory shuffling · Issue #105 · scverse/annbatch · GitHub

Use buffer pools for reads from disk + in-memory shuffling #105

@ilan-gold

Description of feature

In theory, it should be possible to pre-allocate and then reuse memory for:

  1. I/O operations (i.e., reads from disk per anndata). For sparse data this could prove challenging (although I’d guess not impossible) because the data and indices reads are uneven in size. Upper bounds on how much memory needs to be preallocated should certainly be derivable from indptr.
  2. the in-memory shuffle, i.e., we concatenate the read-from-disk data directly into preallocated buffers, shuffle the data that was put into that buffer, yield, and repeat after the next read. This suffers from the same problem as above with uneven buffers for sparse matrices.
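
To make the indptr idea in point 1 concrete, here is a minimal sketch, not annbatch code. It assumes CSR layout and that reads are contiguous row ranges of at most `chunk_rows` rows. The names `max_nnz_per_chunk` and `read_rows_into` are hypothetical, and an in-memory `scipy.sparse` matrix stands in for the on-disk arrays.

```python
import numpy as np
import scipy.sparse as sp


def max_nnz_per_chunk(indptr: np.ndarray, chunk_rows: int) -> int:
    """Upper bound on stored values in any window of `chunk_rows` consecutive rows."""
    n_rows = len(indptr) - 1
    if n_rows <= chunk_rows:
        return int(indptr[-1] - indptr[0])
    # nnz of rows [i, i + chunk_rows) is indptr[i + chunk_rows] - indptr[i]
    return int((indptr[chunk_rows:] - indptr[:-chunk_rows]).max())


def read_rows_into(mat, start, stop, data_buf, indices_buf, indptr_buf):
    """Copy rows [start, stop) of a CSR matrix into preallocated buffers.

    Returns (n_rows, nnz) so the caller knows which prefix of each buffer is valid.
    """
    lo, hi = int(mat.indptr[start]), int(mat.indptr[stop])
    n, nnz = stop - start, hi - lo
    if nnz > len(data_buf) or n + 1 > len(indptr_buf):
        raise ValueError("buffers are too small for this row range")
    data_buf[:nnz] = mat.data[lo:hi]
    indices_buf[:nnz] = mat.indices[lo:hi]
    # Rebase the row pointers so the chunk starts at offset 0, writing in place.
    np.subtract(mat.indptr[start : stop + 1], lo, out=indptr_buf[: n + 1])
    return n, nnz


# Stand-in for one on-disk sparse matrix (assumption: CSR, read by row ranges).
rng = np.random.default_rng(0)
dense = rng.random((50, 20))
dense[dense < 0.8] = 0.0
mat = sp.csr_matrix(dense)

chunk_rows = 8
bound = max_nnz_per_chunk(mat.indptr, chunk_rows)
data_buf = np.empty(bound, dtype=mat.data.dtype)
indices_buf = np.empty(bound, dtype=mat.indices.dtype)
indptr_buf = np.empty(chunk_rows + 1, dtype=mat.indptr.dtype)

# The same three buffers are reused for every read.
for start in range(0, mat.shape[0], chunk_rows):
    stop = min(start + chunk_rows, mat.shape[0])
    n, nnz = read_rows_into(mat, start, stop, data_buf, indices_buf, indptr_buf)
```

The bound is taken over every possible window start, so it also holds if chunks are not aligned to multiples of `chunk_rows`. With a real store, the slice assignments would presumably be replaced by whatever read-into-destination call the backend offers.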

Handling leftover data for the second buffer (i.e., the concat buffer) might make this challenging, but it’s also possible we can at least upper-bound the needed memory and then track how much of the needed data is stored and where.
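
A rough sketch of what that tracking could look like for the dense case, as an illustration under stated assumptions rather than a proposed API. `ConcatBuffer` and its methods are hypothetical names, and the capacity is assumed to upper-bound leftover rows plus one read. The sparse case would additionally need the nnz bound for the data and indices buffers.

```python
import numpy as np


class ConcatBuffer:
    """Preallocated row buffer that tracks where the not-yet-yielded rows live."""

    def __init__(self, capacity_rows, n_vars, dtype=np.float32, seed=0):
        self._buf = np.empty((capacity_rows, n_vars), dtype=dtype)
        self._lo = 0  # start of the valid (not yet yielded) rows
        self._hi = 0  # end of the valid rows
        self._rng = np.random.default_rng(seed)

    def __len__(self):
        return self._hi - self._lo

    def append(self, chunk):
        """Copy a freshly read chunk in behind the leftover rows."""
        if self._lo > 0:
            # Move leftover rows to the front so the tail is free again.
            n_left = self._hi - self._lo
            self._buf[:n_left] = self._buf[self._lo : self._hi]
            self._lo, self._hi = 0, n_left
        n = chunk.shape[0]
        if self._hi + n > self._buf.shape[0]:
            raise ValueError("capacity must upper-bound leftover + chunk rows")
        self._buf[self._hi : self._hi + n] = chunk
        self._hi += n

    def drain(self, batch_size):
        """Shuffle valid rows in place and return views of all full batches.

        The returned views are only valid until the next call to `append`.
        """
        self._rng.shuffle(self._buf[self._lo : self._hi])
        n_full = (self._hi - self._lo) // batch_size
        batches = [
            self._buf[self._lo + i * batch_size : self._lo + (i + 1) * batch_size]
            for i in range(n_full)
        ]
        self._lo += n_full * batch_size  # rows past this point are the leftover
        return batches


buf = ConcatBuffer(capacity_rows=10, n_vars=3)
buf.append(np.arange(21, dtype=np.float32).reshape(7, 3))
first = [b.copy() for b in buf.drain(batch_size=4)]  # one batch, 3 rows left over
buf.append(np.arange(21, 36, dtype=np.float32).reshape(5, 3))
second = [b.copy() for b in buf.drain(batch_size=4)]  # two batches, nothing left
```

Returning views avoids a copy per batch, at the cost that the consumer has to finish with them before the next read lands in the buffer.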

The benefit here would of course be not having to spend time allocating memory. Is this actually a bottleneck though? Maybe, maybe not.
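
One cheap way to get a first answer would be a micro-benchmark along these lines. It is only a sketch: the chunk shapes are made up, and it compares `np.concatenate` with and without a preallocated `out` array rather than the real loader path, so the numbers would be indicative at best.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)
# Made-up sizes: 8 dense chunks of 1000 x 256 float32 (about 8 MB in total).
chunks = [rng.random((1_000, 256), dtype=np.float32) for _ in range(8)]
n_rows = sum(c.shape[0] for c in chunks)
out = np.empty((n_rows, chunks[0].shape[1]), dtype=np.float32)


def fresh():
    # Allocates a new result array on every call.
    return np.concatenate(chunks)


def reused():
    # Writes into the same preallocated array on every call.
    return np.concatenate(chunks, out=out)


t_fresh = min(timeit.repeat(fresh, number=20, repeat=5))
t_reused = min(timeit.repeat(reused, number=20, repeat=5))
print(f"fresh allocation: {t_fresh:.4f}s, reused buffer: {t_reused:.4f}s (20 calls each)")
```

Whether any gap shows up likely depends on array size and the allocator, so this would need to be rerun with realistic chunk sizes before drawing conclusions.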

Metadata

    Labels

    enhancement: New feature or request
