Description
The `Store` API in v3 supports fetching a single key at a time or partial values of multiple keys:
```python
from abc import ABC, abstractmethod
from typing import List, Optional, Tuple


class Store(ABC):
    @abstractmethod
    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        ...

    @abstractmethod
    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[bytes]:
        ...
```
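To make the interface concrete, here is a minimal in-memory implementation sketch. `MemoryStore` is illustrative, not an actual zarr-python class, and it assumes `byte_range` is an `(offset, length)` pair with `length=None` meaning "to the end":

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Dict, List, Optional, Tuple


class Store(ABC):
    @abstractmethod
    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        ...

    @abstractmethod
    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[bytes]:
        ...


class MemoryStore(Store):
    """Illustrative in-memory Store; not the actual zarr-python class."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data

    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        value = self._data.get(key)
        if value is None or byte_range is None:
            return value
        # Assumed semantics: (offset, length), length=None => rest of value.
        start, length = byte_range
        end = None if length is None else start + length
        return value[start:end]

    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[bytes]:
        # Fetch each (key, (offset, length)) range concurrently.
        return await asyncio.gather(*(self.get(k, r) for k, r in key_ranges))


store = MemoryStore({"c/0": b"hello world"})
print(asyncio.run(store.get("c/0", (0, 5))))  # b'hello'
```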
The new `BatchedCodecPipeline`, for instance, currently fetches data for all requested chunks concurrently by using an asyncio thread pool, with each task calling the `get` method of the `Store`:
```python
chunk_bytes_batch = await concurrent_map(
    [(byte_getter,) for byte_getter, _, _, _ in batch_info],
    lambda byte_getter: byte_getter.get(),
    runtime_configuration.concurrency,
)
```
Since there have been some concerns about the scalability of asyncio for a large number of tasks, would it make sense to move this batched fetch into the `Store` itself? That would allow another `Store` implementation to use a more performant asynchronous framework for the batched fetch, say in C++ or Rust, while appearing to zarr-python as a single asyncio task.
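One possible shape for this is a batched method on the `Store` with a default asyncio fan-out that subclasses can override. The method name `get_batch` and the `DictStore` subclass are hypothetical, used only to illustrate the override point:

```python
import asyncio
from typing import Dict, List, Optional


class Store:
    async def get(self, key: str) -> Optional[bytes]:
        raise NotImplementedError

    async def get_batch(self, keys: List[str]) -> List[Optional[bytes]]:
        # Default: fan out to get() as asyncio tasks. A Store backed by a
        # C++/Rust I/O engine could override this with one native call,
        # presenting the whole batch to zarr-python as a single asyncio task.
        return await asyncio.gather(*(self.get(k) for k in keys))


class DictStore(Store):
    """Hypothetical Store used only to exercise the default get_batch."""

    def __init__(self, data: Dict[str, bytes]) -> None:
        self._data = data

    async def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)
```

A native implementation would override `get_batch` directly and never touch the per-key `get` path, so the batching strategy becomes a store-level concern rather than a pipeline-level one.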
This is a feature that currently exists in v2 through the `getitems` Store API, which is used to enable GPU Direct Storage in Zarr through kvikIO.
A similar feature is also now added (provisionally) to v3 codecs, which support an `encode_batch` and a `decode_batch` with a default implementation that ships tasks off to an asyncio thread pool but allows a codec to override that in favor of another approach if needed.
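The codec-side default can be sketched as follows. The class names and the use of `asyncio.to_thread` are assumptions for illustration; only the `encode_batch` method name and the thread-pool default come from the description above:

```python
import asyncio
import gzip
from typing import List, Sequence


class Codec:
    def encode(self, chunk: bytes) -> bytes:
        raise NotImplementedError

    async def encode_batch(self, chunks: Sequence[bytes]) -> List[bytes]:
        # Default: run each (CPU-bound) encode in asyncio's default thread
        # pool; a codec can override this method with a natively batched
        # implementation (e.g. a GPU or Rust backend) if needed.
        return await asyncio.gather(
            *(asyncio.to_thread(self.encode, c) for c in chunks)
        )


class GzipCodec(Codec):
    """Illustrative codec exercising the default batched path."""

    def encode(self, chunk: bytes) -> bytes:
        return gzip.compress(chunk)
```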