8000 [V3] Support for batched Store API in v3 · Issue #1806 · zarr-developers/zarr-python · GitHub
[go: up one dir, main page]

Skip to content
[V3] Support for batched Store API in v3 #1806
Open
@akshaysubr

Description

@akshaysubr

The Store API in v3 supports fetching a single key at a time or partial values of multiple keys.

class Store(ABC):
    @abstractmethod
    async def get(
        self, key: str, byte_range: Optional[Tuple[int, Optional[int]]] = None
    ) -> Optional[bytes]:
        ...

    @abstractmethod
    async def get_partial_values(
        self, key_ranges: List[Tuple[str, Tuple[int, int]]]
    ) -> List[bytes]:
        ...

The new BatchedCodecPipeline, for instance, currently fetches data for all requested chunks concurrently by using an asyncio thread pool with each task calling the get method of the Store:

            chunk_bytes_batch = await concurrent_map(
                [(byte_getter,) for byte_getter, _, _, _ in batch_info],
                lambda byte_getter: byte_getter.get(),
                runtime_configuration.concurrency,
            )

Since there have been some concerns about scalability of asyncio for a large number of tasks, would it make sense to move this batched fetch into the Store itself? This would allow another Store implementation to potentially use a more performant asynchronous framework for the batched fetch, say in C++ or Rust, and can look like a single asyncio task to zarr-python.

This is a feature that currently exists in v2 through the getitems Store API which is used to enable GPU Direct Storage in Zarr through kvikIO.

A similar feature is also now added (provisionally) to v3 codecs that support an encode_batch and a decode_batch with a default implementation that ships tasks off to an asyncio thread pool but allows a codec to override that in favor or another approach if needed.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0