automatically specifying shards and chunks · Issue #2572 · zarr-developers/zarr-python · GitHub
automatically specifying shards and chunks #2572
Closed
@d-v-b

Description

I think it would be great if zarr-python could automatically pick a smart shard shape and chunk shape for users, based on an array shape and a dtype (i.e., the stuff that we will know if a user is coming in with a numpy array). Good defaults would make a lot of users happy.

Off the top of my head, the following constraints should factor into the automatic shard shape / chunk shape:

  • min / max size (in bytes)
  • min / max count
  • shape constraints. some examples:
    • chunks must tile the shard perfectly (non-configurable)
    • chunks should have 1 axis length that is fixed to a constant, other lengths can vary to satisfy other constraints
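To make a couple of these constraints concrete, here is a rough sketch of a chunk guesser that honours a maximum chunk size in bytes and an optional pinned axis length. Everything in it is an assumption for illustration: `guess_chunks`, `chunk_count`, `max_bytes`, and `fixed_axes` are hypothetical names, the 16 MiB default is arbitrary, and the halving heuristic is just one simple option, not anything zarr-python is known to do. Minimum-size and count limits are not enforced here.

```python
import math


def guess_chunks(shape, itemsize, *, max_bytes=16 * 2**20, fixed_axes=None):
    """Halve the longest free axis until one chunk fits within max_bytes.

    fixed_axes maps an axis index to a pinned chunk length, e.g. {0: 1}.
    All names and defaults here are hypothetical.
    """
    fixed = dict(fixed_axes or {})
    chunks = [fixed.get(axis, n) for axis, n in enumerate(shape)]
    while math.prod(chunks) * itemsize > max_bytes:
        # Only axes that are not pinned and can still shrink are candidates.
        free = [a for a in range(len(chunks)) if a not in fixed and chunks[a] > 1]
        if not free:
            break  # the size limit cannot be met; return the best effort
        longest = max(free, key=lambda a: chunks[a])
        chunks[longest] = math.ceil(chunks[longest] / 2)
    return tuple(chunks)


def chunk_count(shape, chunks):
    """Number of chunks needed to cover the array, counting partial edge chunks."""
    return math.prod(math.ceil(n / c) for n, c in zip(shape, chunks))
```

For a `(1000, 1000, 1000)` array with a 4-byte dtype this yields chunks of `(125, 125, 250)`, i.e. 256 chunks. With `fixed_axes={0: 1}` on a `(100, 2048, 2048)` array of 2-byte items, the chunk stays at `(1, 2048, 2048)` because a single plane already fits.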

It might be useful to combine a size constraint on shards with a mixed size / shape constraint on chunks, e.g. "chunks should be ~isotropic, divisible by a power of 2 on each side, inside a shard that is at most 100 MB".
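One way to read that example, sketched in Python. The assumptions are mine: "100 MB" is taken as 100 * 10**6 bytes, the chunk target of 1 MiB is arbitrary, "divisible by a power of 2" is simplified to "each side is a power of 2", and the function names are hypothetical. The shard is built as a whole number of chunks per axis, so the chunks tile it perfectly by construction.

```python
import math


def isotropic_pow2_chunks(ndim, itemsize, target_bytes=2**20):
    """Largest cubic chunk with a power-of-2 side that fits in target_bytes."""
    side = 1
    while (2 * side) ** ndim * itemsize <= target_bytes:
        side *= 2
    return (side,) * ndim


def shard_from_chunks(shape, chunks, itemsize, max_shard_bytes=100 * 10**6):
    """Double the chunks-per-shard count axis by axis while the shard fits."""
    per_shard = [1] * len(chunks)
    grew = True
    while grew:
        grew = False
        for axis in range(len(chunks)):
            # Do not grow an axis once the shard already spans the array there.
            if chunks[axis] * per_shard[axis] >= shape[axis]:
                continue
            trial = list(per_shard)
            trial[axis] *= 2
            nbytes = math.prod(c * k for c, k in zip(chunks, trial)) * itemsize
            if nbytes <= max_shard_bytes:
                per_shard = trial
                grew = True
    return tuple(c * k for c, k in zip(chunks, per_shard))
```

For a 3D float32 array of shape `(1000, 1000, 1000)` this gives `(64, 64, 64)` chunks (exactly 1 MiB each) inside `(256, 256, 256)` shards (64 MiB, 64 chunks per shard).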

It's also possible that these constraints should be configurable, via the global config or via keyword arguments to array creation.
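A minimal sketch of how that precedence could work, with keyword arguments overriding global defaults. The dictionary, its keys, and `resolve_constraints` are all hypothetical; none of them are existing zarr-python config keys or functions.

```python
from collections import ChainMap

# Hypothetical global defaults; these keys are illustrative only.
GLOBAL_DEFAULTS = {
    "max_chunk_bytes": 2**20,
    "max_shard_bytes": 100 * 10**6,
}


def resolve_constraints(**overrides):
    """Merge per-call keyword overrides on top of the global defaults."""
    given = {k: v for k, v in overrides.items() if v is not None}
    unknown = set(given) - set(GLOBAL_DEFAULTS)
    if unknown:
        raise TypeError(f"unknown constraint(s): {sorted(unknown)}")
    # ChainMap looks up keys in `given` first, then falls back to the defaults.
    return dict(ChainMap(given, GLOBAL_DEFAULTS))
```

Calling `resolve_constraints(max_shard_bytes=50 * 10**6)` would change only the shard limit and leave the chunk limit at its global default.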

Any thoughts? @jbms if you have any tensorstore stories to share about this I would be very interested.

Labels: enhancement (New features or improvements)
