Description
I think it would be great if zarr-python could automatically pick a smart shard shape and chunk shape for users, based on an array's shape and dtype (i.e., the information we already have when a user comes in with a numpy array). Good defaults would make a lot of users happy.
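As a rough baseline for what "pick a chunk shape from shape + dtype" could mean, here is a minimal sketch. It is not zarr-python API: `guess_chunk_shape` and the 1 MB `target_nbytes` default are hypothetical. It repeatedly halves the longest axis until one chunk fits in the byte budget.

```python
import math

import numpy as np


def guess_chunk_shape(shape, dtype, target_nbytes=1_000_000):
    """Hypothetical heuristic: halve the longest axis until a chunk fits in target_nbytes."""
    itemsize = np.dtype(dtype).itemsize
    chunk = list(shape)
    # Halve the currently longest axis (rounding up) until the chunk is small enough.
    while math.prod(chunk) * itemsize > target_nbytes and max(chunk) > 1:
        axis = chunk.index(max(chunk))
        chunk[axis] = math.ceil(chunk[axis] / 2)
    return tuple(chunk)


print(guess_chunk_shape((10_000, 10_000), "float64"))  # → (313, 313)
```

This only bounds chunk size; 313 does not divide 10,000, so edge chunks are partial, and it says nothing about shards. That is exactly why the extra constraints below matter.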
Off the top of my head, the following constraints should factor into the automatic choice of shard shape and chunk shape:
- min / max size (in bytes)
- min / max count
- shape constraints, for example:
  - chunks must tile the shard perfectly (non-configurable)
  - chunks should have one axis whose length is fixed to a constant; the other lengths can vary to satisfy the remaining constraints
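To make the constraint list concrete, here is a hypothetical checker (`check_chunking` and all its parameter names are made up for illustration). It assumes "count" means the number of chunks per shard, which is one possible reading. It returns the list of violated constraints for a candidate shard/chunk pair.

```python
import math

import numpy as np


def check_chunking(shard_shape, chunk_shape, dtype, *,
                   min_chunk_nbytes=0, max_chunk_nbytes=math.inf,
                   min_chunks_per_shard=1, max_chunks_per_shard=math.inf,
                   fixed_axis=None, fixed_length=None):
    """Hypothetical: return violated constraints (empty list if the layout is acceptable)."""
    if len(shard_shape) != len(chunk_shape):
        return ["shard and chunk shapes have different ranks"]
    if any(c <= 0 for c in chunk_shape):
        return ["chunk lengths must be positive"]
    problems = []
    # Hard rule: chunks must tile the shard exactly along every axis.
    if any(s % c != 0 for s, c in zip(shard_shape, chunk_shape)):
        problems.append("chunks do not tile the shard")
    # Size constraint on a single chunk, in bytes.
    chunk_nbytes = math.prod(chunk_shape) * np.dtype(dtype).itemsize
    if not min_chunk_nbytes <= chunk_nbytes <= max_chunk_nbytes:
        problems.append(f"chunk size {chunk_nbytes} B out of range")
    # Count constraint (assumed here to mean chunks per shard).
    count = math.prod(s // c for s, c in zip(shard_shape, chunk_shape))
    if not min_chunks_per_shard <= count <= max_chunks_per_shard:
        problems.append(f"{count} chunks per shard out of range")
    # Shape constraint: one axis pinned to a constant length.
    if fixed_axis is not None and chunk_shape[fixed_axis] != fixed_length:
        problems.append(f"axis {fixed_axis} must have length {fixed_length}")
    return problems


print(check_chunking((1024, 1024), (256, 256), "uint16", max_chunk_nbytes=1 << 20))  # → []
print(check_chunking((1000, 1000), (300, 250), "uint16"))  # → ['chunks do not tile the shard']
```

A search over candidate layouts could then keep only those for which this returns an empty list.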
It might be useful to apply a size constraint to shards and a mixed size/shape constraint to chunks, e.g. "chunks should be ~isotropic, divisible by a power of 2 along each axis, inside a shard that is at most 100 MB".
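One possible reading of that example, as a sketch: it interprets "divisible by a power of 2" as "each chunk edge is a power of 2", and it assumes a 1 MiB per-chunk target. `pick_shard_and_chunk` and `next_pow2` are hypothetical names. Chunks are cubes where possible, shrunk on short axes. Shards are grown by doubling the number of chunks per axis until the next doubling would exceed 100 MB.

```python
import math

import numpy as np


def next_pow2(n):
    """Smallest power of 2 that is >= n (1 for n <= 1)."""
    return 1 << max(n - 1, 0).bit_length()


def pick_shard_and_chunk(shape, dtype, *, chunk_target_nbytes=1 << 20,
                         max_shard_nbytes=100_000_000):
    """Hypothetical: ~isotropic power-of-2 chunks inside shards of at most max_shard_nbytes."""
    itemsize = np.dtype(dtype).itemsize
    ndim = len(shape)
    if ndim == 0:
        return (), ()
    # Largest power-of-2 edge whose hypercube chunk still fits in chunk_target_nbytes.
    edge = 1
    while (2 * edge) ** ndim * itemsize <= chunk_target_nbytes:
        edge *= 2
    # On short axes, cap the edge at the axis length rounded up to a power of 2.
    chunk = tuple(min(edge, next_pow2(n)) for n in shape)

    def shard_for(factor):
        # Both terms are powers of 2 and >= the chunk edge, so chunks tile the shard exactly.
        return tuple(min(c * factor, next_pow2(n)) for c, n in zip(chunk, shape))

    factor = 1
    while True:
        bigger = shard_for(factor * 2)
        if bigger == shard_for(factor):
            break  # the shard already spans the whole (padded) array
        if math.prod(bigger) * itemsize > max_shard_nbytes:
            break  # doubling again would exceed the shard size cap
        factor *= 2
    return shard_for(factor), chunk


print(pick_shard_and_chunk((10_000, 10_000), "float32"))  # → ((4096, 4096), (512, 512))
```

Keeping everything a power of 2 makes the "chunks tile the shard" rule hold automatically. The cost is coarse steps: for a 3-D `uint8` array the 1 MiB target lands on 64³ = 256 KiB chunks, because 128³ would overshoot it.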
It's also possible that these constraints should be configurable, either via the global config or via keyword arguments at array creation.
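A sketch of how the two configuration layers could interact, assuming keyword arguments win over global defaults. `ChunkingConstraints`, `GLOBAL_DEFAULTS`, `resolve_constraints`, and every field name are hypothetical, not part of zarr-python's config or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class ChunkingConstraints:
    """Hypothetical bundle of the constraints discussed above."""
    min_chunk_nbytes: int = 0
    max_chunk_nbytes: int = 1 << 20
    max_shard_nbytes: int = 100_000_000
    chunk_edge_power_of_2: bool = True


# Hypothetical stand-in for values read from the global config.
GLOBAL_DEFAULTS = ChunkingConstraints()


def resolve_constraints(**overrides):
    """Per-array keyword arguments override the global defaults, field by field."""
    return replace(GLOBAL_DEFAULTS, **overrides)


print(resolve_constraints(max_shard_nbytes=50_000_000).max_shard_nbytes)  # → 50000000
```

Because `dataclasses.replace` passes the overrides to the constructor, a misspelled keyword raises `TypeError` instead of being silently ignored.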
Any thoughts? @jbms if you have any tensorstore stories to share about this I would be very interested.