This effectively hard-codes sharding into the spec, much like a `sharded=True` flag on the `ChunkSpec` object would have. How do you expect this to extend to variable chunking or other schemes that might be created in the future?
This proposal can use whatever specification for variable-length chunks we come up with, e.g. tuples of tuples of ints. You could specify variable-length chunking with no sharding via something like `chunks = {'write_shape': ((10, 5), (1, 2, 3))}`, and variable-length chunking with sharding via something like `chunks = {'write_shape': ((10, 5), (1, 2, 3)), 'read_shape': (1, 1)}`. The read shape would have to be checked for consistency with all the unique chunk shapes in this case. We would of course need to widen the type of `ChunkSpec` to accept `tuple[tuple[int, ...], ...]` for the `write_shape` key.
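To make this concrete, here is a rough sketch (Python 3.9+), not the actual proposal, of what a widened `ChunkSpec` and the consistency check described above could look like. Only `ChunkSpec`, `write_shape`, and `read_shape` come from the discussion; the shape aliases and the `check_read_shape` helper are hypothetical names for illustration.

```python
from typing import TypedDict, Union

RegularShape = tuple[int, ...]               # one chunk size per axis
VariableShape = tuple[tuple[int, ...], ...]  # per-axis tuples of chunk sizes

class _ChunkSpecRequired(TypedDict):
    write_shape: Union[RegularShape, VariableShape]

class ChunkSpec(_ChunkSpecRequired, total=False):
    read_shape: tuple[int, ...]  # present only when sharding is wanted

def check_read_shape(spec: ChunkSpec) -> None:
    """Hypothetical check: read_shape must evenly divide every write chunk size."""
    read_shape = spec.get("read_shape")
    if read_shape is None:
        return  # no sharding requested, nothing to check
    for axis_chunks, read_size in zip(spec["write_shape"], read_shape):
        # normalize the regular case (one int per axis) to the variable case (tuple per axis)
        sizes = axis_chunks if isinstance(axis_chunks, tuple) else (axis_chunks,)
        for size in sizes:
            if size % read_size != 0:
                raise ValueError(
                    f"read size {read_size} does not divide write chunk size {size}"
                )

check_read_shape({"write_shape": ((10, 5), (1, 2, 3)), "read_shape": (1, 1)})  # passes
check_read_shape({"write_shape": (10, 10), "read_shape": (2, 2)})              # passes
```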
> something like a `sharded=True` flag that might have existed on the `ChunkSpec` object.

If we had such a flag on the `ChunkSpec` object, then it would semantically collide with `read_shape`: `{'write_shape': (10, 10), 'read_shape': (2, 2), 'sharded': False}` would not be valid, because there is no way for `read_shape` and `write_shape` to differ without sharding. BTW, when I say "sharding" I don't mean "the sharding codec"; I mean the general concept of packing multiple subchunks into a single file. If a non-codec implementation of sharding emerges, then I would like to imagine that this API could wrap that.
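A minimal sketch of that collision, assuming sharding is inferred purely from the presence of a `read_shape` that differs from `write_shape`; the `is_sharded` helper is a hypothetical name, not part of the proposal.

```python
from collections.abc import Mapping
from typing import Any

def is_sharded(chunks: Mapping[str, Any]) -> bool:
    """Hypothetical rule: sharding is implied by a read_shape that differs from write_shape."""
    read_shape = chunks.get("read_shape")
    return read_shape is not None and tuple(read_shape) != tuple(chunks["write_shape"])

print(is_sharded({"write_shape": (10, 10), "read_shape": (2, 2)}))  # True
print(is_sharded({"write_shape": (10, 10)}))                        # False

# Under this rule an explicit flag is redundant at best: a spec like
# {'write_shape': (10, 10), 'read_shape': (2, 2), 'sharded': False} asserts "not sharded"
# while its shapes already imply sharding, so it has no consistent meaning.
```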