Description
Greetings from the Mehta Lab and apologies in advance for the long post!
I am attempting to use gunpowder as a dataloader for (float32) data in the ome-zarr format. I have run into a few issues getting some functionality to work with my data, and I have listed my questions below.
Support for multiple zarr stores in the OME-HCS Zarr format
My data are stored in the ome-zarr format as a hierarchy of groups (row > col > position > data arrays). When I create datasets inside a source node, I have to specify each one by its full hierarchy path within the zarr store:
```python
import gunpowder as gp

zarr_dir = "path/to/store.zarr"  # placeholder: path to the OME-HCS zarr store

raw = gp.ArrayKey('RAW')
source = gp.ZarrSource(
    filename=zarr_dir,
    datasets={raw: 'Row_0/Col_1/Pos_1/arr_0'},
    array_specs={raw: gp.ArraySpec(interpolatable=True)}
)
```

Because of this layout, arrays that belong to one logical 'dataset' but sit in different rows are stored in different zarr stores. Is it possible to create a single source that can access multiple zarr stores?
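A single `ZarrSource` reads from one container, so one possible workaround (a sketch, not a confirmed answer) is to create one `ZarrSource` per store/dataset pair and combine them with `gp.RandomProvider`, which randomly selects one upstream provider for each batch request. The helpers `hcs_path`, `store_dataset_pairs`, and `build_multi_store_source` are hypothetical, the `{store_path: [(row, col, pos), ...]}` layout and store names are assumptions, and the `tuple + node` composition follows gunpowder's pipeline syntax as I understand it, so it is worth checking against the installed version.

```python
def hcs_path(row, col, pos, array_name="arr_0"):
    """Build a dataset path such as 'Row_0/Col_1/Pos_1/arr_0'.

    The Row_/Col_/Pos_ naming is assumed from the example path in this post.
    """
    return f"Row_{row}/Col_{col}/Pos_{pos}/{array_name}"


def store_dataset_pairs(layout, array_name="arr_0"):
    """Flatten a hypothetical {store_path: [(row, col, pos), ...]} mapping
    into a list of (store_path, dataset_path) pairs."""
    return [
        (store, hcs_path(row, col, pos, array_name))
        for store, wells in layout.items()
        for row, col, pos in wells
    ]


def build_multi_store_source(pairs, raw):
    """Create one ZarrSource per (store, dataset) pair and combine them
    with RandomProvider, which picks one upstream source per request."""
    import gunpowder as gp  # imported here so the path helpers run without gunpowder

    sources = tuple(
        gp.ZarrSource(
            store,
            {raw: dataset},
            {raw: gp.ArraySpec(interpolatable=True)},
        )
        for store, dataset in pairs
    )
    return sources + gp.RandomProvider()
```

For example, `build_multi_store_source(store_dataset_pairs({"row0.zarr": [(0, 1, 1)], "row1.zarr": [(1, 1, 1)]}), raw)` would sample from `Row_0/Col_1/Pos_1/arr_0` in one store and `Row_1/Col_1/Pos_1/arr_0` in the other.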
Inconsistent behavior of BatchRequest objects
When applying some augmentations (for example the SimpleAugment node), reusing a BatchRequest without redefining the request or the pipelines randomly returns data with the wrong indices.
For example, I define a source and two pipelines, one with and one without a SimpleAugment node:
```python
import gunpowder as gp

raw = gp.ArrayKey('RAW')
source = gp.ZarrSource(
    zarr_dir,  # the zarr container
    {raw: 'Row_0/Col_1/Pos_1/arr_0'},  # arr_0 is 3 channels of 3D image stacks, dims: (1, 3, 41, 2048, 2048)
    {raw: gp.ArraySpec(interpolatable=True)}
)
simple_augment = gp.SimpleAugment(transpose_only=(-1, -2))
pipelines = [source, source + simple_augment]  # plain and augmented pipelines
```

Then I define a batch request:
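```python
request = gp.BatchRequest()
# Roi(offset, shape): the full first axis, all 3 channels, one z-slice, a 768x768 tile
request[raw] = gp.Roi((0, 0, 0, 0, 0), (1, 3, 1, 768, 768))
```

Then I use that request to generate two batches from each pipeline in sequence: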
```python
import matplotlib.pyplot as plt

# First loop is fine; the second loop has the 2nd dimension flipped/transposed
for n in range(2):
    batches = []
    for pipeline in pipelines:  # for both the plain and the augmented pipeline
        with gp.build(pipeline):
            batch = pipeline.request_batch(request)  # get a batch
        batches.append(batch)

    # visualize the batches: one row per pipeline, one column per channel
    fig, ax = plt.subplots(len(pipelines), 3, figsize=(14, 10))
    for i in range(len(pipelines)):
        for j in range(3):
            ax[i][j].imshow(batches[i][raw].data[0, j, 0])
    ax[0][1].set_title('source')
    ax[1][1].set_title('source + aug')
    plt.show()
```

The result is the following:
[Figure: visualization of the batches from loop 1]
[Figure: visualization of the batches from loop 2]
I am confused as to why the behavior changes when the data, pipelines, and batch request haven't changed. Is there a reason that the augmented batch from the second loop comes back with its channels reversed?
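One hypothesis worth checking: because the request ROI spans all five axes, including the one holding the three channels, the augmentation may treat that axis as just another spatial axis and mirror it at random. The NumPy sketch below is a toy model of a per-axis random mirror, not gunpowder's implementation; `random_mirror` and `channel_order` are hypothetical helpers. It uses a (1, 3, 1, H, W) array whose channel c is filled with the value c, so whenever axis 1 happens to be flipped the channel order reads [2, 1, 0], even though the input and "request" never change.

```python
import numpy as np


def random_mirror(data, rng, prob=0.5):
    """Toy per-axis random mirror: flip each non-singleton axis of `data`
    independently with probability `prob`.

    Returns the (possibly) flipped array and the tuple of flipped axes.
    """
    flipped_axes = tuple(
        axis for axis in range(data.ndim)
        if data.shape[axis] > 1 and rng.random() < prob
    )
    out = np.flip(data, axis=flipped_axes) if flipped_axes else data
    return out, flipped_axes


def channel_order(batch):
    """Read back the channel order of a toy (1, C, 1, H, W) batch whose
    channel c holds the constant value c."""
    return [int(batch[0, c, 0, 0, 0]) for c in range(batch.shape[1])]


# Toy stand-in for one request: 3 channels, one z-slice, an 8x8 tile.
toy = np.broadcast_to(
    np.arange(3, dtype=np.float32).reshape(1, 3, 1, 1, 1), (1, 3, 1, 8, 8)
)

rng = np.random.default_rng(0)
for call in range(4):
    out, flipped = random_mirror(toy, rng)
    # The same input gives different channel orders from call to call,
    # depending only on whether axis 1 happened to be mirrored.
    print(call, flipped, channel_order(out))
```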
Thanks!!

