Functionality questions for ome-zarr formatted data

Greetings from the Mehta Lab, and apologies in advance for the long post!
I am trying to use gunpowder as a dataloader for (float32) data in the ome-zarr format, and I have run into a few issues getting it to work with my data. My questions are listed below.

Support for multiple zarr stores in the OME-HCS Zarr format
My data is stored in ome-zarr format as a series of hierarchical groups (row > col > position > data_arrays). When I create datasets inside a source node, each one has to be specified by its full hierarchy path:

import gunpowder as gp

raw = gp.ArrayKey('RAW')
source = gp.ZarrSource(
    filename=zarr_dir,
    datasets={raw: 'Row_0/Col_1/Pos_1/arr_0'},
    array_specs={raw: gp.ArraySpec(interpolatable=True)}
)

Because of this format, arrays that all belong to one 'dataset' but come from different rows end up in different zarr stores. Is it possible to create a single source that can access multiple zarr stores?
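A possible workaround, sketched here under assumptions rather than confirmed as the intended approach: build one ZarrSource per store and merge them with gp.RandomProvider, which picks one upstream source at random for each request. The helper names hcs_array_path and build_multi_store_source are hypothetical, and the path layout is assumed from the example above. Note that this samples stores at random; it does not iterate over them in a fixed order.

```python
def hcs_array_path(row, col, pos, array_name="arr_0"):
    """Build the hierarchy path to one array (layout assumed from the example above)."""
    return f"Row_{row}/Col_{col}/Pos_{pos}/{array_name}"


def build_multi_store_source(store_to_path, raw_key):
    """Merge one ZarrSource per store into a single provider (hypothetical helper).

    store_to_path maps each zarr store path to the array path inside that store.
    """
    import gunpowder as gp  # imported lazily so the path helper works on its own

    sources = tuple(
        gp.ZarrSource(
            store,
            {raw_key: array_path},
            {raw_key: gp.ArraySpec(interpolatable=True)},
        )
        for store, array_path in store_to_path.items()
    )
    # RandomProvider selects one of the upstream sources at random per request.
    return sources + gp.RandomProvider()
```

Usage would look like build_multi_store_source({store_a: hcs_array_path(0, 1, 1), store_b: hcs_array_path(1, 1, 1)}, raw), where store_a and store_b are placeholders for the actual store paths.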

Inconsistent behavior of BatchRequest objects
When applying some augmentations (for example, the SimpleAugment node), reusing a BatchRequest without redefining the request or the pipelines will randomly return data with the wrong indices.

To reproduce this, I define a source and two pipelines, one without and one with a SimpleAugment node:

raw = gp.ArrayKey('RAW')

source = gp.ZarrSource(
    zarr_dir,  # the zarr container
    {raw: 'Row_0/Col_1/Pos_1/arr_0'},  # arr_0 is 3 channels of 3D image stacks, dims: (1, 3, 41, 2048, 2048) 
    {raw: gp.ArraySpec(interpolatable=True)} 
)

simple_augment = gp.SimpleAugment(transpose_only=(-1,-2))

pipelines = [source, source + simple_augment]

Then I define a batch request:

request = gp.BatchRequest()
request[raw] = gp.Roi((0,0,0,0,0), (1,3,1,768,768))

Then I use that request to generate two batches from each pipeline in sequence:

import matplotlib.pyplot as plt

# First loop is fine, second loop has 2nd dimension flipped/transposed
for n in range(2):
  batches = []
  for pipeline in pipelines:  # both the plain and the augmented pipeline
    with gp.build(pipeline):
      batch = pipeline.request_batch(request)  # get batch
      batches.append(batch)

  # visualize the content of the batches: one row per pipeline, one column per channel
  fig, ax = plt.subplots(len(pipelines), 3, figsize=(14, 10))
  for i in range(len(pipelines)):
    for j in range(3):
      ax[i][j].imshow(batches[i][raw].data[0, j, 0])
  ax[0][1].set_title('source')
  ax[1][1].set_title('source + aug')
  plt.show()

The result is the following:
[Image: visualization of batch from loop 1] [Image: visualization of batch from loop 2]

I am confused as to why the behavior changes when the data, pipeline, and batch request haven't changed. Is there a reason that the second augmentation batch returns with reversed channels?
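To pin down exactly which indices change, a small NumPy check can compare the plain and augmented batches. This is a minimal diagnostic sketch: find_flips is a hypothetical helper, not part of gunpowder, and it only tests axis mirroring plus a swap of the last two axes.

```python
from itertools import combinations

import numpy as np


def find_flips(reference, augmented):
    """Return (flipped_axes, transposed) mapping reference onto augmented, or None.

    transposed is True when the last two axes were swapped before mirroring.
    """
    # Mirroring an axis of length 1 is a no-op, so only test longer axes.
    axes = [ax for ax in range(reference.ndim) if reference.shape[ax] > 1]
    for transposed in (False, True):
        candidate = np.swapaxes(reference, -1, -2) if transposed else reference
        if candidate.shape != augmented.shape:
            continue
        # Try every subset of axes, starting with no mirroring at all.
        for k in range(len(axes) + 1):
            for flipped in combinations(axes, k):
                trial = np.flip(candidate, axis=flipped) if flipped else candidate
                if np.array_equal(trial, augmented):
                    return flipped, transposed
    return None
```

Calling find_flips(batches[0][raw].data, batches[1][raw].data) inside the loop would report, for instance, a mirrored axis 1 if the channels really come back reversed.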

Thanks!!


Functionality questions for ome-zarr formatted data #181
