Read performance regression via Store backed by FSMap (fsspec, GCS) · Issue #1296 · zarr-developers/zarr-python · GitHub

Read performance regression via Store backed by FSMap (fsspec, GCS) #1296
Closed
@ravwojdyla

Zarr version

2.11.0 and up (including current main)

Numcodecs version

0.10.2

Python Version

3.10

Operating System

Mac and Linux

Installation

conda, pip and from source

Description

It appears that since #789 (commit 5c71212), i.e. from zarr 2.11.0 onward, there is a performance regression that affects reading zarr data via a Store backed by fsspec/FSMap.

In our test example (in practice we use xarray), we have a zarr array made up of 2K files (1 GB compressed in total). Reading it via:

np.asarray(zarr.open(fsspec.get_mapper(...), mode="r"))
  • on zarr 2.10.3 took about 12 seconds
  • on zarr 2.13.3 took about 90 seconds (roughly 7x longer)
    • the same problem exists starting from version 2.11.0
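
As a rough sanity check on the numbers above, the arithmetic can be worked through in a few lines. This is only a back-of-the-envelope sketch: it assumes "2K files" means 2000 chunk files and that the slow read is dominated by fetching those files one at a time, neither of which is confirmed here.

```python
# Back-of-the-envelope check of the reported timings.
# Assumption: "2K files" is taken as 2000 chunk files.
n_files = 2000
t_fast_s = 12.0   # reported read time on the older version
t_slow_s = 90.0   # reported read time on the newer version

slowdown = t_slow_s / t_fast_s

# If the slow path fetched every file sequentially, this would be
# the average wall-clock time spent per file.
per_file_ms = t_slow_s * 1000 / n_files

print(f"slowdown: {slowdown:.1f}x")
print(f"average time per file if fetched one by one: {per_file_ms:.0f} ms")
```

An average in the tens of milliseconds per file is in the range of a single round trip to remote object storage, which would fit a one-request-at-a-time read pattern.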

Looking at the stack traces from the two versions, it looks like 2.10.3 fetched multiple items asynchronously in one batched call (getitems), while 2.13.3 does a blocking fetch per storage item (__getitem__)?

zarr 2.13.3
Thread 3529282 (idle): "MainThread"
    do_futex_wait.constprop.0 (libpthread-2.31.so)
    __new_sem_wait_slow.constprop.0 (libpthread-2.31.so)
    PyThread_acquire_lock_timed.localalias (python3.10)
    lock_PyThread_acquire_lock (python3.10)
    wait (threading.py:324)
    wait (threading.py:607)
    sync (fsspec/asyn.py:86)
    wrapper (fsspec/asyn.py:113)
    __getitem__ (fsspec/mapping.py:143)
    __getitem__ (zarr/storage.py:724)
    _chunk_getitem (zarr/core.py:1966)
    _get_selection (zarr/core.py:1267)
    _get_basic_selection_nd (zarr/core.py:976)
    get_basic_selection (zarr/core.py:933)
    __getitem__ (zarr/core.py:807)
    __array__ (zarr/core.py:589)
    PyArray_FromArrayAttr_int (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    _array_from_array_like (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_DiscoverDTypeAndShape_Recursive (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_DiscoverDTypeAndShape (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_FromAny (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_CheckFromAny (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    _array_fromobject_generic (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    array_asarray (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    <module> (<stdin>:2)
zarr 2.10.3
Thread 3530062 (idle): "MainThread"
    do_futex_wait.constprop.0 (libpthread-2.31.so)
    __new_sem_wait_slow.constprop.0 (libpthread-2.31.so)
    PyThread_acquire_lock_timed.localalias (python3.10)
    lock_PyThread_acquire_lock (python3.10)
    wait (threading.py:324)
    wait (threading.py:607)
    sync (fsspec/asyn.py:86)
    wrapper (fsspec/asyn.py:113)
    getitems (fsspec/mapping.py:93)
    _chunk_getitems (zarr/core.py:1847)
    _get_selection (zarr/core.py:1136)
    _get_basic_selection_nd (zarr/core.py:841)
    get_basic_selection (zarr/core.py:798)
    __getitem__ (zarr/core.py:673)
    __array__ (zarr/core.py:469)
    PyArray_FromArrayAttr_int (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    _array_from_array_like (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_DiscoverDTypeAndShape_Recursive (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_DiscoverDTypeAndShape (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_FromAny (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    PyArray_CheckFromAny (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    _array_fromobject_generic (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    array_asarray (numpy/core/_multiarray_umath.cpython-310-x86_64-linux-gnu.so)
    <module> (<stdin>:2)
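
The difference between the two traces (one blocking `__getitem__` per chunk versus a single batched `getitems` call) can be illustrated with a small simulation. This is a minimal sketch using only the standard library, not zarr's or fsspec's actual code: `FakeRemoteStore`, `LATENCY_S` and `timed` are hypothetical names, and the 10 ms latency is an arbitrary assumption standing in for a remote round trip.

```python
import asyncio
import time

LATENCY_S = 0.01  # assumed per-request round-trip latency (hypothetical)


class FakeRemoteStore:
    """Hypothetical stand-in for a remote key-value store with latency."""

    def __init__(self, n_keys):
        self._data = {f"chunk.{i}": bytes([i % 256]) for i in range(n_keys)}

    def keys(self):
        return list(self._data)

    async def _fetch(self, key):
        # Simulate one network round trip for one key.
        await asyncio.sleep(LATENCY_S)
        return self._data[key]

    def __getitem__(self, key):
        # Per-item access: block until this single request completes.
        return asyncio.run(self._fetch(key))

    def getitems(self, keys):
        # Batched access: issue all requests concurrently, wait once.
        async def gather_all():
            values = await asyncio.gather(*(self._fetch(k) for k in keys))
            return dict(zip(keys, values))

        return asyncio.run(gather_all())


def timed(fn):
    """Run fn() and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn()
    return result, time.perf_counter() - start


store = FakeRemoteStore(50)
keys = store.keys()

per_key, t_seq = timed(lambda: {k: store[k] for k in keys})
batched, t_batch = timed(lambda: store.getitems(keys))

print(f"one blocking fetch per key: {t_seq:.3f} s")
print(f"single batched fetch:       {t_batch:.3f} s")
```

Both paths return the same data; only the number of sequential waits differs. In the per-key path the total time grows with the number of chunks, while in the batched path the waits overlap.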

Steps to reproduce

We need an existing zarr array to read:

np.asarray(zarr.open(fsspec.get_mapper(...), mode="r"))

Additional output

No response

Labels

bug