@@ -927,34 +927,24 @@ filters (e.g., byte-shuffle) have been applied.
 
 Parallel computing and synchronization
 --------------------------------------
 
-Zarr arrays have been designed for use as the source and/or sink for data in
-parallel computations. Please note that this is an area of ongoing research and
-development. If you are using Zarr for parallel computing, we welcome feedback,
-experience, discussion, ideas and advice, particularly about issues such as data
-integrity and performance.
-
-Both multi-threaded and multi-process parallelism are possible, although Zarr
-can use a number of different storage systems (see :ref:`tutorial_storage`) and
-not all storage systems support both types of parallelism. Please see the API
-docs for the :mod:`zarr.storage` module for more information about which storage
-classes support parellel computing.
-
-The bottleneck for most storage and retrieval operations is
-compression/decompression, and the Python global interpreter lock (GIL) is
-released wherever possible during these operations, so Zarr will generally not
-block other Python threads from running.
-
-Depending on how data are being accessed or updated, some synchronization
-(locking) may be required to avoid data loss. If an array is being read
-concurrently by multiple threads or processes, no synchronization is
-required. If an array is being written to concurrently by multiple threads or
-processes, some synchronization may be required, depending on the way the data
-is being written.
-
-If each worker in a parallel computation is writing to a separate region of the
-array, and if region boundaries are perfectly aligned with chunk boundaries,
-then no synchronization is required. However, if region and chunk boundaries are
-not perfectly aligned, then synchronization is required to avoid two workers
+Zarr arrays have been designed for use as the source or sink for data in
+parallel computations. By data source we mean that multiple concurrent read
+operations may occur. By data sink we mean that multiple concurrent write
+operations may occur, with each writer updating a different region of the
+array. Zarr arrays have **not** been designed for situations where multiple
+readers and writers are concurrently operating on the same array.
+
+Both multi-threaded and multi-process parallelism are possible. The bottleneck
+for most storage and retrieval operations is compression/decompression, and the
+Python global interpreter lock (GIL) is released wherever possible during these
+operations, so Zarr will generally not block other Python threads from running.
+
+When using a Zarr array as a data sink, some synchronization (locking) may be
+required to avoid data loss, depending on how data are being updated. If each
+worker in a parallel computation is writing to a separate region of the array,
+and if region boundaries are perfectly aligned with chunk boundaries, then no
+synchronization is required. However, if region and chunk boundaries are not
+perfectly aligned, then synchronization is required to avoid two workers
 attempting to modify the same chunk at the same time, which could result in data
 loss.
 
@@ -991,6 +981,11 @@ some networked file systems). E.g.::
 
 This array is safe to read or write from multiple processes.
 
+Please note that support for parallel computing is an area of ongoing research
+and development. If you are using Zarr for parallel computing, we welcome
+feedback, experience, discussion, ideas and advice, particularly about issues
+related to data integrity and performance.
+
 .. _tutorial_pickle:
 
 Pickle support
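
The chunk-boundary hazard the rewritten paragraphs describe can be sketched with nothing but the standard library. This is an illustrative model only, not Zarr's implementation: chunked storage is simulated as a dict of fixed-size byte strings, and `write_region`, `CHUNK`, and the per-chunk locks are hypothetical names introduced here. The key point it demonstrates is that updating a region means a read-modify-write of every *whole* chunk the region overlaps, so two workers whose regions are not chunk-aligned can rewrite the same chunk and must be synchronized::

```python
# Stdlib-only sketch (not Zarr's implementation) of why misaligned region
# writes need per-chunk locking: every write is a read-modify-write of each
# whole chunk the region overlaps, so two workers touching the same chunk
# could otherwise overwrite each other's bytes.
import threading

CHUNK = 10                                             # bytes per chunk
NCHUNKS = 4
store = {i: bytes(CHUNK) for i in range(NCHUNKS)}      # chunk id -> chunk data
locks = {i: threading.Lock() for i in range(NCHUNKS)}  # one lock per chunk

def write_region(start, data):
    """Write `data` at absolute byte offset `start`, chunk by chunk."""
    end = start + len(data)
    for cid in range(start // CHUNK, (end - 1) // CHUNK + 1):
        with locks[cid]:                       # synchronize per chunk
            buf = bytearray(store[cid])        # read the whole chunk
            lo = max(start, cid * CHUNK)       # overlap of region and chunk
            hi = min(end, (cid + 1) * CHUNK)
            buf[lo - cid * CHUNK:hi - cid * CHUNK] = data[lo - start:hi - start]
            store[cid] = bytes(buf)            # write the whole chunk back

# Regions 5..14 and 15..24 are not aligned with the 10-byte chunks, so both
# workers rewrite chunk 1; the per-chunk lock keeps the result deterministic.
t1 = threading.Thread(target=write_region, args=(5, b"\x01" * 10))
t2 = threading.Thread(target=write_region, args=(15, b"\x02" * 10))
t1.start(); t2.start(); t1.join(); t2.join()
# Final layout: bytes 5..14 are 0x01, bytes 15..24 are 0x02, the rest zero.
```

With the locks removed, one worker's read of chunk 1 can interleave with the other's write-back, silently discarding five bytes; with region boundaries aligned to chunk boundaries, no two workers ever touch the same chunk and the locks would be unnecessary.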