Concurrent writes to the same chunk using DirectoryStore · Issue #328 · zarr-developers/zarr-python · GitHub
Concurrent writes to the same chunk using DirectoryStore #328
Open
@alimanfoo

Description

Currently, DirectoryStore (and sub-classes) implement a form of atomic write, which means that when a key/value is being set (e.g., data for a chunk are being written) the data will first be written to a temporary file, and then if the write is successful, the temporary file will be moved into place. The rationale for this design is that data should never be left in a half-written state, i.e., a write either succeeds completely, or fails completely.
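
For illustration, the existing behaviour is roughly the following pattern (a simplified sketch, not the actual library code; the function name atomic_write and the use of tempfile.NamedTemporaryFile here are stand-ins for what DirectoryStore.__setitem__ does):

    import os
    import tempfile

    def atomic_write(file_path, value):
        # write to a uniquely named temporary file in the same directory,
        # so the final rename does not cross filesystems
        dir_path = os.path.dirname(file_path)
        with tempfile.NamedTemporaryFile(mode='wb', dir=dir_path,
                                         delete=False,
                                         suffix='.partial') as f:
            f.write(value)
            temp_path = f.name

        # move the temporary file into place; the key is either fully
        # written or not visible at all, never half-written
        os.replace(temp_path, file_path)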

Also, DirectoryStore currently aims to be safe to use with multiple threads and/or processes. A question arises: what should happen if two threads or two processes attempt to write to the same key at the same time?

Before discussing that question, it is worth saying that in general, if there is a chance that two threads or processes might write to the same key at the same time, then that is something the user should be trying to avoid. Even if DirectoryStore uses atomic writes, if two write operations occur concurrently, then some data will get lost - one of the concurrent writes will get overwritten by the other. The user can do two things to avoid this. Either (1) they craft their own code carefully to ensure write operations are fully aligned with chunk boundaries, so two threads or processes are never writing to the same chunk, or (2) they use the support provided in zarr for synchronisation (locking). Zarr decouples storage from synchronisation, which allows different methods of synchronisation (locking) to be used with different types of storage. In other words, synchronisation is not the responsibility of the storage layer.
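
For option (2), a minimal sketch of using the synchronisation support (assuming the zarr 2.x API, with illustrative paths) looks like:

    import zarr

    # file-based locking so that multiple processes coordinate writes
    synchronizer = zarr.ProcessSynchronizer('data/example.sync')
    z = zarr.open_array('data/example.zarr', mode='w',
                        shape=(10000, 10000), chunks=(1000, 1000),
                        dtype='i4', synchronizer=synchronizer)

    # writes that straddle chunk boundaries are now protected by
    # per-chunk locks held for the duration of each chunk update
    z[0:1500, 0:1500] = 42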

That said, it can (and does) happen that a user who is relatively new to Zarr might not yet be aware of these issues, and so might try to run a program that does not align writes with chunk boundaries, and does not implement any synchronisation. A specific example of that is #277. Ironically, there is currently a bug (#263) in the atomic writing implementation, which means that a race condition can occur during concurrent writes to the same chunks, which generates an error. In the case of #277, that bug was in fact a boon, because it caused an error that ultimately led to the user realising that synchronisation issues were occurring, which then led them to rework their own code.

However, the atomic write bug (#263) is in the process of being fixed (#327). When it is fixed, that will mean that in a situation like #277, the user would not get any error messages, and write operations would silently get lost. Presumably the problem would take much longer to surface, because the user would need to inspect the data to realise something had gone wrong. It might even be so subtle as to go unnoticed, which could be very bad for obvious reasons.

This is causing me some concern, and making me think that DirectoryStore should at least fail (i.e., generate an exception) if two processes attempt to write to the same key concurrently. That way, a user would realise, as in #277, that something was wrong and that they needed to implement a solution.

I also wonder whether a possible solution might be that, instead of each atomic write opening a completely new temporary file, there should be just one temporary file for each key to which data are initially written, with this file opened in exclusive mode (i.e., mode='x') so that an error is generated if two concurrent threads or processes attempt to write.

I.e., the code currently from this line would become something like:

        temp_path = file_path + '.partial'
        try:
            # exclusive mode ('x') fails with FileExistsError if another
            # writer has already created the temporary file for this key
            with open(temp_path, mode='xb') as f:
                f.write(value)

            # move temporary file into place
            os.replace(temp_path, file_path)

        finally:
            # clean up if temp file still exists for whatever reason
            if os.path.exists(temp_path):
                os.remove(temp_path)
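
To illustrate the exclusive-create semantics being relied on here (a standalone sketch with an illustrative path, not part of the proposed change): the second attempt to create the same file with mode='x' raises FileExistsError instead of silently truncating it.

    # first writer creates the temporary file
    with open('/tmp/demo.partial', mode='xb') as f:
        f.write(b'first writer')

    # a concurrent second writer hits an error rather than clobbering it
    try:
        with open('/tmp/demo.partial', mode='xb') as f:
            f.write(b'second writer')
    except FileExistsError:
        print('concurrent write to the same key detected')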

This also relates to #325, as there is discussion there about how to open a file for writing.

Metadata

Labels: enhancement (New features or improvements)