Bug fix/fix remote executor races by goedderz · Pull Request #10206 · arangodb/arangodb · GitHub

Bug fix/fix remote executor races #10206

Merged
merged 2 commits into devel from bug-fix/fix-remote-executor-races on Oct 10, 2019

Conversation

goedderz (Member) commented Oct 9, 2019

Scope & Purpose

Fix some races in the RemoteExecutor

  • Bug-Fix for devel-branch (no need for backports)

@goedderz goedderz self-assigned this Oct 9, 2019
@goedderz goedderz force-pushed the bug-fix/fix-remote-executor-races branch from e9fa270 to 0b5512e on October 9, 2019 13:21
@goedderz goedderz added the 1 Bug label Oct 9, 2019
@goedderz goedderz added this to the devel milestone Oct 9, 2019
@goedderz goedderz requested review from graetzer and mchacki October 9, 2019 13:38
goedderz (Member Author) commented Oct 9, 2019

} else {
  _lastResponse = std::move(res);
}
std::lock_guard<std::mutex> guard(_communicationMutex);
Contributor commented:

Actually, I do not get the need for the extra mutex here; shouldn't an atomic _lastTicket be sufficient to synchronize access to _lastError and _lastResponse?

goedderz (Member Author) replied Oct 9, 2019:

The problem is that we not only have to synchronize a request with its answer, but also have to protect an answer against a concurrent request; e.g., the RemoteExecutor sends a getSome/skipSome request, then a shutdown happens (e.g. due to a timeout) at the same time the answer arrives. Then this code in ExecutionBlockImpl<RemoteExecutor>::shutdown,

if (!_hasTriggeredShutdown) {
  std::lock_guard<std::mutex> guard(_communicationMutex);
  std::ignore = generateNewTicket();
  _hasTriggeredShutdown = true;
}

which resets _lastTicket, _lastError and _lastResponse, races with the lambda that's called with the answer:

[=, ref(std::move(ref))](fuerte::Error err,
                         std::unique_ptr<fuerte::Request>,
                         std::unique_ptr<fuerte::Response> res) {
  std::lock_guard<std::mutex> guard(_communicationMutex);

  if (_lastTicket == ticket) {
    if (err != fuerte::Error::NoError || res->statusCode() >= 400) {
      _lastError = handleErrorResponse(spec, err, res.get());
    } else {
      _lastResponse = std::move(res);
    }
    _query.sharedState()->execute();
  }
}

which reads _lastTicket and then sets _lastError or _lastResponse. I do not see how that could be correctly synchronized with just an atomic _lastTicket.
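
To illustrate the point, here is a minimal standalone sketch of that check-and-store pattern. It is not the ArangoDB code; the class and member names are made up. It only shows why comparing the ticket and storing the result must happen in a single critical section, which a lone atomic _lastTicket cannot provide.

// Illustrative sketch only, not ArangoDB code; all names are made up.
// The ticket check and the result store share one lock, so the
// invalidation done by shutdown cannot interleave between them.
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>

struct Response {};

class RemoteBlockSketch {
 public:
  // Called on shutdown: invalidate whatever request is still in flight.
  void invalidatePendingRequest() {
    std::lock_guard<std::mutex> guard(_mutex);
    ++_lastTicket;  // responses carrying an older ticket are ignored
    _lastResponse.reset();
    _lastError.clear();
  }

  // Called from the network thread when the response for `ticket` arrives.
  void onResponse(uint64_t ticket, std::string error,
                  std::unique_ptr<Response> res) {
    std::lock_guard<std::mutex> guard(_mutex);
    if (ticket != _lastTicket) {
      return;  // belongs to a request that was already invalidated
    }
    if (!error.empty()) {
      _lastError = std::move(error);
    } else {
      _lastResponse = std::move(res);
    }
    // With only an atomic ticket, invalidatePendingRequest() could run
    // between the comparison above and the writes here, and a stale
    // response would still be stored.
  }

 private:
  std::mutex _mutex;
  uint64_t _lastTicket = 0;
  std::unique_ptr<Response> _lastResponse;
  std::string _lastError;
};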

@mchacki mchacki (Member) left a comment:

Optional comment for more debug output; I am not bound to this being addressed.
Otherwise 👍


// Already sent a shutdown request, but haven't got an answer yet.
if (_didSendShutdownRequest) {
  return {ExecutionState::WAITING, TRI_ERROR_NO_ERROR};
Member commented:

Do we want to add traceShutdown here as well?

goedderz (Member Author) replied:

From the code I had locally, I only added tracing of the actual remote requests, not tracing of ::shutdown() calls. I thought that would be enough in most of the cases where shutdown is of interest at all, and it doesn't clutter the log as much.
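
For illustration, this is roughly what such a trace hook could look like. It is a hypothetical sketch: the type, signature, and logging call are made up and not taken from the ArangoDB code base.

// Hypothetical sketch of a traceShutdown()-style helper; not ArangoDB code.
#include <cstdio>

enum class ExecutionState { DONE, HASMORE, WAITING };

struct ShutdownTracingSketch {
  bool tracingEnabled = false;  // e.g. tied to the query's profiling level

  void traceShutdown(ExecutionState state, int errorCode) const {
    if (!tracingEnabled) {
      return;
    }
    // the real code would write to the query log rather than stdout
    std::printf("shutdown returned state=%d, errorCode=%d\n",
                static_cast<int>(state), errorCode);
  }
};

A call placed right before the early return shown above (the WAITING case) would cover the path the reviewer asked about.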

@mchacki mchacki merged commit 6528c59 into devel Oct 10, 2019
@mchacki mchacki deleted the bug-fix/fix-remote-executor-races branch October 10, 2019 10:36
ObiWahn added a commit that referenced this pull request Oct 11, 2019
…ture/one-shard-clean-up-2

* 'devel' of https://github.com/arangodb/arangodb:
  Bug fix/improve stringutils performance (#10208)
  add option to talk to the SUT using VST (#10217)
  Doc - Added "log-output" example (#10207)
  fix it! (#10198)
  add missing include
  Bug fix/fix simple example dep proxy skip some regression test (#10213)
  fixed ui behaviour when replacing a foxx app (#9719)
  [devel] Fix document search (Ctrl+F/Cmd+F) (#10216)
  Convert many uses of ClusterComm to Fuerte (#10154)
  Remove invokeOnAllElements (#10212)
  AQL Subquery: MultiDependencyRowFetcher (#10101)
  Bug fix/fix remote executor races (#10206)
  fix several inefficiencies in Store (#10189)
  Deprecate rocksdb.max-write-buffer-number startup option (#9654)
  fix arangosh with vst