8000 Get rid of use-after-free in async registry by jvolmer · Pull Request #21779 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

Get rid of use-after-free in async registry #21779

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into devel from
May 20, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 0 additions & 1 deletion lib/TaskMonitoring/include/TaskMonitoring/task.h
Original file line number Diff line number Diff line change
Expand Up @@ -122,7 +122,6 @@ struct TaskInRegistry {

std::string const name;
std::atomic<State> state;
std::atomic<bool> isDeleted = false;
ParentTask parent;
std::optional<basics::ThreadId>
running_thread; // proably has to also be atomic because
Expand Down
77 changes: 14 additions & 63 deletions lib/TaskMonitoring/task.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -73,74 +73,25 @@ auto TaskInRegistry::snapshot() -> TaskSnapshot {

namespace {
/**
Marks itself for deletion and all finishes parent nodes that do not have any
other children.
Marks itself for deletion and deletes a parent reference.

Loops over the hierarchy of parents of the given node. This function makes
sure that these hierarchies are all marked for deletion together and can
therefore be deleted in the next garbage collection run. Otherwise, a parent
exists as long as its last child is not garbage collected because the child
owns a shared reference to the parent, requiring as many garbage collection
cycles as hierarchy levels to cleanup the given hierarchy.

Some examples:

If there are dependencies like (A └ B means A is parent of B)
node_A
└ node_B
└ node (input to this function)
and all of these nodes do not have any other references and are therefore
finished, then this function makes sure that all of these node are marked for
deletion.

If we have dependencies like
node_A
├ node_B
└ node_C
└ node (input to this function)
with node_C and node beeing finished, this function will mark node_C and node
for deletion but neither node_B nor node_A, independent of their state.
Deleting the parent reference makes sure that
a parent can directly be marked for deletion when all its children are
marked. Otherwise, we need to wait for the garbage collection to delete the
references, possibly requiring several garbage collection cycles to delete
all hierarchy levels.
*/
auto mark_finished_nodes_for_deletion(Node* node) {
auto current_node = node;
while (true) {
auto specific_node =
reinterpret_cast<containers::ThreadOwnedList<TaskInRegistry>::Node*>(
current_node);

// make sure that we don't mark a node twice for deletion
auto expected = false;
if (not specific_node->data.isDeleted.compare_exchange_strong(
expected, true, std::memory_order_acq_rel)) {
break;
}

// do not continue if node does not have a parent
auto& parent = specific_node->data.parent;
if (not std::holds_alternative<NodeReference>(parent)) {
specific_node->list->mark_for_deletion(specific_node);
break;
}
// do not continue if node is not the only child of parent
// if child is the only reference to parent, parent is not running any more
// (there is no Task that holds another shared reference) so no new children
// can be added to the parent
auto& parent_ref = std::get<NodeReference>(parent);
if (parent_ref.use_count() != 1) {
// if we miss decrement updates here, it does not matter:
// miss 2 -> 1: parent mark for deletion is done by the other child
// miss 1 -> 0: loop once more but exit early because isDeleted==true
specific_node->list->mark_for_deletion(specific_node);
break;
}
auto specific_node =
reinterpret_cast<containers::ThreadOwnedList<TaskInRegistry>::Node*>(
node);

// continue with parent node
current_node = parent_ref.get();
// get rid of parent task and delete a shared reference
specific_node->data.parent = {RootTask{}};

// mark node for deletion needs to be last action on specific_node, because
// then a garbage collection run can destroy the node at any time
specific_node->list->mark_for_deletion(specific_node);
}
// mark node for deletion needs to be last action on specific_node, because
// then a garbage collection run can destroy the node at any time
specific_node->list->mark_for_deletion(specific_node);
}
} // namespace

Expand Down
18 changes: 0 additions & 18 deletions tests/TaskMonitoring/TaskRegistryTest.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -54,24 +54,6 @@ struct MyTask : public Task {

struct TaskRegistryTest : ::testing::Test {
void TearDown() override {
// garbage collection has to run at most twice in order to clean everything
// up on the current thread:
// - when a child task scope is deleted, the child's task-in-registry is
// marked for deletion
// - at this point its parent task scope can still exist, therefore it is
// not marked for deletion inside the child task scope destructor
// - when then the parent task scope is deleted, the parent's
// task-in-registry is still referenced by the child's task-in-registry
// (which is not yet deleted), therefore it is not yet marked for deletion

// the first gc run destroys the child's task-in-registry
// which destroys the last reference to the parent's task-in-registry, which
// is therfore marked for deletion (together with all remaining
// task-in-registries higher up in the hierarchy that are not referenced by
// any other tasks)
get_thread_registry().garbage_collect();
// the second gc run destroys the parent's task-in-registry (and possibly
// other marked for deletion items)
get_thread_registry().garbage_collect();
EXPECT_EQ(get_all_tasks().size(), 0);
}
Expand Down
0