Bug fix/collection babies race timeout (#9185) · reynoldsm88/arangodb@2c78e24 · GitHub


Commit 2c78e24

mchacki authored and fceller committed
Bug fix/collection babies race timeout (arangodb#9185)
* Fixed include guard.
* Forward port of 3.4 bug-fix
* Removed lockers altogether, we are already secured by a mutex
* Fixed recursive lock gathering
1 parent cc125b3 commit 2c78e24


4 files changed: +39 / -26 lines


CHANGELOG

Lines changed: 12 additions & 9 deletions
@@ -1,6 +1,9 @@
 devel
 -----
 
+* Speed up collection creation process in cluster, if not all agency callbacks are
+  delivered successfully.
+
 * increased performance of document inserts, by reducing the number of checks in unique / primary indexes
 
 * fixed a callback function in the web UI where the variable `this` was out of scope.
@@ -34,7 +37,7 @@ devel
 
 v3.5.0-rc.3 (2019-05-31)
 ------------------------
-
+
 * fix issue #9106: Sparse Skiplist Index on multiple fields not used for FILTER + SORT query
 
   Allow AQL query optimizer to use sparse indexes in more cases, specifically when
@@ -52,7 +55,7 @@ v3.5.0-rc.3 (2019-05-31)
 * Bugfix for smart graph traversals with uniqueVertices: path, which could
   sometimes lead to erroneous traversal results
 
-* Pregel algorithms can be run with the option "useMemoryMaps: true" to be
+* Pregel algorithms can be run with the option "useMemoryMaps: true" to be
   able to run algorithms on data that is bigger than the available RAM.
 
 * fix a race in TTL thread deactivation/shutdown
@@ -80,15 +83,15 @@ v3.5.0-rc.2 (2019-05-23)
   and uncompressed data blocks not fitting into the block cache
 
   The error can only occur for collection or index scans with the RocksDB storage engine
-  when the RocksDB block cache is used and set to a very small size, plus its maximum size is
+  when the RocksDB block cache is used and set to a very small size, plus its maximum size is
   enforced by setting the `--rocksdb.enforce-block-cache-size-limit` option to `true`.
 
   Previously these incomplete reads could have been ignored silently, making collection or
   index scans return less documents than there were actually present.
 
 * fixed internal issue #3918: added optional second parameter "withId" to AQL
   function PREGEL_RESULT
-
+
   this parameter defaults to `false`. When set to `true` the results of the Pregel
   computation run will also contain the `_id` attribute for each vertex and not
   just `_key`. This allows distinguishing vertices from different vertex collections.
@@ -99,9 +102,9 @@ v3.5.0-rc.2 (2019-05-23)
 
 * internally switch unit tests framework from catch to gtest
 
-* disable selection of index types "hash" and "skiplist" in the web interface when
-  using the RocksDB engine. The index types "hash", "skiplist" and "persistent" are
-  just aliases of each other with the RocksDB engine, so there is no need to offer all
+* disable selection of index types "hash" and "skiplist" in the web interface when
+  using the RocksDB engine. The index types "hash", "skiplist" and "persistent" are
+  just aliases of each other with the RocksDB engine, so there is no need to offer all
   of them. After initially only offering "hash" indexes, we decided to only offer
   indexes of type "persistent", as it is technically the most
   appropriate description.
@@ -619,15 +622,15 @@ v3.4.6 (2019-05-21)
   and uncompressed data blocks not fitting into the block cache
 
   The error can only occur for collection or index scans with the RocksDB storage engine
-  when the RocksDB block cache is used and set to a very small size, plus its maximum size is
+  when the RocksDB block cache is used and set to a very small size, plus its maximum size is
   enforced by setting the `--rocksdb.enforce-block-cache-size-limit` option to `true`.
 
   Previously these incomplete reads could have been ignored silently, making collection or
   index scans return less documents than there were actually present.
 
 * fixed internal issue #3918: added optional second parameter "withId" to AQL
   function PREGEL_RESULT
-
+
   this parameter defaults to `false`. When set to `true` the results of the Pregel
   computation run will also contain the `_id` attribute for each vertex and not
   just `_key`. This allows distinguishing vertices from different vertex collections.

arangod/Cluster/AgencyCallback.cpp

Lines changed: 3 additions & 1 deletion
@@ -125,7 +125,7 @@ bool AgencyCallback::execute(std::shared_ptr<VPackBuilder> newData) {
   return result;
 }
 
-void AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
+bool AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
   // One needs to acquire the mutex of the condition variable
   // before entering this function!
   if (!_cv.wait(static_cast<uint64_t>(maxTimeout * 1000000.0)) &&
@@ -134,5 +134,7 @@ void AgencyCallback::executeByCallbackOrTimeout(double maxTimeout) {
         << "Waiting done and nothing happended. Refetching to be sure";
     // mop: watches have not triggered during our sleep...recheck to be sure
     refetchAndUpdate(false, true); // Force a check
+    return true;
   }
+  return false;
 }
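With this change `executeByCallbackOrTimeout` tells its caller whether the wait ended by timeout (after a forced re-check) or because the condition variable was signaled. As a rough sketch of that contract, assuming `std::condition_variable` in place of ArangoDB's own `ConditionVariable` (whose API is not shown in this diff), a timed wait could report which case occurred as follows. `TimedWaiter`, `waitOrTimeout`, and `forcedChecks` are hypothetical names, and the second condition of the original `if`, cut off in the diff, is not modeled.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical analogue of AgencyCallback's timed wait, written against the
// standard library rather than arangodb::basics::ConditionVariable.
class TimedWaiter {
 public:
  std::mutex& mutex() { return _mutex; }

  // Wake a waiter, as an arriving agency callback would.
  void notify() { _cv.notify_all(); }

  // Precondition (as in the original): the caller already holds `lock` on
  // mutex(). Returns true if maxTimeout seconds elapsed, after forcing a
  // re-check; returns false if the condition variable was signaled.
  // Without a predicate, a spurious wakeup is also reported as false.
  bool waitOrTimeout(std::unique_lock<std::mutex>& lock, double maxTimeout) {
    std::chrono::duration<double> timeout(maxTimeout);
    if (_cv.wait_for(lock, timeout) == std::cv_status::timeout) {
      ++forcedChecks;  // stands in for refetchAndUpdate(false, true)
      return true;
    }
    return false;
  }

  int forcedChecks = 0;  // number of timeout-triggered re-checks

 private:
  std::mutex _mutex;
  std::condition_variable _cv;
};
```

A caller would take the lock in a short block, store the returned `bool`, and act on it only after the lock is released, which is the shape the updated `ClusterInfo.cpp` loop uses.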

arangod/Cluster/AgencyCallback.h

Lines changed: 4 additions & 1 deletion
@@ -112,9 +112,12 @@ class AgencyCallback {
 
   //////////////////////////////////////////////////////////////////////////////
   /// @brief wait until a callback is received or a timeout has happened
+  ///
+  /// @return true => if we got woken up after maxTimeout
+  ///         false => if someone else ringed the condition variable
   //////////////////////////////////////////////////////////////////////////////
 
-  void executeByCallbackOrTimeout(double);
+  bool executeByCallbackOrTimeout(double);
 
   //////////////////////////////////////////////////////////////////////////////
   /// @brief private members

arangod/Cluster/ClusterInfo.cpp

Lines changed: 20 additions & 15 deletions
@@ -1977,13 +1977,9 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
 
     if (nrDone->load(std::memory_order_acquire) == infos.size()) {
       {
-        // We need to lock all condition variables
-        std::vector<::arangodb::basics::ConditionLocker> lockers;
-        for (auto& cb : agencyCallbacks) {
-          CONDITION_LOCKER(locker, cb->_cv);
-        }
+        // We do not need to lock all condition variables
+        // we are save by cacheMutex
         cbGuard.fire();
-        // After the guard is done we can release the lockers
       }
       // Now we need to remove TTL + the IsBuilding flag in Agency
       opers.clear();
@@ -2009,13 +2005,9 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
       }
       if (tmpRes > TRI_ERROR_NO_ERROR) {
         {
-          // We need to lock all condition variables
-          std::vector<::arangodb::basics::ConditionLocker> lockers;
-          for (auto& cb : agencyCallbacks) {
-            CONDITION_LOCKER(locker, cb->_cv);
-          }
+          // We do not need to lock all condition variables
+          // we are save by cacheMutex
           cbGuard.fire();
-          // After the guard is done we can release the lockers
         }
 
         // report error
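The removed blocks declared `lockers` but never filled it: each `CONDITION_LOCKER(locker, cb->_cv)` lived only for one iteration of the loop body. Assuming the macro declares a scope-local RAII guard, as its use elsewhere in this diff suggests, none of the callbacks' mutexes was still locked when `cbGuard.fire()` ran. The sketch below reproduces that scoping effect with `std::unique_lock` and a hypothetical counting mutex (`CountingMutex`, `heldCount`, and both functions are illustrative names, not ArangoDB code).

```cpp
#include <mutex>
#include <vector>

// Hypothetical bookkeeping: how many CountingMutex instances are locked now.
int heldCount = 0;

// A BasicLockable wrapper so std::unique_lock can be used with it.
struct CountingMutex {
  std::mutex m;
  void lock() { m.lock(); ++heldCount; }
  void unlock() { --heldCount; m.unlock(); }
};

// Shape of the removed code: the guard is destroyed at the end of every
// iteration, and `lockers` is never used.
int heldAfterScopedLoop(std::vector<CountingMutex>& mutexes) {
  std::vector<std::unique_lock<CountingMutex>> lockers;
  for (auto& mu : mutexes) {
    std::unique_lock<CountingMutex> locker(mu);  // unlocked at closing brace
  }
  return heldCount;  // nothing is locked any more
}

// What actually holding every lock would require: move each guard into a
// container that outlives the loop.
int heldAfterGatheringLoop(std::vector<CountingMutex>& mutexes) {
  std::vector<std::unique_lock<CountingMutex>> lockers;
  lockers.reserve(mutexes.size());
  for (auto& mu : mutexes) {
    lockers.emplace_back(mu);  // lock stays held while `lockers` lives
  }
  return heldCount;  // read before `lockers` releases everything
}
```

The commit does neither: it deletes the loop and relies on the claim in its new comment that `cacheMutex` already protects this section.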
@@ -2047,9 +2039,22 @@ Result ClusterInfo::createCollectionsCoordinator(std::string const& databaseName
     TRI_ASSERT(agencyCallbacks.size() == infos.size());
     for (size_t i = 0; i < infos.size(); ++i) {
       if (infos[i].state == ClusterCollectionCreationInfo::INIT) {
-        // This one has not responded, wait for it.
-        CONDITION_LOCKER(locker, agencyCallbacks[i]->_cv);
-        agencyCallbacks[i]->executeByCallbackOrTimeout(interval);
+        bool wokenUp = false;
+        {
+          // This one has not responded, wait for it.
+          CONDITION_LOCKER(locker, agencyCallbacks[i]->_cv);
+          wokenUp = agencyCallbacks[i]->executeByCallbackOrTimeout(interval);
+        }
+        if (wokenUp) {
+          ++i;
+          // We got woken up by waittime, not by callback.
+          // Let us check if we skipped other callbacks as well
+          for (; i < infos.size(); ++i) {
+            if (infos[i].state == ClusterCollectionCreationInfo::INIT) {
+              agencyCallbacks[i]->refetchAndUpdate(true, false);
+            }
+          }
+        }
         break;
       }
     }
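The reworked wait loop in `createCollectionsCoordinator` waits only on the first callback still in `INIT`, holds that callback's condition-variable lock only inside the inner block, and, if the wait ended by timeout, forces a refetch of every later callback still in `INIT` before breaking out. The `++i` skips the callback that just timed out, since `executeByCallbackOrTimeout` already refetched it. The following is a simplified, single-threaded model of that control flow. `State`, `FakeCallback`, and `waitForPending` are hypothetical; locking and the meaning of the two `refetchAndUpdate` flags are not modeled, and the per-callback state is folded into the callback object although the original keeps it in `infos[i]`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for ClusterCollectionCreationInfo and AgencyCallback.
enum class State { INIT, DONE };

struct FakeCallback {
  State state = State::INIT;
  bool timesOut = true;  // what the simulated wait reports
  int refetches = 0;     // number of forced refetches on this callback

  // Mirrors the new contract: true => the wait ended by timeout, in which
  // case the real method has already refetched this callback itself.
  bool executeByCallbackOrTimeout() {
    if (timesOut) {
      ++refetches;
    }
    return timesOut;
  }

  void refetchAndUpdate() { ++refetches; }
};

// Wait on the first pending callback only; after a timeout, sweep the
// remaining pending callbacks so none of them waits for a whole interval.
void waitForPending(std::vector<FakeCallback>& cbs) {
  for (std::size_t i = 0; i < cbs.size(); ++i) {
    if (cbs[i].state == State::INIT) {
      bool wokenUp = cbs[i].executeByCallbackOrTimeout();
      if (wokenUp) {
        for (++i; i < cbs.size(); ++i) {
          if (cbs[i].state == State::INIT) {
            cbs[i].refetchAndUpdate();
          }
        }
      }
      break;  // one wait per pass, as in the original loop
    }
  }
}
```

With callbacks `{DONE, INIT, DONE, INIT}` that all time out, the model refetches the second one inside the wait and the fourth one in the sweep, and leaves the finished ones alone.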
