protoype for forceOneShardAttributeValue (#14707) · arangodb/arangodb@49be0ea · GitHub

Commit 49be0ea

jsteemann, goedderz, KVS85, and hkernbach authored
protoype for forceOneShardAttributeValue (#14707)
* [3.8] Lower priority of AQL lanes (#14699)
* Lower priority of AQL lanes
* Added CHANGELOG entry
* Improved comments
Co-authored-by: Vadim <vadim@arangodb.com>
* added a test for statistics behavior (#14703)
* properly rename test file (#14705)
* protoype for forceOneShardAttributeValue
* only enable restrictedShards in case one shard rule got active
* fixed getResponsibleShards usage
* do not check single boolean twice :)
* changelog
* Update CHANGELOG
* Update CHANGELOG
Co-authored-by: Tobias Gödderz <tobias@arangodb.com>
Co-authored-by: Vadim <vadim@arangodb.com>
Co-authored-by: Heiko Kernbach <heiko@arangodb.com>
1 parent d3fde79 commit 49be0ea

6 files changed, +157 -40 lines changed

CHANGELOG

Lines changed: 38 additions & 0 deletions
@@ -1,6 +1,44 @@
 v3.8.2 (XXXX-XX-XX)
 -------------------

+* (EE only) Bug-fix: If you created a ArangoSearch view on Satellite-
+  Collections only and then join with a collection only having a single shard
+  the cluster-one-shard-rule was falsely applied and could lead to empty view
+  results. The Rule will now detect the situation properly, and not trigger.
+
+* (EE only) If you have a query using only satellite collections, now the
+  cluster-one-shard-rule can be applied to improve query performance.
+
+* (Enterprise Edition only): added query option `forceOneShardAttributeValue` to
+  explicitly set a shard key value that will be used during query snippet
+  distribution to limit the query to a specific server in the cluster.
+
+  This query option can be used in complex queries in case the query optimizer
+  cannot automatically detect that the query can be limited to only a single
+  server (e.g. in a disjoint smart graph case).
+  When the option is set to the correct shard key value, the query will be
+  limited to the target server determined by the shard key value. It thus
+  requires that all collections in the query use the same distribution (i.e.
+  `distributeShardsLike` attribute via disjoint SmartGraphs).
+
+  Limiting the query to a single DB server is a performance optimization and may
+  make complex queries run a lot faster because of the reduced setup and
+  teardown costs and the reduced cluster-internal traffic during query
+  execution.
+
+  If the option is set incorrectly, i.e. to a wrong shard key value, then the
+  query may be shipped to a wrong DB server and may not return results (i.e.
+  empty result set). It is thus the caller's responsibility to set the
+  `forceOneShardAttributeValue` correctly or not use it.
+
+  The `forceOneShardAttributeValue` option will only honor string values. All
+  other values as well as the empty string will be ignored and treated as if the
+  option is not set.
+
+  If the option is set and the query satisfies the requirements for using the
+  option, the query's execution plan will contain the "cluster-one-shard"
+  optimizer rule.
+
 * Updated ArangoDB Starter to 0.15.2.

 * SEARCH-238: Improved SortNodes placement optimization in cluster so late
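
The changelog entry above says the option honors only string values and treats every other value, as well as the empty string, as "not set". A minimal sketch of that rule follows. It assumes a hypothetical `OptionValue` variant as a stand-in for the VelocyPack slice that the real code reads, and the function name `effectiveShardKeyValue` is invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <variant>

// Hypothetical stand-in for a parsed option value; the actual commit reads
// a VelocyPack slice instead of a std::variant.
using OptionValue =
    std::variant<std::monostate, bool, std::int64_t, std::string>;

// Returns the shard key value to use. An empty result means "option not set",
// which covers a missing value, a non-string value, and the empty string.
std::string effectiveShardKeyValue(OptionValue const& value) {
  if (auto const* s = std::get_if<std::string>(&value)) {
    return *s;  // an empty string is returned unchanged and so counts as unset
  }
  return {};
}
```

Using the empty string as the "not set" marker keeps the member a plain `std::string`, which appears to be the design the commit chose, at the cost of not being able to force an empty shard key value.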

arangod/Aql/EngineInfoContainerDBServerServerBased.cpp

Lines changed: 10 additions & 4 deletions
@@ -82,6 +82,7 @@ EngineInfoContainerDBServerServerBased::TraverserEngineShardLists::TraverserEngi
   auto const& restrictToShards = query.queryOptions().restrictToShards;
   // Extract the local shards for edge collections.
   for (auto const& col : edges) {
+    TRI_ASSERT(col != nullptr);
 #ifdef USE_ENTERPRISE
     if (query.trxForOptimization().isInaccessibleCollection(col->id())) {
       _inaccessible.insert(col->name());
@@ -98,6 +99,7 @@ EngineInfoContainerDBServerServerBased::TraverserEngineShardLists::TraverserEngi
   // It might in fact be empty, if we only have edge collections in a graph.
   // Or if we guarantee to never read vertex data.
   for (auto const& col : vertices) {
+    TRI_ASSERT(col != nullptr);
 #ifdef USE_ENTERPRISE
     if (query.trxForOptimization().isInaccessibleCollection(col->id())) {
       _inaccessible.insert(col->name());
@@ -115,7 +117,11 @@ std::vector<ShardID> EngineInfoContainerDBServerServerBased::TraverserEngineShar
   std::vector<ShardID> localShards;
   for (auto const& shard : *shardIds) {
     auto const& it = shardMapping.find(shard);
-    TRI_ASSERT(it != shardMapping.end());
+    if (it == shardMapping.end()) {
+      THROW_ARANGO_EXCEPTION_MESSAGE(
+          TRI_ERROR_INTERNAL,
+          "no entry for shard '" + shard + "' in shard mapping table (" + std::to_string(shardMapping.size()) + " entries)");
+    }
     if (it->second == server) {
       localShards.emplace_back(shard);
       _hasShard = true;
@@ -758,15 +764,15 @@ void EngineInfoContainerDBServerServerBased::addOptionsPart(arangodb::velocypack
 #endif
 }

-// Insert the Variables information into the message to be send to DBServers
+// Insert the Variables information into the message to be sent to DBServers
 void EngineInfoContainerDBServerServerBased::addVariablesPart(arangodb::velocypack::Builder& builder) const {
   TRI_ASSERT(builder.isOpenObject());
   builder.add(VPackValue("variables"));
   // This will open and close an Object.
   _query.ast()->variables()->toVelocyPack(builder);
 }

-// Insert the Snippets information into the message to be send to DBServers
+// Insert the Snippets information into the message to be sent to DBServers
 void EngineInfoContainerDBServerServerBased::addSnippetPart(
     std::unordered_map<ExecutionNodeId, ExecutionNode*> const& nodesById,
     arangodb::velocypack::Builder& builder, ShardLocking& shardLocking,
@@ -780,7 +786,7 @@ void EngineInfoContainerDBServerServerBased::addSnippetPart(
   builder.close(); // snippets
 }

-// Insert the TraversalEngine information into the message to be send to DBServers
+// Insert the TraversalEngine information into the message to be sent to DBServers
 std::vector<bool> EngineInfoContainerDBServerServerBased::addTraversalEnginesPart(
     arangodb::velocypack::Builder& infoBuilder,
     std::unordered_map<ShardID, ServerID> const& shardMapping, ServerID const& server) const {
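
The third hunk in this file replaces an assertion with a thrown exception when a shard has no entry in the shard mapping. The sketch below illustrates that lookup-or-throw pattern in isolation. It is not the ArangoDB implementation: `ShardID` and `ServerID` are simplified to `std::string`, the free function `localShardsFor` is hypothetical, and `std::runtime_error` stands in for `THROW_ARANGO_EXCEPTION_MESSAGE`.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Simplified aliases for this sketch only.
using ShardID = std::string;
using ServerID = std::string;

// Collects the shards that are located on `server`. A shard that is missing
// from the mapping raises an exception instead of tripping an assertion, so
// the failure is also reported in builds where assertions are compiled out.
std::vector<ShardID> localShardsFor(
    std::vector<ShardID> const& shardIds,
    std::unordered_map<ShardID, ServerID> const& shardMapping,
    ServerID const& server) {
  std::vector<ShardID> localShards;
  for (auto const& shard : shardIds) {
    auto it = shardMapping.find(shard);
    if (it == shardMapping.end()) {
      throw std::runtime_error("no entry for shard '" + shard +
                               "' in shard mapping table (" +
                               std::to_string(shardMapping.size()) +
                               " entries)");
    }
    if (it->second == server) {
      localShards.emplace_back(shard);
    }
  }
  return localShards;
}
```

Including the table size in the message is a cheap way to distinguish an empty mapping from one that merely lacks the requested shard.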

arangod/Aql/GraphNode.cpp

Lines changed: 7 additions & 0 deletions
@@ -466,12 +466,14 @@ void GraphNode::setGraphInfoAndCopyColls(std::vector<Collection*> const& edgeCol
     std::vector<Collection*> const& vertexColls) {
   _graphInfo.openArray();
   for (auto& it : edgeColls) {
+    TRI_ASSERT(it != nullptr);
     _edgeColls.emplace_back(it);
     _graphInfo.add(VPackValue(it->name()));
   }
   _graphInfo.close();

   for (auto& it : vertexColls) {
+    TRI_ASSERT(it != nullptr);
     addVertexCollection(*it);
   }
 }
@@ -551,6 +553,7 @@ void GraphNode::toVelocyPackHelper(VPackBuilder& nodes, unsigned flags,
   {
     VPackArrayBuilder guard(&nodes);
     for (auto const& e : _edgeColls) {
+      TRI_ASSERT(e != nullptr);
       auto const& shard = collectionToShardName(e->name());
       // if the mapped shard for a collection is empty, it means that
       // we have an edge collection that is only relevant on some of the
@@ -565,6 +568,7 @@ void GraphNode::toVelocyPackHelper(VPackBuilder& nodes, unsigned flags,
   {
     VPackArrayBuilder guard(&nodes);
     for (auto const& v : _vertexColls) {
+      TRI_ASSERT(v != nullptr);
       // if the mapped shard for a collection is empty, it means that
       // we have a vertex collection that is only relevant on some of the
       // target servers
@@ -635,6 +639,7 @@ CostEstimate GraphNode::estimateCost() const {
   double baseCost = 1;
   size_t baseNumItems = 0;
   for (auto& e : _edgeColls) {
+    TRI_ASSERT(e != nullptr);
     auto count = e->count(_options->trx(), transaction::CountType::TryCache);
     // Assume an estimate if 10% hit rate
     baseCost *= count / 10;
@@ -798,9 +803,11 @@ std::vector<aql::Collection const*> GraphNode::collections() const {
   set.reserve(_edgeColls.size() + _vertexColls.size());

   for (auto const& collPointer : _edgeColls) {
+    TRI_ASSERT(collPointer != nullptr);
     set.emplace(collPointer);
   }
   for (auto const& collPointer : _vertexColls) {
+    TRI_ASSERT(collPointer != nullptr);
     set.emplace(collPointer);
   }

arangod/Aql/QueryOptions.cpp

Lines changed: 19 additions & 23 deletions
@@ -159,50 +159,43 @@ void QueryOptions::fromVelocyPack(VPackSlice slice) {
     traversalProfile = static_cast<TraversalProfileLevel>(value.getNumber<uint16_t>());
   }

-  value = slice.get("allPlans");
-  if (value.isBool()) {
+  if (value = slice.get("allPlans"); value.isBool()) {
     allPlans = value.getBool();
   }
-  value = slice.get("verbosePlans");
-  if (value.isBool()) {
+  if (value = slice.get("verbosePlans"); value.isBool()) {
     verbosePlans = value.getBool();
   }
-  value = slice.get("stream");
-  if (value.isBool()) {
+  if (value = slice.get("stream"); value.isBool()) {
     stream = value.getBool();
   }
-  value = slice.get("silent");
-  if (value.isBool()) {
+  if (value = slice.get("silent"); value.isBool()) {
     silent = value.getBool();
   }
-  value = slice.get("failOnWarning");
-  if (value.isBool()) {
+  if (value = slice.get("failOnWarning"); value.isBool()) {
     failOnWarning = value.getBool();
   }
-  value = slice.get("cache");
-  if (value.isBool()) {
+  if (value = slice.get("cache"); value.isBool()) {
     cache = value.getBool();
   }
-  value = slice.get("fullCount");
-  if (value.isBool()) {
+  if (value = slice.get("fullCount"); value.isBool()) {
     fullCount = value.getBool();
   }
-  value = slice.get("count");
-  if (value.isBool()) {
+  if (value = slice.get("count"); value.isBool()) {
     count = value.getBool();
   }
-  value = slice.get("verboseErrors");
-  if (value.isBool()) {
+  if (value = slice.get("verboseErrors"); value.isBool()) {
     verboseErrors = value.getBool();
   }
-  value = slice.get("explainRegisters");
-  if (value.isBool()) {
-    explainRegisters =
-        value.getBool() ? ExplainRegisterPlan::Yes : ExplainRegisterPlan::No;
+  if (value = slice.get("explainRegisters"); value.isBool()) {
+    explainRegisters = value.getBool() ? ExplainRegisterPlan::Yes : ExplainRegisterPlan::No;
   }
-
+
   // note: skipAudit is intentionally not read here.
   // the end user cannot override this setting
+
+  if (value = slice.get("forceOneShardAttributeValue"); value.isString()) {
+    forceOneShardAttributeValue = value.copyString();
+  }

   VPackSlice optimizer = slice.get("optimizer");
   if (optimizer.isObject()) {
@@ -270,6 +263,9 @@ void QueryOptions::toVelocyPack(VPackBuilder& builder, bool disableOptimizerRule
   builder.add("fullCount", VPackValue(fullCount));
   builder.add("count", VPackValue(count));
   builder.add("verboseErrors", VPackValue(verboseErrors));
+  if (!forceOneShardAttributeValue.empty()) {
+    builder.add("forceOneShardAttributeValue", VPackValue(forceOneShardAttributeValue));
+  }

   // note: skipAudit is intentionally not serialized here.
   // the end user cannot override this setting anyway.
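
Most of the changes to `QueryOptions::fromVelocyPack` fold an assignment and the following check into a single C++17 `if` statement with an initializer. The sketch below shows the same idiom on a simplified, hypothetical option source: `BoolOptions`, `lookup`, `Flags` and `readFlags` are invented for illustration, and `std::optional<bool>` stands in for the VelocyPack slice used in the commit.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>

// Hypothetical, simplified option source; the commit queries a VelocyPack
// slice rather than a std::map.
using BoolOptions = std::map<std::string, bool>;

// Returns the stored flag, or std::nullopt when the key is absent.
std::optional<bool> lookup(BoolOptions const& options, std::string const& key) {
  auto it = options.find(key);
  if (it == options.end()) {
    return std::nullopt;
  }
  return it->second;
}

struct Flags {
  bool stream = false;
  bool silent = false;
};

Flags readFlags(BoolOptions const& options) {
  Flags flags;
  std::optional<bool> value;
  // The init-statement reassigns the shared `value`; the condition then
  // tests it. A failed check leaves the default in place.
  if (value = lookup(options, "stream"); value.has_value()) {
    flags.stream = *value;
  }
  if (value = lookup(options, "silent"); value.has_value()) {
    flags.silent = *value;
  }
  return flags;
}
```

The behavior is the same as the two-statement form; the gain is that each lookup and its check read as one unit, which makes a copy-and-paste mismatch between key and target member easier to spot.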

arangod/Aql/QueryOptions.h

Lines changed: 4 additions & 0 deletions
@@ -95,6 +95,10 @@ struct QueryOptions {
   /// @brief hack to be used only for /_api/export, contains the name of
   /// the target collection
   std::string exportCollection;
+
+  /// @brief shard key attribute value used to push a query down
+  /// to a single server
+  std::string forceOneShardAttributeValue;

   /// @brief optimizer rules to turn off/on manually
   std::vector<std::string> optimizerRules;

0 commit comments
