prototype for forceOneShardAttributeValue by jsteemann · Pull Request #14701 · arangodb/arangodb · GitHub

Merged: 15 commits merged on Sep 7, 2021
40 changes: 40 additions & 0 deletions CHANGELOG
@@ -1,6 +1,46 @@
devel
-----

+ * (EE only) Bug-fix: If you created an ArangoSearch view on
+ SatelliteCollections only and then joined it with a collection having only
+ a single shard, the cluster-one-shard-rule was falsely applied and could
+ lead to empty view results. The rule now detects this situation properly
+ and does not trigger.
+
+ * (EE only) If you have a query using only satellite collections,
+ now the cluster-one-shard-rule can be applied to improve
+ query performance.
+
+ * (Enterprise Edition only): added query option `forceOneShardAttributeValue`
+ to explicitly set a shard key value that will be used during query snippet
+ distribution to limit the query to a specific server in the cluster.
+
+ This query option can be used in complex queries in case the query optimizer
+ cannot automatically detect that the query can be limited to only a single
+ server (e.g. in a disjoint SmartGraph case).
+ When the option is set to the correct shard key value, the query will be
+ limited to the target server determined by the shard key value. It thus
+ requires that all collections in the query use the same distribution
+ (i.e. `distributeShardsLike` attribute via disjoint SmartGraphs).
+
+ Limiting the query to a single DB server is a performance optimization
+ and may make complex queries run a lot faster because of the reduced
+ setup and teardown costs and the reduced cluster-internal traffic during
+ query execution.
+
+ If the option is set incorrectly, i.e. to a wrong shard key value, then
+ the query may be shipped to a wrong DB server and may not return results
+ (i.e. it may return an empty result set). It is thus the caller's
+ responsibility to set the `forceOneShardAttributeValue` correctly or not
+ use it.
+
+ The `forceOneShardAttributeValue` option will only honor string values.
+ All other values as well as the empty string will be ignored and treated
+ as if the option is not set.
+
+ If the option is set and the query satisfies the requirements for using
+ the option, the query's execution plan will contain the "cluster-one-shard"
+ optimizer rule.
+
* Updated ArangoDB Starter to 0.15.2.

* SEARCH-238: Improved SortNodes placement optimization in cluster so
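The value rules stated in the CHANGELOG entry (only string values are honored, and the empty string counts as "not set") can be illustrated with a minimal sketch. This is not ArangoDB code: `OptionValue` and `effectiveShardKeyValue` are hypothetical names, and a `std::variant` stands in for the VelocyPack value that the server actually inspects.

```cpp
#include <optional>
#include <string>
#include <variant>

// Hypothetical model of a JSON-like option value (null, bool, number, string).
// Assumption for illustration only; the real code reads a VelocyPack slice.
using OptionValue = std::variant<std::monostate, bool, double, std::string>;

// Returns the shard key value to use, or std::nullopt when the option is
// treated as if it had not been set (non-string values or the empty string).
std::optional<std::string> effectiveShardKeyValue(OptionValue const& value) {
  auto const* str = std::get_if<std::string>(&value);
  if (str == nullptr || str->empty()) {
    return std::nullopt;
  }
  return *str;
}
```

Under these assumptions a number, a boolean, null, or `""` all behave like an absent option, so a caller cannot accidentally pin a query to a server by passing a value of the wrong type.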
14 changes: 10 additions & 4 deletions arangod/Aql/EngineInfoContainerDBServerServerBased.cpp
@@ -87,6 +87,7 @@ EngineInfoContainerDBServerServerBased::TraverserEngineShardLists::TraverserEngi
auto const& restrictToShards = query.queryOptions().restrictToShards;
// Extract the local shards for edge collections.
for (auto const& col : edges) {
+ TRI_ASSERT(col != nullptr);
#ifdef USE_ENTERPRISE
if (query.trxForOptimization().isInaccessibleCollection(col->id())) {
_inaccessible.insert(col->name());
@@ -103,6 +104,7 @@ EngineInfoContainerDBServerServerBased::TraverserEngineShardLists::TraverserEngi
// It might in fact be empty, if we only have edge collections in a graph.
// Or if we guarantee to never read vertex data.
for (auto const& col : vertices) {
+ TRI_ASSERT(col != nullptr);
#ifdef USE_ENTERPRISE
if (query.trxForOptimization().isInaccessibleCollection(col->id())) {
_inaccessible.insert(col->name());
@@ -120,7 +122,11 @@ std::vector<ShardID> EngineInfoContainerDBServerServerBased::TraverserEngineShar
std::vector<ShardID> localShards;
for (auto const& shard : *shardIds) {
auto const& it = shardMapping.find(shard);
- TRI_ASSERT(it != shardMapping.end());
+ if (it == shardMapping.end()) {
+ THROW_ARANGO_EXCEPTION_MESSAGE(
+ TRI_ERROR_INTERNAL,
+ "no entry for shard '" + shard + "' in shard mapping table (" + std::to_string(shardMapping.size()) + " entries)");
+ }
if (it->second == server) {
localShards.emplace_back(shard);
_hasShard = true;
@@ -853,15 +859,15 @@ void EngineInfoContainerDBServerServerBased::addOptionsPart(arangodb::velocypack
#endif
}

- // Insert the Variables information into the message to be send to DBServers
+ // Insert the Variables information into the message to be sent to DBServers
void EngineInfoContainerDBServerServerBased::addVariablesPart(arangodb::velocypack::Builder& builder) const {
TRI_ASSERT(builder.isOpenObject());
builder.add(VPackValue("variables"));
// This will open and close an Object.
_query.ast()->variables()->toVelocyPack(builder);
}

- // Insert the Snippets information into the message to be send to DBServers
+ // Insert the Snippets information into the message to be sent to DBServers
void EngineInfoContainerDBServerServerBased::addSnippetPart(
std::unordered_map<ExecutionNodeId, ExecutionNode*> const& nodesById,
arangodb::velocypack::Builder& builder, ShardLocking& shardLocking,
@@ -875,7 +881,7 @@ void EngineInfoContainerDBServerServerBased::addSnippetPart(
builder.close(); // snippets
}

- // Insert the TraversalEngine information into the message to be send to DBServers
+ // Insert the TraversalEngine information into the message to be sent to DBServers
std::vector<bool> EngineInfoContainerDBServerServerBased::addTraversalEnginesPart(
arangodb::velocypack::Builder& infoBuilder,
std::unordered_map<ShardID, ServerID> const& shardMapping, ServerID const& server) const {
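The shard lookup change in this file replaces an assertion with an explicit error when a shard has no entry in the mapping table. A plausible motivation is that assertions are commonly compiled out of release builds, in which case a failed lookup would go on to dereference the end iterator. The following is a minimal sketch of that pattern, not the ArangoDB implementation: `localShardsFor` is a hypothetical free function, `ShardID` and `ServerID` are modeled as plain strings, and `std::runtime_error` stands in for `THROW_ARANGO_EXCEPTION_MESSAGE`.

```cpp
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <vector>

// Assumption: plain strings stand in for the real ShardID / ServerID types.
using ShardID = std::string;
using ServerID = std::string;

// Collects the shards that live on `server`. A shard that is missing from the
// mapping table raises an exception instead of relying on an assertion.
std::vector<ShardID> localShardsFor(
    std::vector<ShardID> const& shardIds,
    std::unordered_map<ShardID, ServerID> const& shardMapping,
    ServerID const& server) {
  std::vector<ShardID> localShards;
  for (auto const& shard : shardIds) {
    auto it = shardMapping.find(shard);
    if (it == shardMapping.end()) {
      // std::runtime_error is a stand-in for the project's exception macro.
      throw std::runtime_error("no entry for shard '" + shard +
                               "' in shard mapping table (" +
                               std::to_string(shardMapping.size()) +
                               " entries)");
    }
    if (it->second == server) {
      localShards.emplace_back(shard);
    }
  }
  return localShards;
}
```

Including the table size in the message, as the diff does, helps distinguish an empty mapping from one that is merely missing a single shard.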
7 changes: 7 additions & 0 deletions arangod/Aql/GraphNode.cpp
@@ -466,12 +466,14 @@ void GraphNode::setGraphInfoAndCopyColls(std::vector<Collection*> const& edgeCol
std::vector<Collection*> const& vertexColls) {
_graphInfo.openArray();
for (auto& it : edgeColls) {
+ TRI_ASSERT(it != nullptr);
_edgeColls.emplace_back(it);
_graphInfo.add(VPackValue(it->name()));
}
_graphInfo.close();

for (auto& it : vertexColls) {
+ TRI_ASSERT(it != nullptr);
addVertexCollection(*it);
}
}
@@ -547,6 +549,7 @@ void GraphNode::doToVelocyPack(VPackBuilder& nodes, unsigned flags) const {
{
VPackArrayBuilder guard(&nodes);
for (auto const& e : _edgeColls) {
+ TRI_ASSERT(e != nullptr);
auto const& shard = collectionToShardName(e->name());
// if the mapped shard for a collection is empty, it means that
// we have an edge collection that is only relevant on some of the
@@ -561,6 +564,7 @@ void GraphNode::doToVelocyPack(VPackBuilder& nodes, unsigned flags) const {
{
VPackArrayBuilder guard(&nodes);
for (auto const& v : _vertexColls) {
+ TRI_ASSERT(v != nullptr);
// if the mapped shard for a collection is empty, it means that
// we have a vertex collection that is only relevant on some of the
// target servers
@@ -631,6 +635,7 @@ CostEstimate GraphNode::estimateCost() const {
double baseCost = 1;
size_t baseNumItems = 0;
for (auto& e : _edgeColls) {
+ TRI_ASSERT(e != nullptr);
auto count = e->count(_options->trx(), transaction::CountType::TryCache);
// Assume an estimate if 10% hit rate
baseCost *= count / 10;
@@ -794,9 +799,11 @@ std::vector<aql::Collection const*> GraphNode::collections() const {
set.reserve(_edgeColls.size() + _vertexColls.size());

for (auto const& collPointer : _edgeColls) {
+ TRI_ASSERT(collPointer != nullptr);
set.emplace(collPointer);
}
for (auto const& collPointer : _vertexColls) {
+ TRI_ASSERT(collPointer != nullptr);
set.emplace(collPointer);
}

42 changes: 19 additions & 23 deletions arangod/Aql/QueryOptions.cpp
@@ -172,50 +172,43 @@ void QueryOptions::fromVelocyPack(VPackSlice slice) {
traversalProfile = static_cast<TraversalProfileLevel>(value.getNumber<uint16_t>());
}

- value = slice.get("allPlans");
- if (value.isBool()) {
+ if (value = slice.get("allPlans"); value.isBool()) {
allPlans = value.getBool();
}
- value = slice.get("verbosePlans");
- if (value.isBool()) {
+ if (value = slice.get("verbosePlans"); value.isBool()) {
verbosePlans = value.getBool();
}
- value = slice.get("stream");
- if (value.isBool()) {
+ if (value = slice.get("stream"); value.isBool()) {
stream = value.getBool();
}
- value = slice.get("silent");
- if (value.isBool()) {
+ if (value = slice.get("silent"); value.isBool()) {
silent = value.getBool();
}
- value = slice.get("failOnWarning");
- if (value.isBool()) {
+ if (value = slice.get("failOnWarning"); value.isBool()) {
failOnWarning = value.getBool();
}
- value = slice.get("cache");
- if (value.isBool()) {
+ if (value = slice.get("cache"); value.isBool()) {
cache = value.getBool();
}
- value = slice.get("fullCount");
- if (value.isBool()) {
+ if (value = slice.get("fullCount"); value.isBool()) {
fullCount = value.getBool();
}
- value = slice.get("count");
- if (value.isBool()) {
+ if (value = slice.get("count"); value.isBool()) {
count = value.getBool();
}
- value = slice.get("verboseErrors");
- if (value.isBool()) {
+ if (value = slice.get("verboseErrors"); value.isBool()) {
verboseErrors = value.getBool();
}
- value = slice.get("explainRegisters");
- if (value.isBool()) {
- explainRegisters =
- value.getBool() ? ExplainRegisterPlan::Yes : ExplainRegisterPlan::No;
+ if (value = slice.get("explainRegisters"); value.isBool()) {
+ explainRegisters = value.getBool() ? ExplainRegisterPlan::Yes : ExplainRegisterPlan::No;
}

// note: skipAudit is intentionally not read here.
// the end user cannot override this setting

+ if (value = slice.get("forceOneShardAttributeValue"); value.isString()) {
+ forceOneShardAttributeValue = value.copyString();
+ }

VPackSlice optimizer = slice.get("optimizer");
if (optimizer.isObject()) {
@@ -279,6 +272,9 @@ void QueryOptions::toVelocyPack(VPackBuilder& builder, bool disableOptimizerRule
builder.add("fullCount", VPackValue(fullCount));
builder.add("count", VPackValue(count));
builder.add("verboseErrors", VPackValue(verboseErrors));
+ if (!forceOneShardAttributeValue.empty()) {
+ builder.add("forceOneShardAttributeValue", VPackValue(forceOneShardAttributeValue));
+ }

// note: skipAudit is intentionally not serialized here.
// the end user cannot override this setting anyway.
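The `QueryOptions.cpp` changes use the C++17 `if` statement with initializer for parsing, and serialize the new option only when it is non-empty. Both ideas can be tried out in isolation with the sketch below. It is an illustration under stated assumptions, not the project's code: `Value`, `Object`, `Options`, `fromObject`, and `toObject` are hypothetical names, a `std::map` of `std::variant` values stands in for a VelocyPack object, and only two of the boolean flags are modeled.

```cpp
#include <map>
#include <string>
#include <variant>

// Assumption: a tiny stand-in for a VelocyPack object (attribute -> value).
using Value = std::variant<bool, std::string>;
using Object = std::map<std::string, Value>;

struct Options {
  bool stream = false;
  bool cache = false;
  std::string forceOneShardAttributeValue;  // empty means "not set"
};

Options fromObject(Object const& obj) {
  Options out;
  // if-with-initializer: the lookup and its type check form one statement,
  // and `it` is scoped to that statement only.
  if (auto it = obj.find("stream");
      it != obj.end() && std::holds_alternative<bool>(it->second)) {
    out.stream = std::get<bool>(it->second);
  }
  if (auto it = obj.find("cache");
      it != obj.end() && std::holds_alternative<bool>(it->second)) {
    out.cache = std::get<bool>(it->second);
  }
  // Values of any other type are silently ignored, as in the diff.
  if (auto it = obj.find("forceOneShardAttributeValue");
      it != obj.end() && std::holds_alternative<std::string>(it->second)) {
    out.forceOneShardAttributeValue = std::get<std::string>(it->second);
  }
  return out;
}

Object toObject(Options const& options) {
  Object obj;
  obj["stream"] = options.stream;
  obj["cache"] = options.cache;
  // Emit the attribute only when set, so an unset option stays absent.
  if (!options.forceOneShardAttributeValue.empty()) {
    obj["forceOneShardAttributeValue"] = options.forceOneShardAttributeValue;
  }
  return obj;
}
```

Note that the diff reuses one `value` variable declared earlier in the function, whereas this sketch declares a fresh iterator in each initializer. The effect on readability is the same: each lookup sits next to the check that consumes it.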
4 changes: 4 additions & 0 deletions arangod/Aql/QueryOptions.h
@@ -92,6 +92,10 @@ struct QueryOptions {
bool skipAudit; // skips audit logging - used only internally
ExplainRegisterPlan explainRegisters;

+ /// @brief shard key attribute value used to push a query down
+ /// to a single server
+ std::string forceOneShardAttributeValue;

/// @brief optimizer rules to turn off/on manually
std::vector<std::string> optimizerRules;
