8000 ZKD Indexes by maierlars · Pull Request #13650 · arangodb/arangodb · GitHub
[go: up one dir, main page]

Skip to content

ZKD Indexes #13650

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 50 commits into from
Sep 10, 2021
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
50 commits
Select commit Hold shift + click to select a range
c19d549
Bootstrap zkd indexes. Now fill in the details.
Mar 3, 2021
60d6ceb
Added prototype for condition finder.
Mar 4, 2021
63fd0b7
Added helper for Range.
Mar 4, 2021
a43cf14
Fixed double conversion.
Mar 4, 2021
9305dbc
Added wrong but working nextImpl
Mar 5, 2021
14ba135
Extract bounds for iterator.
Mar 5, 2021
3d7d5fc
Fixed extraction logic with reversed operator usage.
Mar 5, 2021
db52b6d
Detect equal operator correctly.
Mar 5, 2021
ede46db
Implemented nextImpl
goedderz Mar 5, 2021
5c78b96
Removed log devel for hot path.
Mar 5, 2021
0b73dcf
Cleanup
goedderz Mar 5, 2021
4fb378c
Added support for partially unbounded queries.
Mar 5, 2021
6948eb6
Do not use index for full collection scans.
Mar 5, 2021
34117f4
Merge branch 'feature/zkd-index' of github.com:arangodb/arangodb into…
Mar 5, 2021
59fd78e
Added tests for ZkdHelper.
Mar 8, 2021
db392e3
Added functional test suite skeleton for zkd index
goedderz Mar 8, 2021
811e660
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Mar 8, 2021
91f5651
Some fixes to double conversions
goedderz Mar 8, 2021
5be1cd2
Fixed bug in double to byte_string conversion.
Mar 8, 2021
34c4324
Added support for denormalized doubles and infinity.
Mar 9, 2021
e2977ff
Test all interesting double values.
Mar 9, 2021
c9042d2
Disallow Nan.
Mar 9, 2021
dd3b8f2
Added more js tests.
Mar 9, 2021
7b879dc
Added a lot of tests.
Mar 9, 2021
6fcf4d1
Cleaned up test.
Mar 9, 2021
5697c52
[zkd] Strict comparsion (#13673)
Mar 10, 2021
1d3522d
[zkd] Cluster support (#13677)
Mar 11, 2021
94dd02e
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Mar 11, 2021
3d1ce4b
Added support for nested fields - excluding expansions. (#13681)
Mar 11, 2021
7618470
[zkd] Unique Constraints (#13691)
Mar 12, 2021
1d4928b
[zkd] Forward Compat (#13694)
Mar 15, 2021
8cabd6f
[zkd] Column Family (#13692)
Mar 15, 2021
bbf25b4
Added zkd index docu block. (#13698)
Mar 15, 2021
caa87ae
Fixed bug in RocksDBKeyBounds and using default cost estimation
Mar 16, 2021
6c6a34e
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Mar 22, 2021
b81f948
[zkd] testInBox speedup (#13798)
Mar 24, 2021
aaddb71
Feature/zkd index speedup getnextzvalue (#13799)
goedderz Mar 25, 2021
10ed61a
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
May 25, 2021
5a1adc3
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
May 26, 2021
584f524
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Jul 1, 2021
d650d3f
Updated CHANGELOG.
Jul 1, 2021
265ed5c
Merge branch 'devel' of github.com:arangodb/arangodb into feature/zkd…
goedderz Jul 6, 2021
3c73e3d
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Jul 15, 2021
41ea5e2
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Aug 5, 2021
fa7d8b0
Fixing iterator.
Aug 6, 2021
5f83c9f
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Aug 12, 2021
dd48bf9
Added review suggestions.
Aug 20, 2021
a55c4c6
Merge remote-tracking branch 'origin/devel' into feature/zkd-index
Sep 9, 2021
60473c5
Applied suggestions from review.
Sep 9, 2021
a185712
Merge branch 'devel' into feature/zkd-index
mchacki Sep 10, 2021
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 17 additions & 0 deletions CHANGELOG
Original file line number Diff line number Diff line change
@@ -1,6 +1,23 @@
devel
-----

* Added multidimensional indexes which can be used to efficiently intersect
multiple range queries. They are currently limited to IEEE-754 double values.
Given documents of the form {x: 12.9, y: -284.0, z: 0.02} one can define a
multidimensional index using the new type 'zkd' on the fields ["x", "y", "z"].

The AQL optimizer will then consider this index when doing queries on multiple
ranges, for example:

FOR p IN points
FILTER x0 <= p.x && p.x <= x1
FILTER y0 <= p.y && p.y <= y1
FILTER z0 <= p.z && p.z <= z1
RETURN p

The index implements the relation <=, == and >= natively. Strict relations are
emulated using post filtering. Ranges can be unbounded on one or both sides.

* No runtime limits for shard move and server cleanout jobs, instead
possibility to cancel them.

Expand Down
68 changes: 68 additions & 0 deletions Documentation/DocuBlocks/Rest/Indexes/post_api_index_zkd.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@

@startDocuBlock post_api_index_zkd
@brief creates a multi-dimensional index

@RESTHEADER{POST /_api/index#multi-dim, Create multi-dimensional index, createIndex#multi-dim}

@RESTQUERYPARAMETERS

@RESTQUERYPARAM{collection,string,required}
The collection name.

@RESTBODYPARAM{type,string,required,string}
must be equal to *"zkd"*.

@RESTBODYPARAM{fields,array,required,string}
an array of attribute names used for each dimension. Array expansions are not allowed.

@RESTBODYPARAM{unique,boolean,required,}
if *true*, then create a unique index.

@RESTBODYPARAM{fieldValueTypes,string,required,string}
must be equal to *"double"*. Currently only doubles are supported as values.

@RESTDESCRIPTION
Creates a multi-dimensional index for the collection *collection-name*, if
it does not already exist. The call expects an object containing the index
details.

@RESTRETURNCODES

@RESTRETURNCODE{200}
If the index already exists, then a *HTTP 200* is
returned.

@RESTRETURNCODE{201}
If the index does not already exist and could be created, then a *HTTP 201*
is returned.

@RESTRETURNCODE{404}
If the *collection-name* is unknown, then a *HTTP 404* is returned.

@RESTRETURNCODE{400}
If the index definition is invalid, then a *HTTP 400* is returned.

@EXAMPLES

Creating a multi-dimensional index

@EXAMPLE_ARANGOSH_RUN{RestIndexCreateNewFulltext}
var cn = "intervals";
db._drop(cn);
db._create(cn);

var url = "/_api/index?collection=" + cn;
var body = {
type: "zkd",
fields: [ "from", "to" ],
fieldValueTypes: "double"
};

var response = logCurlRequest('POST', url, body);

assert(response.code === 201);

logJsonResponse(response);
~ db._drop(cn);
@END_EXAMPLE_ARANGOSH_RUN
@endDocuBlock
3 changes: 3 additions & 0 deletions arangod/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -836,6 +836,8 @@ endif()
include(ClusterEngine/CMakeLists.txt)
include(RocksDBEngine/CMakeLists.txt)

add_library(arango_zkd STATIC Zkd/ZkdHelper.cpp)

add_library(arango_restart_action STATIC
RestServer/RestartAction.cpp
)
Expand Down Expand Up @@ -919,6 +921,7 @@ add_library(arango_rocksdb STATIC
${ROCKSDB_SOURCES}
${ADDITIONAL_LIB_ARANGO_ROCKSDB_SOURCES}
)
target_link_libraries(arango_rocksdb arango_zkd)

add_dependencies(arango_rocksdb snappy)

Expand Down
13 changes: 12 additions & 1 deletion arangod/ClusterEngine/ClusterIndex.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@
#include "ClusterIndex.h"
#include "Indexes/SimpleAttributeEqualityMatcher.h"
#include "Indexes/SortedIndexAttributeMatcher.h"
#include "RocksDBEngine/RocksDBZkdIndex.h"
#include "StorageEngine/EngineSelectorFeature.h"
#include "VocBase/LogicalCollection.h"
#include "VocBase/ticks.h"
Expand Down Expand Up @@ -277,6 +278,9 @@ Index::FilterCosts ClusterIndex::supportsFilterCondition(
return Index::supportsFilterCondition(allIndexes, node, reference, itemsInIndex);
}

case TRI_IDX_TYPE_ZKD_INDEX:
return zkd::supportsFilterCondition(this, allIndexes, node, reference, itemsInIndex);

case TRI_IDX_TYPE_UNKNOWN:
break;
}
Expand Down Expand Up @@ -315,6 +319,10 @@ Index::SortCosts ClusterIndex::supportsSortCondition(arangodb::aql::SortConditio
break;
}

case TRI_IDX_TYPE_ZKD_INDEX:
// Sorting not supported
return Index::SortCosts{};

case TRI_IDX_TYPE_UNKNOWN:
break;
}
Expand Down Expand Up @@ -359,7 +367,10 @@ aql::AstNode* ClusterIndex::specializeCondition(aql::AstNode* node,
case TRI_IDX_TYPE_PERSISTENT_INDEX: {
return SortedIndexAttributeMatcher::specializeCondition(this, node, reference);
}


case TRI_IDX_TYPE_ZKD_INDEX:
return zkd::specializeCondition(this, node, reference);

case TRI_IDX_TYPE_UNKNOWN:
break;
}
Expand Down
2 changes: 2 additions & 0 deletions arangod/ClusterEngine/ClusterIndexFactory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -166,6 +166,7 @@ void ClusterIndexFactory::linkIndexFactories(application_features::ApplicationSe
static const PrimaryIndexFactory primaryIndexFactory(server, "primary");
static const DefaultIndexFactory skiplistIndexFactory(server, "skiplist");
static const DefaultIndexFactory ttlIndexFactory(server, "ttl");
static const DefaultIndexFactory zkdIndexFactory(server, "zkd");

factory.emplace(edgeIndexFactory._type, edgeIndexFactory);
factory.emplace(fulltextIndexFactory._type, fulltextIndexFactory);
Expand All @@ -177,6 +178,7 @@ void ClusterIndexFactory::linkIndexFactories(application_features::ApplicationSe
factory.emplace(primaryIndexFactory._type, primaryIndexFactory);
factory.emplace(skiplistIndexFactory._type, skiplistIndexFactory);
factory.emplace(ttlIndexFactory._type, ttlIndexFactory);
factory.emplace(zkdIndexFactory._type, zkdIndexFactory);
}

ClusterIndexFactory::ClusterIndexFactory(application_features::ApplicationServer& server)
Expand Down
5 changes: 5 additions & 0 deletions arangod/Indexes/Index.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -332,6 +332,9 @@ Index::IndexType Index::type(char const* type, size_t len) {
if (::typeMatch(type, len, "geo2")) {
return TRI_IDX_TYPE_GEO2_INDEX;
}
if (::typeMatch(type, len, "zkd")) {
return TRI_IDX_TYPE_ZKD_INDEX;
}
std::string const& tmp = arangodb::iresearch::DATA_SOURCE_TYPE.name();
if (::typeMatch(type, len, tmp.c_str())) {
return TRI_IDX_TYPE_IRESEARCH_LINK;
Expand Down Expand Up @@ -374,6 +377,8 @@ char const* Index::oldtypeName(Index::IndexType type) {
return arangodb::iresearch::DATA_SOURCE_TYPE.name().c_str();
case TRI_IDX_TYPE_NO_ACCESS_INDEX:
return "noaccess";
case TRI_IDX_TYPE_ZKD_INDEX:
return "zkd";
case TRI_IDX_TYPE_UNKNOWN: {
}
}
Expand Down
3 changes: 2 additions & 1 deletion arangod/Indexes/Index.h
Original file line number Diff line number Diff line change
Expand Up @@ -102,7 +102,8 @@ class Index {
TRI_IDX_TYPE_TTL_INDEX,
TRI_IDX_TYPE_PERSISTENT_INDEX,
TRI_IDX_TYPE_IRESEARCH_LINK,
TRI_IDX_TYPE_NO_ACCESS_INDEX
TRI_IDX_TYPE_NO_ACCESS_INDEX,
TRI_IDX_TYPE_ZKD_INDEX
};

/// @brief: helper struct returned by index methods that determine the costs
Expand Down
35 changes: 33 additions & 2 deletions arangod/Indexes/IndexFactory.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -301,8 +301,9 @@ std::shared_ptr<Index> IndexFactory::prepareIndexFromSlice(velocypack::Slice def

/// same for both storage engines
std::vector<std::string> IndexFactory::supportedIndexes() const {
return std::vector<std::string>{"primary", "edge", "hash", "skiplist",
"ttl", "persistent", "geo", "fulltext"};
return std::vector<std::string>{"primary", "edge", "hash",
"skiplist", "ttl", "persistent",
"geo", "fulltext", "zkd"};
}

std::unordered_map<std::string, std::string> IndexFactory::indexAliases() const {
Expand Down Expand Up @@ -581,4 +582,34 @@ Result IndexFactory::enhanceJsonIndexFulltext(VPackSlice definition,
return res;
}

/// @brief enhances the json of a zkd index
Result IndexFactory::enhanceJsonIndexZkd(VPackSlice definition,
VPackBuilder& builder, bool create) {
if (auto fieldValueTypes = definition.get("fieldValueTypes");
!fieldValueTypes.isString() || !fieldValueTypes.isEqualString("double")) {
return Result(
TRI_ERROR_BAD_PARAMETER,
"zkd index requires `fieldValueTypes` to be set to `double` - future "
"releases might lift this requirement");
}

builder.add("fieldValueTypes", VPackValue("double"));
Result res = processIndexFields(definition, builder, 1, INT_MAX, create, false);

if (res.ok()) {
if (auto isSparse = definition.get(StaticStrings::IndexSparse).isTrue(); isSparse) {
return Result(TRI_ERROR_BAD_PARAMETER,
"zkd index does not support sparse property");
}

processIndexUniqueFlag(definition, builder);

bool bck = basics::VelocyPackHelper::getBooleanValue(definition, StaticStrings::IndexInBackground,
false);
builder.add(StaticStrings::IndexInBackground, VPackValue(bck));
}

return res;
}

} // namespace arangodb
6 changes: 5 additions & 1 deletion arangod/Indexes/IndexFactory.h
Original file line number Diff line number Diff line change
Expand Up @@ -158,11 +158,15 @@ class IndexFactory {
static Result enhanceJsonIndexGeo(velocypack::Slice definition,
velocypack::Builder& builder, bool create,
int minFields, int maxFields);

/// @brief enhances the json of a fulltext index
static Result enhanceJsonIndexFulltext(velocypack::Slice definition,
velocypack::Builder& builder, bool create);

/// @brief enhances the json of a zkd index
static Result enhanceJsonIndexZkd(arangodb::velocypack::Slice definition,
arangodb::velocypack::Builder& builder, bool create);

protected:
/// @brief clear internal factory/normalizer maps
void clear();
Expand Down
5 changes: 3 additions & 2 deletions arangod/RocksDBEngine/CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -72,9 +72,9 @@ set(ROCKSDB_SOURCES
RocksDBEngine/RocksDBLogValue.cpp
RocksDBEngine/RocksDBMetaCollection.cpp
RocksDBEngine/RocksDBMetadata.cpp
RocksDBEngine/RocksDBTransactionMethods.cpp
RocksDBEngine/RocksDBOptimizerRules.cpp
RocksDBEngine/RocksDBOptionFeature.cpp
RocksDBEngine/RocksDBPersistedLog.cpp
RocksDBEngine/RocksDBPrimaryIndex.cpp
RocksDBEngine/RocksDBRecoveryManager.cpp
RocksDBEngine/RocksDBReplicationCommon.cpp
Expand All @@ -90,6 +90,7 @@ set(ROCKSDB_SOURCES
RocksDBEngine/RocksDBSettingsManager.cpp
RocksDBEngine/RocksDBSyncThread.cpp
RocksDBEngine/RocksDBTransactionCollection.cpp
RocksDBEngine/RocksDBTransactionMethods.cpp
RocksDBEngine/RocksDBTransactionState.cpp
RocksDBEngine/RocksDBTtlIndex.cpp
RocksDBEngine/RocksDBTypes.cpp
Expand All @@ -98,6 +99,6 @@ set(ROCKSDB_SOURCES
RocksDBEngine/RocksDBVPackIndex.cpp
RocksDBEngine/RocksDBValue.cpp
RocksDBEngine/RocksDBWalAccess.cpp
RocksDBEngine/RocksDBPersistedLog.cpp
RocksDBEngine/RocksDBZkdIndex.cpp
)
set(ROCKSDB_SOURCES ${ROCKSDB_SOURCES} PARENT_SCOPE)
18 changes: 10 additions & 8 deletions arangod/RocksDBEngine/RocksDBColumnFamilyManager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -30,17 +30,19 @@

namespace arangodb {

std::array<char const*, arangodb::RocksDBColumnFamilyManager::numberOfColumnFamilies>
RocksDBColumnFamilyManager::_internalNames = {"default", "Documents",
"PrimaryIndex", "EdgeIndex",
"VPackIndex", "GeoIndex",
"FulltextIndex", "ReplicatedLogs"};
std::array<char const*, arangodb::RocksDBColumnFamilyManager::numberOfColumnFamilies> RocksDBColumnFamilyManager::_internalNames =
{"default", "Documents", "PrimaryIndex",
"EdgeIndex", "VPackIndex", "GeoIndex",
"FulltextIndex", "ReplicatedLogs", "ZkdIndex"};

std::array<char const*, arangodb::RocksDBColumnFamilyManager::numberOfColumnFamilies> RocksDBColumnFamilyManager::_externalNames =
{"definitions", "documents", "primary", "edge", "vpack", "geo", "fulltext", "replicated-logs"};
{"definitions", "documents", "primary", "edge", "vpack",
"geo", "fulltext", "replicated-logs", "zkd"};

std::array<rocksdb::ColumnFamilyHandle*, RocksDBColumnFamilyManager::numberOfColumnFamilies>
RocksDBColumnFamilyManager::_handles = {nullptr, nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr, nullptr};
RocksDBColumnFamilyManager::_handles = {nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr,
nullptr, nullptr, nullptr};

rocksdb::ColumnFamilyHandle* RocksDBColumnFamilyManager::_defaultHandle = nullptr;

Expand Down
3 changes: 2 additions & 1 deletion arangod/RocksDBEngine/RocksDBColumnFamilyManager.h
Original file line number Diff line number Diff line change
Expand Up @@ -46,6 +46,7 @@ struct RocksDBColumnFamilyManager {
GeoIndex = 5,
FulltextIndex = 6,
ReplicatedLogs = 7,
ZkdIndex = 8,

Invalid = 1024 // special placeholder
};
Expand All @@ -56,7 +57,7 @@ struct RocksDBColumnFamilyManager {
};

static constexpr size_t minNumberOfColumnFamilies = 7;
static constexpr size_t numberOfColumnFamilies = 8;
static constexpr size_t numberOfColumnFamilies = 9;

static void initialize();

Expand Down
10 changes: 7 additions & 3 deletions arangod/RocksDBEngine/RocksDBEngine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -707,7 +707,7 @@ void RocksDBEngine::start() {
addFamily(RocksDBColumnFamilyManager::Family::GeoIndex);
addFamily(RocksDBColumnFamilyManager::Family::FulltextIndex);
addFamily(RocksDBColumnFamilyManager::Family::ReplicatedLogs);

addFamily(RocksDBColumnFamilyManager::Family::ZkdIndex);

size_t const minNumberOfColumnFamilies = RocksDBColumnFamilyManager::minNumberOfColumnFamilies;
bool dbExisted = false;
Expand Down Expand Up @@ -746,15 +746,17 @@ void RocksDBEngine::start() {
<< "found existing column families: " << names;
auto const replicatedLogsName = RocksDBColumnFamilyManager::name(
RocksDBColumnFamilyManager::Family::ReplicatedLogs);
auto const zkdIndexName = RocksDBColumnFamilyManager::name(
RocksDBColumnFamilyManager::Family::ReplicatedLogs);

for (auto const& it : cfFamilies) {
auto it2 = std::find(existingColumnFamilies.begin(),
existingColumnFamilies.end(), it.name);
if (it2 == existingColumnFamilies.end()) {

if (it.name == replicatedLogsName) {
if (it.name == replicatedLogsName || it.name == zkdIndexName) {
LOG_TOPIC("293c3", INFO, Logger::STARTUP)
<< "column family " << replicatedLogsName
<< "column family " << it.name
<< " is missing and will be created.";
continue;
}
Expand Down Expand Up @@ -836,6 +838,8 @@ void RocksDBEngine::start() {
cfHandles[6]);
RocksDBColumnFamilyManager::set(RocksDBColumnFamilyManager::Family::ReplicatedLogs,
cfHandles[7]);
RocksDBColumnFamilyManager::set(RocksDBColumnFamilyManager::Family::ZkdIndex,
cfHandles[8]);
TRI_ASSERT(RocksDBColumnFamilyManager::get(RocksDBColumnFamilyManager::Family::Definitions)
->GetID() == 0);

Expand Down
2 changes: 2 additions & 0 deletions arangod/RocksDBEngine/RocksDBIndex.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -378,6 +378,8 @@ RocksDBKeyBounds RocksDBIndex::getBounds(Index::IndexType type, uint64_t objectI
return RocksDBKeyBounds::GeoIndex(objectId);
case RocksDBIndex::TRI_IDX_TYPE_IRESEARCH_LINK:
return RocksDBKeyBounds::DatabaseViews(objectId);
case RocksDBIndex::TRI_IDX_TYPE_ZKD_INDEX:
return RocksDBKeyBounds::ZkdIndex(objectId);
case RocksDBIndex::TRI_IDX_TYPE_UNKNOWN:
default:
THROW_ARANGO_EXCEPTION(TRI_ERROR_NOT_IMPLEMENTED);
Expand Down
Loading
0