Feature/internal issue #672 by Dronplane · Pull Request #11370 · arangodb/arangodb · GitHub

Feature/internal issue #672 #11370


Merged
merged 39 commits into from
Apr 3, 2020
39 commits
d41bf1b
Tests now passes
Dronplane Mar 25, 2020
6ab67ce
More tests
Dronplane Mar 25, 2020
cd77866
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Mar 25, 2020
c0ee917
Added compression settings
Dronplane Mar 25, 2020
d988302
Fixed storage compression settings
Dronplane Mar 25, 2020
9c7e874
reworked compression setting
Dronplane Mar 26, 2020
e431072
added mock compressor
Dronplane Mar 26, 2020
2cb1112
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Mar 26, 2020
b4daac1
fixed linking
Dronplane Mar 27, 2020
63f0fac
added primarySortCompression
Dronplane Mar 29, 2020
61975a8
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Mar 29, 2020
a92ce3b
Added tests
Dronplane Mar 30, 2020
5757c47
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Mar 30, 2020
ffa1248
fix tests for mac
Dronplane Mar 30, 2020
79561da
added primarySortCompression test
Dronplane Mar 30, 2020
e0b220f
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Mar 31, 2020
62ea27b
added primarySortCompression and storedValues compression to js tests.
Dronplane Apr 1, 2020
e6a390b
more tests
Dronplane Apr 1, 2020
3b60ac0
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 1, 2020
80583ba
jslint fixes
Dronplane Apr 1, 2020
8a36791
code cleanup. Jaccard function fix for empty arrays
Dronplane Apr 1, 2020
b27ab6e
Update CHANGELOG
Dronplane Apr 1, 2020
e3f7746
Code cleanup. More tests
Dronplane Apr 1, 2020
8e12b0d
test fixes
Dronplane Apr 1, 2020
8c473c0
fixed bug
Dronplane Apr 1, 2020
373a884
test
Dronplane Apr 1, 2020
91d2f42
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 1, 2020
93dd5f8
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 2, 2020
5c9c0f5
adressed review comments
Dronplane Apr 2, 2020
f14afc2
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 2, 2020
6429477
Fix after merge
Dronplane Apr 2, 2020
8edfb98
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 2, 2020
975d709
Merge branch 'devel' into feature/internal-issue-#672
Dronplane Apr 2, 2020
693cd26
fix build
Dronplane Apr 2, 2020
a8a3dd5
cleanup
Dronplane Apr 2, 2020
523072b
cleanup
Dronplane Apr 2, 2020
da4c6f8
fixed backslash
Dronplane Apr 3, 2020
eafd1cf
fix
Dronplane Apr 3, 2020
022ee7a
fix typo
Dronplane Apr 3, 2020
added primarySortCompression
Dronplane committed Mar 29, 2020
commit 63f0facf7ac47e55124d8a3f70c2385f5e8bc904
1 change: 1 addition & 0 deletions arangod/CMakeLists.txt
@@ -53,6 +53,7 @@ add_library(arango_iresearch
IResearch/Containers.cpp IResearch/Containers.h
IResearch/IResearchAnalyzerFeature.cpp IResearch/IResearchAnalyzerFeature.h
IResearch/IResearchCommon.cpp IResearch/IResearchCommon.h
+ IResearch/IResearchCompression.cpp IResearch/IResearchCompression.h
IResearch/IResearchKludge.cpp IResearch/IResearchKludge.h
IResearch/IResearchLink.cpp IResearch/IResearchLink.h
IResearch/IResearchLinkCoordinator.cpp IResearch/IResearchLinkCoordinator.h
64 changes: 64 additions & 0 deletions arangod/IResearch/IResearchCompression.cpp
@@ -0,0 +1,64 @@
////////////////////////////////////////////////////////////////////////////////
/// DISCLAIMER
///
/// Copyright 2020 ArangoDB GmbH, Cologne, Germany
///
/// Licensed under the Apache License, Version 2.0 (the "License");
/// you may not use this file except in compliance with the License.
/// You may obtain a copy of the License at
///
/// http://www.apache.org/licenses/LICENSE-2.0
///
/// Unless required by applicable law or agreed to in writing, software
/// distributed under the License is distributed on an "AS IS" BASIS,
/// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
/// See the License for the specific language governing permissions and
/// limitations under the License.
///
/// Copyright holder is ArangoDB GmbH, Cologne, Germany
///
/// @author Andrei Lobov
////////////////////////////////////////////////////////////////////////////////

#include "IResearchCompression.h"
#include "Basics/debugging.h"
#include <unordered_map>

namespace {
const std::unordered_map<
std::string,
arangodb::iresearch::ColumnCompression> COMPRESSION_CONVERT_MAP = {
{ "lz4", arangodb::iresearch::ColumnCompression::LZ4 },
{ "none", arangodb::iresearch::ColumnCompression::NONE },
#ifdef ARANGODB_USE_GOOGLE_TESTS
{ "test", arangodb::iresearch::ColumnCompression::TEST },
#endif
};
}


namespace arangodb {
namespace iresearch {

irs::string_ref columnCompressionToString(ColumnCompression c) {
for (auto const&it : COMPRESSION_CONVERT_MAP) {
if (it.second == c) {
return it.first;
}
}
TRI_ASSERT(false);
return irs::string_ref::NIL;
}

ColumnCompression columnCompressionFromString(irs::string_ref const& c) {
TRI_ASSERT(!c.null());
auto it = COMPRESSION_CONVERT_MAP.find(c);
if (it != COMPRESSION_CONVERT_MAP.end()) {
return it->second;
}
return ColumnCompression::INVALID;
}

} // namespace iresearch
} // namespace arangodb
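
The two helpers in this new file form a small name/enum round trip. A minimal standalone sketch of that idea follows, assuming plain `std::string` in place of `irs::string_ref` and leaving out the test-only `TEST` value and the `TRI_ASSERT` checks. The names `compressionTable`, `compressionFromString`, and `compressionToString` are illustrative and are not the PR's identifiers.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Stand-in for arangodb::iresearch::ColumnCompression (TEST value omitted).
enum class ColumnCompression { INVALID = 0, NONE = 1, LZ4 };

// Name -> enum table, playing the role of COMPRESSION_CONVERT_MAP in the diff.
inline const std::unordered_map<std::string, ColumnCompression>& compressionTable() {
  static const std::unordered_map<std::string, ColumnCompression> table = {
      {"lz4", ColumnCompression::LZ4},
      {"none", ColumnCompression::NONE},
  };
  return table;
}

// Unknown names map to INVALID rather than raising an error.
inline ColumnCompression compressionFromString(const std::string& name) {
  auto const it = compressionTable().find(name);
  return it != compressionTable().end() ? it->second : ColumnCompression::INVALID;
}

// Reverse lookup by linear scan, which is adequate for a table this small.
inline const std::string& compressionToString(ColumnCompression c) {
  static const std::string empty;
  for (auto const& entry : compressionTable()) {
    if (entry.second == c) {
      return entry.first;
    }
  }
  return empty;  // values without a name (such as INVALID) yield an empty string
}
```

Unlike this sketch, the PR's `columnCompressionToString` asserts when no name is found and returns `irs::string_ref::NIL`.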

44 changes: 44 additions & 0 deletions arangod/IResearch/IResearchCompression.h
@@ -0,0 +1,44 @@
////////////////////////////////////////////////////////////////////////////////
/// DISCLAIMER
///
/// Copyright 2020 ArangoDB GmbH, Cologne, Germany
///
/// Licensed under the Apache License, Version 2.0 (the "License");
/// you may not use this file except in compliance with the License.
/// You may obtain a copy of the License at
///
/// http://www.apache.org/licenses/LICENSE-2.0
///
/// Unless required by applicable law or agreed to in writing, software
/// distributed under the License is distributed on an "AS IS" BASIS,
/// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
/// See the License for the specific language governing permissions and
/// limitations under the License.
///
/// Copyright holder is ArangoDB GmbH, Cologne, Germany
///
/// @author Andrei Lobov
////////////////////////////////////////////////////////////////////////////////
#ifndef ARANGOD_IRESEARCH__IRESEARCH_COMPRESSION_H
#define ARANGOD_IRESEARCH__IRESEARCH_COMPRESSION_H 1

#include "utils/string.hpp"

namespace arangodb {
namespace iresearch {

enum class ColumnCompression {
INVALID = 0,
NONE = 1,
LZ4
#ifdef ARANGODB_USE_GOOGLE_TESTS
, TEST = 999
#endif
};

irs::string_ref columnCompressionToString(ColumnCompression c);
ColumnCompression columnCompressionFromString(irs::string_ref const& c);
} // iresearch
} // arangodb

#endif // ARANGOD_IRESEARCH__IRESEARCH_COMPRESSION_H
2 changes: 1 addition & 1 deletion arangod/IResearch/IResearchFilterFactory.cpp
@@ -2742,7 +2742,7 @@ arangodb::Result fromFuncNgramMatch(
kludge::mangleStringField(name, analyzerPool);

auto& ngramFilter = filter->add<irs::by_ngram_similarity>();
- ngramFilter.field(std::move(name)).threshold(threshold).boost(filterCtx.boost);;
+ ngramFilter.field(std::move(name)).threshold((float_t)threshold).boost(filterCtx.boost);;

analyzer->reset(matchValue);
irs::term_attribute const& token = *analyzer->attributes().get<irs::term_attribute>();
2 changes: 2 additions & 0 deletions arangod/IResearch/IResearchKludge.h
@@ -37,6 +37,8 @@ namespace arangodb {
namespace iresearch {
namespace kludge {

+ const std::string primarySortColumnName{ "" };
+
void mangleType(std::string& name);
void mangleAnalyzer(std::string& name);

74 changes: 45 additions & 29 deletions arangod/IResearch/IResearchLink.cpp
@@ -43,6 +43,7 @@
#include "IResearch/IResearchView.h"
#include "IResearch/IResearchViewCoordinator.h"
#include "IResearch/VelocyPackHelper.h"
+ #include "IResearch/IResearchKludge.h"
#include "MMFiles/MMFilesCollection.h"
#include "RestServer/DatabaseFeature.h"
#include "RestServer/DatabasePathFeature.h"
@@ -264,6 +265,31 @@ bool readTick(irs::bytes_ref const& payload, TRI_voc_tick_t& tick) noexcept {
return true;
}

+ irs::compression::type_id const& decodeCompression(
+ arangodb::iresearch::ColumnCompression compression) {
+ irs::compression::type_id const* type{ nullptr };
+ switch (compression) {
+ case arangodb::iresearch::ColumnCompression::LZ4:
+ type = &irs::compression::lz4::type();
+ break;
+ case arangodb::iresearch::ColumnCompression::NONE:
+ type = &irs::compression::raw::type();
+ break;
+ #ifdef ARANGODB_USE_GOOGLE_TESTS
+ case arangodb::iresearch::ColumnCompression::TEST:
+ type = &irs::compression::mock::test_compressor::type();
+ break;
+ #endif
+ default:
+ TRI_ASSERT(false);
+ // fallback to default on runtime
+ type = &irs::compression::lz4::type();
+ break;
+ }
+ TRI_ASSERT(type != nullptr);
+ return *type;
+ }
+
} // namespace

namespace arangodb {
@@ -840,7 +866,7 @@ Result IResearchLink::init(
auto& vocbase = _collection.vocbase();
bool const sorted = !meta._sort.empty();
auto const& storedValuesColumns = meta._storedValues.columns();
-
+ auto primarySortCompression = meta._sortCompression;
if (ServerState::instance()->isCoordinator()) { // coordinator link
if (!vocbase.server().hasFeature<arangodb::ClusterFeature>()) {
return {
@@ -897,7 +923,7 @@ Result IResearchLink::init(
if (!clusterWideLink) {
// prepare data-store which can then update options
// via the IResearchView::link(...) call
- auto const res = initDataStore(initCallback, sorted, storedValuesColumns);
+ auto const res = initDataStore(initCallback, sorted, storedValuesColumns, primarySortCompression);

if (!res.ok()) {
return res;
@@ -967,7 +993,7 @@ Result IResearchLink::init(
} else if (ServerState::instance()->isSingleServer()) { // single-server link
// prepare data-store which can then update options
// via the IResearchView::link(...) call
- auto const res = initDataStore(initCallback, sorted, storedValuesColumns);
+ auto const res = initDataStore(initCallback, sorted, storedValuesColumns, primarySortCompression);

if (!res.ok()) {
return res;
@@ -1018,7 +1044,8 @@ Result IResearchLink::init(

Result IResearchLink::initDataStore(
InitCallback const& initCallback, bool sorted,
- std::vector<IResearchViewStoredValues::StoredColumn> const& storedColumns) {
+ std::vector<IResearchViewStoredValues::StoredColumn> const& storedColumns,
+ ColumnCompression primarySortCompression) {
_asyncTerminate.store(true); // mark long-running async jobs for terminatation

if (_asyncFeature) {
@@ -1144,12 +1171,12 @@ Result IResearchLink::initDataStore(
options.lock_repository = false; // do not lock index, ArangoDB has its own lock
options.comparator = sorted ? &_comparer : nullptr; // set comparator if requested

- bool nonDefaultCompressions = false; // storedValues uses no default compression method
+ bool nonDefaultCompressions = primarySortCompression != ColumnCompression::LZ4; // storedValues uses no default compression method
std::map<std::string, // we must store string as storedColumns could be temporary
const irs::compression::type_id&> compressionMap;
- if (!storedColumns.empty()) {
+ if (!nonDefaultCompressions && !storedColumns.empty()) {
for (auto c : storedColumns) {
- if (IResearchViewStoredValues::ColumnCompression::LZ4 != c.compression) {
+ if (ColumnCompression::LZ4 != c.compression) {
nonDefaultCompressions = true;
break;
}
@@ -1159,34 +1186,21 @@ Result IResearchLink::initDataStore(
// we will need compression map to handle compressions
// on insert
for (auto c : storedColumns) {
- irs::compression::type_id const* compression{ nullptr };
- switch (c.compression) {
- case IResearchViewStoredValues::ColumnCompression::LZ4:
- compression = irs::compression::lz4::type();
- break;
- case IResearchViewStoredValues::ColumnCompression::NONE:
- compression = irs::compression::raw::type();
- break;
- #ifdef ARANGODB_USE_GOOGLE_TESTS
- case IResearchViewStoredValues::ColumnCompression::TEST:
- compression = irs::compression::mock::test_compressor::type();
- break;
- #endif
- default:
- TRI_ASSERT(false);
- // fallback to default on runtime
- compression = irs::compression::lz4::type();
- break;
- }
- compressionMap.emplace(c.name, *compression);
+ irs::compression::type_id const& compression =
+ decodeCompression(c.compression);
+ compressionMap.emplace(c.name, compression);
}
+ compressionMap.emplace(iresearch::kludge::primarySortColumnName,
+ decodeCompression(primarySortCompression));
}
// setup columnstore compression/encryption if requested by storage engine
auto const encrypt = (nullptr != irs::get_encryption(_dataStore._directory->attributes()));
if (encrypt) {
if (nonDefaultCompressions) {
options.column_info = [compressionMap](const irs::string_ref& name) -> irs::column_info {
- auto compress = compressionMap.find(name);
+ auto compress = name.null() ?
+ compressionMap.find(iresearch::kludge::primarySortColumnName) :
+ compressionMap.find(name);
if (compress != compressionMap.end()) {
// do not waste resources to encrypt primary key column
return { compress->second, {}, DocumentPrimaryKey::PK() != name };
@@ -1203,7 +1217,9 @@ Result IResearchLink::initDataStore(
} else {
if (nonDefaultCompressions) {
options.column_info = [compressionMap](const irs::string_ref& name) -> irs::column_info {
- auto compress = compressionMap.find(name);
+ auto compress = name.null() ?
+ compressionMap.find(iresearch::kludge::primarySortColumnName) :
+ compressionMap.find(name);
if (compress != compressionMap.end()) {
// do not waste resources to encrypt primary key column
return { compress->second, {}, false };
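
The `column_info` callbacks above resolve a column name to a compression type, treating a null name as the unnamed primary sort column. The lookup can be sketched in isolation as follows. This is a simplified illustration, not the PR's code: `Codec` stands in for `irs::compression::type_id`, a null pointer models a null `irs::string_ref`, and the LZ4 fallback for unlisted columns is an assumption, since the corresponding branch is cut off in the hunk.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>

// Hypothetical stand-in for irs::compression::type_id.
enum class Codec { Raw, Lz4 };

// The primary sort column is unnamed; the diff files it under "" in the map.
inline const std::string& primarySortColumnName() {
  static const std::string name;
  return name;
}

// A null pointer models a null irs::string_ref (name.null() in the diff).
using ColumnResolver = std::function<Codec(const std::string*)>;

// The map is captured by value, as in the diff, so the resolver stays valid
// after the caller's map goes out of scope.
inline ColumnResolver makeResolver(std::map<std::string, Codec> const& codecs) {
  return [codecs](const std::string* name) -> Codec {
    auto const it = codecs.find(name ? *name : primarySortColumnName());
    // Assumption: columns without an entry fall back to LZ4.
    return it != codecs.end() ? it->second : Codec::Lz4;
  };
}
```

Storing the primary sort setting in the same map as the stored-values columns lets one callback serve both kinds of column.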
3 changes: 2 additions & 1 deletion arangod/IResearch/IResearchLink.h
@@ -316,7 +316,8 @@ class IResearchLink {
//////////////////////////////////////////////////////////////////////////////
Result initDataStore(
InitCallback const& initCallback, bool sorted,
- std::vector<IResearchViewStoredValues::StoredColumn> const& storedColumns);
+ std::vector<IResearchViewStoredValues::StoredColumn> const& storedColumns,
+ ColumnCompression primarySortCompression);

//////////////////////////////////////////////////////////////////////////////
/// @brief set up asynchronous maintenance tasks
21 changes: 21 additions & 0 deletions arangod/IResearch/IResearchLinkMeta.cpp
@@ -551,6 +551,10 @@ bool IResearchLinkMeta::operator==(IResearchLinkMeta const& other) const noexcep
return false;
}

+ if (_sortCompression != other._sortCompression) {
+ return false;
+ }
+
return true;
}

@@ -600,6 +604,18 @@ bool IResearchLinkMeta::init(arangodb::application_features::ApplicationServer&
return false;
}
}
+ if(mask->_sort)
+ {
+ // optional sort compression
+ static VPackStringRef const fieldName("primarySortCompression");
+ auto const field = slice.get(fieldName);
+ mask->_sortCompression = field.isString();
+
+ if (readAnalyzerDefinition && mask->_sortCompression &&
+ (_sortCompression = columnCompressionFromString(getStringRef(field))) == ColumnCompression::INVALID) {
+ return false;
+ }
+ }

{
// clear existing definitions
@@ -781,6 +797,10 @@ bool IResearchLinkMeta::json(arangodb::application_features::ApplicationServer&
}
}

+ if (writeAnalyzerDefinition && (!mask || mask->_sortCompression)) {
+ addStringRef(builder, "primarySortCompression", columnCompressionToString(_sortCompression));
+ }
+
// output definitions if 'writeAnalyzerDefinition' requested and not maked
// this should be the case for the default top-most call
if (writeAnalyzerDefinition && (!mask || mask->_analyzerDefinitions)) {
@@ -792,6 +812,7 @@ bool IResearchLinkMeta::json(arangodb::application_features::ApplicationServer&
}
}


return FieldMeta::json(server, builder, ignoreEqual, defaultVocbase, mask);
}
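
The `init(...)` change treats `primarySortCompression` as optional: an absent field keeps the LZ4 default, a present field is recorded in the mask, and an unrecognised value fails initialisation. A reduced sketch of that flow follows, assuming a flat string map in place of a VelocyPack slice and ignoring the `mask->_sort` and `readAnalyzerDefinition` conditions. `LinkMetaSketch`, `parseCompression`, and `initFromDefinition` are hypothetical names.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for arangodb::iresearch::ColumnCompression (TEST value omitted).
enum class ColumnCompression { INVALID = 0, NONE = 1, LZ4 };

struct LinkMetaSketch {
  ColumnCompression sortCompression = ColumnCompression::LZ4;  // default, as in the diff
  bool sortCompressionSet = false;  // plays the role of Mask::_sortCompression
};

// Unknown names map to INVALID.
inline ColumnCompression parseCompression(const std::string& name) {
  if (name == "lz4") return ColumnCompression::LZ4;
  if (name == "none") return ColumnCompression::NONE;
  return ColumnCompression::INVALID;
}

// Returns false only when the field is present but names no known compression.
inline bool initFromDefinition(const std::map<std::string, std::string>& definition,
                               LinkMetaSketch& meta) {
  auto const it = definition.find("primarySortCompression");
  meta.sortCompressionSet = (it != definition.end());
  if (!meta.sortCompressionSet) {
    return true;  // field absent: keep the current (default) value
  }
  meta.sortCompression = parseCompression(it->second);
  return meta.sortCompression != ColumnCompression::INVALID;
}
```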

8 changes: 6 additions & 2 deletions arangod/IResearch/IResearchLinkMeta.h
@@ -35,6 +35,7 @@
#include "IResearchAnalyzerFeature.h"
#include "IResearchViewSort.h"
#include "IResearchViewStoredValues.h"
+ #include "IResearchCompression.h"

namespace arangodb {
namespace velocypack {
@@ -186,23 +187,26 @@ struct IResearchLinkMeta : public FieldMeta {
: FieldMeta::Mask(mask),
_analyzerDefinitions(mask),
_sort(mask),
- _storedValues(mask) {
+ _storedValues(mask),
+ _sortCompression(mask) {
}

bool _analyzerDefinitions;
bool _sort;
bool _storedValues;
+ bool _sortCompression;
};

std::set<AnalyzerPool::ptr, FieldMeta::AnalyzerComparer> _analyzerDefinitions;
IResearchViewSort _sort; // sort condition associated with the link
IResearchViewStoredValues _storedValues; // stored values associated with the link
+ ColumnCompression _sortCompression{ColumnCompression::LZ4};
// NOTE: if adding fields don't forget to modify the comparison operator !!!
// NOTE: if adding fields don't forget to modify IResearchLinkMeta::Mask !!!
// NOTE: if adding fields don't forget to modify IResearchLinkMeta::Mask constructor !!!
// NOTE: if adding fields don't forget to modify the init(...) function !!!
// NOTE: if adding fields don't forget to modify the json(...) function !!!
- // NOTE: if adding fields don't forget to modify the memSize() function !!!
+ // NOTE: if adding fields don't forget to modify the memory() function !!!

IResearchLinkMeta();
IResearchLinkMeta(IResearchLinkMeta const& other) = default;