Feature/iresearch test latest format (#14542) · arangodb/arangodb@be8b157 · GitHub

Commit be8b157

Dronplane and gnusi authored
Feature/iresearch test latest format (#14542)
* upgrade iresearch
* update vpack
* adjusting ArangoSearch internals
* wip
* wip
* add ugly kludge
* more ugly kludge
* remove redundant vpack registrations for analyzer
* flags usage fixes
* wip
* build fixes
* wip
* fix Geo equals
* update from upstream
* rework field feature storage
* fix some tests
* fix tests
* do not log error at shutdown
* wip
* update iresearch
* fix tests
* add js tests for newly created analyzers
* add more tests
* update iresearch from upstream
* attempt to fix msvc/mac build
* wip
* wip
* wip
* add missing header
* fix compilation
* fix build
* fix msvc build
* wip
* wip
* wip
* wip
* wip
* wip
* update iresearch from upstream
* fix test
* adjust js tests
* update iresearch changelog

Co-authored-by: Andrey Abramov <andrey@arangodb.com>
1 parent bb39d8a commit be8b157

665 files changed: +316820 -169765 lines changed


3rdParty/iresearch/.travis.yml

Lines changed: 0 additions & 450 deletions
This file was deleted.

3rdParty/iresearch/CHANGELOG

Lines changed: 37 additions & 1 deletion
@@ -1,4 +1,40 @@
-v1.0 (2021-03-30)
+master
+-------------------------
+
+v1.1 (2021-08-18)
+-------------------------
+
+* Add new analyzer `collation` capable of producing tokens honoring language
+  specific sorting.
+
+* Add new feature `iresearch::norm2` representing fixed length norm value.
+
+* Split field features into built-in set of index features and pluggable field features.
+
+* Add ability to specify target prefix for `by_edit_distance` query.
+
+* Fix possible crash in `disjunction` inside `visit` of the already exhausted iterator.
+
+* Fix block boundaries evaluation in `block_iterator` for `memory_directory` and `fs_directory`.
+
+* Reduce number of heap allocations in `numeric_token_stream`.
+
+* Replace RapidJSON with Velocypack for analyzers and scorers serialization and deserialization
+
+* Add new `1_4` segment format utilizing new columnstore implementation.
+
+* Add new columnstore implementation based on sparse bitset format.
+
+* Add random access functionality to `data_input` for both regular reads and
+  direct buffer access.
+
+* Add `sparse_bitset_writer`/`sparse_bitset_iterator`, a fast and efficient on-disk
+  format for storing sparse bit sets.
+
+* Add a set of SIMD-based utils for encoding.
+
+
+v1.0 (2021-06-14)
 -------------------------
 
 Initial release of IResearch library
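
The changelog entry about a target prefix for `by_edit_distance` can be read as follows: a candidate term must start with the prefix exactly, and only the remainder is compared to the target by edit distance. The sketch below illustrates that reading in plain C++17. It is an assumption about the semantics, not the IResearch implementation; `levenshtein` and `matches_with_prefix` are hypothetical helpers that do not exist in the library.

```cpp
#include <algorithm>
#include <cstddef>
#include <string_view>
#include <utility>
#include <vector>

// Classic Levenshtein distance computed with two rolling rows.
inline std::size_t levenshtein(std::string_view a, std::string_view b) {
  std::vector<std::size_t> prev(b.size() + 1), cur(b.size() + 1);
  for (std::size_t j = 0; j <= b.size(); ++j) prev[j] = j;
  for (std::size_t i = 1; i <= a.size(); ++i) {
    cur[0] = i;
    for (std::size_t j = 1; j <= b.size(); ++j) {
      const std::size_t subst = prev[j - 1] + (a[i - 1] == b[j - 1] ? 0 : 1);
      cur[j] = std::min({subst, prev[j] + 1, cur[j - 1] + 1});
    }
    std::swap(prev, cur);
  }
  return prev[b.size()];
}

// Hypothetical helper (not an IResearch API): the candidate must begin with
// `prefix` exactly; the rest of the candidate is then compared to `target`.
inline bool matches_with_prefix(std::string_view candidate,
                                std::string_view prefix,
                                std::string_view target,
                                std::size_t max_distance) {
  if (candidate.substr(0, prefix.size()) != prefix) return false;
  return levenshtein(candidate.substr(prefix.size()), target) <= max_distance;
}
```

Under this reading, an exact prefix narrows the set of candidate terms before any distance computation takes place, which is presumably why the option is useful for large term dictionaries.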

3rdParty/iresearch/CMakeLists.txt

Lines changed: 17 additions & 32 deletions
@@ -87,9 +87,8 @@ option(USE_PYRESEARCH "Build iresearch python bridge" OFF)
 option(USE_VALGRIND "Use workarounds to avoid false positives in valgrind" OFF)
 option(USE_SIMDCOMP "Use architecture specific low-level optimizations" OFF)
 option(USE_CCACHE "Use CCACHE if present" ON)
-option(USE_MICRO_BENCHMARCH "Build micro-benchmark project" OFF)
-
-set(SUPPRESS_EXTERNAL_WARNINGS OFF CACHE INTERNAL "Suppress warnings originating in 3rd party code.")
+option(USE_MICROBENCH "Build micro-benchmark project" OFF)
+option(SUPPRESS_EXTERNAL_WARNINGS "Suppress warnings originating in 3rd party code" ON)
 
 add_option_gprof(FALSE)
 
@@ -103,8 +102,14 @@ if (MSVC)
   # MSVC2017.8 requires the following define
   add_definitions(-D_ENABLE_EXTENDED_ALIGNED_STORAGE)
 
+  add_definitions(-D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS)
+  add_definitions(-D_CRT_SECURE_NO_WARNINGS)
+
   # min/max macros interferes with our codebase
   add_definitions(-DNOMINMAX)
+
+  # we don't want obvious warning
+  add_compile_options(/wd4324)
 
   if (MSVC_BUILD_THREADS)
     set(CMAKE_C_FLAGS "/MP${MSVC_BUILD_THREADS} ${CMAKE_C_FLAGS}")
@@ -165,12 +170,10 @@ endif()
 ### setup platform dependent optimizations
 ################################################################################
 
-if (USE_SIMDCOMP)
-  include(OptimizeForArchitecture)
-  OptimizeForArchitecture()
+include(OptimizeForArchitecture)
+OptimizeForArchitecture()
 
-  add_definitions(${Vc_ARCHITECTURE_FLAGS})
-endif()
+add_definitions(${Vc_ARCHITECTURE_FLAGS})
 
 ################################################################################
 ### find 3rd party libraries
@@ -192,10 +195,7 @@ endif()
 # set sanitizers
 find_package(Sanitizers)
 
-#find BFD
-find_package(BFD
-  #OPTIONAL
-)
+find_package(BFD) # optional
 
 if (BFD_FOUND)
   add_definitions(-DUSE_LIBBFD)
@@ -206,25 +206,10 @@ else()
   set(BFD_SHARED_LIB_RESOURCES "")
 endif()
 
-# find LZ4
-find_package(Lz4
-  REQUIRED
-)
-
-# find ICU
-find_package(ICU
-  REQUIRED
-)
-
-# find Snowball
-find_package(Snowball
-  REQUIRED
-)
-
-# find Unwind
-find_package(Unwind
-  #OPTIONAL
-)
+find_package(Lz4 REQUIRED)
+find_package(ICU REQUIRED)
+find_package(Snowball REQUIRED)
+find_package(Unwind) # optional
 
 if (Unwind_FOUND)
   add_definitions(-DUSE_LIBUNWIND)
@@ -340,7 +325,7 @@ if (USE_PVS_STUDIO)
   )
 endif()
 
-if (USE_MICRO_BENCHMARCH)
+if (USE_MICROBENCH)
   add_subdirectory(microbench)
 endif()
 
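
The MSVC block above keeps `-DNOMINMAX` with the comment that the `min`/`max` macros interfere with the codebase. As background, the Windows headers define `min` and `max` as function-like macros unless `NOMINMAX` is set, which breaks calls such as `std::min(a, b)`. The sketch below shows the two usual defenses; `clamp_to_byte` and `largest_int` are hypothetical names used only for illustration and are not taken from IResearch.

```cpp
// Defense 1: define NOMINMAX before <windows.h> is pulled in (the CMake
// file above does this globally through add_definitions).
#if defined(_WIN32)
#  ifndef NOMINMAX
#    define NOMINMAX
#  endif
#  include <windows.h>
#endif

#include <algorithm>
#include <limits>

// Defense 2: parenthesize the function name. A function-like macro is only
// expanded when its name is directly followed by '(', so (std::min)(...)
// is safe even if the macros are defined.
inline int clamp_to_byte(int value) {
  return (std::max)(0, (std::min)(value, 255));
}

inline int largest_int() {
  return (std::numeric_limits<int>::max)();
}
```

Setting the define once in the build system, as the diff does, avoids having to parenthesize every call site.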

3rdParty/iresearch/README.md

Lines changed: 15 additions & 6 deletions
@@ -10,7 +10,7 @@
 -->
 
 # IResearch search engine
-### Version 1.0
+### Version 1.1
 
 ## Table of contents
 - [Overview](#overview)
@@ -59,16 +59,17 @@ A document is actually a collection of indexed/stored fields.
 In order to be processed each field should satisfy at least `IndexedField` or `StoredField` concept.
 
 #### IndexedField concept
-For type `T` to be `IndexedField`, the following conditions have to be satisfied for an object m of type `T`:
+For type `T` to be `IndexedField`, the following conditions have to be satisfied for an object `m` of type `T`:
 
 |Expression|Requires|Effects|
 |----|----|----|
 |`m.name()`|The output type must be convertible to `iresearch::string_ref`|A value uses as a key name.|
 |`m.get_tokens()`|The output type must be convertible to `iresearch::token_stream*`|A token stream uses for populating in invert procedure. If value is `nullptr` field is treated as non-indexed.|
-|`m.features()`|The output type must be convertible to `const iresearch::flags&`|A set of features requested for evaluation during indexing. E.g. it may contain request of processing positions and frequencies. Later the evaluated information can be used during querying.|
+|`m.index_features()`|The output type must be implicitly convertible to `iresearch::IndexFeatures`|A set of features requested for evaluation during indexing. E.g. it may contain request of processing positions and frequencies. Later the evaluated information can be used during querying and scoring.|
+|`m.features()`|The output type must be convertible to `const iresearch::flags&`|A set of user supplied features to be associated with a field. E.g. it may contain request of storing field norms. Later the stored information can be used during querying and scoring.|
 
 #### StoredField concept
-For type `T` to be `StoredField`, the following conditions have to be satisfied for an object m of type `T`:
+For type `T` to be `StoredField`, the following conditions have to be satisfied for an object `m` of type `T`:
 
 |Expression|Requires|Effects|
 |----|----|----|
@@ -218,7 +219,7 @@ via the distributions' package manager: libstemmer
 
 ```bash
 git clone https://github.com/snowballstem/snowball.git
-git reset --hard 5137019d68befd633ce8b1cd48065f41e77ed43e
+git reset --hard adc028f3ae646623bda2f99191fe9dc3287a909b
 mkdir build && cd build
 set PATH=%PATH%;<path-to>/build/Debug
 cmake -DENABLE_STATIC=OFF -DNO_SHARED=OFF -g "Visual studio 12" -Ax64 ..
@@ -239,6 +240,10 @@ point SNOWBALL_ROOT at the source directory to build together with IResearch
 SNOWBALL_ROOT=<path-to-snowball>
 ```
 
+### [VelocyPack](https://github.com/arangodb/velocypack)
+
+point VPACK_ROOT at the source directory to build together with IResearch
+
 ### [BFD](https://sourceware.org/binutils/) <optional>
 
 #### install (*nix)
@@ -435,6 +440,9 @@ matching of words from languages not supported by 'snowball' are done verbatim
 ### [Google Test](https://code.google.com/p/googletest)
 used for writing tests for the IResearch library
 
+### [VelocyPack](https://github.com/arangodb/velocypack)
+used for JSON serialization/deserialization
+
 ### Stopword list
 used by analysis::text_analyzer for filtering out noise words that should not impact text ranging
 e.g. for 'en' these are usualy 'a', 'the', etc...
@@ -458,6 +466,7 @@ the first whitespace is ignored), in the directory corresponding to its language
 |iresearch::by_range|for filtering of values within a given range, with the possibility of specifying open/closed ranges
 |iresearch::by_same_position|for term-insertion-order sensitive filtering of exact values
 |iresearch::by_term|for filtering of exact values
+|iresearch::by_terms|for filtering of exact values by a set of specified terms
 |iresearch::by_wildcard|for filtering of values based on matching pattern
 |iresearch::And|boolean conjunction of multiple filters, influencing document ranks/scores as appropriate
 |iresearch::Or|boolean disjunction of multiple filters, influencing document ranks/scores as appropriate (including "minimum match" functionality)
@@ -578,7 +587,7 @@ The following grammar is currently defined via Bison (the root is <query>):
 - Apple Clang: 9
 
 ## License
-Copyright (c) 2017-2020 ArangoDB GmbH
+Copyright (c) 2017-2021 ArangoDB GmbH
 
 Copyright (c) 2016-2017 EMC Corporation

3rdParty/iresearch/THIRD_PARTY_README.md

Lines changed: 3 additions & 3 deletions
@@ -81,9 +81,9 @@ number of components produced by third parties
     - Copyright: Daniel Povey <dpovey@gmail.com>
     - License: [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
     - How it's used: a set of extensions for OpenFST library
-14. Title: Unicorn Library
-    - Copyright: Ross Smith
-    - License: [MIT license](https://opensource.org/licenses/MIT)
+14. Title: Boost::Text
+    - Copyright: Zachary Laine
+    - License: [Boost Software License, Version 1.0](http://www.boost.org/LICENSE_1_0.txt)
     - How it's used: Word segmentation rules implementation for
       segmentation analyzer
 15. Title: Velocypack Library
