Feature/iresearch test latest format (#14542) · arangodb/arangodb@be8b157

Commit be8b157

Andrei Lobov and gnusi authored
Feature/iresearch test latest format (#14542)
* upgrade iresearch
* update vpack
* adjusting ArangoSearch internals
* wip
* wip
* add ugly kludge
* more ugly kludge
* remove redundant vpack registartions for analyzer
* flags usage fixes
* wip
* build fixes
* wip
* fix Geo equals
* update from upstream
* rework field feature storage
* fix some tests
* fix tests
* do not log error at shutdown
* wip
* update iresearch
* fix tests
* add js tests for newly created analyzers
* add more tests
* update iresearch from upstream
* attempt to fix msvc/mac build
* wip
* wip
* wip
* add missing header
* fix compilation
* fix build
* fix msvc build
* wip
* wip
* wip
* wip
* wip
* wip
* update iresearch from upstream
* fix test
* adjust js tests
* update iresearch changelog

Co-authored-by: Andrey Abramov <andrey@arangodb.com>
1 parent bb39d8a commit be8b157

665 files changed: +316820 -169765 lines changed

    3rdParty/iresearch/.travis.yml

    Lines changed: 0 additions & 450 deletions
    This file was deleted.

    3rdParty/iresearch/CHANGELOG

    Lines changed: 37 additions & 1 deletion
    @@ -1,4 +1,40 @@
    -v1.0 (2021-03-30)
    +master
    +-------------------------
    +
    +v1.1 (2021-08-18)
    +-------------------------
    +
    +* Add new analyzer `collation` capable of producing tokens honoring language
    +specific sorting.
    +
    +* Add new feature `iresearch::norm2` representing fixed length norm value.
    +
    +* Split field features into built-in set of index features and pluggable field features.
    +
    +* Add ability to specify target prefix for `by_edit_distance` query.
    +
    +* Fix possible crash in `disjunction` inside `visit` of the already exhausted iterator.
    +
    +* Fix block boundaries evaluation in `block_iterator` for `memory_directory` and `fs_directory`.
    +
    +* Reduce number of heap allocations in `numeric_token_stream`.
    +
    +* Replace RapidJSON with Velocypack for analyzers and scorers serialization and deserialization
    +
    +* Add new `1_4` segment format utilizing new columnstore implementation.
    +
    +* Add new columnstore implementation based on sparse bitset format.
    +
    +* Add random access functionality to `data_input` for both regular reads and
    +direct buffer access.
    +
    +* Add `sparse_bitset_writer`/`sparse_bitset_iterator`, a fast and efficient on-disk
    +format for storing sparse bit sets.
    +
    +* Add a set of SIMD-based utils for encoding.
    +
    +
    +v1.0 (2021-06-14)
     -------------------------
     
     Initial release of IResearch library
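
    To make the `sparse_bitset_writer`/`sparse_bitset_iterator` changelog entry above more concrete, here is a minimal in-memory sketch of the general idea behind a sparse bit set: only the 64-bit words that contain at least one set bit are stored. This is not the IResearch on-disk format and it does not use any IResearch API; the `SparseBitset` class and all of its method names are hypothetical and chosen purely for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical illustration only: NOT the IResearch `sparse_bitset_*` format.
// Only 64-bit words that contain at least one set bit are kept in memory.
class SparseBitset {
 public:
  // Set bit `i`.
  void set(std::uint64_t i) {
    words_[i / 64] |= (std::uint64_t{1} << (i % 64));
  }

  // Return true if bit `i` is set.
  bool test(std::uint64_t i) const {
    const auto it = words_.find(i / 64);
    return it != words_.end() && ((it->second >> (i % 64)) & 1U) != 0;
  }

  // Total number of set bits.
  std::size_t count() const {
    std::size_t n = 0;
    for (const auto& entry : words_) {
      // Clear the lowest set bit until the word is empty.
      for (std::uint64_t w = entry.second; w != 0; w &= w - 1) ++n;
    }
    return n;
  }

  // Number of 64-bit words actually stored.
  std::size_t stored_words() const { return words_.size(); }

  // Find the smallest set bit >= `i`. Returns false if there is none.
  bool next(std::uint64_t i, std::uint64_t& out) const {
    auto it = words_.lower_bound(i / 64);
    if (it == words_.end()) return false;
    std::uint64_t word = it->second;
    if (it->first == i / 64) {
      word &= ~std::uint64_t{0} << (i % 64);  // drop bits below `i`
      if (word == 0) {
        if (++it == words_.end()) return false;
        word = it->second;  // stored words are never zero
      }
    }
    unsigned bit = 0;
    while (((word >> bit) & 1U) == 0) ++bit;  // locate lowest set bit
    out = it->first * 64 + bit;
    return true;
  }

 private:
  std::map<std::uint64_t, std::uint64_t> words_;  // word index -> word bits
};
```

    The `next` method plays the role an iterator's seek operation would play in a real implementation. An ordered map keeps the sketch short; an on-disk format would presumably use a more compact block layout.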

    3rdParty/iresearch/CMakeLists.txt

    Lines changed: 17 additions & 32 deletions
    @@ -87,9 +87,8 @@ option(USE_PYRESEARCH "Build iresearch python bridge" OFF)
     option(USE_VALGRIND "Use workarounds to avoid false positives in valgrind" OFF)
     option(USE_SIMDCOMP "Use architecture specific low-level optimizations" OFF)
     option(USE_CCACHE "Use CCACHE if present" ON)
    -option(USE_MICRO_BENCHMARCH "Build micro-benchmark project" OFF)
    -
    -set(SUPPRESS_EXTERNAL_WARNINGS OFF CACHE INTERNAL "Suppress warnings originating in 3rd party code.")
    +option(USE_MICROBENCH "Build micro-benchmark project" OFF)
    +option(SUPPRESS_EXTERNAL_WARNINGS "Suppress warnings originating in 3rd party code" ON)
     
     add_option_gprof(FALSE)
     
    @@ -103,8 +102,14 @@ if (MSVC)
     # MSVC2017.8 requires the following define
     add_definitions(-D_ENABLE_EXTENDED_ALIGNED_STORAGE)
     
    +add_definitions(-D_SILENCE_ALL_CXX17_DEPRECATION_WARNINGS)
    +add_definitions(-D_CRT_SECURE_NO_WARNINGS)
    +
     # min/max macros interferes with our codebase
     add_definitions(-DNOMINMAX)
    +
    +# we don't want obvious warning
    +add_compile_options(/wd4324)
     
     if (MSVC_BUILD_THREADS)
     set(CMAKE_C_FLAGS "/MP${MSVC_BUILD_THREADS} ${CMAKE_C_FLAGS}")
    @@ -165,12 +170,10 @@ endif()
     ### setup platform dependent optimizations
     ################################################################################
     
    -if (USE_SIMDCOMP)
    -include(OptimizeForArchitecture)
    -OptimizeForArchitecture()
    +include(OptimizeForArchitecture)
    +OptimizeForArchitecture()
     
    -add_definitions(${Vc_ARCHITECTURE_FLAGS})
    -endif()
    +add_definitions(${Vc_ARCHITECTURE_FLAGS})
     
     ################################################################################
     ### find 3rd party libraries
    @@ -192,10 +195,7 @@ endif()
     # set sanitizers
     find_package(Sanitizers)
     
    -#find BFD
    -find_package(BFD
    -#OPTIONAL
    -)
    +find_package(BFD) # optional
     
     if (BFD_FOUND)
     add_definitions(-DUSE_LIBBFD)
    @@ -206,25 +206,10 @@ else()
     set(BFD_SHARED_LIB_RESOURCES "")
     endif()
     
    -# find LZ4
    -find_package(Lz4
    -REQUIRED
    -)
    -
    -# find ICU
    -find_package(ICU
    -REQUIRED
    -)
    -
    -# find Snowball
    -find_package(Snowball
    -REQUIRED
    -)
    -
    -# find Unwind
    -find_package(Unwind
    -#OPTIONAL
    -)
    +find_package(Lz4 REQUIRED)
    +find_package(ICU REQUIRED)
    +find_package(Snowball REQUIRED)
    +find_package(Unwind) # optional
     
     if (Unwind_FOUND)
     add_definitions(-DUSE_LIBUNWIND)
    @@ -340,7 +325,7 @@ if (USE_PVS_STUDIO)
     )
     endif()
     
    -if (USE_MICRO_BENCHMARCH)
    +if (USE_MICROBENCH)
     add_subdirectory(microbench)
     endif()
     
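
    The MSVC hunk above disables warning C4324 with `/wd4324`; the diff itself only calls it an "obvious warning". C4324 is the MSVC diagnostic reported when a structure is padded because of an alignment specifier. As a minimal sketch of the kind of declaration that triggers it, the block below uses a hypothetical `PaddedCounter` type and `padding_bytes` helper that are not taken from the IResearch sources.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical type for illustration; not taken from the IResearch sources.
// alignas(64) forces sizeof(PaddedCounter) up to a multiple of 64, so the
// compiler appends padding after `value`. MSVC reports this padding as C4324.
struct alignas(64) PaddedCounter {
  std::uint64_t value = 0;
};

static_assert(alignof(PaddedCounter) == 64, "alignment follows alignas");
static_assert(sizeof(PaddedCounter) == 64, "size is rounded up to the alignment");

// Number of padding bytes the compiler added to satisfy the alignment.
constexpr std::size_t padding_bytes() {
  return sizeof(PaddedCounter) - sizeof(std::uint64_t);
}
```

    Over-aligned types like this are typically declared on purpose (for example to keep data on separate cache lines), which is one plausible reason to treat the warning as noise and silence it for the whole build.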

    3rdParty/iresearch/README.md

    Lines changed: 15 additions & 6 deletions
    @@ -10,7 +10,7 @@
     -->
     
     # IResearch search engine
    -### Version 1.0
    +### Version 1.1
     
     ## Table of contents
     - [Overview](#overview)
    @@ -59,16 +59,17 @@ A document is actually a collection of indexed/stored fields.
     In order to be processed each field should satisfy at least `IndexedField` or `StoredField` concept.
     
     #### IndexedField concept
    -For type `T` to be `IndexedField`, the following conditions have to be satisfied for an object m of type `T`:
    +For type `T` to be `IndexedField`, the following conditions have to be satisfied for an object `m` of type `T`:
     
     |Expression|Requires|Effects|
     |----|----|----|
     |`m.name()`|The output type must be convertible to `iresearch::string_ref`|A value uses as a key name.|
     |`m.get_tokens()`|The output type must be convertible to `iresearch::token_stream*`|A token stream uses for populating in invert procedure. If value is `nullptr` field is treated as non-indexed.|
    -|`m.features()`|The output type must be convertible to `const iresearch::flags&`|A set of features requested for evaluation during indexing. E.g. it may contain request of processing positions and frequencies. Later the evaluated information can be used during querying.|
    +|`m.index_features()`|The output type must be implicitly convertible to `iresearch::IndexFeatures`|A set of features requested for evaluation during indexing. E.g. it may contain request of processing positions and frequencies. Later the evaluated information can be used during querying and scoring.|
    +|`m.features()`|The output type must be convertible to `const iresearch::flags&`|A set of user supplied features to be associated with a field. E.g. it may contain request of storing field norms. Later the stored information can be used during querying and scoring.|
     
     #### StoredField concept
    -For type `T` to be `StoredField`, the following conditions have to be satisfied for an object m of type `T`:
    +For type `T` to be `StoredField`, the following conditions have to be satisfied for an object `m` of type `T`:
     
     |Expression|Requires|Effects|
     |----|----|----|
    @@ -218,7 +219,7 @@ via the distributions' package manager: libstemmer
     
     ```bash
     git clone https://github.com/snowballstem/snowball.git
    -git reset --hard 5137019d68befd633ce8b1cd48065f41e77ed43e
    +git reset --hard adc028f3ae646623bda2f99191fe9dc3287a909b
     mkdir build && cd build
     set PATH=%PATH%;<path-to>/build/Debug
     cmake -DENABLE_STATIC=OFF -DNO_SHARED=OFF -g "Visual studio 12" -Ax64 ..
    @@ -239,6 +240,10 @@ point SNOWBALL_ROOT at the source directory to build together with IResearch
     SNOWBALL_ROOT=<path-to-snowball>
     ```
     
    +### [VelocyPack](https://github.com/arangodb/velocypack)
    +
    +point VPACK_ROOT at the source directory to build together with IResearch
    +
     ### [BFD](https://sourceware.org/binutils/) <optional>
     
     #### install (*nix)
    @@ -435,6 +440,9 @@ matching of words from languages not supported by 'snowball' are done verbatim
     ### [Google Test](https://code.google.com/p/googletest)
     used for writing tests for the IResearch library
     
    +### [VelocyPack](https://github.com/arangodb/velocypack)
    +used for JSON serialization/deserialization
    +
     ### Stopword list
     used by analysis::text_analyzer for filtering out noise words that should not impact text ranging
     e.g. for 'en' these are usualy 'a', 'the', etc...
    @@ -458,6 +466,7 @@ the first whitespace is ignored), in the directory corresponding to its language
     |iresearch::by_range|for filtering of values within a given range, with the possibility of specifying open/closed ranges
     |iresearch::by_same_position|for term-insertion-order sensitive filtering of exact values
     |iresearch::by_term|for filtering of exact values
    +|iresearch::by_terms|for filtering of exact values by a set of specified terms
     |iresearch::by_wildcard|for filtering of values based on matching pattern
     |iresearch::And|boolean conjunction of multiple filters, influencing document ranks/scores as appropriate
     |iresearch::Or|boolean disjunction of multiple filters, influencing document ranks/scores as appropriate (including "minimum match" functionality)
    @@ -578,7 +587,7 @@ The following grammar is currently defined via Bison (the root is <query>):
     - Apple Clang: 9
     
     ## License
    -Copyright (c) 2017-2020 ArangoDB GmbH
    +Copyright (c) 2017-2021 ArangoDB GmbH
     
     Copyright (c) 2016-2017 EMC Corporation
     
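
    The README hunk above splits the `IndexedField` requirements into `index_features()` (built-in index features) and `features()` (user supplied field features). As a rough sketch of what a conforming type might look like, the block below uses stand-in types in a hypothetical `sketch` namespace rather than the real IResearch headers. `sketch::string_ref`, `sketch::token_stream`, `sketch::IndexFeatures` (including its enumerator names) and `sketch::flags` are assumptions that only mirror the names in the table, and `TextField` and `looks_like_indexed_field` are likewise invented for illustration.

```cpp
#include <cstdint>
#include <set>
#include <string>
#include <string_view>
#include <type_traits>
#include <utility>

namespace sketch {  // stand-ins, NOT the real IResearch types

using string_ref = std::string_view;
struct token_stream {};  // placeholder for a real token stream

// Enumerator names are assumed for this sketch.
enum class IndexFeatures : std::uint8_t { NONE = 0, FREQ = 1, POS = 2 };

constexpr IndexFeatures operator|(IndexFeatures a, IndexFeatures b) {
  return static_cast<IndexFeatures>(static_cast<std::uint8_t>(a) |
                                    static_cast<std::uint8_t>(b));
}

using flags = std::set<std::string>;  // placeholder for a feature set

}  // namespace sketch

// Hypothetical field type exposing the four expressions from the table.
class TextField {
 public:
  explicit TextField(std::string name) : name_(std::move(name)) {}

  sketch::string_ref name() const { return name_; }
  sketch::token_stream* get_tokens() { return &stream_; }
  // Built-in features evaluated during indexing.
  sketch::IndexFeatures index_features() const {
    return sketch::IndexFeatures::FREQ | sketch::IndexFeatures::POS;
  }
  // User supplied features associated with the field.
  const sketch::flags& features() const { return features_; }

 private:
  std::string name_;
  sketch::token_stream stream_;
  sketch::flags features_{"norm"};
};

// Compile-time check of the "Requires" column against the stand-in types.
template <typename T>
constexpr bool looks_like_indexed_field() {
  return std::is_convertible_v<decltype(std::declval<T&>().name()),
                               sketch::string_ref> &&
         std::is_convertible_v<decltype(std::declval<T&>().get_tokens()),
                               sketch::token_stream*> &&
         std::is_convertible_v<decltype(std::declval<T&>().index_features()),
                               sketch::IndexFeatures> &&
         std::is_convertible_v<decltype(std::declval<T&>().features()),
                               const sketch::flags&>;
}

static_assert(looks_like_indexed_field<TextField>(),
              "TextField should expose all four expressions");
```

    The point of the sketch is the shape of the interface after this commit: positions and frequencies are requested through `index_features()`, while pluggable features such as norms travel through `features()`.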

    3rdParty/iresearch/THIRD_PARTY_README.md

    Lines changed: 3 additions & 3 deletions
    @@ -81,9 +81,9 @@ number of components produced by third parties
     - Copyright: Daniel Povey <dpovey@gmail.com>
     - License: [Apache 2.0](http://www.apache.org/licenses/LICENSE-2.0)
     - How it's used: a set of extensions for OpenFST library
    -14. Title: Unicorn Library
    -- Copyright: Ross Smith
    -- License: [MIT license](https://opensource.org/licenses/MIT)
    +14. Title: Boost::Text
    +- Copyright: Zachary Laine
    +- License: [Boost Software License, Version 1.0](http://www.boost.org/LICENSE_1_0.txt)
     - How it's used: Word segmentation rules implementation for
     segmentation analyzer
     15. Title: Velocypack Library
