Doc-Search Metaverse project

Doc-Search is a simple and flexible document search application that leverages Rust and OpenSearch to provide efficient, effective full-text search over documents. The project aims to offer a straightforward solution for indexing and searching a large corpus of documents with the speed and accuracy that OpenSearch provides.

The main goal is to implement a simple yet powerful system for storing and indexing documents, with full-text, semantic and hybrid search. I decided to use OpenSearch as the default search engine, but you may plug in your own solution (Tantivy, Qdrant or another engine) by implementing several async traits.
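The pluggable-engine idea can be sketched as a single async trait that every backend implements, so code written against the trait does not care which engine sits behind it. The trait name `SearchBackend`, its methods, and the toy `InMemoryBackend` are hypothetical and are not the project's actual traits. The sketch assumes Rust 1.75 or newer (native `async fn` in traits) and uses a tiny std-only `block_on` instead of a real runtime such as tokio.

```rust
use std::future::Future;
use std::pin::pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};
use std::thread::{self, Thread};

/// Hypothetical backend-agnostic search trait; the real Doc-Search traits may differ.
trait SearchBackend {
    async fn index(&mut self, id: &str, text: &str);
    async fn search(&self, query: &str) -> Vec<String>;
}

/// Toy in-memory backend standing in for OpenSearch, Tantivy or Qdrant.
#[derive(Default)]
struct InMemoryBackend {
    docs: Vec<(String, String)>, // (document id, lowercased text)
}

impl SearchBackend for InMemoryBackend {
    async fn index(&mut self, id: &str, text: &str) {
        self.docs.push((id.to_string(), text.to_lowercase()));
    }

    async fn search(&self, query: &str) -> Vec<String> {
        let query = query.to_lowercase();
        self.docs
            .iter()
            .filter(|(_, text)| text.contains(&query))
            .map(|(id, _)| id.clone())
            .collect()
    }
}

/// Application code depends only on the trait, so any backend can be swapped in.
async fn find<B: SearchBackend>(backend: &B, query: &str) -> Vec<String> {
    backend.search(query).await
}

/// Minimal single-threaded executor: park the thread until the future is woken.
struct ThreadWaker(Thread);

impl Wake for ThreadWaker {
    fn wake(self: Arc<Self>) {
        self.0.unpark();
    }
}

fn block_on<F: Future>(fut: F) -> F::Output {
    let mut fut = pin!(fut);
    let waker = Waker::from(Arc::new(ThreadWaker(thread::current())));
    let mut cx = Context::from_waker(&waker);
    loop {
        match fut.as_mut().poll(&mut cx) {
            Poll::Ready(out) => return out,
            Poll::Pending => thread::park(),
        }
    }
}
```

A production backend would wrap an OpenSearch (or Tantivy, Qdrant) client inside the same trait implementation and run on a real async runtime.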

The principal schema:

Doc-Search includes the following sub-services:

Cache Service - API of a caching service such as Redis;
Metrics Service - API for exporting metrics to Prometheus monitoring;
Storage Service - API (CRUD) of indexed folders and documents;
Searcher Service - API of the search functionality (full-text, semantic, hybrid);
Embeddings Service (removed) - API of an embeddings service, if you would like to use your own model.
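As an illustration of what a Cache Service API can look like, here is a minimal sketch of a cache trait with an in-memory implementation that expires entries after a TTL; a Redis-backed client would be another implementation of the same trait. The names `CacheService` and `InMemoryCache`, the TTL semantics and the synchronous signatures (chosen for brevity) are all assumptions, not the project's actual interface.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical cache-service interface; the real Doc-Search API may differ.
trait CacheService {
    fn get(&mut self, key: &str) -> Option<String>;
    fn set(&mut self, key: &str, value: String, ttl: Duration);
}

/// In-memory stand-in for Redis: each entry stores its value and expiry instant.
#[derive(Default)]
struct InMemoryCache {
    entries: HashMap<String, (String, Instant)>,
}

impl CacheService for InMemoryCache {
    fn get(&mut self, key: &str) -> Option<String> {
        let now = Instant::now();
        // Purge every expired entry (O(n), acceptable for a sketch).
        self.entries.retain(|_, (_, expires_at)| *expires_at > now);
        self.entries.get(key).map(|(value, _)| value.clone())
    }

    fn set(&mut self, key: &str, value: String, ttl: Duration) {
        self.entries
            .insert(key.to_string(), (value, Instant::now() + ttl));
    }
}
```

Keying entries by the normalized query string is one plausible way to cache search results; the key format used in the example below is hypothetical.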

Changelog:

OpenSearch instead of Elasticsearch

The Searcher and Storage services currently share a common OpenSearch implementation.

Removed custom embeddings functionality

After switching from Elasticsearch to OpenSearch, the need to integrate a custom embeddings model went away, because newer versions of OpenSearch provide an ML plugin with the necessary functionality (chunking and embedding). The Embeddings module has therefore been removed from the code base. When I add Qdrant support, this functionality will be added to the infrastructure layer along with the Qdrant client implementation.

Features

Service-based:

Rust Performance: Benefit from the speed and safety of Rust;
REST API: Easy-to-use REST API for searching documents and managing indexing;
Swagger: Swagger documentation for all available endpoints;
Remote logging: Send errors, warnings or other metrics to a remote server;
Docker Support: Easy deployment with Docker and docker-compose;
Caching Queries: Store data in a cache service like Redis or your own solution;

Searching:

Full-Text Search: Quickly find documents by content using the chosen search engine;
Semantic Search: Fast semantic search via an external embeddings service;
Hybrid Search: Fast hybrid search via an external embeddings service;
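Hybrid search has to merge two differently scaled score lists (lexical scores from full-text search and similarity scores from semantic search). A common approach, along the lines of the min_max normalization and arithmetic_mean combination offered by OpenSearch's normalization processor, is to rescale each list to [0, 1] and then take a weighted mean. The sketch below is a hypothetical illustration, not the project's implementation; in particular, treating a document missing from one list as scoring 0.0 there is an assumption of this sketch.

```rust
use std::collections::HashMap;

/// Min-max normalize scores into [0, 1]; if all scores are equal they map to 1.0.
fn min_max(scores: &HashMap<String, f64>) -> HashMap<String, f64> {
    let min = scores.values().cloned().fold(f64::INFINITY, f64::min);
    let max = scores.values().cloned().fold(f64::NEG_INFINITY, f64::max);
    scores
        .iter()
        .map(|(id, &s)| {
            let norm = if max > min { (s - min) / (max - min) } else { 1.0 };
            (id.clone(), norm)
        })
        .collect()
}

/// Weighted arithmetic mean of normalized full-text and semantic scores,
/// returned in descending order of combined score.
/// Assumption: a document absent from one list contributes 0.0 for that list.
fn hybrid_scores(
    fulltext: &HashMap<String, f64>,
    semantic: &HashMap<String, f64>,
    fulltext_weight: f64,
) -> Vec<(String, f64)> {
    let (ft, sem) = (min_max(fulltext), min_max(semantic));
    let semantic_weight = 1.0 - fulltext_weight;

    // Union of document ids from both result lists, deduplicated.
    let mut ids: Vec<&String> = ft.keys().chain(sem.keys()).collect();
    ids.sort();
    ids.dedup();

    let mut combined: Vec<(String, f64)> = ids
        .into_iter()
        .map(|id| {
            let a = ft.get(id).copied().unwrap_or(0.0);
            let b = sem.get(id).copied().unwrap_or(0.0);
            (id.clone(), fulltext_weight * a + semantic_weight * b)
        })
        .collect();

    // Highest combined score first; the stable sort keeps ties in id order.
    combined.sort_by(|x, y| y.1.total_cmp(&x.1));
    combined
}
```

With equal weights, a document that ranks moderately in both lists can overtake one that ranks first in only one list, which is usually the point of hybrid ranking.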

Domain

There are the following domains:

domain
   |----> Document storage (core)
   |        |----> Index
   |        |       |----> Context: index management in vector storage
   |        |       |----> Services: IIndexStorage
   |        |----> Document
   |                |----> Context: splits documents into parts and stores them in vector storage
   |                |----> Services: IDocumentPartStorage
   |
   |----> Document searching (core)
   |        |----> Found document
   |        |       |----> Context: results of multiple search kinds
   |        |       |----> Services: ISearcher
   |        |----> Pagination
   |                |----> Context: pagination of found results
   |                |----> Services: IPAginator
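The Pagination context (served by IPAginator) slices found results into pages. The following is a minimal sketch of offset-based pagination; the `Page` type and `paginate` function are hypothetical, and the real paginator may instead rely on OpenSearch mechanisms such as from/size or scrolling.

```rust
/// Hypothetical page of found results; not the actual Doc-Search type.
#[derive(Debug, PartialEq)]
struct Page<T> {
    items: Vec<T>,
    page: usize,        // zero-based page index
    total_pages: usize, // number of pages for the whole result set
}

/// Offset-based pagination over already-found results.
/// Out-of-range pages yield an empty `items` vector instead of panicking.
fn paginate<T: Clone>(results: &[T], page: usize, page_size: usize) -> Page<T> {
    assert!(page_size > 0, "page_size must be positive");
    let total_pages = results.len().div_ceil(page_size);
    let start = page.saturating_mul(page_size).min(results.len());
    let end = start.saturating_add(page_size).min(results.len());
    Page {
        items: results[start..end].to_vec(),
        page,
        total_pages,
    }
}
```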

And there are the following use cases:

usecase
   |----> Storage Use Case
   |        |----> CRUD of index and document
   |        |----> split large documents into parts for storage
   |        |----> upload a file to storage and create a new task-processing event
   |
   |----> Searching Use Case
   |        |----> search document parts using multiple algorithms
   |        |----> paginate found document part results
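Splitting a large document into parts before storage can be illustrated with fixed-size word windows that overlap, so context is not lost at part boundaries. This is one common chunking strategy chosen for illustration only; the `split_into_parts` name and its parameters are hypothetical, and the project (or the OpenSearch ML chunking mentioned in the changelog) may split documents differently, for example by tokens or delimiters.

```rust
/// Split `text` into parts of at most `max_words` words; consecutive parts
/// share `overlap` words. Hypothetical helper, not the project's API.
fn split_into_parts(text: &str, max_words: usize, overlap: usize) -> Vec<String> {
    assert!(overlap < max_words, "overlap must be smaller than max_words");
    let words: Vec<&str> = text.split_whitespace().collect();
    let step = max_words - overlap; // how far each window advances
    let mut parts = Vec::new();
    let mut start = 0;
    while start < words.len() {
        let end = (start + max_words).min(words.len());
        parts.push(words[start..end].join(" "));
        if end == words.len() {
            break; // last window reached the end of the document
        }
        start += step;
    }
    parts
}
```

Each returned part would then be stored as a separate document part, typically alongside its parent document ID and position.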

Context map:

+----------------+         +-----------------+
| StorageUseCase | <────── | SearcherUseCase |
+----------------+         +-----------------+
        |                           |
        ▼                           ▼
+----------------+         +-----------------+
| Storage Domain |         | Searcher Domain |
+----------------+         +-----------------+

Context data flow:

HTTP Request
     │
     ▼
HTTP Handler (ServerState)
     │
     ▼
ServerAppState
    ├── StorageUseCase (application)
    │       │
    │       ▼
    │    Storage (domain)
    │
    └── SearcherUseCase (application)
            │
            ▼
          Task (domain)
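The data flow above (handler, shared server state, use case, domain) can be sketched as plain Rust types, with HTTP handlers that only delegate to use cases held in a cloneable shared state. Everything here is hypothetical and simplified: the `Storage` trait, `MemoryStorage`, the method names and `build_state` are illustrative, both use cases share one storage port for brevity, and no web framework is involved.

```rust
use std::sync::{Arc, Mutex};

/// Hypothetical domain port; real implementations would talk to OpenSearch.
trait Storage: Send + Sync {
    fn save(&self, text: &str);
    fn all(&self) -> Vec<String>;
}

/// In-memory domain implementation used only for this sketch.
#[derive(Default)]
struct MemoryStorage(Mutex<Vec<String>>);

impl Storage for MemoryStorage {
    fn save(&self, text: &str) {
        self.0.lock().unwrap().push(text.to_string());
    }
    fn all(&self) -> Vec<String> {
        self.0.lock().unwrap().clone()
    }
}

/// Application layer: use cases depend on the domain trait, not on a concrete engine.
struct StorageUseCase {
    storage: Arc<dyn Storage>,
}
struct SearcherUseCase {
    storage: Arc<dyn Storage>,
}

impl StorageUseCase {
    fn upload(&self, text: &str) {
        self.storage.save(text);
    }
}

impl SearcherUseCase {
    fn search(&self, query: &str) -> Vec<String> {
        self.storage.all().into_iter().filter(|t| t.contains(query)).collect()
    }
}

/// Shared state handed to every HTTP handler; cloning only bumps Arc counters.
#[derive(Clone)]
struct ServerAppState {
    storage: Arc<StorageUseCase>,
    searcher: Arc<SearcherUseCase>,
}

/// Stand-in for an HTTP handler: it delegates straight to a use case.
fn handle_search(state: &ServerAppState, query: &str) -> Vec<String> {
    state.searcher.search(query)
}

/// Wire the layers together once at startup.
fn build_state() -> ServerAppState {
    let storage: Arc<dyn Storage> = Arc::new(MemoryStorage::default());
    ServerAppState {
        storage: Arc::new(StorageUseCase { storage: storage.clone() }),
        searcher: Arc::new(SearcherUseCase { storage }),
    }
}
```

Keeping handlers this thin means the engine can be replaced by swapping the domain implementation passed into `build_state`, without touching the HTTP layer.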

Getting Started

These instructions will get you a copy of the project up and running on your local machine for development and testing purposes.

Prerequisites

Rust
Docker & docker-compose
Cache (Redis)
OpenSearch

Quick Start

Check the docs/opensearch scripts for how to load the ML cluster onto a single node and set up the infrastructure: ingest and search pipelines and model deployment.
Clone the repository
Run cargo install --path . to build the project
Set up the .env file with service credentials
Run cargo run --bin init-infrastructure to initialize the OpenSearch schemas
Run cargo run --bin launch to launch the service

Features of the project

Features for parsing and storing documents locally in the current service (not stable):

enable-unique-doc-id - enables generating a unique document ID based on the index and document IDs.
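Assuming enable-unique-doc-id is a Cargo feature, the idea can be sketched as a deterministic ID derived from the index and document IDs, selected at compile time with `cfg!(feature = ...)`. The helper names and the hashing scheme are hypothetical; the project may derive the ID differently.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical helper: derive a deterministic ID from the index and document IDs.
/// Hashing the tuple, rather than a concatenated string, avoids ambiguity
/// between inputs such as ("a/b", "c") and ("a", "b/c").
fn unique_document_id(index_id: &str, doc_id: &str) -> String {
    let mut hasher = DefaultHasher::new();
    (index_id, doc_id).hash(&mut hasher);
    format!("{:016x}", hasher.finish())
}

/// Use the combined ID only when the (assumed) Cargo feature is enabled.
fn document_id(index_id: &str, doc_id: &str) -> String {
    if cfg!(feature = "enable-unique-doc-id") {
        unique_document_id(index_id, doc_id)
    } else {
        doc_id.to_string()
    }
}
```

The standard library does not guarantee that `DefaultHasher` produces the same hashes across Rust releases, so IDs that are persisted in OpenSearch would be safer with a fixed algorithm, such as a name-based UUID (v5) from a dedicated crate.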
