NEW BACKEND! MongoDB Atlas by caseyclements · Pull Request #1883 · docarray/docarray · GitHub
Merged
37 commits
98dda1c
Mongo backend index initial.
WaVEV Mar 18, 2024
3150375
find unit test.
WaVEV Mar 20, 2024
386b5ab
unit test of configuration, find, del, get and persist data.
WaVEV Mar 22, 2024
08cbc90
query builder refactor.
WaVEV Mar 22, 2024
6d45d1f
Rename function _collect_query_args_required_args => _collect_query_r…
WaVEV Mar 22, 2024
12c7216
filter by parent id and query builder refactor.
WaVEV Mar 25, 2024
df0e461
Add nested schema fixture.
WaVEV Mar 25, 2024
f0ff913
refactor test find, moving text and filter test.
WaVEV Mar 25, 2024
028f1be
Add test filter.
WaVEV Mar 25, 2024
d2f1f03
Add test text search.
WaVEV Mar 25, 2024
2cafd61
Add query builder test.
WaVEV Mar 25, 2024
61b9943
Set collection name by schema name.
WaVEV Mar 25, 2024
b29290e
Fix unit test.
WaVEV Mar 25, 2024
b3b35cf
Add index name property.
WaVEV Mar 25, 2024
fdbb334
set scope for mongo_fixture_env.
WaVEV Mar 25, 2024
4cb814d
add subindex test.
WaVEV Mar 25, 2024
00de14f
subindex find.
WaVEV Mar 26, 2024
2c083f9
Update readme and manage exception when an Index is missing.
WaVEV Mar 26, 2024
b52cd14
test find without index.
WaVEV Mar 26, 2024
63ede54
refactor fixtures import.
WaVEV Mar 26, 2024
1b1ab1c
Importing all fixtures to avoid importing dependencies.
WaVEV Mar 26, 2024
04cd2a6
fix poetry lock.
WaVEV Mar 28, 2024
de1d74f
QueryBuilder: hybrid search implementation.
WaVEV Apr 15, 2024
df57c22
Hybrid search: fix score.
WaVEV Apr 16, 2024
53e93f4
Refactor: project now takes extra fields to project scores.
WaVEV Apr 17, 2024
addac74
Moved query builder to query-builder-implementation branch.
WaVEV Apr 20, 2024
95bf4f3
Refactor and clean up.
WaVEV Apr 23, 2024
78a670d
Rename MongoAtlasDocumentIndex to MongoDBAtlasDocumentIndex
caseyclements Apr 23, 2024
09c6e70
Update env variable name
caseyclements Apr 23, 2024
b5b73d1
Add MongoDB Atlas setup instructions.
caseyclements Apr 23, 2024
62be05c
Updates in response to maintainer comments
caseyclements Apr 24, 2024
aed03a2
Added detailed README in tests/ for setup of indexes
caseyclements Apr 24, 2024
9df653b
black formatted
caseyclements Apr 25, 2024
c2178de
Changed typing tuple > Tuple for python<=3.8
caseyclements Apr 25, 2024
30d2b40
black formatting ellipsis to pass
caseyclements Apr 26, 2024
5c01811
Updated typing of Lists for backward compatibility
caseyclements Apr 26, 2024
3b03f06
Merge branch 'main' into mongodb-atlas-backend
Apr 26, 2024
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -21,7 +21,7 @@ repos:
exclude: ^(docarray/proto/pb/docarray_pb2.py|docarray/proto/pb/docarray_pb2.py|docs/|docarray/resources/)

- repo: https://github.com/charliermarsh/ruff-pre-commit
rev: v0.0.243
rev: v0.0.250
hooks:
- id: ruff

9 changes: 6 additions & 3 deletions README.md
@@ -22,7 +22,7 @@ DocArray is a Python library expertly crafted for the [representation](#represen

- :fire: Offers native support for **[NumPy](https://github.com/numpy/numpy)**, **[PyTorch](https://github.com/pytorch/pytorch)**, **[TensorFlow](https://github.com/tensorflow/tensorflow)**, and **[JAX](https://github.com/google/jax)**, catering specifically to **model training scenarios**.
- :zap: Based on **[Pydantic](https://github.com/pydantic/pydantic)**, and instantly compatible with web and microservice frameworks like **[FastAPI](https://github.com/tiangolo/fastapi/)** and **[Jina](https://github.com/jina-ai/jina/)**.
- :package: Provides support for vector databases such as **[Weaviate](https://weaviate.io/), [Qdrant](https://qdrant.tech/), [ElasticSearch](https://www.elastic.co/de/elasticsearch/), [Redis](https://redis.io/)**, and **[HNSWLib](https://github.com/nmslib/hnswlib)**.
- :package: Provides support for vector databases such as **[Weaviate](https://weaviate.io/), [Qdrant](https://qdrant.tech/), [ElasticSearch](https://www.elastic.co/de/elasticsearch/), **[Redis](https://redis.io/)**, **[Mongo Atlas](https://www.mongodb.com/)**, and **[HNSWLib](https://github.com/nmslib/hnswlib)**.
- :chains: Allows data transmission as JSON over **HTTP** or as **[Protobuf](https://protobuf.dev/)** over **[gRPC](https://grpc.io/)**.

## Installation
@@ -350,7 +350,7 @@ This is useful for:
- :mag: **Neural search** applications
- :bulb: **Recommender systems**

Currently, Document Indexes support **[Weaviate](https://weaviate.io/)**, **[Qdrant](https://qdrant.tech/)**, **[ElasticSearch](https://www.elastic.co/)**, **[Redis](https://redis.io/)**, and **[HNSWLib](https://github.com/nmslib/hnswlib)**, with more to come!
Currently, Document Indexes support **[Weaviate](https://weaviate.io/)**, **[Qdrant](https://qdrant.tech/)**, **[ElasticSearch](https://www.elastic.co/)**, **[Redis](https://redis.io/)**, **[Mongo Atlas](https://www.mongodb.com/)**, and **[HNSWLib](https://github.com/nmslib/hnswlib)**, with more to come!

The Document Index interface lets you index and retrieve Documents from multiple vector databases, all with the same user interface.

@@ -421,7 +421,7 @@ They are now called **Document Indexes** and offer the following improvements (s
- **Production-ready:** The new Document Indexes are a much thinner wrapper around the various vector DB libraries, making them more robust and easier to maintain
- **Increased flexibility:** We strive to support any configuration or setting that you could perform through the DB's first-party client

For now, Document Indexes support **[Weaviate](https://weaviate.io/)**, **[Qdrant](https://qdrant.tech/)**, **[ElasticSearch](https://www.elastic.co/)**, **[Redis](https://redis.io/)**, Exact Nearest Neighbour search and **[HNSWLib](https://github.com/nmslib/hnswlib)**, with more to come.
For now, Document Indexes support **[Weaviate](https://weaviate.io/)**, **[Qdrant](https://qdrant.tech/)**, **[ElasticSearch](https://www.elastic.co/)**, **[Redis](https://redis.io/)**, **[Mongo Atlas](https://www.mongodb.com/)**, Exact Nearest Neighbour search and **[HNSWLib](https://github.com/nmslib/hnswlib)**, with more to come.

</details>

@@ -844,6 +844,7 @@ Currently, DocArray supports the following vector databases:
- [Milvus](https://milvus.io)
- ExactNNMemorySearch as a local alternative with exact kNN search.
- [HNSWlib](https://github.com/nmslib/hnswlib) as a local-first ANN alternative
- [Mongo Atlas](https://www.mongodb.com/)

An integration of [OpenSearch](https://opensearch.org/) is currently in progress.

@@ -874,6 +875,7 @@ from langchain.embeddings.openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings()


# Define a document schema
class MovieDoc(BaseDoc):
title: str
@@ -903,6 +905,7 @@ from docarray.index import (
QdrantDocumentIndex,
ElasticDocIndex,
RedisDocumentIndex,
MongoDBAtlasDocumentIndex,
)

# Select a suitable backend and initialize it with data
7 changes: 7 additions & 0 deletions docarray/index/__init__.py
@@ -13,6 +13,9 @@
from docarray.index.backends.epsilla import EpsillaDocumentIndex # noqa: F401
from docarray.index.backends.hnswlib import HnswDocumentIndex # noqa: F401
from docarray.index.backends.milvus import MilvusDocumentIndex # noqa: F401
from docarray.index.backends.mongodb_atlas import ( # noqa: F401
MongoDBAtlasDocumentIndex,
)
from docarray.index.backends.qdrant import QdrantDocumentIndex # noqa: F401
from docarray.index.backends.redis import RedisDocumentIndex # noqa: F401
from docarray.index.backends.weaviate import WeaviateDocumentIndex # noqa: F401
@@ -26,6 +29,7 @@
'WeaviateDocumentIndex',
'RedisDocumentIndex',
'MilvusDocumentIndex',
'MongoDBAtlasDocumentIndex',
]


@@ -55,6 +59,9 @@ def __getattr__(name: str):
elif name == 'RedisDocumentIndex':
import_library('redis', raise_error=True)
import docarray.index.backends.redis as lib
elif name == 'MongoDBAtlasDocumentIndex':
import_library('pymongo', raise_error=True)
import docarray.index.backends.mongodb_atlas as lib
else:
raise ImportError(
f'cannot import name \'{name}\' from \'{_get_path_from_docarray_root_level(__file__)}\''
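
The `__getattr__` hunk above resolves backend classes lazily, so `pymongo` only has to be installed when `MongoDBAtlasDocumentIndex` is actually requested. A minimal, self-contained sketch of that pattern follows; it is not the DocArray implementation. The names `_LAZY_BACKENDS`, `lazy_getattr`, and `demo_index` are hypothetical, and standard-library modules stand in for `pymongo` and the backend module so the sketch runs without extra dependencies.

```python
import importlib
import types

# Hypothetical registry: public name -> (required package, module defining the name).
# 'collections' stands in for both pymongo and the backend module (assumption).
_LAZY_BACKENDS = {
    'OrderedDict': ('collections', 'collections'),
    'MissingBackend': ('package_that_is_not_installed', 'collections'),
}


def lazy_getattr(name: str):
    """Resolve a name on first access, as a module-level __getattr__ would."""
    if name not in _LAZY_BACKENDS:
        raise ImportError(f"cannot import name '{name}'")
    required, module_path = _LAZY_BACKENDS[name]
    try:
        # Fail early with a clear message if the optional dependency is absent.
        importlib.import_module(required)
    except ImportError as exc:
        raise ImportError(f"'{name}' requires the '{required}' package") from exc
    lib = importlib.import_module(module_path)
    return getattr(lib, name)


# Attach the hook to a throwaway module object; in a real package the function
# would simply be defined as __getattr__ at the top level of __init__.py.
demo = types.ModuleType('demo_index')
demo.__getattr__ = lazy_getattr

ordered_dict_cls = demo.OrderedDict  # triggers lazy_getattr('OrderedDict')
```

Accessing `demo.MissingBackend` in this sketch raises `ImportError`, mirroring how a missing optional dependency would surface at first use rather than when the package itself is imported.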