NEW BACKEND! MongoDB Atlas by caseyclements · Pull Request #1883 · docarray/docarray · GitHub

@caseyclements (Contributor)

Description

This pull request adds MongoDB Atlas as a document index backend, enabling vector search, filtering, and text search over documents stored in MongoDB. The implementation and its supported functionality are described below:

Simple Usage

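from docarray import BaseDoc
from docarray.index import MongoDBAtlasDocumentIndex
from docarray.typing import NdArray
import numpy as np

class MyDoc(BaseDoc):
    text: str
    embedding: NdArray[10]

# Ten documents with random embeddings, plus a random query vector
docs = [MyDoc(text=f'text {i}', embedding=np.random.rand(10)) for i in range(10)]
query = np.random.rand(10)

db = MongoDBAtlasDocumentIndex[MyDoc](host='localhost')
db.index(docs)
# Vector search over the `embedding` field, returning up to 10 matches
results = db.find(query, search_field='embedding', limit=10)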
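
Atlas exposes vector search through the `$vectorSearch` aggregation stage. As a minimal sketch, assuming a call such as `find(query, search_field='embedding', limit=10)` is translated into that stage, the pipeline could be built from plain dicts as below. The helper `build_vector_search_pipeline`, the index name, and the `numCandidates` heuristic are hypothetical and not taken from the PR, whose actual pipeline may differ.

```python
from typing import Any, Dict, List, Optional, Sequence


def build_vector_search_pipeline(
    query: Sequence[float],
    search_field: str,
    index_name: str,
    limit: int = 10,
    num_candidates: Optional[int] = None,
    filter_query: Optional[Dict[str, Any]] = None,
) -> List[Dict[str, Any]]:
    """Hypothetical helper: build an Atlas `$vectorSearch` aggregation pipeline."""
    stage: Dict[str, Any] = {
        "index": index_name,  # name of the Atlas Vector Search index (assumed)
        "path": search_field,  # document field that stores the embedding
        # BSON cannot encode NumPy arrays, so convert to a list of plain floats
        "queryVector": [float(x) for x in query],
        # Candidates examined by the approximate search; 10x limit is a heuristic (assumption)
        "numCandidates": num_candidates if num_candidates is not None else 10 * limit,
        "limit": limit,
    }
    if filter_query:
        # Optional pre-filter; Atlas only accepts fields indexed with type "filter" here
        stage["filter"] = filter_query
    return [
        {"$vectorSearch": stage},
        # Attach the similarity score to each returned document
        {"$set": {"score": {"$meta": "vectorSearchScore"}}},
    ]
```

With pymongo, such a pipeline would be executed via `collection.aggregate(pipeline)`. Building the stages as plain dicts keeps the translation easy to unit-test without a live Atlas cluster.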

Supported Functionality

  • Find (vector search): vector similarity search over an embedding field.
  • Filter: filtering on textual and numeric fields using MongoDB query syntax.
  • Text Search: text search using regex matching.
  • Get/Del: retrieval and deletion of documents by ID.
  • Subindex: subindexes for documents that contain nested DocList fields.
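
The filter and text-search features map naturally onto MongoDB query documents. As a hedged illustration (the PR's own query construction is not shown here), the hypothetical helpers below build a MongoDB-syntax filter combining a text and a numeric condition, and a regex-based text-search pipeline. The field names `text` and `number` are assumptions; `number` does not exist on `MyDoc` above.

```python
import re
from typing import Any, Dict, List


def build_filter(text_equals: str, min_number: float) -> Dict[str, Any]:
    """Hypothetical: AND a textual and a numeric condition in MongoDB query syntax."""
    return {
        "$and": [
            {"text": {"$eq": text_equals}},  # exact match on a string field
            {"number": {"$gte": min_number}},  # numeric lower bound
        ]
    }


def build_regex_text_search(
    search_field: str, query: str, limit: int = 10
) -> List[Dict[str, Any]]:
    """Hypothetical: regex-based text search expressed as an aggregation pipeline."""
    # Escape the query so characters such as '.' or '*' are matched literally
    pattern = re.escape(query)
    return [
        # "$options": "i" makes the regex match case-insensitive
        {"$match": {search_field: {"$regex": pattern, "$options": "i"}}},
        {"$limit": limit},
    ]
```

Escaping is a design choice of this sketch: without it, user input would be interpreted as a regular expression, which may or may not be what the backend intends. A filter dict like this would presumably be passed to the index's `filter` method, for example `db.filter(build_filter('text 1', 3), limit=5)`.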

Integration Tests and Documentation

  • tests/index/mongo_atlas
  • docs/API_reference/doc_index/backends/mongodb.md

Coming soon

  • Implementation of QueryBuilder and Hybrid Search.

@JoanFM (Member) commented Apr 24, 2024

Hello @caseyclements ,

Thanks a lot for this amazing contribution. Could you please sign off the commits so that we can pass the DCO (Developer Certificate of Origin) check and merge the PR if it is accepted?

WaVEV and others added 28 commits April 24, 2024 12:36
Signed-off-by: Casey Clements <casey.clements@mongodb.com>
@caseyclements force-pushed the mongodb-atlas-backend branch from d74b174 to b5b73d1 on April 24, 2024 16:36
@prakul commented Apr 24, 2024

@JoanFM, all the comments have been addressed.

@JoanFM (Member) left a comment

Can you please check how to pass the black check? You can find the steps in our CONTRIBUTING guidelines.

@JoanFM (Member) commented Apr 26, 2024

We are getting these errors in the tests, caused by the type annotations; maybe my requests were not good enough.

docarray/index/backends/mongodb_atlas.py:197: in MongoDBAtlasDocumentIndex
    ) -> Tuple[list[dict], list[float]]:
E   TypeError: 'type' object is not subscriptable
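
The error occurs because subscripting built-in collection types such as `list[dict]` is only supported from Python 3.9 (PEP 585); on Python 3.8 the return annotation is evaluated when the method is defined, which raises this `TypeError`. A minimal sketch of the 3.8-compatible spelling using `typing.List` and `typing.Tuple`; the function `split_results` is hypothetical and only mirrors the shape of the annotation in the traceback.

```python
from typing import Any, Dict, List, Tuple


def split_results(
    raw: List[Dict[str, Any]],
) -> Tuple[List[Dict[str, Any]], List[float]]:
    """Hypothetical helper: separate result documents from their scores.

    typing.List / typing.Tuple work on Python 3.8, whereas list[dict]
    raises "TypeError: 'type' object is not subscriptable" there.
    """
    # Copy each result without its "score" key
    docs = [{k: v for k, v in r.items() if k != "score"} for r in raw]
    scores = [float(r["score"]) for r in raw]
    return docs, scores
```

Adding `from __future__ import annotations` (PEP 563) would also defer evaluation of the annotation, but the `typing` aliases are the more conservative fix for code whose annotations may be inspected at runtime.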

@caseyclements (Contributor, Author)

Hi @JoanFM. Thank you for working with me on this. I created a Python 3.8 Poetry environment for better coverage. black appears to have changed its mind between versions with regard to the ellipsis (...). Running under 3.8, it moved the ellipsis to the following line, whereas the day before, black in my 3.11 environment moved it onto the same line! :) I changed it to pass. ¯\_(ツ)_/¯

-    class RuntimeConfig(BaseDocIndex.RuntimeConfig): ...
+    class RuntimeConfig(BaseDocIndex.RuntimeConfig):
+        ...

And now

-    class RuntimeConfig(BaseDocIndex.RuntimeConfig): ...
+    class RuntimeConfig(BaseDocIndex.RuntimeConfig):
+        pass

Next, I'll turn to the typing issues, running mypy under Python 3.8.

@caseyclements (Contributor, Author)

Sorry, I missed changing list to List. I had recognized the required change while updating tuple, but it was a busy morning. I pushed a new commit with the fix.

@codecov (bot) commented Apr 26, 2024

Codecov Report

Attention: Patch coverage is 41.11675%, with 116 lines in your changes missing coverage. Please review.

Project coverage is 44.75%. Comparing base (febbdc4) to head (5c01811).

❗ Current head 5c01811 differs from pull request most recent head 3b03f06. Consider uploading reports for the commit 3b03f06 to get more accurate results

Files                                     Patch %   Lines
docarray/index/backends/mongodb_atlas.py  40.10%    115 Missing ⚠️
docarray/index/__init__.py                75.00%    1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##             main    #1883       +/-   ##
===========================================
- Coverage   84.69%   44.75%   -39.95%     
===========================================
  Files         136      137        +1     
  Lines        9263     9459      +196     
===========================================
- Hits         7845     4233     -3612     
- Misses       1418     5226     +3808     
Flag       Coverage Δ
docarray   44.75% <41.11%> (-39.95%) ⬇️

Flags with carried forward coverage won't be shown.
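
The percentages in this report are covered lines divided by coverable lines. A quick arithmetic check against the report's own numbers (the 197-line patch size is inferred from 116 missing lines at 41.11675%; the report does not state it directly):

```python
def coverage_pct(hits: int, lines: int) -> float:
    """Line coverage as a percentage: covered lines / coverable lines * 100."""
    return 100.0 * hits / lines


# Project totals from the Coverage Diff above
base = coverage_pct(7845, 9263)  # main
head = coverage_pct(4233, 9459)  # this PR's reported head
# Patch: 116 of 197 new coverable lines are uncovered (197 is inferred)
patch = coverage_pct(197 - 116, 197)

print(f"{base:.2f} {head:.2f} {patch:.5f}")  # prints: 84.69 44.75 41.11675
```

The same numbers show where the drop comes from: the head adds only 196 lines, yet 3612 fewer lines are hit than on main, so most of the decrease reflects lines elsewhere in the project no longer reported as hit, not the new backend's 115 uncovered lines.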


@caseyclements (Contributor, Author)

Hi @JoanFM. I can address the code coverage by adding a new action to .github/workflows/ci.yml. @prakul, we can set up the correct credentials and share a Google Doc.

@JoanFM (Member) commented Apr 26, 2024

No need to worry about code coverage.

@caseyclements (Contributor, Author)

Cool. What remains then?

Over the next two weeks (I'm in London this coming week), when we add the QueryBuilder, we'll also set up testing on your end. We already run the tests against Atlas on our own CI. Maybe we could set up a face-to-face meeting in a couple of weeks. Once we know the scope of the use cases for the indexes and the data types, we can optimize to get the most out of MongoDB's API.

@JoanFM (Member) commented Apr 26, 2024

There seems to be a test timing out, but I am not sure whether it comes from your changes.

So what is the plan for the upcoming changes from your side?


Labels

None yet

Projects

None yet


4 participants
