[go: up one dir, main page]

Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Draft] rust-agent-qbg #2751

Draft
wants to merge 1 commit into
base: main
Choose a base branch
from
Draft

[Draft] rust-agent-qbg #2751

wants to merge 1 commit into from

Conversation

datelier
Copy link
Contributor
@datelier datelier commented Nov 19, 2024

Description

Related Issue

Versions

  • Vald Version: v1.7.14
  • Go Version: v1.23.3
  • Rust Version: v1.82.0
  • Docker Version: v27.3.1
  • Kubernetes Version: v1.31.2
  • Helm Version: v3.16.2
  • NGT Version: v2.3.4
  • Faiss Version: v1.9.0

Checklist

Special notes for your reviewer

Summary by CodeRabbit

  • New Features
    • Introduced a new library for managing Quantized Binary Graph (QBG) construction and indexing.
    • Added functionality for creating and manipulating properties and indices related to QBG, including robust parameter management and lifecycle operations.
  • Bug Fixes
    • Enhanced error handling in index operations.
  • Documentation
    • Added comprehensive documentation for new classes and methods related to QBG.
  • Tests
    • Implemented tests to ensure the functionality of creating, appending, and searching indices.

Copy link
Contributor
coderabbitai bot commented Nov 19, 2024
📝 Walkthrough
📝 Walkthrough

Walkthrough

The changes in this pull request introduce a new library, qbg, into the Rust project by modifying several Cargo.toml files and adding new source files. The Cargo.toml in the workspace is updated to include the qbg library, while the Cargo.toml for the algorithm library adds a dependency on qbg. A new Cargo.toml file is created for the qbg library, which specifies its dependencies and metadata. Additionally, new source files (build.rs, input.cpp, input.h, lib.rs) are added to implement functionality related to QBG construction and indexing.

Changes

File Path Change Summary
rust/Cargo.toml Added new workspace member: "libs/algorithms/qbg"
rust/libs/algorithm/Cargo.toml Added new dependency: qbg = { version = "0.1.0", path = "../algorithms/qbg" }
rust/libs/algorithms/qbg/Cargo.toml Created new package with metadata, version, edition, and dependencies (anyhow, cxx, etc.)
rust/libs/algorithms/qbg/build.rs Added build script for integrating C++ code using cxx, specifying source files and compiler flags.
rust/libs/algorithms/qbg/src/input.cpp Introduced Property and Index classes with methods for QBG management and error handling.
rust/libs/algorithms/qbg/src/input.h Declared Property and Index classes and their associated functions.
rust/libs/algorithms/qbg/src/lib.rs Added FFI module for Property and Index, including various function signatures and test cases.

Possibly related PRs

Suggested labels

size/XXXL, area/agent/core/faiss, area/agent/core/ngt, area/agent/sidecar, area/manager/index, area/tools/cli/loadtest, area/gateway/lb, area/gateway/filter, area/gateway/mirror, area/index/job/correction, area/index/job/creation, area/index/job/save, area/index/job/readreplica

Suggested reviewers

  • vankichi
  • kmrmt

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

@vdaas-ci
Copy link
Collaborator

[CHATOPS:HELP] ChatOps commands.

  • 🙆‍♀️ /approve - approve
  • 🍱 /format - format codes and add licenses
  • /gen-test - generate test codes
  • 🏷️ /label - add labels
  • 🔚 2️⃣ 🔚 /label actions/e2e-deploy - run E2E deploy & integration test

Copy link
Contributor
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 15

🧹 Outside diff range and nitpick comments (10)
rust/libs/algorithm/Cargo.toml (1)

25-25: Documentation needed for QBG integration

Since this is a draft PR introducing a new algorithm component, please provide:

  1. Description of what QBG is and its purpose
  2. Integration requirements and dependencies
  3. Any performance implications or considerations

Consider adding these details to:

  • PR description
  • README in the QBG module
  • Code documentation
rust/libs/algorithms/qbg/src/input.h (2)

32-39: Add parameter names to set_qbg_construction_parameters for clarity

Including parameter names in the function declaration enhances readability and helps developers understand the purpose of each parameter.


49-65: Include parameter names in set_qbg_build_parameters function

Providing parameter names in the function declaration improves code comprehension and maintainability.

rust/libs/algorithms/qbg/src/lib.rs (5)

59-62: Inconsistent Integer Types in set_hierarchical_clustering_init_mode

The function set_hierarchical_clustering_init_mode uses i16 for the parameter hierarchical_clustering_init_mode, whereas similar parameters in other functions use i32. Consider using consistent integer types to avoid potential issues.


63-65: Type Mismatch in set_number_of_second_objects

The parameter number_of_second_objects in set_number_of_second_objects is of type u32, while other similar parameters use usize. For consistency and to prevent potential bugs, consider using usize here as well.


109-130: Use Meaningful Values in Test Property Setters

In the test test_qbg, many property setters are assigned the value 1. Using more realistic and diverse values can improve the effectiveness of the test by covering a wider range of scenarios.


184-185: Simplify Return Statements in Test Functions

The lines return Ok(()); can be simplified to just Ok(()) for brevity and idiomatic Rust code.

Also applies to: 216-217


84-85: Clarify Parameter Naming and Types in new_index and new_prebuilt_index

The new_index function takes a Pin<&mut Property> as the second parameter named p, while new_prebuilt_index takes a bool named p. This overloading of p with different types may cause confusion. Consider renaming the boolean parameter to something more descriptive, like prebuilt, for clarity.

rust/libs/algorithms/qbg/src/input.cpp (2)

315-328: Missing Error Handling in Factory Functions

The factory functions new_index and new_prebuilt_index do not handle exceptions that may be thrown during object creation, potentially leading to unhandled exceptions.

Consider using exception handling to capture and report any errors during object creation:

std::unique_ptr<Index> new_index(const rust::String &path, Property &p)
{
    try
    {
        return std::make_unique<Index>(path, p);
    }
    catch (const std::exception &e)
    {
        // Handle exception, possibly log the error
        std::cerr << "Error creating Index: " << e.what() << std::endl;
        return nullptr;
    }
}

227-238: Use of const char* Instead of std::string in open_index

Passing cpath.c_str() may lead to issues if cpath goes out of scope before c_str() is used.

Since cpath is a local variable, it's safe here, but consider passing std::string directly if the underlying function accepts it, to reduce unnecessary conversions.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 3d51a12 and d6e423f.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
✅ Files skipped from review due to trivial changes (2)
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
🔇 Additional comments (4)
rust/Cargo.toml (2)

24-24: LGTM! The workspace member addition follows project conventions.

The new qbg algorithm library is properly placed alongside other algorithm implementations (ngt, faiss).


24-24: Verify qbg library setup

Let's verify that the qbg library directory exists and has the required Cargo.toml configuration.

✅ Verification successful

QBG library setup is properly configured

The verification shows that the qbg library is correctly set up with:

  • Directory structure present at rust/libs/algorithms/qbg/
  • Required files: Cargo.toml, build.rs, and src/ directory
  • Proper Cargo.toml configuration with:
    • Package metadata
    • Required dependencies (anyhow, cxx)
    • Build dependencies (cxx-build, miette)
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify qbg library structure and configuration

# Check if the directory exists
if [ ! -d "rust/libs/algorithms/qbg" ]; then
    echo "Error: qbg directory does not exist at expected path"
    exit 1
fi

# Check for Cargo.toml and essential files
echo "Checking qbg library structure:"
ls -la rust/libs/algorithms/qbg/

# Display Cargo.toml contents if it exists
if [ -f "rust/libs/algorithms/qbg/Cargo.toml" ]; then
    echo -e "\nQBG Cargo.toml contents:"
    cat rust/libs/algorithms/qbg/Cargo.toml
else
    echo "Error: Cargo.toml not found in qbg directory"
fi

Length of output: 1461

rust/libs/algorithm/Cargo.toml (1)

25-25: Verify QBG integration path and dependencies

The dependency path looks consistent with other algorithm dependencies, but let's verify the integration.

✅ Verification successful

QBG dependency path and integration verified

The QBG integration appears to be properly structured:

  • QBG directory exists at the correct path (rust/libs/algorithms/qbg)
  • The library implements a C++ bridge for QBG index construction and search
  • Proper license headers and version metadata are present
  • No direct usage in algorithm lib yet, which is expected for a draft PR
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify QBG integration

# Check if qbg directory exists
echo "Checking qbg directory..."
if [ -d "rust/libs/algorithms/qbg" ]; then
    echo "✓ QBG directory exists"
else
    echo "✗ QBG directory not found"
fi

# Check for other potential integration points
echo -e "\nChecking for other QBG references..."
rg -l "qbg" --type rust

Length of output: 336


Script:

#!/bin/bash
# Let's check the content of lib.rs to understand the integration points
echo "=== Content of QBG lib.rs ==="
cat rust/libs/algorithms/qbg/src/lib.rs

echo -e "\n=== Checking Cargo.toml for metadata and description ==="
cat rust/libs/algorithms/qbg/Cargo.toml

echo -e "\n=== Checking for any usage in algorithm lib ==="
rg "qbg" rust/libs/algorithm/src/ -A 2 -B 2

Length of output: 9834

rust/libs/algorithms/qbg/src/input.cpp (1)

273-273: Incorrect Use of v.length() in append

The use of v.length() may be incorrect if v does not have a length() method or if it does not return the expected size.

Verify that v.length() correctly returns the number of elements in the slice. If not, use v.size() or adjust accordingly.

Run the following script to check for the correct method to retrieve the size of rust::Slice:

Ensure that the correct method is consistently used across the codebase.

✅ Verification successful

Let's search for actual usage patterns of rust::Slice in the codebase to understand how it's being used and what methods are available. I've also included a search for the specific vector construction pattern to see the surrounding context.


Use of v.length() is correct for rust::Slice

Looking at the codebase usage, v.length() is the correct method for rust::Slice. This is evidenced by its usage in qbg/src/input.cpp and supported by the fact that rust::Slice is a Rust-C++ bridge type where the Rust convention for getting collection size is length(). The code correctly constructs a vector from the slice's iterators and uses length() to get its size.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check for the correct method to get size from rust::Slice in the codebase.

# Search for usage of rust::Slice methods
rg --type-add 'cpp' --files-with-matches 'rust::Slice' | xargs grep -E 'rust::Slice.*\.(length\(\)|size\(\))' -A 2

Length of output: 172


Script:

#!/bin/bash
# Search for rust::Slice usage patterns in cpp files
rg "rust::Slice" -A 3 -g "*.cpp"

# Also search for the specific line context
rg "std::vector<float>\(.*\.begin\(\),.*\.end\(\)\)" -g "*.cpp" -A 2

Length of output: 2993

rust/libs/algorithms/qbg/src/input.h Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.h Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.h Show resolved Hide resolved
rust/libs/algorithms/qbg/src/lib.rs Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/lib.rs Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
rust/libs/algorithms/qbg/src/input.cpp Outdated Show resolved Hide resolved
Copy link
cloudflare-workers-and-pages bot commented Nov 19, 2024

Deploying vald with  Cloudflare Pages  Cloudflare Pages

Latest commit: 6e5aa96
Status: ✅  Deploy successful!
Preview URL: https://0abe4f88.vald.pages.dev
Branch Preview URL: https://feature-agent-rust-agent-qbg.vald.pages.dev

View logs

coderabbitai[bot]
coderabbitai bot previously approved these changes Nov 19, 2024
Copy link
Contributor
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 7

🧹 Outside diff range and nitpick comments (2)
rust/libs/algorithms/qbg/src/input.cpp (1)

31-34: Add const correctness to getter methods.

The getter methods should be marked as const since they don't modify the object's state.

-QBGConstructionParameters *Property::get_qbg_construction_parameters()
+QBGConstructionParameters *Property::get_qbg_construction_parameters() const
 {
     return qbg_construction_parameters;
 }

-QBGBuildParameters *Property::get_qbg_build_parameters()
+QBGBuildParameters *Property::get_qbg_build_parameters() const
 {
     return qbg_build_parameters;
 }

Also applies to: 95-98

rust/libs/algorithms/qbg/src/lib.rs (1)

362-443: Add error case testing

The test only covers the happy path. Add tests for error cases:

  • Invalid dimensions
  • Empty vectors
  • Out of bounds K values
  • Invalid file paths

Add error case tests:

#[test]
fn test_ffi_qbg_errors() -> Result<()> {
    let path = "invalid/path".to_string();
    let mut p = Property::new();
    p.init_qbg_construction_parameters();
    
    // Test invalid path
    assert!(Index::new(&path, &mut p).is_err());
    
    // Test invalid dimensions
    p.set_dimension(0);
    assert!(Index::new(&"index".to_string(), &mut p).is_err());
    
    Ok(())
}
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between d6e423f and 6de8a6a.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (5)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/input.h
🔇 Additional comments (2)
rust/libs/algorithms/qbg/src/input.cpp (2)

230-233: Uncomment and implement the destructor.

The destructor is currently commented out, which could lead to resource leaks. The index should be properly closed when the object is destroyed.


335-348: LGTM! Well-implemented factory functions.

The factory functions make good use of modern C++ practices with std::unique_ptr for automatic memory management.

Comment on lines +60 to +93
void Property::set_extended_dimension(rust::usize extended_dimension)
{
qbg_construction_parameters->extended_dimension = extended_dimension;
}

void Property::set_dimension(rust::usize dimension)
{
qbg_construction_parameters->dimension = dimension;
}

void Property::set_number_of_subvectors(rust::usize number_of_subvectors)
{
qbg_construction_parameters->number_of_subvectors = number_of_subvectors;
}

void Property::set_number_of_blobs(rust::usize number_of_blobs)
{
qbg_construction_parameters->number_of_blobs = number_of_blobs;
}

void Property::set_internal_data_type(rust::i32 internal_data_type)
{
qbg_construction_parameters->internal_data_type = internal_data_type;
}

void Property::set_data_type(rust::i32 data_type)
{
qbg_construction_parameters->data_type = data_type;
}

void Property::set_distance_type(rust::i32 distance_type)
{
qbg_construction_parameters->distance_type = distance_type;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add parameter validation in setter methods.

The setter methods for construction parameters should validate their input values to ensure they are within acceptable ranges. For example, dimension and number_of_subvectors should be positive values.

Example implementation for one setter:

 void Property::set_dimension(rust::usize dimension)
 {
+    if (dimension == 0) {
+        throw std::invalid_argument("dimension must be positive");
+    }
     qbg_construction_parameters->dimension = dimension;
 }

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +299 to +333
void Index::search(
rust::Slice<const rust::f32> v,
rust::usize k,
rust::i32 *ids,
rust::f32 *distances)
{
QBGQuery query;
qbg_initialize_query(&query);
std::vector<float> vec(v.begin(), v.end());
query.query = vec.data();

NGTError err = ngt_create_error_object();
NGTObjectDistances results = ngt_create_empty_results(err);
bool ok = qbg_search_index(index, query, results, err);
if (!ok)
{
std::cerr << "Error: " << __func__ << std::endl;
std::cerr << ngt_get_error_string(err) << std::endl;
qbg_destroy_results(results);
ngt_destroy_error_object(err);
return;
}

size_t rsize = qbg_get_result_size(results, err);
size_t limit = std::min(k, rsize);
for (size_t i = 0; i < limit; i++)
{
NGTObjectDistance obj = qbg_get_result(results, i, err);
ids[i] = obj.id;
distances[i] = obj.distance;
}

qbg_destroy_results(results);
ngt_destroy_error_object(err);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Consider thread safety in search method.

The search method uses shared resources (index) without synchronization. If this method is called from multiple threads, it could lead to race conditions.

Consider adding mutex protection:

+#include <mutex>
+
 class Index {
+private:
+    std::mutex search_mutex;
+
 public:
     void search(
         rust::Slice<const rust::f32> v,
         rust::usize k,
         rust::i32 *ids,
         rust::f32 *distances)
     {
+        std::lock_guard<std::mutex> lock(search_mutex);
         // ... rest of the method
     }
 };

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines +284 to +297
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return 0;
}
ngt_destroy_error_object(err);
return id;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Improve error handling in append method.

The method returns 0 on error, which could be confused with a valid ID. Consider using exceptions or a more explicit error handling mechanism.

-rust::i32 Index::append(rust::Slice<const rust::f32> v)
+rust::i32 Index::append(rust::Slice<const rust::f32> v)
 {
     NGTError err = ngt_create_error_object();
     std::vector<float> vec(v.begin(), v.end());
     unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
     if (id == 0)
     {
-        std::cerr << ngt_get_error_string(err) << std::endl;
+        std::string error_msg = ngt_get_error_string(err);
         ngt_destroy_error_object(err);
-        return 0;
+        throw std::runtime_error("Failed to append object: " + error_msg);
     }
     ngt_destroy_error_object(err);
     return id;
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::cerr << ngt_get_error_string(err) << std::endl;
ngt_destroy_error_object(err);
return 0;
}
ngt_destroy_error_object(err);
return id;
}
rust::i32 Index::append(rust::Slice<const rust::f32> v)
{
NGTError err = ngt_create_error_object();
std::vector<float> vec(v.begin(), v.end());
unsigned int id = qbg_append_object(index, vec.data(), v.length(), err);
if (id == 0)
{
std::string error_msg = ngt_get_error_string(err);
ngt_destroy_error_object(err);
throw std::runtime_error("Failed to append object: " + error_msg);
}
ngt_destroy_error_object(err);
return id;
}

Comment on lines +330 to +332
pub fn open_index(&mut self, path: &String, prebuilt: bool) {
self.inner.pin_mut().open_index(path, prebuilt);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add error handling for index operations

Methods like open_index, build_index, and save_index don't return Results, making it impossible to handle failures gracefully.

Add consistent error handling:

-    pub fn open_index(&mut self, path: &String, prebuilt: bool) {
-        self.inner.pin_mut().open_index(path, prebuilt);
+    pub fn open_index(&mut self, path: &String, prebuilt: bool) -> Result<(), cxx::Exception> {
+        self.inner.pin_mut().open_index(path, prebuilt)?;
+        Ok(())
     }

-    pub fn build_index(&mut self, path: &String, p: &mut property::Property) {
-        self.inner.pin_mut().build_index(path, p.get_property());
+    pub fn build_index(&mut self, path: &String, p: &mut property::Property) -> Result<(), cxx::Exception> {
+        self.inner.pin_mut().build_index(path, p.get_property())?;
+        Ok(())
     }

Also applies to: 334-336, 338-340

Comment on lines +91 to +97
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Add safety documentation for unsafe search function

The unsafe search function uses raw pointers without clear documentation about:

  • Required buffer sizes for ids and distances
  • Alignment requirements
  • Ownership and lifetime guarantees
  • Valid parameter ranges

Add comprehensive safety documentation:

+        /// # Safety
+        /// 
+        /// Caller must ensure:
+        /// - `ids` points to a buffer of size >= k
+        /// - `distances` points to a buffer of size >= k
+        /// - Both buffers are properly aligned
+        /// - Buffers remain valid for the duration of the call
+        /// - k > 0 and k <= number of indexed vectors
         unsafe fn search(
             self: Pin<&mut Index>,
             v: &[f32],
             k: usize,
             ids: *mut i32,
             distances: *mut f32,
         );
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);
/// # Safety
///
/// Caller must ensure:
/// - `ids` points to a buffer of size >= k
/// - `distances` points to a buffer of size >= k
/// - Both buffers are properly aligned
/// - Buffers remain valid for the duration of the call
/// - k > 0 and k <= number of indexed vectors
unsafe fn search(
self: Pin<&mut Index>,
v: &[f32],
k: usize,
ids: *mut i32,
distances: *mut f32,
);

Comment on lines +124 to +147
pub fn set_qbg_construction_parameters(
&mut self,
extended_dimension: usize,
dimension: usize,
number_of_subvectors: usize,
number_of_blobs: usize,
internal_data_type: i32,
data_type: i32,
distance_type: i32,
) {
self.inner
.pin_mut()
.set_extended_dimension(extended_dimension);
self.inner.pin_mut().set_dimension(dimension);
self.inner
.pin_mut()
.set_number_of_subvectors(number_of_subvectors);
self.inner.pin_mut().set_number_of_blobs(number_of_blobs);
self.inner
.pin_mut()
.set_internal_data_type(internal_data_type);
self.inner.pin_mut().set_data_type(data_type);
self.inner.pin_mut().set_distance_type(distance_type);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add input validation for construction parameters

The set_qbg_construction_parameters method accepts parameters without validation. Consider adding checks for:

  • Non-zero dimensions
  • Valid relationships between dimensions and subvectors
  • Valid enum values for data types and distance types

Add validation:

     pub fn set_qbg_construction_parameters(
         &mut self,
         extended_dimension: usize,
         dimension: usize,
         number_of_subvectors: usize,
         number_of_blobs: usize,
         internal_data_type: i32,
         data_type: i32,
         distance_type: i32,
     ) {
+        if dimension == 0 {
+            panic!("dimension must be greater than 0");
+        }
+        if number_of_subvectors == 0 || number_of_subvectors > dimension {
+            panic!("invalid number_of_subvectors");
+        }
+        // Add more validation as needed
         self.inner
             .pin_mut()
             .set_extended_dimension(extended_dimension);

Committable suggestion skipped: line range outside the PR's diff.

Comment on lines 423 to 461
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
println!("ids:\n{:?}", ids);
println!("distances:\n{:?}", distances);
index.pin_mut().close_index();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Enhance search result validation in tests

The test prints search results without verifying their correctness. Add assertions to validate:

  • Result ordering (by distance)
  • Distance values are non-negative and properly ordered
  • IDs are within valid range

Add validation:

         unsafe {
             index.pin_mut().search(
                 vec.as_slice(),
                 K,
                 &mut ids[0] as *mut i32,
                 &mut distances[0] as *mut f32,
             )
         };
-        println!("ids:\n{:?}", ids);
-        println!("distances:\n{:?}", distances);
+        // Validate results
+        assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results");
+        assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered");
+        assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
println!("ids:\n{:?}", ids);
println!("distances:\n{:?}", distances);
index.pin_mut().close_index();
// Search
let mut ids: Vec<i32> = vec![0; K];
let mut distances: Vec<f32> = vec![0.0; K];
let mut vec = vec![0.0; DIMENSION];
for i in 0..DIMENSION {
vec[i] = i as f32
}
unsafe {
index.pin_mut().search(
vec.as_slice(),
K,
&mut ids[0] as *mut i32,
&mut distances[0] as *mut f32,
)
};
// Validate results
assert!(ids.iter().all(|&id| id >= 0 && id < 100), "Invalid IDs in results");
assert!(distances.windows(2).all(|w| w[0] <= w[1]), "Distances not properly ordered");
assert!(distances.iter().all(|&d| d >= 0.0), "Negative distances found");
index.pin_mut().close_index();

Copy link
Contributor
@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between 6de8a6a and 6e5aa96.

⛔ Files ignored due to path filters (1)
  • rust/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (7)
  • rust/Cargo.toml (1 hunks)
  • rust/libs/algorithm/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/Cargo.toml (1 hunks)
  • rust/libs/algorithms/qbg/build.rs (1 hunks)
  • rust/libs/algorithms/qbg/src/input.cpp (1 hunks)
  • rust/libs/algorithms/qbg/src/input.h (1 hunks)
  • rust/libs/algorithms/qbg/src/lib.rs (1 hunks)
🚧 Files skipped from review as they are similar to previous changes (6)
  • rust/Cargo.toml
  • rust/libs/algorithm/Cargo.toml
  • rust/libs/algorithms/qbg/Cargo.toml
  • rust/libs/algorithms/qbg/build.rs
  • rust/libs/algorithms/qbg/src/input.h
  • rust/libs/algorithms/qbg/src/lib.rs
🔇 Additional comments (1)
rust/libs/algorithms/qbg/src/input.cpp (1)

335-348: LGTM! Well-designed factory functions.

The factory functions are well-implemented:

  • Use of std::unique_ptr ensures proper memory management
  • Clear and concise interface for object creation
  • Follow modern C++ practices

Comment on lines +41 to +58
void Property::set_qbg_construction_parameters(
rust::usize extended_dimension,
rust::usize dimension,
rust::usize number_of_subvectors,
rust::usize number_of_blobs,
rust::i32 internal_data_type,
rust::i32 data_type,
rust::i32 distance_type)
{
qbg_initialize_construction_parameters(qbg_construction_parameters);
qbg_construction_parameters->extended_dimension = extended_dimension;
qbg_construction_parameters->dimension = dimension;
qbg_construction_parameters->number_of_subvectors = number_of_subvectors;
qbg_construction_parameters->number_of_blobs = number_of_blobs;
qbg_construction_parameters->internal_data_type = internal_data_type;
qbg_construction_parameters->data_type = data_type;
qbg_construction_parameters->distance_type = distance_type;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add comprehensive parameter validation in bulk parameter setters.

The bulk parameter setters set_qbg_construction_parameters and set_qbg_build_parameters should validate their input parameters before setting them. This helps catch invalid configurations early.

Example implementation for set_qbg_construction_parameters:

 void Property::set_qbg_construction_parameters(
     rust::usize extended_dimension,
     rust::usize dimension,
     rust::usize number_of_subvectors,
     rust::usize number_of_blobs,
     rust::i32 internal_data_type,
     rust::i32 data_type,
     rust::i32 distance_type)
 {
+    if (dimension == 0) {
+        throw std::invalid_argument("dimension must be positive");
+    }
+    if (number_of_subvectors == 0) {
+        throw std::invalid_argument("number_of_subvectors must be positive");
+    }
+    if (number_of_blobs == 0) {
+        throw std::invalid_argument("number_of_blobs must be positive");
+    }
+    if (extended_dimension < dimension) {
+        throw std::invalid_argument("extended_dimension must be >= dimension");
+    }
     qbg_initialize_construction_parameters(qbg_construction_parameters);
     // ... rest of the method
 }

Also applies to: 105-136

Comment on lines +230 to +233
Index::~Index()
{
// close_index();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Uncomment and implement the destructor.

The destructor is currently commented out, which will lead to resource leaks. The index needs to be properly closed when the object is destroyed.

Apply this fix:

 Index::~Index()
 {
-    // close_index();
+    close_index();
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
Index::~Index()
{
// close_index();
}
Index::~Index()
{
close_index();
}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants