8000 Replace glob crate with globset for config file pattern matching by GabrielBianconi · Pull Request #3866 · tensorzero/tensorzero · GitHub
[go: up one dir, main page]

Skip to content

Conversation

GabrielBianconi
Copy link
Member

This change replaces the glob crate with globset + walkdir for parsing glob patterns in ConfigFileGlob::new, following the same approach as ripgrep and the original glob crate's design.

Benefits:

  • Better pattern matching with well-maintained globset (used by ripgrep)
  • Efficient directory traversal with walkdir
  • Clear, well-documented implementation with helper functions
  • Support for future OR operations via GlobSetBuilder

Implementation details:

  • Extracts base path at component level (like glob crate does)
  • Uses globset::Glob for pattern matching
  • Uses walkdir for recursive directory traversal
  • Maintains backward compatibility

Added comprehensive unit tests covering:

  • Base path extraction for various glob patterns
  • Glob metacharacter detection
  • Integration tests with real file system operations

Fixes #3350

This change replaces the `glob` crate with `globset` + `walkdir` for
parsing glob patterns in `ConfigFileGlob::new`, following the same
approach as ripgrep and the original glob crate's design.

Benefits:
- Better pattern matching with well-maintained globset (used by ripgrep)
- Efficient directory traversal with walkdir
- Clear, well-documented implementation with helper functions
- Support for future OR operations via GlobSetBuilder

Implementation details:
- Extracts base path at component level (like glob crate does)
- Uses globset::Glob for pattern matching
- Uses walkdir for recursive directory traversal
- Maintains backward compatibility

Added comprehensive unit tests covering:
- Base path extraction for various glob patterns
- Glob metacharacter detection
- Integration tests with real file system operations

Fixes #3350

Co-authored-by: Sculptor <sculptor@imbue.com>
@Copilot Copilot AI review requested due to automatic review settings October 9, 2025 21:16
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR replaces the glob crate with globset + walkdir for configuration file pattern matching, providing better pattern matching capabilities and more efficient directory traversal.

Key changes:

  • Replaces single glob dependency with globset and walkdir crates
  • Implements custom base path extraction and file matching logic
  • Adds comprehensive unit and integration tests for glob functionality

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
tensorzero-core/Cargo.toml Updates dependencies by removing glob and adding globset and walkdir
tensorzero-core/src/config/mod.rs Refactors ConfigFileGlob::new() with helper functions for pattern matching and directory traversal
tensorzero-core/src/config/tests.rs Adds comprehensive test coverage for glob pattern matching functionality

Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@Copilot Copilot AI review requested due to automatic review settings October 9, 2025 21:29
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Copy link
Contributor
github-actions bot commented Oct 9, 2025

TensorZero CI Bot Automated Comment

Thanks for the PR! CI is failing due to Rust type inference errors in tests triggered by the switch from glob to globset (which pulled in bstr). Specifically, in tensorzero-core/tests/optimization/common/dicl.rs, calls like .as_ref() on a String are now ambiguous because String implements AsRef for multiple targets (str, [u8], OsStr, Path). This causes assert_eq! to require type annotations (E0283).

Examples from the logs:

  • assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
  • assert_eq!(dicl_config.model.as_ref(), model);

Fix: replace .as_ref() with .as_str() on the String fields so the comparison type is unambiguous (&str). This resolves the ambiguity introduced by bstr and the additional PartialEq impls.

Once this change is applied, the lint, node bindings build, and rust test jobs should pass.

Warning

I encountered an error while trying to create a follow-up PR: Failed to create follow-up PR using remote https://x-access-token:***@github.com/tensorzero/tensorzero.git: git apply --whitespace=nowarn /tmp/tensorzero-pr-1XLzYp/repo/tensorzero.patch failed: error: No valid patches in input (allow with "--allow-empty")
.

The patch I tried to generate is as follows:

*** Begin Patch
*** Update File: tensorzero-core/tests/optimization/common/dicl.rs
@@
-    assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
+    assert_eq!(dicl_config.embedding_model.as_str(), embedding_model);
@@
-    assert_eq!(dicl_config.model.as_ref(), model);
+    assert_eq!(dicl_config.model.as_str(), model);
@@
-    assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
+    assert_eq!(dicl_config.embedding_model.as_str(), embedding_model);
@@
-    assert_eq!(dicl_config.model.as_ref(), model);
+    assert_eq!(dicl_config.model.as_str(), model);
*** End Patch

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Copilot Copilot AI review requested due to automatic review settings October 9, 2025 21:57
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@Copilot Copilot AI review requested due to automatic review settings October 9, 2025 22:17
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.


// Validate that the returned config matches our input
assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
assert_eq!(&dicl_config.embedding_model, &embedding_model);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

? is this related

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

globset added some trait that was making clippy complain no idea why

Copy link
Contributor
github-actions bot commented Oct 9, 2025

TensorZero CI Bot Automated Comment

Rust tests failed in validate due to a single failing unit test in tensorzero-core: config::tests::test_extract_base_path_from_glob_no_pattern.

Root cause:

  • The new extract_base_path_from_glob helper returned the full file path when the glob contained no metacharacters. The intended behavior (and test expectation) is to return the parent directory of the provided path when no glob metacharacters are present. This caused:
    left: "/tmp/config/file.toml"
    right: "/tmp/config"

Fix:

  • Adjust extract_base_path_from_glob so that if the pattern contains no glob metacharacters at all, it returns the parent directory (or "." if there is no parent). All other test cases in the new glob logic already pass.

Once applied, this should unblock the Rust test job, allowing the rest of the workflow steps to run.

Warning

I encountered an error while trying to create a follow-up PR: Failed to create follow-up PR using remote https://x-access-token:***@github.com/tensorzero/tensorzero.git: git apply --whitespace=nowarn /tmp/tensorzero-pr-tKOD4d/repo/tensorzero.patch failed: error: No valid patches in input (allow with "--allow-empty")
.

The patch I tried to generate is as follows:

*** Begin Patch
*** Update File: tensorzero-core/src/config/mod.rs
@@
 /// - &#x60;/tmp/config/**/*.toml&#x60; → &#x60;/tmp/config&#x60;
 /// - &#x60;config/**/*.toml&#x60; → &#x60;config&#x60;
 /// - &#x60;*.toml&#x60; → &#x60;.&#x60;
 fn extract_base_path_from_glob(glob: &amp;str) -&gt; PathBuf {
     let path &#x3D; Path::new(glob);
     let mut base_components &#x3D; Vec::new();
+    let mut encountered_meta &#x3D; false;
 
     for component in path.components() {
         let component_str &#x3D; component.as_os_str().to_string_lossy();
         // Stop at the first component containing glob metacharacters
         if component_str.contains([&#x27;*&#x27;, &#x27;?&#x27;, &#x27;[&#x27;, &#x27;]&#x27;, &#x27;{&#x27;, &#x27;}&#x27;]) {
+            encountered_meta &#x3D; true;
             break;
         }
         base_components.push(component);
     }
 
-    if base_components.is_empty() {
+    if base_components.is_empty() {
         PathBuf::from(&quot;.&quot;)
     } else {
-        base_components.iter().collect()
+        // If we never encountered any glob metacharacters in the entire pattern,
+        // return the parent directory of the path (or &quot;.&quot; if none).
+        if !encountered_meta {
+            return path
+                .parent()
+                .map(|p| {
+                    if p.as_os_str().is_empty() {
+                        PathBuf::from(&quot;.&quot;)
+                    } else {
+                        p.to_path_buf()
+                    }
+                })
+                .unwrap_or_else(|| PathBuf::from(&quot;.&quot;));
+        }
+
+        base_components.iter().collect()
     }
 }
 
 /// Walk a directory tree and find all files matching the glob pattern.
 fn find_matching_files(base_path: &amp;Path, matcher: &amp;globset::GlobMatcher) -&gt; Vec&lt;PathBuf&gt; {
*** End Patch

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@Copilot Copilot AI review requested due to automatic review settings October 9, 2025 22:49
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 5 out of 6 changed files in this pull request and generated no new comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

@GabrielBianconi GabrielBianconi marked this pull request as draft October 9, 2025 23:04
@Copilot Copilot AI review requested due to automatic review settings October 10, 2025 13:36
Copy link
Contributor
@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.


Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.

Comment on lines +616 to +618
/// Check which files match the glob pattern.
/// If the base path is a file, check it directly against the matcher.
/// If the base path is a directory, walk it.
Copy link
Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing function name and parameters in documentation. Should use proper rustdoc format with /// # Arguments and /// # Returns sections.

Copilot uses AI. Check for mistakes.

for component in path.components() {
let component_str = component.as_os_str().to_string_lossy();
// Stop at the first component containing glob metacharacters
if component_str.contains(['*', '?', '[', ']', '{', '}']) {
Copy link
Copilot AI Oct 10, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The glob metacharacters are hardcoded. Consider extracting them into a constant for better maintainability and to avoid duplication if used elsewhere.

Copilot uses AI. Check for mistakes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Consider switching to globset for glob parsing.

2 participants

0