-
Notifications
You must be signed in to change notification settings - Fork 711
Replace glob crate with globset for config file pattern matching #3866
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
This change replaces the `glob` crate with `globset` + `walkdir` for parsing glob patterns in `ConfigFileGlob::new`, following the same approach as ripgrep and the original glob crate's design. Benefits: - Better pattern matching with well-maintained globset (used by ripgrep) - Efficient directory traversal with walkdir - Clear, well-documented implementation with helper functions - Support for future OR operations via GlobSetBuilder Implementation details: - Extracts base path at component level (like glob crate does) - Uses globset::Glob for pattern matching - Uses walkdir for recursive directory traversal - Maintains backward compatibility Added comprehensive unit tests covering: - Base path extraction for various glob patterns - Glob metacharacter detection - Integration tests with real file system operations Fixes #3350 Co-authored-by: Sculptor <sculptor@imbue.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR replaces the glob
crate with globset
+ walkdir
for configuration file pattern matching, providing better pattern matching capabilities and more efficient directory traversal.
Key changes:
- Replaces single
glob
dependency withglobset
andwalkdir
crates - Implements custom base path extraction and file matching logic
- Adds comprehensive unit and integration tests for glob functionality
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
File | Description |
---|---|
tensorzero-core/Cargo.toml | Updates dependencies by removing glob and adding globset and walkdir |
tensorzero-core/src/config/mod.rs | Refactors ConfigFileGlob::new() with helper functions for pattern matching and directory traversal |
tensorzero-core/src/config/tests.rs | Adds comprehensive test coverage for glob pattern matching functionality |
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
TensorZero CI Bot Automated CommentThanks for the PR! CI is failing due to Rust type inference errors in tests triggered by the switch from glob to globset (which pulled in bstr). Specifically, in tensorzero-core/tests/optimization/common/dicl.rs, calls like .as_ref() on a String are now ambiguous because String implements AsRef for multiple targets (str, [u8], OsStr, Path). This causes assert_eq! to require type annotations (E0283). Examples from the logs:
Fix: replace .as_ref() with .as_str() on the String fields so the comparison type is unambiguous (&str). This resolves the ambiguity introduced by bstr and the additional PartialEq impls. Once this change is applied, the lint, node bindings build, and rust test jobs should pass. Warning I encountered an error while trying to create a follow-up PR: Failed to create follow-up PR using remote https://x-access-token:***@github.com/tensorzero/tensorzero.git: git apply --whitespace=nowarn /tmp/tensorzero-pr-1XLzYp/repo/tensorzero.patch failed: error: No valid patches in input (allow with "--allow-empty") The patch I tried to generate is as follows: *** Begin Patch
*** Update File: tensorzero-core/tests/optimization/common/dicl.rs
@@
- assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
+ assert_eq!(dicl_config.embedding_model.as_str(), embedding_model);
@@
- assert_eq!(dicl_config.model.as_ref(), model);
+ assert_eq!(dicl_config.model.as_str(), model);
@@
- assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model);
+ assert_eq!(dicl_config.embedding_model.as_str(), embedding_model);
@@
- assert_eq!(dicl_config.model.as_ref(), model);
+ assert_eq!(dicl_config.model.as_str(), model);
*** End Patch |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 3 out of 4 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
|
||
// Validate that the returned config matches our input | ||
assert_eq!(dicl_config.embedding_model.as_ref(), embedding_model); | ||
assert_eq!(&dicl_config.embedding_model, &embedding_model); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
? is this related
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
globset
added some trait that was making clippy complain no idea why
TensorZero CI Bot Automated CommentRust tests failed in validate due to a single failing unit test in tensorzero-core: config::tests::test_extract_base_path_from_glob_no_pattern. Root cause:
Fix:
Once applied, this should unblock the Rust test job, allowing the rest of the workflow steps to run. Warning I encountered an error while trying to create a follow-up PR: Failed to create follow-up PR using remote https://x-access-token:***@github.com/tensorzero/tensorzero.git: git apply --whitespace=nowarn /tmp/tensorzero-pr-tKOD4d/repo/tensorzero.patch failed: error: No valid patches in input (allow with "--allow-empty") The patch I tried to generate is as follows: *** Begin Patch
*** Update File: tensorzero-core/src/config/mod.rs
@@
/// - `/tmp/config/**/*.toml` → `/tmp/config`
/// - `config/**/*.toml` → `config`
/// - `*.toml` → `.`
fn extract_base_path_from_glob(glob: &str) -> PathBuf {
let path = Path::new(glob);
let mut base_components = Vec::new();
+ let mut encountered_meta = false;
for component in path.components() {
let component_str = component.as_os_str().to_string_lossy();
// Stop at the first component containing glob metacharacters
if component_str.contains(['*', '?', '[', ']', '{', '}']) {
+ encountered_meta = true;
break;
}
base_components.push(component);
}
- if base_components.is_empty() {
+ if base_components.is_empty() {
PathBuf::from(".")
} else {
- base_components.iter().collect()
+ // If we never encountered any glob metacharacters in the entire pattern,
+ // return the parent directory of the path (or "." if none).
+ if !encountered_meta {
+ return path
+ .parent()
+ .map(|p| {
+ if p.as_os_str().is_empty() {
+ PathBuf::from(".")
+ } else {
+ p.to_path_buf()
+ }
+ })
+ .unwrap_or_else(|| PathBuf::from("."));
+ }
+
+ base_components.iter().collect()
}
}
/// Walk a directory tree and find all files matching the glob pattern.
fn find_matching_files(base_path: &Path, matcher: &globset::GlobMatcher) -> Vec<PathBuf> {
*** End Patch |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 5 out of 6 changed files in this pull request and generated no new comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
…com/tensorzero/tensorzero into sculptor/implement-tensor-zero-core
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Copilot reviewed 4 out of 5 changed files in this pull request and generated 2 comments.
Tip: Customize your code reviews with copilot-instructions.md. Create the file or learn how to get started.
/// Check which files match the glob pattern. | ||
/// If the base path is a file, check it directly against the matcher. | ||
/// If the base path is a directory, walk it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missing function name and parameters in documentation. Should use proper rustdoc format with /// # Arguments
and /// # Returns
sections.
Copilot uses AI. Check for mistakes.
for component in path.components() { | ||
let component_str = component.as_os_str().to_string_lossy(); | ||
// Stop at the first component containing glob metacharacters | ||
if component_str.contains(['*', '?', '[', ']', '{', '}']) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The glob metacharacters are hardcoded. Consider extracting them into a constant for better maintainability and to avoid duplication if used elsewhere.
Copilot uses AI. Check for mistakes.
This change replaces the
glob
crate withglobset
+walkdir
for parsing glob patterns inConfigFileGlob::new
, following the same approach as ripgrep and the original glob crate's design.Benefits:
Implementation details:
Added comprehensive unit tests covering:
Fixes #3350