feat(metastore): shard sections queries over index files #20134
base: main
feat(metastore): distributed metastore queries execution

- Build a distributed metastore plan from GetIndexes and execute it via the v2 scheduler/worker pipeline
- Split the request into per-index PointersScan tasks, then fan-in and CollectSections to produce final section descriptors
- Add physical plan + protobuf support for Merge/PointersScan, improve tracing/cleanup, and add coverage for planner/workflow/proto roundtrips
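The split / fan-in / collect flow described in the commit message can be pictured with a minimal Go sketch. This is not the PR's code: `SectionDescriptor`, `scanIndex`, and `collectSections` are hypothetical names, the per-index task returns placeholder data, and goroutines stand in for the scheduler/worker pipeline.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// SectionDescriptor is a hypothetical stand-in for the section descriptors
// the PR produces; the real type is assumed to carry more fields.
type SectionDescriptor struct {
	IndexPath string
	SectionID int
}

// scanIndex is a hypothetical per-index task, playing the role of a
// PointersScan over one index file. It returns two placeholder sections.
func scanIndex(indexPath string) []SectionDescriptor {
	return []SectionDescriptor{
		{IndexPath: indexPath, SectionID: 0},
		{IndexPath: indexPath, SectionID: 1},
	}
}

// collectSections fans out one task per index, fans the results back in,
// and returns them in a deterministic order.
func collectSections(indexPaths []string) []SectionDescriptor {
	// Buffered so every task can send without waiting for a reader.
	results := make(chan []SectionDescriptor, len(indexPaths))
	var wg sync.WaitGroup
	for _, p := range indexPaths {
		wg.Add(1)
		go func(path string) {
			defer wg.Done()
			results <- scanIndex(path)
		}(p)
	}
	wg.Wait()
	close(results)

	var out []SectionDescriptor
	for batch := range results {
		out = append(out, batch...)
	}
	// Task completion order is not deterministic, so sort the merged output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].IndexPath != out[j].IndexPath {
			return out[i].IndexPath < out[j].IndexPath
		}
		return out[i].SectionID < out[j].SectionID
	})
	return out
}

func main() {
	for _, s := range collectSections([]string{"index-b", "index-a"}) {
		fmt.Printf("%s section %d\n", s.IndexPath, s.SectionID)
	}
}
```

Sorting after the fan-in is a choice made for this sketch only, so the output does not depend on which task finishes first.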
Discussed offline: This LGTM. I'm happy with how it's implemented, and I'm willing to merge it to unblock other work and follow up on any minor issues later.
I left a couple of questions, but neither is blocking.
	return translateEOFPipeline{pipeline}
}

type translateEOFPipeline struct {
Does this need to be a pipeline? Could it be a standard function wrapping the error?
	Start: start,
	End: end,

	MaxTimeRange: TimeRange{
Why do we have two start/end ranges here?
What this PR does / why we need it:
Running metastore queries in a distributed manner using query engine workers:
feat(metastore): distributed metastore queries execution
Special notes for your reviewer:
This PR is based on top of