Signed-off-by: Nicholas Gates <nick@nickgates.com>
Merging this PR will not alter performance
Benchmarks:
- TPC-H SF=1 on NVMe
- FineWeb on NVMe
- TPC-H SF=1 on S3
- TPC-DS SF=1 on NVMe
- TPC-H SF=10 on NVMe
- FineWeb on S3
- Statistical and Population Genetics
- TPC-H SF=10 on S3
- Clickbench on NVMe
// This is used by duckdb whenever there is no projection id in a logical_get node.
// For some reason we cannot return an empty DataChunk and duckdb will look for the virtual column
// with this index and create a data chunk with a single vector of that type.
static EMPTY_COLUMN_IDX: u64 = 18446744073709551614;
fun fact: if you cast this to i64, it's -2
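To check the cast by hand: 18446744073709551614 is `u64::MAX - 1` (that is, 2^64 - 2), so reinterpreting its bits as `i64` gives -2, whereas `u64::MAX` itself is the value that becomes -1. A minimal sketch of that arithmetic follows; the constant is copied from the snippet above, and the helper name `empty_column_idx_as_i64` is made up for illustration.

```rust
/// Sentinel copied verbatim from the diff above.
const EMPTY_COLUMN_IDX: u64 = 18446744073709551614;

/// Reinterpret the sentinel's bits as a signed 64-bit integer.
/// (Hypothetical helper, not part of the PR.)
fn empty_column_idx_as_i64() -> i64 {
    // `as` between same-width integers keeps the bit pattern,
    // so 2^64 - 2 becomes -2 in two's complement.
    EMPTY_COLUMN_IDX as i64
}

/// For comparison: the all-ones bit pattern is the one that maps to -1.
fn u64_max_as_i64() -> i64 {
    u64::MAX as i64
}
```

Calling `empty_column_idx_as_i64()` returns -2, and `u64_max_as_i64()` returns -1.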
@gatesn why did we remove the footer cache?
We want a multi-file version of a Vortex scan that is implemented entirely in Vortex and doesn't rely on query engine features. This lets us provide good support to, e.g., PyVortex, where users might just write a scan directly. So the MultiFileDataSource now handles this internally.
This PR migrates the DuckDB extension implementation over to using only the Scan API.
We create a generic DuckDB table function that operates over any DataSource, and then a specific implementation that takes a path glob and runs a multi-file scan, matching the existing behavior.
In the future we can easily add alternate table functions, such as one created from a DataSourceProvider URI, or one that is passed a C pointer to a DataSource implementation.
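The shape of that design (one generic table function, many pluggable sources) could be sketched roughly as below. This is only an illustration: the `DataSource` trait here is a simplified stand-in and not the real Vortex trait, `TableFunction` and `MultiFileSource` are hypothetical names, rows are plain `Vec<i64>`, and glob expansion is assumed to have already happened.

```rust
/// Simplified stand-in for a scan source (NOT the real Vortex trait).
trait DataSource {
    /// Column names exposed by the source.
    fn column_names(&self) -> Vec<String>;
    /// Stream all rows from the source.
    fn scan(&self) -> Box<dyn Iterator<Item = Vec<i64>> + '_>;
}

/// Hypothetical generic table function: it only knows about `DataSource`,
/// so any implementation can be plugged in without changing this code.
struct TableFunction<S: DataSource> {
    source: S,
}

impl<S: DataSource> TableFunction<S> {
    fn new(source: S) -> Self {
        TableFunction { source }
    }

    fn column_names(&self) -> Vec<String> {
        self.source.column_names()
    }

    /// Drain the source into memory; a real engine would pull chunks lazily.
    fn collect_rows(&self) -> Vec<Vec<i64>> {
        self.source.scan().collect()
    }
}

/// Hypothetical multi-file source: each inner Vec models one already-opened
/// file (the path glob is assumed to be resolved before construction).
struct MultiFileSource {
    files: Vec<Vec<Vec<i64>>>,
}

impl DataSource for MultiFileSource {
    fn column_names(&self) -> Vec<String> {
        vec!["x".to_string()]
    }

    fn scan(&self) -> Box<dyn Iterator<Item = Vec<i64>> + '_> {
        // Concatenate the rows of every file in order.
        Box::new(self.files.iter().flat_map(|file| file.iter().cloned()))
    }
}
```

Under this sketch, adding another kind of table function would mean writing another `DataSource` implementation and reusing `TableFunction` unchanged.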