8000 Add a UUID extension type by connortsui20 · Pull Request #6832 · vortex-data/vortex · GitHub
[go: up one dir, main page]

Skip to content

Add a UUID extension type#6832

Merged
gatesn merged 4 commits intodevelopfrom
ct/uuid
Mar 9, 2026
Merged

Add a UUID extension type#6832
gatesn merged 4 commits intodevelopfrom
ct/uuid

Conversation

@connortsui20
Copy link
Contributor
@connortsui20 connortsui20 commented Mar 6, 2026

Summary

Tracking Issue: #6854

Adds a UUID extension type to Vortex over a FixedSizeList<u8, 16>. There isn't much functionality implemented on top of this, and there is no exporter to Arrow's UUID canonical extension type. All of the logic is delegated to the Rust uuid crate.

There is still a question of if we want to bring in FixedSizeBinary instead of doing this.

Testing

Some small basic tests. They serve mostly as example code on how to work with Uuid extension arrays in general.

Open Questions

  • Do we actually need FixedSizeBinary instead here? Why does this not suffice?
  • Is there actually an alignment issue here?
  • The performance of unpack_native is terrible because we still do not have ScalarValue::Array (see ScalarValue::Array #6717)
  • Should we name this something other than vortex.uuid since we know we might change the storage type in the future?
  • UuidFromString should really be implemented under the Cast expression, but we do not have a system yet for pluggable expressions so it is not entirely clear how we would do that.
  • What other expressions should we have on Uuid? UuidToString is an easy next one, what else is there?

@connortsui20 connortsui20 requested review from asubiotto and gatesn March 6, 2026 19:52
@connortsui20 connortsui20 added the changelog/feature A new feature label Mar 6, 2026
@connortsui20 connortsui20 marked this pull request as ready for review March 6, 2026 19:58
elements.len()
);

let mut bytes = [0u8; UUID_BYTE_LEN];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤮 - can't wait for #6717

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

😭

/// Accepts any standard UUID string format (hyphenated, simple, braced, URN). Invalid strings
/// cause an error.
#[derive(Clone)]
pub struct UuidFromString;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I really do think we should just go directly to pluggable casting. CC @joseph-isaacs

@codspeed-hq
Copy link
codspeed-hq bot commented Mar 6, 2026

Merging this PR will improve performance by 26.44%

⚡ 4 improved benchmarks
✅ 996 untouched benchmarks
⏩ 1466 skipped benchmarks1

Performance Changes

Mode Benchmark BASE HEAD Efficiency
Simulation patched_take_200k_dispersed 5.6 ms 4.7 ms +19.86%
Simulation patched_take_200k_first_chunk_only 5.4 ms 4.8 ms +11.99%
Simulation take_200k_dispersed 4.5 ms 3.6 ms +24.36%
Simulation take_200k_first_chunk_only 4.2 ms 3.3 ms +26.44%

Comparing ct/uuid (b706a10) with develop (000e896)

Open in CodSpeed

Footnotes

  1. 1466 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

@connortsui20 connortsui20 marked this pull request as draft March 6, 2026 20:09
@connortsui20
Copy link
Contributor Author

Ok so I'm going to remove the UuidFromString scalar fn and add a Option<Version> using https://docs.rs/uuid/latest/uuid/enum.Version.html as metadata

@connortsui20 connortsui20 marked this pull request as ready for review March 6, 2026 21:47
@connortsui20 connortsui20 force-pushed the ct/uuid branch 3 times, most recently from 9242442 to 32ccf9f Compare March 6, 2026 22:23
@asubiotto
Copy link
Contributor

there is no exporter to Arrow's UUID canonical extension type.

Just to confirm, is the reason just to keep this PR contained?

Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
Signed-off-by: Connor Tsui <connor.tsui20@gmail.com>
@gatesn gatesn merged commit 2e58c8e into develop Mar 9, 2026
52 checks passed
@gatesn gatesn deleted the ct/uuid branch March 9, 2026 18:59
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

changelog/feature A new feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants

0