8000 Cube support by kirkbrauer · Pull Request #3867 · npgsql/npgsql · GitHub
[go: up one dir, main page]

Skip to content

Cube support#3867

Merged
NinoFloris merged 5 commits intonpgsql:mainfrom
kirkbrauer:main
Nov 3, 2025
Merged

Cube support#3867
NinoFloris merged 5 commits intonpgsql:mainfrom
kirkbrauer:main

Conversation

@kirkbrauer
Copy link
Contributor

Here's my implementation of the cube type from #698.

The binary send/recv functions are not documented anywhere at the moment, so I had to read the PostgreSQL source files to determine the binary type of the cube type.

https://github.com/postgres/postgres/blob/c30f54ad732ca5c8762bb68bbe0f51de9137dd72/contrib/cube/cube.c

From what I found, there appears to be a 32-bit header that indicates the number of dimensions (i31) in the cube and whether the cube is a point (the first bit) meaning it has the same lower-left and upper-right coordinates. After that there is a linear list of doubles containing all the coordinates in the cube, if the cube is a point they don't bother storing the upper-right coordinates since they are the same as the lower-left ones.

For example, here is a hex dump of the cube (1):

00000000  80 00 00 01 3F F0 00 00 00 00 00 00 -> (1)

And here is a hex dump for a cube with lower-left and upper-right coordinates:

00000000  00 00 00 02 40 14 00 00 00 00 00 00 40 08 00 00
00000010  00 00 00 00 40 00 00 00 00 00 00 00 40 18 00 00
00000020  00 00 00 00 -> (5, 3),(2, 6)

We also need to determine which parts of the API need to be implemented on the client side so users can do basic cube manipulation and comparison directly in their program (or if we even need this).

@kirkbrauer kirkbrauer requested review from roji and vonzshik as code owners July 1, 2021 04:29
Copy link
Member
@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks pretty good! See some comments/questions below.

The NpgsqlCube API surface is indeed the thing we need to tie down well. I'm not convinced we need to provide too many client-side operation implementations - the type seems simple enough (basically two coordinate lists) that users can do whatever they want. But am definitely open if you think we should provide some operations.

PS I'm on semi-vacation for the coming week, so expect some delays with responses.

@kirkbrauer
Copy link
Contributor Author

@roji What do you think about adding the basic client-side API for manipulating and querying the cube?

I think that the cube type is specific enough that it needs the basic client-side operations so that users can do local math on the cube and manipulate its values. We already have to provide a few of the query methods (cube_is_point, cube_dim, cube_ll_coord, cube_ur_coord) for our implementation to display itself and serialize/deserialize.

My reasoning for this is that manually manipulating the cube's lower-left or upper-right coordinate lists is inherently unsafe and makes it very easy to corrupt the cube. If the number of coordinates in the two lists gets out of sync, we cannot properly serialize/deserialize the cube anymore because such a scenario is not allowed in the cube spec.

Making the LowerLeft and UpperRight IEnumerable<double>s would prevent the user from unsafely changing the dimensions of the cube and require them to use the manipulation API therefore preventing corruption of the cube data. We could also provide utility methods like AddDimension(double c1, double? c2 = null) to help with cube manipulation if we want the cube to be a mutable data type.

@roji
Copy link
Member
roji commented Jul 3, 2021

Making the LowerLeft and UpperRight IEnumerables would prevent the user from unsafely changing the dimensions of the cube and require them to use the manipulation API therefore preventing corruption of the cube data

For preventing direct manipulation of the lists, IReadOnlyList can be used instead of IEnumerable.

More generally, I'd avoid going in the direction of doing client-side implementations - we generally don't want to fully reproduce PostgreSQL types, with all their complexity. That can easily lead us to a place where we duplicate the entire PostgreSQL function set in .NET (because why implement one operation client-side but not another?), and have to maintain it, fix bugs, etc.

We can always start with a minimal implementation, where the data is made fully available, and see what users request; if there's a a lot of need for certain client-side operations, we can always add those later. Otherwise, if you strongly feel we should be providing client-side implementations, do you have certain operations in mind (and reasoning for them)? Or just all of them?

@kirkbrauer
Copy link
Contributor Author
kirkbrauer commented Jul 3, 2021

That makes sense, I forgot about the specific advantages of the 'IReadOnlyList' interface.

I also agree that we should limit the client-side API to the bare minimum operations to start out and then expand it if necessary. I think the best way that users can manipulate the cube is just by using the constructor and the provided lists to re-build a new cube if they desire.

We should probably also leave all the mathematical operations to the database so we don't have to code or debug those.

The only exception to this could be the AddDimension(double a, double b) API (this does not exist in the database impl), which might make it a bit easier to add a dimension to the cube.

@roji
Copy link
Member
roji commented Jul 3, 2021

Great, sounds like we're aligned. There's still an open question in my mind as to whether the type should be mutable or immutable (i.e. if we should expose List or IReadOnlyList). Immutable would allow better enforcing of correctness, but if the lists are long there's a perf hit for having to generate copies etc.

@kirkbrauer
Copy link
Contributor Author

@roji The maximum size of each list is 100 elements, so AddDimension might be warranted, of course we have to determine if this method returns a new instance of the cube or if it just 8000 mutates the existing lists internally.

@roji
Copy link
Member
roji commented Jul 3, 2021

Yeah. Since the data size here isn't small and fixed, it may make sense to have a fully mutable API and leave it to the user's responsibility to not mess up (e.g. by having different list lengths).

@kirkbrauer
Copy link
Contributor Author

That makes sense, if the list sizes get out of sync, we can just pad the opposite array with zeros when serializing or displaying the string rep.

@roji
Copy link
Member
roji commented Jul 12, 2021

@kirkbrauer just a note to say that this is waiting on you (but absolutely no rush of course!). You can push more commits, and when you'd like another review, ask for one in the reviews panel above.

@kirkbrauer
Copy link
Contributor Author

@roji No worries, I got a bit busy last week, but I'll push a new updated version with the changes we discussed above when I get a chance.

@kirkbrauer
Copy link
Contributor Author

@roji Quick question regarding the public shipped API vs public unshipped API. Which file would be the best to place the declarations for the NpgsqlCube so we can run the tests? I am not super familiar with these files, but I know that they are commonly used on MS open-source projects to denote the public API. I also know that there isn't great support for Rider, but I'll try to autogenerate them in Visual Studio.

@kirkbrauer kirkbrauer requested a review from roji July 12, 2021 19:20
@roji
Copy link
Member
roji commented Jul 13, 2021

Re the public API files, all changes should be added to the Unshipped file (as you've done) - this is basically a snapshot of all changes done in the current, unreleased version. When we release 6.0, we'll merge those changes into the Shipped file, and the Unshipped file will be empty again.

Unfortunately you're right that the analyzer has issues under Rider - I usually add into Unshipped manually, but VS can indeed be used as well.

Copy link
Member
@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkbrauer thanks for the work on this - here are some more comments.

@kirkbrauer kirkbrauer requested a rev BEA4 iew from roji July 19, 2021 13:23
Copy link
Member
@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kirkbrauer sorry for the long wait (was away on vacation). This looks almost ready - see remaining comments below, I'm now available to finalize this and merge.

@roji
Copy link
Member
roji commented Aug 2, 2021

@kirkbrauer can you please squash and rebase this on the latest main branch?

@kirkbrauer
Copy link
Contributor Author

@roji Doing it right now.

@kirkbrauer
Copy link
Contributor Author

@roji All squashed and rebased. We should be ready for merge, though we might need to do a little work on the tests first.

@roji
Copy link
Member
roji commented Aug 3, 2021

@kirkbrauer agreed - everything LGTM, but test coverage currently seems to be mainly for the type handler (reading/writing from PG). Some coverage for the APIs on the type would be good.

@kirkbrauer kirkbrauer requested a review from vonzshik August 3, 2021 14:10
Copy link
Contributor
@vonzshik vonzshik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as @roji suggested, we should have some tests for the cube API. After that. it should be good enough to merge.

@kirkbrauer
Copy link
Contributor Author
kirkbrauer commented Aug 3, 2021

@roji I'm working on adding additional tests for the cube type, however I have run into a very obvious problem related to our reference optimizations.

If the cube is a point, we set both the lowerLeft and upperRight lists to the same List instance. This is super efficient, but if a user wants to change one list, they will end up manipulating both lists and the source list since all those values reference the same original list instance.

This is probably fine if we treat the cube as immutable, however, if the user is often creating or manipulating their own cubes, it would produce the (possibly) unexpected behavior especially for users who might not be as familiar with pass-by-reference languages.

In other types such as the NpgsqlPath, we are instead re-instantiating the List objects with the same values within the constructor at the penalty of performance, I think this might be a better approach to maintain parity with the rest of the library.

@roji
Copy link
Member
roji commented Aug 4, 2021

You raise a good point... Here are some options I'm seeing:

  1. Duplicate the user-provided list when constructing a point. As you pointed out, this wouldn't be good for perf (note that NpgsqlPath and others were written a long time ago, and shouldn't constrain this type).
  2. Make the lists immutable - also bad for perf.
  3. Have a separate type for points, which only exposes a single list. This seems a bit heavy/over-complicated to me, and I also worry a bit about confusion between this new point type and PostGIS points.
  4. Keep the 2nd list internally null for points, and duplicate it only when accessed through the public property (a sort of "copy-on-write"). The handler would use a separate, internal access that bypasses duplication. This is still non-ideal perf-wise, as duplication happens when reading the property (and not when updating the list), so certain user access patterns would still cause needless duplication.
  5. Simply say that UpperRight is null for points. This would make that property nullable (possibly annoying), but would be simple and consistent.
  6. Do nothing, and explicitly document that when using the point constructor (or factory), a point remains a point, and so any change in one list is reflected in the other.

Since this type is generally about reading/writing data from PostgreSQL, I think perf should be prioritized. I actually think the current situation isn't necessarily that bad - some users may trip over it, but with good documentation things should be OK. Otherwise options 4 or 5 above also seem reasonable - what do you think?

(@vonzshik @NinoFloris any thoughts here?)

@kirkbrauer
Copy link
Contributor Author

I've been thinking about the best way to go about this. I feel that options 2, and 3 are probably off the table because they introduce additional complexity/confusion that would not benefit end users at all.

Option 6 could work because we already have that implemented, but this would also create the issues where a user would be locked into a specific type of cube, which reduces flexibility and reusability of code making use of the NpgsqlCube type.

As you mentioned, 5 is probably the most reasonable balance between perf and DX, I like the simplicity of setting the UpperRight property to null, however if we make the UpperRight nullable, it could create a situation where a point has to be treated as a special case in any business logic a user would create. If we deem this OK, then we can go ahead with this approach.

If we wanted to go with option 4, it would make much more sense to just make the lists read-only and introduce custom APIs to add/remove coordinates, that way we can preform any optimizations we want including only duplicating the list on write if the lower-left and upper-right values are different. I'm not convinced this is the best approach because it requires custom cube manipulation methods, but at least it allows us to provide the most optimized internal impl and keeps the list lengths the same, preventing us from having to throw exceptions in the properties.

A more radical solution might be to just totally abandon the LowerLeft and UpperRight lists concept and make the entire cube an enumerable of named value tuples (double LowerLeft, double UpperRight) and provide APIs to manipulate the internal List<(double, double)>. This would require duplicating the double for each coordinate of a point, however since ValueTuples are just structs on the stack, the performance shouldn't take too much of a hit compared to a list of decimals and we get the advantage of only having to memory manage a single list.

@keyneom
Copy link
keyneom commented Mar 24, 2023

Awesome, yes, I see that makes sense I hadn't been paying attention to the differences between the two repos, this one handles the underlying types and that one interfaces with efcore. Thanks for the explanation! This looks good to me then!

@NinoFloris
Copy link
Member

@kirkbrauer are you still interested in bringing this over the finish line? Maybe there is something we can do to help?

4D1C

@kirkbrauer
Copy link
Contributor Author

Hi @NinoFloris it's been quite a while, but I do think it would be valuable to get this merged into core. Even though I no longer work with low-level Npgsql things on a daily basis, having full coverage for the Postgres API is still good for the project. We have this reference implementation now, but it needs to be rebased and thoroughly tested. Also, I don't think I ever contributed the LINQ side of things which would be required to make this actually useful, so perhaps someone who is more of a LINQ expert than myself could tackle that later.

@Jimmacle
Copy link

I started working on EF Core support for this a while back but it's been idle for a couple years: npgsql/efcore.pg#2808

@roji
Copy link
Member
roji commented Oct 30, 2025

@Jimmacle for EF Core support we can absolutely do this, but I think we need to merge the lower-level support here first.

- Arrays of cube
- Dimension mismatches
- Exceeding max dimensions (100)
- Negative values
- HashCode equality
- Zero values
- Maximum dimensions (100)
@kirkbrauer
Copy link
Contributor Author

@roji I attempted to rebase my changes on the latest. I had to make a few modifications to how we register the type, but my understanding is that since cube is an extension type, it should be treated as a plug-in like ltree. I also expanded the test cases to cover more scenarios. Things seem to be working locally when running the tests.

Copy link
Member
@roji roji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks pretty good! I'll take a deeper look later, but here are some first comments.


[Test, TestCaseSource(nameof(CubeValues))]
public Task Cube(NpgsqlCube cube, string sqlLiteral)
=> AssertType(cube, sqlLiteral, "cube", NpgsqlDbType.Cube, isDefaultForWriting: false);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't isDefaultForWriting be true, as this is a dedicated .NET type that corresponds exactly to the PG type?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isDefault in general should be true, we only have a single mapping for cube.

Copy link
Member
@NinoFloris NinoFloris left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is great! Thanks a lot for updating this. I commented on quite a few things, but that should be all from my end.

…lied API suggestions, fixed CLR type inference (need to discuss further)
Add(DataTypeNames.PgLsn, oid: 3220, arrayOid: 3221);
Add(DataTypeNames.Unknown, oid: 705, arrayOid: 0);
Add(DataTypeNames.Void, oid: 2278, arrayOid: 0);
Add(DataTypeNames.Cube, oid: 100000, arrayOid: 100001);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to figure out a solution to this problem, this was required to make the tests pass, but probably creates many other unintended consequences.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We definitely shouldn't have cube in PostgresMinimalDatabaseInfo.

public static DataTypeName TimeTz => ValidatedName("pg_catalog.timetz");
public static DataTypeName Inet => ValidatedName("pg_catalog.inet");
public static DataTypeName Cidr => ValidatedName("pg_catalog.cidr");
public static DataTypeName Cube => ValidatedName("pg_catalog.cube");
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The cube extension type can be installed in any schema not just pg_catalog, typically it is installed in public, but the user may choose to install it somewhere else. Therefore, this qualifier is actually invalid.

@kirkbrauer
Copy link
Contributor Author

@roji @NinoFloris I discovered a critical issue with the implementation of the cube type. Since this type is technically an extension type, it won't be installed in the pg_catalog schema, instead it will usually be installed in public or whatever schema the user chooses to install it in.

It appears that there is currently no mechanism to support extension types within the GlobalTypeMapper's CLR type inference unless we have registered them in PostgresMinimalDatabaseInfo which is designed to only handle pg_catalog types.

We could pretend that the cube type is part of the pg_catalog, but I don't think that is a long-term solution.

@roji
Copy link
Member
roji commented Nov 2, 2025

@kirkbrauer are you sure about that? Isn't cube exactly like, say, the PostGIS types, which also come from an extension (and can be installed in any schema) - and AFAIK work fine?

If we do have a limitation/problem, I'd treat that as separate and open a new issue, without blocking cube per se; again, whatever problem we have here would also affect PostGIS (and probably some other types as well)...

@kirkbrauer
Copy link
Contributor Author

Thanks @NinoFloris, disabling CLR type inference makes sense. I think I was trying to force it to work the same way as the built-in types which obviously won't work hence the confusion. @roji if this works properly then we should be good.

@NinoFloris
Copy link
Member
NinoFloris commented Nov 2, 2025

No problem, it's something to improve in Npgsql. All external plugins have to rely on DataTypeName anyway, so this part of the inference isn't relevant there. And as it really only affects the plugin types that have NpgsqlDbType cases it hasn't really been worth the effort.

Note, it really just disables NpgsqlDbType inference. So when we see just an NpgsqlCube on a parameter, and you inspect NpgsqlDbType we fail to tell you it's NpgsqlDbType.Cube.

@NinoFloris NinoFloris dismissed vonzshik’s stale review November 2, 2025 12:44

Stale review from 2021

@NinoFloris NinoFloris merged commit 9163444 into npgsql:main Nov 3, 2025
13 checks passed
@NinoFloris
Copy link
Member

@kirkbrauer thank you for your work and persistence. I'm very glad to see this be part of npgsql for 10.0!

@roji
Copy link
Member
roji commented Nov 3, 2025

Likewise, really happy this got in!

@roji roji linked an issue Nov 3, 2025 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for the cube type

6 participants

0