Conversation
There was a problem hiding this comment.
Thanks, this looks pretty good! See some comments/questions below.
The NpgsqlCube API surface is indeed the thing we need to tie down well. I'm not convinced we need to provide too many client-side operation implementations - the type seems simple enough (basically two coordinate lists) that users can do whatever they want. But am definitely open if you think we should provide some operations.
PS I'm on semi-vacation for the coming week, so expect some delays with responses.
|
@roji What do you think about adding the basic client-side API for manipulating and querying the cube? I think that the cube type is specific enough that it needs the basic client-side operations so that users can do local math on the cube and manipulate its values. We already have to provide a few of the query methods (cube_is_point, cube_dim, cube_ll_coord, cube_ur_coord) for our implementation to display itself and serialize/deserialize. My reasoning for this is that manually manipulating the cube's lower-left or upper-right coordinate lists is inherently unsafe and makes it very easy to corrupt the cube. If the number of coordinates in the two lists gets out of sync, we cannot properly serialize/deserialize the cube anymore because such a scenario is not allowed in the cube spec. Making the |
For preventing direct manipulation of the lists, IReadOnlyList can be used instead of IEnumerable. More generally, I'd avoid going in the direction of doing client-side implementations - we generally don't want to fully reproduce PostgreSQL types, with all their complexity. That can easily lead us to a place where we duplicate the entire PostgreSQL function set in .NET (because why implement one operation client-side but not another?), and have to maintain it, fix bugs, etc. We can always start with a minimal implementation, where the data is made fully available, and see what users request; if there's a a lot of need for certain client-side operations, we can always add those later. Otherwise, if you strongly feel we should be providing client-side implementations, do you have certain operations in mind (and reasoning for them)? Or just all of them? |
|
That makes sense, I forgot about the specific advantages of the 'IReadOnlyList' interface. I also agree that we should limit the client-side API to the bare minimum operations to start out and then expand it if necessary. I think the best way that users can manipulate the cube is just by using the constructor and the provided lists to re-build a new cube if they desire. We should probably also leave all the mathematical operations to the database so we don't have to code or debug those. The only exception to this could be the |
|
Great, sounds like we're aligned. There's still an open question in my mind as to whether the type should be mutable or immutable (i.e. if we should expose List or IReadOnlyList). Immutable would allow better enforcing of correctness, but if the lists are long there's a perf hit for having to generate copies etc. |
|
@roji The maximum size of each list is 100 elements, so |
|
Yeah. Since the data size here isn't small and fixed, it may make sense to have a fully mutable API and leave it to the user's responsibility to not mess up (e.g. by having different list lengths). |
|
That makes sense, if the list sizes get out of sync, we can just pad the opposite array with zeros when serializing or displaying the string rep. |
|
@kirkbrauer just a note to say that this is waiting on you (but absolutely no rush of course!). You can push more commits, and when you'd like another review, ask for one in the reviews panel above. |
|
@roji No worries, I got a bit busy last week, but I'll push a new updated version with the changes we discussed above when I get a chance. |
|
@roji Quick question regarding the public shipped API vs public unshipped API. Which file would be the best to place the declarations for the NpgsqlCube so we can run the tests? I am not super familiar with these files, but I know that they are commonly used on MS open-source projects to denote the public API. I also know that there isn't great support for Rider, but I'll try to autogenerate them in Visual Studio. |
|
Re the public API files, all changes should be added to the Unshipped file (as you've done) - this is basically a snapshot of all changes done in the current, unreleased version. When we release 6.0, we'll merge those changes into the Shipped file, and the Unshipped file will be empty again. Unfortunately you're right that the analyzer has issues under Rider - I usually add into Unshipped manually, but VS can indeed be used as well. |
There was a problem hiding this comment.
@kirkbrauer thanks for the work on this - here are some more comments.
There was a problem hiding this comment.
@kirkbrauer sorry for the long wait (was away on vacation). This looks almost ready - see remaining comments below, I'm now available to finalize this and merge.
|
@kirkbrauer can you please squash and rebase this on the latest main branch? |
|
@roji Doing it right now. |
|
@roji All squashed and rebased. We should be ready for merge, though we might need to do a little work on the tests first. |
|
@kirkbrauer agreed - everything LGTM, but test coverage currently seems to be mainly for the type handler (reading/writing from PG). Some coverage for the APIs on the type would be good. |
|
@roji I'm working on adding additional tests for the cube type, however I have run into a very obvious problem related to our reference optimizations. If the cube is a point, we set both the lowerLeft and upperRight lists to the same List instance. This is super efficient, but if a user wants to change one list, they will end up manipulating both lists and the source list since all those values reference the same original list instance. This is probably fine if we treat the cube as immutable, however, if the user is often creating or manipulating their own cubes, it would produce the (possibly) unexpected behavior especially for users who might not be as familiar with pass-by-reference languages. In other types such as the NpgsqlPath, we are instead re-instantiating the List objects with the same values within the constructor at the penalty of performance, I think this might be a better approach to maintain parity with the rest of the library. |
|
You raise a good point... Here are some options I'm seeing:
Since this type is generally about reading/writing data from PostgreSQL, I think perf should be prioritized. I actually think the current situation isn't necessarily that bad - some users may trip over it, but with good documentation things should be OK. Otherwise options 4 or 5 above also seem reasonable - what do you think? (@vonzshik @NinoFloris any thoughts here?) |
|
I've been thinking about the best way to go about this. I feel that options 2, and 3 are probably off the table because they introduce additional complexity/confusion that would not benefit end users at all. Option 6 could work because we already have that implemented, but this would also create the issues where a user would be locked into a specific type of cube, which reduces flexibility and reusability of code making use of the NpgsqlCube type. As you mentioned, 5 is probably the most reasonable balance between perf and DX, I like the simplicity of setting the UpperRight property to null, however if we make the UpperRight nullable, it could create a situation where a point has to be treated as a special case in any business logic a user would create. If we deem this OK, then we can go ahead with this approach. If we wanted to go with option 4, it would make much more sense to just make the lists read-only and introduce custom APIs to add/remove coordinates, that way we can preform any optimizations we want including only duplicating the list on write if the lower-left and upper-right values are different. I'm not convinced this is the best approach because it requires custom cube manipulation methods, but at least it allows us to provide the most optimized internal impl and keeps the list lengths the same, preventing us from having to throw exceptions in the properties. A more radical solution might be to just totally abandon the |
|
Awesome, yes, I see that makes sense I hadn't been paying attention to the differences between the two repos, this one handles the underlying types and that one interfaces with efcore. Thanks for the explanation! This looks good to me then! |
|
@kirkbrauer are you still interested in bringing this over the finish line? Maybe there is something we can do to help? |
|
Hi @NinoFloris it's been quite a while, but I do think it would be valuable to get this merged into core. Even though I no longer work with low-level Npgsql things on a daily basis, having full coverage for the Postgres API is still good for the project. We have this reference implementation now, but it needs to be rebased and thoroughly tested. Also, I don't think I ever contributed the LINQ side of things which would be required to make this actually useful, so perhaps someone who is more of a LINQ expert than myself could tackle that later. |
|
I started working on EF Core support for this a while back but it's been idle for a couple years: npgsql/efcore.pg#2808 |
|
@Jimmacle for EF Core support we can absolutely do this, but I think we need to merge the lower-level support here first. |
- Arrays of cube - Dimension mismatches - Exceeding max dimensions (100) - Negative values - HashCode equality - Zero values - Maximum dimensions (100)
|
@roji I attempted to rebase my changes on the latest. I had to make a few modifications to how we register the type, but my understanding is that since |
There was a problem hiding this comment.
Thanks, this looks pretty good! I'll take a deeper look later, but here are some first comments.
test/Npgsql.Tests/Types/CubeTests.cs
Outdated
|
|
||
| [Test, TestCaseSource(nameof(CubeValues))] | ||
| public Task Cube(NpgsqlCube cube, string sqlLiteral) | ||
| => AssertType(cube, sqlLiteral, "cube", NpgsqlDbType.Cube, isDefaultForWriting: false); |
There was a problem hiding this comment.
Shouldn't isDefaultForWriting be true, as this is a dedicated .NET type that corresponds exactly to the PG type?
There was a problem hiding this comment.
isDefault in general should be true, we only have a single mapping for cube.
There was a problem hiding this comment.
This is great! Thanks a lot for updating this. I commented on quite a few things, but that should be all from my end.
src/Npgsql/Internal/ResolverFactories/CubeTypeInfoResolverFactory.cs
Outdated
Show resolved
Hide resolved
…lied API suggestions, fixed CLR type inference (need to discuss further)
| Add(DataTypeNames.PgLsn, oid: 3220, arrayOid: 3221); | ||
| Add(DataTypeNames.Unknown, oid: 705, arrayOid: 0); | ||
| Add(DataTypeNames.Void, oid: 2278, arrayOid: 0); | ||
| Add(DataTypeNames.Cube, oid: 100000, arrayOid: 100001); |
There was a problem hiding this comment.
We need to figure out a solution to this problem, this was required to make the tests pass, but probably creates many other unintended consequences.
There was a problem hiding this comment.
We definitely shouldn't have cube in PostgresMinimalDatabaseInfo.
| public static DataTypeName TimeTz => ValidatedName("pg_catalog.timetz"); | ||
| public static DataTypeName Inet => ValidatedName("pg_catalog.inet"); | ||
| public static DataTypeName Cidr => ValidatedName("pg_catalog.cidr"); | ||
| public static DataTypeName Cube => ValidatedName("pg_catalog.cube"); |
There was a problem hiding this comment.
The cube extension type can be installed in any schema not just pg_catalog, typically it is installed in public, but the user may choose to install it somewhere else. Therefore, this qualifier is actually invalid.
|
@roji @NinoFloris I discovered a critical issue with the implementation of the cube type. Since this type is technically an extension type, it won't be installed in the It appears that there is currently no mechanism to support extension types within the We could pretend that the |
|
@kirkbrauer are you sure about that? Isn't cube exactly like, say, the PostGIS types, which also come from an extension (and can be installed in any schema) - and AFAIK work fine? If we do have a limitation/problem, I'd treat that as separate and open a new issue, without blocking cube per se; again, whatever problem we have here would also affect PostGIS (and probably some other types as well)... |
|
Thanks @NinoFloris, disabling CLR type inference makes sense. I think I was trying to force it to work the same way as the built-in types which obviously won't work hence the confusion. @roji if this works properly then we should be good. |
|
No problem, it's something to improve in Npgsql. All external plugins have to rely on DataTypeName anyway, so this part of the inference isn't relevant there. And as it really only affects the plugin types that have NpgsqlDbType cases it hasn't really been worth the effort. Note, it really just disables NpgsqlDbType inference. So when we see just an NpgsqlCube on a parameter, and you inspect NpgsqlDbType we fail to tell you it's NpgsqlDbType.Cube. |
|
@kirkbrauer thank you for your work and persistence. I'm very glad to see this be part of npgsql for 10.0! |
|
Likewise, really happy this got in! |
Here's my implementation of the cube type from #698.
The binary send/recv functions are not documented anywhere at the moment, so I had to read the PostgreSQL source files to determine the binary type of the cube type.
https://github.com/postgres/postgres/blob/c30f54ad732ca5c8762bb68bbe0f51de9137dd72/contrib/cube/cube.c
From what I found, there appears to be a 32-bit header that indicates the number of dimensions (i31) in the cube and whether the cube is a point (the first bit) meaning it has the same lower-left and upper-right coordinates. After that there is a linear list of doubles containing all the coordinates in the cube, if the cube is a point they don't bother storing the upper-right coordinates since they are the same as the lower-left ones.
For example, here is a hex dump of the cube
(1):And here is a hex dump for a cube with lower-left and upper-right coordinates:
We also need to determine which parts of the API need to be implemented on the client side so users can do basic cube manipulation and comparison directly in their program (or if we even need this).