-
Notifications
You must be signed in to change notification settings - Fork 565
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENH: add coverage_simplify exposing topological simplification of coverages #1969
base: main
Are you sure you want to change the base?
Conversation
Pull Request Test Coverage Report for Build 10995499008Warning: This coverage report may be inaccurate.This pull request's base commit is no longer the HEAD commit of its target branch. This means it includes changes from outside the original pull request, including, potentially, unrelated coverage changes.
Details
💛 - Coveralls |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for working on this @martinfleis !
It would be good to report test cases that crash the kernel to GEOS; those should be handled upstream.
I assume you are going to expand the test cases?
I'll fish those cases out and report to GEOS, good point. I actually wrote more tests but apparently did not push the commit. Will do once I'll get to laptop. |
Co-authored-by: Brendan Ward <bcward@astutespruce.com>
I suppose that the input coverage needs to be entirely topologically sound? Eg. that if two polygons touch not in a common point but a point touches a vector of another polygon between two points, this will not be treated as a common boundary? If this is the case, I think it will be useful to explicitly mention this in the doc to avoid misunderstandings... |
We also need to decide how to treat the input geometry argument. AFAIK in the current state of the PR, you just pass each input geometry to the GEOS function, and so this assumes that the user already collected their set of polygons into a collection. Alternatively, we can also do that for the user. Similarly as |
That's a good point. I initially wanted to do that on the GeoPandas side (collect, simplify, explode) but it may make sense to do the first two steps (collect and simplify) already in shapely and return the collection. I think this covers majority of use cases and if you want to simplify per geometry, you can always loop. I'll leave this decision to you as you have a better feeling for what the optimal behaviour of shapely is. (Another example of this is polygonize.) |
I have changed the behaviour to follow the logic of |
returned collected but unchanged. | ||
|
||
If the geometry is polygonal but does not form a valid coverage due to overlaps, | ||
it will be simplified but it may result in invalid topology. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@martinfleis Lovely docstring, thank you!
shapely/constructive.py
Outdated
Axis along which the operation is performed. The default (None) | ||
performs the operation over all axes, returning a scalar value. | ||
Axis may be negative, in which case it counts from the last to the | ||
first axis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
axis
is new to me. It comes from Pandas, right? I think boundaries between Pandas/GeoPandas and Shapely are a good thing and that we should resist blurring the projects if we can. What does axis
really mean to Shapely, given that inputs can only be a scalar geometry or a 1-D array of geometries? Can't we infer axis from the type of input?
That aside, the doc for axis
may be technically correct but isn't useful to a developer who isn't super fluent with numpy or Pandas. Can we make it more approachable?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This comes directly from shapely.union_all
and other predicate _all functions which have the same notion of using axis to define what should be treated as a single group. So it is not new.
What does axis really mean to Shapely, given that inputs can only be a scalar geometry or a 1-D array of geometries?
That is not correct as far as I'm aware. While some operations will not work on a N-D array (STRtree notably), most of shapely operations are perfectly capable of handling N-D arrays of geometries and there's a clear use case for that in the vector data cube space for dealing with time-dependent geometries.
If you think this should change, we should do that across all functions that use it, so I'd keep it for a separate discussion and PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since the low-level operations were written like ufuncs, I shouldn't be surprised that they would work on N-D arrays.
Even, so why would we want to ravel in one case, and rollaxis in the other? What does this mean to a GIS programmer? If lib.coverage_simplify()
can accept N-D arrays, can the caller not ravel or rollaxis or neither (perhaps?) as they need before calling the method?
In matters of software design, I tend to be anti-convenience. Instead of accepting an axis
argument and then modifying input based on the value, callers should just modify their inputs in their own applications. Then we have less code, fewer special cases and branches to test. I would make an exception for situations where GEOS C API methods must be called, because how is a user going to do that? But here the user already has numpy installed and can easily reshape their inputs.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need some clarification on what axis
really stands for in this method and I'd like a discussion about whether this is a real feature or just a convenience.
shapely/constructive.py
Outdated
Axis along which the operation is performed. The default (None) | ||
performs the operation over all axes, returning a scalar value. | ||
Axis may be negative, in which case it counts from the last to the | ||
first axis. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, since the low-level operations were written like ufuncs, I shouldn't be surprised that they would work on N-D arrays.
Even, so why would we want to ravel in one case, and rollaxis in the other? What does this mean to a GIS programmer? If lib.coverage_simplify()
can accept N-D arrays, can the caller not ravel or rollaxis or neither (perhaps?) as they need before calling the method?
In matters of software design, I tend to be anti-convenience. Instead of accepting an axis
argument and then modifying input based on the value, callers should just modify their inputs in their own applications. Then we have less code, fewer special cases and branches to test. I would make an exception for situations where GEOS C API methods must be called, because how is a user going to do that? But here the user already has numpy installed and can easily reshape their inputs.
I mean, I am not attached to the
The reason I included it in this way is that we need to collect individual geometries in some way before passing them to GEOS, otherwise we expect user to do that. The latter was originally implemented here but as pointed out by @jorisvandenbossche above, it may not be what user expects from this function. When collecting geometries, shapely has a precedent of using the axis keyword. I simply followed that for the sake of consistency. Coverage simplification then works as aggregation, since it returns a GeometryCollection, following practically the same pattern as I fully understand your concern but I am not sure if this function is different enough to deviate from this pattern. But in the end, that is not my call to make. I can remove the axis and follow the route currently implemented for |
To be clear, the Given that the core GEOS functions we now wrap are implemented as numpy ufuncs, I think providing the
The Can you the user do this themselves? Potentially yes, depending on what signature we would provide instead.
To be fair, I don't think this specific capability is that useful for typical GIS applications. I think it is quite normal that a user would want to simplify different groups of geometries together, but it is probably not that likely that those different groups are then exactly the same size such that you actually can put them in a multidimensional numpy array and make use of reducing over an axis. But while this specific case doesn't seem that useful, there are several other aspects about the numpy ufunc capabilities that are very useful, such as the broadcasting behaviour (allowing you to do pairwise operations). And when we essentially provide standard numpy functions with those features, I would personally go for being consistent in following numpy's design entirely, and not only provide that subset of features that we think will be useful. That said, we still need to make a choice about what kind of function we want to provide, and there are different options which all have some pros and cons (and not all of those options have an
The first option is the closest to the pure GEOS function, the last option is closest to how PostGIS exposes this as Note that there is prior art in "hiding" the fact that certain GEOS functions work on a GeometryCollection (instead of on an array of geometries, but that might be because the collection is easier in the C API). The |
I have just tested that doing the option four, meaning collect-simplify-explode, is possible as the order of geometries in the collection seems to be preserved throughout the simplification procedure. That is how I would like this to behave in GeoPandas but within shapely, it feels inconsistent as other reductions like polygonize return a collection of polygons, not an array of individual geometries. But truth is, that a user does not need to know the implementation details in GEOS and we can use the API that more mirrors normal So I'm happy to update the PR to follow the first option. However, with a N-D array (e.g. time x geom used in vector data cubes), we'd need to slice and loop over 1D arrays, compared to the current state of the PR. |
Yes, but we could also argue that (for |
Alright. I'll try to update the PR over the weekend. |
(maybe wait a bit to give Sean time to respond as well) |
@sgillies do you have time to take another look at the discussion above? |
@jorisvandenbossche @martinfleis thank you for the discussion. I've finally got my head around the current implementation. It makes sense to me. I'm also attracted to the 4th option. Do either of you know if GEOS coverage simplification always conserves polygons? If pieces of a coverage are ever simplified away to nothing, then we could not implement the fourth option, right? |
The VW algorithm reduces a polygon to a triangle at most, so there's always a polygon as an output. I will verify that again but so far I believe that we can implement the fourth option. |
That's indeed what the doc comment in the C API guarantees (https://github.com/libgeos/geos/blob/2b5721bc96d64c3d768bd0e91dba6c3bbca354a9/capi/geos_c.h.in#L3960):
|
Let's do the fourth option, then, and agree that this isn't a reduction operation? BTW, I'm happy with the 5 instances in set_operations.py. |
I have changed the behaviour to follow the fourth model but I don't think it is quite right implementation-wise. Right now, it works in principle but does not work like a ufunc, so casting it on a GeoSeries does not preserve GeoSeries output etc... And I think that this logic should happen in C but I think I've hit the limits of my C and wasn't able to make it work. So marking this as a draft but I'll probably need to pass this on to someone more capable in C. It got more complicated than anticipated. |
xref #1570
GEOS supports this simplification for collections of polygons only. Here, I have currently opted for returning other geometries untouched but the correct behaviour is up to a discussion. When checking for correct geom types, nesting og GeometryCollections is not supported and a GeometryCollection of a GeometryCollections will not be simplified no matter the contents of the inner collection.
If we did not do this check, when points or line strings are passed, shapely returns (empty) GEOS Exception. When passing GeometryCollection with points or line strings, it kills the kernel, so I suppose that the checks I have included now are quite a feasible way forward.
I am mirroring the API of PostGIS, not that of GEOS. So the keyword is
simplify_boundary
rather thanpreserve_boundary
which seems more intuitive.