8000 ENH: Translate `sph_harm` Cython->C++, add `sph_harm_all` gufunc by izaid · Pull Request #20438 · scipy/scipy · GitHub
[go: up one dir, main page]

Skip to content

ENH: Translate sph_harm Cython->C++, add sph_harm_all gufunc #20438

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 20 commits into from
Apr 11, 2024

Conversation

izaid
Copy link
Contributor
@izaid izaid commented Apr 10, 2024

This PR translates sph_harm from Cython to C++. It was reasonably straightforward.

As usual, I discovered some unexpected things going on. For instance, we still support passing double m and n which then get casted to integers. I maintained this behaviour.

Something else I found is sph_harmonic_unsafe in _legacy.pxd, see https://github.com/scipy/scipy/blob/main/scipy/special/_legacy.pxd#L39. This isn't public facing or part of any API, so can we just delete it?

cc @steppi

@izaid izaid requested review from person142 and steppi as code owners April 10, 2024 11:57
@github-actions github-actions bot added scipy.special C/C++ Items related to the internal C/C++ code base Cython Issues with the internal Cython code base Meson Items related to the introduction of Meson as the new build system for SciPy enhancement A new feature or improvement labels Apr 10, 2024
@lucascolley lucascolley added this to the 1.14.0 milestone Apr 10, 2024
@lucascolley lucascolley removed the Meson Items related to the introduction of Meson as the new build system for SciPy label Apr 10, 2024
@ilayn
Copy link
Member
ilayn commented Apr 10, 2024

Not related to this PR but I think we must address this folder and file naming scheme before it is too late and before it starts to require massive amount of work to iron out. I am already completely lost in terms of what is in which file just to review things. Let's do it after this PR if possible.

@izaid
Copy link
Contributor Author
izaid commented Apr 10, 2024

Not related to this PR but I think we must address this folder and file naming scheme before it is too late and before it starts to require massive amount of work to iron out. I am already completely lost in terms of what is in which file just to review things. Let's do it after this PR if possible.

Totally agree on this. I think our general plan is to group functions by family (e.g. legendre.h, bessel.h, etc). The issue is as things get translated from the older libraries, they come in as a monolith of code and it's impossible for us to break them up in a single PR. So it's a bit of a process to get them into the right place.

@steppi
Copy link
Contributor
steppi commented Apr 10, 2024

Not related to this PR but I think we must address this folder and file naming scheme before it is too late and before it starts to require massive amount of work to iron out. I am already completely lost in terms of what is in which file just to review things. Let's do it after this PR if possible.

I think two steps we can take are:

  1. Come up with a good name for this C++ special function library.
  2. Move it to scipy/scipy/_lib where submodules currently are, in preparation for it eventually becoming a submodule.

This will help us get away from the confusing scipy/scipy/special/special structure, and I think are good steps to take anyway.

@ilayn
Copy link
Member
ilayn commented Apr 10, 2024

We can at least get the top level files in order, by what they do. Currently everything is XXX_wrappers, ?special?.XXX and cython_XXX.pxd.

We can use things like

ufunc_definitions.XXX,
top_level_wrappers.XXX

and so on, in other words call them by what they are doing. Also under special/special the header file names are identical to the library header only files amos.h -> amos/amos.h. I think we should use the word special very sparingly as it is name of everything in that module.

@steppi
Copy link
Contributor
steppi commented Apr 10, 2024

I think we should use the word special very sparingly as it is name of everything in that module.

I think having a name for the C++ library will help with that, so that if the library is called X, things likeX_wrappers will be clear.

Also, I don’t think anyone likes things amos.h, amos/amos.h, it was just a temporary expedient.

Restructuring things sounds like a good next step. I completely agree that things are a bit of a mess now.

@izaid izaid changed the title ENH: Translated sph_harm from Cython to C++ ENH: Translated sph_harm from Cython to C++and added sph_harm_all gufunc Apr 10, 2024
@izaid
Copy link
Contributor Author
izaid commented Apr 10, 2024

Right, I've added one more thing to this PR. A gufunc called sph_harm_all that returns a (m + 1, n + 1, ...) array, broadcasting over the theta and phi parameters. This didn't exist in SciPy before, but as you can see it is not so hard to do now that we have the gufunc machinery merged. And I had to make use of mdspan to compose functions and views at the C++ level. I think there are probably a lot of people who would like this.

More generally, a lot of the special functions in SciPy actually need to compute a sequence or table of values to return the result. Examples include the Legendre functions, the spherical harmonics, the Bessel functions, and so on. In many cases, what is wanted is the whole set of values, not just a single selection. For instance, this issue #19079 (comment) has asked for exactly this for the Bessel functions. Sometimes, though, a selection is desired too.

We are now in a position where we can provide these things relatively easily by exposing them as generalised ufuncs. This PR only intends to do so for sph_harm. A question, however, is how these functions should be named. Already in SciPy we have lpn, lpmn, pbdv_seq, obl_cv_seq, and so on that compute the whole set but without broadcasting. (Well, lpn, and lpmn do with broadcasting as of the merge yesterday.) I'd propose that all the functions that compute a sequence with broadcasting be given a clear subscript to distinguish them from their non-set counterparts. The spherical harmonics are a good example, because the existing sph_harm does not compute a sequence and this PR adds in its counterpart sph_harm_all that does.

I've used the subscript _all in this PR. I'm happy to change it, but we should pick something and stick with it. And other functions like lpmn that do compute a set of values should be slowly adapted for consistency: like sph_harm, we'd want a lpmn that returns values for fixed m and n and a lpmn_all that returns the whole table of values. The existing lpmn currently returns the whole table, and the existing sph_harm currently returns a fixed selection, so it's inconsistent.

Other subscript options besides _all are _seq, _table, _array, _multi, and _full. I think _all makes the most sense here. _seq is too related to something 1D, and we already have examples where the result is not 1D. But open to suggestions!

@izaid
Copy link
Contributor Author
izaid commented Apr 10, 2024

Oh, and there is still work to do on this PR, I see some failing jobs, etc. Just wanted to start the discussion.

This PR is actually done. But there needs to be a bit of discussion about how to name things above.

And there needs to be a bit of implementation thought on what to do about functions like lpmn that currently return both the value and the derivative. Really those should be split up, at least at the C++ level with the current SciPy behaviour not changing for now. I've done that for lpmn here as I had no need for the derivative.

Edit: There's also a question of whether sph_harm_all should return a (m + 1, n + 1, ...) array or a (2 * m + 1, n + 1, ...) array to include the negative values. I think the latter, in which case this PR is not done.

@ilayn
Copy link
Member
ilayn commented Apr 10, 2024

I think there are probably a lot of people who would like this.

I am one of them. Though we could also have done this std::vector but your C++-fu is more advanced than mine so not going to press on any points :)

The spherical harmonics are a good example, because the existing sph_harm does not compute a sequence and this PR adds in its counterpart sph_harm_all that does.

Can the current sph_harm be a special case of sph_harm conceptually without changing things too much in the signature? Then we can actually overhaul the functions themselves which would be amazing.

@izaid
Copy link
Contributor Author
izaid commented Apr 10, 2024

Can the current sph_harm be a special case of sph_harm conceptually without changing things too much in the signature? Then we can actually overhaul the functions themselves which would be amazing.

Unfortunately, no, I don't think so. The current sph_harm takes arrays m, n, theta, and phi and returns output of shape np.broadcast_shapes(m.shape, n.shape, theta.shape, phi.shape). The sph_harm_all written here takes integers m and n as well as arrays theta and phi and returns output of shape (m + 1, n + 1) + np.broadcast_shapes(theta.shape, phi.shape), grea 8000 tly saving on the computation as it computes the entire consecutive set of values. They are only the same if, for the existing sph_harm, m and n are integer arrays in order.

I think the behaviour of the existing sph_harm is useful and we want to retain it, in addition to the new sph_harm_all. What is not good is lpmn and a few other functions as they are now, which are really doing what these _all functions should be doing. lpmn probably needs to ultimately become lpmn_all with a new lpmn coming in that behaves like the current sph_harm.

@ilayn
Copy link
Member
ilayn commented Apr 10, 2024

The shapes are not that important, the number of output args and their type is essential so that we don't break anything. Also _all means really not much in this context because all of what? is the natural question.

@izaid
Copy link
Contributor Author
izaid commented Apr 11, 2024

Right, I've updated this PR so sph_harm_all now returns the full 2 * m + 1 set of values, with the negative ones after the positive ones. This means the result can be indexed like y[m, n] where m can be positive or negative, and you get what you expect. I've seen this convention elsewhere, and it makes the whole thing very convenient.

@ilayn
Copy link
Member
ilayn commented Apr 11, 2024

Just to be clear, what I was wondering is whether this sph_harm_all can be subsumed into sph_harm or not. So that we don't introduce a new public API function.

@izaid
Copy link
Contributor Author
izaid commented Apr 11, 2024

Just to be clear, what I was wondering is whether this sph_harm_all can be subsumed into sph_harm or not. So that we don't introduce a new public API function.

It could be done by adding an appropriate keyword to sph_harm, if that's the way we want to go. In which case we are just toggling the behaviour. If we want to do this, we should probably think about it with the proposed special extension in mind.

Edit. A potential downside of this is that the existing sph_harm can not be a ufunc. Instead it becomes a Python function that toggles between the sph_harm ufunc and the sph_harm_all gufunc. In which case, it really is just two different functions.

@lucascolley lucascolley changed the title ENH: Translated sph_harm from Cython to C++and added sph_harm_all gufunc ENH: Translate sph_harm Cython->C++, add sph_harm_all gufunc Apr 11, 2024
@ilayn
Copy link
Member
ilayn commented Apr 11, 2024

If we want to do this, we should probably think about it with the proposed data-apis/array-api#725 in mind.

Ah yes, good point. Then a single signature is even more important to keep the API surface small.

@steppi
Copy link
Contributor
steppi commented Apr 11, 2024

Something else I found is sph_harmonic_unsafe in _legacy.pxd, see https://github.com/scipy/scipy/blob/main/scipy/special/_legacy.pxd#L39. This isn't public facing or part of any API, so can we just delete it?

OK, this actually is used. I even found a comment where I referred to it, demonstrating that I was once aware of its purpose.

The wrappers in _legacy.pxd are used to print runtime warnings if someone inputs a float for an argument that should be an int. On main we see

In [2]: sph_harm(3.2, 5, [0.2, 0.4, 0.2], [0.1, 0.1, 0.1])
<ipython-input-2-78ce2280316f>:1: RuntimeWarning: floating point number truncated to an integer
  sph_harm(3.2, 5, [0.2, 0.4, 0.2], [0.1, 0.1, 0.1])
Out[2]: 
array([-0.00224728-0.00153745j, -0.00098665-0.00253782j,
       -0.00224728-0.00153745j])

but on izaid:sph_harm we have

In [5]: sph_harm(3.2, 5, [0.2, 0.4, 0.2], [0.1, 0.1, 0.1])
Out[5]: 
array([-0.00224728-0.00153745j, -0.00098665-0.00253782j,
       -0.00224728-0.00153745j])

I think we can't just drop the RuntimeWarnings here, so we need to find a way to preserve them. Eventually I think we should go through a deprecation cycle for non-integer arguments for all of these functions.

@izaid
Copy link
Contributor Author
izaid commented Apr 11, 2024

RuntimeWarning: floating point number truncated to an integer

Right, but we just need that behaviour? I can make that happen in the extension module.

@steppi
Copy link
Contributor
steppi commented Apr 11, 2024

RuntimeWarning: floating point number truncated to an integer

Right, but we just need that behaviour? I can make that happen in the extension module.

Yes, you just need to raise the RuntimeWarning when floating point input is given. It doesn't matter how it's done.

@izaid
Copy link
Contributor Author
izaid commented Apr 11, 2024

RuntimeWarning: floating point number truncated to an integer

Right, but we just need that behaviour? I can make that happen in the extension module.

Yes, you just need to raise the RuntimeWarning when floating point input is given. It doesn't matter how it's done.

Done. It looks identical to the old Cython, and warning raised.

Screenshot 2024-04-11 at 3 59 53 PM

@steppi
Copy link
Contributor
steppi commented Apr 11, 2024

@izaid, once the RuntimeWarnings are sorted out, I'd be happy to merge this if sph_harm_all is taken out of the public API (so that it can only be imported as

 from scipy.special._gufuncs import _sph_harm_all

and it gets a docstring which says "Private function. This may be removed or modified at any time.")

@izaid
Copy link
Contributor Author
izaid commented Apr 11, 2024

@izaid, once the RuntimeWarnings are sorted out, I'd be happy to merge this if sph_harm_all is taken out of the public API (so that it can only be imported as

 from scipy.special._gufuncs import _sph_harm_all

and it gets a docstring which says "Private function. This may be removed or modified at any time.")

All done. It has become _sph_harm_all and is not part of the public API. The gufunc lives in _gufuncs and the Python function that does the allocation and calls the gfunc is _sph_harm_all, which lives in _basic.py.

Copy link
Contributor
@steppi steppi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with this now and feel it's safe to merge since nothing has been added to the public API. Let's start a discussion for establishing a consistent convention for gufunc versions of existing functions like this.

@steppi steppi merged commit 003026a into scipy:main Apr 11, 2024
@izaid izaid deleted the sph_harm branch April 11, 2024 16:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C/C++ Items related to the internal C/C++ code base Cython Issues with the internal Cython code base enhancement A new feature or improvement scipy.special
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0