8000 MAINT: lstsq: compute residuals inside the ufunc by eric-wieser · Pull Request #10890 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

MAINT: lstsq: compute residuals inside the ufunc #10890

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Apr 20, 2018

Conversation

eric-wieser
Copy link
Member

This saves a larger-than-needed output array from being allocated

@eric-wieser eric-wieser requested a review from mhvk April 12, 2018 06:55
@eric-wieser eric-wieser changed the title MAINT: compute residuals inside the ufunc MAINT: lstsq: compute residuals inside the ufunc Apr 12, 2018
*(npy_int*) args[5] = params.RANK;
delinearize_@REALTYPE@_matrix(args[6], params.S, &s_out);

if (excess >= 0 && params.RANK == n) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that np.linalg.lstsq only actually returns residuals if excess > 0. I'd consider this a bug, as to me the residuals are well defined as 0 when m == n.

Either way, I've deliberately not changed the external interface yet.

}
}
else {
nan_@REALTYPE@_matrix(args[4], &r_out);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This also never escapes to the external interface, and is replaced by an shape == (0,) array (which makes no sense!)

@eric-wieser
Copy link
Member Author

I think #9997 might have been incorrect, which is why this fails.

@eric-wieser eric-wieser force-pushed the linalg-lstsq-ufunc branch 2 times, most recently from 3c6f110 to 8442391 Compare April 17, 2018 06:42
@eric-wieser
Copy link
Member Author

Updated with a manual and probably faster implementation of abs2, rather than dot(x, x.conj()) which needlessly computes a zero for the imaginary part.

@pv
Copy link
Member
pv commented Apr 17, 2018 via email

@pv
Copy link
Member
pv commented Apr 17, 2018 via email

@eric-wieser
Copy link
Member Author

That crossed my mind too, but I assume it's only useful when avoiding overflow.

I suppose that by moving this to C, we lose the pairwise summation that numpy normally does. But I doubt that level of precision matters here anyway

This prevents an overly large output array being allocated.

It also means the the residuals can be handled as a separate out argument in future.
@ewmoore
Copy link
Contributor
ewmoore commented Apr 17, 2018 via email

@eric-wieser
Copy link
Member Author
eric-wieser commented Apr 17, 2018

@ewmoore: the best I could find was zdotc. Unfortunately, that's tricky to invoke, as the function prototype changes depending on whether we're using the f2c'd lapack or the Fortran one (a bug in f2c?)

@mhvk
Copy link
Contributor
mhvk commented Apr 17, 2018

For reference, #3994 is a long-standing request for an abs2 ufunc, with more discussion about pros and cons of approaches. I think if it weren't for not being sure what the name should be, we'd have it already...

@eric-wieser
Copy link
Member Author
eric-wieser commented Apr 17, 2018

Even if we had such a ufunc, it would be quite tricky to call it from within this ufunc - in particular, it would end up being called multiple times, since the memory I'm calling it on does not pers 8000 ist across the stacked arrays. With the overhead associated with a ufunc call, that would be prohibitively expensive.

Copy link
Contributor
@mhvk mhvk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great to wrap the blas functions more completely! Two small comments/questions only.

@ftyp@ *vector = components + i*m;
/* Numpy and fortran floating types are the same size,
* so this case is safe */
@basetyp@ abs2 = @TYPE@_abs2((@typ@ *)vector, excess);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Am not sure this is worth splitting out as a function: it feels to me like it just becomes less readable.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're already at 5 levels of indentation here - I didn't want to introduce a 6th as well as #ifdefs.

To me, it seems clearer to try and fold the #ifdefs into the function definitions as much as possible.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Splitting the function also makes it easier to drop in a lapack/blas implementation of them if someone finds that's faster (and cares)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, it was not a big deal.

/* Numpy and fortran floating types are the same size,
* so this case is safe */
@basetyp@ abs2 = @ 8000 TYPE@_abs2((@typ@ *)vector, excess);
memcpy(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My poor knowledge of C does not really help here, but memcpy feels very odd. Can one not just assign?

Copy link
Member Author
@eric-wieser eric-wieser Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If resid + i*r_out.column_strides is not aligned, there's no guarantee that the write will succeed. Unless there's some gufunc magic wrapping us that ensures this is the case, memcpy is safer.

delinearize_@TYPE@_matrix uses memcpy, which is where we copy between numpy and the other lapack buffers - so memcpy is at least consistent.

Also, something-something-strict-aliasing.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fear I remain confused, but fine to go with something that works!

Copy link
Member Author
@eric-wieser eric-wieser Apr 20, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A simple example: This will not work on some platforms:

char bytes[8] = {0};
(uint32_t *)&bytes[1] = 0x12345678;  // unaligned 32-bit write may fail
assert((uint32_t *)&bytes[0] == 0x00123456);  // assuming big endian

But using memcpy would work every time

Relatedly: https://www.alfonsobeato.net/arm/how-to-access-safely-unaligned-data/

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, thanks that makes it clearer (link is a good one too)

@mhvk mhvk added this to the 1.15.0 release milestone Apr 20, 2018
@mhvk
Copy link
Contributor
mhvk commented Apr 20, 2018

OK, I think this is all set, so will merge.

@mhvk mhvk merged commit de62ef9 into numpy:master Apr 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants
0