MAINT: Fix the cutoff value inconsistency for pinv2 and pinvh by ilayn · Pull Request #10067 · scipy/scipy

MAINT: Fix the cutoff value inconsistency for pinv2 and pinvh #10067

Merged · 7 commits merged into scipy:master on Apr 25, 2019

Conversation

@ilayn (Member) commented Apr 16, 2019

Fixes #8861

The least-squares solver cutoff value is notoriously difficult to select because of the inherent trade-off between finding the best minimizer of ||Ax - b|| and the precision of x when b lies in the image of the matrix A. Since pinv is a special least-squares problem, the usual bound of the dtype eps is relaxed by a factor of max(M, N); otherwise the problem shown in the linked issue is hit more often than is acceptable.

On the other hand, SVD-based pseudo-inverses enjoy the convenience of having the singular values available for obtaining an upper bound on the error. In this PR we remove the hard-coded 1e3 and 1e6 factors for pinv2 and pinvh and tie the cutoff to max(M, N) * sigma_max * eps. The old factors were probably hard-coded for historical reasons.
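
For concreteness, a minimal sketch of how such a default cutoff can be computed from the SVD (the helper name and structure are illustrative, not the exact scipy implementation):

import numpy as np
from scipy.linalg import svd

def default_svd_cutoff(a):
    # Illustrative sketch: cutoff = max(M, N) * sigma_max * eps of the
    # working precision; singular values at or below it count as zero.
    u, s, vh = svd(a, full_matrices=False)
    eps = np.finfo(s.dtype).eps
    cutoff = max(a.shape) * s.max() * eps
    rank = np.sum(s > cutoff)
    return cutoff, rank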

Again, these values are by no means fail-safe; however, they are the de facto standard values that serve well in most cases. Otherwise one can always provide a better cond value manually.
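
For example, a hypothetical usage sketch of overriding the default through pinv2's existing cond keyword (as discussed further down this thread, after this change a user-supplied cond is compared directly against the singular values):

import numpy as np
from scipy.linalg import pinv2

a = np.diag([1.0, 1e-10])
a_pinv = pinv2(a, cond=1e-12)   # singular values at or below 1e-12 are treated as zero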

A possible backwards incompatibility can arise here if a user relies too heavily on the resulting precision, but in my opinion this repairs far more than it might break.

@ilayn added the scipy.linalg and maintenance labels Apr 16, 2019
@rgommers (Member)

Can you explain in the description in what cases this matters for backwards compatibility?

@tylerjereddy (Contributor)

@ilayn If you are comfortable assessing pinv-related issues, note that NumPy has recently been plagued by sporadic / bizarre pinv failures: numpy/numpy#12862

Probably a little too optimistic to hope for any connection though.

@ilayn (Member Author) commented Apr 16, 2019

@rgommers Done.

@tylerjereddy I'll have a look.

@ilayn (Member Author) commented Apr 16, 2019

@tylerjereddy It looks like the same issue that is fixed here. For single precision (np.float32, np.complex64, etc.) the pinv cutoff is still 1e-15 by default, and that causes mismatches; almost every failed test is very close to the single-precision eps (~1e-7).

If the OpenBLAS folks touched these or their related LAPACK routines or number representations, values sitting on the fence can flip one way or the other. Hence I am not so sure it is compiler-related.

Also, unfortunately, np.linalg.pinv is sp.linalg.pinv2 :(

@rgommers (Member)

Thanks @ilayn, clear description. Let's leave this open for a while to give people a chance to weigh in.

@rgommers added the needs-decision label (items that need further discussion before they are merged or closed) Apr 16, 2019
@ilayn (Member Author) commented Apr 21, 2019

If there are no more comments on this, I'd like to merge it tomorrow.

@rgommers removed the needs-decision label Apr 21, 2019
@larsoner (Member) left a comment

Looks good to me. Probably worth a note in the backward-incompatible changes section of the release notes, too, though.

Also, some tests that show things work better now than on master would help, if it's not too much work. Maybe something simple like pinv, pinvh, and pinv2 (a good case for parametrize) on a 2x2 double array with 1 and 1e-10 on the diagonal?

@ilayn (Member Author) commented Apr 21, 2019

Ah yes. Good idea.

@ilayn (Member Author) commented Apr 22, 2019

@larsoner pinvh is basically pinv2 with a different solver, so I only compared pinv and pinv2. I couldn't construct a symmetric matrix that exhibits the issue, but let me know if you have a better example to test with.

@tylerjereddy If there is still enough time, please consider adding this to 1.3.0.

@ilayn (Member Author) commented Apr 22, 2019

Sigh... PyPy is not having a good day again. @mattip Any suggestions?

@mattip (Contributor) commented Apr 22, 2019

This is using NumPy 1.16.3, I assume, which was released yesterday and added a new C-API function. I will issue a PR to go back to using nightly builds rather than official releases, since the PyPy fix was merged into NumPy but is not in a release yet.

@mattip (Contributor) commented Apr 22, 2019

Maybe better would be to pin the NumPy version used in the PyPy CI run to the version used in the other CI runs.

Edit: the Travis runs seem to use a number of different NumPy versions, including the latest.

@ilayn added this to the 1.3.0 milestone Apr 24, 2019
@ilayn (Member Author) commented Apr 24, 2019

@tylerjereddy I forgot to add the milestone to this. If @larsoner is happy with the changes, it is OK to merge.

@larsoner (Member) left a comment

pinvh is basically pinv2 with a different solver, so I only compared pinv and pinv2. I couldn't construct a symmetric matrix that exhibits the issue, but let me know if you have a better example to test with.

What I meant above about "a 2x2 double array with 1 and 1e-10 on the diagonal" is something like the following, which should fail on master for pinvh and pinv2 (and succeed for pinv) but pass for all three on your PR (at least it did when I tested it as a script; here it is generalized to a pytest case):

import numpy as np
import pytest
from numpy.testing import assert_allclose
from scipy.linalg import pinv, pinv2, pinvh

@pytest.mark.parametrize('scale', (1e-20, 1., 1e20))
@pytest.mark.parametrize('pinv_', (pinv, pinvh, pinv2))
def test_auto_rcond(scale, pinv_):
    # Ill-conditioned diagonal matrix; the default cutoff must not chop off the small singular value at any scale.
    x = np.array([[1, 0], [0, 1e-10]]) * scale
    expected = np.diag(1. / np.diag(x))
    x_inv = pinv_(x)
    assert_allclose(x_inv, expected)

@tylerjereddy (Contributor)

I think we should bump the milestone; it looks like there's still some active discussion and revision going on here.

@tylerjereddy modified the milestones: 1.3.0, 1.4.0 Apr 24, 2019
@ilayn (Member Author) commented Apr 24, 2019

Give me an hour and it will be done :)

@ilayn (Member Author) commented Apr 24, 2019

@larsoner After having sufficient coffee I can give a better response: now if the cond argument is given, it is respected; we only interfere if both cond and rcond are None, which is what this PR is fixing.

Also, thank you for the tests; indeed they fail on master and pass with this PR. Please have a look.

But I think the terminology is wrong in the inversion context; see, for example, https://nl.mathworks.com/help/matlab/ref/rcond.html

@larsoner (Member) left a comment

Apologies for being a pain here, but I'm not sure the docstring rewordings solve the problem.

@ilayn (Member Author) commented Apr 25, 2019

OK, I thought [ci skip] would stop these and not the CircleCI run; sorry about that.

@larsoner (Member) left a comment

Thanks for iterating; the wording LGTM now.

@ilayn (Member Author) commented Apr 25, 2019

Thanks for keeping it sane. I really confused myself with this one 😃

@ilayn merged commit c42462a into scipy:master Apr 25, 2019
@ilayn deleted the pinv2_eps branch April 25, 2019 13:16
@tylerjereddy modified the milestones: 1.4.0, 1.3.0 Apr 25, 2019
@tylerjereddy (Contributor)

I restored the milestone to 1.3.0 & did the same for the cognate issue. If this deserves a release note please feel free to add one on the wiki page.

@ilayn (Member Author) commented Apr 25, 2019

Thanks Tyler, wiki page edited.

@GillesOrban

Dear scipy maintainers,

While using pinv2 and wanting to fix the relative condition number with rcond, I saw in the code that rcond and cond have the same behavior, as far as I can tell.
I am not sure whether this is intended.
My suggestion would be to replace, at line 1372,
cond = rcond
with
cond = rcond * np.max(s)

Thank you for supporting scipy,

Gilles

@ilayn (Member Author) commented Nov 8, 2019

cond = rcond happens only if rcond is not None; otherwise the default cutoff includes the np.max(s) factor.

@ogrisel (Contributor) commented Mar 15, 2021

cond = rcond happens only if rcond is not None; otherwise the default cutoff includes the np.max(s) factor.

I agree with @GillesOrban. @ilayn, reading the lines at https://github.com/scipy/scipy/pull/10067/files#diff-8400a871af5e3ff38643aff36ff192650313e769cdfe80f57b02140def686ee1R1368-R1377 one can see that:

  • if rcond is not None and cond is None, then cond = rcond and rank = np.sum(s > cond) or equivalently rank = np.sum(s > rcond)
  • if rcond is None and cond is not None, then rank = np.sum(s > cond) directly.

Therefore those two arguments mean the same thing in practice.

Another way to frame it: there is no way to pass xcond and have rank = np.sum(s > xcond * np.max(s)) (I am not sure if such an xcond should actually be named cond or rcond).

Finally, the docstring does not explain what the (expected) difference between those two arguments is.
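
For illustration, a minimal sketch of a workaround for getting a relative cutoff with the current interface (the helper name is hypothetical; it computes the absolute cutoff from the largest singular value before calling pinv2, since cond is compared directly against the singular values as described above):

import numpy as np
from scipy.linalg import pinv2, svdvals

def pinv2_with_relative_cutoff(a, rtol):
    # Hypothetical helper: derive the absolute cutoff from the largest
    # singular value ourselves, then pass it through `cond`.
    s = svdvals(a)
    return pinv2(a, cond=rtol * s.max())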

@ilayn (Member Author) commented Mar 16, 2021

Arrrgh, this issue still haunts me 🤦

@ogrisel I think these keywords were incorrect to start with, as you mentioned. I'll come back to that in a sec.

However, I think there is a discrepancy in that rcond is not a relative condition number (in the rtol vs. atol sense) but the reciprocal condition number. So the functionality you want is essentially an rtol based on the largest singular value, which makes sense (that is basically what the default value in this PR tries to do). rcond and cond were introduced as the condition number with respect to inversion, which is essentially an atol for the cutoff, and that is where the hard-coded numbers were placed.

Back to the keywords: ideally we shouldn't have had these wrongly named keywords, shortcut together behind the scenes, but I didn't touch them for backwards compatibility. However, I do agree that the naming is confusing. If you want a relative cutoff value, I would really appreciate a new issue/feature request, since not many people will see it here and more people can chime in. We probably should have gone for rtol and atol as better names.

Hopefully this time we can fix it for good and be done with this.
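
For reference, a minimal sketch of what such an rtol/atol-style cutoff could look like (hypothetical helper and signature, not an existing scipy API in this discussion):

import numpy as np
from scipy.linalg import svd

def pinv_atol_rtol(a, atol=0.0, rtol=None):
    # Hypothetical sketch: combine an absolute and a relative tolerance
    # into a single cutoff on the singular values.
    u, s, vh = svd(a, full_matrices=False)
    if rtol is None:
        rtol = max(a.shape) * np.finfo(s.dtype).eps
    cutoff = atol + rtol * s.max()
    keep = s > cutoff
    # Invert only the singular values above the cutoff.
    return (vh[keep].conj().T / s[keep]) @ u[:, keep].conj().T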

Labels: scipy.linalg, maintenance
Projects: None yet

Successfully merging this pull request may close these issues:
scipy.linalg.pinv gives wrong result while scipy.linalg.pinv2 works

7 participants