scipy.linalg.pinv gives wrong result while scipy.linalg.pinv2 works · Issue #8861 · scipy/scipy · GitHub


scipy.linalg.pinv gives wrong result while scipy.linalg.pinv2 works #8861


Closed
eubicht opened this issue May 24, 2018 · 2 comments · Fixed by #10067

Comments

@eubicht
eubicht commented May 24, 2018

For a student project on finite differences we use the pseudoinverse. Here we noticed large differences in the accuracy of scipy.linalg.pinv and scipy.linalg.pinv2: while the latter works just fine, the former produces useless pseudoinverses in this example; see the code below.

As far as I understand, pinv uses scipy.linalg.lstsq to minimize | e^i - A x^i |_2 for the i-th unit vector e^i in order to obtain the pseudoinverse X, whereas pinv2 directly uses a truncated singular value decomposition.
However, internally lstsq uses the LAPACK routine gelss, which also solves the minimization via a singular value decomposition; see e.g. http://www.netlib.org/lapack/lug/node27.html .
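
As a pure-numpy sketch of that construction (illustrative only, not the actual scipy.linalg.pinv source): solving A X = I in the least-squares sense produces the pseudoinverse column by column, since column i of X minimizes || e_i - A x ||_2.

```python
import numpy as np

# Sketch of the lstsq-based construction described above: each column of
# the solution X minimizes || e_i - A x ||_2, so solving A X = I in the
# least-squares sense yields the pseudoinverse.
A = np.array([[1.0, 2.0],
              [3.0, 4.0],
              [5.0, 6.0]])
X, *_ = np.linalg.lstsq(A, np.eye(A.shape[0]), rcond=None)

print(np.allclose(X, np.linalg.pinv(A)))  # True
```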

The reason pinv fails for the example below while pinv2 doesn't is that the default cut-off threshold for small singular values in pinv2 is 1e6*eps (for double precision), whereas pinv uses eps*max(n, m) for an n×m matrix A, which is much smaller for the matrix in our example.

Now I wonder why pinv and pinv2 use different default cut-off thresholds. If I use pinv2's default for pinv (see the comment in the example code), the results are about the same.
Moreover, I wonder why the least-squares solver is named pinv and the classical method is called pinv2; in my opinion it might be better to swap them.
Since in the end both use a truncated singular value decomposition, I do not completely understand why both are needed.
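
For concreteness, a numpy-only sketch of the two default cutoffs described above, evaluated for the 36×36 matrix A_h that n = 4 produces in the example below (A_h is (n+2)² × (n+2)²); the values quoted are the SciPy 1.0-era defaults from this report:

```python
import numpy as np

# Default cutoffs as described above, for the 36x36 matrix A_h (n = 4):
eps = np.finfo(np.float64).eps
n_rows = n_cols = 36

cutoff_pinv = eps * max(n_rows, n_cols)   # pinv:  eps * max(n, m)
cutoff_pinv2 = 1e6 * eps                  # pinv2: 1e6 * eps (double)

print(f"pinv  cutoff: {cutoff_pinv:.3e}")   # ~8.0e-15
print(f"pinv2 cutoff: {cutoff_pinv2:.3e}")  # ~2.2e-10
```

So pinv2's default truncates singular values roughly four orders of magnitude earlier than pinv's for this matrix.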

Reproducing code example:

import numpy as np
import scipy.sparse as sp
import scipy.linalg as la

ns = (4,)
hs = [1 / (n + 1) for n in ns]

def get_A_h(n):
    T_diagonals = [[4] * (n + 2), [-2] + [-1] * n, [-1] * n + [-2]]
    T = sp.diags(T_diagonals, [0, 1, -1])   
    Ts_diagonals = [[-2] + [-1] * n, [-1] * n + [-2]]
    Ts = sp.diags(Ts_diagonals, [1, -1])    
    I_n2 = np.identity(n + 2)
    A_h = (n + 1) ** 2 * (sp.kron(I_n2, T) + sp.kron(Ts, I_n2))
    return A_h

def check_moore_penrose_axioms(A, Apinv):
    print()
    print("1:", la.norm(A@Apinv@A - A))
    print("2:", la.norm(Apinv@A@Apinv - Apinv))
    print("3:", la.norm(A@Apinv - (A@Apinv).T))
    print("4:", la.norm(Apinv@A - (Apinv@A).T))

for n in ns:
    N = n + 2

    A_h = get_A_h(n).toarray()
    R = np.eye(*A_h.shape)
    
    print(A_h)
    
    A_h_plus = la.pinv(A_h) #, rcond=1e6*np.finfo(A_h.dtype).eps)    
    check_moore_penrose_axioms(A_h, A_h_plus)
    
    A_h_plus = la.pinv2(A_h)
    check_moore_penrose_axioms(A_h, A_h_plus)

Output:

[[100. -50.   0. ...   0.   0.   0.]
 [-25. 100. -25. ...   0.   0.   0.]
 [  0. -25. 100. ...   0.   0.   0.]
 ...
 [  0.   0.   0. ... 100. -25.   0.]
 [  0.   0.   0. ... -25. 100. -25.]
 [  0.   0.   0. ...   0. -50. 100.]]

1: 10.966687766642307
2: 3108066184397.3657
3: 0.313565155535093
4: 2.715716258843298

1: 1.2034095577848958e-12
2: 4.613607271655997e-16
3: 2.7619937032685725e-14
4: 2.531755927761354e-14
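
To make the mechanism explicit, here is a hand-rolled truncated-SVD pseudoinverse (a numpy-only sketch of what pinv2 does, with its larger default cutoff): singular values below cond · s_max are dropped rather than inverted.

```python
import numpy as np

def pinv_truncated(A, cond=1e6 * np.finfo(float).eps):
    # Truncated-SVD pseudoinverse: singular values below cond * s_max
    # are treated as exactly zero instead of being inverted.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    keep = s > cond * s.max()
    return (Vt[keep].T / s[keep]) @ U[:, keep].T

# A singular matrix: inverting its zero singular value would blow up,
# but the truncated inverse still satisfies Moore-Penrose axiom 1.
A = np.array([[2.0, 0.0],
              [0.0, 0.0]])
Aplus = pinv_truncated(A)
print(np.linalg.norm(A @ Aplus @ A - A))  # 0.0
```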

Scipy/Numpy/Python version information:

1.0.1 1.14.3 sys.version_info(major=3, minor=6, micro=5, releaselevel='final', serial=0)
@eubicht eubicht changed the title Problems with scipy.linalg.pinv and scipy.linalg.pinv2 scipy.linalg.pinv gives wrong result while scipy.linalg.pinv2 works May 30, 2018
@ilayn
Member
ilayn commented Apr 16, 2019

I don't know the reasons, but they are probably historical artifacts. I will change the cutoff values and ask on the mailing list. Actually, it is the pinv2 values that should be kept, but pinv doesn't push the condition number down to lstsq, hence the factors are lost.
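
A numpy-only illustration of why the forwarded cutoff matters, using numpy.linalg.lstsq's rcond (which plays the role of the condition number here): with a loose cutoff the tiny singular value is truncated away, while with a tight one it is inverted and the "pseudoinverse" blows up.

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 1e-12]])
I = np.eye(2)

# Loose cutoff: the 1e-12 singular value is truncated away.
X_trunc, *_ = np.linalg.lstsq(A, I, rcond=1e-6)
# Tight cutoff: the 1e-12 singular value is inverted instead.
X_full, *_ = np.linalg.lstsq(A, I, rcond=1e-15)

print(np.abs(X_trunc).max())  # 1.0
print(np.abs(X_full).max())   # ~1e12
```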

@skjerns
Contributor
skjerns commented Feb 6, 2022

Starting from SciPy 1.7.0, pinv2 is deprecated and replaced by an SVD-based solution.

DeprecationWarning: scipy.linalg.pinv2 is deprecated since SciPy 1.7.0, use scipy.linalg.pinv instead

That means numpy.linalg.pinv, scipy.linalg.pinv, and scipy.linalg.pinv2 now all compute equivalent solutions. They are also about equally fast, with the scipy versions being slightly faster.

import numpy as np
import scipy.linalg

arr = np.random.rand(1000, 2000)
res1 = np.linalg.pinv(arr)
res2 = scipy.linalg.pinv(arr)
res3 = scipy.linalg.pinv2(arr)

np.testing.assert_array_almost_equal(res1, res2, decimal=10)
np.testing.assert_array_almost_equal(res1, res3, decimal=10)
