8000 BUG, API: np.random.multivariate_normal behavior with bad covariance matrix by cowlicks · Pull Request #5726 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG, API: np.random.multivariate_normal behavior with bad covariance matrix #5726

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 4, 2017
Merged
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations 8000
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
Next Next commit
BUG, API: Allow covariance matrix with small fp errors.
In np.random.multivariate_normal allow the covariance matrix to have
small floating point errors. And allow control over what to do if the
PSD check fails.
  • Loading branch information
cowlicks committed May 19, 2015
commit fc57915afddad49f49611a623ed8360a3b44ed25
30 changes: 21 additions & 9 deletions numpy/random/mtrand/mtrand.pyx
Original file line number Diff line number Diff line change
Expand Up @@ -4240,7 +4240,8 @@ cdef class RandomState:
self.lock)

# Multivariate distributions:
def multivariate_normal(self, mean, cov, size=None):
def multivariate_normal(self, mean, cov, size=None, check_valid='warn',
tol=1e-8):
"""
multivariate_normal(mean, cov[, size])

Expand All @@ -4265,6 +4266,10 @@ cdef class RandomState:
generated, and packed in an `m`-by-`n`-by-`k` arrangement. Because
each sample is `N`-dimensional, the output shape is ``(m,n,k,N)``.
If no shape is specified, a single (`N`-D) sample is returned.
check_valid : 'warn', 'raise', 'ignore'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

formatting: change to {'warn', 'raise', 'ignore'}, optional

Behavior when the covariance matrix is not Positive Semi-definite.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

style nitpick: use lower case for positive semidefinite

tol : float
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float, optional

Tolerance of the singular values in covariance matrix.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

absolute or relative tolerance?


Returns
-------
Expand Down Expand Up @@ -4349,11 +4354,11 @@ cdef class RandomState:
shape = size

if len(mean.shape) != 1:
raise ValueError("mean must be 1 dimensional")
raise ValueError("mean must be 1 dimensional")
if (len(cov.shape) != 2) or (cov.shape[0] != cov.shape[1]):
raise ValueError("cov must be 2 dimensional and square")
raise ValueError("cov must be 2 dimensional and square")
if mean.shape[0] != cov.shape[0]:
raise ValueError("mean and cov must have same length")
raise ValueError("mean and cov must have same length")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a check for check_valid being one of the 3 accepted strings?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually, move it from line 4511 up here. It's misplaced there; only checked for when not psd.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that looks good now

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note that the check is itself later in the code, when checking. However, it may go unchecked when the psd passes, so I have rearranged the logic so it is always checked.

# Compute shape of output and create a matrix of independent
# standard normally distributed random numbers. The matrix has rows
Expand All @@ -4376,12 +4381,19 @@ cdef class RandomState:
# not zero. We continue to use the SVD rather than Cholesky in
# order to preserve current outputs. Note that symmetry has not
# been checked.

(u, s, v) = svd(cov)
neg = (np.sum(u.T * v, axis=1) < 0) & (s > 0)
if np.any(neg):
s[neg] = 0.
warnings.warn("covariance is not positive-semidefinite.",
RuntimeWarning)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would be the entire next part under
if check_valid != 'ignore':

no reason to check if we 'ignore'

if check_valid != 'ignore':
psd = np.allclose(np.dot(v.T * s, v), cov)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tol should be used here. It's not used anywhere now.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a test with non-default tol for good measure?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize. I wonder if tol should be changed to atol and rtol and just forward them in here.

Copy link
Member
@rgommers rgommers Jan 3, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You'll have to specify both atol and rtol (here, not in the signature), because the defaults are both nonzero: rtol=1e-05, atol=1e-08

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry my comment wasn't a reply to yours, non-updating interface.

atol = rtol = tol seems okay. atol is the only one of interest near covariance values of 0, and rtol may do something useful for very large values.

if not psd:
if check_valid == 'warn':
warnings.warn("covariance is not positive-semidefinite.",
RuntimeWarning)
elif check_valid == 'raise':
raise ValueError("covariance is not positive-semidefinite.")
else:
raise ValueError("check_valid must equal 'warn', 'raise', or 'ignore'")

x = np.dot(x, np.sqrt(s)[:, None] * v)
x += mean
Expand Down
0