Allow many distributions to have a scale of 0. by anntzer · Pull Request #5822 · numpy/numpy · GitHub

Allow many distributions to have a scale of 0. #5822


Merged · 2 commits merged into numpy:master on Jun 23, 2016

Conversation

@anntzer (Contributor) commented May 2, 2015

Allow many distributions to have a scale of 0, in which case a stream of 0's (or 1's) is usually returned.

See #5818.

-            if np.any(np.less_equal(oshape, 0.0)):
-                raise ValueError("shape <= 0")
+            if np.any(np.less(oshape, 0.0)):
+                raise ValueError("shape < 0")
             return cont1_array(self.internal_state, rk_standard_gamma, size,

Inline review comment (Member):

This is calling rk_standard_gamma with shape = 0. If you go to the source code of that function in distributions.c, you will find that, with shape = 0, you are computing things like X = pow(U, 1./shape). I think we should handle that particular case directly there, not rely on the behavior of pow for corner cases.
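
For reference, a minimal Python sketch (not the C source) of what that expression reduces to when shape = 0, assuming IEEE 754 semantics for division and pow; the names U and X mirror the comment above:

import numpy as np

with np.errstate(divide="ignore"):
    shape = np.float64(0.0)
    U = np.float64(0.3)                   # stand-in for a uniform draw in (0, 1)
    X = U ** (np.float64(1.0) / shape)    # 1./0. -> inf, and U**inf -> +0.0 for |U| < 1
print(X)                                  # 0.0: every draw collapses to zero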

@jaimefrio (Member) commented May 2, 2015

Please double-check what the rk_xxx functions are doing when you pass them a 0 value as scale parameter; I have marked the ones I think should be handled explicitly.

Another thing to consider is whether we want to handle the scale == 0 case explicitly in all the distribution functions, so that the random state is unmodified when a non-random value is requested. E.g. rk_normal will compute an rk_gauss value for every call with scale = 0 and then throw it away by multiplying it by 0 (see the sketch below). Probably worth some discussion on the list.
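
A small sketch of the consequence described above, assuming the PR's behavior in which a scale = 0 call still invokes rk_gauss under the hood (the seeding here is only illustrative):

import numpy as np

a = np.random.RandomState(42)
b = np.random.RandomState(42)
a.normal(loc=0.0, scale=0.0)      # returns 0.0, but entropy was still consumed
print(a.normal() == b.normal())   # False: the two streams have diverged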

@njsmith (Member) commented May 2, 2015

Arguably the right way to handle a lot of those cases is to fix the rk_* functions themselves, but I don't know the full consequences of that for maintenance. Is randomkit maintained somewhere upstream? Does our version have other local changes? Anyone know?

@anntzer (Contributor, Author) commented May 2, 2015

pow(x, inf) is not a "corner case": its behavior is well defined in man pow: "If the absolute value of x is less than 1, and y is positive infinity, the result is +0." Likewise, relying on 1./0. = inf is totally standard (... afaict).
Note that I have at least checked that all results "work" by unit-testing them.
I don't see why requesting shape=0 should not advance the random state, except perhaps for performance reasons; but again if you care about that case you can easily add a check yourself at the Python level.

@jaimefrio (Member) commented:

There are two sources of unease for me here:

  1. Are we sure that all platforms that numpy can be built on fully support IEC 60559?
  2. Are we sure that in all the cases relevant to this PR there can never be undefined behavior?

I don't know the answer to 1, but for 2, if you take a look at rk_pareto, it will return:

exp(rk_standard_exponential(state)/a) - 1

where a = 0 is the scale factor, and rk_standard_exponential returns:

-log(1.0 - rk_double(state))

Now if rk_double happens to generate a value of exactly 0, you will end up trying to compute

exp(0/0) - 1

which I am guessing will return a big fat NaN.

From a quick look it seems that pareto is the only one in trouble, but I really would appreciate it if you could go over each of the functions you are proposing to change and verify that the behavior is well defined for all possible values coming from the RNG, and that the result is consistent for both scale = +0 and scale = -0.
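
The failure mode is easy to reproduce with NumPy scalars, which propagate IEEE 754 special values instead of raising the way Python floats do (a sketch of the arithmetic, not the rk_pareto code itself):

import numpy as np

with np.errstate(invalid="ignore"):
    a = np.float64(0.0)          # the scale parameter
    e = np.float64(0.0)          # rk_standard_exponential when rk_double returns 0
    print(np.exp(e / a) - 1)     # nan: 0/0 is NaN, and exp(NaN) - 1 is NaN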

@anntzer force-pushed the 0-scale-distributions branch from 374ab22 to c4ff5f6 on May 3, 2015
@anntzer (Contributor, Author) commented May 3, 2015

I apologize for pareto, this was a mistake. You may have noticed that, unlike the other distributions, the editing was incomplete for pareto (only one <= changed into <); it was also the only one (I think) for which I didn't add a test_foo_0 test (which would have failed in all cases, in fact). I have shamelessly removed that mistake from the history.

Numpy doesn't pretend to support non-IEEE754 platforms (see http://www.scipy.org/scipylib/faq.html#does-numpy-support-nan), and right now, given the way the RNGs are set up, special-casing 0 would probably imply an additional check for each single generated number (because you can pass an array of scales instead of a single one) -- probably not that expensive, but it seems a bit gratuitous.

The case of negative zero is more interesting; as it is, at least some of the generators (gamma) cannot handle it, so I have prepared a patch that raises ValueError whenever -0. is given as a parameter (just like before). Sounds reasonable?

@njsmith (Member) commented May 3, 2015

I think treating negative zero like zero would be better? In general they are designed to be indistinguishable except in a very few specific cases. (For example, I'm pretty sure -0 + -0 returns +0.) I'm probably not the most fp-wise person reading this though, so would welcome second opinions.

Regarding special value handling in general: NumPy certainly requires IEEE754 floating point, but IEEE754 doesn't specify the behavior of transcendental functions. For that you need annex F of the C standard, and I believe this only came in with C99. I would definitely not assume that NumPy is only used on machines that comply with annex F. In particular, I do not trust MSVC 2008 to get this right (it doesn't even try for C99 compliance, and I know one thing the mingw-w64 folks struggle with is that at least some MSVC runtimes are not trustworthy on these issues), and that's the standard compiler for Python 2.7. At the least I'd like to see tests pass on Windows before merging anything that depends on annex F compliance.

@rkern (Member) commented May 3, 2015

Everything in distributions.c is numpy's. Continuing to use the rk_* prefix was perhaps misleading, but dammit, I like having my initials all over the place! :-)

Most of the changes we made to randomkit.c were platform-specific fixes for the seeding. RandomKit saw one more independent release, 1.6, after it was incorporated into numpy, but mostly to add some new uniform PRNGs (unfortunately, not in a way that makes it any easier for us to add new uniform PRNGs) and QRNGs. But otherwise, it has been dead for the past decade.

@charris (Member) commented May 3, 2015

but dammit, I like having my initials all over the place! :-)

Heh.

@anntzer (Contributor, Author) commented May 3, 2015

Actually, -0. + -0. = -0., but that's not the point here :-) I can certainly switch the checks to "normalize -0. to +0."; I don't care much.
I cannot run the test suite on Windows myself; I agree that could be useful.
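
For the record, here is how -0. behaves under IEEE 754, and one way a check could detect or normalize it; the use of np.signbit is an assumption about how the rejection might be written, not necessarily what the patch does:

import numpy as np

print(-0.0 + -0.0)        # -0.0: the sign survives addition of two negative zeros
print(-0.0 == 0.0)        # True: equality cannot tell the two apart
print(np.signbit(-0.0))   # True: signbit (or copysign) can
print(-0.0 + 0.0)         # 0.0: "normalizing -0. to +0." is just adding +0.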


@anntzer (Contributor, Author) commented Jun 19, 2015

I would like to see this go into 1.10. Kindly bumping the issue, now that 1.10 is being branched.

I know that Pillow is starting to use Appveyor for Windows testing, perhaps that could be useful for numpy too?

@homu (Contributor) commented Dec 17, 2015

☔ The latest upstream changes (presumably #6831) made this pull request unmergeable. Please resolve the merge conflicts.

@charris (Member) commented Dec 17, 2015

@anntzer Oops, needs rebase after spelling fix went in...

@anntzer (Contributor, Author) commented Dec 17, 2015

Any news as to testing on Windows (which is all that blocks this issue)? Even though I gave in and finally installed MSVC for other purposes, I still can't manage to build numpy (this fails with an unresolved external symbol, __external_check_cookie).

Commit: Allow many distributions to have a scale of 0, in which case a stream of 0's (or 1's) is usually returned. See numpy#5818.
@anntzer force-pushed the 0-scale-distributions branch from 26f0e39 to 5bc4624 on April 7, 2016
@anntzer (Contributor, Author) commented Apr 7, 2016

Kindly bumping the issue. The remaining blocker was the lack of tests on Windows (w.r.t. MSVC compliance with annex F of the C standard), but now we can check that the tests pass on Appveyor.

@@ -799,6 +835,9 @@ def test_weibull(self):
                              [0.67057783752390987, 1.39494046635066793]])
         assert_array_almost_equal(actual, desired, decimal=15)
 
+    def test_weibull_0(self):
+        assert_equal(np.random.weibull(a=0), 0)

Inline review comment (Contributor):

Real nitpick, but for completeness we might as well test for the exception being raised on -0. here as well.

This does look all OK otherwise.
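
A minimal sketch of the suggested test, assuming -0. is rejected with a ValueError (as the gamma limitation noted in the second commit implies):

from numpy.testing import assert_raises
import numpy as np

def test_weibull_neg_zero():
    # -0. compares equal to 0., so a plain `< 0` check would let it through
    assert_raises(ValueError, np.random.weibull, -0.0)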

Reply from @anntzer (Contributor, Author):
Force-edited.

Commit: At least the gamma generator doesn't support it.
@anntzer force-pushed the 0-scale-distributions branch from 5bc4624 to 0319d0c on June 9, 2016
@njsmith (Member) commented Jun 23, 2016

Sorry for how long this has been sitting, and thanks for being stubborn about pinging. Indeed, the Windows tests seem to be passing, I don't see any complaints, and from a quick read-through the changes LGTM, so merging!

@njsmith merged commit c62b20d into numpy:master on Jun 23, 2016
@anntzer deleted the 0-scale-distributions branch on June 23, 2016