8000 BUG: random: Create a legacy implementation of random.binomial. by WarrenWeckesser · Pull Request #14531 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

BUG: random: Create a legacy implementation of random.binomial. #14531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Sep 19, 2019

Conversation

WarrenWeckesser
Copy link
Member

Create a legacy implementation of the random_binomial method of
RandomState that does not include the "short-circuit" check for
n == 0 or p == 0. This ensures that the stream of variates
is consistent with the behavior in 1.16.

Closes gh-14522.

Create a legacy implementation of the random_binomial method of
RandomState that does not include the "short-circuit" check for
n == 0 or p == 0.  This ensures that the stream of variates
is consistent with the behavior in 1.16.

Closes numpygh-14522.
@charris charris added the 09 - Backport-Candidate PRs tagged should be backported label Sep 17, 2019
@charris charris added this to the 1.17.3 release milestone Sep 17, 2019
@charris
Copy link
Member
charris commented Sep 17, 2019

Grrr. I don't know why the legacy functions were even touched, they should have been taken over exactly as they were and copied if they were needed for the new work so that the two implementations were strictly independent. As is, I see int_64 and have no idea if it is correct without wading through the code, the old code used long except in some of the rk_* functions. And I cannot directly compare to the original either. The attempt to have common code just made a mess.

Anyway, just venting. Although It's tempting to undertake a massive redo :) Grrr. Will take some time to work through this.

@WarrenWeckesser
Copy link
Member Author

@charris, yeah, making a fix like this is a bit convoluted, and as I was working on this, I was thinking what you were probably thinking: wouldn't it have been easier to leave the legacy code untouched, and port a copy of it to use int64 for the new Generator class? That would have duplicated a lot of code, but we don't plan on touching the old code much, so the duplication probably wouldn't have been that much of an added maintenance burden.

@charris
Copy link
Member
charris commented Sep 18, 2019

wouldn't it have been easier to leave the legacy code untouched

I'm thinking we may still want to do that, especially if we want to maintain the old lower level API. But maybe not for 1.17.x, too big a change. There was a small fix to legacy that we did backport, so we will need to keep track of that if we go that way.

@rgommers @rkern @bashtage @mattip Thoughts?

@rgommers
Copy link
Member

Sounds reasonable. I don't have much to add here since I haven't looked under the hood of the new numpy.random much.

@bashtage
Copy link
Contributor
bashtage commented Sep 18, 2019

We ended up here because it went to

a. new implementation ->
b. restore legacy normals ->
c. restore everything different (that we could see/test) ->
d. restore a separate mtrand.pyx with its own docstrings
e. limit RandomState to be really nearly the same as the old one (e.g., can't accept different BitGenerators).

EDIT: Removed wrong info, correction below

@bashtage
Copy link
Contributor

Once we got to (e) it would have made as much sense to restore the randomkit implementation and the 1.16 implementation. This is the easiest course now and it is the only way to ensure that there aren't further edge cases.

Restoring randomkit so that the two are orthogonal code bases also avoids future issues where someone wants to improve a shared C generator function which then has to be forked into a legacy_ version before changing.

What is wrong with my earlier missive is that the new infrastructure is required if the feature to share a single BitGenerator across both RandomState and Generator. So rolling this back would violate this feature.

So keeping randomkit isn't an option. What could be done would be to fully mirror the functions in distributions.c to legacy_distributions.c so that nothing is shared. This might simplify some maintenance later. It would also allow someone to perform the gruesome task of trying to compare a diff between randomkit and a complete legacy_distributions for inadvertant changes.

@bashtage
Copy link
Contributor
bashtage commented Sep 18, 2019

@mattip Do you know what the coverage reported here:

https://codecov.io/gh/numpy/numpy/src/master/numpy/random/src/distributions/distributions.c

is so spotty for distributions.c? I'm sure that almost all of that is tested in the standard test suite.

For example, random_noncentral_chisquare shows no coverage.

However, there is a test here that should hit that path:

https://github.com/numpy/numpy/blob/master/numpy/random/tests/test_generator_mt19937.py#L1103

I was thinking using the coverage to see if there are paths that aren't hit so that we could add tests from RandomState that covered this and made sure they passed against 1.16.

Edit

Looking at the coverage it seems that only tests of RandomState are being executed with coverage, and that tests of Generator are not. This seems to be the case since none of the Ziggurat algos are covered, and these aren't used by RandomState. The other with missing coverage depend on outputs of the Ziggurat and so also aren't covered by tests that hit RandomState.

@mattip
Copy link
Member
mattip commented Sep 18, 2019

pinging @tylerjereddy about coverage. It seems something is not right

@bashtage
Copy link
Contributor

When I try to get the coverage locally by running

python runtests.py --coverage --gcov -s random
python runtests.py --lcov-html

I get a warning and no coverage:

mnt/c/git/numpy/build/temp.linux-x86_64-3.7/numpy/random/src/distributions/distributions.gcda:stamp mismatch with notes file
geninfo: WARNING: gcov did not create any files for /mnt/c/git/numpy/build/temp.linux-x86_64-3.7/numpy/random/src/distributions/distributions.gcda!

I have to manually edit the time stamp in the files (offset 8, 4 bytes) so that the gcda's timestamp is after the gcno's. Possibly this is happening because distributions is being compiled multiple times by both the cython build of Generator and RandomState.

@mattip
Copy link
Member
mattip commented Sep 19, 2019

Let's put this in and not try to untangle the legacy code at this time. If down the road we run into more, we should discuss reverting the code before digging too much into the exact problem.

@mattip mattip merged commit 66fa2d7 into numpy:master Sep 19, 2019
@WarrenWeckesser WarrenWeckesser deleted the binomial-regression branch September 19, 2019 23:30
@charris charris added 06 - Regression and removed 09 - Backport-Candidate PRs tagged should be backported labels Sep 22, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

np.random.binomial produces inconsistent output with same seed in 1.16 -> 1.17
5 participants
0