-
-
Notifications
You must be signed in to change notification settings - Fork 10.9k
ENH: tracking issue for merging randomgen into numpy #13164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
xref #13163 |
The first three tasks are completed. |
marked more tasks as completed as #13163 has progressed. Added two more tasks to refactor the namespaces and more tightly integrate documentation |
Should remove |
Edit: reverted this change. We need to discuss this more widely, there still seem to be some use cases for |
@mattip This isn't quite right. You can generate every possible value using randint for a dtype, including one about the max value. random_integers is redundant. |
For example
|
I think the problem comes when the range is a NumPy integer type, which is unavoidable for broadcasting if we go that way. Python ints are fine. In the old generator the c-level routine worked with closed intervals, so all that was needed was |
The problem @charris mentions has come up on github at least once before. Some options I think I remember seeing:
|
The underlying C generation code is the same and works on closed intervals so that there are no issues with types in C. The value checking and downcasting are handled using object dtype for u/int64, and the next largest type for all other types. Broadcasting also works in extreme cases,
|
The danger comes with code like
How does this behave? |
On the other hand
Now if we want to get rid of randint and only have a bounded interval generator, then that would allow the code to be simplified. |
I still think
In [6]: np.random.random_integers(0,np.int64(2**63-1))
/home/kevin/miniconda3/bin/ipython:1: DeprecationWarning: This function is deprecated. Please call randint(0, 9223372036854775807 + 1) instead
#!/home/kevin/miniconda3/bin/python
/home/kevin/miniconda3/bin/ipython:1: RuntimeWarning: overflow encountered in long_scalars
#!/home/kevin/miniconda3/bin/python
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-6-a7b97ffba7f7> in <module>
----> 1 np.random.random_integers(0,np.int64(2**63-1))
mtrand.pyx in mtrand.RandomState.random_integers()
mtrand.pyx in mtrand.RandomState.randint()
ValueError: Range cannot be empty (low >= high) unless no samples are taken
|
The closed=True seems like the easiest fix. Would require virtually no changes except for the u/int64 paths. |
Should I mean only in RandomGenerator, not in RandomState |
|
I went quickly through the issues and PRs mentioned above and closed the ones I could do without too much thinking. I will make another pass through soonish. There are still a few open, as well as all those labelled with |
Do you think some of these need additional tests? |
I think #7861 can be closed now since one can use the new random API in low-level Cython code. |
Summary for the present status: Looking at the issues and PRs marked with
A number of other issues propose enhancements and clarifications on best practices, like #6132, or #9650 that can be done on an ongoing basis. |
I think zipf (and related) needs a |
To clarify a bit, the sample scaling is one problem, the other is that with values of |
A couple more things:
|
Oops, a slightly old version of that got copy-pasted. I also tried This is how I envisioned The # Each of these would be valid ways to instantiate a Philox BitGenerator.
np.random.Philox() # -> Philox(SeedSequence())
np.random.Philox(None) # -> Philox(SeedSequence(None))
np.random.Philox(12345) # -> Philox(SeedSequence(12345)
np.random.Philox([314159, 271828]) # -> Philox(SeedSequence([314159, 271828]))
np.random.Philox(SeedSequence(12345))
np.random.Philox(MyOwnSeedSequenceImplementation(12345))
np.random.Philox({'counter': np.array([0x01, 0, 0, 0], dtype=np.uint64),
'key': np.array([ 7639784375349706623, 18058138874403901375], dtype=np.uint64)}) Why the elif not hasattr(bit_generator, 'capsule'):
bit_generator = _MT19937(LegacyMTSeedSequence(bit_generator))` Other generator authors also provide their own recommended ways of expanding integer seeds out to internal states. I'm not concerned about replacing those with We discussed the spawning in the most recent Development Meeting. I think the general consensus was not to expose the API at the level of the I'm also happy to drop the |
In addition to changing
@bashtage: any thoughts? Especially around the idea to continue to maintain a repo of alternative BitGenerators that would not be directly part of this repo? |
As much as I like the SeedSequence idea, I am a bit unsure whether we need to expose it. I understand the argument that it hides I suppose the argument of being able to replace the actual seeding procedure is valid, but I am unsure if it is an actual likely use case. Or is there an API reason for wanting to use a SeedSequence for anything more then On the other hand, if we use it internally anyway, we could just deprecate it at some point and keep it around indefinitely without much harm.... |
To me, the costs entailed by exposing it are not much. Since the Since exposing I think the |
I am:
I think #6132 is . #9650 is solved using SeedSequence although it isn't automatic I think. |
I would like to close this issue, since it has served its purpose while we were intensively working on random. Not all the issues referenced are closed, but I don't think keeping this open is raising the chance of closing those. Please reopen with more details if you see a purpose in keeping this open. |
Merge bashtage/randomgen into numpy, as part of NEP 19.
python -mcython infile outfile
with no ability to add extra command line argsPost merge checklist
Edit: add more tasks (replace mtrand, fix TODOs)
Edit: add NEP check
Edit: add post-merge tasks
The text was updated successfully, but these errors were encountered: