-
-
Notifications
You must be signed in to change notification settings - Fork 11k
BUG: One Element Array Inputs Return Scalars in np.random #7055
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: One Element Array Inputs Return Scalars in np.random #7055
Conversation
PR is currently WIP because I still need to add a test to make sure that this behavior does not resurface. However, I just want to make sure that my changes aren't breaking any existing tests. |
@@ -1551,18 +1551,17 @@ cdef class RandomState: | |||
cdef double flow, fhigh, fscale | |||
cdef object temp | |||
|
|||
flow = PyFloat_AsDouble(low) | |||
fhigh = PyFloat_AsDouble(high) | |||
if np.array(low).shape == np.array(high).shape == (): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this double comparison valid Cython? Or does it end comparing True
(or False
) to ()
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems to have worked. I tested the methods briefly by hand (more formal testing is coming though!), and I didn't have any issues with the double (or even triple) comparison with regards to output data type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I should add that while I do not know the validity of chained comparisons in Cython, I know it is valid in pure Python. I guess since you can compile pure Python with Cython, it should then be valid by extension? A quick look over the Cython docs does not indicate any issues with handling such chains.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, it is the Python thing, guess what confused me was the double ==
, but it does what it is expected to:
In [184]: 0 == 0 == 0
Out[184]: True
In [185]: (0 == 0) == 0
Out[185]: False
In [186]: 0 == (0 == 0)
Out[186]: False
Tests have been added, and Travis + Appveyor are happy. Should be good to merge as is, as I can uncomment the |
This change makes a call to In [39]: %timeit np.random.uniform(0, 1)
# current master
The slowest run took 19.17 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 199 ns per loop
# this PR
The slowest run took 26.30 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 988 ns per loop Not sure if this is really relevant, as bulk calling e.g. |
@jaimefrio : Should we wait longer, or can this be merged? |
You can get some of the speed back if you change the condition: if np.array(low).shape == np.array(high).shape == (): to: if (not (isinstance(low, np.ndarray) and low.shape) and
not (isinstance(high, np.ndarray) and high.shape)) In [2]: %timeit np.random.uniform(0, 1)
# this PR
1000000 loops, best of 3: 1.04 µs per loop
# my changes
1000000 loops, best of 3: 358 ns per loop
In [3]: %timeit np.random.uniform([0], [1])
# this PR
100000 loops, best of 3: 4.95 µs per loop
# my changes
100000 loops, best of 3: 3.76 µs per loop Not sure if the extra complexity is worth the gains, what do you think? |
@jaimefrio : IMHO, simplicity wins in this case. It seems that the performance scales relatively well from the timing analysis you provided earlier, so obfuscating the code in order to optimize the smallest input case possible does not seem worthwhile here. |
If we go this route, I think we should move the array creation up, rather than redoing it: olow = <ndarray>PyArray_FROM_OTF(low, NPY_DOUBLE, NPY_ARRAY_ALIGNED)
ohigh = <ndarray>PyArray_FROM_OTF(high, NPY_DOUBLE, NPY_ARRAY_ALIGNED)
if olow.shape == ohigh.shape == ():
... It will make a noticeable difference if |
Ah, yes, good point. Done. |
☔ The latest upstream changes (presumably #7082) made this pull request unmergeable. Please resolve the merge conflicts. |
This is going to be painful to rebase, but once you get it done, it should be ready to go. |
…andom Fixes bug in np.random methods that would return scalars when passed one-element array inputs. This is because one-element ndarrays can be cast to integers / floats, which is what functions like PyFloat_AsDouble do before converting to the intended data type. This commit changes the check used to determine whether the inputs are purely scalar by converting all inputs to arrays and checking if the resulting shape is an empty tuple (scalar) or not (array). Closes gh-4263.
@jaimefrio : Rebase is done, and Travis + Appveyor are happy. |
Accidentally clicked the wrong button when I wanted to comment. Whoops. |
BUG: One Element Array Inputs Return Scalars in np.random
Thanks Greg, one more in... |
👍 |
Fixes issue raised in #4263 in which
np.random
methods returned scalars when passed one-element array inputs. This is because one-element ndarrays can be cast to integers / floats, which is what functions likePyFloat_AsDouble
do before converting to the intended data type.This commit changes the check used to determine whether the inputs are purely scalar by, as suggested by @rkern, converting all inputs to arrays and checking if the resulting shape is an empty tuple (scalar) or not (array).