
BUG: Inconsistent type resolution for 0d arrays #10322

Closed
eric-wieser opened this issue Jan 4, 2018 · 11 comments

eric-wieser (Member) commented Jan 4, 2018

Using direct ufunc calls below to eliminate operator overloading from the test case:

>>> one_eps = 1.00000001
>>> np.greater_equal(np.array(1.0, np.float32), np.array(one_eps, np.float64))
False
>>> np.greater_equal(np.array([1.0], np.float32), np.array(one_eps, np.float64))
array([ True])  # wrong!
>>> np.greater_equal(np.array([1.0], np.float32), np.array([one_eps], np.float64))
array([False])
>>> np.greater_equal(np.array(1.0, np.float32), np.array([one_eps], np.float64))
array([False])

Caused by result_type having some unusual rules:

>>> np.result_type(np.float32, np.float64)
dtype('float64') # ok
>>> np.result_type(np.float32, np.float64([1]))
dtype('float64') # ok
>>> np.result_type(np.float32, np.float64(1))
dtype('float32') # what

This seems wrong to me, but appears to be deliberate.
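
The same decay applies to 0d arrays (hence the issue title). A minimal sketch of the legacy behavior:

>>> np.result_type(np.float32, np.array(1.0, np.float64))
dtype('float32')  # the 0d array is demoted to its scalar value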

It seems the goal here is to interpret Python scalars as loosely typed (in the presence of arrays). This handles things like the following (see the sketch after this list):

  • letting integer literals choose between signed and unsigned
  • letting number literals have smaller representations than float64 and int_ if they'd fit
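
A sketch of these value-based rules (outputs as observed on NumPy < 2.0, before NEP 50):

>>> np.result_type(np.uint8, 1)      # a positive literal can stay unsigned
dtype('uint8')
>>> np.result_type(np.uint8, -1)     # a negative literal forces a signed type
dtype('int16')
>>> np.result_type(np.float32, 0.5)  # a float literal does not force float64
dtype('float32')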

However, the method is too greedy, and also (in the presence of arrays) discards type information from:

  1. np.number instances with an explicit type, like np.int64(1) and np.float32(1)
  2. 0d ndarray instances, by decaying them to their scalars

The problem is that PyArray_ResultType (and the ufunc type resolver) has lost the information about where its array arguments came from, and whether their types are explicit or implicit.

eric-wieser changed the title from "BUG: Inconsistent type resolution for float32 arrays" to "BUG: Inconsistent type resolution for 0d arrays" on Jan 4, 2018
seberg (Member) commented Jan 4, 2018

Yeah, this stuff is pretty annoying... But I think we would need to do a major release (hehe, we can do those anyway ;)) to change it another time.

I suppose we could aim, in principle, for 0d arrays being interpreted as arrays here. In practice, I am sure you would also need to fix the ufunc machinery to actually signal when an argument was a scalar.

Also, as long as 0d arrays are second-class citizens compared to scalars, I do not really see this improving much, personally. My opinion is still that, in the very long run, we should aim for both 0d arrays and scalars (the latter not normally mutable), where things like arr.sum() would return a scalar while arr.sum((0, 1)) would return a 0d array. I actually believe that this is transparent, though it's probably just consistent, and that makes me think it is transparent ;).

mhvk (Contributor) commented Jan 4, 2018

Duplicate of #9194. It is probably worth noting that one can avoid the problem by also passing a dtype to the ufunc.

It is also worth noting that the solution is not that obvious. E.g., the following would generally be the expected result:

np.equal(np.arange(0., 0.3, 0.1, dtype=np.float32), 0.1)
# array([False,  True, False], dtype=bool)

At least, I think we'd get even more bug reports if the following was the default!

np.equal(np.arange(0., 0.3, 0.1, dtype=np.float32), 0.1, dtype=float)
# array([False, False, False], dtype=bool)

eric-wieser (Member, Author) commented Jan 4, 2018

> E.g., the following would generally be the expected result:

Right, but I'd argue that's because it uses the float type for 0.1. I'd also expect:

np.equal(np.arange(0., 0.3, 0.1, dtype=np.float32), np.float64(0.1))
# array([False, False, False], dtype=bool)

seberg (Member) commented Jan 4, 2018

Hehe, true, but equality with floats is a tricky beast in any case, and I believe nobody is suggesting to change it for the Python scalar case; possibly for np.equal(np.arange(0., 0.3, 0.1, dtype=np.float32), np.float64(0.1)), or at least for np.equal(np.arange(0., 0.3, 0.1, dtype=np.float32), np.array(0.1)).

Both of which are of course a bit annoying, since they break the identity that casting everything to arrays first with np.array gives the same result.

eric-wieser (Member, Author) commented:

@mhvk: I'll leave it to you to decide whether to close this one or the old one.

mhvk (Contributor) commented Jan 4, 2018

Closed #9194, so we can discuss in one place.

It does seem to me array scalars should always be treated like arrays.

My ideal future would have gotten rid of numpy scalars altogether - use array scalars mostly and python numbers when needed - but perhaps I am missing what they can be useful for.

eric-wieser added a commit to eric-wieser/numpy that referenced this issue Feb 2, 2018
Fixes numpy#8123, closes numpy#9189, fixes numpy#10319

This is a workaround to numpy#10322, not a fix for it.

Adds tests for cases where bounds are more precise than the data, which led to inconsistencies in the optimized path.
hanjohn pushed a commit to hanjohn/numpy that referenced this issue Feb 15, 2018 (same commit message as above)
seberg (Member) commented Mar 6, 2021

So my current thought was that all examples here would be "fixed", but the Python operators might stay unmodified:

arr = np.arange(0., 0.3, 0.1, dtype=np.float32)
arr == 0.1

would actually return True for the second value (but == [0.1] would not, and neither would np.equal(arr, 0.1)!). This can also be slightly crazy for integers, more so than currently. It also won't be quite trivial when you look at things like the / operator, where np.result_type is not always correct, so that non-trivial promotion is necessary, probably on the ufunc, even though the ufunc might never use it itself...
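
Spelled out as a sketch (the outputs here are the hypothetical behavior under this proposal, not what NumPy did at the time):

arr = np.arange(0., 0.3, 0.1, dtype=np.float32)
arr == 0.1          # operator keeps value-based casting: array([False,  True, False])
arr == [0.1]        # the list becomes a float64 array:   array([False, False, False])
np.equal(arr, 0.1)  # the ufunc would be fixed:           array([False, False, False])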

seberg added the "triage review" label (Issue/PR to be discussed at the next triage meeting) Mar 10, 2021
seberg removed the "triage review" label Mar 24, 2021
seberg (Member) commented Mar 24, 2021

We had discussed this a bit (two weeks ago). I think we had some consensus that we should probably just do it (but it will probably need testing).

By now, I am almost convinced that simply changing our "casting/promotion" rules to fix this everywhere (also in normal ufunc calls) might be the best thing. When NumPy scalars are involved, that would usually lead to higher precision (less scary); when Python scalars are involved, errors/warnings on conversions should notify users of any changes (so this is also unlikely to bite silently). The worst case is probably:

 np.float32(3) + 3.  # the np.float32 could be created by indexing

This previously gave float64, but with the change would give float32... I actually tried this a bit. SciPy doesn't really notice (two tests or so might, aside from the sparse tests comparing the result with NumPy). astropy did not notice. For pandas I am not sure; it has its own type resolution, which probably needs to deal with it (its test suite seemed noisy, but a lot of that is probably one failure fairly low down).

Compared to that, warning about the future precision change is pretty darn noisy (largely due to this issue).

asmeurer (Member) commented:

For anyone else coming to this issue trying to understand what is happening, the example in the OP is a bit confusing, because it demonstrates the issue indirectly via a function that returns a bool. It's easier to see via a function that directly returns the promoted value, like add.

>>> from numpy import add, array, float32, float64
>>> add(array([1.0], dtype=float32), array([1.0], dtype=float64)).dtype
dtype('float64')
>>> add(array(1.0, dtype=float32), array(1.0, dtype=float64)).dtype
dtype('float64')
>>> add(array(1.0, dtype=float32), array([1.0], dtype=float64)).dtype
dtype('float64')
>>> add(array([1.0], dtype=float32), array(1.0, dtype=float64)).dtype
dtype('float32')

Also, if the resulting value does not fit in float32, the result is upcast (value-based casting):

>>> add(array([1.0], dtype=float32), array(3.4e+38, dtype=float64)).dtype
dtype('float64')

Finally, the issue only concerns 0-d arrays, not scalar objects:

>>> add(float32(1.0), float64(1.0)).dtype
dtype('float64')
>>> add(float32(1.0), array([1.0], dtype=float64)).dtype
dtype('float64')
>>> add(array([1.0]), float64(1.0)).dtype
dtype('float64')
>>> add(array([1.0], dtype=float32), float64(3.4e+38)).dtype
dtype('float64')

seberg (Member) commented Dec 20, 2022

Just to make it easier to find again, NEP 50 is the current attempt to fix these inconsistencies. (And is fairly far along, although it is still possible to modify details.)

seberg (Member) commented Jan 20, 2024

Closing, weak promotion/NEP 50 should make these consistent.
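
For reference, a sketch of the behavior once NEP 50 is the default (NumPy >= 2.0): NumPy scalars and 0d arrays keep their dtype, and only Python scalars remain weakly typed:

>>> np.result_type(np.float32, np.float64(1))
dtype('float64')  # the NumPy scalar's dtype is now respected
>>> np.result_type(np.float32, np.array(1.0, np.float64))
dtype('float64')  # as is the 0d array's
>>> np.result_type(np.float32, 1.0)
dtype('float32')  # Python scalars are "weak" and adopt the array's dtype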

seberg closed this as completed Jan 20, 2024