default data-type allocation for arrays could be misleading and harmful · Issue #18624 · numpy/numpy · GitHub

default data-type allocation for arrays could be misleading and harmful #18624


Closed
h4m3d43 opened this issue Mar 16, 2021 · 8 comments

Comments

@h4m3d43
h4m3d43 commented Mar 16, 2021

Reproducing code example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([0.5, 0.5, 0.5])

c = a - b
# c = [0.5, 1.5, 2.5]

# problem is here:
n = np.array([0, 0, 0])  # n is an integer array, so assigned floats are silently truncated
n[0] = c[0]
n[1] = c[1]
n[2] = c[2]

print('n', n)
# will print [0, 1, 2] without any exception
# must print [0.5, 1.5, 2.5] or give some warning etc.
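
One way to make the mismatch visible is to check the cast before assigning; a minimal sketch using np.can_cast (with its default "safe" casting rule), assuming the arrays from the example above:

import numpy as np

n = np.array([0, 0, 0])        # int64 array
c = np.array([0.5, 1.5, 2.5])  # float64 array

# float64 -> int64 is not a "safe" cast, so this prints False
print(np.can_cast(c.dtype, n.dtype))

n[0] = c[0]  # ...yet item assignment still casts silently, truncating 0.5 to 0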

Error message:

NumPy/Python version information:

@bashtage
Contributor

I think this falls into the category of things that can never change. Or perhaps better described: this is an intentional feature.

@seberg
Member
seberg commented Mar 16, 2021

I don't think we can change the default data-type. I would go as far as to argue that it's convenient that the default is integer (when all inputs are integers).

The second point seems possible to me (a warning when the assignment is unsafe). It would be a bit similar to the ComplexWarning we give when you cast complex to float.
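
For comparison, a minimal sketch of the ComplexWarning case mentioned above (the cast warns but still proceeds):

import numpy as np

z = np.array([1 + 2j])
f = z.astype(np.float64)  # emits ComplexWarning: the imaginary part is discarded
print(f)                  # [1.]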

@IchiruTake
(quotes the reproducing code example from the original report)

I am not sure about this; I suppose NumPy converts your new data to match the array's defined datatype (NOT a default datatype). I am sure that casting a single value to fit the existing datatype is much faster than changing the datatype of the whole array, especially when the array is large, e.g. 50000 rows × 1000 columns.

Moreover, this behavior ensures that your data stays consistent with the other values in the array.
You can solve it with n = n.astype("float32"). It would help.
This is not a bug, and thus does not need to be fixed.
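
A minimal sketch of that workaround, assuming the arrays from the original example; astype returns a new floating-point copy, so later assignments keep the fractional part:

import numpy as np

n = np.array([0, 0, 0])        # int64
c = np.array([0.5, 1.5, 2.5])  # float64

n = n.astype("float32")        # new float32 array (the int array is not modified in place)
n[0] = c[0]
print(n)                       # [0.5 0.  0. ]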

@h4m3d43
Author
h4m3d43 commented Mar 17, 2021

We would get the desired output if we used a Python list instead of a NumPy array for defining n.
A list will take on the data-type of whatever values are assigned to it.
I still think this is a troublesome issue.
It took me hours to find where the problem was, even though my code was not very complicated.
If NumPy arrays cannot be handled like lists, it would be better to at least raise a warning.
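
A minimal sketch of the difference: a Python list stores arbitrary objects per slot, while a NumPy array has one fixed dtype for all elements:

import numpy as np

c = np.array([0.5, 1.5, 2.5])

lst = [0, 0, 0]
lst[0] = c[0]              # the list slot simply holds the float object
arr = np.array([0, 0, 0])
arr[0] = c[0]              # the float is cast to the array's fixed integer dtype

print(lst[0], arr[0])      # 0.5 0 -> the list kept the fraction, the array truncated it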

@seberg
Member
seberg commented Mar 17, 2021

@hamed4343 yeah, there have even been code comments around this type of thing for years, saying things like "We really should be using same-kind casting here" (which would give you an error in most of these cases when doing things like arr[indx] = 3.5 for an integer array).
I don't think we can do that, but a warning that can be turned off may be possible. Although, I am not sure it will get high enough on the priority list for anyone to spend serious time on it right now, and it will need some careful testing. There might be quite a lot of code out there relying on the fact that NumPy quietly does unsafe casts in these cases. That doesn't make it impossible, but it will be slow to figure out.
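
For what it is worth, np.copyto already uses same-kind casting by default, so the whole-array version of this assignment does raise; a minimal sketch with the arrays from the original example:

import numpy as np

n = np.array([0, 0, 0])        # int64
c = np.array([0.5, 1.5, 2.5])  # float64

# np.copyto defaults to casting='same_kind' and rejects float -> int:
# TypeError: Cannot cast array data from dtype('float64') to dtype('int64')
# according to the rule 'same_kind'
np.copyto(n, c)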

@bashtage
Contributor

There might be quite a lot of code out there relying on the fact that NumPy quietly does unsafe casts in these cases.

I think there is. I've seen (and probably written) code that performs some kind of rounding (usually floor) and then stores a float in an integer array.

import numpy as np

x = np.array([1, 2, 3])
x[-1] = np.floor(3 * np.random.rand())  # a float is silently cast into the int array

When typing is more complete, so that it becomes common to tell type checkers that x is an integer array, and this starts showing up as a type-checking error, then such code will probably get fixed. Until then there is not much motivation.
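
An explicit variant of that pattern that would survive stricter casting rules (a hypothetical rewrite, converting to int before the assignment so no float-to-int cast is left to NumPy):

import numpy as np

x = np.array([1, 2, 3])
x[-1] = int(np.floor(3 * np.random.rand()))  # int -> int, safe under any casting rule
print(x)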

@seberg
Member
seberg commented Mar 17, 2021

Yeah, I am not sure it's feasible considering the downstream impact. And right now there are more important things that will also annoy downstream, and we try to avoid doing too much of that at once. One way to tone it down – and a cool feature – might be to make it an "unsafe-but-warn-on-loss" casting level (is float64 -> float32 lossy there?). That would not really be a casting level, though, since it is a parameter for the actual cast loop! (It has to inspect the values as it goes; casting safety is defined only based on the dtypes themselves – or should be. Unless you limit this to some scalars only, at least.)

That would be more like adding a parameter to np.exp to allow it to use a less precise but faster loop, so it is probably a "maybe in a longer while"... There would be a lot to figure out first, but I would not be surprised if that is a fairly important feature to grow before long.
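
To make the value-dependence concrete: a hypothetical helper (not an existing NumPy API) that decides whether a cast loses information for the particular values involved, by round-tripping them:

import numpy as np

def cast_is_lossy(arr, dtype):
    # Hypothetical check: True if casting these particular values loses information.
    round_trip = arr.astype(dtype).astype(arr.dtype)
    return not np.array_equal(round_trip, arr)

a = np.array([0.5, 1.5, 2.5])
print(cast_is_lossy(a, np.int64))    # True: the fractional parts are lost
print(cast_is_lossy(a, np.float32))  # False: these values are exact in float32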

@seberg
Member
seberg commented Apr 14, 2021

Thanks for the report. It is very unlikely that we change the way this currently works. We could change how assignment behaves, but that is a duplicate of e.g. gh-8733.

@seberg seberg closed this as completed Apr 14, 2021