default data-type allocation for arrays could be misleading and harmful · Issue #18624 · numpy/numpy · GitHub

default data-type allocation for arrays could be misleading and harmful #18624


Closed
h4m3d43 opened this issue Mar 16, 2021 · 8 comments

Comments

@h4m3d43
h4m3d43 commented Mar 16, 2021

Reproducing code example:

import numpy as np

a = np.array([1, 2, 3])
b = np.array([0.5, 0.5, 0.5])

c = a - b
# c = [0.5, 1.5, 2.5]

# problem is here:
n = np.array([0, 0, 0])  # n is an integer array, so assigned floats are silently truncated
n[0] = c[0]
n[1] = c[1]
n[2] = c[2]

print('n', n)
# will print [0, 1, 2] without any exception
# must print [0.5, 1.5, 2.5] or give some warning etc.
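
One way to make the mismatch visible is to check the cast before assigning; a minimal sketch using np.can_cast (with its default "safe" casting rule), assuming the arrays from the example above:

import numpy as np

n = np.array([0, 0, 0])        # int64 array
c = np.array([0.5, 1.5, 2.5])  # float64 array

# float64 -> int64 is not a "safe" cast, so this prints False
print(np.can_cast(c.dtype, n.dtype))

n[0] = c[0]  # ...yet item assignment still casts silently, truncating 0.5 to 0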

Error message:

NumPy/Python version information:

@bashtage
Contributor

I think this falls into the category of things that can never change. Or perhaps better described: this is an intentional feature.

@seberg
Member
seberg commented Mar 16, 2021

I don't think we can change the default data-type. I would go as far as to argue that it's convenient that the default is integer (when all inputs are integers).

The second point seems possible to me (a warning when the assignment is unsafe). It would be a bit similar to the ComplexWarning we give when you cast complex to float.
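
For comparison, a minimal sketch of the ComplexWarning case mentioned above (the cast warns but still proceeds):

import numpy as np

z = np.array([1 + 2j])
f = z.astype(np.float64)  # emits ComplexWarning: the imaginary part is discarded
print(f)                  # [1.]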

@IchiruTake
(quotes the reproducing code example from the original report)

I am not sure about this; I suppose NumPy converts your new data to match the array's defined datatype (NOT a default datatype). I am sure that casting a single value to fit the existing datatype is much faster than changing the datatype of the whole array, especially when the array is large, e.g. 50000 rows × 1000 columns.

Moreover, this behavior ensures that your data stays consistent with the other values in the array.
You can solve it with n = n.astype("float32"). It would help.
This is not a bug, and thus does not need to be fixed.
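
A minimal sketch of that workaround, assuming the arrays from the original example; astype returns a new floating-point copy, so later assignments keep the fractional part:

import numpy as np

n = np.array([0, 0, 0])        # int64
c = np.array([0.5, 1.5, 2.5])  # float64

n = n.astype("float32")        # new float32 array (the int array is not modified in place)
n[0] = c[0]
print(n)                       # [0.5 0.  0. ]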

@h4m3d43
Author
h4m3d43 commented Mar 17, 2021

We would get the desired output if we used a Python list instead of a NumPy array for defining n.
A list will take on the data-type of whatever values are assigned to it.
I still think this is a troublesome issue.
It took me hours to find where the problem was, even though my code was not very complicated.
If NumPy arrays cannot be handled like lists, it would be better to at least raise a warning.
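
A minimal sketch of the difference: a Python list stores arbitrary objects per slot, while a NumPy array has one fixed dtype for all elements:

import numpy as np

c = np.array([0.5, 1.5, 2.5])

lst = [0, 0, 0]
lst[0] = c[0]              # the list slot simply holds the float object
arr = np.array([0, 0, 0])
arr[0] = c[0]              # the float is cast to the array's fixed integer dtype

print(lst[0], arr[0])      # 0.5 0 -> the list kept the fraction, the array truncated it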

@seberg
Member
seberg commented Mar 17, 2021

@hamed4343 yeah, there have even been code comments around this type of thing for years, saying things like "We really should be using same-kind casting here" (which would give you an error in most of these cases when doing things like arr[indx] = 3.5 for an integer array).
I don't think we can do that, but a warning that can be turned off may be possible. Although, I am not sure it will get high enough on the priority list for anyone to spend serious time on it right now, and it will need some careful testing. There might be quite a lot of code out there relying on the fact that NumPy quietly does unsafe casts in these cases. That doesn't make it impossible, but it will be slow to figure out.
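
For what it is worth, np.copyto already uses same-kind casting by default, so the whole-array version of this assignment does raise; a minimal sketch with the arrays from the original example:

import numpy as np

n = np.array([0, 0, 0])        # int64
c = np.array([0.5, 1.5, 2.5])  # float64

# np.copyto defaults to casting='same_kind' and rejects float -> int:
# TypeError: Cannot cast array data from dtype('float64') to dtype('int64')
# according to the rule 'same_kind'
np.copyto(n, c)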

@bashtage
Contributor

There might be quite a lot of code out there relying on the fact that NumPy quietly does unsafe casts in these cases.

I think there is. I've seen (and probably written) code that performs some kind of rounding (usually floor) and then stores a float in an integer array.

import numpy as np

x = np.array([1, 2, 3])
x[-1] = np.floor(3 * np.random.rand())  # a float is silently cast into the int array

When typing is more complete, so that it becomes common to tell type checkers that x is an integer array, and this starts showing up as a type-checking error, then such code will probably get fixed. Until then there is not much motivation.
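
An explicit variant of that pattern that would survive stricter casting rules (a hypothetical rewrite, converting to int before the assignment so no float-to-int cast is left to NumPy):

import numpy as np

x = np.array([1, 2, 3])
x[-1] = int(np.floor(3 * np.random.rand()))  # int -> int, safe under any casting rule
print(x)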

@seberg
Member
seberg commented Mar 17, 2021

Yeah, I am not sure it's feasible considering the downstream impact. And right now there are more important things that will also annoy downstream, and we try to avoid doing too much of that at once. One way to tone it down – and a cool feature – might be to make it an "unsafe-but-warn-on-loss" casting level (is float64 -> float32 lossy there?). That would not really be a casting level, though, since it is a parameter for the actual cast loop! (It has to inspect the values as it goes; casting safety is defined only based on the dtypes themselves – or should be. Unless you limit this to some scalars only, at least.)

That would be more like adding a parameter to np.exp to allow it to use a less precise but faster loop, so it is probably a "maybe in a longer while"... There would be a lot to figure out first, but I would not be surprised if that is a fairly important feature to grow before long.
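
To make the value-dependence concrete: a hypothetical helper (not an existing NumPy API) that decides whether a cast loses information for the particular values involved, by round-tripping them:

import numpy as np

def cast_is_lossy(arr, dtype):
    # Hypothetical check: True if casting these particular values loses information.
    round_trip = arr.astype(dtype).astype(arr.dtype)
    return not np.array_equal(round_trip, arr)

a = np.array([0.5, 1.5, 2.5])
print(cast_is_lossy(a, np.int64))    # True: the fractional parts are lost
print(cast_is_lossy(a, np.float32))  # False: these values are exact in float32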

@seberg
Member
seberg commented Apr 14, 2021

Thanks for the report. It is very unlikely that we change the way this currently works. We could change how assignment behaves, but that is a duplicate of e.g. gh-8733.

@seberg seberg closed this as completed Apr 14, 2021