ENH: change default empty array dtype, currently 'float64' · Issue #10405 · numpy/numpy · GitHub

Open
nschloe opened this issue Jan 14, 2018 · 11 comments

Comments

@nschloe
Contributor
nschloe commented Jan 14, 2018

(From #10135.)

In NumPy, empty arrays have the default data type numpy.float64.

print(numpy.array([]).dtype)  # float64

I'm questioning whether this is the right choice.

In principle, the default dtype choice for empty arrays is arbitrary since there is no data to work with. One good reason to choose a small data type is that operations with empty arrays should not needlessly augment the dtype. The current setting leads to rather surprising dtype changes like

print(numpy.concatenate([[0], []]))  # [0.], not [0]

This would not have happened with, e.g., numpy.array([]).dtype == numpy.int8.
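The promotion described above can be made concrete with `np.result_type`, NumPy's documented dtype-promotion rule, which is what `concatenate` follows when its inputs disagree. A minimal sketch:

```python
import numpy as np

# concatenate promotes to the common dtype of its inputs; the empty
# array contributes float64, so the result is float64
a = np.array([0])   # platform default integer dtype
b = np.array([])    # float64 by default
print(np.result_type(a, b))          # float64
print(np.concatenate([a, b]).dtype)  # float64

# with an explicit integer dtype on the empty array, the integer
# dtype of the non-empty input survives the promotion
b_int = np.array([], dtype=np.int8)
print(np.concatenate([a, b_int]).dtype)
```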

@pv
Member
pv commented Jan 14, 2018

The default data type is also float64 for zeros, empty, et al.
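This is easy to verify for the standard array-creation functions:

```python
import numpy as np

# float64 is the default across the array-creation functions,
# not just for np.array([])
for factory in (np.zeros, np.ones, np.empty):
    print(factory(3).dtype)  # float64 each time
```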

It's also useful to keep in mind that the NumPy API is no longer as open to change as it was 10 years ago when it was established, and small cosmetic changes (which are usually subjective) are more likely to be harmful than useful, in that they can break many existing codebases.

@seberg
Member
seberg commented Jan 14, 2018

I do not even believe there is a reasonable "minimal dtype". Even if a change were reasonably possible, I doubt it makes sense. Yes, this is sometimes annoying, but no, changing the default type is not the solution. Generally, I personally think the solution is: too bad, sometimes the user has to put in the dtype explicitly by doing the array cast by hand.
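The explicit-cast workaround mentioned above looks like this in practice: state the intended dtype at creation time instead of relying on the float64 default.

```python
import numpy as np

# Give the empty array the dtype you actually mean, and the
# surprising float64 promotion never happens
empty = np.array([], dtype=int)
result = np.concatenate([[0], empty])
print(result, result.dtype)  # [0] with a platform integer dtype
```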

@nschloe
Contributor Author
nschloe commented Jan 15, 2018

Thank you very much for the comments. I agree with some of them and disagree with others.

> Yes, this is sometimes annoying, but no, changing the default type is not the solution,

> and small cosmetic changes (which usually are subjective)

I presented one example where changing the dtype does fix a problem. The change is certainly not cosmetic.

> It's also useful to keep in mind that the NumPy API is no longer as open to change as it was 10 years ago when it was established, and small cosmetic changes (which are usually subjective) are more likely to be harmful than useful, in that they can break many existing codebases.

Indeed, changing something can always lead to problems downstream, and that is particularly risky if there is no estimate whatsoever of how much could break. I would suggest that I make the change locally and compile a handful of large numerical libraries against it (e.g., scipy, sympy, pandas, scikit-learn, and astropy). If too many downstream changes are required, I'd say let's forget about it. If nothing breaks, perhaps we could continue discussing.

@seberg
Member
seberg commented Jan 15, 2018

Maybe it is not just cosmetic, but I disagree with the solution. It has a bad smell to me: basically it seems like replacing one problem with another, and yes, that might fix many problems, but it will also create weird new ones (of course those might be better, but they might also be trickier?!).
There are cases (such as concatenate) where you can assume that all arguments are probably supposed to have the same dtype, or cases like: this is an indexing array, so we can probably assume an integer is wanted. However, I believe those cases are limited to specific scenarios, and you are proposing a scenario-agnostic solution, which I do not believe in.

@nschloe
Contributor Author
nschloe commented Jan 16, 2018

Thanks again for the comment.

Let me first say that I'm not emotionally attached to the fix; it's just an inconsistency I noticed which I thought could be fixed easily enough. I can also understand that a package like NumPy takes a rather conservative approach to software development. When rejecting, it certainly makes things clearer if a small counterexample is given, something along the lines of:

> The suggestion may fix one thing, but unfortunately it changes the behavior of `something(else)`, so it's probably not worth fixing it this way.

That said, feel free to close.

@seberg
Member
seberg commented Jan 17, 2018

Well, I do not have a very concrete example, but if you sum up an empty array or use its dtype for anything specific, you will then get an integer dtype (possibly a small integer).
This whole thing is just guessing; int8 will in many cases be a more useful guess, but it is still a guess. And it is a dangerous type if you actually manage to use it somehow (read: most users do not want it or even really understand it well).
So yeah, maybe it is even a better guess on average, but I think it can also be an even more confusing guess sometimes.
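The "use its dtype for anything specific" point can be illustrated with a reduction over an empty array; under the current default the result is a float, while under a hypothetical small-integer default it would be an integer (note that `sum()` itself promotes small integer dtypes to a platform-sized integer):

```python
import numpy as np

# Today, reductions over empty arrays inherit the float64 default:
total = np.array([]).sum()
print(total, total.dtype)  # 0.0 float64

# With an explicit int8 empty array, the sum is an integer instead;
# sum() promotes int8 to a platform-sized integer for the accumulator
print(np.array([], dtype=np.int8).sum().dtype)
```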

@godaygo
Contributor
godaygo commented Jan 19, 2018

While I agree with the initial idea that the float type is not the best default choice, in the case of an integer type, especially int8, the following result will be very strange to many users, although correct in nature:

# hypothetical: default dtype is np.int8
>>> np.concatenate([[100], []]) + np.concatenate([[100], []])
array([-56], dtype=int8)

But, having said this, I still believe that choosing NumPy's default integer type as the default would be more reasonable to me.
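The wrap-around in the example above can be reproduced today with explicit int8 arrays, without changing any default: 100 + 100 = 200 does not fit in int8 and wraps to 200 − 256 = −56.

```python
import numpy as np

# int8 holds values in [-128, 127]; array arithmetic wraps on overflow
a = np.array([100], dtype=np.int8)
print(a + a)  # [-56]
```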

@mattip mattip changed the title empty array has needlessly large data type ENH: change default empty array dtype, currently 'float64' Jun 21, 2018
@mattip mattip added 01 - Enhancement 54 - Needs decision 57 - Close? Issues which may be closable unless discussion continued labels Jun 21, 2018
@hmaarrfk
Contributor
hmaarrfk commented Oct 6, 2018

I think the toy example could be made more explicit to avoid this unexpected behaviour.

index_array_a = np.asarray([0], dtype=int)
index_array_b = np.asarray([], dtype=int)
np.concatenate([index_array_a, index_array_b])

Seems to solve that particular edge case at least :S

If list concatenation is all that is needed, then [0] + [] should be used.

I would be wary of stating that int8 is faster than float. I've tried some simple benchmarks and sometimes didn't find much of a difference.

@hmaarrfk
Contributor
hmaarrfk commented Oct 6, 2018

very similar discussion happening here: #1586

@mattip mattip added 23 - Wish List and removed 54 - Needs decision 57 - Close? Issues which may be closable unless discussion continued labels Apr 26, 2019
@rusu24edward

> I think the toy example could be made more explicit to avoid this unexpected behaviour.
>
>     index_array_a = np.asarray([0], dtype=int)
>     index_array_b = np.asarray([], dtype=int)
>     np.concatenate([index_array_a, index_array_b])

Would it be worthwhile to have dtype as an additional argument in concatenate?

@seberg
Member
seberg commented Sep 3, 2020

Pretty much just requires a merge: gh-16134 (although in the meantime a merge conflict has appeared).

EDIT: Although, since that still requires casting, I am not sure it would help you much; it would still be a floating-point array, so it is an unsafe cast to integer. That PR also adds casting="unsafe", but the question is whether that is much nicer in that use case. (It is still an important thing, because it was discussed as a prerequisite for other things.)
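For readers landing here later: the `dtype=` and `casting=` arguments added by gh-16134 shipped in NumPy 1.20, so the original example can now be written as follows (the empty list still becomes float64 internally, which is why the cast to integer must be opted into with `casting="unsafe"`):

```python
import numpy as np

# Requires NumPy >= 1.20, where concatenate gained dtype= and casting=.
# float64 -> int64 is an unsafe cast, so it must be requested explicitly.
out = np.concatenate([[0], []], dtype=np.int64, casting="unsafe")
print(out, out.dtype)  # [0] int64
```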
