ENH: add np.stack #5605
Conversation
    if axis < 0:
        axis += result_ndim
    sl = (slice(None),) * axis + (_nx.newaxis,)
    return _nx.concatenate([arr[sl] for arr in arrays], axis=axis)
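As background for readers: the index tuple built above inserts a new length-1 axis at position `axis`. A minimal self-contained sketch (the helper name `insert_axis` is mine, not from the PR):

```python
import numpy as np

def insert_axis(arr, axis):
    # Build a tuple of full slices up to `axis`, then np.newaxis,
    # mirroring the indexing trick in the implementation under review.
    sl = (slice(None),) * axis + (np.newaxis,)
    return arr[sl]

a = np.zeros((3, 4))
print(insert_axis(a, 0).shape)  # (1, 3, 4)
print(insert_axis(a, 1).shape)  # (3, 1, 4)
print(insert_axis(a, 2).shape)  # (3, 4, 1)
```

This is the same dimension insertion that `np.expand_dims` performs.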
I don't trust this on first sight (but if you have a test, ignore the comment). What if the second array has more dimensions than the first one?

One suggestion for the API: how about adding an ndmin kwarg? That way hstack, etc. would really be just special cases of this function, I believe (though whether we should implement them as such, I don't know).

EDIT: Frankly, I am not sure ndmin makes sense; it might be a bad design choice (and the fact that hstack does it doesn't make it better).
To me at least, it would be clearer if one were to do:

    new_shape = arrays[0].shape[:axis] + (1,) + arrays[0].shape[axis:]
    return _nx.concatenate([arr.reshape(new_shape) for arr in arrays], axis=axis)

(reshape does not accept newaxis, so the inserted axis is written as a literal 1 here, and tuple concatenation is used since shapes are tuples).
(Note that with the broadcasting suggestion, it might be faster to use the machinery from #5371 to get the broadcast shape at the top, and then use broadcast_to(arr, new_shape) here. Alternatively, use b = np.broadcast(*arrays) above, and new_shape = b.shape[:axis] ... here.)
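A runnable sketch of the reshape-based variant floated here (the function name `stack_via_reshape` is mine; since `reshape` does not accept `np.newaxis`, the inserted axis is written as a literal 1):

```python
import numpy as np

def stack_via_reshape(arrays, axis=0):
    # Insert a length-1 axis via reshape, then join along it.
    arrays = [np.asanyarray(a) for a in arrays]
    shape = arrays[0].shape
    new_shape = shape[:axis] + (1,) + shape[axis:]
    return np.concatenate([a.reshape(new_shape) for a in arrays], axis=axis)

x = np.arange(6).reshape(2, 3)
y = np.arange(6, 12).reshape(2, 3)
print(stack_via_reshape([x, y], axis=0).shape)  # (2, 2, 3)
print(stack_via_reshape([x, y], axis=1).shape)  # (2, 2, 3)
```

Note this sketch, like the indexing version, assumes all inputs share the first array's shape.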
@seberg I find the ndmin behavior of the {v,h,d}stack functions confusing. I would say that the lack of automatic rules for adding dimensions is a feature of this function :).

@mhvk My only hesitation with using reshape is that I know it can result in copies instead of views in some edge cases. So, I usually stick to indexing to be safe.
@shoyer - actually, looking more closely, both your method and mine can lead to unexpected errors when a later array has fewer dimensions than the first one, as in:

    np.stack([np.arange(9.).reshape(3, 3), np.arange(3)], axis=2)

At least, I would think you would get sl = [:, :, np.newaxis], in which case you get an IndexError: too many indices on the second array (rather than the much more informative ValueError that concatenate would throw).

Though it would perhaps be needlessly expensive to test for this beforehand. Maybe just:

    try:
        return _nx.concatenate(...)
    except IndexError:
        raise ValueError(<like concatenate>)
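The failure mode described above is easy to reproduce with the index-tuple approach (a standalone sketch, not the PR's final code):

```python
import numpy as np

axis = 2
sl = (slice(None),) * axis + (np.newaxis,)

a = np.arange(9.).reshape(3, 3)
b = np.arange(3)  # fewer dimensions than `a`

print(a[sl].shape)  # (3, 3, 1): the 2-d array indexes fine
try:
    b[sl]  # three indices applied to a 1-d array
except IndexError as exc:
    print("IndexError:", exc)
```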
Good catch -- fixed. Though again, weird stuff will happen if you pass in an array whose dimensions can't be expanded, and this will make that error message harder to track down. I would think that would include np.matrix, but look at what it does:

    In [8]: np.mat('1')[:, :, None].shape
    Out[8]: (1, 1, 1)
To make the indexing slightly safer, one could explicitly use reshape(arr.shape[:axis] + (1,) + arr.shape[axis:]) after all -- in that case one uses both the shape and reshape of the array subclass itself. Though matrix is again interesting...

    np.mat([[1, 2], [3, 4]]).reshape(2, 1, 2).shape
    # (2, 2)

I don't know what's worse... Though again, I don't feel one should "punish" other subclasses for the strange behaviour of some.
@rgommers Good point. I did see that discussion on the mailing list; I did not realize that #5057 proposed using the same name. I'll raise this point again on the mailing list. I think both types of functionality are useful, but this is a more fundamental type of "stacking" (it unifies the existing …)

I asked on the mailing list about the name. Based on @sotte's (author of #5057) and @njsmith's comments, it seems like we would be OK to use stack here -- #5057 should probably use something else, such as …
Looks all OK.

Just looked over the API and I like it. row_stack and column_stack are special in that they magically alternate …
@njsmith I'm afraid …

To summarize: … From an implementation perspective, things are not so bad -- each of these methods just calls …

Basically there is no way to understand these functions without reading the source code.
Ping! Does this need more work? I'd love to be able to merge this in time for numpy 1.10....
    Parameters
    ----------
    arrays : sequence of ndarrays
The inputs are converted to arrays, so this should be "sequence of array_like".

Does this deal with array scalars and empty arrays?
           [3, 4]])

    """
    arrays = [asanyarray(arr) for arr in arrays]
What happens with mixed subtypes?
Could maybe check that all types are the same.
Could do that by checking that set(type(a) for a in arrays) has one member.
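A sketch of that suggested check (the helper name `check_same_type` is mine; this check was ultimately not added to the PR):

```python
import numpy as np

def check_same_type(arrays):
    # Reject inputs whose (sub)types differ, per the suggestion above.
    if len(set(type(a) for a in arrays)) != 1:
        raise TypeError("all input arrays must be of the same type")

check_same_type([np.zeros(2), np.ones(2)])  # same type: passes silently
try:
    check_same_type([np.zeros(2), np.ma.zeros(2)])  # ndarray vs. MaskedArray
except TypeError as exc:
    print(exc)
```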
What happens for subtypes is mostly dictated by the behavior of np.concatenate. I don't see much advantage in explicitly checking for consistent types here when none of the logic in this function relies on that.
I think the type checking should be left to concatenate (which does not currently do this all that well, but could be rewritten, e.g., using insert methods if present on the first member or so).
OK.
@charris It does seem to deal properly with array scalars and empty arrays -- I'll add some tests.
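Since np.stack did ship in NumPy 1.10, the behavior asked about above can be checked directly; a quick sketch of what such tests cover:

```python
import numpy as np

# Array scalars (0-d inputs) stack into a 1-d result.
scalars = np.stack([np.float64(1.0), np.float64(2.0)])
print(scalars.shape)  # (2,)

# Empty arrays stack fine too; the new axis is simply prepended.
empties = np.stack([np.empty((0, 3)), np.empty((0, 3))])
print(empties.shape)  # (2, 0, 3)
```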
                     % (axis, result_ndim, result_ndim))
    if axis < 0:
        axis += result_ndim
    sl = (slice(None),) * axis + (_nx.newaxis,)
An alternative method, once you have the shape of the arrays, is

    newshape = shape[:axis] + (1,) + shape[axis:]
    expanded_arrays = [a.reshape(newshape) for a in arrays]
Or, getting rid of expanded_arrays:

    _nx.concatenate([a.reshape(newshape) for a in arrays], axis=axis)
I like using slicing for this operation rather than reshape because I know that slicing will always give a view rather than a copy. Though I suppose reshape is probably also safe when used in this way.
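The view-vs-copy distinction discussed here can be checked with `np.shares_memory` (a small sketch; basic slicing always yields a view, while reshape silently copies when the memory layout demands it):

```python
import numpy as np

a = np.zeros((3, 4))

sliced = a[:, np.newaxis]      # indexing: always a view
reshaped = a.reshape(3, 1, 4)  # contiguous reshape: a view here
flat_of_t = a.T.reshape(-1)    # non-contiguous input: reshape must copy

print(np.shares_memory(sliced, a))      # True
print(np.shares_memory(reshaped, a))    # True
print(np.shares_memory(flat_of_t, a))   # False
```

This is why the length-1-axis reshape used in this PR is safe: inserting a 1 into the shape never forces a copy.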
My latest commit includes changes in response to @charris's review (thanks!)
@shoyer Some more improvement in the summary part of the docstring is needed, then this can go in.
(force-pushed from 1113c59 to a9c7d8b)
@charris I added your suggested words to the docstring and squashed my commits. I'm still waiting for Travis to run its tests, but they did pass (except for …)
    """
    Join a sequence of arrays along a new axis.

    .. versionadded:: 1.10.0
.. versionadded:: 1.10.0 and its preceding blank line should come at the end of the summary ;)
Fixed.
Looks good. One more nitpick and it's done...
The motivation here is to present a uniform and N-dimensional interface for joining arrays along a new axis, similarly to how `concatenate` provides a uniform and N-dimensional interface for joining arrays along an existing axis.

Background
~~~~~~~~~~

Currently, users can choose between `hstack`, `vstack`, `column_stack` and `dstack`, but none of these functions handle N-dimensional input. In my opinion, it's also difficult to keep track of the differences between these methods and to predict how they will handle input with different dimensions.

In the past, my preferred approach has been to either construct the result array explicitly and use indexing for assignment, or to use `np.array` to stack along the first dimension and then use `transpose` (or a similar method) to reorder dimensions if necessary. This is pretty awkward.

I brought this proposal up a few weeks ago on the numpy-discussion list:
http://mail.scipy.org/pipermail/numpy-discussion/2015-February/072199.html

I also received positive feedback on Twitter:
https://twitter.com/shoyer/status/565937244599377920

Implementation notes
~~~~~~~~~~~~~~~~~~~~

The one-line summaries for `concatenate` and `stack` have been (re)written to mirror each other, and to make clear that the distinction between these functions is whether they join over an existing or new axis.

In general, I've tweaked the documentation and docstrings with an eye toward pointing users to `concatenate`/`stack`/`split` as a fundamental set of basic array manipulation routines, and away from `array_split`/`{h,v,d}split`/`{h,v,d,column_}stack`.

I put this implementation in `numpy.core.shape_base` alongside `hstack`/`vstack`, but it appears that there is also a `numpy.lib.shape_base` module that contains another, larger set of functions, including `dstack`. I'm not really sure where this belongs (or if it even matters).

Finally, it might be a good idea to write a masked array version of `stack`. But I don't use masked arrays, so I'm not well motivated to do that.
Great. Thanks Stephan.