ENH: add np.stack #5605

shoyer · 2015-02-25T10:38:50Z

The motivation here is to present a uniform and N-dimensional interface for joining arrays along a new axis, similarly to how concatenate provides a uniform and N-dimensional interface for joining arrays along an existing axis.

Background

Currently, users can choose between hstack, vstack, column_stack and dstack, but none of these functions handle N-dimensional input. In my opinion, it's also difficult to keep track of the differences between these methods and to predict how they will handle input with different dimensions.

In the past, my preferred approach has been either to construct the result array explicitly and fill it by indexed assignment, or to use np.array to stack along the first dimension and then use transpose (or a similar method) to reorder dimensions if necessary. This is pretty awkward.
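For comparison, here is a minimal sketch of the two workarounds above next to `np.stack`, assuming NumPy 1.10 or later (where `np.stack` exists). The arrays and shapes are arbitrary choices for the example.

```python
import numpy as np

# Three 2-D arrays of identical shape to be joined along a new last axis.
arrays = [np.full((2, 3), i) for i in range(3)]

# Workaround 1: preallocate the result and fill it by indexed assignment.
result = np.empty((2, 3, len(arrays)), dtype=arrays[0].dtype)
for i, arr in enumerate(arrays):
    result[..., i] = arr

# Workaround 2: np.array stacks along a new *first* axis; move it to the end.
via_transpose = np.array(arrays).transpose(1, 2, 0)

# With np.stack, the position of the new axis is just an argument.
via_stack = np.stack(arrays, axis=-1)

assert result.shape == via_transpose.shape == via_stack.shape == (2, 3, 3)
assert np.array_equal(result, via_stack)
assert np.array_equal(via_transpose, via_stack)
```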

I brought this proposal up a few weeks ago on the numpy-discussion list:
http://mail.scipy.org/pipermail/numpy-discussion/2015-February/072199.html

I also received positive feedback on Twitter:
https://twitter.com/shoyer/status/565937244599377920

Implementation notes

The one-line summaries for concatenate and stack have been (re)written to mirror each other, and to make clear that the distinction between these functions is whether they join over an existing or a new axis.
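A quick sketch of that distinction, assuming NumPy 1.10 or later: `concatenate` keeps the number of dimensions, while `stack` adds exactly one. The shapes are arbitrary.

```python
import numpy as np

a = np.zeros((4, 5))
b = np.ones((4, 5))

# concatenate joins along an existing axis: ndim stays 2.
print(np.concatenate([a, b], axis=0).shape)  # (8, 5)
print(np.concatenate([a, b], axis=1).shape)  # (4, 10)

# stack joins along a new axis: ndim becomes 3, and `axis` says where it goes.
print(np.stack([a, b], axis=0).shape)  # (2, 4, 5)
print(np.stack([a, b], axis=1).shape)  # (4, 2, 5)
print(np.stack([a, b], axis=2).shape)  # (4, 5, 2)
```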

In general, I've tweaked the documentation and docstrings with an eye toward pointing users to concatenate/stack/split as a fundamental set of basic array manipulation routines, and away from array_split/{h,v,d}split/{h,v,d,column_}stack.
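One way to see concatenate/stack/split as a set (a sketch, not code from this PR): `split` inverts `concatenate` directly, and inverts `stack` once the leftover length-1 axis is squeezed away.

```python
import numpy as np

parts = [np.arange(6).reshape(2, 3) + 10 * i for i in range(3)]

# split inverts concatenate along the same axis.
joined = np.concatenate(parts, axis=1)  # shape (2, 9)
assert all(np.array_equal(p, q) for p, q in zip(parts, np.split(joined, 3, axis=1)))

# split + squeeze inverts stack: each piece keeps a length-1 axis at `axis`.
stacked = np.stack(parts, axis=1)  # shape (2, 3, 3)
pieces = [np.squeeze(s, axis=1) for s in np.split(stacked, 3, axis=1)]
assert all(np.array_equal(p, q) for p, q in zip(parts, pieces))
```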

I put this implementation in numpy.core.shape_base alongside hstack/vstack, but it appears that there is also a numpy.lib.shape_base module that contains another larger set of functions, including dstack. I'm not really sure where this belongs (or if it even matters).

Finally, it might be a good idea to write a masked array version of stack. But I don't use masked arrays, so I'm not well motivated to do that.
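If someone does want a masked version, one possible sketch is to stack the data and the masks separately. This rests on an assumption about what would be sufficient: `masked_stack` is a hypothetical name, and the sketch ignores fill values and other `MaskedArray` metadata.

```python
import numpy as np

def masked_stack(arrays, axis=0):
    """Hypothetical masked-array stack: stack data and masks separately."""
    data = np.stack([np.ma.getdata(a) for a in arrays], axis=axis)
    # getmaskarray returns a full boolean mask even when nothing is masked.
    mask = np.stack([np.ma.getmaskarray(a) for a in arrays], axis=axis)
    return np.ma.MaskedArray(data, mask=mask)

x = np.ma.masked_array([1, 2, 3], mask=[False, True, False])
y = np.ma.masked_array([4, 5, 6])  # no masked entries
z = masked_stack([x, y], axis=1)
print(z.shape)  # (3, 2)
```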

seberg · 2015-02-25T12:09:26Z

numpy/core/shape_base.py

+    if axis < 0:
+        axis += result_ndim
+    sl = (slice(None),) * axis + (_nx.newaxis,)
+    return _nx.concatenate([arr[sl] for arr in arrays], axis=axis)


I don't trust this on first sight (but if you have a test, ignore the comment). What if the second array has more dimensions than the first one?
One suggestion for the API: how about adding an ndmin kwarg? That way hstack, etc. would really be just special cases of this function, I believe (though whether we should implement them as such, I don't know).

EDIT: Frankly, I am not sure if ndmin makes sense; it might be a bad design choice (and the fact that hstack does it doesn't make it better).
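A sketch of one answer to the question above: validate that all inputs share a shape before indexing, so that a later array with more (or fewer) dimensions fails immediately with a `ValueError`. It uses the public `numpy` namespace instead of the module-internal `_nx` alias; `stack_sketch` and its error messages are hypothetical and may differ from what was merged.

```python
import numpy as np

def stack_sketch(arrays, axis=0):
    """Join a sequence of same-shaped arrays along a new axis (sketch)."""
    arrays = [np.asanyarray(arr) for arr in arrays]
    if not arrays:
        raise ValueError("need at least one array to stack")

    # Reject mismatched shapes before indexing, so a later array with more
    # or fewer dimensions than the first cannot slip through.
    shapes = {arr.shape for arr in arrays}
    if len(shapes) != 1:
        raise ValueError("all input arrays must have the same shape")

    result_ndim = arrays[0].ndim + 1
    if not -result_ndim <= axis < result_ndim:
        raise ValueError(f"axis {axis} out of bounds for a {result_ndim}-d result")
    if axis < 0:
        axis += result_ndim

    # Insert a length-1 axis at `axis` by basic indexing (always a view),
    # then concatenate along that new axis.
    sl = (slice(None),) * axis + (np.newaxis,)
    return np.concatenate([arr[sl] for arr in arrays], axis=axis)
```

For example, `stack_sketch([np.zeros(3), np.ones(3)], axis=1)` has shape `(3, 2)`, while `stack_sketch([np.zeros(3), np.zeros((3, 3))])` raises `ValueError` before any indexing happens.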

To me at least, it would be clearer if one were to do:

new_shape = arrays[0].shape[:axis] + (1,) + arrays[0].shape[axis:]
return _nx.concatenate([arr.reshape(new_shape) for arr in arrays], axis=axis)

(note that with the broadcasting suggestion, it might be faster to use the machinery from #5371 to get the broadcast shape at the top, and then use broadcast_to(arr, new_shape) here. Alternatively, use b = np.broadcast(*arrays) above, and new_shape = b.shape[:axis] ... here).
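The broadcasting variant mentioned above was not adopted, but it can be sketched roughly like this. Assumptions: NumPy 1.20 or later for `np.broadcast_shapes` (used here in place of `np.broadcast(*arrays).shape`), and `broadcast_stack` is a hypothetical name. The inputs are broadcast to their common shape first and the new axis is inserted afterwards, since broadcasting straight to the expanded shape would misalign dimensions.

```python
import numpy as np

def broadcast_stack(arrays, axis=0):
    """Hypothetical variant of stack that broadcasts its inputs first."""
    arrays = [np.asanyarray(a) for a in arrays]
    if not arrays:
        raise ValueError("need at least one array to stack")
    shape = np.broadcast_shapes(*(a.shape for a in arrays))
    result_ndim = len(shape) + 1
    if not -result_ndim <= axis < result_ndim:
        raise ValueError(f"axis {axis} out of bounds for a {result_ndim}-d result")
    if axis < 0:
        axis += result_ndim
    sl = (slice(None),) * axis + (np.newaxis,)
    # Broadcast each input (read-only view), add the new axis, then join.
    expanded = [np.broadcast_to(a, shape)[sl] for a in arrays]
    return np.concatenate(expanded, axis=axis)

print(broadcast_stack([np.arange(3), 5]).shape)  # (2, 3)
```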

@seberg I find the ndmin behavior of the {v,h,d}stack functions confusing. I would say that the lack of automated rules for adding dimensions is a feature of this function :).

@mhvk My only hesitation with using reshape is that I know it can result in copies instead of views in some edge cases. So, I usually stick to indexing to be safe.

@shoyer - actually, looking more closely, both your method and mine can lead to unexpected errors when a later array has fewer dimensions than the first one, like:

np.stack([np.arange(9.).reshape(3,3), np.arange(3)], axis=2)

At least, I would think you would get sl=[:,:,np.newaxis] in which case you get an IndexError: too many indices on the second array (rather than the much more informative ValueError that concatenate would throw).

Though it would perhaps be needlessly expensive to test for this beforehand. Maybe just

try:
    return _nx.concatenate(...)
except IndexError:
    raise ValueError(<like concatenate>)
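A minimal sketch of that try/except idea, built on the slicing approach from the diff. `stack_reraise` and its message are placeholders; chaining with `from exc` keeps the original `IndexError` visible as `__cause__` when debugging.

```python
import numpy as np

def stack_reraise(arrays, axis=0):
    """Hypothetical: translate IndexError from the slicing step into ValueError."""
    arrays = [np.asanyarray(arr) for arr in arrays]
    result_ndim = arrays[0].ndim + 1
    if axis < 0:
        axis += result_ndim
    sl = (slice(None),) * axis + (np.newaxis,)
    try:
        # Indexing happens inside the try block, so a lower-dimensional
        # later array ("too many indices") is caught here.
        return np.concatenate([arr[sl] for arr in arrays], axis=axis)
    except IndexError as exc:
        raise ValueError("all input arrays must have the same shape") from exc

# The example from this comment now raises ValueError instead of IndexError.
try:
    stack_reraise([np.arange(9.).reshape(3, 3), np.arange(3)], axis=2)
except ValueError as err:
    print(type(err.__cause__).__name__)  # IndexError
```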

Good catch -- fixed. Though again, weird stuff will happen if you pass in an array whose dimensions can't be expanded, and this will make that error message harder to track down. I would think that would include np.matrix, but look at what it does:

In [8]: np.mat('1')[:, :, None].shape
Out[8]: (1, 1, 1)

To make the indexing slightly safer, one could explicitly use arr.reshape(arr.shape[:axis] + (1,) + arr.shape[axis:]) after all -- in that case one uses both the shape and the reshape method of the array subclass itself. Though, matrix is again interesting...

np.mat([[1,2],[3,4]]).reshape(2, 1, 2).shape # (2, 2)

I don't know what's worse... Though again I don't feel one should "punish" other subclasses for strange behaviour of some.

rgommers · 2015-03-08T20:37:41Z

@shoyer did you see gh-5057, which also adds a stack function? A comparison with that PR may be useful (these PRs can't both be merged...).

shoyer · 2015-03-08T20:50:17Z

@rgommers Good point. I did see that discussion on the mailing list; I did not realize that #5057 proposed using the same name.

I'll raise this point again on the mailing list. I think both types of functionality are useful, but this is a more fundamental type of "stacking" (it unifies the existing *stack functions) and fills a more obvious hole in NumPy's functionality (no N-dimensional way to stack ndarray objects).

shoyer · 2015-03-22T18:33:49Z

I asked on the mailing list about the name np.stack last week: http://mail.scipy.org/pipermail/numpy-discussion/2015-March/072491.html

Based on @sotte (author of #5057) and @njsmith's comments, it seems like we would be OK to use stack here -- #5057 should probably use something else, such as barray or block.

mhvk · 2015-03-23T13:23:36Z

Looks all OK.

njsmith · 2015-03-23T19:10:02Z

Just looked over the api and I like it.

row_stack and column_stack are special in that they magically alternate between stacking and concatenation depending on the dimensionality of the input. But {h,v,d}stack are not so magical. Are they equivalent to stack(..., axis=...)? (This is both a question about API consistency and a question about whether their implementation can be simplified.)
Commit Summary: ENH: add np.stack

File Changes:
- M doc/release/1.10.0-notes.rst (3)
- M doc/source/reference/routines.array-manipulation.rst (5)
- M numpy/add_newdocs.py (3)
- M numpy/core/shape_base.py (68)
- M numpy/core/tests/test_shape_base.py (39)
- M numpy/lib/function_base.py (2)
- M numpy/lib/index_tricks.py (2)
- M numpy/lib/shape_base.py (6)


shoyer · 2015-03-23T20:29:38Z

@njsmith I'm afraid {h,v,d}stack are so magical (note that row_stack is an alias for vstack):

In [12]: x1 = np.zeros(1)

In [13]: x2 = np.zeros((1, 1))

In [14]: x3 = np.zeros((1, 1, 1))

In [16]: np.hstack([x1, x1]).shape
Out[16]: (2,)

In [17]: np.hstack([x2, x2]).shape
Out[17]: (1, 2)

In [18]: np.hstack([x3, x3]).shape
Out[18]: (1, 2, 1)

In [19]: np.vstack([x1, x1]).shape
Out[19]: (2, 1)

In [20]: np.vstack([x2, x2]).shape
Out[20]: (2, 1)

In [21]: np.vstack([x3, x3]).shape
Out[21]: (2, 1, 1)

In [22]: np.dstack([x1, x1]).shape
Out[22]: (1, 1, 2)

In [23]: np.dstack([x2, x2]).shape
Out[23]: (1, 1, 2)

In [24]: np.dstack([x3, x3]).shape
Out[24]: (1, 1, 2)

To summarize: if array.ndim < stack_dim (where stack_dim is 1 for hstack, 2 for vstack and 3 for dstack), then the {h,v,d}stack function stacks. Otherwise, it concatenates. There are currently no functions in the NumPy API that stack arbitrary-dimensional input -- that's why I made this PR.
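A small sketch (assuming NumPy 1.10 or later) that makes this rule concrete, showing where the `*stack` functions coincide with `np.stack` and where they fall back to concatenation. The inputs are arbitrary.

```python
import numpy as np

a, b = np.arange(3), np.arange(3, 6)  # 1-D inputs, shape (3,)

# ndim 1 < 2, so vstack stacks along a new leading axis.
assert np.array_equal(np.vstack([a, b]), np.stack([a, b], axis=0))
# ndim 1 < 3, so dstack stacks (after atleast_3d also adds a leading axis).
assert np.array_equal(np.dstack([a, b]), np.stack([a, b], axis=1)[np.newaxis])
# ndim 1 is not < 1, so hstack concatenates instead of stacking.
assert np.array_equal(np.hstack([a, b]), np.concatenate([a, b]))

m, n = a.reshape(1, 3), b.reshape(1, 3)  # 2-D inputs, shape (1, 3)

# ndim 2 is not < 2, so vstack now concatenates along axis 0 instead.
assert np.vstack([m, n]).shape == (2, 3)
assert np.stack([m, n], axis=0).shape == (2, 1, 3)
```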

From an implementation perspective, things are not so bad -- each of these methods just calls np.atleast_*d on all the inputs and then np.concatenate on the result. So in my opinion the API design problems are twofold:

The behavior of atleast_*d is not very intuitive -- axes are inserted in somewhat random locations:
```
In [45]: np.atleast_2d(np.zeros(2)).shape
Out[45]: (1, 2)

In [46]: np.atleast_3d(np.zeros(2)).shape
Out[46]: (1, 2, 1)

In [47]: np.atleast_3d(np.zeros((2, 2))).shape
Out[47]: (2, 2, 1)
```
If we were starting from scratch, I would pick the rule "always insert new dimensions at the start", which at least works like array broadcasting.
The axis that {h,v,d}stack concatenates along has no clear progression:
- hstack: uses atleast_1d and concatenates along axis=1 (unless the input is 1d, in which case axis=0)
- vstack: uses atleast_2d and concatenates along axis=0
- dstack: uses atleast_3d and concatenates along axis=2
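The three bullets can be checked against NumPy with a short sketch that rebuilds each function from `atleast_*d` plus `concatenate`. `my_hstack`, `my_vstack` and `my_dstack` are hypothetical names for comparison only, and the check assumes current NumPy still behaves as described here.

```python
import numpy as np

# Hypothetical re-implementations following the three bullets above.
def my_hstack(arrays):
    arrs = [np.atleast_1d(a) for a in arrays]
    # 1-D inputs are joined along axis 0; everything else along axis 1.
    axis = 0 if arrs[0].ndim == 1 else 1
    return np.concatenate(arrs, axis=axis)

def my_vstack(arrays):
    return np.concatenate([np.atleast_2d(a) for a in arrays], axis=0)

def my_dstack(arrays):
    return np.concatenate([np.atleast_3d(a) for a in arrays], axis=2)

# Compare with NumPy on 1-, 2- and 3-D inputs.
for shape in [(3,), (2, 3), (2, 3, 4)]:
    x = np.arange(np.prod(shape)).reshape(shape)
    for mine, theirs in [(my_hstack, np.hstack),
                         (my_vstack, np.vstack),
                         (my_dstack, np.dstack)]:
        assert np.array_equal(mine([x, x + 1]), theirs([x, x + 1]))
```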

Basically there is no way to understand these functions without reading the source code.
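For contrast, the "always insert new dimensions at the start" rule suggested above might look like this. `atleast_nd_leading` is a hypothetical helper, not a NumPy function; prepending length-1 axes mirrors what broadcasting does to a lower-dimensional operand.

```python
import numpy as np

def atleast_nd_leading(arr, ndim):
    """Hypothetical helper: prepend length-1 axes until arr.ndim >= ndim."""
    arr = np.asanyarray(arr)
    missing = max(ndim - arr.ndim, 0)
    return arr.reshape((1,) * missing + arr.shape)

x = np.zeros(2)
print(atleast_nd_leading(x, 2).shape)                 # (1, 2), same as np.atleast_2d
print(atleast_nd_leading(x, 3).shape)                 # (1, 1, 2); np.atleast_3d gives (1, 2, 1)
print(atleast_nd_leading(np.zeros((2, 2)), 3).shape)  # (1, 2, 2); np.atleast_3d gives (2, 2, 1)

# Prepending axes is exactly what broadcasting does to a lower-dimensional operand.
assert np.array_equal(atleast_nd_leading(x, 3), np.broadcast_to(x, (1, 1, 2)))
```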

shoyer · 2015-04-22T02:31:42Z

Ping! Does this need more work? I'd love to be able to merge this in time for numpy 1.10....

charris · 2015-05-06T19:25:03Z

numpy/core/shape_base.py

+
+    Parameters
+    ----------
+    arrays : sequence of ndarrays


The inputs are converted to arrays, so this should be "sequence of array_like"

charris · 2015-05-06T19:49:04Z

Does this deal with array scalars and empty arrays?

charris · 2015-05-06T19:51:59Z

numpy/core/shape_base.py

+           [3, 4]])
+
+    """
+    arrays = [asanyarray(arr) for arr in arrays]


What happens with mixed subtypes?

Could maybe check that all types are the same.

Could do that by checking that set(type(a) for a in arrays) has one member.

What happens for subtypes is mostly dictated by the behavior of np.concatenate. I don't see much advantage in explicitly checking for consistent types here when none of the logic in this function relies on that.

I think the type checking should be left for concatenate (which does not currently do this all that well, but could be rewritten, e.g., using insert methods if present on the first member or so).
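For reference, the suggested type check is essentially a one-liner. This is a sketch only (the PR left subtype handling to `np.concatenate`), and `all_same_type` is a hypothetical name.

```python
import numpy as np

def all_same_type(arrays):
    """Hypothetical check: True if every input has exactly the same type."""
    return len({type(a) for a in arrays}) == 1

plain = np.arange(3)
masked = np.ma.masked_array([1, 2, 3], mask=[False, True, False])

# asanyarray (as used in the diff) keeps subclasses, so the types differ here.
print(all_same_type([np.asanyarray(a) for a in (plain, masked)]))  # False
print(all_same_type([np.asanyarray(a) for a in (plain, plain)]))   # True
```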

shoyer · 2015-05-06T20:06:46Z

@charris It does seem to deal properly with array scalars and empty arrays -- I'll add some tests.
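A few of these edge cases, sketched against `np.stack` as it behaves in current NumPy releases. Each input is converted to an array (0-d for scalars; nested lists also work, per "sequence of array_like"), and the result gains one axis.

```python
import numpy as np

# Python and NumPy scalars become 0-d arrays, so the result is 1-D.
print(np.stack([1, 2]).shape)                              # (2,)
print(np.stack([np.float64(1.5), np.float64(2.5)]).shape)  # (2,)

# Nested lists are accepted too.
print(np.stack([[1, 2], [3, 4]], axis=1).tolist())         # [[1, 3], [2, 4]]

# Empty arrays keep their zero-length axis; the new axis has one entry per input.
print(np.stack([np.zeros(0), np.zeros(0)]).shape)          # (2, 0)
print(np.stack([np.zeros(0), np.zeros(0)], axis=1).shape)  # (0, 2)

# An empty *sequence* of arrays is an error.
try:
    np.stack([])
except ValueError as exc:
    print("ValueError:", exc)
```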

charris · 2015-05-06T20:17:07Z

numpy/core/shape_base.py

+                         % (axis, result_ndim, result_ndim))
+    if axis < 0:
+        axis += result_ndim
+    sl = (slice(None),) * axis + (_nx.newaxis,)


An alternative method, once you have the shape of the arrays, is

newshape = shape[:axis] + (1,) + shape[axis:]
expanded_arrays = [a.reshape(newshape) for a in arrays]

Or, getting rid of expanded_arrays

_nx.concatenate([a.reshape(newshape) for a in arrays], axis=axis)

I like using slicing for this operation rather than reshape because I know that slicing will always return a view rather than a copy. Though I suppose reshape is probably also safe when used in this way.
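A quick comparison of the two approaches on a deliberately non-contiguous input, assuming NumPy 1.11 or later for `np.shares_memory`. Both give the same result; only the slicing version is asserted to be a view, since whether `reshape` copies is left to NumPy.

```python
import numpy as np

arr = np.arange(12).reshape(3, 4).T  # shape (4, 3), not C-contiguous
axis = 1

# Approach in this PR: basic indexing with newaxis (always a view).
sl = (slice(None),) * axis + (np.newaxis,)
by_slicing = arr[sl]

# Suggested alternative: reshape with a length-1 entry spliced into the shape.
newshape = arr.shape[:axis] + (1,) + arr.shape[axis:]
by_reshape = arr.reshape(newshape)

assert by_slicing.shape == by_reshape.shape == (4, 1, 3)
assert np.array_equal(by_slicing, by_reshape)
assert np.shares_memory(by_slicing, arr)  # basic slicing never copies
```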

shoyer · 2015-05-08T08:27:42Z

My latest commit includes changes in response to @charris's review (thanks!)

charris · 2015-05-11T17:46:28Z

@shoyer Some more improvement in the summary part of the docstring is needed, then this can go in.

shoyer · 2015-05-12T02:35:21Z

@charris I added your suggested words to the docstring and squashed my commits. I'm still waiting for Travis to run its tests, but they did pass (except for USE_BENTO=1) on my fork:
https://travis-ci.org/shoyer/numpy/builds/62164157

charris · 2015-05-12T04:09:14Z

numpy/core/shape_base.py

+    """
+    Join a sequence of arrays along a new axis.
+
+    .. versionadded:: 1.10.0


.. versionadded:: 1.10.0 and its preceding blank line should come at the end of the summary ;)
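For illustration, a numpydoc-style skeleton with the ordering being asked for: summary line, extended summary, then a blank line and the `versionadded` directive, then the sections. The function name and wording are placeholders, not the merged docstring.

```python
def stack_doc_example(arrays, axis=0):
    """
    Join a sequence of arrays along a new axis.

    The `axis` parameter specifies the index of the new axis in the
    dimensions of the result.

    .. versionadded:: 1.10.0

    Parameters
    ----------
    arrays : sequence of array_like
        Each array must have the same shape.
    axis : int, optional
        The axis in the result array along which the input arrays are stacked.
    """
```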

charris · 2015-05-12T04:10:41Z

Looks good. One more nitpick and it's done...


charris · 2015-05-12T04:43:27Z

Great. Thanks Stephan.

seberg reviewed Feb 25, 2015

shoyer force-pushed the stack branch from 9061361 to cfb68d6 February 26, 2015 04:19

rgommers added 01 - Enhancement component: numpy._core labels Mar 8, 2015

charris added this to the 1.10.0 release milestone Apr 22, 2015

charris reviewed May 6, 2015

sotte mentioned this pull request May 10, 2015

ENH: add block() function to create block arrays #5057

Closed

shoyer force-pushed the stack branch 2 times, most recently from 1113c59 to a9c7d8b May 12, 2015 00:33

charris reviewed May 12, 2015

shoyer force-pushed the stack branch from a9c7d8b to 93d3b8d May 12, 2015 04:18

charris added a commit that referenced this pull request May 12, 2015

Merge pull request #5605 from shoyer/stack

18c89db

ENH: add np.stack

charris merged commit 18c89db into numpy:master May 12, 2015

nouiz mentioned this pull request Jul 21, 2015

wrap numpy.stack Theano/Theano#3182

Closed

shoyer mentioned this pull request Feb 5, 2016

numpy.stack name is confusing given that it's different from hstack, vstack #7183

Closed

aukejw mentioned this pull request Feb 15, 2016

DOC: note in h/v/dstack points users to stack/concatenate #7253

Merged
