BUG: Concatenate with empty sequences, fixes #1586 by jaimefrio · Pull Request #6224 · numpy/numpy · GitHub


BUG: Concatenate with empty sequences, fixes #1586 #6224



Closed
wants to merge 637 commits into from


jaimefrio
Member

This is still WIP, as I may be complicating things more than needed, and tests are still missing.

This change prevents empty sequences passed to `concatenate` from being converted to the default float64 type, which causes typically undesired behavior:

>>> np.concatenate(([], [1, 2, 3, 4])).dtype
dtype('float64')

To do so, it remembers the last dtype it saw, either from an ndarray or from a non-ndarray that was converted to a non-empty ndarray, and casts non-ndarrays that convert to empty ndarrays to that dtype. Since there is no dtype to remember at the beginning, leading non-ndarrays that convert to empty ndarrays are skipped in the first pass and converted in a second pass.
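The two-pass scheme can be modelled in plain Python. This is a toy sketch, not the PR's C implementation: `Arr` is a hypothetical stand-in for an ndarray, dtypes are just the strings "int64" and "float64", inputs are flattened as with `axis=None`, and it assumes that deferred leading empties take the first dtype found during the first pass.

```python
from dataclasses import dataclass


@dataclass
class Arr:
    """Hypothetical stand-in for an ndarray: a dtype name plus flat values."""
    dtype: str
    values: list


def flatten(obj):
    """Flatten nested lists/tuples; [[]] flattens to [] just like []."""
    if isinstance(obj, (list, tuple)):
        out = []
        for item in obj:
            out.extend(flatten(item))
        return out
    return [obj]


def infer_dtype(values):
    """Toy inference: all ints -> int64, otherwise (including empty) float64."""
    if values and all(isinstance(v, int) for v in values):
        return "int64"
    return "float64"


def convert_inputs(seqs):
    """Convert inputs, casting empty non-arrays to the last dtype seen."""
    out = [None] * len(seqs)
    last_dtype = None
    deferred = []  # leading empty non-arrays seen before any dtype is known
    for i, obj in enumerate(seqs):
        if isinstance(obj, Arr):
            out[i] = obj
            last_dtype = obj.dtype
            continue
        values = flatten(obj)
        if values:
            out[i] = Arr(infer_dtype(values), values)
            last_dtype = out[i].dtype
        elif last_dtype is not None:
            out[i] = Arr(last_dtype, [])  # cast empty input to the remembered dtype
        else:
            deferred.append(i)
    # Second pass (assumption): deferred empties take the first dtype found,
    # falling back to float64 when every input was empty.
    first = next((a.dtype for a in out if a is not None), "float64")
    for i in deferred:
        out[i] = Arr(first, [])
    return out


def concatenate(seqs):
    """Flattened (axis=None style) concatenation with toy type promotion."""
    arrays = convert_inputs(seqs)
    dtype = "float64" if any(a.dtype == "float64" for a in arrays) else "int64"
    return Arr(dtype, [v for a in arrays for v in a.values])
```

In this model `concatenate([[], [1, 2, 3, 4]])` keeps `int64` instead of promoting to `float64`, and `[[]]` behaves like `[]` because emptiness is judged on the flattened size.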

Because we are using the size of the resulting ndarray as the criterion for emptiness, it also works with nested empty sequences, i.e. both of these produce arguably correct results:

>>> np.concatenate(([], [[1, 2, 3, 4]]), axis=None)
array([1, 2, 3, 4])
>>> np.concatenate(([[]], [[1, 2, 3, 4]], [[]]), axis=None)
array([1, 2, 3, 4])

The second case was not considered in the original issue #1586, and the code could probably be simplified a little if only strictly empty sequences triggered the cast. But it is probably a good thing that [[]] behaves the same as [], right?

Thoughts on this are very welcome in order to write some meaningful tests.

charris
Member
charris commented Jan 8, 2016

I'd suggest ignoring empty tuples and arrays. The only information that they carry is dimensionality, which we might want to check, but otherwise they should have no effect.
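This suggestion amounts to filtering out empty inputs before they can influence the result, with an optional check on the one thing they do carry, their dimensionality. A toy sketch in plain Python, where nested lists stand in for arrays and `nesting_depth`, `is_empty`, and `drop_empty_inputs` are hypothetical helpers, not NumPy API:

```python
def nesting_depth(obj):
    """Nesting level of lists/tuples, a proxy for ndim: [] -> 1, [[]] -> 2."""
    if isinstance(obj, (list, tuple)):
        return 1 + max((nesting_depth(x) for x in obj), default=0)
    return 0


def is_empty(obj):
    """True when a (possibly nested) sequence contains no scalar elements."""
    if isinstance(obj, (list, tuple)):
        return all(is_empty(x) for x in obj)
    return False


def drop_empty_inputs(seqs, check_ndim=False):
    """Drop empty inputs; optionally require them to match the others' depth."""
    kept = [s for s in seqs if not is_empty(s)]
    if check_ndim and kept:
        ndim = nesting_depth(kept[0])
        for s in seqs:
            if is_empty(s) and nesting_depth(s) != ndim:
                raise ValueError("empty input has a different number of dimensions")
    return kept
```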

jakirkham and others added 25 commits March 15, 2016 08:22
may return something other than one or zero, and npy_bool is
unfortunately an int8, not a C99 bool
…Object_Repr`. Also, do a better job of handling any errors raised while constructing the error message.
This adds benchmarks for randint. There is one set of benchmarks for the
default dtype, 'l', that can be tracked back, and another set for the
new dtypes 'bool', 'uint8', 'uint16', 'uint32', and 'uint64'.
The added functions are:

- cacos
- cacosf
- cacosl
- cacosh
- cacoshf
- cacoshl

Closes numpy#6063.
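For reference, the principal values these C99-style functions compute can be written with textbook logarithm formulas. The sketch below checks those formulas against Python's `cmath`; it illustrates the mathematics only and is not how npymath implements `cacos`/`cacosh` (the helper names are hypothetical).

```python
import cmath


def acos_via_log(z):
    """Principal arccos: acos(z) = -i * log(z + i * sqrt(1 - z**2))."""
    return -1j * cmath.log(z + 1j * cmath.sqrt(1 - z * z))


def acosh_via_log(z):
    """Principal arccosh: acosh(z) = log(z + sqrt(z + 1) * sqrt(z - 1)).

    Splitting the square root as sqrt(z + 1) * sqrt(z - 1), rather than
    sqrt(z**2 - 1), is what keeps the result on the principal branch.
    """
    return cmath.log(z + cmath.sqrt(z + 1) * cmath.sqrt(z - 1))
```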
Fix issue numpy#6723.  Given an exotic masked structured array, where one of
the fields has a multidimensional dtype, make sure that, when accessing
this field, the fill_value still makes sense.  As it stands prior to this
commit, the fill_value will end up being multidimensional, possibly with
a shape incompatible with the parent array, which leads to broadcasting
errors in methods such as .filled().  This commit uses the first element
of this multidimensional fill value as the new fill value.  When more
than one unique value existed in fill_value, a warning is issued.

Also add a test to verify that fill_value.ndim remains 0 after indexing.
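The collapsing rule in this commit message (take the first element, warn if it was not the only distinct value) can be sketched in plain Python. Nested lists stand in for the multidimensional fill value, and `collapse_fill_value` is a hypothetical name, not the `numpy.ma` internals.

```python
import warnings


def _flatten(obj):
    if isinstance(obj, (list, tuple)):
        return [v for item in obj for v in _flatten(item)]
    return [obj]


def collapse_fill_value(fill_value):
    """Reduce a (nested) fill value to a scalar so that its ndim is 0.

    Uses the first element, and warns when the elements are not all equal.
    """
    flat = _flatten(fill_value)
    if len(set(flat)) > 1:
        warnings.warn("Non-scalar fill_value has more than one unique value; "
                      "using the first one.")
    return flat[0]
```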
Empty strings are the default for the new rpath,
extra_compile_args and extra_link_args sections
I have found that there are two missing numbers in a sequence in the documentation.
http://docs.scipy.org/doc/numpy/user/misc.html#interfacing-to-c

It goes 1, 2, 3, 5, 7, 8, with 4 and 6 missing.
Fixes GH6452

There are two types of datetime64/timedelta64 objects with generic times
units:
* NaT
* unit-less timedelta64 objects

Both of these should be safely castable to any more specific dtype. However,
more specific dtypes should not be safely castable to generic units.

Otherwise, the result of `np.datetime64('NaT')` or `np.timedelta64(1)` is
entirely useless, because they can't be used in any arithmetic operations or
comparisons.

This is a regression from NumPy 1.9, where these sorts of operations worked
because the default casting rules with ufuncs were less strict.
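The two casting rules stated above fit in a few lines. This is a toy model only: `GENERIC` and `unit_cast_is_safe` are hypothetical names, and for two specific units it conservatively treats only identical units as safe, since the commit message does not describe that case.

```python
GENERIC = None  # hypothetical marker for a generic (unit-less) time unit


def unit_cast_is_safe(src_unit, dst_unit):
    """Toy version of the generic-unit casting rules described above."""
    if src_unit is GENERIC:
        return True   # generic -> any unit (NaT, unit-less timedelta64) is safe
    if dst_unit is GENERIC:
        return False  # specific -> generic is not safe
    # Not covered by the commit message; simplification for this sketch.
    return src_unit == dst_unit
```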
Now, NaT compares like NaN:
- NaT != NaT -> True
- NaT == NaT (and all other comparisons) -> False

We discussed this on the mailing list back in October:
https://mail.scipy.org/pipermail/numpy-discussion/2015-October/073968.html
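Since NaT is meant to follow IEEE NaN semantics, plain Python floats show the target behavior: `!=` is the only comparison that returns True.

```python
nan = float("nan")

# IEEE 754 semantics that NaT now mirrors.
results = {
    "nan != nan": nan != nan,  # True
    "nan == nan": nan == nan,  # False
    "nan < 1.0": nan < 1.0,    # False
    "nan >= nan": nan >= nan,  # False
}
```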
Adds the 'order' parameter to the __new__ override
in MaskedArray construction, enabling it to be enforced
in methods like np.ma.core.array and np.ma.core.asarray.

Closes numpygh-6646.
np.put and np.place do something only when the first argument
is an instance of np.ndarray. These changes cause a TypeError
to be raised by either function when that requirement is not
satisfied.
… behavior of `MaskedArray`'s masks is changing.
… of their masks when they are also returning views of their data.
gkBCCN and others added 27 commits March 15, 2016 08:22
...to hstack, vstack, stack,
   hsplit, vsplit, dsplit, dstack
   that check that they raise exceptions.
The identity for bitwise_xor is zero.
The identity for bitwise_and is currently 1, which only works for the low-order bit. Use -1
instead.

Closes numpy#7060.
- test values
- test identity for bitwise_or, bitwise_xor, bitwise_and
The identity has changed from 1 to -1.
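Why 1 is the wrong identity for a bitwise AND reduction is easy to see with Python integers, which use two's-complement semantics for bitwise operators. The reduction below uses `functools.reduce` as a stand-in for a ufunc's `reduce`, with the identity passed as the initial value.

```python
import operator
from functools import reduce

values = [0b1110, 0b0110]

# Starting a bitwise AND reduction from 1 masks out every bit but the lowest.
and_from_one = reduce(operator.and_, values, 1)         # 0
# Starting from -1 (all bits set) leaves the data unchanged.
and_from_minus_one = reduce(operator.and_, values, -1)  # 0b0110
# XOR and OR both have 0 as their identity.
xor_from_zero = reduce(operator.xor, values, 0)         # 0b1000
or_from_zero = reduce(operator.or_, values, 0)          # 0b1110
```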
I know int is between 0 and 4294967295, but I think many people who do
not know that will benefit from this comment.

[ci skip]
Add note about wheels on pypi, and Windows wheels in particular.  See
discussion at: numpy#5479
Not everyone might recognize Travis status
Give hook to allow platform-specific installs to modify the
initialization of numpy.  Particular use-case is to allow check for SSE2
on Windows when shipping with ATLAS wheel.
Add description of ``numpy/_distribution_init.py`` file and init hook to
release notes for 1.12.0.
Found this while reading a docstring.
This was otherwise undocumented, so the nanprod.rst page wasn't being generated.
DEP: Deprecated using a float index in linspace

DEP: Release notes for PR#7328
Completely rewrote the binary_repr method to use
Python's built-in 'bin' function to generate
binary representations quickly.

Fixed the behaviour for negative numbers in which
insufficient widths resulted in outputs of all zeros
for the two's complement. Now, it will return the
two's complement with width equal to the minimum
number of bits needed to represent the complement.

Closes numpygh-7168.
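A sketch of the behavior this commit describes, built on `bin()`: negative numbers get a two's-complement string when a width is given, and a too-small width is widened to the minimum number of bits instead of producing zeros. `binary_repr_sketch` is a hypothetical name; this follows the commit message, not NumPy's exact code.

```python
def binary_repr_sketch(num, width=None):
    """Binary string of an int; two's complement for negatives when width is given."""
    if num >= 0:
        digits = bin(num)[2:]
        # zfill pads but never truncates, so an insufficient width is ignored.
        return digits.zfill(width) if width else digits
    if width is None:
        return "-" + bin(-num)[2:]
    # Smallest k with -2**(k - 1) <= num, i.e. enough bits including the sign bit.
    min_bits = (-num - 1).bit_length() + 1
    twocomp = (1 << min_bits) + num  # k-bit two's-complement pattern, top bit set
    out_width = max(width, min_bits)
    # Sign-extend with leading ones up to the requested width.
    return "1" * (out_width - min_bits) + bin(twocomp)[2:]
```

For example, `binary_repr_sketch(-5, width=2)` returns `'1011'`: the requested width is too small, so the minimum four bits are used.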
The input arrays are documented to have ndim <=2, so check for that
and raise a ValueError on failure.
The non-nan elements of the result of corrcoef should satisfy the
inequality abs(x) <= 1 and the non-nan elements of the diagonal should
be exactly one. We can't guarantee those results due to roundoff, but
clipping the real and imaginary parts to the interval [-1, 1] improves
things to a small degree.

Closes numpy#7392.
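The clipping step can be sketched on scalars in plain Python; `clip_correlation` is a hypothetical helper, not the `corrcoef` code. One detail worth noting: NaN must be passed through explicitly, because `min`/`max` alone would turn it into a finite bound.

```python
import math


def _clip_unit(x):
    """Clip a float to [-1, 1], leaving NaN untouched."""
    if math.isnan(x):
        return x  # min(1.0, nan) would return 1.0, so handle NaN first
    return max(-1.0, min(1.0, x))


def clip_correlation(value):
    """Clip the real and imaginary parts of a coefficient to [-1, 1]."""
    if isinstance(value, complex):
        return complex(_clip_unit(value.real), _clip_unit(value.imag))
    return _clip_unit(value)
```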
This doesn't actually test much, as we don't have any inputs where that
was not already the case. But at least it is there and perhaps a fuzz
test can be added at a later date.
Empty non-arrays no longer participate in determining the dtype of
the concatenated array. Have also refactored the code to unify as
much as possible the logic for flattened and non-flattened paths,
unifying the error checks and adding a few more tests, both for
the fixed bug and for other functionality.
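The rule in this final commit is simpler than the two-pass scheme from the description: empty non-arrays are just left out of the dtype decision. Another toy sketch under the same assumptions (a hypothetical `Arr` stand-in, string dtypes, flattened inputs); it also assumes that empty ndarrays still count, since the commit only mentions non-arrays.

```python
from dataclasses import dataclass


@dataclass
class Arr:
    """Hypothetical stand-in for an ndarray: a dtype name plus flat values."""
    dtype: str
    values: list


def _flatten(obj):
    if isinstance(obj, (list, tuple)):
        return [v for item in obj for v in _flatten(item)]
    return [obj]


def result_dtype(inputs):
    """Arrays and non-empty non-arrays vote; empty non-arrays are skipped."""
    kinds = []
    for obj in inputs:
        if isinstance(obj, Arr):
            kinds.append(obj.dtype)  # arrays count even when empty
            continue
        values = _flatten(obj)
        if values:  # an empty list no longer contributes a float64 default
            all_int = all(isinstance(v, int) for v in values)
            kinds.append("int64" if all_int else "float64")
    if not kinds:
        return "float64"  # assumption: fall back to the usual default
    return "float64" if "float64" in kinds else "int64"
```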
jaimefrio closed this Mar 22, 2016
jaimefrio deleted the concatenate_empty branch March 22, 2016 22:31