object array creation new conversion to int · Issue #3804 · numpy/numpy

Closed
juliantaylor opened this issue Sep 27, 2013 · 17 comments
@juliantaylor
Contributor

#3696 introduced a subtle change in behavior for record arrays:

import numpy as np
items = np.array([(1, 1), (2, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
freq = [2, 1]
v = np.array([items, freq]).T
print map(type, v[1,0])

Before the PR it kept the numpy scalars:

[<type 'numpy.int32'>, <type 'numpy.int32'>]

Now it converts them to python integers:

[<type 'int'>, <type 'int'>]

The reason seems to be that the old code just copied the pointers to the objects, while the new code goes through PyArray_CopyInto, which calls dtype->get_element on each entry.
The get_element of int arrays converts the value to a python integer on python 2.

This causes a test failure in scipy, where an int-typed array now compares false against the np.int32-typed array (test_stats.py).

I'm not sure what's the best way to fix this, or whether it should be fixed at all.
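
For illustration, a minimal sketch of that conversion path; it assumes np.copyto exercises the same int-to-object cast as array construction (output shown as on python 2):

import numpy as np

# assumption: np.copyto goes through the same casting machinery
src = np.arange(3, dtype=np.int32)
dst = np.empty(3, dtype=object)
np.copyto(dst, src)
print(type(src[0]))  # <type 'numpy.int32'> - the numpy scalar
print(type(dst[0]))  # <type 'int'> - type information lost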

@seberg
Member
seberg commented Nov 8, 2013

Hmmm, this is not quite clear to me. The old behaviour seems somewhat more sensible to me; however, it overlaps with this:

a = np.ones(2, dtype=object)
a[...] = np.ones(2)
type(a[0]) == float

I.e. the numpy type gets converted to the python type in object assignments due to the CopyTo behaviour, as if a.item(0) got called.

@seberg
Member
seberg commented Nov 8, 2013

Identical, but maybe even closer to the question: .astype(object) will also convert the elements. If converting them there is right, then doing it everywhere might make sense, too...
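
For illustration, a quick check of that .astype(object) conversion (output as described in this thread; on a 64-bit platform the array scalar is numpy.int64):

import numpy as np

a = np.arange(3)
b = a.astype(object)
print(type(a[0]))  # <type 'numpy.int64'> - numpy scalar
print(type(b[0]))  # <type 'int'> - converted to the python type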

@seberg
Member
seberg commented Dec 6, 2013

I am seriously starting to think the change is good, but we should change the behaviour back by also changing that assignment subtlety. It would be a subtle change, but I really can't see why we should lose type information on object array assignments or when casting to object (I won't rule out that it can be useful at times, but I think it is rather unexpected).

@charris
Member
charris commented Feb 24, 2014

@juliantaylor @seberg What is the status here? Sounds like this should be fixed.

@seberg
Member
seberg commented Feb 27, 2014

One option may be something like this (likely not exactly this):

--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -1169,7 +1169,7 @@ static void
     PyObject *tmp;
     for (i = 0; i < n; i++, ip +=skip, op++) {
         tmp = *op;
-        *op = @FROMTYPE@_getitem((char *)ip, aip);
+        *op = PyArray_Scalar((char *)ip, PyArray_DESCR(aip), aip);
         Py_XDECREF(tmp);
     }
 }

I just tried it, and it creates a couple of test failures; one would have to investigate further whether this is a viable option or whether the failures are not easy to fix. It is the change I mentioned above...

@seberg
Member
seberg commented Apr 28, 2014

OK, so the problem with the above change is probably "just" that the numpy scalar types call into the ufunc/array machinery for their operations. This leads to an infinite loop:

numpy_scalar + object
-> array(numpy_scalar) + object
-> for item in array(numpy_scalar) + object
->    numpy_scalar + object

kaboom.

@charris
Member
charris commented May 5, 2014

I'm inclined to leave this be for 1.9. @juliantaylor @seberg Thoughts?

@seberg
Member
seberg commented May 5, 2014

Well, I think the "correct" thing would be to make the change I mentioned, but that requires some rework of the scalars... As far as I understood, the problems scipy had with this are fixed or not very substantial. So, while ideally I think we would basically reverse the current object casting behaviour (reverse in the sense of the np.array change), this is probably not bad enough to block 1.9, I guess.

@charris
Member
charris commented May 6, 2014

@seberg OK, I'll leave this open and remove it from the 1.9 blockers. Do you think it should be a 1.10 blocker? That way it won't get completely lost.

@charris charris modified the milestones: 1.9 blockers, 1.10 blockers May 6, 2014
@ahaldane
Member
ahaldane commented Mar 8, 2015

It seems to me it should return a void scalar, e.g.:

>>> items = np.array([(1, 1), (2, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> freq = [2, 1]
>>> x = np.array([items, freq]).T[1,0]
>>> type(x)
numpy.void
>>> x.dtype
dtype([('f0', '<i4'), ('f1', '<i4')])

Edit: Oh, I see, that's what it originally did.

@ahaldane
Member
ahaldane commented Mar 8, 2015

I think the solution might be to add a new function VOID_to_OBJECT in arraytypes.c.src, instead of using @FROMTYPE@_to_OBJECT as is done presently.
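
For illustration, a rough sketch of what such a function could look like, modeled on the @FROMTYPE@_to_OBJECT loop from the diff earlier in this thread; the body and signature here are assumptions, not actual numpy source:

/* hypothetical sketch only: mirrors the @FROMTYPE@_to_OBJECT template
 * from arraytypes.c.src, but emits np.void scalars so the structured
 * dtype information is preserved */
static void
VOID_to_OBJECT(void *input, void *output, npy_intp n,
               void *vaip, void *NPY_UNUSED(aop))
{
    char *ip = input;
    PyObject **op = output;
    PyArrayObject *aip = vaip;
    npy_intp i;
    npy_intp skip = PyArray_DESCR(aip)->elsize;  /* bytes per void element */
    PyObject *tmp;

    for (i = 0; i < n; i++, ip += skip, op++) {
        tmp = *op;
        /* PyArray_Scalar on a structured descr returns an np.void scalar */
        *op = PyArray_Scalar(ip, PyArray_DESCR(aip), (PyObject *)aip);
        Py_XDECREF(tmp);
    }
}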

@ahaldane
Member

After examining it a little more:

I think the problem here is that object arrays have two distinct (sometimes incompatible) functions: 1) to allow arrays containing arbitrary python objects, and 2) to cast numpy-numeric-types to python-numeric-types. The two functions conflict in this issue.

Function 1) is useful for nested ndarrays of different lengths, or for arrays of special numeric types (e.g. mpmath's multiprecision objects). Function 2) is useful for large integers: in order to compute arange(10) + 2**128, numpy casts to object (converting to python integers), since python integers have arbitrary size.

Here are a few examples that helped me think about what was going on. The current rule seems to be: if the sub-sequences are the same length, cast all elements to the closest compatible type (which may be object/python-types if there is any non-numeric element); if the sub-sequences are different lengths, create an object array but do not cast the elements to object.

import numpy as np

def printtype(x):
    if isinstance(x, np.ndarray):
        if x.dtype == np.dtype('O'):
            return x.dtype, type(x[0])
        return x.dtype
    return type(x)

printtypes = lambda a: (a.dtype, [printtype(x) for x in a])

>>> printtypes( np.array([np.array([(1,),(2,),(3,)], dtype=[('f0', 'i4')]), [1,2,3]]) )
(dtype('O'), [(dtype('O'), tuple), (dtype('O'), int)])

>>> printtypes( np.array([np.array([(1,),(2,),(3,)], dtype=[('f0', 'i4')]), [1,2]]) )
(dtype('O'), [dtype([('f0', '<i4')]), list])

>>> printtypes( np.array([np.array([1,2,3]), [1,2]]) )
(dtype('O'), [dtype('int64'), list])

>>> printtypes( np.array([np.array([1,2,3]), [1,2,3]], dtype=np.object) )
(dtype('O'), [(dtype('O'), int), (dtype('O'), int)])

>>> printtypes( np.array([np.array([1,2,3]), [1,2,3]]) )
(dtype('int64'), [dtype('int64'), dtype('int64')])

However we do it, I think there should be a way of creating object arrays both with and without casting to python types. What should that look like?

One far-out idea: arange(10).astype(np.object) could give you an object array where each element is an np.int scalar, while a hypothetical arange(10).astype(np.pytype) would give you an object array of python ints.
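
In the meantime, a workaround sketch for getting both variants with the existing API (the element-wise assignment behaviour is an observation from this thread's discussion, not a documented guarantee):

import numpy as np

a = np.arange(10)

# function 2): astype(object) yields python ints
py_ints = a.astype(object)
print(type(py_ints[0]))  # <type 'int'>

# function 1): element-wise assignment keeps the numpy scalars,
# since iterating a yields array scalars and setting a single
# object-array element stores the object as-is
np_ints = np.empty(a.shape, dtype=object)
for i, x in enumerate(a):
    np_ints[i] = x
print(type(np_ints[0]))  # <type 'numpy.int64'>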

@charris
Member
charris commented Jun 19, 2015

@seberg @ahaldane I'm going to remove this from the 1.10 blockers. Do you think it still needs fixing at some point?

@charris charris modified the milestones: 1.10 blockers, 1.11.0 release Jun 19, 2015
@njsmith
Member
njsmith commented Jun 19, 2015

It looks to me like a valid bug, but it's not a 1.10 blocker in any sense.

@ahaldane
Member

I agree with @njsmith; it's not urgent.

@charris
Member
charris commented Jan 12, 2016

Bumping up to the 1.12 release, and will continue to do so until someone steps up to fix this.

@seberg
Member
seberg commented Sep 9, 2020

I am going to close this. I think gh-16876 basically half covers it, and it was so many years ago that we should not worry about the initial "regression" anymore.

@seberg seberg closed this as completed Sep 9, 2020