object array creation new conversion to int · Issue #3804 · numpy/numpy

Closed
juliantaylor opened this issue Sep 27, 2013 · 17 comments
@juliantaylor
Contributor

#3696 introduced a subtle change in behavior for record arrays:

import numpy as np
items = np.array([(1, 1), (2, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
freq = [2, 1]
v = np.array([items, freq]).T
print map(type, v[1,0])

Before the PR it kept the numpy scalars:

[<type 'numpy.int32'>, <type 'numpy.int32'>]

Now it converts them to python integers:

[<type 'int'>, <type 'int'>]

The reason seems to be that the old code just copied the pointers to the objects, while the new code goes through PyArray_CopyInto, which calls dtype->get_element on each entry.
The get_element of int arrays converts the value to a python integer on python 2.

This causes a test failure in scipy, where an int-typed array now compares false against the np.int32-typed array (test_stats.py).

I'm not sure what's the best way to fix this, or whether it should be fixed at all.
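
For illustration, a minimal sketch of that conversion path; it assumes np.copyto exercises the same int-to-object cast as array construction (output shown as on python 2):

import numpy as np

# assumption: np.copyto goes through the same casting machinery
src = np.arange(3, dtype=np.int32)
dst = np.empty(3, dtype=object)
np.copyto(dst, src)
print(type(src[0]))  # <type 'numpy.int32'> - the numpy scalar
print(type(dst[0]))  # <type 'int'> - type information lost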

@seberg
Member
seberg commented Nov 8, 2013

Hmmm, this is not quite clear to me. The old behaviour seems somewhat more sensible to me; however, it overlaps with this:

a = np.ones(2, dtype=object)
a[...] = np.ones(2)
type(a[0]) == float

I.e. the numpy type gets converted to the python type in object assignments due to the CopyTo behaviour, as if a.item(0) got called.

@seberg
Member
seberg commented Nov 8, 2013

Identical, but maybe even closer to the question: .astype(object) will also convert the elements. If converting them there is right, then doing it everywhere might make sense, too...
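
For illustration, a quick check of that .astype(object) conversion (output as described in this thread; on a 64-bit platform the array scalar is numpy.int64):

import numpy as np

a = np.arange(3)
b = a.astype(object)
print(type(a[0]))  # <type 'numpy.int64'> - numpy scalar
print(type(b[0]))  # <type 'int'> - converted to the python type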

@seberg
Member
seberg commented Dec 6, 2013

I am seriously starting to think the change is good, but we should change the behaviour back by also changing that assignment subtlety. It would be a subtle change, but I really can't see why we should lose type information on object array assignments or when casting to object (I won't rule out that it can be useful at times, but I think it is rather unexpected).

@charris
Member
charris commented Feb 24, 2014

@juliantaylor @seberg What is the status here? Sounds like this should be fixed.

@seberg
Member
seberg commented Feb 27, 2014

One option may be something like this (likely not exactly this):

--- a/numpy/core/src/multiarray/arraytypes.c.src
+++ b/numpy/core/src/multiarray/arraytypes.c.src
@@ -1169,7 +1169,7 @@ static void
     PyObject *tmp;
     for (i = 0; i < n; i++, ip +=skip, op++) {
         tmp = *op;
-        *op = @FROMTYPE@_getitem((char *)ip, aip);
+        *op = PyArray_Scalar((char *)ip, PyArray_DESCR(aip), aip);
         Py_XDECREF(tmp);
     }
 }

I just tried it, and it creates a couple of test failures; one would have to investigate further whether this is a viable option or whether the failures are not easy to fix. It is the change I mentioned above...

@seberg
Member
seberg commented Apr 28, 2014

OK, so the problem with the above change is probably "just" that the numpy scalar types call into the ufunc/array machinery for their operations. This leads to an infinite loop:

numpy_scalar + object
-> array(numpy_scalar) + object
-> for item in array(numpy_scalar) + object
->    numpy_scalar + object

kaboom.

@charris
Member
charris commented May 5, 2014

I'm inclined to leave this be for 1.9. @juliantaylor @seberg Thoughts?

@seberg
Member
seberg commented May 5, 2014

Well, I think the "correct" thing would be to make the change I mentioned, but that requires some rework of the scalars... As far as I understood, the problems scipy had with this are fixed or not very substantial. So, while ideally I think we would basically reverse the current object casting behaviour (reverse in the sense of the np.array change), this is probably not bad enough to block 1.9, I guess.

@charris
Member
charris commented May 6, 2014

@seberg OK, I'll leave this open and remove it from the 1.9 blockers. Do you think it should be a 1.10 blocker? That way it won't get completely lost.

@charris charris modified the milestones: 1.9 blockers, 1.10 blockers May 6, 2014
@ahaldane
Member
ahaldane commented Mar 8, 2015

It seems to me it should return a void scalar, e.g.:

>>> items = np.array([(1, 1), (2, 2)], dtype=[('f0', '<i4'), ('f1', '<i4')])
>>> freq = [2, 1]
>>> x = np.array([items, freq]).T[1,0]
>>> type(x)
numpy.void
>>> x.dtype
dtype([('f0', '<i4'), ('f1', '<i4')])

Edit: Oh, I see, that's what it originally did.

@ahaldane
Member
ahaldane commented Mar 8, 2015

I think the solution might be to add a new function VOID_to_OBJECT in arraytypes.c.src, instead of using @FROMTYPE@_to_OBJECT as is done presently.
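
For illustration, a rough sketch of what such a function could look like, modeled on the @FROMTYPE@_to_OBJECT loop from the diff earlier in this thread; the body and signature here are assumptions, not actual numpy source:

/* hypothetical sketch only: mirrors the @FROMTYPE@_to_OBJECT template
 * from arraytypes.c.src, but emits np.void scalars so the structured
 * dtype information is preserved */
static void
VOID_to_OBJECT(void *input, void *output, npy_intp n,
               void *vaip, void *NPY_UNUSED(aop))
{
    char *ip = input;
    PyObject **op = output;
    PyArrayObject *aip = vaip;
    npy_intp i;
    npy_intp skip = PyArray_DESCR(aip)->elsize;  /* bytes per void element */
    PyObject *tmp;

    for (i = 0; i < n; i++, ip += skip, op++) {
        tmp = *op;
        /* PyArray_Scalar on a structured descr returns an np.void scalar */
        *op = PyArray_Scalar(ip, PyArray_DESCR(aip), (PyObject *)aip);
        Py_XDECREF(tmp);
    }
}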

@ahaldane
Member

After examining it a little more:

I think the problem here is that object arrays have two distinct (sometimes incompatible) functions: 1) to allow arrays containing arbitrary python objects, and 2) to cast numpy-numeric-types to python-numeric-types. The two functions conflict in this issue.

Function 1) is useful for nested ndarrays of different lengths, or for arrays of special numeric types (e.g. mpmath's multiprecision objects). Function 2) is useful for large integers: in order to compute arange(10) + 2**128, numpy casts to object (converting to python integers), since python integers have arbitrary size.

Here are a few examples that helped me think about what was going on. The current rule seems to be: if the sub-sequences are the same length, cast all elements to the closest compatible type (which may be object/python-types if there is any non-numeric element); if the sub-sequences are different lengths, create an object array but do not cast the elements to object.

import numpy as np

def printtype(x):
    if isinstance(x, np.ndarray):
        if x.dtype == np.dtype('O'):
            return x.dtype, type(x[0])
        return x.dtype
    return type(x)

printtypes = lambda a: (a.dtype, [printtype(x) for x in a])

>>> printtypes( np.array([np.array([(1,),(2,),(3,)], dtype=[('f0', 'i4')]), [1,2,3]]) )
(dtype('O'), [(dtype('O'), tuple), (dtype('O'), int)])

>>> printtypes( np.array([np.array([(1,),(2,),(3,)], dtype=[('f0', 'i4')]), [1,2]]) )
(dtype('O'), [dtype([('f0', '<i4')]), list])

>>> printtypes( np.array([np.array([1,2,3]), [1,2]]) )
(dtype('O'), [dtype('int64'), list])

>>> printtypes( np.array([np.array([1,2,3]), [1,2,3]], dtype=np.object) )
(dtype('O'), [(dtype('O'), int), (dtype('O'), int)])

>>> printtypes( np.array([np.array([1,2,3]), [1,2,3]]) )
(dtype('int64'), [dtype('int64'), dtype('int64')])

However we do it, I think there should be a way of creating object arrays both with and without casting to python types. What should that look like?

One far-out idea: arange(10).astype(np.object) could give you an object array where each element is an np.int scalar, while a hypothetical arange(10).astype(np.pytype) would give you an object array of python ints.
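
In the meantime, a workaround sketch for getting both variants with the existing API (the element-wise assignment behaviour is an observation from this thread's discussion, not a documented guarantee):

import numpy as np

a = np.arange(10)

# function 2): astype(object) yields python ints
py_ints = a.astype(object)
print(type(py_ints[0]))  # <type 'int'>

# function 1): element-wise assignment keeps the numpy scalars,
# since iterating a yields array scalars and setting a single
# object-array element stores the object as-is
np_ints = np.empty(a.shape, dtype=object)
for i, x in enumerate(a):
    np_ints[i] = x
print(type(np_ints[0]))  # <type 'numpy.int64'>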

@charris
Member
charris commented Jun 19, 2015

@seberg @ahaldane I'm going to remove this from the 1.10 blockers. Do you think it still needs fixing at some point?

@charris charris modified the milestones: 1.10 blockers, 1.11.0 release Jun 19, 2015
@njsmith
Member
njsmith commented Jun 19, 2015

It looks to me like a valid bug, but it's not a 1.10 blocker in any sense.

@ahaldane
Member

I agree with @njsmith; it's not urgent.

@charris
Member
charris commented Jan 12, 2016

Bumping up to the 1.12 release, and will continue to do so until someone steps up to fix this.

@seberg
Member
seberg commented Sep 9, 2020

I am going to close this. I think gh-16876 basically half covers it, and it was so many years ago that we should not worry about the initial "regression" anymore.

@seberg seberg closed this as completed Sep 9, 2020