array.tolist() speed enhancement (migrated from Trac #1779) #3327

Closed

thouis opened this issue Sep 27, 2012 · 10 comments

Comments

thouis commented Sep 27, 2012

Original ticket http://projects.scipy.org/numpy/ticket/1779
Reported 2011-03-23 by trac user Han, assigned to unknown.

Hi,

For a while, a small issue has been bugging me: array.tolist() takes a huge amount of time compared to plain Python.

To illustrate, here are some timings (on Windows XP):

>>> timeit.timeit('a.tolist()', 'from numpy import arange; a = arange(1e5)', number=500)
19.07303038231646

Converting 100000 elements, repeated 500 times, takes almost 20(!) seconds.

>>> timeit.timeit('a = arange(1e5)', 'from numpy import arange', number=500)
0.4194663546483639

Creating a NumPy array with the same number of items takes only 0.5 seconds.

>>> timeit.timeit('a = range(int(1e5))', number=500)
0.97656364422391562

And creating a Python list takes no more than 1 second, which is about 20x faster than array.tolist(). So where does this discrepancy come from?

To find out, I did some runs with valgrind on NumPy 1.4.1 dbg (Debian), and used kcachegrind to produce a few calling graphs.

The first thing I noticed was the number of array_alloc and corresponding array_dealloc calls: the number of calls equals the number of elements in the array! PyArray_NewFromDescr is called once per array element, which creates a lot of overhead.

In NumPy 1.6.1b1, this overhead still exists; it stems from the PyArray_ToList function in convert.c:

NPY_NO_EXPORT PyObject *
PyArray_ToList(PyArrayObject *self)
{
    PyObject *lp;
    PyArrayObject *v;
    intp sz, i;

    if (!PyArray_Check(self)) {
        return (PyObject *)self;
    }
    if (self->nd == 0) {
        return self->descr->f->getitem(self->data,self);
    }

    sz = self->dimensions[0];
    lp = PyList_New(sz);
    for (i = 0; i < sz; i++) {
        v = (PyArrayObject *)array_big_item(self, i);
        if (PyArray_Check(v) && (v->nd >= self->nd)) {
            PyErr_SetString(PyExc_RuntimeError,
                            "array_item not returning smaller-" \
                            "dimensional array");
            Py_DECREF(v);
            Py_DECREF(lp);
            return NULL;
        }
        PyList_SetItem(lp, i, PyArray_ToList(v));
        Py_DECREF(v);
    }
    return lp;
}

For every element in the array, an array_big_item call is made to create a new sub-array for that element, which is then passed recursively to PyArray_ToList to extract the actual item.
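The recursive structure of PyArray_ToList can be sketched in pure Python (hypothetical to_list helper; nested sequences stand in for ndarrays, not NumPy API):

```python
def to_list(arr, shape):
    """Pure-Python sketch of PyArray_ToList's recursion."""
    if not shape:
        # 0-d "array": return the scalar item itself (mirrors the
        # getitem call when self->nd == 0)
        return arr
    # One "array_big_item" per element of the first axis, each one
    # converted recursively -- this is where the per-element
    # allocation overhead comes from.
    return [to_list(arr[i], shape[1:]) for i in range(shape[0])]
```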

I added an extra clause to the function to account for 1-dimensional arrays:

    if (self->nd == 1) {
        sz = self->dimensions[0];
        lp = PyList_New(sz);
        for (i = 0; i < sz; i++) {
            PyList_SetItem(lp, i, self->descr->f->getitem(index2ptr(self, i),self));
        }
        return lp;
    }

This gets the time down to about 2 seconds on Windows.

I'm not sure about the patch, though, because it does not account for errors, and it could probably be optimized and streamlined further.
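For comparison, here is a pure-Python analogue of the patched conversion, including the kind of error check the C fast path above omits (hypothetical to_list_fast; sequences stand in for ndarrays — a sketch, not the actual patch):

```python
def to_list_fast(arr, shape):
    """Sketch of the patched conversion with a 1-D fast path."""
    if not shape:
        # 0-d "array": the scalar item itself
        return arr
    if len(shape) == 1:
        # Fast path for 1-D arrays: fetch each item directly instead
        # of allocating a per-element sub-array and recursing into it.
        return [arr[i] for i in range(shape[0])]
    out = []
    for i in range(shape[0]):
        sub = arr[i]
        if len(sub) != shape[1]:
            # error check the C fast path lacks: malformed input
            raise RuntimeError("sub-array has unexpected length")
        out.append(to_list_fast(sub, shape[1:]))
    return out
```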

Anyway, I hope it can go into 1.6.0 in some way or another, because it really helps with NumPy-to-Python conversions!

[attached: calling graphs]

thouis commented Sep 27, 2012

Attachment in Trac by trac user Han, 2011-03-23: callgraph005.jpg

thouis commented Sep 27, 2012

Attachment in Trac by trac user Han, 2011-03-23: callgraph001.jpg

thouis commented Sep 27, 2012

Attachment in Trac by trac user Han, 2011-03-23: convert.patch

thouis commented Sep 27, 2012

Comment in Trac by trac user Han, 2011-03-23

thouis commented Sep 27, 2012

Comment in Trac by trac user Han, 2011-03-23

Hmmm... I already spotted an issue. This case is converted correctly:

>>> a = array([[1,2,3,object()],array([5,6,7,8])])
>>> a.tolist()
[[1, 2, 3, <object object at 0x00A515A0>], [5, 6, 7, 8]]

But this isn't:

>>> a = array([[1,2,3,object()],array([[5,6,7,8]])])
>>> a.tolist()
[[1, 2, 3, <object object at 0x00A51598>], array([[5, 6, 7, 8]])]

So the patch doesn't cover all bases..
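In pure-Python terms, the behaviour above looks like this (a hedged sketch, not NumPy code; tuples stand in for nested ndarrays inside a 1-D object array): a fast path that only fetches items returns array-valued elements unconverted, whereas the generic recursion would descend into them.

```python
# 1-D "object array" whose last element is itself a 2-D "array"
data = [1, 2, 3, ((5, 6), (7, 8))]

# Fast path: plain item fetch per element, no recursion
fast = [data[i] for i in range(len(data))]

# The nested "array" comes back unconverted:
# fast[-1] is ((5, 6), (7, 8)), not [[5, 6], [7, 8]]
```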

thouis commented Sep 27, 2012

Comment in Trac by trac user Han, 2011-03-23

(Ah, sorry for the noise, that also was not possible in the original conversion, it seems.)

thouis commented Sep 27, 2012

Comment in Trac by trac user Han, 2011-03-23

When using multi-dimensional arrays, the advantage of this patch goes away... Maybe it would be better to improve the conversion mechanism instead.

Converting a 50000x10x10 array 20 times still takes about 150 seconds with the patch, versus 180 seconds without it. (Python takes about 50 seconds using a list comprehension.)

thouis commented Sep 27, 2012

Comment in Trac by @mwiebe, 2011-03-23

I made a pull request which gives some speedup.

numpy/numpy#59

# BASELINE:
In [16]: timeit a = np.arange(100000)
1000 loops, best of 3: 241 us per loop

In [17]: timeit range(100000)
100 loops, best of 3: 2.29 ms per loop

In [18]: timeit a.tolist()
100 loops, best of 3: 14.9 ms per loop

# WITH PATCH
In [3]: timeit a = np.arange(100000)
1000 loops, best of 3: 245 us per loop

In [4]: timeit range(100000)
100 loops, best of 3: 2.4 ms per loop

In [5]: timeit a.tolist()
100 loops, best of 3: 3.57 ms per loop

thouis commented Sep 27, 2012

Comment in Trac by trac user Han, 2011-03-23

Whoa, that was quick! :-D

Your patch also gives a large improvement on multidimensional data:

# Before:

>>> timeit.timeit('a.tolist()', 'from numpy import zeros; a = zeros((1e4,10,10))', number=50)
20.638730049133301

# After:

>>> timeit.timeit('a.tolist()', 'from numpy import zeros; a = zeros((1e4,10,10))', number=50)
4.0468051433563232

Very nice, thanks!

thouis commented Sep 27, 2012

Comment in Trac by @mwiebe, 2011-03-23

Fixed in c7040bab4c94b2e2cdf5 (in 1.6 branch) and 89292bfc866f5b45b57a (master).

thouis closed this as completed Sep 27, 2012