np.array(string_array.tolist(), dtype=int) is faster than without the .tolist() · Issue #11014 · numpy/numpy · GitHub

np.array(string_array.tolist(), dtype=int) is faster than without the .tolist() #11014


Open
alexmojaki opened this issue Apr 30, 2018 · 12 comments

Comments

@alexmojaki
alexmojaki commented Apr 30, 2018

np.array(string_array.tolist(), dtype=int)

is almost twice as fast as

np.array(string_array, dtype=int)

Full demo script:

from timeit import timeit
import numpy as np
import sys

string_array = np.array(list(map(str, range(1000000))))

print(timeit(lambda: np.array(string_array, dtype=int), number=10))
print(timeit(lambda: np.array(string_array.tolist(), dtype=int), number=10))
print(np.__version__)
print(sys.version)

Output:

3.470734696020372
2.058091988990782
1.14.2
3.6.2 (default, Jul 29 2017, 00:00:00) 
[GCC 4.8.4]

This seems like an optimisation that numpy should be able to do automatically, and also possibly a hint at an underlying performance problem.


Summary by @seberg 2021-11:

  • As Eric notes at the end, the current timings should be mainly due to the weird casting functions.
  • The solution will be to implement new-style casts (instead of the weird legacy cast function), for string to integer casts. Even the old functions are bad (they go via scalars!), but there is probably not much point in trying to improve them.
@Dan-Patterson

@alexmojaki I got a similar ratio of the times as you did using your script for numpy 1.13.3 but using IPython %timeit, the differences were less dramatic.

%timeit -n10 -r10 np.array(string_array, dtype=int)
349 ms ± 4.07 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

%timeit -n10 -r10 np.array(string_array.tolist(), dtype=int)
317 ms ± 21.4 ms per loop (mean ± std. dev. of 10 runs, 10 loops each)

Can you confirm your timing by other means?

@alexmojaki
Author

Using more manual timing:

from contextlib import contextmanager
from time import time

import numpy as np


@contextmanager
def timer(description='Operation'):
    start = time()
    yield
    elapsed = time() - start
    message = '%s took %s seconds' % (description, elapsed)
    print(message)


string_array = np.array(list(map(str, range(1000000))))

with timer('plain'):
    for _ in range(10):
        np.array(string_array, dtype=int)

with timer('tolist'):
    for _ in range(10):
        np.array(string_array.tolist(), dtype=int)
plain took 6.220866918563843 seconds
tolist took 3.8044321537017822 seconds

@Dan-Patterson

Something else must be going on. Using your script above, I ran it from n=2 to 10 and the ratios weren't as marked as yours. Here are a couple of the samples

n = 2
plain took 0.6495838165283203 seconds
tolist took 0.622671365737915 seconds

n = 4
plain took 1.3122646808624268 seconds
tolist took 1.2245070934295654 seconds

n = 6
plain took 1.9206631183624268 seconds
tolist took 1.9051151275634766 seconds

n = 8
plain took 2.601041316986084 seconds
tolist took 2.455733299255371 seconds

n = 10
plain took 3.2278361320495605 seconds
tolist took 3.079383611679077 seconds

@alexmojaki
Author

You mentioned using numpy 1.13.3. Did you mean 1.14.3? If not, can you try with that? Also what version of Python are you using?

@Dan-Patterson
Copy link
Dan-Patterson commented May 1, 2018 via email

@alexmojaki
Author

Well I just downgraded to 1.13.3, and now tolist() is slower as one would expect. So I guess it's a new issue.

@eric-wieser
Member

@alexmojaki: Did .tolist() get slower in 1.13 on your machine, or did array(...) get faster (compared to 1.14)?

@alexmojaki
Author

It looks like in 1.14 array got slower. Here are some fresh runs with both scripts:

1.13:

plain took 3.423121929168701 seconds
tolist took 3.9357383251190186 seconds

3.5675266589969397
3.9893020659219474

1.14:

plain took 6.254313945770264 seconds
tolist took 3.8215630054473877 seconds

6.432628321927041
3.8574256210122257

@ahaldane
Member
ahaldane commented May 1, 2018

I haven't checked, but perhaps #9978 is involved...

@ahaldane
Member
ahaldane commented May 1, 2018

Actually, we modified arraytypes.c.src a couple times in the last few months, that was just the first one I remembered.

My bet for this slowdown is now #9856, since it modified the inner loop of STRING_to_INT:

     for (i = 0; i < n; i++, ip+=skip, op+=oskip) {
-        temp = @from@_getitem(ip, aip);
+        PyObject *new;
+        PyObject *temp = PyArray_Scalar(ip, PyArray_DESCR(aip), (PyObject *)aip);
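The effect of that change can be approximated from Python. This is an illustrative micro-benchmark, not the actual C code path: iterating over a string array yields one np.str_ scalar per element, roughly mirroring the per-element PyArray_Scalar call in the new loop, while .tolist() materialises plain Python str objects in a single pass first.

```python
from timeit import timeit
import numpy as np

string_array = np.array([str(i) for i in range(100_000)])

# Indexing a unicode array produces np.str_ scalar objects, loosely
# corresponding to the PyArray_Scalar call inside the cast loop.
scalar_path = timeit(
    lambda: np.array([int(s) for s in string_array], dtype=int), number=5)

# tolist() converts to plain Python str objects in one C-level pass
# before any parsing happens.
list_path = timeit(
    lambda: np.array(string_array.tolist(), dtype=int), number=5)

print(scalar_path, list_path)
```

Both variants produce the same integer array; only the number of intermediate scalar objects differs.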

@eric-wieser
Member
eric-wieser commented May 1, 2018

Not sure how best to fix this. There's a horrible dance going on between doing the work in scalartypes.c.src and doing it in arraytypes.c.src.

Ideally we'd do a conversion straight from array to array, rather than the round trip that currently happens of:

ndarray[np.str_] → np.str_ → int → ndarray[np.int32]

I'm pretty sure there are more steps there, but we would need to add instrumentation to find them.
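The round trip above can be spelled out from Python for a single element. This is only a sketch of the conceptual steps; the real work happens in C:

```python
import numpy as np

a = np.array(['7'])                   # ndarray of fixed-width unicode
s = a[0]                              # 1. element access yields an np.str_ scalar
assert type(s) is np.str_
i = int(s)                            # 2. the scalar is parsed to a Python int
out = np.array([i], dtype=np.int32)   # 3. and packed back into an int array
```

Doing this per element is what makes the cast expensive compared with a direct buffer-to-buffer conversion.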

@ahaldane
Member
ahaldane commented May 1, 2018

Agreed.

In that file we have a lot of functions like INT_fromstr, which avoid all that complication by using PyOS_strtol, which should be fast. But that expects a null-terminated string, and our fixed-width strings are not null-terminated. The C standard library has no function to parse an integer from a non-null-terminated buffer, so we'd probably have to make a null-terminated copy of each value first, or else roll our own strntol.
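The copy-then-parse idea can be sketched in Python. NumPy's fixed-width byte-string fields are NUL-padded in the underlying buffer, so each field can be sliced out and stripped before parsing. This is a conceptual analogue only, not the proposed C implementation:

```python
import numpy as np

a = np.array([b'42', b'7'], dtype='S4')   # fixed-width 4-byte fields
raw = a.tobytes()                          # fields are NUL-padded in the buffer
width = a.dtype.itemsize

# Strip the NUL padding from each field before handing it to int(),
# mirroring "make a null-terminated copy" before calling PyOS_strtol.
vals = [int(raw[i:i + width].rstrip(b'\x00'))
        for i in range(0, len(raw), width)]
```

In C the equivalent would be a small stack buffer holding the copied field plus a terminating NUL.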

That's ignoring any back-compat quirks we might have to support if we change things.
