np.array(string_array.tolist(), dtype=int) is faster than without the .tolist() #11014
@alexmojaki I got a similar ratio of the times as you did using your script for numpy 1.13.3, but using IPython %timeit, the differences were less dramatic.

```
%timeit -n10 -r10 np.array(string_array, dtype=int)
%timeit -n10 -r10 np.array(string_array.tolist(), dtype=int)
```

Can you confirm your timing by other means?
Using more manual timing:

```python
from contextlib import contextmanager
from time import time

import numpy as np


@contextmanager
def timer(description='Operation'):
    # Print the wall-clock time spent inside the with-block.
    start = time()
    yield
    elapsed = time() - start
    message = '%s took %s seconds' % (description, elapsed)
    print(message)


# One million decimal strings: '0', '1', ..., '999999'
string_array = np.array(list(map(str, range(1000000))))

# Direct conversion of the string array to int
with timer('plain'):
    for _ in range(10):
        np.array(string_array, dtype=int)

# Round trip through a list of Python str objects first
with timer('tolist'):
    for _ in range(10):
        np.array(string_array.tolist(), dtype=int)
```
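Since `%timeit` and the manual timer above disagree on how large the gap is, a third check with the standard-library `timeit` module may help. This is a sketch, not a script used in the thread: `compare` and `versions` are hypothetical helper names, and taking the minimum over repeats is simply one common choice of summary statistic.

```python
import platform
import timeit

import numpy as np


def versions():
    # Report the versions, since the results in this thread depend on them.
    return f"numpy {np.__version__}, Python {platform.python_version()}"


def compare(n=1_000_000, number=10, repeat=5):
    """Return the best (minimum) total time over `repeat` runs of `number`
    conversions, for the plain route and the .tolist() route."""
    # Same input as the scripts in this thread: '0', '1', ..., str(n - 1)
    string_array = np.array(list(map(str, range(n))))

    # Both routes must produce the same integers before timing means anything.
    assert np.array_equal(np.array(string_array, dtype=int),
                          np.array(string_array.tolist(), dtype=int))

    plain = timeit.repeat(lambda: np.array(string_array, dtype=int),
                          number=number, repeat=repeat)
    tolist = timeit.repeat(lambda: np.array(string_array.tolist(), dtype=int),
                           number=number, repeat=repeat)
    # The minimum is the least noise-sensitive summary of repeated timings.
    return min(plain), min(tolist)
```

For example, `print(versions(), compare())`. With the defaults this performs 100 conversions of a million strings in total, so it takes a while.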
Something else must be going on. Using your script above, I ran it from n=2 to 10 and the ratios weren't as marked as yours. Here are a couple of the samples: n = 2, n = 4, n = 6, n = 8, n = 10
You mentioned using numpy 1.13.3. Did you mean 1.14.3? If not, can you try with that? Also, what version of Python are you using?
I am using Python 3.6.2 and numpy 1.13.3. I tested with that setup to see whether the behaviour existed in those versions.
I was interested in whether this was indeed a new issue. I can't replicate the differences to the degree that you have using numpy 1.14. Was this observed when you used numpy 1.13.x, or have you only tested this on the current version?
Well, I just downgraded to 1.13.3, and now
@alexmojaki: Did
It looks like in 1.14

```
1.13:
plain took 3.423121929168701 seconds
3.5675266589969397

1.14:
plain took 6.254313945770264 seconds
6.432628321927041
```
I haven't checked, but perhaps #9978 is involved...
Actually, we modified

My bet for this slowdown is now #9856, since it modified the inner loop of

```diff
 for (i = 0; i < n; i++, ip+=skip, op+=oskip) {
-    temp = @from@_getitem(ip, aip);
+    PyObject *new;
+    PyObject *temp = PyArray_Scalar(ip, PyArray_DESCR(aip), (PyObject *)aip);
```
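To make the diff quoted above concrete at the Python level: the old loop fetched each element with `@from@_getitem`, while the new one wraps each element with `PyArray_Scalar`. The sketch below only models the object types involved (NumPy scalars such as `np.str_`, versus the plain Python `str` objects that `.tolist()` produces); it is an illustration under that assumption, not the C code path, and timing it would not be representative. `convert` is a hypothetical helper name.

```python
import numpy as np

string_array = np.array(["10", "20", "30"])

# Iterating over a NumPy array yields NumPy scalars (np.str_ here), the kind
# of per-element object PyArray_Scalar creates.
scalar_items = [x for x in string_array]

# .tolist() yields plain Python str objects instead.
python_items = string_array.tolist()


def convert(items):
    # Element-wise int() conversion, the step both routes eventually reach.
    return np.array([int(x) for x in items], dtype=int)


print(type(scalar_items[0]).__name__, type(python_items[0]).__name__)
```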
Not sure how best to fix this. There's a horrible dance going on between doing the work in

Ideally we'd do a conversion straight from array to array, rather than the round trip that currently happens of:

I'm pretty sure there are more steps there, but would need to add instrumentation to find them.
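As an illustration of what a direct array-to-array conversion could look like, here is a pure-NumPy sketch that parses the raw UCS4 buffer of a string array without creating any per-element scalar objects. It is not NumPy's implementation or a proposed patch. `parse_nonneg_ints` is a hypothetical name, and the sketch assumes a 1-D unicode array whose entries are non-empty runs of ASCII digits 0 to 9 (no sign, whitespace or other characters), at most 18 characters wide so the result fits in int64.

```python
import numpy as np


def parse_nonneg_ints(arr):
    """Hypothetical vectorised parser for a 1-D 'U' array of non-negative
    decimal integers; returns an int64 array."""
    arr = np.asarray(arr)
    if arr.dtype.kind != "U" or arr.ndim != 1:
        raise TypeError("expected a 1-D unicode string array")
    # Native byte order and C-contiguity let us reinterpret the raw buffer.
    arr = np.ascontiguousarray(arr, dtype=arr.dtype.newbyteorder("="))
    width = arr.dtype.itemsize // 4  # 'U' data is UCS4: 4 bytes per character
    if width > 18:
        raise OverflowError("strings too wide to fit safely in int64")

    # One row of code points per string; shorter strings are NUL-padded.
    codes = arr.view(np.uint32).reshape(len(arr), width).astype(np.int64)
    present = codes != 0
    lengths = present.sum(axis=1)
    if np.any(lengths == 0):
        raise ValueError("empty string")  # int('') also rejects this

    digits = np.where(present, codes - ord("0"), 0)
    if np.any((digits < 0) | (digits > 9)):
        raise ValueError("non-digit character found")

    # Place value of each character: 10 ** (length - 1 - position).
    exponents = lengths[:, None] - 1 - np.arange(width)
    exponents = np.where(present, exponents, 0)  # padding contributes 0 anyway
    return (digits * 10 ** exponents).sum(axis=1)
```

The design choice here is to trade generality (no signs, no whitespace, no overflow handling beyond a width check) for never leaving array-level operations.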
Agreed. In that file we have a lot of functions like

That's ignoring any back-compat quirks we might have to support if we change things.
`np.array(string_array.tolist(), dtype=int)` is almost twice as fast as `np.array(string_array, dtype=int)`.
Full demo script:
Output:
This seems like an optimisation that numpy should be able to do automatically, and possibly also a hint at an underlying performance problem.
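Until NumPy does this automatically, the workaround from this issue can be wrapped in a small helper. This is a hypothetical sketch (`str_array_to_int` is not a NumPy function). Since the comments show that the speed difference varies between NumPy versions, it is worth timing both routes on your own version before adopting it.

```python
import numpy as np


def str_array_to_int(arr, dtype=int):
    """Hypothetical helper: convert a string array to integers, routing
    unicode arrays through .tolist() as reported faster in this issue."""
    arr = np.asarray(arr)
    if arr.dtype.kind == "U":
        # .tolist() returns (nested) lists of Python str, so the shape of a
        # multi-dimensional array is rebuilt by np.array.
        return np.array(arr.tolist(), dtype=dtype)
    # Non-string input: plain conversion.
    return np.array(arr, dtype=dtype)
```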
Summary by @seberg 2021-11: