np.array(string_array.tolist(), dtype=int)
is almost twice as fast as
np.array(string_array, dtype=int)
Full demo script:
from timeit import timeit
import numpy as np
import sys
string_array = np.array(list(map(str, range(1000000))))
print(timeit(lambda: np.array(string_array, dtype=int), number=10))
print(timeit(lambda: np.array(string_array.tolist(), dtype=int), number=10))
print(np.__version__)
print(sys.version)
Output:
3.470734696020372
2.058091988990782
1.14.2
3.6.2 (default, Jul 29 2017, 00:00:00)
[GCC 4.8.4]
This seems like an optimisation that numpy should be able to do automatically, and possibly also a hint at an underlying performance problem.
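As a stopgap, the `tolist()` route from the description can be wrapped in a small helper. This is only a sketch: `strings_to_int` is a hypothetical name, not a NumPy function, and whether it is actually faster depends on the NumPy version, so it is worth timing on your own setup before relying on it.

```python
import numpy as np


def strings_to_int(string_array):
    """Hypothetical helper: cast a string array to int via a Python list."""
    # tolist() yields (possibly nested) Python str objects; np.array then
    # converts each one to an integer instead of casting the string array
    # directly.
    return np.array(string_array.tolist(), dtype=int)


string_array = np.array(list(map(str, range(1000))))

direct = np.array(string_array, dtype=int)   # direct cast from the string array
via_list = strings_to_int(string_array)      # detour through a Python list

# Both routes should produce the same integers.
assert np.array_equal(direct, via_list)
```

The helper trades extra memory (a temporary Python list of strings) for the speed difference reported above, so it may not pay off for very large arrays.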
Summary by @seberg 2021-11:
- As Eric notes at the end, the current timings should be mainly due to the weird casting functions.
- The solution will be to implement new-style casts (instead of the weird legacy cast functions) for string-to-integer casts. Even the old functions are bad (they go via scalars!), but there is probably not much point in trying to improve them.