8000 gh-91146: Reduce allocation size of list from str.split()/rsplit() by corona10 · Pull Request #95473 · python/cpython · GitHub
[go: up one dir, main page]

Skip to content

gh-91146: Reduce allocation size of list from str.split()/rsplit() #95473

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Jul 31, 2022

Conversation

corona10
Copy link
Member
@corona10 corona10 commented Jul 30, 2022

Memory size

AS-IS

>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
152
>>> s = "12345".split()
>>> sys.getsizeof(s)
152
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
152
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

TO-BE (Allocation is reduced & No regression)


>>> import sys
>>> s = "1 2".split()
>>> sys.getsizeof(s)
88
>>> s = "12345".split()
>>> sys.getsizeof(s)
104
>>> s = "1 2 3 4 5".split()
>>> sys.getsizeof(s)
136
>>> s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
>>> sys.getsizeof(s)
280

Performance: No regression

Script

import pyperf

def bench_split_tiny():
    s = "1 2".split()

def bench_split_small():
    s = "1 2 3 4 5".split()

def bench_split_larger():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()

def bench_split_tiny_append():
    s = "1 2".split()
    for e in range(100):
        s.append("9")

def bench_split_small_append():
    s = "1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

def bench_split_larger_append():
    s = "1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5 1 2 3 4 5".split()
    for e in range(100):
        s.append("9")

runner = pyperf.Runner()
runner.bench_func('bench_split_tiny', bench_split_tiny)
runner.bench_func('bench_split_small', bench_split_small)
runner.bench_func('bench_split_larger', bench_split_larger)
runner.bench_func('bench_split_tiny_append', bench_split_tiny_append)
runner.bench_func('bench_split_small_append', bench_split_small_append)
runner.bench_func('bench_split_larger_append', bench_split_larger_append)

Result

Benchmark base opt
bench_split_small (consistently 1-2% faster) 63.5 ns 62.7 ns: 1.01x faster
bench_split_tiny_append (unstable) 1.60 us 1.57 us: 1.02x faster
bench_split_small_append (unstable) 1.61 us 1.57 u 8000 s: 1.03x faster
Geometric mean (ref) 1.01x faster

Benchmark hidden because not significant (2): bench_split_tiny, bench_split_larger

@corona10 corona10 changed the title gh-91146: Reduce allocation size of list from str.split() gh-91146: Reduce allocation size of list from str.split()/rsplit() Jul 31, 2022
@corona10 corona10 added the performance Performance or resource usage label Jul 31, 2022
@methane
Copy link
Member
methane commented Jul 31, 2022

preallocation size can be smarter.

if (sub == NULL) {
    if (maxsplit < 0) {
        maxsplit = (len1+1) / 2;
    }
}
...
if (maxsplit < 0) {
    maxsplit = len1 / len2 + 1;
}

But this pull request is better than status quo and very simple.

@corona10
Copy link
Member Author

preallocation size can be smarter.

Thank you Naoki san, I like your suggestion too!
we can try the approach if we need the smarter way!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Performance or resource usage
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants
0