python: use PyUnicode_FromStringAndSize() by methane · Pull Request #14895 · duckdb/duckdb · GitHub

python: use PyUnicode_FromStringAndSize() #14895


Merged on Nov 19, 2024 (1 commit)

Conversation

@methane (Contributor) commented Nov 19, 2024

DuckDB introduced an optimized UTF-8 decoder.
It is up to 40% faster for short non-ASCII strings,
but 4x slower for long ASCII strings.

Python has optimized code for decoding ASCII, so decoding UTF-8 that contains long ASCII runs is faster with Python's decoder than with UTF8Proc::UTF8ToCodepoint.
I am also optimizing the handling of the short non-ASCII case in CPython.

ref: python/cpython#126025 (comment)
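
To illustrate the trade-off described above, here is a minimal Python sketch of the two strategies: decoding one code point at a time, and checking for pure ASCII first so the common case skips the per-code-point loop. This is not DuckDB's or CPython's actual code; the function names `decode_per_codepoint` and `decode_with_ascii_fast_path` are hypothetical, and the decoder does not reject overlong encodings or surrogates.

```python
def decode_per_codepoint(data: bytes) -> str:
    """Decode UTF-8 one code point at a time (illustrative, minimal validation)."""
    out = []
    i = 0
    n = len(data)
    while i < n:
        b = data[i]
        # The lead byte determines how many bytes the sequence occupies.
        if b < 0x80:
            cp, size = b, 1
        elif b >> 5 == 0b110:
            cp, size = b & 0x1F, 2
        elif b >> 4 == 0b1110:
            cp, size = b & 0x0F, 3
        elif b >> 3 == 0b11110:
            cp, size = b & 0x07, 4
        else:
            raise ValueError("invalid UTF-8 lead byte")
        if i + size > n:
            raise ValueError("truncated UTF-8 sequence")
        # Each continuation byte contributes its low 6 bits.
        for k in range(1, size):
            c = data[i + k]
            if c >> 6 != 0b10:
                raise ValueError("invalid UTF-8 continuation byte")
            cp = (cp << 6) | (c & 0x3F)
        out.append(chr(cp))
        i += size
    return "".join(out)


def decode_with_ascii_fast_path(data: bytes) -> str:
    """Skip the per-code-point loop entirely when the input is pure ASCII."""
    if data.isascii():
        return data.decode("ascii")
    return decode_per_codepoint(data)
```

For a long ASCII input the fast path does a single bulk scan and copy, whereas the per-code-point loop pays the branching cost for every byte. That is the case the PR description reports as slower.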

Background

  • Using PEP 393 based APIs, which depend heavily on current CPython internals, in third-party code makes it difficult to evolve Python's internals (e.g. to use UTF-8 as the internal representation of Unicode).
  • Using PEP 393 APIs slows down Python implementations other than CPython that use a UTF-8 string representation, e.g. PyPy.
  • PyUnicode_FromStringAndSize is part of the Stable ABI. Moving from non-Stable ABI functions to Stable ABI functions makes it possible to build Python modules that work with several Python versions.
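
As a rough illustration of the API named in the last point, the following sketch calls `PyUnicode_FromStringAndSize` from Python through `ctypes.pythonapi`. It assumes CPython, and the wrapper name `utf8_to_str` is hypothetical. The actual change in this PR is made in DuckDB's C++ client code, which is not shown here.

```python
import ctypes

# Bind the C API function exposed by the running CPython interpreter.
# C signature: PyObject *PyUnicode_FromStringAndSize(const char *str, Py_ssize_t size)
_from_string_and_size = ctypes.pythonapi.PyUnicode_FromStringAndSize
_from_string_and_size.argtypes = [ctypes.c_char_p, ctypes.c_ssize_t]
_from_string_and_size.restype = ctypes.py_object


def utf8_to_str(buf: bytes) -> str:
    """Build a str from a UTF-8 buffer and an explicit byte length."""
    return _from_string_and_size(buf, len(buf))
```

The caller passes only a pointer and a byte count, so it does not need to know how the interpreter stores strings internally; choosing the internal representation is left to the Python implementation.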

@Mytherin (Collaborator) commented

Thanks! Playing around with this locally, it seems to perform about as well as the current implementation, in which case I definitely agree we should switch back to the stable API.

@Mytherin Mytherin merged commit fbbfc4a into duckdb:main Nov 19, 2024
17 checks passed
@methane methane deleted the python-use-capi branch November 20, 2024 03:44
github-actions bot pushed a commit to duckdb/duckdb-r that referenced this pull request Dec 24, 2024
Top-N: Improve performance with large heaps, and correctly call Reduce (duckdb/duckdb#14900)
python: use PyUnicode_FromStringAndSize() (duckdb/duckdb#14895)
github-actions bot added a commit to duckdb/duckdb-r that referenced this pull request Dec 24, 2024
Top-N: Improve performance with large heaps, and correctly call Reduce (duckdb/duckdb#14900)
python: use PyUnicode_FromStringAndSize() (duckdb/duckdb#14895)

Co-authored-by: krlmlr <krlmlr@users.noreply.github.com>