gh-119182: Add PyUnicodeWriter C API by vstinner · Pull Request #119184 · python/cpython
gh-119182: Add PyUnicodeWriter C API #119184

Merged 15 commits on Jun 17, 2024
Add documentation
vstinner committed Jun 7, 2024
commit 99fa2cb4a146e7d9b7b58c8ad73dc0c87fca5f3a
75 changes: 75 additions & 0 deletions Doc/c-api/unicode.rst
@@ -1502,3 +1502,78 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
:c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
object that has been interned, or a new ("owned") reference to an earlier
interned string object with the same value.

PyUnicodeWriter
^^^^^^^^^^^^^^^

The :c:type:`PyUnicodeWriter` API can be used to create a Python :class:`str`
object.

.. versionadded:: 3.14

.. c:type:: PyUnicodeWriter

A Unicode writer instance.

.. c:function:: PyUnicodeWriter* PyUnicodeWriter_Create(Py_ssize_t length)

Create a Unicode writer instance.

Set an exception and return ``NULL`` on error.

.. c:function:: void PyUnicodeWriter_Discard(PyUnicodeWriter *writer)

Discard a Unicode writer instance: free its memory.

.. c:function:: PyObject* PyUnicodeWriter_Finish(PyUnicodeWriter *writer)

Get the final Python :class:`str` object and free the writer instance.

Set an exception and return ``NULL`` on error.

.. c:function:: int PyUnicodeWriter_WriteChar(PyUnicodeWriter *writer, Py_UCS4 ch)

Write a single Unicode character.

Return ``0`` on success, or set an exception and return ``-1`` on error.

.. c:function:: int PyUnicodeWriter_WriteUTF8(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)

Decode a string from UTF-8 in strict mode and write the output into the
writer.

*size* is the string length in bytes. If *size* is equal to ``-1``, call
``strlen(str)`` to get the string length.

Return ``0`` on success, or set an exception and return ``-1`` on error.

.. c:function:: int PyUnicodeWriter_WriteStr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Str(obj) <PyObject_Str>` and write the output into
the writer.

Return ``0`` on success, or set an exception and return ``-1`` on error.

.. c:function:: int PyUnicodeWriter_WriteRepr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Repr(obj) <PyObject_Repr>` and write the output into
the writer.

Return ``0`` on success, or set an exception and return ``-1`` on error.

.. c:function:: int PyUnicodeWriter_WriteSubstring(PyUnicodeWriter *writer, PyObject *str, Py_ssize_t start, Py_ssize_t end)

Write the substring ``str[start:end]`` into the writer.

*str* must be a Python :class:`str` object. *start* must be greater than or
equal to 0, and less than or equal to *end*. *end* must be less than or
equal to the length of *str*.
Comment on lines +1576 to +1578
Contributor

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0,
+ and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.

Member

TIL that this is called SemBr!

Breaking at every comma may be too much, but I prefer to break at sentence boundaries.

Contributor

Yeah. You don't need to break at commas, but I often do to minimise future diffs.

Alternative suggestion:

Suggested change
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0, and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.


Return ``0`` on success, or set an exception and return ``-1`` on error.

.. c:function:: int PyUnicodeWriter_Format(PyUnicodeWriter *writer, const char *format, ...)

Similar to :c:func:`PyUnicode_FromFormat`, but write the output directly
into the writer.

Return ``0`` on success, or set an exception and return ``-1`` on error.
15 changes: 15 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -258,6 +258,21 @@ New Features
* Add :c:func:`PyLong_GetSign` function to get the sign of :class:`int` objects.
(Contributed by Sergey B Kirpichev in :gh:`116560`.)

* Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str`
object:

* :c:func:`PyUnicodeWriter_Create`
* :c:func:`PyUnicodeWriter_Discard`
* :c:func:`PyUnicodeWriter_Finish`
* :c:func:`PyUnicodeWriter_WriteChar`
* :c:func:`PyUnicodeWriter_WriteUTF8`
* :c:func:`PyUnicodeWriter_WriteStr`
* :c:func:`PyUnicodeWriter_WriteRepr`
* :c:func:`PyUnicodeWriter_WriteSubstring`
* :c:func:`PyUnicodeWriter_Format`

(Contributed by Victor Stinner in :gh:`119182`.)

Porting to Python 3.14
----------------------

@@ -0,0 +1,13 @@
Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str` object:

* :c:func:`PyUnicodeWriter_Create`
* :c:func:`PyUnicodeWriter_Discard`
* :c:func:`PyUnicodeWriter_Finish`
* :c:func:`PyUnicodeWriter_WriteChar`
* :c:func:`PyUnicodeWriter_WriteUTF8`
* :c:func:`PyUnicodeWriter_WriteStr`
* :c:func:`PyUnicodeWriter_WriteRepr`
* :c:func:`PyUnicodeWriter_WriteSubstring`
* :c:func:`PyUnicodeWriter_Format`

Patch by Victor Stinner.