gh-119182: Add PyUnicodeWriter C API by vstinner · Pull Request #119184 · python/cpython · GitHub

gh-119182: Add PyUnicodeWriter C API #119184

Merged: 15 commits, merged on Jun 17, 2024
84 changes: 84 additions & 0 deletions Doc/c-api/unicode.rst
@@ -1502,3 +1502,87 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
:c:func:`PyUnicode_InternInPlace`, returning either a new Unicode string
object that has been interned, or a new ("owned") reference to an earlier
interned string object with the same value.

PyUnicodeWriter
^^^^^^^^^^^^^^^

The :c:type:`PyUnicodeWriter` API can be used to create a Python :class:`str`
object.

.. versionadded:: 3.14

.. c:type:: PyUnicodeWriter

A Unicode writer instance.

The instance must be destroyed by :c:func:`PyUnicodeWriter_Finish` on
success, or :c:func:`PyUnicodeWriter_Discard` on error.

.. c:function:: PyUnicodeWriter* PyUnicodeWriter_Create(Py_ssize_t length)

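   Create a Unicode writer instance.

   *length* must be greater than or equal to 0. If *length* is greater
   than 0, preallocate an internal buffer of *length* characters.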

Set an exception and return ``NULL`` on error.

.. c:function:: PyObject* PyUnicodeWriter_Finish(PyUnicodeWriter *writer)

Return the final Python :class:`str` object and destroy the writer instance.

Set an exception and return ``NULL`` on error.

.. c:function:: void PyUnicodeWriter_Discard(PyUnicodeWriter *writer)

Discard the internal Unicode buffer and destroy the writer instance.

.. c:function:: int PyUnicodeWriter_WriteChar(PyUnicodeWriter *writer, Py_UCS4 ch)

Write the single Unicode character *ch* into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteUTF8(PyUnicodeWriter *writer, const char *str, Py_ssize_t size)

Decode the string *str* from UTF-8 in strict mode and write the output into *writer*.

*size* is the string length in bytes. If *size* is equal to ``-1``, call
``strlen(str)`` to get the string length.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

   To use an error handler other than ``strict``, decode the bytes with
   :c:func:`PyUnicode_DecodeUTF8` and write the result with
   :c:func:`PyUnicodeWriter_WriteStr`.

.. c:function:: int PyUnicodeWriter_WriteStr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Str` on *obj* and write the output into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteRepr(PyUnicodeWriter *writer, PyObject *obj)

Call :c:func:`PyObject_Repr` on *obj* and write the output into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_WriteSubstring(PyUnicodeWriter *writer, PyObject *str, Py_ssize_t start, Py_ssize_t end)

Write the substring ``str[start:end]`` into *writer*.

   *str* must be a Python :class:`str` object.
   *start* must be greater than or equal to 0, and less than or equal to *end*.
   *end* must be less than or equal to the length of *str*.
Comment on lines +1576 to +1578

Contributor:

Nit; I prefer to use SemBr for paragraphs like this.

Suggested change
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0,
+ and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.

Member:

TIL that this is called SemBr!

Breaking on comma may be too much, but I prefer to break at the sentence boundary.

Contributor:

Yeah. You don't need to break at comma, but I often do to minimise future diffs.

Alternative suggestion:

Suggested change
- *str* must be Python :class:`str` object. *start* must be greater than or
- equal to 0, and less than or equal to *end*. *end* must be less than or
- equal to *str* length.
+ *str* must be Python :class:`str` object.
+ *start* must be greater than or equal to 0, and less than or equal to *end*.
+ *end* must be less than or equal to *str* length.


On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.

.. c:function:: int PyUnicodeWriter_Format(PyUnicodeWriter *writer, const char *format, ...)

Similar to :c:func:`PyUnicode_FromFormat`, but write the output directly into *writer*.

On success, return ``0``.
On error, set an exception, leave the writer unchanged, and return ``-1``.
15 changes: 15 additions & 0 deletions Doc/whatsnew/3.14.rst
@@ -258,6 +258,21 @@ New Features
* Add :c:func:`PyLong_GetSign` function to get the sign of :class:`int` objects.
(Contributed by Sergey B Kirpichev in :gh:`116560`.)

* Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str`
object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

(Contributed by Victor Stinner in :gh:`119182`.)

Porting to Python 3.14
----------------------

37 changes: 35 additions & 2 deletions Include/cpython/unicodeobject.h
@@ -444,7 +444,40 @@ PyAPI_FUNC(PyObject*) PyUnicode_FromKindAndData(
Py_ssize_t size);


/* --- _PyUnicodeWriter API ----------------------------------------------- */
/* --- Public PyUnicodeWriter API ----------------------------------------- */

typedef struct PyUnicodeWriter PyUnicodeWriter;

PyAPI_FUNC(PyUnicodeWriter*) PyUnicodeWriter_Create(Py_ssize_t length);
PyAPI_FUNC(void) PyUnicodeWriter_Discard(PyUnicodeWriter *writer);
PyAPI_FUNC(PyObject*) PyUnicodeWriter_Finish(PyUnicodeWriter *writer);

PyAPI_FUNC(int) PyUnicodeWriter_WriteChar(
PyUnicodeWriter *writer,
Py_UCS4 ch);
PyAPI_FUNC(int) PyUnicodeWriter_WriteUTF8(
PyUnicodeWriter *writer,
const char *str,
Py_ssize_t size);

PyAPI_FUNC(int) PyUnicodeWriter_WriteStr(
PyUnicodeWriter *writer,
PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteRepr(
PyUnicodeWriter *writer,
PyObject *obj);
PyAPI_FUNC(int) PyUnicodeWriter_WriteSubstring(
PyUnicodeWriter *writer,
PyObject *str,
Py_ssize_t start,
Py_ssize_t end);
PyAPI_FUNC(int) PyUnicodeWriter_Format(
PyUnicodeWriter *writer,
const char *format,
...);


/* --- Private _PyUnicodeWriter API --------------------------------------- */

typedef struct {
PyObject *buffer;
@@ -466,7 +499,7 @@ typedef struct {
/* If readonly is 1, buffer is a shared string (cannot be modified)
and size is set to 0. */
unsigned char readonly;
} _PyUnicodeWriter ;
} _PyUnicodeWriter;

// Initialize a Unicode writer.
//
@@ -0,0 +1,13 @@
Add a new :c:type:`PyUnicodeWriter` API to create a Python :class:`str` object:

* :c:func:`PyUnicodeWriter_Create`.
* :c:func:`PyUnicodeWriter_Discard`.
* :c:func:`PyUnicodeWriter_Finish`.
* :c:func:`PyUnicodeWriter_WriteChar`.
* :c:func:`PyUnicodeWriter_WriteUTF8`.
* :c:func:`PyUnicodeWriter_WriteStr`.
* :c:func:`PyUnicodeWriter_WriteRepr`.
* :c:func:`PyUnicodeWriter_WriteSubstring`.
* :c:func:`PyUnicodeWriter_Format`.

Patch by Victor Stinner.