doc: PyUnicode_AsUTF8String() fails if string contains surrogates (#1… · python/cpython@d8cf587 · GitHub
Commit d8cf587

doc: PyUnicode_AsUTF8String() fails if string contains surrogates (#124605)
1 parent 34158c2 commit d8cf587

1 file changed (+10, -3)

Doc/c-api/unicode.rst

Lines changed: 10 additions & 3 deletions
@@ -317,7 +317,7 @@ These APIs can be used to work with surrogates:
 
 .. c:function:: Py_UCS4 Py_UNICODE_JOIN_SURROGATES(Py_UCS4 high, Py_UCS4 low)
 
-   Join two surrogate characters and return a single :c:type:`Py_UCS4` value.
+   Join two surrogate code points and return a single :c:type:`Py_UCS4` value.
    *high* and *low* are respectively the leading and trailing surrogates in a
    surrogate pair. *high* must be in the range [0xD800; 0xDBFF] and *low* must
    be in the range [0xDC00; 0xDFFF].
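
Note on this hunk: the arithmetic behind joining a surrogate pair can be sketched in Python. This is only a model of the standard UTF-16 rule, not the macro itself; `join_surrogates` is a hypothetical name, and raising `ValueError` on out-of-range input is an assumption made for the sketch (the documentation above only states the ranges the caller must respect).

```python
def join_surrogates(high: int, low: int) -> int:
    """Hypothetical Python model of Py_UNICODE_JOIN_SURROGATES."""
    # Range checks are an assumption of this sketch; the C docs only
    # state the ranges as preconditions.
    if not 0xD800 <= high <= 0xDBFF:
        raise ValueError("high must be a leading surrogate")
    if not 0xDC00 <= low <= 0xDFFF:
        raise ValueError("low must be a trailing surrogate")
    # Each surrogate carries 10 payload bits; the pair encodes an offset
    # from U+10000.
    return 0x10000 + (((high - 0xD800) << 10) | (low - 0xDC00))


code_point = join_surrogates(0xD83D, 0xDE00)
print(hex(code_point))  # 0x1f600
```

The result is a code point in the supplementary planes (U+10000 to U+10FFFF), which is why "code points" is the more precise term than "characters" for the inputs.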
@@ -999,6 +999,9 @@ These are the UTF-8 codec APIs:
    object. Error handling is "strict". Return ``NULL`` if an exception was
    raised by the codec.
 
+   The function fails if the string contains surrogate code points
+   (``U+D800`` - ``U+DFFF``).
+
 
 .. c:function:: const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size)
 
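
Note on this hunk: the documented failure can be observed from Python by calling the C function through `ctypes.pythonapi`. This is a minimal sketch that assumes a CPython interpreter; `try_as_utf8_string` is a hypothetical helper name. Because `ctypes.pythonapi` checks the Python error flag after each call, the `NULL` return with an exception set surfaces as a normal Python exception.

```python
import ctypes

# Bind the C API function; argument and result are both PyObject*.
_as_utf8_string = ctypes.pythonapi.PyUnicode_AsUTF8String
_as_utf8_string.argtypes = [ctypes.py_object]
_as_utf8_string.restype = ctypes.py_object


def try_as_utf8_string(text: str):
    """Return the UTF-8 bytes, or None if the strict codec rejects the string."""
    try:
        return _as_utf8_string(text)
    except UnicodeEncodeError:
        # The C function returned NULL with this exception set.
        return None


print(try_as_utf8_string("caf\u00e9"))   # valid string: returns a bytes object
print(try_as_utf8_string("bad\ud800"))   # lone surrogate: None
```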
@@ -1011,6 +1014,9 @@ These are the UTF-8 codec APIs:
    On error, set an exception, set *size* to ``-1`` (if it's not NULL) and
    return ``NULL``.
 
+   The function fails if the string contains surrogate code points
+   (``U+D800`` - ``U+DFFF``).
+
    This caches the UTF-8 representation of the string in the Unicode object, and
    subsequent calls will return a pointer to the same buffer. The caller is not
    responsible for deallocating the buffer. The buffer is deallocated and
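
Note on this hunk: a similar sketch for `PyUnicode_AsUTF8AndSize`, showing the *size* out-parameter alongside the failure case. It again assumes CPython and `ctypes.pythonapi`; `as_utf8_and_size` is a hypothetical helper name. The `c_char_p` result type copies the cached buffer into a `bytes` object up to the first NUL byte, so the sketch is only suitable for strings without embedded NULs.

```python
import ctypes

_as_utf8_and_size = ctypes.pythonapi.PyUnicode_AsUTF8AndSize
_as_utf8_and_size.argtypes = [ctypes.py_object,
                              ctypes.POINTER(ctypes.c_ssize_t)]
# c_char_p converts the returned const char* into a bytes copy.
_as_utf8_and_size.restype = ctypes.c_char_p


def as_utf8_and_size(text: str):
    """Return (utf8_bytes, size); raises UnicodeEncodeError on surrogates."""
    size = ctypes.c_ssize_t(0)
    data = _as_utf8_and_size(text, ctypes.byref(size))
    return data, size.value


data, size = as_utf8_and_size("\u00e9")
print(data, size)  # two bytes of UTF-8 for one code point

try:
    as_utf8_and_size("\ud800")
except UnicodeEncodeError as exc:
    print("failed:", exc.reason)
```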
@@ -1438,8 +1444,9 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
    Compare a Unicode object with a char buffer which is interpreted as
    being UTF-8 or ASCII encoded and return true (``1``) if they are equal,
    or false (``0``) otherwise.
-   If the Unicode object contains surrogate characters or
-   the C string is not valid UTF-8, false (``0``) is returned.
+   If the Unicode object contains surrogate code points
+   (``U+D800`` - ``U+DFFF``) or the C string is not valid UTF-8,
+   false (``0``) is returned.
 
    This function does not raise exceptions.
 
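
Note on this hunk: the comparison semantics described above can be modelled in pure Python. This is a sketch of the documented behaviour only, not the C implementation; `equal_to_utf8` is a hypothetical name, and the function's real signature lies outside the excerpt. The point is that both failure conditions produce a false result rather than an exception.

```python
def equal_to_utf8(text: str, data: bytes) -> bool:
    """Hypothetical model: compare a str with a UTF-8 byte buffer."""
    try:
        # Strict encoding fails if text contains surrogate code points.
        encoded = text.encode("utf-8")
    except UnicodeEncodeError:
        return False
    # A buffer that is not valid UTF-8 can never equal a valid encoding,
    # so a plain byte comparison also covers the invalid-buffer case.
    return encoded == data


print(equal_to_utf8("abc", b"abc"))              # True
print(equal_to_utf8("\ud800", b"\xed\xa0\x80"))  # False: surrogate in text
print(equal_to_utf8("abc", b"\xff"))             # False: invalid UTF-8
```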
