@@ -31,6 +31,12 @@ Unicode Type
 These are the basic Unicode object types used for the Unicode implementation in
 Python:

+.. c:var:: PyTypeObject PyUnicode_Type
+
+   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
+   is exposed to Python code as :py:class:`str`.
+
+
 .. c:type:: Py_UCS4
             Py_UCS2
             Py_UCS1
@@ -42,19 +48,6 @@ Python:
    .. versionadded:: 3.3


-.. c:type:: Py_UNICODE
-
-   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
-   depending on the platform.
-
-   .. versionchanged:: 3.3
-      In previous versions, this was a 16-bit type or a 32-bit type depending on
-      whether you selected a "narrow" or "wide" Unicode version of Python at
-      build time.
-
-   .. deprecated-removed:: 3.13 3.15
-
-
 .. c:type:: PyASCIIObject
             PyCompactUnicodeObject
             PyUnicodeObject
@@ -66,12 +59,6 @@ Python:
    .. versionadded:: 3.3


-.. c:var:: PyTypeObject PyUnicode_Type
-
-   This instance of :c:type:`PyTypeObject` represents the Python Unicode type. It
-   is exposed to Python code as ``str``.
-
-
 The following APIs are C macros and static inlined functions for fast checks and
 access to internal read-only data of Unicode objects:

@@ -87,16 +74,6 @@ access to internal read-only data of Unicode objects:
    subtype. This function always succeeds.


-.. c:function:: int PyUnicode_READY(PyObject *unicode)
-
-   Returns ``0``. This API is kept only for backward compatibility.
-
-   .. versionadded:: 3.3
-
-   .. deprecated:: 3.10
-      This API does nothing since Python 3.12.
-
-
 .. c:function:: Py_ssize_t PyUnicode_GET_LENGTH(PyObject *unicode)

    Return the length of the Unicode string, in code points. *unicode* has to be a
@@ -149,12 +126,16 @@ access to internal read-only data of Unicode objects:
 .. c:function:: void PyUnicode_WRITE(int kind, void *data, \
                                      Py_ssize_t index, Py_UCS4 value)

-   Write into a canonical representation *data* (as obtained with
-   :c:func:`PyUnicode_DATA`). This function performs no sanity checks, and is
-   intended for usage in loops. The caller should cache the *kind* value and
-   *data* pointer as obtained from other calls. *index* is the index in
-   the string (starts at 0) and *value* is the new code point value which should
-   be written to that location.
+   Write the code point *value* to the given zero-based *index* in a string.
+
+   The *kind* value and *data* pointer must have been obtained from a
+   string using :c:func:`PyUnicode_KIND` and :c:func:`PyUnicode_DATA`
+   respectively. You must hold a reference to that string while calling
+   :c:func:`!PyUnicode_WRITE`. All requirements of
+   :c:func:`PyUnicode_WriteChar` also apply.
+
+   The function performs no checks for any of its requirements,
+   and is intended for usage in loops.

    .. versionadded:: 3.3

@@ -196,6 +177,14 @@ access to internal read-only data of Unicode objects:
    is not ready.


+.. c:function:: unsigned int PyUnicode_IS_ASCII(PyObject *unicode)
+
+   Return true if the string only contains ASCII characters.
+   Equivalent to :py:meth:`str.isascii`.
+
+   .. versionadded:: 3.2
+
+
 Unicode Character Properties
 """"""""""""""""""""""""""""

@@ -330,11 +319,29 @@ APIs:
    to be placed in the string. As an approximation, it can be rounded up to the
    nearest value in the sequence 127, 255, 65535, 1114111.

-   This is the recommended way to allocate a new Unicode object. Objects
-   created using this function are not resizable.
-
    On error, set an exception and return ``NULL``.

+   After creation, the string can be filled by :c:func:`PyUnicode_WriteChar`,
+   :c:func:`PyUnicode_CopyCharacters`, :c:func:`PyUnicode_Fill`,
+   :c:func:`PyUnicode_WRITE` or similar.
+   Since strings are supposed to be immutable, take care to not “use” the
+   result while it is being modified. In particular, before it's filled
+   with its final contents, a string:
+
+   - must not be hashed,
+   - must not be :c:func:`converted to UTF-8 <PyUnicode_AsUTF8AndSize>`,
+     or another non-"canonical" representation,
+   - must not have its reference count changed,
+   - must not be shared with code that might do one of the above.
+
+   This list is not exhaustive. Avoiding these uses is your responsibility;
+   Python does not always check these requirements.
+
+   To avoid accidentally exposing a partially-written string object, prefer
+   using the :c:type:`PyUnicodeWriter` API, or one of the ``PyUnicode_From*``
+   functions below.
+
+
    .. versionadded:: 3.3


@@ -607,6 +614,15 @@ APIs:
    decref'ing the returned objects.


+.. c:function:: const char* PyUnicode_GetDefaultEncoding(void)
+
+   Return the name of the default string encoding, ``"utf-8"``.
+   See :func:`sys.getdefaultencoding`.
+
+   The returned string does not need to be freed, and is valid
+   until interpreter shutdown.
+
+
 .. c:function:: Py_ssize_t PyUnicode_GetLength(PyObject *unicode)

    Return the length of the Unicode object, in code points.
@@ -627,6 +643,9 @@ APIs:
    possible. Returns ``-1`` and sets an exception on error, otherwise returns
    the number of copied characters.

+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    .. versionadded:: 3.3


@@ -639,6 +658,9 @@ APIs:
    Fail if *fill_char* is bigger than the string maximum character, or if the
    string has more than 1 reference.

+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.
+
    Return the number of written characters, or return ``-1`` and raise an
    exception on error.

@@ -648,15 +670,16 @@ APIs:
 .. c:function:: int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, \
                                         Py_UCS4 character)

-   Write a character to a string. The string must have been created through
-   :c:func:`PyUnicode_New`. Since Unicode strings are supposed to be immutable,
-   the string must not be shared, or have been hashed yet.
+   Write a *character* to the string *unicode* at the zero-based *index*.
+   Return ``0`` on success, ``-1`` on error with an exception set.

    This function checks that *unicode* is a Unicode object, that the index is
-   not out of bounds, and that the object can be modified safely (i.e. that it
-   its reference count is one).
+   not out of bounds, and that the object's reference count is one.
+   See :c:func:`PyUnicode_WRITE` for a version that skips these checks,
+   making them your responsibility.

-   Return ``0`` on success, ``-1`` on error with an exception set.
+   The string must not have been “used” yet.
+   See :c:func:`PyUnicode_New` for details.

    .. versionadded:: 3.3

@@ -1640,6 +1663,20 @@ They all return ``NULL`` or ``-1`` if an exception occurs.
    Strings interned this way are made :term:`immortal`.


+.. c:function:: unsigned int PyUnicode_CHECK_INTERNED(PyObject *str)
+
+   Return a non-zero value if *str* is interned, zero if not.
+   The *str* argument must be a string; this is not checked.
+   This function always succeeds.
+
+   .. impl-detail::
+
+      A non-zero return value may carry additional information
+      about *how* the string is interned.
+      The meaning of such non-zero values, as well as each specific string's
+      intern-related details, may change between CPython versions.
+
+
 PyUnicodeWriter
 ^^^^^^^^^^^^^^^

@@ -1760,8 +1797,8 @@ object.
    *size* is the string length in bytes. If *size* is equal to ``-1``, call
    ``strlen(str)`` to get the string length.

-   *errors* is an error handler name, such as ``"replace"``. If *errors* is
-   ``NULL``, use the strict error handler.
+   *errors* is an :ref:`error handler <error-handlers>` name, such as
+   ``"replace"``. If *errors* is ``NULL``, use the strict error handler.

    If *consumed* is not ``NULL``, set *\*consumed* to the number of decoded
    bytes on success.
@@ -1772,3 +1809,49 @@ object.
    On error, set an exception, leave the writer unchanged, and return ``-1``.

    See also :c:func:`PyUnicodeWriter_WriteUTF8`.
+
+Deprecated API
+^^^^^^^^^^^^^^
+
+The following API is deprecated.
+
+.. c:type:: Py_UNICODE
+
+   This is a typedef of :c:type:`wchar_t`, which is a 16-bit type or 32-bit type
+   depending on the platform.
+   Please use :c:type:`wchar_t` directly instead.
+
+   .. versionchanged:: 3.3
+      In previous versions, this was a 16-bit type or a 32-bit type depending on
+      whether you selected a "narrow" or "wide" Unicode version of Python at
+      build time.
+
+   .. deprecated-removed:: 3.13 3.15
+
+
+.. c:function:: int PyUnicode_READY(PyObject *unicode)
+
+   Do nothing and return ``0``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: 3.10
+      This API does nothing since Python 3.12.
+      Previously, this needed to be called for each string created using
+      the old API (:c:func:`!PyUnicode_FromUnicode` or similar).
+
+
+.. c:function:: unsigned int PyUnicode_IS_READY(PyObject *unicode)
+
+   Do nothing and return ``1``.
+   This API is kept only for backward compatibility, but there are no plans
+   to remove it.
+
+   .. versionadded:: 3.3
+
+   .. deprecated:: next
+      This API does nothing since Python 3.12.
+      Previously, this could be called to check if
+      :c:func:`PyUnicode_READY` is necessary.