API: Add string extension type #27949
@@ -16,9 +16,9 @@ Text Data Types
 There are two main ways to store text data

 1. ``object`` -dtype NumPy array.
-2. :class:`TextDtype` extension type.
+2. :class:`StringDtype` extension type.

-We recommend using :class:`TextDtype` to store text data.
+We recommend using :class:`StringDtype` to store text data.

 Prior to pandas 1.0, ``object`` dtype was the only option. This was unfortunate
 for many reasons:
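The hunk above renames the recommended type from ``TextDtype`` to ``StringDtype``. The following minimal sketch contrasts the two storage options it lists. It assumes a released pandas (1.0 or later) where the class is exposed as ``pd.StringDtype``, and it passes ``dtype=object`` explicitly because newer pandas releases may infer a different default for strings.

```python
import pandas as pd

# Option 1: the legacy approach, text stored in an object-dtype NumPy array.
s_object = pd.Series(["a", "b", "c"], dtype=object)

# Option 2: the dedicated extension type, requested with a StringDtype instance.
s_string = pd.Series(["a", "b", "c"], dtype=pd.StringDtype())

print(s_object.dtype)
print(s_string.dtype)
```

Both Series hold the same values; only the dtype (and therefore the backing array) differs.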
@@ -32,13 +32,13 @@ for many reasons:
 than ``text``.

 Currently, the performance of ``object`` dtype arrays of strings and
-:class:`arrays.TextArray` are about the same. We expect future enhancements
+:class:`arrays.StringArray` are about the same. We expect future enhancements
 to significantly increase the performance and lower the memory overhead of
-:class:`~arrays.TextArray`.
+:class:`~arrays.StringArray`.

 .. warning::

-   ``TextArray`` is currently considered experimental. The implementation
+   ``StringArray`` is currently considered experimental. The implementation
    and parts of the API may change without warning.

 For backwards-compatibility, ``object`` dtype remains the default type we
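This hunk renames the array container to ``arrays.StringArray``. As a hedged sketch, assuming pandas 1.0 or later, the array can be built directly with ``pd.array`` or reached from a Series through ``.array``. Because the warning notes the implementation may change, and later releases may choose a different concrete backing class depending on version and configuration, the sketch checks the dtype rather than the class.

```python
import pandas as pd
from pandas.api.extensions import ExtensionArray

# Build the extension array directly from a StringDtype.
arr = pd.array(["a", "b", "c"], dtype=pd.StringDtype())

# A StringDtype Series exposes its backing extension array via .array.
s = pd.Series(["a", "b", "c"], dtype=pd.StringDtype())
backing = s.array

# The concrete class name may vary across pandas versions; the dtype is stable.
print(type(arr).__name__, arr.dtype)
```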
@@ -53,7 +53,7 @@ To explicitly request ``text`` dtype, specify the ``dtype``
 .. ipython:: python

    pd.Series(['a', 'b', 'c'], dtype="text")
-   pd.Series(['a', 'b', 'c'], dtype=pd.TextDtype())
+   pd.Series(['a', 'b', 'c'], dtype=pd.StringDtype())

 Or ``astype`` after the ``Series`` or ``DataFrame`` is created
Review comment: not sure of the convention, should Series and DataFrame be ":class:

Reply: Don't think we have a formal policy. I vaguely recall a discussion somewhere about doing it ~once per paragraph?
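The context line above mentions converting with ``astype`` after the ``Series`` or ``DataFrame`` is created. A minimal sketch of that path, assuming pandas 1.0 or later: it passes a ``StringDtype`` instance rather than a string alias, since the alias visible in this diff's context (``"text"``) is the one used at this commit and may not match other versions.

```python
import pandas as pd

# Create with object dtype first, then convert to the string extension type.
s = pd.Series(["a", "b", "c"], dtype=object)
s_string = s.astype(pd.StringDtype())

# The same conversion applied to every column of a DataFrame.
df = pd.DataFrame({"x": ["a", "b"], "y": ["c", "d"]}, dtype=object)
df_string = df.astype(pd.StringDtype())

print(s_string.dtype)
print(df_string.dtypes)
```

``astype`` returns new objects, so ``s`` and ``df`` keep their object dtype.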
@@ -170,8 +170,8 @@ It is easy to expand this to return a DataFrame using ``expand``.

    s2.str.split('_', expand=True)

-When original ``Series`` has :class:`TextDtype`, the output columns will all
-be :class:`TextDtype` as well.
+When original ``Series`` has :class:`StringDtype`, the output columns will all
+be :class:`StringDtype` as well.

 It is also possible to limit the number of splits:
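The updated sentence says that splitting a ``StringDtype`` Series with ``expand=True`` yields ``StringDtype`` columns. A short sketch checking that claim, assuming pandas 1.0 or later. The data here is hypothetical; the user guide defines its own ``s2`` outside this excerpt.

```python
import pandas as pd

# Hypothetical input; every value splits into exactly three parts.
s2 = pd.Series(["a_b_c", "c_d_e", "f_g_h"], dtype=pd.StringDtype())

# expand=True returns a DataFrame with one column per split part.
parts = s2.str.split("_", expand=True)

print(parts)
print(parts.dtypes)
```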
@@ -66,7 +66,7 @@
     PeriodDtype,
     IntervalDtype,
     DatetimeTZDtype,
-    TextDtype,
+    StringDtype,
     # missing
     isna,
     isnull,
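This hunk re-exports ``StringDtype`` from the top-level ``pandas`` namespace. A minimal sketch of what that enables, assuming a released pandas (1.0 or later) where the registered alias is ``"string"`` (at this commit the diff context still shows ``"text"``).

```python
import pandas as pd
from pandas import StringDtype
from pandas.api.types import pandas_dtype

# The top-level name and the direct import refer to the same class.
same_class = pd.StringDtype is StringDtype

# An instance reports its registered name.
dtype = StringDtype()

# Resolving the alias string yields an instance of the extension type.
resolved = pandas_dtype("string")

print(same_class, dtype.name, type(resolved).__name__)
```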