-pandas supports the integration with many file formats or data sources out of the box (csv, excel, sql, json, parquet,…). Importing data from each of these
-data sources is provided by function with the prefix ``read_*``. Similarly, the ``to_*`` methods are used to store data.
+pandas supports the integration with many file formats or data sources out of the box (csv, excel, sql, json, parquet,…). The ability to import data from each of these
+data sources is provided by functions with the prefix ``read_*``. Similarly, the ``to_*`` methods are used to store data.
.. image:: ../_static/schemas/02_io_readwrite.svg
:align: center
@@ -181,7 +180,7 @@ data sources is provided by function with the prefix ``read_*``. Similarly, the
-Selecting or filtering specific rows and/or columns? Filtering the data on a condition? Methods for slicing, selecting, and extracting the
+Selecting or filtering specific rows and/or columns? Filtering the data on a particular condition? Methods for slicing, selecting, and extracting the
data you need are available in pandas.
.. image:: ../_static/schemas/03_subset_columns_rows.svg
@@ -228,7 +227,7 @@ data you need are available in pandas.
-pandas provides plotting your data out of the box, using the power of Matplotlib. You can pick the plot type (scatter, bar, boxplot,...)
+pandas provides plotting for your data right out of the box with the power of Matplotlib. Simply pick the plot type (scatter, bar, boxplot,...)
corresponding to your data.
.. image:: ../_static/schemas/04_plot_overview.svg
@@ -275,7 +274,7 @@ corresponding to your data.
-There is no need to loop over all rows of your data table to do calculations. Data manipulations on a column work elementwise.
+There's no need to loop over all rows of your data table to do calculations. Column data manipulations work elementwise in pandas.
Adding a column to a :class:`DataFrame` based on existing data in other columns is straightforward.
.. image:: ../_static/schemas/05_newcolumn_2.svg
@@ -322,7 +321,7 @@ Adding a column to a :class:`DataFrame` based on existing data in other columns
-Basic statistics (mean, median, min, max, counts...) are easily calculable. These or custom aggregations can be applied on the entire
+Basic statistics (mean, median, min, max, counts...) are easily calculable across data frames. These, or even custom aggregations, can be applied to the entire
data set, a sliding window of the data, or grouped by categories. The latter is also known as the split-apply-combine approach.
.. image:: ../_static/schemas/06_groupby.svg
@@ -369,8 +368,8 @@ data set, a sliding window of the data, or grouped by categories. The latter is
-Change the structure of your data table in multiple ways. You can :func:`~pandas.melt` your data table from wide to long/tidy form or :func:`~pandas.pivot`
-from long to wide format. With aggregations built-in, a pivot table is created with a single command.
+Change the structure of your data table in a variety of ways. You can use :func:`~pandas.melt` to reshape your data from a wide format to a long and tidy one. Use :func:`~pandas.pivot`
+to go from long to wide format. With aggregations built-in, a pivot table can be created with a single command.
.. image:: ../_static/schemas/07_melt.svg
:align: center
@@ -416,7 +415,7 @@ from long to wide format. With aggregations built-in, a pivot table is created w
-Multiple tables can be concatenated both column wise and row wise as database-like join/merge operations are provided to combine multiple tables of data.
+Multiple tables can be concatenated column-wise or row-wise, and database-like join and merge operations are provided to combine multiple tables of data.
.. image:: ../_static/schemas/08_concat_row.svg
:align: center
@@ -505,7 +504,7 @@ pandas has great support for time series and has an extensive set of tools for w
-Data sets do not only contain numerical data. pandas provides a wide range of functions to clean textual data and extract useful information from it.
+Data sets often contain more than just numerical data. pandas provides a wide range of functions to clean textual data and extract useful information from it.
.. raw:: html
@@ -551,9 +550,9 @@ the pandas-equivalent operations compared to software you already know:
:class-card: comparison-card
:shadow: md
- The `R programming language `__ provides the
- ``data.frame`` data structure and multiple packages, such as
- `tidyverse `__ use and extend ``data.frame``
+ The `R programming language `__ provides a
+ ``data.frame`` data structure as well as packages like
+ `tidyverse `__ which use and extend ``data.frame``
for convenient data handling functionalities similar to pandas.
+++
@@ -572,8 +571,8 @@ the pandas-equivalent operations compared to software you already know:
:class-card: comparison-card
:shadow: md
- Already familiar to ``SELECT``, ``GROUP BY``, ``JOIN``, etc.?
- Most of these SQL manipulations do have equivalents in pandas.
+ Already familiar with ``SELECT``, ``GROUP BY``, ``JOIN``, etc.?
+ Many SQL manipulations have equivalents in pandas.
+++
@@ -613,7 +612,7 @@ the pandas-equivalent operations compared to software you already know:
Users of `Excel `__
or other spreadsheet programs will find that many of the concepts are
- transferrable to pandas.
+ transferable to pandas.
+++
@@ -631,10 +630,10 @@ the pandas-equivalent operations compared to software you already know:
:class-card: comparison-card
:shadow: md
- The `SAS `__ statistical software suite
- also provides the ``data set`` corresponding to the pandas ``DataFrame``.
- Also SAS vectorized operations, filtering, string processing operations,
- and more have similar functions in pandas.
+ `SAS `__, the statistical software suite,
+ uses the ``data set`` structure, which closely corresponds to pandas' ``DataFrame``.
+ SAS vectorized operations such as filtering and string processing
+ also have similar functions in pandas.
+++
diff --git a/doc/source/getting_started/install.rst b/doc/source/getting_started/install.rst
index 1d7eca5223544..93663c1cced7e 100644
--- a/doc/source/getting_started/install.rst
+++ b/doc/source/getting_started/install.rst
@@ -6,88 +6,75 @@
Installation
============
-The easiest way to install pandas is to install it
-as part of the `Anaconda `__ distribution, a
-cross platform distribution for data analysis and scientific computing.
-The `Conda `__ package manager is the
-recommended installation method for most users.
+The pandas development team officially distributes pandas for installation
+through the following methods:
-Instructions for installing :ref:`from source `,
-:ref:`PyPI `, or a
-:ref:`development version ` are also provided.
+* Available on `conda-forge `__ for installation with the conda package manager.
+* Available on `PyPI `__ for installation with pip.
+* Available on `GitHub `__ for installation from source.
+
+.. note::
+ pandas may be installable from other sources besides the ones listed above,
+ but these sources are **not** managed by the pandas development team.
.. _install.version:
Python version support
----------------------
-Officially Python 3.9, 3.10 and 3.11.
+See :ref:`Python support policy `.
Installing pandas
-----------------
-.. _install.anaconda:
+.. _install.conda:
-Installing with Anaconda
-~~~~~~~~~~~~~~~~~~~~~~~~
+Installing with Conda
+~~~~~~~~~~~~~~~~~~~~~
-For users that are new to Python, the easiest way to install Python, pandas, and the
-packages that make up the `PyData `__ stack
-(`SciPy `__, `NumPy `__,
-`Matplotlib `__, `and more `__)
-is with `Anaconda `__, a cross-platform
-(Linux, macOS, Windows) Python distribution for data analytics and
-scientific computing. Installation instructions for Anaconda
-`can be found here `__.
+For users working with the `Conda `__ package manager,
+pandas can be installed from the ``conda-forge`` channel.
-.. _install.miniconda:
+.. code-block:: shell
-Installing with Miniconda
-~~~~~~~~~~~~~~~~~~~~~~~~~
+ conda install -c conda-forge pandas
-For users experienced with Python, the recommended way to install pandas with
-`Miniconda `__.
-Miniconda allows you to create a minimal, self-contained Python installation compared to Anaconda and use the
-`Conda `__ package manager to install additional packages
-and create a virtual environment for your installation. Installation instructions for Miniconda
-`can be found here `__.
+To install the Conda package manager on your system, the
+`Miniforge distribution `__
+is recommended.
-The next step is to create a new conda environment. A conda environment is like a
-virtualenv that allows you to specify a specific version of Python and set of libraries.
-Run the following commands from a terminal window.
+Additionally, it is recommended to install and run pandas from a virtual environment.
.. code-block:: shell
conda create -c conda-forge -n name_of_my_env python pandas
-
-This will create a minimal environment with only Python and pandas installed.
-To put your self inside this environment run.
-
-.. code-block:: shell
-
+ # On Linux or macOS
source activate name_of_my_env
# On Windows
activate name_of_my_env
-.. _install.pypi:
+.. tip::
+ For users who are new to Python, the easiest way to install Python, pandas, and the
+ packages that make up the `PyData `__ stack such as
+ `SciPy `__, `NumPy `__ and
+ `Matplotlib `__
+ is with `Anaconda `__, a cross-platform
+ (Linux, macOS, Windows) Python distribution for data analytics and
+ scientific computing.
-Installing from PyPI
-~~~~~~~~~~~~~~~~~~~~
+ However, pandas from Anaconda is **not** officially managed by the pandas development team.
-pandas can be installed via pip from
-`PyPI `__.
+.. _install.pip:
-.. code-block:: shell
-
- pip install pandas
+Installing with pip
+~~~~~~~~~~~~~~~~~~~
-.. note::
- You must have ``pip>=19.3`` to install from PyPI.
+For users working with the `pip `__ package manager,
+pandas can be installed from `PyPI `__.
-.. note::
+.. code-block:: shell
- It is recommended to install and run pandas from a virtual environment, for example,
- using the Python standard library's `venv `__
+ pip install pandas
pandas can also be installed with sets of optional dependencies to enable certain functionality. For example,
to install pandas with the optional dependencies to read Excel files.
@@ -98,25 +85,8 @@ to install pandas with the optional dependencies to read Excel files.
The full list of extras that can be installed can be found in the :ref:`dependency section `.
-Handling ImportErrors
-~~~~~~~~~~~~~~~~~~~~~
-
-If you encounter an ``ImportError``, it usually means that Python couldn't find pandas in the list of available
-libraries. Python internally has a list of directories it searches through, to find packages. You can
-obtain these directories with.
-
-.. code-block:: python
-
- import sys
- sys.path
-
-One way you could be encountering this error is if you have multiple Python installations on your system
-and you don't have pandas installed in the Python installation you're currently using.
-In Linux/Mac you can run ``which python`` on your terminal and it will tell you which Python installation you're
-using. If it's something like "/usr/bin/python", you're using the Python from the system, which is not recommended.
-
-It is highly recommended to use ``conda``, for quick installation and for package and dependency updates.
-You can find simple installation instructions for pandas :ref:`in this document `.
+Additionally, it is recommended to install and run pandas from a virtual environment, for example,
+using the Python standard library's `venv `__.
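+
+For example, a minimal sketch of creating and using such an environment (the
+environment name ``.venv`` is only an illustration):
+
+.. code-block:: shell
+
+    python -m venv .venv
+    # On Linux or macOS
+    source .venv/bin/activate
+    # On Windows
+    .venv\Scripts\activate
+    pip install pandas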
.. _install.source:
@@ -144,49 +114,24 @@ index from the PyPI registry of anaconda.org. You can install it by running.
pip install --pre --extra-index https://pypi.anaconda.org/scientific-python-nightly-wheels/simple pandas
-Note that you might be required to uninstall an existing version of pandas to install the development version.
+.. note::
+ You might be required to uninstall an existing version of pandas to install the development version.
-.. code-block:: shell
+ .. code-block:: shell
- pip uninstall pandas -y
+ pip uninstall pandas -y
Running the test suite
----------------------
-pandas is equipped with an exhaustive set of unit tests. The packages required to run the tests
-can be installed with ``pip install "pandas[test]"``. To run the tests from a
-Python terminal.
-
-.. code-block:: python
-
- >>> import pandas as pd
- >>> pd.test()
- running: pytest -m "not slow and not network and not db" /home/user/anaconda3/lib/python3.9/site-packages/pandas
-
- ============================= test session starts ==============================
- platform linux -- Python 3.9.7, pytest-6.2.5, py-1.11.0, pluggy-1.0.0
- rootdir: /home/user
- plugins: dash-1.19.0, anyio-3.5.0, hypothesis-6.29.3
- collected 154975 items / 4 skipped / 154971 selected
- ........................................................................ [ 0%]
- ........................................................................ [ 99%]
- ....................................... [100%]
-
- ==================================== ERRORS ====================================
-
- =================================== FAILURES ===================================
-
- =============================== warnings summary ===============================
-
- =========================== short test summary info ============================
-
- = 1 failed, 146194 passed, 7402 skipped, 1367 xfailed, 5 xpassed, 197 warnings, 10 errors in 1090.16s (0:18:10) =
+If pandas has been installed :ref:`from source `, running ``pytest pandas`` will run all of the pandas unit tests.
+The unit tests can also be run from the pandas module itself with the :func:`test` function. The packages required to run the tests
+can be installed with ``pip install "pandas[test]"``.
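+
+For example, a quick sketch of invoking the test suite from Python, assuming the
+test dependencies above are installed:
+
+.. code-block:: python
+
+    import pandas as pd
+
+    pd.test()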
.. note::
- This is just an example of what information is shown. Test failures are not necessarily indicative
- of a broken pandas installation.
+ Test failures are not necessarily indicative of a broken pandas installation.
.. _install.dependencies:
@@ -203,9 +148,8 @@ pandas requires the following dependencies.
================================================================ ==========================
Package Minimum supported version
================================================================ ==========================
-`NumPy `__ 1.22.4
+`NumPy `__ 1.23.5
`python-dateutil `__ 2.8.2
-`pytz `__ 2020.1
`tzdata `__ 2022.7
================================================================ ==========================
@@ -220,7 +164,7 @@ For example, :func:`pandas.read_hdf` requires the ``pytables`` package, while
optional dependency is not installed, pandas will raise an ``ImportError`` when
the method requiring that dependency is called.
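+
+For example, a sketch of this failure mode (``store.h5`` is a hypothetical file):
+
+.. code-block:: python
+
+    import pandas as pd
+
+    # Raises ImportError if the optional "pytables" dependency is missing
+    pd.read_hdf("store.h5")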
-If using pip, optional pandas dependencies can be installed or managed in a file (e.g. requirements.txt or pyproject.toml)
+With pip, optional pandas dependencies can be installed or managed in a file (e.g. requirements.txt or pyproject.toml)
as optional extras (e.g. ``pandas[performance, aws]``). All optional dependencies can be installed with ``pandas[all]``,
and specific sets of dependencies are listed in the sections below.
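+
+For example, the extras mentioned above could be installed as follows:
+
+.. code-block:: shell
+
+    pip install "pandas[performance, aws]"
+    # or every optional dependency at once
+    pip install "pandas[all]"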
@@ -239,9 +183,9 @@ Installable with ``pip install "pandas[performance]"``
===================================================== ================== ================== ===================================================================================================================================================================================
Dependency Minimum Version pip extra Notes
===================================================== ================== ================== ===================================================================================================================================================================================
-`numexpr `__ 2.8.4 performance Accelerates certain numerical operations by using multiple cores as well as smart chunking and caching to achieve large speedups
+`numexpr `__ 2.9.0 performance Accelerates certain numerical operations by using multiple cores as well as smart chunking and caching to achieve large speedups
`bottleneck `__ 1.3.6 performance Accelerates certain types of ``nan`` by using specialized cython routines to achieve large speedup.
-`numba `__ 0.56.4 performance Alternative execution engine for operations that accept ``engine="numba"`` using a JIT compiler that translates Python functions to optimized machine code using the LLVM compiler.
+`numba `__ 0.59.0 performance Alternative execution engine for operations that accept ``engine="numba"`` using a JIT compiler that translates Python functions to optimized machine code using the LLVM compiler.
===================================================== ================== ================== ===================================================================================================================================================================================
Visualization
@@ -249,53 +193,56 @@ Visualization
Installable with ``pip install "pandas[plot, output-formatting]"``.
-========================= ================== ================== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== ================== =============================================================
-matplotlib 3.6.3 plot Plotting library
-Jinja2 3.1.2 output-formatting Conditional formatting with DataFrame.style
-tabulate 0.9.0 output-formatting Printing in Markdown-friendly format (see `tabulate`_)
-========================= ================== ================== =============================================================
+========================================================== ================== ================== =======================================================
+Dependency Minimum Version pip extra Notes
+========================================================== ================== ================== =======================================================
+`matplotlib `__ 3.8.3 plot Plotting library
+`Jinja2 `__ 3.1.3 output-formatting Conditional formatting with DataFrame.style
+`tabulate `__ 0.9.0 output-formatting Printing in Markdown-friendly format (see `tabulate`_)
+========================================================== ================== ================== =======================================================
Computation
^^^^^^^^^^^
Installable with ``pip install "pandas[computation]"``.
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-SciPy 1.10.0 computation Miscellaneous statistical functions
-xarray 2022.12.0 computation pandas-like API for N-dimensional data
-========================= ================== =============== =============================================================
+============================================== ================== =============== =======================================
+Dependency Minimum Version pip extra Notes
+============================================== ================== =============== =======================================
+`SciPy `__ 1.12.0 computation Miscellaneous statistical functions
+`xarray `__ 2024.1.1 computation pandas-like API for N-dimensional data
+============================================== ================== =============== =======================================
+
+.. _install.excel_dependencies:
Excel files
^^^^^^^^^^^
Installable with ``pip install "pandas[excel]"``.
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-xlrd 2.0.1 excel Reading Excel
-xlsxwriter 3.0.5 excel Writing Excel
-openpyxl 3.1.0 excel Reading / writing for xlsx files
-pyxlsb 1.0.10 excel Reading for xlsb files
-python-calamine 0.1.7 excel Reading for xls/xlsx/xlsb/ods files
-========================= ================== =============== =============================================================
+================================================================== ================== =============== =============================================================
+Dependency Minimum Version pip extra Notes
+================================================================== ================== =============== =============================================================
+`xlrd `__ 2.0.1 excel Reading for xls files
+`xlsxwriter `__ 3.2.0 excel Writing for xlsx files
+`openpyxl `__ 3.1.2 excel Reading / writing for Excel 2010 xlsx/xlsm/xltx/xltm files
+`pyxlsb `__ 1.0.10 excel Reading for xlsb files
+`python-calamine `__ 0.1.7 excel Reading for xls/xlsx/xlsm/xlsb/xla/xlam/ods files
+`odfpy `__ 1.4.1 excel Reading / writing for OpenDocument 1.2 files
+================================================================== ================== =============== =============================================================
HTML
^^^^
Installable with ``pip install "pandas[html]"``.
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-BeautifulSoup4 4.11.2 html HTML parser for read_html
-html5lib 1.1 html HTML parser for read_html
-lxml 4.9.2 html HTML parser for read_html
-========================= ================== =============== =============================================================
+=============================================================== ================== =============== ==========================
+Dependency Minimum Version pip extra Notes
+=============================================================== ================== =============== ==========================
+`BeautifulSoup4 `__ 4.12.3 html HTML parser for read_html
+`html5lib `__ 1.1 html HTML parser for read_html
+`lxml `__ 4.9.2 html HTML parser for read_html
+=============================================================== ================== =============== ==========================
One of the following combinations of libraries is needed to use the
top-level :func:`~pandas.read_html` function:
@@ -326,45 +273,45 @@ XML
Installable with ``pip install "pandas[xml]"``.
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-lxml 4.9.2 xml XML parser for read_xml and tree builder for to_xml
-========================= ================== =============== =============================================================
+======================================== ================== =============== ====================================================
+Dependency Minimum Version pip extra Notes
+======================================== ================== =============== ====================================================
+`lxml `__ 4.9.2 xml XML parser for read_xml and tree builder for to_xml
+======================================== ================== =============== ====================================================
SQL databases
^^^^^^^^^^^^^
Traditional drivers are installable with ``pip install "pandas[postgresql, mysql, sql-other]"``
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-SQLAlchemy 2.0.0 postgresql, SQL support for databases other than sqlite
- mysql,
- sql-other
-psycopg2 2.9.6 postgresql PostgreSQL engine for sqlalchemy
-pymysql 1.0.2 mysql MySQL engine for sqlalchemy
-adbc-driver-postgresql 0.8.0 postgresql ADBC Driver for PostgreSQL
-adbc-driver-sqlite 0.8.0 sql-other ADBC Driver for SQLite
-========================= ================== =============== =============================================================
+================================================================== ================== =============== ============================================
+Dependency Minimum Version pip extra Notes
+================================================================== ================== =============== ============================================
+`SQLAlchemy `__ 2.0.0 postgresql, SQL support for databases other than sqlite
+ mysql,
+ sql-other
+`psycopg2 `__ 2.9.6 postgresql PostgreSQL engine for sqlalchemy
+`pymysql `__ 1.1.0 mysql MySQL engine for sqlalchemy
+`adbc-driver-postgresql `__ 0.10.0 postgresql ADBC Driver for PostgreSQL
+`adbc-driver-sqlite `__ 0.8.0 sql-other ADBC Driver for SQLite
+================================================================== ================== =============== ============================================
Other data sources
^^^^^^^^^^^^^^^^^^
-Installable with ``pip install "pandas[hdf5, parquet, feather, spss, excel]"``
+Installable with ``pip install "pandas[hdf5, parquet, iceberg, feather, spss, excel]"``
-========================= ================== ================ =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== ================ =============================================================
-PyTables 3.8.0 hdf5 HDF5-based reading / writing
-blosc 1.21.3 hdf5 Compression for HDF5; only available on ``conda``
-zlib hdf5 Compression for HDF5
-fastparquet 2022.12.0 - Parquet reading / writing (pyarrow is default)
-pyarrow 10.0.1 parquet, feather Parquet, ORC, and feather reading / writing
-pyreadstat 1.2.0 spss SPSS files (.sav) reading
-odfpy 1.4.1 excel Open document format (.odf, .ods, .odt) reading / writing
-========================= ================== ================ =============================================================
+====================================================== ================== ================ ==========================================================
+Dependency Minimum Version pip extra Notes
+====================================================== ================== ================ ==========================================================
+`PyTables `__ 3.8.0 hdf5 HDF5-based reading / writing
+`zlib `__ hdf5 Compression for HDF5
+`fastparquet `__ 2024.2.0 - Parquet reading / writing (pyarrow is default)
+`pyarrow `__ 10.0.1 parquet, feather Parquet, ORC, and feather reading / writing
+`PyIceberg `__ 0.7.1 iceberg Apache Iceberg reading
+`pyreadstat `__ 1.2.6 spss SPSS files (.sav) reading
+`odfpy `__ 1.4.1 excel Open document format (.odf, .ods, .odt) reading / writing
+====================================================== ================== ================ ==========================================================
.. _install.warn_orc:
@@ -379,27 +326,26 @@ Access data in the cloud
Installable with ``pip install "pandas[fss, aws, gcp]"``
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-fsspec 2022.11.0 fss, gcp, aws Handling files aside from simple local and HTTP (required
- dependency of s3fs, gcsfs).
-gcsfs 2022.11.0 gcp Google Cloud Storage access
-pandas-gbq 0.19.0 gcp Google Big Query access
-s3fs 2022.11.0 aws Amazon S3 access
-========================= ================== =============== =============================================================
+============================================ ================== =============== ==========================================================
+Dependency Minimum Version pip extra Notes
+============================================ ================== =============== ==========================================================
+`fsspec `__ 2023.12.2 fss, gcp, aws Handling files aside from simple local and HTTP (required
+ dependency of s3fs, gcsfs).
+`gcsfs `__ 2023.12.2 gcp Google Cloud Storage access
+`s3fs `__ 2023.12.2 aws Amazon S3 access
+============================================ ================== =============== ==========================================================
Clipboard
^^^^^^^^^
Installable with ``pip install "pandas[clipboard]"``.
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-PyQt4/PyQt5 5.15.9 clipboard Clipboard I/O
-qtpy 2.3.0 clipboard Clipboard I/O
-========================= ================== =============== =============================================================
+======================================================================================== ================== =============== ==============
+Dependency Minimum Version pip extra Notes
+======================================================================================== ================== =============== ==============
+`PyQt4 `__/`PyQt5 `__ 5.15.9 clipboard Clipboard I/O
+`qtpy `__ 2.3.0 clipboard Clipboard I/O
+======================================================================================== ================== =============== ==============
.. note::
@@ -412,19 +358,19 @@ Compression
Installable with ``pip install "pandas[compression]"``
-========================= ================== =============== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =============== =============================================================
-Zstandard 0.19.0 compression Zstandard compression
-========================= ================== =============== =============================================================
+================================================= ================== =============== ======================
+Dependency Minimum Version pip extra Notes
+================================================= ================== =============== ======================
+`Zstandard `__ 0.19.0 compression Zstandard compression
+================================================= ================== =============== ======================
-Consortium Standard
-^^^^^^^^^^^^^^^^^^^
+Timezone
+^^^^^^^^
-Installable with ``pip install "pandas[consortium-standard]"``
+Installable with ``pip install "pandas[timezone]"``
-========================= ================== =================== =============================================================
-Dependency Minimum Version pip extra Notes
-========================= ================== =================== =============================================================
-dataframe-api-compat 0.1.7 consortium-standard Consortium Standard-compatible implementation based on pandas
-========================= ================== =================== =============================================================
+========================================== ================== =================== ==============================================
+Dependency Minimum Version pip extra Notes
+========================================== ================== =================== ==============================================
+`pytz `__ 2023.4 timezone Alternative timezone library to ``zoneinfo``.
+========================================== ================== =================== ==============================================
diff --git a/doc/source/getting_started/intro_tutorials/01_table_oriented.rst b/doc/source/getting_started/intro_tutorials/01_table_oriented.rst
index caaff3557ae40..efcdb22778ef4 100644
--- a/doc/source/getting_started/intro_tutorials/01_table_oriented.rst
+++ b/doc/source/getting_started/intro_tutorials/01_table_oriented.rst
@@ -46,7 +46,7 @@ I want to store passenger data of the Titanic. For a number of passengers, I kno
"Name": [
"Braund, Mr. Owen Harris",
"Allen, Mr. William Henry",
- "Bonnell, Miss. Elizabeth",
+ "Bonnell, Miss Elizabeth",
],
"Age": [22, 35, 58],
"Sex": ["male", "male", "female"],
@@ -192,8 +192,8 @@ Check more options on ``describe`` in the user guide section about :ref:`aggrega
.. note::
This is just a starting point. Similar to spreadsheet
software, pandas represents data as a table with columns and rows. Apart
- from the representation, also the data manipulations and calculations
- you would do in spreadsheet software are supported by pandas. Continue
+ from the representation, the data manipulations and calculations
+ you would do in spreadsheet software are also supported by pandas. Continue
reading the next tutorials to get started!
.. raw:: html
@@ -204,7 +204,7 @@ Check more options on ``describe`` in the user guide section about :ref:`aggrega
- Import the package, aka ``import pandas as pd``
- A table of data is stored as a pandas ``DataFrame``
- Each column in a ``DataFrame`` is a ``Series``
-- You can do things by applying a method to a ``DataFrame`` or ``Series``
+- You can do things by applying a method on a ``DataFrame`` or ``Series``
.. raw:: html
@@ -215,7 +215,7 @@ Check more options on ``describe`` in the user guide section about :ref:`aggrega
To user guide
-A more extended explanation to ``DataFrame`` and ``Series`` is provided in the :ref:`introduction to data structures `.
+A more extended explanation of ``DataFrame`` and ``Series`` is provided in the :ref:`introduction to data structures ` page.
.. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/02_read_write.rst b/doc/source/getting_started/intro_tutorials/02_read_write.rst
index 832c2cc25712f..0549c17a1013c 100644
--- a/doc/source/getting_started/intro_tutorials/02_read_write.rst
+++ b/doc/source/getting_started/intro_tutorials/02_read_write.rst
@@ -97,11 +97,11 @@ in this ``DataFrame`` are integers (``int64``), floats (``float64``) and
strings (``object``).
.. note::
- When asking for the ``dtypes``, no brackets are used!
+ When asking for the ``dtypes``, no parentheses ``()`` are used!
``dtypes`` is an attribute of a ``DataFrame`` and ``Series``. Attributes
- of a ``DataFrame`` or ``Series`` do not need brackets. Attributes
+ of a ``DataFrame`` or ``Series`` do not need ``()``. Attributes
represent a characteristic of a ``DataFrame``/``Series``, whereas
- methods (which require brackets) *do* something with the
+ methods (which require parentheses ``()``) *do* something with the
``DataFrame``/``Series`` as introduced in the :ref:`first tutorial <10min_tut_01_tableoriented>`.
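+
+ For example, with the ``titanic`` ``DataFrame`` used in this tutorial:
+
+ .. code-block:: python
+
+     titanic.dtypes   # attribute: no parentheses
+     titanic.head()   # method: parentheses required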
.. raw:: html
@@ -111,6 +111,12 @@ strings (``object``).
My colleague requested the Titanic data as a spreadsheet.
+.. note::
+ If you want to use :func:`~pandas.to_excel` and :func:`~pandas.read_excel`,
+ you need to install an Excel reader as outlined in the
+ :ref:`Excel files ` section of the
+ installation documentation.
+
.. ipython:: python
titanic.to_excel("titanic.xlsx", sheet_name="passengers", index=False)
@@ -166,11 +172,11 @@ The method :meth:`~DataFrame.info` provides technical information about a
- The table has 12 columns. Most columns have a value for each of the
rows (all 891 values are ``non-null``). Some columns do have missing
values and less than 891 ``non-null`` values.
-- The columns ``Name``, ``Sex``, ``Cabin`` and ``Embarked`` consists of
+- The columns ``Name``, ``Sex``, ``Cabin`` and ``Embarked`` consist of
textual data (strings, aka ``object``). The other columns are
- numerical data with some of them whole numbers (aka ``integer``) and
- others are real numbers (aka ``float``).
-- The kind of data (characters, integers,…) in the different columns
+ numerical data; some of them are whole numbers (``integer``) and
+ others are real numbers (``float``).
+- The kinds of data (characters, integers, …) in the different columns
are summarized by listing the ``dtypes``.
- The approximate amount of RAM used to hold the DataFrame is provided
as well.
@@ -188,7 +194,7 @@ The method :meth:`~DataFrame.info` provides technical information about a
- Getting data in to pandas from many different file formats or data
sources is supported by ``read_*`` functions.
- Exporting data out of pandas is provided by different
- ``to_*``\ methods.
+ ``to_*`` methods.
- The ``head``/``tail``/``info`` methods and the ``dtypes`` attribute
are convenient for a first check.
diff --git a/doc/source/getting_started/intro_tutorials/03_subset_data.rst b/doc/source/getting_started/intro_tutorials/03_subset_data.rst
index 6d7ec01551572..ced976f680885 100644
--- a/doc/source/getting_started/intro_tutorials/03_subset_data.rst
+++ b/doc/source/getting_started/intro_tutorials/03_subset_data.rst
@@ -101,7 +101,7 @@ selection brackets ``[]``.
.. note::
The inner square brackets define a
:ref:`Python list ` with column names, whereas
- the outer brackets are used to select the data from a pandas
+ the outer square brackets are used to select the data from a pandas
``DataFrame`` as seen in the previous example.
The returned data type is a pandas DataFrame:
@@ -300,7 +300,7 @@ want to select.
-When using the column names, row labels or a condition expression, use
+When using column names, row labels or a condition expression, use
the ``loc`` operator in front of the selection brackets ``[]``. For both
the part before and after the comma, you can use a single label, a list
of labels, a slice of labels, a conditional expression or a colon. Using
@@ -335,14 +335,14 @@ the name ``anonymous`` to the first 3 elements of the fourth column:
.. ipython:: python
titanic.iloc[0:3, 3] = "anonymous"
- titanic.head()
+ titanic.iloc[:5, 3]
.. raw:: html
To user guide
-See the user guide section on :ref:`different choices for indexing ` to get more insight in the usage of ``loc`` and ``iloc``.
+See the user guide section on :ref:`different choices for indexing ` to get more insight into the usage of ``loc`` and ``iloc``.
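+
+For example, a small sketch with the ``titanic`` table, selecting the first three
+names once by label and once by position:
+
+.. code-block:: python
+
+    titanic.loc[0:2, "Name"]   # label-based: row labels 0 through 2, column "Name"
+    titanic.iloc[0:3, 3]       # position-based: first three rows, fourth column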
.. raw:: html
@@ -354,13 +354,11 @@ See the user guide section on :ref:`different choices for indexing REMEMBER
- When selecting subsets of data, square brackets ``[]`` are used.
-- Inside these brackets, you can use a single column/row label, a list
+- Inside these square brackets, you can use a single column/row label, a list
of column/row labels, a slice of labels, a conditional expression or
a colon.
-- Select specific rows and/or columns using ``loc`` when using the row
- and column names.
-- Select specific rows and/or columns using ``iloc`` when using the
- positions in the table.
+- Use ``loc`` for label-based selection (using row/column names).
+- Use ``iloc`` for position-based selection (using table positions).
- You can assign new values to a selection based on ``loc``/``iloc``.
.. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/04_plotting.rst b/doc/source/getting_started/intro_tutorials/04_plotting.rst
index e96eb7c51a12a..e9f83c602d086 100644
--- a/doc/source/getting_started/intro_tutorials/04_plotting.rst
+++ b/doc/source/getting_started/intro_tutorials/04_plotting.rst
@@ -32,8 +32,10 @@ How do I create plots in pandas?
air_quality.head()
.. note::
- The usage of the ``index_col`` and ``parse_dates`` parameters of the ``read_csv`` function to define the first (0th) column as
- index of the resulting ``DataFrame`` and convert the dates in the column to :class:`Timestamp` objects, respectively.
+ The ``index_col=0`` and ``parse_dates=True`` parameters passed to the ``read_csv`` function define
+ the first (0th) column as index of the resulting ``DataFrame`` and convert the dates in the column
+ to :class:`Timestamp` objects, respectively.
+
.. raw:: html
@@ -85,7 +87,7 @@ I want to plot only the columns of the data table with the data from Paris.
air_quality["station_paris"].plot()
plt.show()
-To plot a specific column, use the selection method of the
+To plot a specific column, use a selection method from the
:ref:`subset data tutorial <10min_tut_03_subset>` in combination with the :meth:`~DataFrame.plot`
method. Hence, the :meth:`~DataFrame.plot` method works on both ``Series`` and
``DataFrame``.
@@ -127,7 +129,7 @@ standard Python to get an overview of the available plot methods:
]
.. note::
- In many development environments as well as IPython and
+ In many development environments such as IPython and
Jupyter Notebook, use the TAB button to get an overview of the available
methods, for example ``air_quality.plot.`` + TAB.
@@ -238,7 +240,7 @@ This strategy is applied in the previous example:
- The ``.plot.*`` methods are applicable on both Series and DataFrames.
- By default, each of the columns is plotted as a different element
- (line, boxplot,…).
+ (line, boxplot, …).
- Any plot created by pandas is a Matplotlib object.
.. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/05_add_columns.rst b/doc/source/getting_started/intro_tutorials/05_add_columns.rst
index d59a70cc2818e..481c094870e12 100644
--- a/doc/source/getting_started/intro_tutorials/05_add_columns.rst
+++ b/doc/source/getting_started/intro_tutorials/05_add_columns.rst
@@ -51,7 +51,7 @@ hPa, the conversion factor is 1.882*)
air_quality["london_mg_per_cubic"] = air_quality["station_london"] * 1.882
air_quality.head()
-To create a new column, use the ``[]`` brackets with the new column name
+To create a new column, use the square brackets ``[]`` with the new column name
at the left side of the assignment.
.. raw:: html
@@ -89,8 +89,8 @@ values in each row*.
-Also other mathematical operators (``+``, ``-``, ``*``, ``/``,…) or
-logical operators (``<``, ``>``, ``==``,…) work element-wise. The latter was already
+Other mathematical operators (``+``, ``-``, ``*``, ``/``, …) and logical
+operators (``<``, ``>``, ``==``, …) also work element-wise. The latter was already
used in the :ref:`subset data tutorial <10min_tut_03_subset>` to filter
rows of a table using a conditional expression.
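+
+For example, a short sketch with the ``air_quality`` table from this tutorial:
+
+.. code-block:: python
+
+    # element-wise comparison, returning a boolean value for every row
+    air_quality["station_london"] > 20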
diff --git a/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst b/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
index fe3ae820e7085..1399ab66426f4 100644
--- a/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
+++ b/doc/source/getting_started/intro_tutorials/06_calculate_statistics.rst
@@ -162,7 +162,7 @@ columns by passing ``numeric_only=True``:
It does not make much sense to get the average value of the ``Pclass``.
If we are only interested in the average age for each gender, the
-selection of columns (rectangular brackets ``[]`` as usual) is supported
+selection of columns (square brackets ``[]`` as usual) is supported
on the grouped data as well:
.. ipython:: python
@@ -235,7 +235,7 @@ category in a column.
-The function is a shortcut, as it is actually a groupby operation in combination with counting of the number of records
+The function is a shortcut; it is actually a groupby operation in combination with counting the number of records
within each group:
.. ipython:: python
diff --git a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst
index 9081f274cd941..024300bb8a9b0 100644
--- a/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst
+++ b/doc/source/getting_started/intro_tutorials/08_combine_dataframes.rst
@@ -137,7 +137,7 @@ Hence, the resulting table has 3178 = 1110 + 2068 rows.
Most operations like concatenation or summary statistics are by default
across rows (axis 0), but can be applied across columns as well.
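+
+For example, a sketch of column-wise concatenation with the tables used in this
+tutorial:
+
+.. code-block:: python
+
+    # axis=1 concatenates across columns instead of rows
+    pd.concat([air_quality_pm25, air_quality_no2], axis=1)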
-Sorting the table on the datetime information illustrates also the
+Sorting the table on the datetime information also illustrates the
combination of both tables, with the ``parameter`` column defining the
origin of the table (either ``no2`` from table ``air_quality_no2`` or
``pm25`` from table ``air_quality_pm25``):
@@ -271,7 +271,7 @@ Add the parameters' full description and name, provided by the parameters metada
Compared to the previous example, there is no common column name.
However, the ``parameter`` column in the ``air_quality`` table and the
-``id`` column in the ``air_quality_parameters_name`` both provide the
+``id`` column in the ``air_quality_parameters`` table both provide the
measured variable in a common format. The ``left_on`` and ``right_on``
arguments are used here (instead of just ``on``) to make the link
between the two tables.
@@ -286,7 +286,7 @@ between the two tables.
To user guide
-pandas supports also inner, outer, and right joins.
+pandas also supports inner, outer, and right joins.
More information on join/merge of tables is provided in the user guide section on
:ref:`database style merging of tables
`. Or have a look at the
:ref:`comparison with SQL` page.
@@ -300,7 +300,7 @@ More information on join/merge of tables is provided in the user guide section o
REMEMBER
-- Multiple tables can be concatenated both column-wise and row-wise using
+- Multiple tables can be concatenated column-wise or row-wise using
the ``concat`` function.
- For database-like merging/joining of tables, use the ``merge``
function.
diff --git a/doc/source/getting_started/intro_tutorials/09_timeseries.rst b/doc/source/getting_started/intro_tutorials/09_timeseries.rst
index b0530087e5b84..6ba3c17fac3c3 100644
--- a/doc/source/getting_started/intro_tutorials/09_timeseries.rst
+++ b/doc/source/getting_started/intro_tutorials/09_timeseries.rst
@@ -77,9 +77,9 @@ I want to work with the dates in the column ``datetime`` as datetime objects ins
Initially, the values in ``datetime`` are character strings and do not
provide any datetime operations (e.g. extract the year, day of the
-week,…). By applying the ``to_datetime`` function, pandas interprets the
+week, …). By applying the ``to_datetime`` function, pandas interprets the
strings and convert these to datetime (i.e. ``datetime64[ns, UTC]``)
-objects. In pandas we call these datetime objects similar to
+objects. In pandas, we refer to these datetime objects, which are similar to
``datetime.datetime`` from the standard library as :class:`pandas.Timestamp`.
.. raw:: html
@@ -117,7 +117,7 @@ length of our time series:
air_quality["datetime"].max() - air_quality["datetime"].min()
The result is a :class:`pandas.Timedelta` object, similar to ``datetime.timedelta``
-from the standard Python library and defining a time duration.
+from the standard Python library, which defines a time duration.
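+
+For example, a quick sketch of constructing such an object directly:
+
+.. code-block:: python
+
+    pd.Timedelta("1 day")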
.. raw:: html
@@ -257,7 +257,7 @@ the adapted time scale on plots. Let’s apply this on our data.
-
-Create a plot of the :math:`NO_2` values in the different stations from the 20th of May till the end of 21st of May
+Create a plot of the :math:`NO_2` values at the different stations from May 20th until the end of May 21st.
.. ipython:: python
:okwarning:
@@ -295,7 +295,7 @@ Aggregate the current hourly time series values to the monthly maximum value in
.. ipython:: python
- monthly_max = no_2.resample("ME").max()
+ monthly_max = no_2.resample("MS").max()
monthly_max
A very powerful method on time series data with a datetime index, is the
@@ -310,7 +310,7 @@ converting secondly data into 5-minutely data).
The :meth:`~Series.resample` method is similar to a groupby operation:
- it provides a time-based grouping, by using a string (e.g. ``M``,
- ``5H``,…) that defines the target frequency
+ ``5H``, …) that defines the target frequency
- it requires an aggregation function such as ``mean``, ``max``,…
.. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/10_text_data.rst b/doc/source/getting_started/intro_tutorials/10_text_data.rst
index 5b1885791d8fb..8493a071863c4 100644
--- a/doc/source/getting_started/intro_tutorials/10_text_data.rst
+++ b/doc/source/getting_started/intro_tutorials/10_text_data.rst
@@ -134,8 +134,8 @@ only one countess on the Titanic, we get one row as a result.
.. note::
More powerful extractions on strings are supported, as the
:meth:`Series.str.contains` and :meth:`Series.str.extract` methods accept `regular
- expressions `__, but out of
- scope of this tutorial.
+ expressions `__, but these are beyond
+ the scope of this tutorial.
.. raw:: html
@@ -200,7 +200,7 @@ In the "Sex" column, replace values of "male" by "M" and values of "female" by "
Whereas :meth:`~Series.replace` is not a string method, it provides a convenient way
to use mappings or vocabularies to translate certain values. It requires
-a ``dictionary`` to define the mapping ``{from : to}``.
+a ``dictionary`` to define the mapping ``{from: to}``.
.. raw:: html
diff --git a/doc/source/getting_started/intro_tutorials/includes/titanic.rst b/doc/source/getting_started/intro_tutorials/includes/titanic.rst
index 6e03b848aab06..41159516200fa 100644
--- a/doc/source/getting_started/intro_tutorials/includes/titanic.rst
+++ b/doc/source/getting_started/intro_tutorials/includes/titanic.rst
@@ -11,7 +11,7 @@ This tutorial uses the Titanic data set, stored as CSV. The data
consists of the following data columns:
- PassengerId: Id of every passenger.
-- Survived: Indication whether passenger survived. ``0`` for yes and ``1`` for no.
+- Survived: Indication whether passenger survived. ``0`` for no and ``1`` for yes.
- Pclass: One out of the 3 ticket classes: Class ``1``, Class ``2`` and Class ``3``.
- Name: Name of passenger.
- Sex: Gender of passenger.
diff --git a/doc/source/getting_started/overview.rst b/doc/source/getting_started/overview.rst
index 05a7d63b7ff47..98a68080d33ef 100644
--- a/doc/source/getting_started/overview.rst
+++ b/doc/source/getting_started/overview.rst
@@ -6,11 +6,11 @@
Package overview
****************
-pandas is a `Python `__ package providing fast,
+pandas is a `Python `__ package that provides fast,
flexible, and expressive data structures designed to make working with
"relational" or "labeled" data both easy and intuitive. It aims to be the
-fundamental high-level building block for doing practical, **real-world** data
-analysis in Python. Additionally, it has the broader goal of becoming **the
+fundamental high-level building block for practical, **real-world** data
+analysis in Python. Additionally, it seeks to become **the
most powerful and flexible open source data analysis/manipulation tool
available in any language**. It is already well on its way toward this goal.
@@ -174,3 +174,4 @@ License
-------
.. literalinclude:: ../../../LICENSE
+ :language: none
diff --git a/doc/source/getting_started/tutorials.rst b/doc/source/getting_started/tutorials.rst
index 4393c3716bdad..eae7771418485 100644
--- a/doc/source/getting_started/tutorials.rst
+++ b/doc/source/getting_started/tutorials.rst
@@ -112,7 +112,7 @@ Various tutorials
* `Wes McKinney's (pandas BDFL) blog `_
* `Statistical analysis made easy in Python with SciPy and pandas DataFrames, by Randal Olson `_
-* `Statistical Data Analysis in Python, tutorial videos, by Christopher Fonnesbeck from SciPy 2013 `_
+* `Statistical Data Analysis in Python, tutorial by Christopher Fonnesbeck from SciPy 2013 `_
* `Financial analysis in Python, by Thomas Wiecki `_
* `Intro to pandas data structures, by Greg Reda `_
* `Pandas DataFrames Tutorial, by Karlijn Willems `_
diff --git a/doc/source/reference/arrays.rst b/doc/source/reference/arrays.rst
index fe65364896f54..d37eebef5c0c0 100644
--- a/doc/source/reference/arrays.rst
+++ b/doc/source/reference/arrays.rst
@@ -61,7 +61,7 @@ is an :class:`ArrowDtype`.
support as NumPy including first-class nullability support for all data types, immutability and more.
The table below shows the equivalent pyarrow-backed (``pa``), pandas extension, and numpy (``np``) types that are recognized by pandas.
-Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``
+Pyarrow-backed types below need to be passed into :class:`ArrowDtype` to be recognized by pandas e.g. ``pd.ArrowDtype(pa.bool_())``.
=============================================== ========================== ===================
PyArrow type pandas extension type NumPy type
@@ -114,7 +114,7 @@ values.
ArrowDtype
-For more information, please see the :ref:`PyArrow user guide `
+For more information, please see the :ref:`PyArrow user guide `.
.. _api.arrays.datetime:
@@ -495,7 +495,7 @@ a :class:`CategoricalDtype`.
CategoricalDtype.categories
CategoricalDtype.ordered
-Categorical data can be stored in a :class:`pandas.Categorical`
+Categorical data can be stored in a :class:`pandas.Categorical`:
.. autosummary::
:toctree: api/
@@ -539,6 +539,21 @@ To create a Series of dtype ``category``, use ``cat = s.astype(dtype)`` or
If the :class:`Series` is of dtype :class:`CategoricalDtype`, ``Series.cat`` can be used to change the categorical
data. See :ref:`api.series.cat` for more.
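+
+For example, a minimal sketch of creating a categorical ``Series`` and accessing
+its categories:
+
+.. code-block:: python
+
+    import pandas as pd
+
+    s = pd.Series(["a", "b", "a"], dtype="category")
+    s.cat.categories   # inspect the inferred categories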
+More methods are available on :class:`Categorical`:
+
+.. autosummary::
+ :toctree: api/
+
+ Categorical.as_ordered
+ Categorical.as_unordered
+ Categorical.set_categories
+ Categorical.rename_categories
+ Categorical.reorder_categories
+ Categorical.add_categories
+ Categorical.remove_categories
+ Categorical.remove_unused_categories
+ Categorical.map
+
.. _api.arrays.sparse:
Sparse
@@ -649,6 +664,7 @@ Data type introspection
api.types.is_datetime64_dtype
api.types.is_datetime64_ns_dtype
api.types.is_datetime64tz_dtype
+ api.types.is_dtype_equal
api.types.is_extension_array_dtype
api.types.is_float_dtype
api.types.is_int64_dtype
@@ -685,7 +701,6 @@ Scalar introspection
api.types.is_float
api.types.is_hashable
api.types.is_integer
- api.types.is_interval
api.types.is_number
api.types.is_re
api.types.is_re_compilable
diff --git a/doc/source/reference/frame.rst b/doc/source/reference/frame.rst
index fefb02dd916cd..e701d48a89db7 100644
--- a/doc/source/reference/frame.rst
+++ b/doc/source/reference/frame.rst
@@ -48,7 +48,7 @@ Conversion
DataFrame.convert_dtypes
DataFrame.infer_objects
DataFrame.copy
- DataFrame.bool
+ DataFrame.to_numpy
Indexing, iteration
~~~~~~~~~~~~~~~~~~~
@@ -74,6 +74,7 @@ Indexing, iteration
DataFrame.where
DataFrame.mask
DataFrame.query
+ DataFrame.isetitem
For more information on ``.at``, ``.iat``, ``.loc``, and
``.iloc``, see the :ref:`indexing documentation `.
@@ -117,7 +118,6 @@ Function application, GroupBy & window
DataFrame.apply
DataFrame.map
- DataFrame.applymap
DataFrame.pipe
DataFrame.agg
DataFrame.aggregate
@@ -185,11 +185,8 @@ Reindexing / selection / label manipulation
DataFrame.duplicated
DataFrame.equals
DataFrame.filter
- DataFrame.first
- DataFrame.head
DataFrame.idxmax
DataFrame.idxmin
- DataFrame.last
DataFrame.reindex
DataFrame.reindex_like
DataFrame.rename
@@ -198,7 +195,6 @@ Reindexing / selection / label manipulation
DataFrame.sample
DataFrame.set_axis
DataFrame.set_index
- DataFrame.tail
DataFrame.take
DataFrame.truncate
@@ -209,7 +205,6 @@ Missing data handling
.. autosummary::
:toctree: api/
- DataFrame.backfill
DataFrame.bfill
DataFrame.dropna
DataFrame.ffill
@@ -219,7 +214,6 @@ Missing data handling
DataFrame.isnull
DataFrame.notna
DataFrame.notnull
- DataFrame.pad
DataFrame.replace
Reshaping, sorting, transposing
@@ -238,7 +232,6 @@ Reshaping, sorting, transposing
DataFrame.swaplevel
DataFrame.stack
DataFrame.unstack
- DataFrame.swapaxes
DataFrame.melt
DataFrame.explode
DataFrame.squeeze
@@ -382,7 +375,6 @@ Serialization / IO / conversion
DataFrame.to_feather
DataFrame.to_latex
DataFrame.to_stata
- DataFrame.to_gbq
DataFrame.to_records
DataFrame.to_string
DataFrame.to_clipboard
diff --git a/doc/source/reference/groupby.rst b/doc/source/reference/groupby.rst
index 771163ae1b0bc..004651ac0074f 100644
--- a/doc/source/reference/groupby.rst
+++ b/doc/source/reference/groupby.rst
@@ -79,8 +79,9 @@ Function application
DataFrameGroupBy.cumsum
DataFrameGroupBy.describe
DataFrameGroupBy.diff
+ DataFrameGroupBy.ewm
+ DataFrameGroupBy.expanding
DataFrameGroupBy.ffill
- DataFrameGroupBy.fillna
DataFrameGroupBy.first
DataFrameGroupBy.head
DataFrameGroupBy.idxmax
@@ -105,6 +106,7 @@ Function application
DataFrameGroupBy.shift
DataFrameGroupBy.size
DataFrameGroupBy.skew
+ DataFrameGroupBy.kurt
DataFrameGroupBy.std
DataFrameGroupBy.sum
DataFrameGroupBy.var
@@ -130,8 +132,9 @@ Function application
SeriesGroupBy.cumsum
SeriesGroupBy.describe
SeriesGroupBy.diff
+ SeriesGroupBy.ewm
+ SeriesGroupBy.expanding
SeriesGroupBy.ffill
- SeriesGroupBy.fillna
SeriesGroupBy.first
SeriesGroupBy.head
SeriesGroupBy.last
@@ -161,6 +164,7 @@ Function application
SeriesGroupBy.shift
SeriesGroupBy.size
SeriesGroupBy.skew
+ SeriesGroupBy.kurt
SeriesGroupBy.std
SeriesGroupBy.sum
SeriesGroupBy.var
diff --git a/doc/source/reference/index.rst b/doc/source/reference/index.rst
index 7da02f7958416..639bac4d40b70 100644
--- a/doc/source/reference/index.rst
+++ b/doc/source/reference/index.rst
@@ -24,13 +24,14 @@ The following subpackages are public.
`pandas-stubs `_ package
which has classes in addition to those that occur in pandas for type-hinting.
-In addition, public functions in ``pandas.io`` and ``pandas.tseries`` submodules
-are mentioned in the documentation.
+In addition, public functions in the ``pandas.io``, ``pandas.tseries``, and ``pandas.util`` submodules
+are explicitly mentioned in the documentation. Further APIs in these modules are not guaranteed
+to be stable.
.. warning::
- The ``pandas.core``, ``pandas.compat``, and ``pandas.util`` top-level modules are PRIVATE. Stable functionality in such modules is not guaranteed.
+ The ``pandas.core`` and ``pandas.compat`` top-level modules are PRIVATE. Stable functionality in such modules is not guaranteed.
.. If you update this toctree, also update the manual toctree in the
.. main index.rst.template
@@ -61,7 +62,6 @@ are mentioned in the documentation.
..
.. toctree::
- api/pandas.Index.holds_integer
api/pandas.Index.nlevels
api/pandas.Index.sort
diff --git a/doc/source/reference/indexing.rst b/doc/source/reference/indexing.rst
index fa6105761df0a..7a4bc0f467f9a 100644
--- a/doc/source/reference/indexing.rst
+++ b/doc/source/reference/indexing.rst
@@ -41,6 +41,7 @@ Properties
Index.empty
Index.T
Index.memory_usage
+ Index.array
Modifying and computations
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -61,13 +62,6 @@ Modifying and computations
Index.identical
Index.insert
Index.is_
- Index.is_boolean
- Index.is_categorical
- Index.is_floating
- Index.is_integer
- Index.is_interval
- Index.is_numeric
- Index.is_object
Index.min
Index.max
Index.reindex
@@ -110,6 +104,7 @@ Conversion
Index.to_list
Index.to_series
Index.to_frame
+ Index.to_numpy
Index.view
Sorting
diff --git a/doc/source/reference/io.rst b/doc/source/reference/io.rst
index fbd0f6bd200b9..6e5992916f800 100644
--- a/doc/source/reference/io.rst
+++ b/doc/source/reference/io.rst
@@ -156,6 +156,15 @@ Parquet
read_parquet
DataFrame.to_parquet
+Iceberg
+~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ read_iceberg
+
+.. warning:: ``read_iceberg`` is experimental and may change without warning.
+
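+A minimal sketch (the table identifier is hypothetical, and a PyIceberg catalog
+is assumed to be configured; see :func:`read_iceberg` for the actual parameters):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   # illustrative only; requires an Iceberg catalog set up for this environment
+   df = pd.read_iceberg("my_namespace.my_table")
+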
ORC
~~~
.. autosummary::
@@ -188,13 +197,6 @@ SQL
read_sql
DataFrame.to_sql
-Google BigQuery
-~~~~~~~~~~~~~~~
-.. autosummary::
- :toctree: api/
-
- read_gbq
-
STATA
~~~~~
.. autosummary::
diff --git a/doc/source/reference/offset_frequency.rst b/doc/source/reference/offset_frequency.rst
index ab89fe74e7337..5876e005574fd 100644
--- a/doc/source/reference/offset_frequency.rst
+++ b/doc/source/reference/offset_frequency.rst
@@ -26,8 +26,6 @@ Properties
DateOffset.normalize
DateOffset.rule_code
DateOffset.n
- DateOffset.is_month_start
- DateOffset.is_month_end
Methods
~~~~~~~
@@ -35,7 +33,6 @@ Methods
:toctree: api/
DateOffset.copy
- DateOffset.is_anchored
DateOffset.is_on_offset
DateOffset.is_month_start
DateOffset.is_month_end
@@ -82,7 +79,6 @@ Methods
:toctree: api/
BusinessDay.copy
- BusinessDay.is_anchored
BusinessDay.is_on_offset
BusinessDay.is_month_start
BusinessDay.is_month_end
@@ -122,7 +118,6 @@ Methods
:toctree: api/
BusinessHour.copy
- BusinessHour.is_anchored
BusinessHour.is_on_offset
BusinessHour.is_month_start
BusinessHour.is_month_end
@@ -169,7 +164,6 @@ Methods
:toctree: api/
CustomBusinessDay.copy
- CustomBusinessDay.is_anchored
CustomBusinessDay.is_on_offset
CustomBusinessDay.is_month_start
CustomBusinessDay.is_month_end
@@ -209,7 +203,6 @@ Methods
:toctree: api/
CustomBusinessHour.copy
- CustomBusinessHour.is_anchored
CustomBusinessHour.is_on_offset
CustomBusinessHour.is_month_start
CustomBusinessHour.is_month_end
@@ -244,7 +237,6 @@ Methods
:toctree: api/
MonthEnd.copy
- MonthEnd.is_anchored
MonthEnd.is_on_offset
MonthEnd.is_month_start
MonthEnd.is_month_end
@@ -279,7 +271,6 @@ Methods
:toctree: api/
MonthBegin.copy
- MonthBegin.is_anchored
MonthBegin.is_on_offset
MonthBegin.is_month_start
MonthBegin.is_month_end
@@ -323,7 +314,6 @@ Methods
:toctree: api/
BusinessMonthEnd.copy
- BusinessMonthEnd.is_anchored
BusinessMonthEnd.is_on_offset
BusinessMonthEnd.is_month_start
BusinessMonthEnd.is_month_end
@@ -367,7 +357,6 @@ Methods
:toctree: api/
BusinessMonthBegin.copy
- BusinessMonthBegin.is_anchored
BusinessMonthBegin.is_on_offset
BusinessMonthBegin.is_month_start
BusinessMonthBegin.is_month_end
@@ -415,7 +404,6 @@ Methods
:toctree: api/
CustomBusinessMonthEnd.copy
- CustomBusinessMonthEnd.is_anchored
CustomBusinessMonthEnd.is_on_offset
CustomBusinessMonthEnd.is_month_start
CustomBusinessMonthEnd.is_month_end
@@ -463,7 +451,6 @@ Methods
:toctree: api/
CustomBusinessMonthBegin.copy
- CustomBusinessMonthBegin.is_anchored
CustomBusinessMonthBegin.is_on_offset
CustomBusinessMonthBegin.is_month_start
CustomBusinessMonthBegin.is_month_end
@@ -499,7 +486,6 @@ Methods
:toctree: api/
SemiMonthEnd.copy
- SemiMonthEnd.is_anchored
SemiMonthEnd.is_on_offset
SemiMonthEnd.is_month_start
SemiMonthEnd.is_month_end
@@ -535,7 +521,6 @@ Methods
:toctree: api/
SemiMonthBegin.copy
- SemiMonthBegin.is_anchored
SemiMonthBegin.is_on_offset
SemiMonthBegin.is_month_start
SemiMonthBegin.is_month_end
@@ -571,7 +556,6 @@ Methods
:toctree: api/
Week.copy
- Week.is_anchored
Week.is_on_offset
Week.is_month_start
Week.is_month_end
@@ -607,7 +591,6 @@ Methods
:toctree: api/
WeekOfMonth.copy
- WeekOfMonth.is_anchored
WeekOfMonth.is_on_offset
WeekOfMonth.weekday
WeekOfMonth.is_month_start
@@ -645,7 +628,6 @@ Methods
:toctree: api/
LastWeekOfMonth.copy
- LastWeekOfMonth.is_anchored
LastWeekOfMonth.is_on_offset
LastWeekOfMonth.is_month_start
LastWeekOfMonth.is_month_end
@@ -681,7 +663,6 @@ Methods
:toctree: api/
BQuarterEnd.copy
- BQuarterEnd.is_anchored
BQuarterEnd.is_on_offset
BQuarterEnd.is_month_start
BQuarterEnd.is_month_end
@@ -717,7 +698,6 @@ Methods
:toctree: api/
BQuarterBegin.copy
- BQuarterBegin.is_anchored
BQuarterBegin.is_on_offset
BQuarterBegin.is_month_start
BQuarterBegin.is_month_end
@@ -753,7 +733,6 @@ Methods
:toctree: api/
QuarterEnd.copy
- QuarterEnd.is_anchored
QuarterEnd.is_on_offset
QuarterEnd.is_month_start
QuarterEnd.is_month_end
@@ -789,7 +768,6 @@ Methods
:toctree: api/
QuarterBegin.copy
- QuarterBegin.is_anchored
QuarterBegin.is_on_offset
QuarterBegin.is_month_start
QuarterBegin.is_month_end
@@ -798,6 +776,146 @@ Methods
QuarterBegin.is_year_start
QuarterBegin.is_year_end
+BHalfYearEnd
+------------
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearEnd
+
+Properties
+~~~~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearEnd.freqstr
+ BHalfYearEnd.kwds
+ BHalfYearEnd.name
+ BHalfYearEnd.nanos
+ BHalfYearEnd.normalize
+ BHalfYearEnd.rule_code
+ BHalfYearEnd.n
+ BHalfYearEnd.startingMonth
+
+Methods
+~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearEnd.copy
+ BHalfYearEnd.is_on_offset
+ BHalfYearEnd.is_month_start
+ BHalfYearEnd.is_month_end
+ BHalfYearEnd.is_quarter_start
+ BHalfYearEnd.is_quarter_end
+ BHalfYearEnd.is_year_start
+ BHalfYearEnd.is_year_end
+
+BHalfYearBegin
+--------------
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearBegin
+
+Properties
+~~~~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearBegin.freqstr
+ BHalfYearBegin.kwds
+ BHalfYearBegin.name
+ BHalfYearBegin.nanos
+ BHalfYearBegin.normalize
+ BHalfYearBegin.rule_code
+ BHalfYearBegin.n
+ BHalfYearBegin.startingMonth
+
+Methods
+~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ BHalfYearBegin.copy
+ BHalfYearBegin.is_on_offset
+ BHalfYearBegin.is_month_start
+ BHalfYearBegin.is_month_end
+ BHalfYearBegin.is_quarter_start
+ BHalfYearBegin.is_quarter_end
+ BHalfYearBegin.is_year_start
+ BHalfYearBegin.is_year_end
+
+HalfYearEnd
+-----------
+.. autosummary::
+ :toctree: api/
+
+ HalfYearEnd
+
+Properties
+~~~~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ HalfYearEnd.freqstr
+ HalfYearEnd.kwds
+ HalfYearEnd.name
+ HalfYearEnd.nanos
+ HalfYearEnd.normalize
+ HalfYearEnd.rule_code
+ HalfYearEnd.n
+ HalfYearEnd.startingMonth
+
+Methods
+~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ HalfYearEnd.copy
+ HalfYearEnd.is_on_offset
+ HalfYearEnd.is_month_start
+ HalfYearEnd.is_month_end
+ HalfYearEnd.is_quarter_start
+ HalfYearEnd.is_quarter_end
+ HalfYearEnd.is_year_start
+ HalfYearEnd.is_year_end
+
+HalfYearBegin
+-------------
+.. autosummary::
+ :toctree: api/
+
+ HalfYearBegin
+
+Properties
+~~~~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ HalfYearBegin.freqstr
+ HalfYearBegin.kwds
+ HalfYearBegin.name
+ HalfYearBegin.nanos
+ HalfYearBegin.normalize
+ HalfYearBegin.rule_code
+ HalfYearBegin.n
+ HalfYearBegin.startingMonth
+
+Methods
+~~~~~~~
+.. autosummary::
+ :toctree: api/
+
+ HalfYearBegin.copy
+ HalfYearBegin.is_on_offset
+ HalfYearBegin.is_month_start
+ HalfYearBegin.is_month_end
+ HalfYearBegin.is_quarter_start
+ HalfYearBegin.is_quarter_end
+ HalfYearBegin.is_year_start
+ HalfYearBegin.is_year_end
+
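+A small sketch of rolling a date with one of these offsets (the June/December
+anchoring in the comment is an assumption tied to ``startingMonth=6``):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   ts = pd.Timestamp("2024-02-15")
+   # with startingMonth=6, half-year ends fall on June and December month ends
+   ts + pd.offsets.HalfYearEnd(startingMonth=6)
+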
BYearEnd
--------
.. autosummary::
@@ -825,7 +943,6 @@ Methods
:toctree: api/
BYearEnd.copy
- BYearEnd.is_anchored
BYearEnd.is_on_offset
BYearEnd.is_month_start
BYearEnd.is_month_end
@@ -861,7 +978,6 @@ Methods
:toctree: api/
BYearBegin.copy
- BYearBegin.is_anchored
BYearBegin.is_on_offset
BYearBegin.is_month_start
BYearBegin.is_month_end
@@ -897,7 +1013,6 @@ Methods
:toctree: api/
YearEnd.copy
- YearEnd.is_anchored
YearEnd.is_on_offset
YearEnd.is_month_start
YearEnd.is_month_end
@@ -933,7 +1048,6 @@ Methods
:toctree: api/
YearBegin.copy
- YearBegin.is_anchored
YearBegin.is_on_offset
YearBegin.is_month_start
YearBegin.is_month_end
@@ -973,7 +1087,6 @@ Methods
FY5253.copy
FY5253.get_rule_code_suffix
FY5253.get_year_end
- FY5253.is_anchored
FY5253.is_on_offset
FY5253.is_month_start
FY5253.is_month_end
@@ -1014,7 +1127,6 @@ Methods
FY5253Quarter.copy
FY5253Quarter.get_rule_code_suffix
FY5253Quarter.get_weeks
- FY5253Quarter.is_anchored
FY5253Quarter.is_on_offset
FY5253Quarter.year_has_extra_week
FY5253Quarter.is_month_start
@@ -1050,7 +1162,6 @@ Methods
:toctree: api/
Easter.copy
- Easter.is_anchored
Easter.is_on_offset
Easter.is_month_start
Easter.is_month_end
@@ -1071,7 +1182,6 @@ Properties
.. autosummary::
:toctree: api/
- Tick.delta
Tick.freqstr
Tick.kwds
Tick.name
@@ -1086,7 +1196,6 @@ Methods
:toctree: api/
Tick.copy
- Tick.is_anchored
Tick.is_on_offset
Tick.is_month_start
Tick.is_month_end
@@ -1107,7 +1216,6 @@ Properties
.. autosummary::
:toctree: api/
- Day.delta
Day.freqstr
Day.kwds
Day.name
@@ -1122,7 +1230,6 @@ Methods
:toctree: api/
Day.copy
- Day.is_anchored
Day.is_on_offset
Day.is_month_start
Day.is_month_end
@@ -1143,7 +1250,6 @@ Properties
.. autosummary::
:toctree: api/
- Hour.delta
Hour.freqstr
Hour.kwds
Hour.name
@@ -1158,7 +1264,6 @@ Methods
:toctree: api/
Hour.copy
- Hour.is_anchored
Hour.is_on_offset
Hour.is_month_start
Hour.is_month_end
@@ -1179,7 +1284,6 @@ Properties
.. autosummary::
:toctree: api/
- Minute.delta
Minute.freqstr
Minute.kwds
Minute.name
@@ -1194,7 +1298,6 @@ Methods
:toctree: api/
Minute.copy
- Minute.is_anchored
Minute.is_on_offset
Minute.is_month_start
Minute.is_month_end
@@ -1215,7 +1318,6 @@ Properties
.. autosummary::
:toctree: api/
- Second.delta
Second.freqstr
Second.kwds
Second.name
@@ -1230,7 +1332,6 @@ Methods
:toctree: api/
Second.copy
- Second.is_anchored
Second.is_on_offset
Second.is_month_start
Second.is_month_end
@@ -1251,7 +1352,6 @@ Properties
.. autosummary::
:toctree: api/
- Milli.delta
Milli.freqstr
Milli.kwds
Milli.name
@@ -1266,7 +1366,6 @@ Methods
:toctree: api/
Milli.copy
- Milli.is_anchored
Milli.is_on_offset
Milli.is_month_start
Milli.is_month_end
@@ -1287,7 +1386,6 @@ Properties
.. autosummary::
:toctree: api/
- Micro.delta
Micro.freqstr
Micro.kwds
Micro.name
@@ -1302,7 +1400,6 @@ Methods
:toctree: api/
Micro.copy
- Micro.is_anchored
Micro.is_on_offset
Micro.is_month_start
Micro.is_month_end
@@ -1323,7 +1420,6 @@ Properties
.. autosummary::
:toctree: api/
- Nano.delta
Nano.freqstr
Nano.kwds
Nano.name
@@ -1338,7 +1434,6 @@ Methods
:toctree: api/
Nano.copy
- Nano.is_anchored
Nano.is_on_offset
Nano.is_month_start
Nano.is_month_end
diff --git a/doc/source/reference/resampling.rst b/doc/source/reference/resampling.rst
index edbc8090fc849..2e0717081b129 100644
--- a/doc/source/reference/resampling.rst
+++ b/doc/source/reference/resampling.rst
@@ -38,7 +38,6 @@ Upsampling
Resampler.ffill
Resampler.bfill
Resampler.nearest
- Resampler.fillna
Resampler.asfreq
Resampler.interpolate
diff --git a/doc/source/reference/series.rst b/doc/source/reference/series.rst
index af262f9e6c336..6006acc8f5e16 100644
--- a/doc/source/reference/series.rst
+++ b/doc/source/reference/series.rst
@@ -25,6 +25,7 @@ Attributes
Series.array
Series.values
Series.dtype
+ Series.info
Series.shape
Series.nbytes
Series.ndim
@@ -47,7 +48,6 @@ Conversion
Series.convert_dtypes
Series.infer_objects
Series.copy
- Series.bool
Series.to_numpy
Series.to_period
Series.to_timestamp
@@ -177,17 +177,16 @@ Reindexing / selection / label manipulation
:toctree: api/
Series.align
+ Series.case_when
Series.drop
Series.droplevel
Series.drop_duplicates
Series.duplicated
Series.equals
- Series.first
Series.head
Series.idxmax
Series.idxmin
Series.isin
- Series.last
Series.reindex
Series.reindex_like
Series.rename
@@ -209,7 +208,6 @@ Missing data handling
.. autosummary::
:toctree: api/
- Series.backfill
Series.bfill
Series.dropna
Series.ffill
@@ -219,7 +217,6 @@ Missing data handling
Series.isnull
Series.notna
Series.notnull
- Series.pad
Series.replace
Reshaping, sorting
@@ -237,10 +234,8 @@ Reshaping, sorting
Series.unstack
Series.explode
Series.searchsorted
- Series.ravel
Series.repeat
Series.squeeze
- Series.view
Combining / comparing / joining / merging
-----------------------------------------
@@ -341,7 +336,6 @@ Datetime properties
Series.dt.tz
Series.dt.freq
Series.dt.unit
- Series.dt.normalize
Datetime methods
^^^^^^^^^^^^^^^^
diff --git a/doc/source/reference/style.rst b/doc/source/reference/style.rst
index 2256876c93e01..742263c788c2f 100644
--- a/doc/source/reference/style.rst
+++ b/doc/source/reference/style.rst
@@ -27,6 +27,7 @@ Styler properties
Styler.template_html_style
Styler.template_html_table
Styler.template_latex
+ Styler.template_typst
Styler.template_string
Styler.loader
@@ -41,6 +42,7 @@ Style application
Styler.map_index
Styler.format
Styler.format_index
+ Styler.format_index_names
Styler.relabel_index
Styler.hide
Styler.concat
@@ -76,6 +78,7 @@ Style export and import
Styler.to_html
Styler.to_latex
+ Styler.to_typst
Styler.to_excel
Styler.to_string
Styler.export
diff --git a/doc/source/reference/testing.rst b/doc/source/reference/testing.rst
index a5d61703aceed..1f164d1aa98b4 100644
--- a/doc/source/reference/testing.rst
+++ b/doc/source/reference/testing.rst
@@ -58,8 +58,6 @@ Exceptions and warnings
errors.PossiblePrecisionLoss
errors.PyperclipException
errors.PyperclipWindowsException
- errors.SettingWithCopyError
- errors.SettingWithCopyWarning
errors.SpecificationError
errors.UndefinedVariableError
errors.UnsortedIndexError
diff --git a/doc/source/reference/window.rst b/doc/source/reference/window.rst
index 14af2b8a120e0..2bd63f02faf69 100644
--- a/doc/source/reference/window.rst
+++ b/doc/source/reference/window.rst
@@ -30,15 +30,19 @@ Rolling window functions
Rolling.std
Rolling.min
Rolling.max
+ Rolling.first
+ Rolling.last
Rolling.corr
Rolling.cov
Rolling.skew
Rolling.kurt
Rolling.apply
+ Rolling.pipe
Rolling.aggregate
Rolling.quantile
Rolling.sem
Rolling.rank
+ Rolling.nunique
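+
+A short sketch of the newly listed reducers, assuming they follow the usual
+``Rolling`` pattern (data is illustrative):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   s = pd.Series([1, 2, 2, 3])
+   s.rolling(2).nunique()  # number of distinct values in each window
+   s.rolling(2).last()     # last value in each window
+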
.. _api.functions_window:
@@ -71,15 +75,19 @@ Expanding window functions
Expanding.std
Expanding.min
Expanding.max
+ Expanding.first
+ Expanding.last
Expanding.corr
Expanding.cov
Expanding.skew
Expanding.kurt
Expanding.apply
+ Expanding.pipe
Expanding.aggregate
Expanding.quantile
Expanding.sem
Expanding.rank
+ Expanding.nunique
.. _api.functions_ewm:
diff --git a/doc/source/user_guide/10min.rst b/doc/source/user_guide/10min.rst
index c8e67710c85a9..72bb93d21a99f 100644
--- a/doc/source/user_guide/10min.rst
+++ b/doc/source/user_guide/10min.rst
@@ -19,7 +19,7 @@ Customarily, we import as follows:
Basic data structures in pandas
-------------------------------
-Pandas provides two types of classes for handling data:
+pandas provides two types of classes for handling data:
1. :class:`Series`: a one-dimensional labeled array holding data of any type
such as integers, strings, Python objects etc.
@@ -91,8 +91,8 @@ will be completed:
df2.any df2.combine
df2.append df2.D
df2.apply df2.describe
- df2.applymap df2.diff
df2.B df2.duplicated
+ df2.diff
As you can see, the columns ``A``, ``B``, ``C``, and ``D`` are automatically
tab completed. ``E`` and ``F`` are there as well; the rest of the attributes have been
@@ -101,7 +101,7 @@ truncated for brevity.
Viewing data
------------
-See the :ref:`Essentially basics functionality section `.
+See the :ref:`Essential basic functionality section `.
Use :meth:`DataFrame.head` and :meth:`DataFrame.tail` to view the top and bottom rows of the frame
respectively:
@@ -177,7 +177,7 @@ See the indexing documentation :ref:`Indexing and Selecting Data ` and
Getitem (``[]``)
~~~~~~~~~~~~~~~~
-For a :class:`DataFrame`, passing a single label selects a columns and
+For a :class:`DataFrame`, passing a single label selects a column and
yields a :class:`Series` equivalent to ``df.A``:
.. ipython:: python
@@ -563,7 +563,7 @@ columns:
.. ipython:: python
- stacked = df2.stack(future_stack=True)
+ stacked = df2.stack()
stacked
With a "stacked" DataFrame or Series (having a :class:`MultiIndex` as the
diff --git a/doc/source/user_guide/advanced.rst b/doc/source/user_guide/advanced.rst
index 453536098cfbb..f7ab466e92d93 100644
--- a/doc/source/user_guide/advanced.rst
+++ b/doc/source/user_guide/advanced.rst
@@ -11,13 +11,6 @@ and :ref:`other advanced indexing features `.
See the :ref:`Indexing and Selecting Data ` for general indexing documentation.
-.. warning::
-
- Whether a copy or a reference is returned for a setting operation may
- depend on the context. This is sometimes called ``chained assignment`` and
- should be avoided. See :ref:`Returning a View versus Copy
- `.
-
See the :ref:`cookbook` for some advanced strategies.
.. _advanced.hierarchical:
@@ -402,6 +395,7 @@ slicers on a single axis.
Furthermore, you can *set* the values using the following methods.
.. ipython:: python
+ :okwarning:
df2 = dfmi.copy()
df2.loc(axis=0)[:, :, ["C1", "C3"]] = -10
diff --git a/doc/source/user_guide/basics.rst b/doc/source/user_guide/basics.rst
index f7d89110e6c8f..8155aa0ae03fa 100644
--- a/doc/source/user_guide/basics.rst
+++ b/doc/source/user_guide/basics.rst
@@ -36,7 +36,7 @@ of elements to display is five, but you may pass a custom number.
Attributes and underlying data
------------------------------
-pandas objects have a number of attributes enabling you to access the metadata
+pandas objects have a number of attributes enabling you to access the metadata.
* **shape**: gives the axis dimensions of the object, consistent with ndarray
* Axis labels
@@ -59,7 +59,7 @@ NumPy's type system to add support for custom arrays
(see :ref:`basics.dtypes`).
To get the actual data inside a :class:`Index` or :class:`Series`, use
-the ``.array`` property
+the ``.array`` property.
.. ipython:: python
@@ -88,18 +88,18 @@ NumPy doesn't have a dtype to represent timezone-aware datetimes, so there
are two possibly useful representations:
1. An object-dtype :class:`numpy.ndarray` with :class:`Timestamp` objects, each
- with the correct ``tz``
+ with the correct ``tz``.
2. A ``datetime64[ns]`` -dtype :class:`numpy.ndarray`, where the values have
- been converted to UTC and the timezone discarded
+ been converted to UTC and the timezone discarded.
-Timezones may be preserved with ``dtype=object``
+Timezones may be preserved with ``dtype=object``:
.. ipython:: python
ser = pd.Series(pd.date_range("2000", periods=2, tz="CET"))
ser.to_numpy(dtype=object)
-Or thrown away with ``dtype='datetime64[ns]'``
+Or thrown away with ``dtype='datetime64[ns]'``:
.. ipython:: python
@@ -155,17 +155,6 @@ speedups. ``numexpr`` uses smart chunking, caching, and multiple cores. ``bottle
a set of specialized cython routines that are especially fast when dealing with arrays that have
``nans``.
-Here is a sample (using 100 column x 100,000 row ``DataFrames``):
-
-.. csv-table::
- :header: "Operation", "0.11.0 (ms)", "Prior Version (ms)", "Ratio to Prior"
- :widths: 25, 25, 25, 25
- :delim: ;
-
- ``df1 > df2``; 13.32; 125.35; 0.1063
- ``df1 * df2``; 21.71; 36.63; 0.5928
- ``df1 + df2``; 22.04; 36.50; 0.6039
-
You are highly encouraged to install both libraries. See the section
:ref:`Recommended Dependencies ` for more installation info.
@@ -299,8 +288,7 @@ Boolean reductions
~~~~~~~~~~~~~~~~~~
You can apply the reductions: :attr:`~DataFrame.empty`, :meth:`~DataFrame.any`,
-:meth:`~DataFrame.all`, and :meth:`~DataFrame.bool` to provide a
-way to summarize a boolean result.
+and :meth:`~DataFrame.all`.
.. ipython:: python
@@ -477,15 +465,15 @@ For example:
.. ipython:: python
df
- df.mean(0)
- df.mean(1)
+ df.mean(axis=0)
+ df.mean(axis=1)
All such methods have a ``skipna`` option signaling whether to exclude missing
data (``True`` by default):
.. ipython:: python
- df.sum(0, skipna=False)
+ df.sum(axis=0, skipna=False)
df.sum(axis=1, skipna=True)
Combined with the broadcasting / arithmetic behavior, one can describe various
@@ -496,8 +484,8 @@ standard deviation of 1), very concisely:
ts_stand = (df - df.mean()) / df.std()
ts_stand.std()
- xs_stand = df.sub(df.mean(1), axis=0).div(df.std(1), axis=0)
- xs_stand.std(1)
+ xs_stand = df.sub(df.mean(axis=1), axis=0).div(df.std(axis=1), axis=0)
+ xs_stand.std(axis=1)
Note that methods like :meth:`~DataFrame.cumsum` and :meth:`~DataFrame.cumprod`
preserve the location of ``NaN`` values. This is somewhat different from
@@ -1309,8 +1297,8 @@ filling method chosen from the following table:
:header: "Method", "Action"
:widths: 30, 50
- pad / ffill, Fill values forward
- bfill / backfill, Fill values backward
+ ffill, Fill values forward
+ bfill, Fill values backward
nearest, Fill from the nearest index value
We illustrate these fill methods on a simple Series:
@@ -1608,7 +1596,7 @@ For instance:
This method does not convert the row to a Series object; it merely
returns the values inside a namedtuple. Therefore,
:meth:`~DataFrame.itertuples` preserves the data type of the values
-and is generally faster as :meth:`~DataFrame.iterrows`.
+and is generally faster than :meth:`~DataFrame.iterrows`.
.. note::
@@ -2076,12 +2064,12 @@ different numeric dtypes will **NOT** be combined. The following example will gi
.. ipython:: python
- df1 = pd.DataFrame(np.random.randn(8, 1), columns=["A"], dtype="float32")
+ df1 = pd.DataFrame(np.random.randn(8, 1), columns=["A"], dtype="float64")
df1
df1.dtypes
df2 = pd.DataFrame(
{
- "A": pd.Series(np.random.randn(8), dtype="float16"),
+ "A": pd.Series(np.random.randn(8), dtype="float32"),
"B": pd.Series(np.random.randn(8)),
"C": pd.Series(np.random.randint(0, 255, size=8), dtype="uint8"), # [0,255] (range of uint8)
}
diff --git a/doc/source/user_guide/boolean.rst b/doc/source/user_guide/boolean.rst
index 3c361d4de17e5..7de0430123fd2 100644
--- a/doc/source/user_guide/boolean.rst
+++ b/doc/source/user_guide/boolean.rst
@@ -37,6 +37,19 @@ If you would prefer to keep the ``NA`` values you can manually fill them with ``
s[mask.fillna(True)]
+If you create a column of ``NA`` values (for example to fill them later)
+with ``df['new_col'] = pd.NA``, the ``dtype`` will be set to ``object`` in the
+new column. Operations on this column will be slower than with
+an appropriate dtype. It's better to use
+``df['new_col'] = pd.Series(pd.NA, dtype="boolean")``
+(or another ``dtype`` that supports ``NA``).
+
+.. ipython:: python
+
+ df = pd.DataFrame()
+ df['objects'] = pd.NA
+ df.dtypes
+
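+For contrast, a minimal sketch of the recommended pattern (the column names are
+illustrative):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   df = pd.DataFrame({"a": [1, 2, 3]})
+   # allocate a nullable boolean column instead of an object column
+   df["new_col"] = pd.Series(pd.NA, dtype="boolean", index=df.index)
+   df.dtypes  # new_col has dtype 'boolean', not 'object'
+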
.. _boolean.kleene:
Kleene logical operations
diff --git a/doc/source/user_guide/categorical.rst b/doc/source/user_guide/categorical.rst
index 8fb991dca02db..1e7d66dfeb142 100644
--- a/doc/source/user_guide/categorical.rst
+++ b/doc/source/user_guide/categorical.rst
@@ -245,7 +245,8 @@ Equality semantics
Two instances of :class:`~pandas.api.types.CategoricalDtype` compare equal
whenever they have the same categories and order. When comparing two
-unordered categoricals, the order of the ``categories`` is not considered.
+unordered categoricals, the order of the ``categories`` is not considered. Note
+that ``CategoricalDtype`` instances whose categories have different dtypes are not equal.
.. ipython:: python
@@ -263,6 +264,16 @@ All instances of ``CategoricalDtype`` compare equal to the string ``'category'``
c1 == "category"
+Notice that the dtype of the categories is taken into account, especially when comparing
+two empty ``CategoricalDtype`` instances.
+
+.. ipython:: python
+
+ c2 = pd.Categorical(np.array([], dtype=object))
+ c3 = pd.Categorical(np.array([], dtype=float))
+
+ c2.dtype == c3.dtype
+
Description
-----------
@@ -782,7 +793,7 @@ Assigning a ``Categorical`` to parts of a column of other types will use the val
:okwarning:
df = pd.DataFrame({"a": [1, 1, 1, 1, 1], "b": ["a", "a", "a", "a", "a"]})
- df.loc[1:2, "a"] = pd.Categorical(["b", "b"], categories=["a", "b"])
+ df.loc[1:2, "a"] = pd.Categorical([2, 2], categories=[2, 3])
df.loc[2:3, "b"] = pd.Categorical(["b", "b"], categories=["a", "b"])
df
df.dtypes
diff --git a/doc/source/user_guide/cookbook.rst b/doc/source/user_guide/cookbook.rst
index b1a6aa8753be1..91a0b4a4fe967 100644
--- a/doc/source/user_guide/cookbook.rst
+++ b/doc/source/user_guide/cookbook.rst
@@ -35,7 +35,7 @@ These are some neat pandas ``idioms``
)
df
-if-then...
+If-then...
**********
An if-then on one column
@@ -176,7 +176,7 @@ One could hard code:
Selection
---------
-Dataframes
+DataFrames
**********
The :ref:`indexing ` docs.
@@ -311,7 +311,7 @@ The :ref:`multindexing ` docs.
df.columns = pd.MultiIndex.from_tuples([tuple(c.split("_")) for c in df.columns])
df
# Now stack & Reset
- df = df.stack(0, future_stack=True).reset_index(1)
+ df = df.stack(0).reset_index(1)
df
# And fix the labels (Notice the label 'level_1' got added automatically)
df.columns = ["Sample", "All_X", "All_Y"]
@@ -459,7 +459,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
df
# List the size of the animals with the highest weight.
- df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()], include_groups=False)
+ df.groupby("animal").apply(lambda subf: subf["size"][subf["weight"].idxmax()])
`Using get_group
`__
@@ -482,7 +482,7 @@ Unlike agg, apply's callable is passed a sub-DataFrame which gives you access to
return pd.Series(["L", avg_weight, True], index=["size", "weight", "adult"])
- expected_df = gb.apply(GrowUp, include_groups=False)
+ expected_df = gb.apply(GrowUp)
expected_df
`Expanding apply
@@ -688,7 +688,7 @@ The :ref:`Pivot ` docs.
aggfunc="sum",
margins=True,
)
- table.stack("City", future_stack=True)
+ table.stack("City")
`Frequency table like plyr in R
`__
@@ -874,7 +874,7 @@ Timeseries
`__
`Aggregation and plotting time series
-`__
+`__
Turn a matrix with hours in columns and days in rows into a continuous row sequence in the form of a time series.
`How to rearrange a Python pandas DataFrame?
@@ -914,7 +914,7 @@ Using TimeGrouper and another grouping to create subgroups, then apply a custom
`__
`Resample intraday frame without adding new days
-`__
+`__
`Resample minute data
`__
@@ -1043,7 +1043,7 @@ CSV
The :ref:`CSV ` docs
-`read_csv in action `__
+`read_csv in action `__
`appending to a csv
`__
@@ -1489,7 +1489,7 @@ of the data values:
)
df
-Constant series
+Constant Series
---------------
To assess if a series has a constant value, we can check if ``series.nunique() <= 1``.
diff --git a/doc/source/user_guide/copy_on_write.rst b/doc/source/user_guide/copy_on_write.rst
index 050c3901c3420..90353d9f49f00 100644
--- a/doc/source/user_guide/copy_on_write.rst
+++ b/doc/source/user_guide/copy_on_write.rst
@@ -8,16 +8,12 @@ Copy-on-Write (CoW)
.. note::
- Copy-on-Write will become the default in pandas 3.0. We recommend
- :ref:`turning it on now `
- to benefit from all improvements.
+ Copy-on-Write is now the default with pandas 3.0.
Copy-on-Write was first introduced in version 1.5.0. Starting from version 2.0 most of the
optimizations that become possible through CoW are implemented and supported. All possible
optimizations are supported starting from pandas 2.1.
-CoW will be enabled by default in version 3.0.
-
CoW will lead to more predictable behavior since it is not possible to update more than
one object with one statement, e.g. indexing operations or methods won't have side-effects. Additionally, through
delaying copies as long as possible, the average performance and memory usage will improve.
@@ -29,21 +25,25 @@ pandas indexing behavior is tricky to understand. Some operations return views w
other return copies. Depending on the result of the operation, mutating one object
might accidentally mutate another:
-.. ipython:: python
+.. code-block:: ipython
- df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
- subset = df["foo"]
- subset.iloc[0] = 100
- df
+ In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+ In [2]: subset = df["foo"]
+ In [3]: subset.iloc[0] = 100
+ In [4]: df
+ Out[4]:
+ foo bar
+ 0 100 4
+ 1 2 5
+ 2 3 6
-Mutating ``subset``, e.g. updating its values, also updates ``df``. The exact behavior is
+
+Mutating ``subset``, e.g. updating its values, also updated ``df``. The exact behavior was
hard to predict. Copy-on-Write solves the problem of accidentally modifying more than one object:
-it explicitly disallows this. With CoW enabled, ``df`` is unchanged:
+it explicitly disallows this. ``df`` is unchanged:
.. ipython:: python
- pd.options.mode.copy_on_write = True
-
df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
subset = df["foo"]
subset.iloc[0] = 100
@@ -57,13 +57,13 @@ applications.
Migrating to Copy-on-Write
--------------------------
-Copy-on-Write will be the default and only mode in pandas 3.0. This means that users
+Copy-on-Write is the default and only mode in pandas 3.0. This means that users
need to migrate their code to be compliant with CoW rules.
-The default mode in pandas will raise warnings for certain cases that will actively
+The default mode in pandas < 3.0 raises warnings for certain cases that will actively
change behavior and thus change user intended behavior.
-We added another mode, e.g.
+pandas 2.2 has a warning mode
.. code-block:: python
@@ -84,7 +84,6 @@ The following few items describe the user visible changes:
**Accessing the underlying array of a pandas object will return a read-only view**
-
.. ipython:: python
ser = pd.Series([1, 2, 3])
@@ -101,16 +100,21 @@ for more details.
**Only one pandas object is updated at once**
-The following code snippet updates both ``df`` and ``subset`` without CoW:
+The following code snippet updated both ``df`` and ``subset`` without CoW:
-.. ipython:: python
+.. code-block:: ipython
- df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
- subset = df["foo"]
- subset.iloc[0] = 100
- df
+ In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+ In [2]: subset = df["foo"]
+ In [3]: subset.iloc[0] = 100
+ In [4]: df
+ Out[4]:
+ foo bar
+ 0 100 4
+ 1 2 5
+ 2 3 6
-This won't be possible anymore with CoW, since the CoW rules explicitly forbid this.
+This is not possible anymore with CoW, since the CoW rules explicitly forbid this.
This includes updating a single column as a :class:`Series` and relying on the change
propagating back to the parent :class:`DataFrame`.
This statement can be rewritten into a single statement with ``loc`` or ``iloc`` if
@@ -146,7 +150,7 @@ A different alternative would be to not use ``inplace``:
**Constructors now copy NumPy arrays by default**
-The Series and DataFrame constructors will now copy NumPy array by default when not
+The Series and DataFrame constructors now copy NumPy arrays by default when not
otherwise specified. This was changed to avoid mutating a pandas object when the
NumPy array is changed inplace outside of pandas. You can set ``copy=False`` to
avoid this copy.
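+
+A minimal sketch of the defensive copy (array values are illustrative):
+
+.. code-block:: python
+
+   import numpy as np
+   import pandas as pd
+
+   arr = np.array([1, 2, 3])
+   ser = pd.Series(arr)  # the array is copied by default
+   arr[0] = 100
+   ser.iloc[0]           # still 1; the Series did not observe the mutation
+
+   # pd.Series(arr, copy=False) would share the buffer instead
+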
@@ -162,7 +166,7 @@ that shares data with another DataFrame or Series object inplace.
This avoids side-effects when modifying values and hence, most methods can avoid
actually copying the data and only trigger a copy when necessary.
-The following example will operate inplace with CoW:
+The following example will operate inplace:
.. ipython:: python
@@ -207,15 +211,17 @@ listed in :ref:`Copy-on-Write optimizations `.
Previously, when operating on views, both the view and the parent object were modified:
-.. ipython:: python
-
- with pd.option_context("mode.copy_on_write", False):
- df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
- view = df[:]
- df.iloc[0, 0] = 100
+.. code-block:: ipython
- df
- view
+   In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+   In [2]: view = df[:]
+   In [3]: df.iloc[0, 0] = 100
+   In [4]: df
+   Out[4]:
+      foo  bar
+   0  100    4
+   1    2    5
+   2    3    6
+   In [5]: view
+   Out[5]:
+      foo  bar
+   0  100    4
+   1    2    5
+   2    3    6
CoW triggers a copy when ``df`` is changed to avoid mutating ``view`` as well:
@@ -236,16 +242,19 @@ Chained Assignment
Chained assignment references a technique where an object is updated through
two subsequent indexing operations, e.g.
-.. ipython:: python
- :okwarning:
+.. code-block:: ipython
- with pd.option_context("mode.copy_on_write", False):
- df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
- df["foo"][df["bar"] > 5] = 100
- df
+ In [1]: df = pd.DataFrame({"foo": [1, 2, 3], "bar": [4, 5, 6]})
+ In [2]: df["foo"][df["bar"] > 5] = 100
+ In [3]: df
+   Out[3]:
+      foo  bar
+   0    1    4
+   1    2    5
+   2  100    6
-The column ``foo`` is updated where the column ``bar`` is greater than 5.
-This violates the CoW principles though, because it would have to modify the
+The column ``foo`` was updated where the column ``bar`` is greater than 5.
+This violated the CoW principles though, because it would have to modify the
view ``df["foo"]`` and ``df`` in one step. Hence, chained assignment will
consistently never work and raise a ``ChainedAssignmentError`` warning
with CoW enabled:
@@ -272,7 +281,6 @@ shares data with the initial DataFrame:
The array is a copy if the initial DataFrame consists of more than one array:
-
.. ipython:: python
df = pd.DataFrame({"a": [1, 2], "b": [1.5, 2.5]})
@@ -295,7 +303,7 @@ This array is read-only, which means that it can't be modified inplace:
The same holds true for a Series, since a Series always consists of a single array.
-There are two potential solution to this:
+There are two potential solutions to this:
- Trigger a copy manually if you want to avoid updating DataFrames that share memory with your array.
- Make the array writeable. This is a more performant solution but circumvents Copy-on-Write rules, so
@@ -317,7 +325,7 @@ you are modifying one object inplace.
.. ipython:: python
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
- df2 = df.reset_index()
+ df2 = df.reset_index(drop=True)
df2.iloc[0, 0] = 100
This creates two objects that share data and thus the setitem operation will trigger a
@@ -328,7 +336,7 @@ held by the object.
.. ipython:: python
df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
- df = df.reset_index()
+ df = df.reset_index(drop=True)
df.iloc[0, 0] = 100
No copy is necessary in this example.
@@ -347,22 +355,3 @@ and :meth:`DataFrame.rename`.
These methods return views when Copy-on-Write is enabled, which provides a significant
performance improvement compared to the regular execution.
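+
+A minimal sketch of the lazy-copy behavior (frame contents are illustrative):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   df = pd.DataFrame({"a": [1, 2, 3]})
+   df2 = df.rename(columns={"a": "b"})  # returns a view; nothing is copied yet
+   df2.iloc[0, 0] = 100                 # the copy is triggered lazily here
+   df                                   # the original is unchanged
+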
-
-.. _copy_on_write_enabling:
-
-How to enable CoW
------------------
-
-Copy-on-Write can be enabled through the configuration option ``copy_on_write``. The option can
-be turned on __globally__ through either of the following:
-
-.. ipython:: python
-
- pd.set_option("mode.copy_on_write", True)
-
- pd.options.mode.copy_on_write = True
-
-.. ipython:: python
- :suppress:
-
- pd.options.mode.copy_on_write = False
diff --git a/doc/source/user_guide/dsintro.rst b/doc/source/user_guide/dsintro.rst
index d1e981ee1bbdc..89981786d60b5 100644
--- a/doc/source/user_guide/dsintro.rst
+++ b/doc/source/user_guide/dsintro.rst
@@ -41,8 +41,8 @@ Here, ``data`` can be many different things:
* an ndarray
* a scalar value (like 5)
-The passed **index** is a list of axis labels. Thus, this separates into a few
-cases depending on what **data is**:
+The passed **index** is a list of axis labels. The constructor's behavior
+depends on the type of **data**:
**From ndarray**
@@ -87,8 +87,9 @@ index will be pulled out.
**From scalar value**
-If ``data`` is a scalar value, an index must be
-provided. The value will be repeated to match the length of **index**.
+If ``data`` is a scalar value, the value will be repeated to match
+the length of **index**. If the **index** is not provided, it defaults
+to ``RangeIndex(1)``.
.. ipython:: python
@@ -97,7 +98,7 @@ provided. The value will be repeated to match the length of **index**.
Series is ndarray-like
~~~~~~~~~~~~~~~~~~~~~~
-:class:`Series` acts very similarly to a ``ndarray`` and is a valid argument to most NumPy functions.
+:class:`Series` acts very similarly to a :class:`numpy.ndarray` and is a valid argument to most NumPy functions.
However, operations such as slicing will also slice the index.
.. ipython:: python
@@ -111,7 +112,7 @@ However, operations such as slicing will also slice the index.
.. note::
We will address array-based indexing like ``s.iloc[[4, 3, 1]]``
- in :ref:`section on indexing `.
+ in the :ref:`section on indexing `.
Like a NumPy array, a pandas :class:`Series` has a single :attr:`~Series.dtype`.
@@ -325,7 +326,7 @@ This case is handled identically to a dict of arrays.
.. ipython:: python
- data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "a10")])
+ data = np.zeros((2,), dtype=[("A", "i4"), ("B", "f4"), ("C", "S10")])
data[:] = [(1, 2.0, "Hello"), (2, 3.0, "World")]
pd.DataFrame(data)
diff --git a/doc/source/user_guide/enhancingperf.rst b/doc/source/user_guide/enhancingperf.rst
index 8c510173819e0..647b0f760f4d4 100644
--- a/doc/source/user_guide/enhancingperf.rst
+++ b/doc/source/user_guide/enhancingperf.rst
@@ -171,6 +171,7 @@ can be improved by passing an ``np.ndarray``.
In [4]: %%cython
...: cimport numpy as np
...: import numpy as np
+ ...: np.import_array()
...: cdef double f_typed(double x) except? -2:
...: return x * (x - 1)
...: cpdef double integrate_f_typed(double a, double b, int N):
@@ -225,6 +226,7 @@ and ``wraparound`` checks can yield more performance.
...: cimport cython
...: cimport numpy as np
...: import numpy as np
+ ...: np.import_array()
...: cdef np.float64_t f_typed(np.float64_t x) except? -2:
...: return x * (x - 1)
...: cpdef np.float64_t integrate_f_typed(np.float64_t a, np.float64_t b, np.int64_t N):
@@ -427,7 +429,7 @@ prefer that Numba throw an error if it cannot compile a function in a way that
speeds up your code, pass Numba the argument
``nopython=True`` (e.g. ``@jit(nopython=True)``). For more on
troubleshooting Numba modes, see the `Numba troubleshooting page
-`__.
+`__.
Using ``parallel=True`` (e.g. ``@jit(parallel=True)``) may result in a ``SIGABRT`` if the threading layer leads to unsafe
behavior. You can first `specify a safe threading layer `__
@@ -453,7 +455,7 @@ by evaluate arithmetic and boolean expression all at once for large :class:`~pan
:func:`~pandas.eval` is many orders of magnitude slower for
smaller expressions or objects than plain Python. A good rule of thumb is
to only use :func:`~pandas.eval` when you have a
- :class:`.DataFrame` with more than 10,000 rows.
+ :class:`~pandas.core.frame.DataFrame` with more than 10,000 rows.
Supported syntax
~~~~~~~~~~~~~~~~
diff --git a/doc/source/user_guide/gotchas.rst b/doc/source/user_guide/gotchas.rst
index 99c85ac66623d..e85eead4e0f09 100644
--- a/doc/source/user_guide/gotchas.rst
+++ b/doc/source/user_guide/gotchas.rst
@@ -121,7 +121,7 @@ Below is how to check if any of the values are ``True``:
if pd.Series([False, True, False]).any():
print("I am any")
-Bitwise boolean
+Bitwise Boolean
~~~~~~~~~~~~~~~
Bitwise boolean operators like ``==`` and ``!=`` return a boolean :class:`Series`
@@ -315,19 +315,8 @@ Why not make NumPy like R?
Many people have suggested that NumPy should simply emulate the ``NA`` support
present in the more domain-specific statistical programming language `R
-`__. Part of the reason is the NumPy type hierarchy:
-
-.. csv-table::
- :header: "Typeclass","Dtypes"
- :widths: 30,70
- :delim: |
-
- ``numpy.floating`` | ``float16, float32, float64, float128``
- ``numpy.integer`` | ``int8, int16, int32, int64``
- ``numpy.unsignedinteger`` | ``uint8, uint16, uint32, uint64``
- ``numpy.object_`` | ``object_``
- ``numpy.bool_`` | ``bool_``
- ``numpy.character`` | ``bytes_, str_``
+`__. Part of the reason is the
+`NumPy type hierarchy `__.
The R language, by contrast, only has a handful of built-in data types:
``integer``, ``numeric`` (floating-point), ``character``, and
@@ -383,5 +372,5 @@ constructors using something similar to the following:
s = pd.Series(newx)
See `the NumPy documentation on byte order
-`__ for more
+`__ for more
details.
diff --git a/doc/source/user_guide/groupby.rst b/doc/source/user_guide/groupby.rst
index 11863f8aead31..4ec34db6ed959 100644
--- a/doc/source/user_guide/groupby.rst
+++ b/doc/source/user_guide/groupby.rst
@@ -137,15 +137,6 @@ We could naturally group by either the ``A`` or ``B`` columns, or both:
``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
-If we also have a MultiIndex on columns ``A`` and ``B``, we can group by all
-the columns except the one we specify:
-
-.. ipython:: python
-
- df2 = df.set_index(["A", "B"])
- grouped = df2.groupby(level=df2.index.names.difference(["B"]))
- grouped.sum()
-
The above GroupBy will split the DataFrame on its index (rows). To split by columns, first do
a transpose:
@@ -247,7 +238,7 @@ GroupBy object attributes
~~~~~~~~~~~~~~~~~~~~~~~~~
The ``groups`` attribute is a dictionary whose keys are the computed unique groups
-and corresponding values are the axis labels belonging to each group. In the
+and corresponding values are the index labels belonging to each group. In the
above example we have:
.. ipython:: python
@@ -289,7 +280,7 @@ the number of groups, which is the same as the length of the ``groups`` dictiona
In [1]: gb. # noqa: E225, E999
gb.agg gb.boxplot gb.cummin gb.describe gb.filter gb.get_group gb.height gb.last gb.median gb.ngroups gb.plot gb.rank gb.std gb.transform
gb.aggregate gb.count gb.cumprod gb.dtype gb.first gb.groups gb.hist gb.max gb.min gb.nth gb.prod gb.resample gb.sum gb.var
- gb.apply gb.cummax gb.cumsum gb.fillna gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight
+ gb.apply gb.cummax gb.cumsum gb.gender gb.head gb.indices gb.mean gb.name gb.ohlc gb.quantile gb.size gb.tail gb.weight
.. _groupby.multiindex:
@@ -425,6 +416,12 @@ You can also include the grouping columns if you want to operate on them.
grouped[["A", "B"]].sum()
+.. note::
+
+ The ``groupby`` operation in pandas drops the ``name`` field of the columns' :class:`Index` object
+ after the operation. This change ensures consistency in syntax between different
+ column selection methods within groupby operations.
+
.. _groupby.iterating-label:
Iterating through groups
@@ -509,29 +506,28 @@ listed below, those with a ``*`` do *not* have an efficient, GroupBy-specific, i
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;
-
- :meth:`~.DataFrameGroupBy.any`;Compute whether any of the values in the groups are truthy
- :meth:`~.DataFrameGroupBy.all`;Compute whether all of the values in the groups are truthy
- :meth:`~.DataFrameGroupBy.count`;Compute the number of non-NA values in the groups
- :meth:`~.DataFrameGroupBy.cov` * ;Compute the covariance of the groups
- :meth:`~.DataFrameGroupBy.first`;Compute the first occurring value in each group
- :meth:`~.DataFrameGroupBy.idxmax`;Compute the index of the maximum value in each group
- :meth:`~.DataFrameGroupBy.idxmin`;Compute the index of the minimum value in each group
- :meth:`~.DataFrameGroupBy.last`;Compute the last occurring value in each group
- :meth:`~.DataFrameGroupBy.max`;Compute the maximum value in each group
- :meth:`~.DataFrameGroupBy.mean`;Compute the mean of each group
- :meth:`~.DataFrameGroupBy.median`;Compute the median of each group
- :meth:`~.DataFrameGroupBy.min`;Compute the minimum value in each group
- :meth:`~.DataFrameGroupBy.nunique`;Compute the number of unique values in each group
- :meth:`~.DataFrameGroupBy.prod`;Compute the product of the values in each group
- :meth:`~.DataFrameGroupBy.quantile`;Compute a given quantile of the values in each group
- :meth:`~.DataFrameGroupBy.sem`;Compute the standard error of the mean of the values in each group
- :meth:`~.DataFrameGroupBy.size`;Compute the number of values in each group
- :meth:`~.DataFrameGroupBy.skew` *;Compute the skew of the values in each group
- :meth:`~.DataFrameGroupBy.std`;Compute the standard deviation of the values in each group
- :meth:`~.DataFrameGroupBy.sum`;Compute the sum of the values in each group
- :meth:`~.DataFrameGroupBy.var`;Compute the variance of the values in each group
+
+ :meth:`~.DataFrameGroupBy.any`,Compute whether any of the values in the groups are truthy
+ :meth:`~.DataFrameGroupBy.all`,Compute whether all of the values in the groups are truthy
+ :meth:`~.DataFrameGroupBy.count`,Compute the number of non-NA values in the groups
+ :meth:`~.DataFrameGroupBy.cov` * ,Compute the covariance of the groups
+ :meth:`~.DataFrameGroupBy.first`,Compute the first occurring value in each group
+ :meth:`~.DataFrameGroupBy.idxmax`,Compute the index of the maximum value in each group
+ :meth:`~.DataFrameGroupBy.idxmin`,Compute the index of the minimum value in each group
+ :meth:`~.DataFrameGroupBy.last`,Compute the last occurring value in each group
+ :meth:`~.DataFrameGroupBy.max`,Compute the maximum value in each group
+ :meth:`~.DataFrameGroupBy.mean`,Compute the mean of each group
+ :meth:`~.DataFrameGroupBy.median`,Compute the median of each group
+ :meth:`~.DataFrameGroupBy.min`,Compute the minimum value in each group
+ :meth:`~.DataFrameGroupBy.nunique`,Compute the number of unique values in each group
+ :meth:`~.DataFrameGroupBy.prod`,Compute the product of the values in each group
+ :meth:`~.DataFrameGroupBy.quantile`,Compute a given quantile of the values in each group
+ :meth:`~.DataFrameGroupBy.sem`,Compute the standard error of the mean of the values in each group
+ :meth:`~.DataFrameGroupBy.size`,Compute the number of values in each group
+ :meth:`~.DataFrameGroupBy.skew` * ,Compute the skew of the values in each group
+ :meth:`~.DataFrameGroupBy.std`,Compute the standard deviation of the values in each group
+ :meth:`~.DataFrameGroupBy.sum`,Compute the sum of the values in each group
+ :meth:`~.DataFrameGroupBy.var`,Compute the variance of the values in each group
Some examples:
@@ -622,7 +618,7 @@ this will make an extra copy.
.. _groupby.aggregate.udf:
-Aggregation with User-Defined Functions
+Aggregation with user-defined functions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Users can also provide their own User-Defined Functions (UDFs) for custom aggregations.
@@ -672,8 +668,9 @@ column, which produces an aggregated result with a hierarchical column index:
grouped[["C", "D"]].agg(["sum", "mean", "std"])
-The resulting aggregations are named after the functions themselves. If you
-need to rename, then you can add in a chained operation for a ``Series`` like this:
+The resulting aggregations are named after the functions themselves.
+
+For a ``Series``, if you need to rename, you can add in a chained operation like this:
.. ipython:: python
@@ -683,8 +680,19 @@ need to rename, then you can add in a chained operation for a ``Series`` like th
.rename(columns={"sum": "foo", "mean": "bar", "std": "baz"})
)
+Or, you can simply pass a list of tuples each with the name of the new column and the aggregate function:
+
+.. ipython:: python
+
+ (
+ grouped["C"]
+ .agg([("foo", "sum"), ("bar", "mean"), ("baz", "std")])
+ )
+
For a grouped ``DataFrame``, you can rename in a similar manner:
+By chaining the ``rename`` operation,
+
.. ipython:: python
(
@@ -693,6 +701,16 @@ For a grouped ``DataFrame``, you can rename in a similar manner:
)
)
+Or, passing a list of tuples,
+
+.. ipython:: python
+
+ (
+ grouped[["C", "D"]].agg(
+ [("foo", "sum"), ("bar", "mean"), ("baz", "std")]
+ )
+ )
+
.. note::
In general, the output column names should be unique, but pandas will allow
@@ -835,19 +853,18 @@ The following methods on GroupBy act as transformations.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;
-
- :meth:`~.DataFrameGroupBy.bfill`;Back fill NA values within each group
- :meth:`~.DataFrameGroupBy.cumcount`;Compute the cumulative count within each group
- :meth:`~.DataFrameGroupBy.cummax`;Compute the cumulative max within each group
- :meth:`~.DataFrameGroupBy.cummin`;Compute the cumulative min within each group
- :meth:`~.DataFrameGroupBy.cumprod`;Compute the cumulative product within each group
- :meth:`~.DataFrameGroupBy.cumsum`;Compute the cumulative sum within each group
- :meth:`~.DataFrameGroupBy.diff`;Compute the difference between adjacent values within each group
- :meth:`~.DataFrameGroupBy.ffill`;Forward fill NA values within each group
- :meth:`~.DataFrameGroupBy.pct_change`;Compute the percent change between adjacent values within each group
- :meth:`~.DataFrameGroupBy.rank`;Compute the rank of each value within each group
- :meth:`~.DataFrameGroupBy.shift`;Shift values up or down within each group
+
+ :meth:`~.DataFrameGroupBy.bfill`,Back fill NA values within each group
+ :meth:`~.DataFrameGroupBy.cumcount`,Compute the cumulative count within each group
+ :meth:`~.DataFrameGroupBy.cummax`,Compute the cumulative max within each group
+ :meth:`~.DataFrameGroupBy.cummin`,Compute the cumulative min within each group
+ :meth:`~.DataFrameGroupBy.cumprod`,Compute the cumulative product within each group
+ :meth:`~.DataFrameGroupBy.cumsum`,Compute the cumulative sum within each group
+ :meth:`~.DataFrameGroupBy.diff`,Compute the difference between adjacent values within each group
+ :meth:`~.DataFrameGroupBy.ffill`,Forward fill NA values within each group
+ :meth:`~.DataFrameGroupBy.pct_change`,Compute the percent change between adjacent values within each group
+ :meth:`~.DataFrameGroupBy.rank`,Compute the rank of each value within each group
+ :meth:`~.DataFrameGroupBy.shift`,Shift values up or down within each group
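+
+A compact sketch of two of these transformations (data is illustrative):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   df = pd.DataFrame({"key": ["a", "a", "b"], "val": [1, 2, 3]})
+   df.groupby("key")["val"].cumsum()  # cumulative sum within each group
+   df.groupby("key")["val"].shift(1)  # shift values down within each group
+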
In addition, passing any built-in aggregation method as a string to
:meth:`~.DataFrameGroupBy.transform` (see the next section) will broadcast the result
@@ -1057,7 +1074,7 @@ missing values with the ``ffill()`` method.
).set_index("date")
df_re
- df_re.groupby("group").resample("1D", include_groups=False).ffill()
+ df_re.groupby("group").resample("1D").ffill()
.. _groupby.filter:
@@ -1095,11 +1112,10 @@ efficient, GroupBy-specific, implementation.
.. csv-table::
:header: "Method", "Description"
:widths: 20, 80
- :delim: ;
- :meth:`~.DataFrameGroupBy.head`;Select the top row(s) of each group
- :meth:`~.DataFrameGroupBy.nth`;Select the nth row(s) of each group
- :meth:`~.DataFrameGroupBy.tail`;Select the bottom row(s) of each group
+ :meth:`~.DataFrameGroupBy.head`,Select the top row(s) of each group
+ :meth:`~.DataFrameGroupBy.nth`,Select the nth row(s) of each group
+ :meth:`~.DataFrameGroupBy.tail`,Select the bottom row(s) of each group
Users can also use transformations along with Boolean indexing to construct complex
filtrations within groups. For example, suppose we are given groups of products and
@@ -1236,16 +1252,16 @@ the argument ``group_keys`` which defaults to ``True``. Compare
.. ipython:: python
- df.groupby("A", group_keys=True).apply(lambda x: x, include_groups=False)
+ df.groupby("A", group_keys=True).apply(lambda x: x)
with
.. ipython:: python
- df.groupby("A", group_keys=False).apply(lambda x: x, include_groups=False)
+ df.groupby("A", group_keys=False).apply(lambda x: x)
-Numba Accelerated Routines
+Numba accelerated routines
--------------------------
.. versionadded:: 1.1
@@ -1680,7 +1696,7 @@ introduction ` and the
dfg.groupby(["A", [0, 0, 0, 1, 1]]).ngroup()
-Groupby by indexer to 'resample' data
+GroupBy by indexer to 'resample' data
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Resampling produces new hypothetical samples (resamples) from already existing observed data or from a model that generates data. These new samples are similar to the pre-existing samples.
@@ -1726,8 +1742,8 @@ column index name will be used as the name of the inserted column:
result = {"b_sum": x["b"].sum(), "c_mean": x["c"].mean()}
return pd.Series(result, name="metrics")
- result = df.groupby("a").apply(compute_metrics, include_groups=False)
+ result = df.groupby("a").apply(compute_metrics)
result
- result.stack(future_stack=True)
+ result.stack()
diff --git a/doc/source/user_guide/index.rst b/doc/source/user_guide/index.rst
index f0d6a76f0de5b..230b2b86b2ffd 100644
--- a/doc/source/user_guide/index.rst
+++ b/doc/source/user_guide/index.rst
@@ -78,6 +78,7 @@ Guides
boolean
visualization
style
+ user_defined_functions
groupby
window
timeseries
diff --git a/doc/source/user_guide/indexing.rst b/doc/source/user_guide/indexing.rst
index 4954ee1538697..ed5c7806b2e23 100644
--- a/doc/source/user_guide/indexing.rst
+++ b/doc/source/user_guide/indexing.rst
@@ -29,13 +29,6 @@ this area.
production code, we recommended that you take advantage of the optimized
pandas data access methods exposed in this chapter.
-.. warning::
-
- Whether a copy or a reference is returned for a setting operation, may
- depend on the context. This is sometimes called ``chained assignment`` and
- should be avoided. See :ref:`Returning a View versus Copy
- `.
-
See the :ref:`MultiIndex / Advanced Indexing ` for ``MultiIndex`` and more advanced indexing documentation.
See the :ref:`cookbook` for some advanced strategies.
@@ -101,13 +94,14 @@ well). Any of the axes accessors may be the null slice ``:``. Axes left out of
the specification are assumed to be ``:``, e.g. ``p.loc['a']`` is equivalent to
``p.loc['a', :]``.
-.. csv-table::
- :header: "Object Type", "Indexers"
- :widths: 30, 50
- :delim: ;
- Series; ``s.loc[indexer]``
- DataFrame; ``df.loc[row_indexer,column_indexer]``
+.. ipython:: python
+
+ ser = pd.Series(range(5), index=list("abcde"))
+ ser.loc[["a", "c", "e"]]
+
+ df = pd.DataFrame(np.arange(25).reshape(5, 5), index=list("abcde"), columns=list("abcde"))
+ df.loc[["a", "c", "e"], ["b", "d"]]
.. _indexing.basics:
@@ -123,10 +117,9 @@ indexing pandas objects with ``[]``:
.. csv-table::
:header: "Object Type", "Selection", "Return Value Type"
:widths: 30, 30, 60
- :delim: ;
- Series; ``series[label]``; scalar value
- DataFrame; ``frame[colname]``; ``Series`` corresponding to colname
+ Series, ``series[label]``, scalar value
+ DataFrame, ``frame[colname]``, ``Series`` corresponding to colname
Here we construct a simple time series data set to use for illustrating the
indexing functionality:
@@ -269,6 +262,10 @@ The most robust and consistent way of slicing ranges along arbitrary axes is
described in the :ref:`Selection by Position ` section
detailing the ``.iloc`` method. For now, we explain the semantics of slicing using the ``[]`` operator.
+ .. note::
+
+ When the :class:`Series` has float indices, slicing will select by position.
+
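+A tiny sketch of that caveat (index values are illustrative):
+
+.. code-block:: python
+
+   import pandas as pd
+
+   s = pd.Series([10, 20, 30], index=[1.5, 2.5, 3.5])
+   s[1:3]  # positional: selects the 2nd and 3rd rows, not labels 1 through 3
+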
With Series, the syntax works exactly as with an ndarray, returning a slice of
the values and the corresponding labels:
@@ -299,12 +296,6 @@ largely as a convenience since it is such a common operation.
Selection by label
------------------
-.. warning::
-
- Whether a copy or a reference is returned for a setting operation, may depend on the context.
- This is sometimes called ``chained assignment`` and should be avoided.
- See :ref:`Returning a View versus Copy `.
-
.. warning::
``.loc`` is strict when you present slicers that are not compatible (or convertible) with the index type. For example
@@ -412,9 +403,9 @@ are returned:
s = pd.Series(list('abcde'), index=[0, 3, 2, 5, 4])
s.loc[3:5]
-If at least one of the two is absent, but the index is sorted, and can be
-compared against start and stop labels, then slicing will still work as
-expected, by selecting labels which *rank* between the two:
+If the index is sorted, and can be compared against start and stop labels,
+then slicing will still work as expected, by selecting labels which *rank*
+between the two:
.. ipython:: python
@@ -445,12 +436,6 @@ For more information about duplicate labels, see
Selection by position
---------------------
-.. warning::
-
- Whether a copy or a reference is returned for a setting operation, may depend on the context.
- This is sometimes called ``chained assignment`` and should be avoided.
- See :ref:`Returning a View versus Copy <indexing.view_versus_copy>`.
-
pandas provides a suite of methods in order to get **purely integer based indexing**. The semantics follow closely Python and NumPy slicing. These are ``0-based`` indexing. When slicing, the start bound is *included*, while the upper bound is *excluded*. Trying to use a non-integer, even a **valid** label will raise an ``IndexError``.
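+
+As a brief illustrative sketch of these semantics (``ser`` is hypothetical
+data, not from the surrounding text):
+
+.. ipython:: python
+
+   ser = pd.Series(list("abcde"))
+   ser.iloc[1:3]  # 0-based and end-exclusive: positions 1 and 2 only
+   # ser.iloc["c"] would raise -- even a valid label is rejected by .iloc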
The ``.iloc`` attribute is the primary access method. The following are valid inputs:
@@ -873,9 +858,10 @@ and :ref:`Advanced Indexing ` you may select along more than one axis
.. warning::
- ``iloc`` supports two kinds of boolean indexing. If the indexer is a boolean ``Series``,
- an error will be raised. For instance, in the following example, ``df.iloc[s.values, 1]`` is ok.
- The boolean indexer is an array. But ``df.iloc[s, 1]`` would raise ``ValueError``.
+ While ``loc`` supports two kinds of boolean indexing, ``iloc`` only supports indexing with a
+ boolean array. If the indexer is a boolean ``Series``, an error will be raised. For instance,
+ in the following example, ``df.iloc[s.values, 1]`` is ok because the boolean indexer is an
+ array, but ``df.iloc[s, 1]`` would raise a ``ValueError``.
.. ipython:: python
@@ -967,7 +953,7 @@ To select a row where each column meets its own criterion:
values = {'ids': ['a', 'b'], 'ids2': ['a', 'c'], 'vals': [1, 3]}
- row_mask = df.isin(values).all(1)
+ row_mask = df.isin(values).all(axis=1)
df[row_mask]
@@ -1722,234 +1708,10 @@ You can assign a custom index to the ``index`` attribute:
df_idx.index = pd.Index([10, 20, 30, 40], name="a")
df_idx
-.. _indexing.view_versus_copy:
-
-Returning a view versus a copy
-------------------------------
-
-.. warning::
-
- :ref:`Copy-on-Write <copy_on_write>`
- will become the new default in pandas 3.0. This means that chained indexing will
- never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
- anymore.
- See :ref:`this section <copy_on_write_chained_assignment>`
- for more context.
- We recommend turning Copy-on-Write on to leverage the improvements with
-
- ```
- pd.options.mode.copy_on_write = True
- ```
-
- even before pandas 3.0 is available.
-
-When setting values in a pandas object, care must be taken to avoid what is called
-``chained indexing``. Here is an example.
-
-.. ipython:: python
-
- dfmi = pd.DataFrame([list('abcd'),
- list('efgh'),
- list('ijkl'),
- list('mnop')],
- columns=pd.MultiIndex.from_product([['one', 'two'],
- ['first', 'second']]))
- dfmi
-
-Compare these two access methods:
-
-.. ipython:: python
-
- dfmi['one']['second']
-
-.. ipython:: python
-
- dfmi.loc[:, ('one', 'second')]
-
-These both yield the same results, so which should you use? It is instructive to understand the order
-of operations on these and why method 2 (``.loc``) is much preferred over method 1 (chained ``[]``).
-
-``dfmi['one']`` selects the first level of the columns and returns a DataFrame that is singly-indexed.
-Then another Python operation ``dfmi_with_one['second']`` selects the series indexed by ``'second'``.
-This is indicated by the variable ``dfmi_with_one`` because pandas sees these operations as separate events.
-e.g. separate calls to ``__getitem__``, so it has to treat them as linear operations, they happen one after another.
-
-Contrast this to ``df.loc[:,('one','second')]`` which passes a nested tuple of ``(slice(None),('one','second'))`` to a single call to
-``__getitem__``. This allows pandas to deal with this as a single entity. Furthermore this order of operations *can* be significantly
-faster, and allows one to index *both* axes if so desired.
-
Why does assignment fail when using chained indexing?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. warning::
-
- :ref:`Copy-on-Write <copy_on_write>`
- will become the new default in pandas 3.0. This means that chained indexing will
- never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
- anymore.
- See :ref:`this section <copy_on_write_chained_assignment>`
- for more context.
- We recommend turning Copy-on-Write on to leverage the improvements with
-
- ```
- pd.options.mode.copy_on_write = True
- ```
-
- even before pandas 3.0 is available.
-
-The problem in the previous section is just a performance issue. What's up with
-the ``SettingWithCopy`` warning? We don't **usually** throw warnings around when
-you do something that might cost a few extra milliseconds!
-
-But it turns out that assigning to the product of chained indexing has
-inherently unpredictable results. To see this, think about how the Python
-interpreter executes this code:
-
-.. code-block:: python
-
- dfmi.loc[:, ('one', 'second')] = value
- # becomes
- dfmi.loc.__setitem__((slice(None), ('one', 'second')), value)
-
-But this code is handled differently:
-
-.. code-block:: python
-
- dfmi['one']['second'] = value
- # becomes
- dfmi.__getitem__('one').__setitem__('second', value)
-
-See that ``__getitem__`` in there? Outside of simple cases, it's very hard to
-predict whether it will return a view or a copy (it depends on the memory layout
-of the array, about which pandas makes no guarantees), and therefore whether
-the ``__setitem__`` will modify ``dfmi`` or a temporary object that gets thrown
-out immediately afterward. **That's** what ``SettingWithCopy`` is warning you
-about!
-
-.. note:: You may be wondering whether we should be concerned about the ``loc``
- property in the first example. But ``dfmi.loc`` is guaranteed to be ``dfmi``
- itself with modified indexing behavior, so ``dfmi.loc.__getitem__`` /
- ``dfmi.loc.__setitem__`` operate on ``dfmi`` directly. Of course,
- ``dfmi.loc.__getitem__(idx)`` may be a view or a copy of ``dfmi``.
-
-Sometimes a ``SettingWithCopy`` warning will arise at times when there's no
-obvious chained indexing going on. **These** are the bugs that
-``SettingWithCopy`` is designed to catch! pandas is probably trying to warn you
-that you've done this:
-
-.. code-block:: python
-
- def do_something(df):
- foo = df[['bar', 'baz']] # Is foo a view? A copy? Nobody knows!
- # ... many lines here ...
- # We don't know whether this will modify df or not!
- foo['quux'] = value
- return foo
-
-Yikes!
-
-.. _indexing.evaluation_order:
-
-Evaluation order matters
-~~~~~~~~~~~~~~~~~~~~~~~~
-
-.. warning::
-
- :ref:`Copy-on-Write <copy_on_write>`
- will become the new default in pandas 3.0. This means that chained indexing will
- never work. As a consequence, the ``SettingWithCopyWarning`` won't be necessary
- anymore.
- See :ref:`this section <copy_on_write_chained_assignment>`
- for more context.
- We recommend turning Copy-on-Write on to leverage the improvements with
-
- ```
- pd.options.mode.copy_on_write = True
- ```
-
- even before pandas 3.0 is available.
-
-When you use chained indexing, the order and type of the indexing operation
-partially determine whether the result is a slice into the original object, or
-a copy of the slice.
-
-pandas has the ``SettingWithCopyWarning`` because assigning to a copy of a
-slice is frequently not intentional, but a mistake caused by chained indexing
-returning a copy where a slice was expected.
-
-If you would like pandas to be more or less trusting about assignment to a
-chained indexing expression, you can set the :ref:`option <options>`
-``mode.chained_assignment`` to one of these values:
-
-* ``'warn'``, the default, means a ``SettingWithCopyWarning`` is printed.
-* ``'raise'`` means pandas will raise a ``SettingWithCopyError``
- you have to deal with.
-* ``None`` will suppress the warnings entirely.
-
-.. ipython:: python
- :okwarning:
-
- dfb = pd.DataFrame({'a': ['one', 'one', 'two',
- 'three', 'two', 'one', 'six'],
- 'c': np.arange(7)})
-
- # This will show the SettingWithCopyWarning
- # but the frame values will be set
- dfb['c'][dfb['a'].str.startswith('o')] = 42
-
-This however is operating on a copy and will not work.
-
-.. ipython:: python
- :okwarning:
- :okexcept:
-
- with pd.option_context('mode.chained_assignment','warn'):
- dfb[dfb['a'].str.startswith('o')]['c'] = 42
-
-A chained assignment can also crop up in setting in a mixed dtype frame.
-
-.. note::
-
- These setting rules apply to all of ``.loc/.iloc``.
-
-The following is the recommended access method using ``.loc`` for multiple items (using ``mask``) and a single item using a fixed index:
-
-.. ipython:: python
-
- dfc = pd.DataFrame({'a': ['one', 'one', 'two',
- 'three', 'two', 'one', 'six'],
- 'c': np.arange(7)})
- dfd = dfc.copy()
- # Setting multiple items using a mask
- mask = dfd['a'].str.startswith('o')
- dfd.loc[mask, 'c'] = 42
- dfd
-
- # Setting a single item
- dfd = dfc.copy()
- dfd.loc[2, 'a'] = 11
- dfd
-
-The following *can* work at times, but it is not guaranteed to, and therefore should be avoided:
-
-.. ipython:: python
- :okwarning:
-
- dfd = dfc.copy()
- dfd['a'][2] = 111
- dfd
-
-Last, the subsequent example will **not** work at all, and so should be avoided:
-
-.. ipython:: python
- :okwarning:
- :okexcept:
-
- with pd.option_context('mode.chained_assignment','raise'):
- dfd.loc[0]['a'] = 1111
-
-.. warning::
-
- The chained assignment warnings / exceptions are aiming to inform the user of a possibly invalid
- assignment. There may be false positives; situations where a chained assignment is inadvertently
- reported.
+:ref:`Copy-on-Write <copy_on_write>` is the new default with pandas 3.0.
+This means that chained indexing will never work.
+See :ref:`this section <copy_on_write_chained_assignment>`
+for more context.
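+
+As a minimal sketch (the frame below is hypothetical, not part of the original
+text), the failing chained pattern and the single-step alternative look like this:
+
+.. code-block:: python
+
+   df = pd.DataFrame({"a": [1, 2, 3], "b": [4, 5, 6]})
+
+   # Chained: the inner ``df["a"]`` returns a new object under Copy-on-Write,
+   # so the assignment never reaches ``df``.
+   df["a"][0] = 100
+
+   # Instead, write in a single ``.loc`` call:
+   df.loc[0, "a"] = 100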
diff --git a/doc/source/user_guide/integer_na.rst b/doc/source/user_guide/integer_na.rst
index 1a727cd78af09..8d35d1583d3bd 100644
--- a/doc/source/user_guide/integer_na.rst
+++ b/doc/source/user_guide/integer_na.rst
@@ -84,6 +84,19 @@ with the dtype.
In the future, we may provide an option for :class:`Series` to infer a
nullable-integer dtype.
+If you create a column of ``NA`` values (for example, to fill them in later)
+with ``df['new_col'] = pd.NA``, the new column's ``dtype`` is set to ``object``.
+Performance on such a column will be worse than with an appropriately typed one,
+so it's better to use
+``df['new_col'] = pd.Series(pd.NA, dtype="Int64")``
+(or another ``dtype`` that supports ``NA``).
+
+.. ipython:: python
+
+ df = pd.DataFrame()
+ df['objects'] = pd.NA
+ df.dtypes
+
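+For contrast, a short sketch of the recommended pattern (``'ints'`` is a
+hypothetical column name) keeps a nullable integer dtype:
+
+.. ipython:: python
+
+   df['ints'] = pd.Series(pd.NA, dtype="Int64")
+   df.dtypes
+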
Operations
----------
@@ -134,7 +147,7 @@ Reduction and groupby operations such as :meth:`~DataFrame.sum` work as well.
df.sum()
df.groupby("B").A.sum()
-Scalar NA Value
+Scalar NA value
---------------
:class:`arrays.IntegerArray` uses :attr:`pandas.NA` as its scalar
diff --git a/doc/source/user_guide/io.rst b/doc/source/user_guide/io.rst
index 6148086452d54..2a7cab701eecf 100644
--- a/doc/source/user_guide/io.rst
+++ b/doc/source/user_guide/io.rst
@@ -16,27 +16,26 @@ The pandas I/O API is a set of top level ``reader`` functions accessed like
.. csv-table::
:header: "Format Type", "Data Description", "Reader", "Writer"
:widths: 30, 100, 60, 60
- :delim: ;
-
- text;`CSV `__;:ref:`read_csv`;:ref:`to_csv`
- text;Fixed-Width Text File;:ref:`read_fwf`
- text;`JSON `__;:ref:`read_json`;:ref:`to_json`
- text;`HTML `__;:ref:`read_html`;:ref:`to_html`
- text;`LaTeX `__;;:ref:`Styler.to_latex`
- text;`XML `__;:ref:`read_xml`;:ref:`to_xml`
- text; Local clipboard;:ref:`read_clipboard`;:ref:`to_clipboard`
- binary;`MS Excel `__;:ref:`read_excel`;:ref:`to_excel`
- binary;`OpenDocument `__;:ref:`read_excel`;
- binary;`HDF5 Format `__;:ref:`read_hdf`;:ref:`to_hdf`
- binary;`Feather Format `__;:ref:`read_feather`;:ref:`to_feather`
- binary;`Parquet Format `__;:ref:`read_parquet`;:ref:`to_parquet`
- binary;`ORC Format