Merge remote-tracking branch 'upstream/master' into io-parquet-multiindex · pandas-dev/pandas@c974259 · GitHub

Commit c974259

Merge remote-tracking branch 'upstream/master' into io-parquet-multiindex
2 parents a46e46f + 2067d7e commit c974259

36 files changed: +230 −171 lines changed

doc/source/user_guide/groupby.rst

Lines changed: 3 additions & 5 deletions
@@ -87,11 +87,9 @@ The mapping can be specified many different ways:
 * A Python function, to be called on each of the axis labels.
 * A list or NumPy array of the same length as the selected axis.
 * A dict or ``Series``, providing a ``label -> group name`` mapping.
-* For ``DataFrame`` objects, a string indicating a column to be used to group.
-  Of course ``df.groupby('A')`` is just syntactic sugar for
-  ``df.groupby(df['A'])``, but it makes life simpler.
-* For ``DataFrame`` objects, a string indicating an index level to be used to
-  group.
+* For ``DataFrame`` objects, a string indicating either a column name or
+  an index level name to be used to group.
+* ``df.groupby('A')`` is just syntactic sugar for ``df.groupby(df['A'])``.
 * A list of any of the above things.

 Collectively we refer to the grouping objects as the **keys**. For example,
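
As a quick illustration of the behavior this doc change describes, both a column name and an index level name can be passed straight to ``groupby`` (the frame below is made up for the example):

    import pandas as pd

    df = pd.DataFrame(
        {"A": ["foo", "bar", "foo", "bar"], "B": [1, 2, 3, 4]}
    ).set_index(pd.Index(["x", "x", "y", "y"], name="lvl"))

    # Group by a column name -- syntactic sugar for df.groupby(df["A"])
    print(df.groupby("A")["B"].sum())

    # Group by an index level name
    print(df.groupby("lvl")["B"].sum())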

doc/source/user_guide/indexing.rst

Lines changed: 12 additions & 7 deletions
@@ -313,8 +313,10 @@ Selection by label
 
 .. warning::
 
-   Starting in 0.21.0, pandas will show a ``FutureWarning`` if indexing with a list with missing labels. In the future
-   this will raise a ``KeyError``. See :ref:`list-like Using loc with missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>`.
+   .. versionchanged:: 1.0.0
+
+   Pandas will raise a ``KeyError`` if indexing with a list with missing labels. See :ref:`list-like Using loc with
+   missing keys in a list is Deprecated <indexing.deprecate_loc_reindex_listlike>`.
 
 pandas provides a suite of methods in order to have **purely label based indexing**. This is a strict inclusion based protocol.
 Every label asked for must be in the index, or a ``KeyError`` will be raised.
@@ -578,8 +580,9 @@ IX indexer is deprecated
 
 .. warning::
 
-   Starting in 0.20.0, the ``.ix`` indexer is deprecated, in favor of the more strict ``.iloc``
-   and ``.loc`` indexers.
+   .. versionchanged:: 1.0.0
+
+   The ``.ix`` indexer was removed, in favor of the more strict ``.iloc`` and ``.loc`` indexers.
 
 ``.ix`` offers a lot of magic on the inference of what the user wants to do. To wit, ``.ix`` can decide
 to index *positionally* OR via *labels* depending on the data type of the index. This has caused quite a
@@ -636,11 +639,13 @@ Indexing with list with missing labels is deprecated
 
 .. warning::
 
-   Starting in 0.21.0, using ``.loc`` or ``[]`` with a list with one or more missing labels, is deprecated, in favor of ``.reindex``.
+   .. versionchanged:: 1.0.0
+
+   Using ``.loc`` or ``[]`` with a list with one or more missing labels will no longer reindex, in favor of ``.reindex``.
 
 In prior versions, using ``.loc[list-of-labels]`` would work as long as *at least 1* of the keys was found (otherwise it
-would raise a ``KeyError``). This behavior is deprecated and will show a warning message pointing to this section. The
-recommended alternative is to use ``.reindex()``.
+would raise a ``KeyError``). This behavior was changed and will now raise a ``KeyError`` if at least one label is missing.
+The recommended alternative is to use ``.reindex()``.
 
 For example.
 
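
For context on the behavior change documented above, a small sketch of what 1.0-style ``.loc`` does with missing labels versus the recommended ``.reindex`` (the data is made up):

    import pandas as pd

    s = pd.Series([1, 2, 3], index=["a", "b", "c"])

    try:
        s.loc[["a", "d"]]          # "d" is missing -> KeyError on pandas >= 1.0
    except KeyError as err:
        print("KeyError:", err)

    print(s.reindex(["a", "d"]))   # missing labels become NaN instead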

doc/source/user_guide/io.rst

Lines changed: 3 additions & 10 deletions
@@ -3024,19 +3024,12 @@ It is often the case that users will insert columns to do temporary computations
 in Excel and you may not want to read in those columns. ``read_excel`` takes
 a ``usecols`` keyword to allow you to specify a subset of columns to parse.
 
-.. deprecated:: 0.24.0
+.. versionchanged:: 1.0.0
 
-   Passing in an integer for ``usecols`` has been deprecated. Please pass in a list
+   Passing in an integer for ``usecols`` will no longer work. Please pass in a list
    of ints from 0 to ``usecols`` inclusive instead.
 
-If ``usecols`` is an integer, then it is assumed to indicate the last column
-to be parsed.
-
-.. code-block:: python
-
-   pd.read_excel('path_to_file.xls', 'Sheet1', usecols=2)
-
-You can also specify a comma-delimited set of Excel columns and ranges as a string:
+You can specify a comma-delimited set of Excel columns and ranges as a string:
 
 .. code-block:: python
 
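
A short sketch of the ``usecols`` forms that remain supported; the file path and sheet name are placeholders carried over from the docs, not real files:

    import pandas as pd

    # A list of column positions ...
    pd.read_excel("path_to_file.xls", "Sheet1", usecols=[0, 1, 2])

    # ... or a comma-delimited string of Excel columns and ranges.
    pd.read_excel("path_to_file.xls", "Sheet1", usecols="A:C,E")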

doc/source/user_guide/timeseries.rst

Lines changed: 4 additions & 4 deletions
@@ -327,11 +327,11 @@ which can be specified. These are computed from the starting point specified by
 that was discussed :ref:`above<timeseries.converting.format>`). The
 available units are listed on the documentation for :func:`pandas.to_datetime`.
 
+.. versionchanged:: 1.0.0
+
 Constructing a :class:`Timestamp` or :class:`DatetimeIndex` with an epoch timestamp
-with the ``tz`` argument specified will currently localize the epoch timestamps to UTC
-first then convert the result to the specified time zone. However, this behavior
-is :ref:`deprecated <whatsnew_0240.deprecations.integer_tz>`, and if you have
-epochs in wall time in another timezone, it is recommended to read the epochs
+with the ``tz`` argument specified will raise a ValueError. If you have
+epochs in wall time in another timezone, you can read the epochs
 as timezone-naive timestamps and then localize to the appropriate timezone:
 
 .. ipython:: python
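
The workaround the updated text recommends looks roughly like this on pandas >= 1.0 (the epoch values are arbitrary):

    import pandas as pd

    epochs = [1349720105, 1349806505]

    # Passing an epoch plus tz= to Timestamp/DatetimeIndex now raises ValueError;
    # instead, parse as naive timestamps and localize afterwards:
    stamps = pd.to_datetime(epochs, unit="s").tz_localize("US/Eastern")
    print(stamps)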

doc/source/whatsnew/v1.2.0.rst

Lines changed: 2 additions & 2 deletions
@@ -100,8 +100,8 @@ For example:
 
 Other enhancements
 ^^^^^^^^^^^^^^^^^^
-
 - Added :meth:`~DataFrame.set_flags` for setting table-wide flags on a ``Series`` or ``DataFrame`` (:issue:`28394`)
+- :meth:`DataFrame.applymap` now supports ``na_action`` (:issue:`23803`)
 - :class:`Index` with object dtype supports division and multiplication (:issue:`34160`)
 - :meth:`DataFrame.explode` and :meth:`Series.explode` now support exploding of sets (:issue:`35614`)
 -
@@ -335,7 +335,7 @@ Sparse
 ExtensionArray
 ^^^^^^^^^^^^^^
 
--
+- Fixed Bug where :class:`DataFrame` column set to scalar extension type via a dict instantion was considered an object type rather than the extension type (:issue:`35965`)
 -
 
 
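
One of the enhancements pulled in here, ``na_action`` for :meth:`DataFrame.applymap`, can be exercised like this (the example data is made up):

    import numpy as np
    import pandas as pd

    df = pd.DataFrame([["abc", np.nan], ["de", "fghi"]])

    # With na_action="ignore", NaN is passed through instead of being handed to len()
    print(df.applymap(len, na_action="ignore"))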

pandas/_libs/index.pyx

Lines changed: 1 addition & 1 deletion
@@ -260,7 +260,7 @@ cdef class IndexEngine:
     def get_indexer_non_unique(self, targets):
         """
         Return an indexer suitable for taking from a non unique index
-        return the labels in the same order ast the target
+        return the labels in the same order as the target
         and a missing indexer into the targets (which correspond
         to the -1 indices in the results
         """

pandas/_libs/lib.pyx

Lines changed: 7 additions & 1 deletion
@@ -2377,14 +2377,17 @@ def map_infer_mask(ndarray arr, object f, const uint8_t[:] mask, bint convert=Tr
 
 @cython.boundscheck(False)
 @cython.wraparound(False)
-def map_infer(ndarray arr, object f, bint convert=True):
+def map_infer(ndarray arr, object f, bint convert=True, bint ignore_na=False):
     """
     Substitute for np.vectorize with pandas-friendly dtype inference.
 
     Parameters
     ----------
     arr : ndarray
     f : function
+    convert : bint
+    ignore_na : bint
+        If True, NA values will not have f applied
 
     Returns
     -------
@@ -2398,6 +2401,9 @@ def map_infer(ndarray arr, object f, bint convert=True):
     n = len(arr)
     result = np.empty(n, dtype=object)
     for i in range(n):
+        if ignore_na and checknull(arr[i]):
+            result[i] = arr[i]
+            continue
         val = f(arr[i])
 
         if cnp.PyArray_IsZeroDim(val):
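
A rough pure-Python sketch of what the new ``ignore_na`` branch does, using ``pd.isna`` in place of the Cython ``checknull`` helper (the function name below is illustrative, not part of pandas):

    import numpy as np
    import pandas as pd

    def map_infer_sketch(arr, f, ignore_na=False):
        # Mirror of the Cython loop: when ignore_na is True, missing values
        # are passed through unchanged instead of being handed to f.
        result = np.empty(len(arr), dtype=object)
        for i, val in enumerate(arr):
            if ignore_na and pd.isna(val):
                result[i] = val
                continue
            result[i] = f(val)
        return result

    values = np.array(["a", None, "ccc"], dtype=object)
    print(map_infer_sketch(values, len, ignore_na=True))  # [1 None 3]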

pandas/compat/numpy/function.py

Lines changed: 2 additions & 2 deletions
@@ -21,7 +21,7 @@
 from distutils.version import LooseVersion
 from typing import Any, Dict, Optional, Union
 
-from numpy import __version__ as _np_version, ndarray
+from numpy import __version__, ndarray
 
 from pandas._libs.lib import is_bool, is_integer
 from pandas.errors import UnsupportedFunctionCall
@@ -122,7 +122,7 @@ def validate_argmax_with_skipna(skipna, args, kwargs):
 ARGSORT_DEFAULTS["kind"] = "quicksort"
 ARGSORT_DEFAULTS["order"] = None
 
-if LooseVersion(_np_version) >= LooseVersion("1.17.0"):
+if LooseVersion(__version__) >= LooseVersion("1.17.0"):
     # GH-26361. NumPy added radix sort and changed default to None.
     ARGSORT_DEFAULTS["kind"] = None
 
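
The version-gating pattern touched here boils down to a few lines; a minimal sketch (the variable name is illustrative, not from the pandas source):

    from distutils.version import LooseVersion

    import numpy as np

    # NumPy 1.17 added radix sort and changed the default argsort kind to None,
    # so the compat default depends on the installed version.
    argsort_kind = None if LooseVersion(np.__version__) >= LooseVersion("1.17.0") else "quicksort"
    print(argsort_kind)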

pandas/core/algorithms.py

Lines changed: 3 additions & 3 deletions
@@ -262,7 +262,7 @@ def _get_values_for_rank(values):
     return values
 
 
-def _get_data_algo(values):
+def get_data_algo(values):
     values = _get_values_for_rank(values)
 
     ndtype = _check_object_for_strings(values)
@@ -491,7 +491,7 @@ def factorize_array(
     codes : ndarray
     uniques : ndarray
     """
-    hash_klass, values = _get_data_algo(values)
+    hash_klass, values = get_data_algo(values)
 
     table = hash_klass(size_hint or len(values))
     uniques, codes = table.factorize(
@@ -2086,7 +2086,7 @@ def sort_mixed(values):
 
     if sorter is None:
         # mixed types
-        hash_klass, values = _get_data_algo(values)
+        hash_klass, values = get_data_algo(values)
        t = hash_klass(len(values))
         t.map_locations(values)
         sorter = ensure_platform_int(t.lookup(ordered))
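
``factorize_array`` (and therefore ``get_data_algo``) sits behind the public :func:`pandas.factorize`; a quick illustration of the end result:

    import pandas as pd

    codes, uniques = pd.factorize(["b", "b", "a", "c", "b"])
    print(codes)    # integer codes in order of first appearance: [0 0 1 2 0]
    print(uniques)  # the unique values backing those codes: ['b' 'a' 'c']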

pandas/core/arrays/_mixins.py

Lines changed: 9 additions & 2 deletions
@@ -6,7 +6,7 @@
 from pandas.errors import AbstractMethodError
 from pandas.util._decorators import cache_readonly, doc
 
-from pandas.core.algorithms import searchsorted, take, unique
+from pandas.core.algorithms import take, unique
 from pandas.core.array_algos.transforms import shift
 from pandas.core.arrays.base import ExtensionArray
 
@@ -102,6 +102,9 @@ def T(self: _T) -> _T:
 
     # ------------------------------------------------------------------------
 
+    def _values_for_argsort(self):
+        return self._ndarray
+
     def copy(self: _T) -> _T:
         new_data = self._ndarray.copy()
         return self._from_backing_data(new_data)
@@ -135,7 +138,11 @@ def _concat_same_type(cls, to_concat, axis: int = 0):
 
     @doc(ExtensionArray.searchsorted)
     def searchsorted(self, value, side="left", sorter=None):
-        return searchsorted(self._ndarray, value, side=side, sorter=sorter)
+        value = self._validate_searchsorted_value(value)
+        return self._ndarray.searchsorted(value, side=side, sorter=sorter)
+
+    def _validate_searchsorted_value(self, value):
+        return value
 
     @doc(ExtensionArray.shift)
     def shift(self, periods=1, fill_value=None, axis=0):
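
The pattern here, delegating ``searchsorted`` to the backing ndarray with an overridable validation hook, can be sketched outside pandas like this (class and method names below are illustrative stand-ins, not the actual mixin):

    import numpy as np

    class NDArrayBackedSketch:
        """Toy stand-in for an ndarray-backed array with a validation hook."""

        def __init__(self, values: np.ndarray):
            self._ndarray = values

        def _validate_searchsorted_value(self, value):
            # Subclasses can coerce/validate `value` (e.g. unwrap it to the
            # backing representation); the base hook is a no-op.
            return value

        def searchsorted(self, value, side="left", sorter=None):
            value = self._validate_searchsorted_value(value)
            return self._ndarray.searchsorted(value, side=side, sorter=sorter)

    arr = NDArrayBackedSketch(np.array([1, 3, 5, 7]))
    print(arr.searchsorted(4))  # 2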
