Merge remote-tracking branch 'upstream/master' into str_cat_docs · pandas-dev/pandas@a381c95 · GitHub

Commit a381c95

Merge remote-tracking branch 'upstream/master' into str_cat_docs
2 parents ccf14a3 + 0039158 commit a381c95

File tree

120 files changed: +2495 / -1833 lines
  • dtypes
  • indexes
  • internals
  • reshape
  • io
  • plotting
  • tests
  • util

    doc/source/timeseries.rst

    Lines changed: 9 additions & 5 deletions
    @@ -2351,9 +2351,11 @@ A DST transition may also shift the local time ahead by 1 hour creating nonexist
     local times. The behavior of localizing a timeseries with nonexistent times
     can be controlled by the ``nonexistent`` argument. The following options are available:

    -* ``raise``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
    -* ``NaT``: Replaces nonexistent times with ``NaT``
    -* ``shift``: Shifts nonexistent times forward to the closest real time
    +* ``'raise'``: Raises a ``pytz.NonExistentTimeError`` (the default behavior)
    +* ``'NaT'``: Replaces nonexistent times with ``NaT``
    +* ``'shift_forward'``: Shifts nonexistent times forward to the closest real time
    +* ``'shift_backward'``: Shifts nonexistent times backward to the closest real time
    +* timedelta object: Shifts nonexistent times by the timedelta duration

     .. ipython:: python

    @@ -2367,12 +2369,14 @@ Localization of nonexistent times will raise an error by default.
     In [2]: dti.tz_localize('Europe/Warsaw')
     NonExistentTimeError: 2015-03-29 02:30:00

    -Transform nonexistent times to ``NaT`` or the closest real time forward in time.
    +Transform nonexistent times to ``NaT`` or shift the times.

     .. ipython:: python

     dti
    -dti.tz_localize('Europe/Warsaw', nonexistent='shift')
    +dti.tz_localize('Europe/Warsaw', nonexistent='shift_forward')
    +dti.tz_localize('Europe/Warsaw', nonexistent='shift_backward')
    +dti.tz_localize('Europe/Warsaw', nonexistent=pd.Timedelta(1, unit='H'))
     dti.tz_localize('Europe/Warsaw', nonexistent='NaT')

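The option semantics documented in the timeseries.rst hunks can be illustrated without pandas. The following is a minimal sketch, assuming a single hard-coded spring-forward gap for Europe/Warsaw on 2015-03-29 (02:00 CET jumps to 03:00 CEST) and fixed +01:00/+02:00 offsets on either side of it. `localize` and the local `NonExistentTimeError` class are hypothetical names, not pandas API, and `None` stands in for `NaT`. Because `datetime` has microsecond resolution, `'shift_backward'` lands on 01:59:59.999999 here, whereas the arithmetic in conversion.pyx subtracts a single nanosecond.

```python
from datetime import datetime, timedelta, timezone

# Assumed model of one DST gap: on 2015-03-29 Europe/Warsaw wall clocks jump
# from 02:00 CET (+01:00) straight to 03:00 CEST (+02:00).
GAP_START = datetime(2015, 3, 29, 2, 0)
GAP_END = datetime(2015, 3, 29, 3, 0)
CET = timezone(timedelta(hours=1))
CEST = timezone(timedelta(hours=2))


class NonExistentTimeError(Exception):
    """Hypothetical stand-in for pytz.NonExistentTimeError."""


def localize(wall, nonexistent='raise'):
    """Attach a UTC offset to a naive wall time on the transition day.

    Returns None where pandas would return NaT.
    """
    if GAP_START <= wall < GAP_END:
        if isinstance(nonexistent, timedelta):
            # Shift by the given duration, but refuse to land inside the gap again
            wall = wall + nonexistent
            if GAP_START <= wall < GAP_END:
                raise ValueError("timedelta relocalizes on a nonexistent time")
        elif nonexistent == 'raise':
            raise NonExistentTimeError(wall.isoformat(sep=' '))
        elif nonexistent == 'NaT':
            return None
        elif nonexistent == 'shift_forward':
            wall = GAP_END  # first wall time after the gap
        elif nonexistent == 'shift_backward':
            wall = GAP_START - timedelta(microseconds=1)  # last instant before the gap
        else:
            raise ValueError(f"unsupported nonexistent={nonexistent!r}")
    # Only valid for this one day: before the gap is CET, from its end onward is CEST
    return wall.replace(tzinfo=CET if wall < GAP_START else CEST)
```

For 02:30 on that day, `'shift_forward'` yields 03:00+02:00, `'shift_backward'` yields 01:59:59.999999+01:00 and `timedelta(hours=1)` yields 03:30+02:00, matching the directions described in the option list.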

    doc/source/whatsnew/v0.24.0.rst

    Lines changed: 7 additions & 1 deletion
    @@ -407,7 +407,7 @@ Other Enhancements
     - Added :meth:`Interval.overlaps`, :meth:`IntervalArray.overlaps`, and :meth:`IntervalIndex.overlaps` for determining overlaps between interval-like objects (:issue:`21998`)
     - :func:`read_fwf` now accepts keyword ``infer_nrows`` (:issue:`15138`).
     - :func:`~DataFrame.to_parquet` now supports writing a ``DataFrame`` as a directory of parquet files partitioned by a subset of the columns when ``engine = 'pyarrow'`` (:issue:`23283`)
    -- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`)
    +- :meth:`Timestamp.tz_localize`, :meth:`DatetimeIndex.tz_localize`, and :meth:`Series.tz_localize` have gained the ``nonexistent`` argument for alternative handling of nonexistent times. See :ref:`timeseries.timezone_nonexistent` (:issue:`8917`, :issue:`24466`)
     - :meth:`Index.difference` now has an optional ``sort`` parameter to specify whether the results should be sorted if possible (:issue:`17839`)
     - :meth:`read_excel()` now accepts ``usecols`` as a list of column names or callable (:issue:`18273`)
     - :meth:`MultiIndex.to_flat_index` has been added to flatten multiple levels into a single-level :class:`Index` object.
    @@ -430,7 +430,9 @@ Backwards incompatible API changes
     - ``max_rows`` and ``max_cols`` parameters removed from :class:`HTMLFormatter` since truncation is handled by :class:`DataFrameFormatter` (:issue:`23818`)
     - :func:`read_csv` will now raise a ``ValueError`` if a column with missing values is declared as having dtype ``bool`` (:issue:`20591`)
     - The column order of the resultant :class:`DataFrame` from :meth:`MultiIndex.to_frame` is now guaranteed to match the :attr:`MultiIndex.names` order. (:issue:`22420`)
    +- Incorrectly passing a :class:`DatetimeIndex` to :meth:`MultiIndex.from_tuples`, rather than a sequence of tuples, now raises a ``TypeError`` rather than a ``ValueError`` (:issue:`24024`)
     - :func:`pd.offsets.generate_range` argument ``time_rule`` has been removed; use ``offset`` instead (:issue:`24157`)
    +- In 0.23.x, pandas would raise a ``ValueError`` on a merge of a numeric column (e.g. ``int`` dtyped column) and an ``object`` dtyped column (:issue:`9780`). We have re-enabled the ability to merge ``object`` and other dtypes (:issue:`21681`)

     Percentage change on groupby
     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    @@ -1368,6 +1370,7 @@ Datetimelike
     - Bug in :class:`DataFrame` with ``datetime64[ns]`` dtype subtracting ``np.datetime64`` object with non-nanosecond unit failing to convert to nanoseconds (:issue:`18874`, :issue:`22163`)
     - Bug in :class:`DataFrame` comparisons against ``Timestamp``-like objects failing to raise ``TypeError`` for inequality checks with mismatched types (:issue:`8932`, :issue:`22163`)
     - Bug in :class:`DataFrame` with mixed dtypes including ``datetime64[ns]`` incorrectly raising ``TypeError`` on equality comparisons (:issue:`13128`, :issue:`22163`)
    +- Bug in :attr:`DataFrame.values` returning a :class:`DatetimeIndex` for a single-column ``DataFrame`` with tz-aware datetime values. Now a 2-D :class:`numpy.ndarray` of :class:`Timestamp` objects is returned (:issue:`24024`)
     - Bug in :meth:`DataFrame.eq` comparison against ``NaT`` incorrectly returning ``True`` or ``NaN`` (:issue:`15697`, :issue:`22163`)
     - Bug in :class:`DatetimeIndex` subtraction that incorrectly failed to raise ``OverflowError`` (:issue:`22492`, :issue:`22508`)
     - Bug in :class:`DatetimeIndex` incorrectly allowing indexing with ``Timedelta`` object (:issue:`20464`)
    @@ -1384,6 +1387,7 @@ Datetimelike
     - Bug in :func:`period_range` ignoring the frequency of ``start`` and ``end`` when those are provided as :class:`Period` objects (:issue:`20535`).
     - Bug in :class:`PeriodIndex` with attribute ``freq.n`` greater than 1 where adding a :class:`DateOffset` object would return incorrect results (:issue:`23215`)
     - Bug in :class:`Series` that interpreted string indices as lists of characters when setting datetimelike values (:issue:`23451`)
    +- Bug in :class:`DataFrame` when creating a new column from an ndarray of :class:`Timestamp` objects with timezones creating an object-dtype column, rather than datetime with timezone (:issue:`23932`)
     - Bug in :class:`Timestamp` constructor which would drop the frequency of an input :class:`Timestamp` (:issue:`22311`)
     - Bug in :class:`DatetimeIndex` where calling ``np.array(dtindex, dtype=object)`` would incorrectly return an array of ``long`` objects (:issue:`23524`)
     - Bug in :class:`Index` where passing a timezone-aware :class:`DatetimeIndex` and `dtype=object` would incorrectly raise a ``ValueError`` (:issue:`23524`)
    @@ -1596,11 +1600,13 @@ Notice how we now instead output ``np.nan`` itself instead of a stringified form
     - :func:`read_sas()` will parse numbers in sas7bdat-files that have width less than 8 bytes correctly. (:issue:`21616`)
     - :func:`read_sas()` will correctly parse sas7bdat files with many columns (:issue:`22628`)
     - :func:`read_sas()` will correctly parse sas7bdat files with data page types having also bit 7 set (so page type is 128 + 256 = 384) (:issue:`16615`)
    +- Bug in :func:`read_sas()` in which an incorrect error was raised on an invalid file format. (:issue:`24548`)
     - Bug in :meth:`detect_client_encoding` where potential ``IOError`` goes unhandled when importing in a mod_wsgi process due to restricted access to stdout. (:issue:`21552`)
     - Bug in :func:`to_html()` with ``index=False`` misses truncation indicators (...) on truncated DataFrame (:issue:`15019`, :issue:`22783`)
     - Bug in :func:`to_html()` with ``index=False`` when both columns and row index are ``MultiIndex`` (:issue:`22579`)
     - Bug in :func:`to_html()` with ``index_names=False`` displaying index name (:issue:`22747`)
     - Bug in :func:`to_html()` with ``header=False`` not displaying row index names (:issue:`23788`)
    +- Bug in :func:`to_html()` with ``sparsify=False`` that caused it to raise ``TypeError`` (:issue:`22887`)
     - Bug in :func:`DataFrame.to_string()` that broke column alignment when ``index=False`` and width of first column's values is greater than the width of first column's header (:issue:`16839`, :issue:`13032`)
     - Bug in :func:`DataFrame.to_string()` that caused representations of :class:`DataFrame` to not take up the whole window (:issue:`22984`)
     - Bug in :func:`DataFrame.to_csv` where a single level MultiIndex incorrectly wrote a tuple. Now just the value of the index is written (:issue:`19589`).

    pandas/_libs/lib.pyx

    Lines changed: 5 additions & 4 deletions
    @@ -623,7 +623,7 @@ def clean_index_list(obj: list):
     return obj, all_arrays

     # don't force numpy coerce with nan's
    -inferred = infer_dtype(obj)
    +inferred = infer_dtype(obj, skipna=False)
     if inferred in ['string', 'bytes', 'unicode', 'mixed', 'mixed-integer']:
     return np.asarray(obj, dtype=object), 0
     elif inferred in ['integer']:
    @@ -1210,6 +1210,10 @@ def infer_dtype(value: object, skipna: bool=False) -> str:
     values = construct_1d_object_array_from_listlike(value)

     values = getattr(values, 'values', values)
    +
    +# make contiguous
    +values = values.ravel()
    +
     if skipna:
     values = values[~isnaobj(values)]

    @@ -1220,9 +1224,6 @@ def infer_dtype(value: object, skipna: bool=False) -> str:
     if values.dtype != np.object_:
     values = values.astype('O')

    -# make contiguous
    -values = values.ravel()
    -
     n = len(values)
     if n == 0:
     return 'empty'
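After this change `infer_dtype` flattens its input before the optional missing-value mask, so the mask is always computed on a 1-D array, and `clean_index_list` passes `skipna=False` explicitly, presumably so its behaviour does not depend on the function's default. The ordering can be sketched in plain Python; `ravel`, `is_missing`, `infer_kind` and the returned labels are hypothetical simplifications, not pandas' `infer_dtype`, and nested lists stand in for a 2-D object array.

```python
import math


def is_missing(x):
    """Treat None and float NaN as missing (simplified stand-in for isnaobj)."""
    return x is None or (isinstance(x, float) and math.isnan(x))


def ravel(values):
    """Flatten a list of rows into one flat list (stand-in for ndarray.ravel)."""
    if values and all(isinstance(row, list) for row in values):
        return [x for row in values for x in row]
    return list(values)


def infer_kind(values, skipna=False):
    """Hypothetical, heavily simplified type inference."""
    values = ravel(values)  # flatten first, so the mask below always sees a flat list
    if skipna:
        values = [v for v in values if not is_missing(v)]
    if not values:
        return 'empty'
    if all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        return 'integer'
    return 'mixed'
```

With `skipna=False`, missing entries stay in the list and take part in the classification, which is the behaviour the explicit argument in `clean_index_list` keeps.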

    pandas/_libs/src/ujson/python/objToJSON.c

    Lines changed: 7 additions & 2 deletions
    @@ -228,6 +228,11 @@ static PyObject *get_values(PyObject *obj) {
     PRINTMARK();

     if (values && !PyArray_CheckExact(values)) {
    +
    +if (PyObject_HasAttrString(values, "to_numpy")) {
    +values = PyObject_CallMethod(values, "to_numpy", NULL);
    +}
    +
     if (PyObject_HasAttrString(values, "values")) {
     PyObject *subvals = get_values(values);
     PyErr_Clear();
    @@ -279,8 +284,8 @@ static PyObject *get_values(PyObject *obj) {
     repr = PyString_FromString("<unknown dtype>");
     }

    -PyErr_Format(PyExc_ValueError, "%s or %s are not JSON serializable yet",
    -PyString_AS_STRING(repr), PyString_AS_STRING(typeRepr));
    +PyErr_Format(PyExc_ValueError, "%R or %R are not JSON serializable yet",
    +repr, typeRepr);
     Py_DECREF(repr);
     Py_DECREF(typeRepr);

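In Python terms, the first objToJSON.c hunk makes `get_values` in effect try a `to_numpy()` method before looking at a `.values` attribute, and the second formats the error with `%R`, which inserts the `repr()` of each object instead of requiring a C string. A rough sketch of that unwrapping order follows, assuming plain lists stand in for exact NumPy arrays; `unwrap_values` and the demo classes are hypothetical and only approximate the C control flow.

```python
def unwrap_values(obj, max_depth=10):
    """Peel wrapper objects until a plain list (stand-in for an exact ndarray) remains.

    Hypothetical Python rendering of the lookup order in get_values() after this
    commit: try to_numpy() first, then fall back to the .values attribute.
    """
    values = obj
    for _ in range(max_depth):
        if isinstance(values, list):  # stand-in for PyArray_CheckExact
            return values
        if hasattr(values, 'to_numpy'):
            values = values.to_numpy()
        elif hasattr(values, 'values'):
            values = values.values
        else:
            break
    dtype = getattr(values, 'dtype', '<unknown dtype>')
    # %R in PyErr_Format inserts repr() of each argument; !r does the same here
    raise ValueError(f"{dtype!r} or {type(values)!r} are not JSON serializable yet")


class WithToNumpy:
    """Demo wrapper exposing both interfaces; to_numpy() wins."""
    values = ['legacy']

    def to_numpy(self):
        return [1, 2]


class WithValues:
    """Demo wrapper that only exposes the older .values attribute."""
    values = [3, 4]
```

Calling `unwrap_values(WithToNumpy())` returns `[1, 2]`, while an object with neither attribute ends in the `repr()`-based error message.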

    pandas/_libs/tslibs/conversion.pyx

    Lines changed: 47 additions & 17 deletions
    @@ -13,7 +13,8 @@ from dateutil.tz import tzutc
     from datetime import time as datetime_time
     from cpython.datetime cimport (datetime, tzinfo,
     PyDateTime_Check, PyDate_Check,
    -PyDateTime_CheckExact, PyDateTime_IMPORT)
    +PyDateTime_CheckExact, PyDateTime_IMPORT,
    +PyDelta_Check)
     PyDateTime_IMPORT

     from pandas._libs.tslibs.ccalendar import DAY_SECONDS, HOUR_SECONDS
    @@ -28,7 +29,8 @@ from pandas._libs.tslibs.np_datetime import OutOfBoundsDatetime
     from pandas._libs.tslibs.util cimport (
     is_string_object, is_datetime64_object, is_integer_object, is_float_object)

    -from pandas._libs.tslibs.timedeltas cimport cast_from_unit
    +from pandas._libs.tslibs.timedeltas cimport (cast_from_unit,
    +delta_to_nanoseconds)
     from pandas._libs.tslibs.timezones cimport (
     is_utc, is_tzlocal, is_fixed_offset, get_utcoffset, get_dst_info,
     get_timezone, maybe_get_tz, tz_compare)
    @@ -868,7 +870,8 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
     - bool if True, treat all vals as DST. If False, treat them as non-DST
     - 'NaT' will return NaT where there are ambiguous times

    -nonexistent : {None, "NaT", "shift", "raise"}
    +nonexistent : {None, "NaT", "shift_forward", "shift_backward", "raise",
    +timedelta-like}
     How to handle non-existent times when converting wall times to UTC

     .. versionadded:: 0.24.0
    @@ -884,12 +887,14 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
     Py_ssize_t delta_idx_offset, delta_idx, pos_left, pos_right
     int64_t *tdata
     int64_t v, left, right, val, v_left, v_right, new_local, remaining_mins
    -int64_t HOURS_NS = HOUR_SECONDS * 1000000000
    +int64_t first_delta
    +int64_t HOURS_NS = HOUR_SECONDS * 1000000000, shift_delta = 0
     ndarray[int64_t] trans, result, result_a, result_b, dst_hours, delta
     ndarray trans_idx, grp, a_idx, b_idx, one_diff
     npy_datetimestruct dts
     bint infer_dst = False, is_dst = False, fill = False
    -bint shift = False, fill_nonexist = False
    +bint shift_forward = False, shift_backward = False
    +bint fill_nonexist = False
     list trans_grp
     str stamp

    @@ -928,11 +933,16 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,

     if nonexistent == 'NaT':
     fill_nonexist = True
    -elif nonexistent == 'shift':
    -shift = True
    -else:
    -assert nonexistent in ('raise', None), ("nonexistent must be one of"
    -" {'NaT', 'raise', 'shift'}")
    +elif nonexistent == 'shift_forward':
    +shift_forward = True
    +elif nonexistent == 'shift_backward':
    +shift_backward = True
    +elif PyDelta_Check(nonexistent):
    +shift_delta = delta_to_nanoseconds(nonexistent)
    +elif nonexistent not in ('raise', None):
    +msg = ("nonexistent must be one of {'NaT', 'raise', 'shift_forward', "
    +"shift_backwards} or a timedelta object")
    +raise ValueError(msg)

     trans, deltas, _ = get_dst_info(tz)
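The new dispatch in `tz_localize_to_utc` replaces the old `assert` with an explicit `ValueError`, adds a timedelta branch through `PyDelta_Check` and `delta_to_nanoseconds`, and no longer accepts the old `'shift'` spelling. A minimal pure-Python sketch of the same dispatch follows, assuming `datetime.timedelta` as the timedelta-like type (pandas' `Timedelta` subclasses it) and microsecond precision for the conversion, whereas the Cython helper works in nanoseconds. `parse_nonexistent` is a hypothetical name. The message string in the hunk names `shift_backwards`, while the branch above it compares against `'shift_backward'`; the sketch uses the spelling that is actually checked.

```python
from datetime import timedelta

NS_PER_US = 1_000  # nanoseconds per microsecond


def parse_nonexistent(nonexistent):
    """Return (policy, shift_ns) for a ``nonexistent`` argument.

    Hypothetical helper mirroring the if/elif chain in tz_localize_to_utc.
    """
    if nonexistent == 'NaT':
        return 'NaT', 0
    if nonexistent == 'shift_forward':
        return 'shift_forward', 0
    if nonexistent == 'shift_backward':
        return 'shift_backward', 0
    if isinstance(nonexistent, timedelta):
        # timedelta // timedelta gives an exact integer count of microseconds
        return 'shift_delta', (nonexistent // timedelta(microseconds=1)) * NS_PER_US
    if nonexistent in ('raise', None):
        return 'raise', 0
    raise ValueError(
        "nonexistent must be one of {'NaT', 'raise', 'shift_forward', "
        "'shift_backward'} or a timedelta object"
    )
```

Under this dispatch a call that still passes `nonexistent='shift'` fails with `ValueError` rather than being silently accepted.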

    @@ -1041,15 +1051,35 @@ def tz_localize_to_utc(ndarray[int64_t] vals, object tz, object ambiguous=None,
     result[i] = right
     else:
     # Handle nonexistent times
    -if shift:
    -# Shift the nonexistent time forward to the closest existing
    -# time
    +if shift_forward or shift_backward or shift_delta != 0:
    +# Shift the nonexistent time to the closest existing time
     remaining_mins = val % HOURS_NS
    -new_local = val + (HOURS_NS - remaining_mins)
    +if shift_delta != 0:
    +# Validate that we don't relocalize on another nonexistent
    +# time
    +if -1 < shift_delta + remaining_mins < HOURS_NS:
    +raise ValueError(
    +"The provided timedelta will relocalize on a "
    +"nonexistent time: {}".format(nonexistent)
    +)
    +new_local = val + shift_delta
    +elif shift_forward:
    +new_local = val + (HOURS_NS - remaining_mins)
    +else:
    +# Subtract 1 since the beginning hour is _inclusive_ of
    +# nonexistent times
    +new_local = val - remaining_mins - 1
     delta_idx = trans.searchsorted(new_local, side='right')
    -# Need to subtract 1 from the delta_idx if the UTC offset of
    -# the target tz is greater than 0
    -delta_idx_offset = int(deltas[0] > 0)
    +# Shift the delta_idx by if the UTC offset of
    +# the target tz is greater than 0 and we're moving forward
    +# or vice versa
    +first_delta = deltas[0]
    +if (shift_forward or shift_delta > 0) and first_delta > 0:
    +delta_idx_offset = 1
    +elif (shift_backward or shift_delta < 0) and first_delta < 0:
    +delta_idx_offset = 1
    +else:
    +delta_idx_offset = 0
     delta_idx = delta_idx - delta_idx_offset
     result[i] = new_local - deltas[delta_idx]
     elif fill_nonexist:
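The wall-time step of the new shifting branch works on integer nanoseconds and uses `val % HOURS_NS`, so it appears to assume that the gap starts on a whole hour and lasts exactly one hour. The sketch below reproduces only that step in plain Python, assuming naive nanoseconds since 1970-01-01; the UTC lookup through `trans.searchsorted` and `delta_idx_offset` is deliberately left out. `shift_wall_ns` and `wall_ns` are hypothetical names.

```python
from datetime import datetime, timedelta

HOURS_NS = 3_600 * 1_000_000_000  # one hour in nanoseconds, as HOURS_NS in the hunk


def wall_ns(*args):
    """Naive wall-clock datetime -> integer nanoseconds since 1970-01-01 (no tz applied)."""
    return (datetime(*args) - datetime(1970, 1, 1)) // timedelta(microseconds=1) * 1_000


def shift_wall_ns(val, shift_forward=False, shift_backward=False, shift_delta=0):
    """Move a nonexistent wall time ``val`` (ns) out of its gap hour.

    Hypothetical pure-Python copy of the wall-time arithmetic only; assumes the
    gap starts on a whole hour and lasts exactly one hour.
    """
    remaining_mins = val % HOURS_NS  # despite the name: ns elapsed since the top of the hour
    if shift_delta != 0:
        # Same guard as the hunk: reject shifts that stay inside the gap hour
        if -1 < shift_delta + remaining_mins < HOURS_NS:
            raise ValueError("timedelta relocalizes on a nonexistent time")
        return val + shift_delta
    if shift_forward:
        return val + (HOURS_NS - remaining_mins)  # top of the next hour
    if shift_backward:
        return val - remaining_mins - 1  # last nanosecond before the gap hour
    raise ValueError("no shift option selected")
```

For 2015-03-29 02:30, a shift of 15 minutes is rejected because 02:45 is still inside the gap, while a shift of -1 hour is accepted and lands on 01:30.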

    pandas/_libs/tslibs/nattype.pyx

    Lines changed: 28 additions & 12 deletions
    @@ -481,13 +481,17 @@ class NaTType(_NaT):
     - 'raise' will raise an AmbiguousTimeError for an ambiguous time

     .. versionadded:: 0.24.0
    -nonexistent : 'shift', 'NaT', default 'raise'
    +nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta,
    +default 'raise'
     A nonexistent time does not exist in a particular timezone
     where clocks moved forward due to DST.

    -- 'shift' will shift the nonexistent time forward to the closest
    -existing time
    +- 'shift_forward' will shift the nonexistent time forward to the
    +closest existing time
    +- 'shift_backward' will shift the nonexistent time backward to the
    +closest existing time
     - 'NaT' will return NaT where there are nonexistent times
    +- timedelta objects will shift nonexistent times by the timedelta
     - 'raise' will raise an NonExistentTimeError if there are
     nonexistent times

    @@ -515,13 +519,17 @@ class NaTType(_NaT):
     - 'raise' will raise an AmbiguousTimeError for an ambiguous time

     .. versionadded:: 0.24.0
    -nonexistent : 'shift', 'NaT', default 'raise'
    +nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta,
    +default 'raise'
     A nonexistent time does not exist in a particular timezone
     where clocks moved forward due to DST.

    -- 'shift' will shift the nonexistent time forward to the closest
    -existing time
    +- 'shift_forward' will shift the nonexistent time forward to the
    +closest existing time
    +- 'shift_backward' will shift the nonexistent time backward to the
    +closest existing time
     - 'NaT' will return NaT where there are nonexistent times
    +- timedelta objects will shift nonexistent times by the timedelta
     - 'raise' will raise an NonExistentTimeError if there are
     nonexistent times

    @@ -545,13 +553,17 @@ class NaTType(_NaT):
     - 'raise' will raise an AmbiguousTimeError for an ambiguous time

     .. versionadded:: 0.24.0
    -nonexistent : 'shift', 'NaT', default 'raise'
    +nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta,
    +default 'raise'
     A nonexistent time does not exist in a particular timezone
     where clocks moved forward due to DST.

    -- 'shift' will shift the nonexistent time forward to the closest
    -existing time
    +- 'shift_forward' will shift the nonexistent time forward to the
    +closest existing time
    +- 'shift_backward' will shift the nonexistent time backward to the
    +closest existing time
     - 'NaT' will return NaT where there are nonexistent times
    +- timedelta objects will shift nonexistent times by the timedelta
     - 'raise' will raise an NonExistentTimeError if there are
     nonexistent times

    @@ -605,13 +617,17 @@ class NaTType(_NaT):
     - 'NaT' will return NaT for an ambiguous time
     - 'raise' will raise an AmbiguousTimeError for an ambiguous time

    -nonexistent : 'shift', 'NaT', default 'raise'
    +nonexistent : 'shift_forward', 'shift_backward, 'NaT', timedelta,
    +default 'raise'
     A nonexistent time does not exist in a particular timezone
     where clocks moved forward due to DST.

    -- 'shift' will shift the nonexistent time forward to the closest
    -existing time
    +- 'shift_forward' will shift the nonexistent time forward to the
    +closest existing time
    +- 'shift_backward' will shift the nonexistent time backward to the
    +closest existing time
     - 'NaT' will return NaT where there are nonexistent times
    +- timedelta objects will shift nonexistent times by the timedelta
     - 'raise' will raise an NonExistentTimeError if there are
     nonexistent times

