BUG: nanoseconds and reso in dateutil paths by jbrockmendel · Pull Request #56051 · pandas-dev/pandas · GitHub

    Conversation

    @jbrockmendel
    Member
    @jbrockmendel jbrockmendel commented Nov 18, 2023

    Perf impact is pretty negligible compared to the cost of going through dateutil:

    import pandas as pd
    import numpy as np
    
    dtstr = "2016/01/02 03:04:05.001000 UTC"
    
    %timeit pd.Timestamp(dtstr)
    100 µs ± 3.05 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <- PR
    109 µs ± 10.1 µs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)  # <- main
    
    vals = np.array([dtstr] * 10**5, dtype=object)
    %timeit pd.to_datetime(vals)
    8.37 ms ± 134 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR
    9.1 ms ± 597 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- main
    
    %timeit pd.to_datetime(vals, format="mixed")
    8.29 ms ± 236 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- PR
    8.13 ms ± 63.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)  # <- main
    

    return ret


    cdef object _reso_pattern = re.compile(r"\d:\d{2}:\d{2}\.(?P<frac>\d+)")
    Member

    Are the first \ds always guaranteed to be separated by :?

    Member Author

    dateutil supports some really weird formats (@MarcoGorelli and I have discussed moving away from using it at all), so I don't know. But I think this covers the vast majority of cases we care about.
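
For anyone following the thread, here is a minimal sketch of what the pattern does and does not match. Only the regex is taken from the diff. The helper `infer_unit_from_string`, the digit-count thresholds, and the `"s"` fallback are assumptions made for illustration, not the actual pandas implementation.

```python
import re

# Pattern quoted from the diff under review.
_reso_pattern = re.compile(r"\d:\d{2}:\d{2}\.(?P<frac>\d+)")


def infer_unit_from_string(dtstr: str) -> str:
    """Hypothetical helper: guess a unit from the fractional-second digits.

    The thresholds and the "s" fallback are assumptions, not pandas behavior.
    """
    match = _reso_pattern.search(dtstr)
    if match is None:
        # No colon-separated time with a fractional part was found.
        return "s"
    ndigits = len(match.group("frac"))
    if ndigits <= 3:
        return "ms"
    if ndigits <= 6:
        return "us"
    return "ns"


print(infer_unit_from_string("2016/01/02 03:04:05.001000 UTC"))  # 6 fractional digits
print(infer_unit_from_string("2016/01/02 03:04:05.001000123"))   # 9 fractional digits
print(infer_unit_from_string("2016/01/02 03:04:05"))             # no fractional part
print(infer_unit_from_string("20160102 030405.001"))             # no ":" separators
```

The last call is the case the review question is about: a time written without `:` separators does not match, so in this sketch it falls through to the fallback unit even though the string carries milliseconds.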

    @mroeschke mroeschke added the Non-Nano datetime64/timedelta64 with non-nanosecond resolution label Nov 19, 2023
    @mroeschke mroeschke added this to the 2.2 milestone Nov 20, 2023
    @mroeschke mroeschke merged commit 92fa9ca into pandas-dev:main Nov 20, 2023
    @mroeschke
    Member

    Thanks @jbrockmendel

    @jbrockmendel jbrockmendel deleted the bug-ts-unit branch November 20, 2023 17:56
    phofl pushed a commit to phofl/pandas that referenced this pull request Nov 21, 2023
    * BUG: nanoseconds and reso in dateutil paths
    
    * GH ref

    Labels

    Non-Nano datetime64/timedelta64 with non-nanosecond resolution

    Projects

    None yet

    Development

    Successfully merging this pull request may close these issues.

    BUG: inferred Timestamp unit with dateutil paths

    2 participants
