8000
We read every piece of feedback, and take your input very seriously.
To see all available qualifiers, see our documentation.
There was an error while loading. Please reload this page.
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
One more step towards getting tseries.frequencies into cython (and getting rid of non-cython dependencies of tslib/period). Also moves towards de-coupling period from tslib.
tseries.frequencies
git diff upstream/master -u -- "*.py" | flake8 --diff
Sorry, something went wrong.
implement tslibs.resolution
c8cb176
flake cleanup
757321d
Merge branch 'master' of https://github.com/pandas-dev/pandas into ts…
98e612f
…libs-resolution
4b6080d
@jbrockmendel : I presume this is largely copy+paste? Also, a quick asv check might be good just to make sure that nothing slowed down.
asv
I presume this is largely copy+paste?
Yes.
Also, a quick asv check might be good just to make sure that nothing slowed down.
Will do.
Merging #18034 into master will decrease coverage by 0.07%. The diff coverage is 76.92%.
0.07%
76.92%
@@ Coverage Diff @@ ## master #18034 +/- ## ========================================== - Coverage 91.43% 91.36% -0.08% ========================================== Files 164 164 Lines 50090 49869 -221 ========================================== - Hits 45798 45561 -237 - Misses 4292 4308 +16
89.16% <76.92%> (-0.07%)
40.23% <53.84%> (-0.2%)
94.09% <100%> (-1.93%)
92.9% <60%> (+0.01%)
63.44% <66.66%> (-1.77%)
25% <0%> (-58.34%)
97.8% <0%> (-0.1%)
95.48% <0%> (+0.09%)
Continue to review full report at Codecov.
Legend - Click here to learn more Δ = absolute <relative> (impact), ø = not affected, ? = missing data Powered by Codecov. Last update 9e3ad63...7b374d3. Read the comment docs.
Δ = absolute <relative> (impact)
ø = not affected
? = missing data
taskset 3 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries [...] before after ratio [6a945180] [4b6080da] + 42.3±0.07μs 47.1±0.3μs 1.12 timeseries.SemiMonthOffset.time_begin_apply + 5.75±0μs 6.41±0.01μs 1.11 timeseries.DatetimeIndex.time_timestamp_tzinfo_cons - 1.10±0ms 997±1μs 0.91 timeseries.ResampleSeries.time_1min_5min_mean - 20.7±0.05μs 18.7±0.05μs 0.90 timeseries.Offsets.time_custom_bday_apply_dt64 - 153±3μs 133±0.5μs 0.87 timeseries.Offsets.time_custom_bmonthbegin_decr_n - 33.9±0.3μs 28.6±0.1μs 0.84 timeseries.Offsets.time_custom_bday_cal_decr - 6.15±0.5ms 5.19±0.01ms 0.84 timeseries.DatetimeIndex.time_infer_freq_daily - 14.7±1ms 11.9±0ms 0.81 timeseries.DatetimeIndex.time_infer_freq_none - 18.7±0.04μs 14.6±0.04μs 0.78 timeseries.Offsets.time_custom_bday_apply - 16.3±0.8ms 10.6±0.04ms 0.65 timeseries.AsOfDataFrame.time_asof_single - 140ms 40.4±1ms 0.29 timeseries.DatetimeIndex.time_dti_factorize - 159±5ms 37.0±3ms 0.23 timeseries.DatetimeIndex.time_dti_tz_factorize - 1.96s 270±7ms 0.14 timeseries.AsOfDataFrame.time_asof_nan - 1.95s 265±4ms 0.14 timeseries.AsOfDataFrame.time_asof
your benchmarks look odd
No argument there. I'll rebase and re-run.
58ced56
taskset 8 asv continuous -f 1.1 -E virtualenv master HEAD -b timeseries [...] before after ratio [4578a039] [58ced564] + 28.4±0.1μs 32.1±0.9μs 1.13 timeseries.Offsets.time_custom_bday_cal_incr_neg_n + 132ms 148ms 1.12 timeseries.DatetimeIndex.time_dti_factorize - 152±0.6μs 137±0.8μs 0.90 timeseries.Offsets.time_custom_bmonthend_incr - 22.3±0.08μs 19.8±0.07μs 0.89 timeseries.Offsets.time_custom_bday_incr - 18.9±0.1μs 16.0±0.1μs 0.85 timeseries.AsOf.time_asof_single_early - 2.00s 424±40ms 0.21 timeseries.AsOfDataFrame.time_asof - 2.02s 422±7ms 0.21 timeseries.AsOfDataFrame.time_asof_nan
Clipboard again, is there a game plan for this error?
update imports
1ae12c6
1507a7f
dummy commit to force CI re-run
7404b13
There was a problem hiding this comment.
The reason will be displayed to describe this comment to others. Learn more.
why the noqa?
It's public-facing so I didn't want to break anything downstream. Remove unused imports?
hmm, what exactly is public facing of these? I am ok with simply removing anything _leading, what is not here?
I think the noqa is for DAYS, _maybe_add_count, _weekday_rule_aliases, get_freq_group. I think the private ones here can be removed without breaking anything. DAYS and get_freq_group are imported by and a couple other places.
DAYS
_maybe_add_count
_weekday_rule_aliases
get_freq_group
ok let's remove any non-used, you can put the non-private ones in a whatsnew note
Just pushed. Wasn't sure about wording in whatsnew note for the list DAYS.
this is because this is not a cdef class; make a _Resolution class then provide the public interface via Resolution
alternately you can just move the constants here. note should this class be a Singleton pattern? do we even instantiate it?
note should this class be a Singleton pattern?
Haven't given it any thought. Maybe the idea was to make it easily subclassable?
do we even instantiate it?
Nope, looks like everywhere it shows up its just Resolution.get_freq_group(...) or similar.
Resolution.get_freq_group(...)
ok so it was for grouping things / sharing code.
maybe add to the list to either make this a Singleton, or better yet use instances of these.
Done.
is this actually used elsewhere? if so should fix that, this should be a cdef method of _FrequencyInferer (or at the very lease cdef only)
Nope, just in resolution. Will cdef for now.
Related: there's an import of pandas.core.algorithms.unique. Do you know if there is a cython version of that we can use instead? The uses are all for fairly small integer indexes:
pandas.core.algorithms.unique
if len(unique(self.fields['M'])) > 1: weekdays = unique(self.index.weekday) week_of_months = unique((self.index.day - 1) // 7)
In fact, if these are all being called on Index objects, then the Int64Index.unique method might make sense.
Index
Int64Index.unique
edits per reviewer suggestions
9b8c61c
can be de-privatized (add to list or do here ok)
to the list it goes
remove unused; update imports; add whatsnew note
c9bf005
name fixup
4817941
rebase
25ef9b2
9c17366
Hello @jbrockmendel! Thanks for updating the PR.
Cheers ! There are no PEP8 issues in this Pull Request. 🍻
whitespace cleanup
cd34cdc
d939e5c
Edits per reviewer requests
096a5af
5701c5e
I would just say they are removed from the private API, no need to point to a private file
why doesn't MONTHS simply exist in resolution?
alias naming, will change
same
screaming for parametization (later is ok)
46d6551
edits per reviewer comments
b8b0124
If we want to finish the day with a win, this and #18086 are probably about ready.
rebase this
private -> public
219884e
I think next to go in.
Awesome. I'll exercise some restraint and not immediately replace.
Travis failure is a timeout in TestClipboard. I can only imagine the maintainers find this even more annoying than I do. Still: argh
actually I merged a commit, which may fix this (but you rebase after). don't push again. let me review.
small changes
these tags are not going to work, nor are these in the api.rst so they won't reference anything, just use tseries.frequencies.get_freq_group() etc
tseries.frequencies.get_freq_group()
OK. Does this take precedence over "don't push again"?
no actually push again with this fix everything else ok
add to list to doc-string things
whatsnew edit per request
0ab2b08
ok ping on green.
needs a rebase.
note that if you actually rebase on top of master (as opposed to merge master), then this becomes really simple to see where you are. you prob want to make a single commit when you do this as the conflicts are easier to resolve. or you can use an ide. It is much much easier to see what is going on w/o all of these constant merges. I would highly recommend you follow this workflow. since you rebase fairly frequently this is actually pretty easy to do. you don't actually need to squash (though again would recommend having logical commits).
7b374d3
lgtm. ping on green.
Ping
c3cfe90
thanks @jbrockmendel
put a note - this is non performiant logic here (and duplicative) and this simply should call unique_1d directly plus no reason to depend on khash directly;
Note: "here" refers to tslibs.resolution.unique_deltas.
tslibs.resolution.unique_deltas
So I tried using unique_1d directly and performance was hurt noticeably. Not a huge shock since unique_1d is in py-land. So as an experiment I took unique_1d, cut out a bunch of type-checking because the use case here is always a ndarray[int64_t], and implemented what was left directly in tslibs.resolution. The surprise was that performance was still noticeably worse than under the status quo.
So is there a chance that this is performant logic here after all?
Tslibs resolution (pandas-dev#18034)
860625d
jreback jreback approved these changes
Successfully merging this pull request may close these issues.