-
-
Notifications
You must be signed in to change notification settings - Fork 1.5k
Description
Not sure if this is feasible, but I had the following situation while working on pandas hashtable code recently (simplified a bit from pandas-dev/pandas#22986 for clarity):
Two methods had substantial code overlap, with the difference being easily absorbed into a bint
parameter - in this case, whether the unique
method returns an inverse or not. It would be very desirable from a clarity/readability/maintenance perspective to unify the code for those two methods.
I tried unifying these functions, but there was a perf-hit for the simpler case (no inverse) of about 10%, which is too much for just always calculating it by default for such a central method.
I then tried (very naively) to get cython to compile the different code branches as follows:
def _unique(self, ndarray[object] values, bint return_inverse=False):
[...]
with nogil:
for i in range(n):
[actual computation]
if return_inverse:
[calculate inverse]
def unique(self, ndarray[object] values, bint return_inverse=False):
# define separate functions with/without inverse to force compilation
# of the different code paths for boolean "return_inverse"
if return_inverse:
return self._unique_with_inverse(values)
return self._unique_no_inverse(values)
def _unique_no_inverse(self, const {{dtype}}_t[:] values):
return self._unique(values, return_inverse=False)
def _unique_with_inverse(self, const {{dtype}}_t[:] values):
return self._unique(values, return_inverse=True)
but the perf penalty remained.
Summing up, it would be awesome to be able to do something like (the example below borrows from @pytest.mark.parametrize
, but the exact syntax is not relevant to me):
@cython.force_compile('return_inverse', [False])
def unique(self, ndarray[object] values, bint return_inverse=False):
[...]
with nogil:
for i in range(n):
[compute]
if return_inverse:
[calculate inverse]
with the idea being to both unify the code and maintain perf in the case where the return_inverse
-branch doesn't need to be touched at all.