8000 ENH: _StringFuncParser to get numerical functions callables from strings by alvarosg · Pull Request #7464 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

ENH: _StringFuncParser to get numerical functions callables from strings #7464

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 11 commits into from
Dec 15, 2016
167 changes: 167 additions & 0 deletions lib/matplotlib/cbook.py
Original file line number Diff line number Diff line change
Expand Up @@ -2661,3 +2661,170 @@ def __exit__(self, exc_type, exc_value, traceback):
os.rmdir(path)
except OSError:
pass


class _FuncInfo(object):
"""
Class used to store a function

Each object has:
Copy link
Member
@QuLogic QuLogic Dec 14, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be done in numpydoc style.

Class used to store a function

Attributes
----------
function : function??
    The direct function
inverse : function??
    The inverse function
bounded_0_1 : bool
    ...
check_params : callable
    ...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is a hidden helper class for pure internal use. Most classes in these cases do not even have documentation, so I figured rules for formatting would not be as strict, so did not really put any attention to follow numpydoc... I will change it.

* The direct function (direct)
* The inverse function (inverse)
* A boolean indicating whether the function
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And why is this wrapped so early? It looks weird when the next bullet isn't.

is bounded in the interval 0-1 (bounded_0_1)

"""
def __init__(self, direct, inverse, bounded_0_1):
self.direct = direct
self.inverse = inverse
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Naming preference: I suggest "function" and "inverse" rather than "direct" and "inverse".

self.bounded_0_1 = bounded_0_1

def copy(self):
return _FuncInfo(self.direct,
self.inverse,
self.bounded_0_1)


class _StringFuncParser(object):
"""
A class used to convert predefined strings into
_FuncInfo objects, or to directly obtain _FuncInfo
properties.

"""

_funcs = {}
_funcs['linear'] = _FuncInfo(lambda x: x,
lambda x: x,
True)
_funcs['quadratic'] = _FuncInfo(lambda x: x**2,
lambda x: x**(1. / 2),
True)
_funcs['cubic'] = _FuncInfo(lambda x: x**3,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

np.power, kwargs={x2:2}
maybe something like that?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But I am not sure how would that help, you would still need the lambda function to pass the kwargs, right?

In any case, I do not think it really is that relevant at this point, we are trying to do something relatively simple for a subset of functions (for which it can be tested) aimed to help FuncNorm and children. If we ever decide to make this accessible class directly accesible to end users, or to let users create their own cuestom _FuncInfo objects, then we can worry about these things.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

depends on how it'd be called in FuncNorm...(I'd argue FuncNorm should support passing in f=np.power, kwargs={}) too since that's common enough for numpy. Would also give users a way to call np.piecewise directly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that FuncNorm should support that (even though it is not as useful as it may seem, imagine a case where the variable is the second argument and not the first one). Anyway we can discuss that in the next PR.

IMO doing that here is a mistake, as it goes against what we are trying to achieve, which is to provide the simplest possible way to specify a simple function. For more complex cases, the user will always be able to use FuncNorm with callables directly, or with a new kwargs option based on your suggestion.

lambda x: x**(1. / 3),
True)
_funcs['sqrt'] = _FuncInfo(lambda x: x**(1. / 2),
lambda x: x**2,
True)
_funcs['cbrt'] = _FuncInfo(lambda x: x**(1. / 3),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this notation common?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, cbrt is used by numpy as scipy for the cubic root.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, wondering now if there's a way to generically support any function if the name is passed in using a lookup of numpy math function names...(the implementation of the functions would also be pitched to numpy, which maybe should be happening anyway instead of lambdas...)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

getattr(np, name)

Copy link
Contributor Author
@alvarosg alvarosg Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But this does not really work, because it would not give have access to the inverse function, unless, we explicitly include a list of corresponding inverse functions, which will limit in any case the usage to whatever we allow.
Also it would make the parameters option very complex, because in those cases we would not control the exact shape of the string.

I think with polynomials, roots, and offset logs we can probably cover everything that basic users will want to do, for normalizations. The advanced users can always provide the callables.

67ED

Copy link
Member
@story645 story645 Nov 16, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So even without getattr, I still prefer calling the numpy functions when possible because they're tested and optimized in a way a bunch of lambda functions aren't.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, but this wil only be possible for the simples cases, none of the para-metrical cases would work, without the lambda. On the other hand inside the lambda the operators will be overloaded by numpy when working with arrays, so I do not think the reliability can be that different.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shrugs I think there's potential for error/typos/etc. in writing your own and in theory the numpy functions are more robust. Also now wondering do you cast lists to arrays or are the lambda functions done in a comprehension in FuncNorm?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, everything is cast into a numpy type automatically.

lambda x: x**3,
True)
_funcs['log10'] = _FuncInfo(lambda x: np.log10(x),
lambda x: (10**(x)),
False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no base2? (Yeah, this all feels weirdly arbitrary)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I should include log2

< F438 circle cx="8" cy="8" r="7" stroke="currentColor" stroke-opacity="0.25" stroke-width="2" vector-effect="non-scaling-stroke" fill="none" />

_funcs['log'] = _FuncInfo(lambda x: np.log(x),
lambda x: (np.exp(x)),
False)
_funcs['x**{p}'] = _FuncInfo(lambda x, p: x**p[0],
lambda x, p: x**(1. / p[0]),
True)
_funcs['root{p}(x)'] = _FuncInfo(lambda x, p: x**(1. / p[0]),
lambda x, p: x**p,
True)
_funcs['log10(x+{p})'] = _FuncInfo(lambda x, p: np.log10(x + p[0]),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

still don't think you need to bother with a log10 since you have the generic case and the user will never know (they have to write log10 anyway)

lambda x, p: 10**x - p[0],
True)
_funcs['log(x+{p})'] = _FuncInfo(lambda x, p: np.log(x + p[0]),
lambda x, p: np.exp(x) - p[0],
True)
_funcs['log{p}(x+{p})'] = _FuncInfo(lambda x, p: (np.log(x + p[1]) /
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not sure why you don't do a generic 'log{p}' then instead of log10 since users will have to type log10 anyway

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did it like this historically (there were no parameters at the beginning) and then keep it to in a similar way that log10, log and log2 exist separately in numpy. But you are right, being strings probably we can just keep 'log' and 'log{p}'. In any case I definitely should include 'log{p}' which is now missing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

both log can be called log(x,2) or log(x,10) so I figure one unified call is cleaner here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's decide about this later

np.log(p[0])),
lambda x, p: p[0]**(x) - p[1],
True)

def __init__(self, str_func):
"""
Parameters
----------
str_func : string
String to be parsed.

"""
try: # For python 2.7 and python 3+ compatibility
is_str = isinstance(str_func, basestring)
except NameError:
is_str = isinstance(str_func, str)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace this block with is_str = isinstance(str_func, six.string_types).


if not is_str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Correction: delete the preceding block and just replace the line here.

raise ValueError("The argument passed is not a string.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"{} must be a string".format(str_func)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if it is a good idea to format something that we know is not a string into a string. What about simply " 'str_func' must be a string"?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I though about it again, and realized that actually your way it is better for cases when the error shows up to an user that does not know where the exception is being raised exactly, so, thank you for your suggestion, it now produces a message like that one :).

self._str_func = str_func
self._key, self._params = self._get_key_params()
self._func = self.get_func()

def get_func(self):
"""
Returns the _FuncInfo object, replacing the relevant parameters if
necessary in the lambda functions.

"""

func = self._funcs[self._key].copy()
if len(self._params) > 0:
m = func.direct
func.direct = (lambda x, m=m: m(x, self._params))
m = func.inverse
func.inverse = (lambda x, m=m: m(x, self._params))
return func

def get_directfunc(self):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these should probably be properties instead of getters

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am actually considering getting rid of this, and just support get_func, as a property, and then use the FuncInfo object to obtain the required functions.

"""
Returns the callable for the direct function.

"""
return self._func.direct

def get_invfunc(self):
"""
Returns the callable for the inverse function.

"""
return self._func.inverse

def is_bounded_0_1(self):
"""
Returns a boolean indicating if the function is bounded
in the [0-1 interval].

"""
return self._func.bounded_0_1

def _get_key_params(self):
str_func = six.text_type(self._str_func)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unnecessary, given checking in the init? If it is needed, it should be there.

< 179B circle cx="8" cy="8" r="7" stroke="currentColor" stroke-opacity="0.25" stroke-width="2" vector-effect="non-scaling-stroke" fill="none" />

# Checking if it comes with parameters
regex = '\{(.*?)\}'
params = re.findall(regex, str_func)

if len(params) > 0:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if params should work equally well.

for i in range(len(params)):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use for i, param in enumerate(params):

try:
params[i] = float(params[i])
except:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We try hard to avoid bare except: constructs; here, use except ValueError.

raise ValueError("'p' in parametric function strings must"
" be replaced by a number that is not "
"zero, e.g. 'log10(x+{0.1})'.")

if params[i] == 0:
raise ValueError("'p' in parametric function strings must"
" be replaced by a number that is not "
"zero.")
str_func = re.sub(regex, '{p}', str_func)

try:
func = self._funcs[str_func]
except KeyError:
raise ValueError("%s: invalid function. The only strings "
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should raise KeyError if you're explicitly catching the KeyError, otherwise I don't see the point in differentiating the value errors.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, actually from the point of view of the dictionary it is a key error, but from the point of view of the user, it may be seen as a value error in one of the input arguments, as he does not need to know that this is stored in a dictionary and access with the string as a key. In any case, I am happy to switch to KeyError.

The reason I have two cases was to differentiate from TypeError, because in this case it may not be as safe to cast the input as a string for printing the error message. The way the code is now there is no chance it will not be a string, so yes, I am happy to just have a single except.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So yeah, then I vote for use one for now and split if a need arises.

"recognized as functions are %s." %
(str_func, self.funcs.keys()))
except:
raise ValueError("Invalid function. The only strings recognized "
"as functions are %s." %
(self.funcs.keys()))
if len(params) > 0:
func.direct(0.5, params)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is this block of code doing?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The func.direct(0.5, params) outside the try block is just a debugging residue.
The one inside the block is meant to check that the callable actually works with the number of parameters parsed. Alternatively I could add a number of expected parameters to _FuncInfo, and then do a manual check on the lengths, I did not do it like this to avoid adding more stuff to the initialization of FuncInfo.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, please remove debugging residue but I also think that's a weird non-obvious test that the callebe works with parameters passed. Like if it fails, I think it should do so naturally further down the stack where it'll yield it's own specific error message.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I actually though about this more and there is no way I could fail there, because the implementation guarantees that the parameters are floating point numbers. And the fact that there is a mathing string guarantees that the number of parameters is right.

What I am going to include is the possibility of checking if the parameters are in the right range, and also checking wheter the fucntion is bounded as fucntion of the parameters. This way, we can also allow parameters equal to 0.

try:
func.direct(0.5, params)
except:
raise ValueError("Invalid parameters set for '%s'." %
(str_func))

return str_func, params
54 changes: 54 additions & 0 deletions lib/matplotlib/tests/test_cbook.py
Original file line number Diff line number Diff line change
Expand Up @@ -517,3 +517,57 @@ def test_flatiter():

assert 0 == next(it)
assert 1 == next(it)


class TestFuncParser(object):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if you insist on writing your own lambdas instead of using numpy, then please test that the lambda functions are correct (no stray typos or the like)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a parametric test for each possible case, and it tests both the direct and the inverse, so it should be ok.

x_test = np.linspace(0.01, 0.5, 3)
validstrings = ['linear', 'quadratic', 'cubic', 'sqrt', 'cbrt',
'log', 'log10', 'x**{1.5}', 'root{2.5}(x)',
'log(x+{0.5})', 'log10(x+{0.1})', 'log{2}(x+{0.1})']
results = [(lambda x: x),
(lambda x: x**2),
(lambda x: x**3),
(lambda x: x**(1. / 2)),
(lambda x: x**(1. / 3)),
(lambda x: np.log(x)),
(lambda x: np.log10(x)),
(lambda x: x**1.5),
(lambda x: x**(1 / 2.5)),
(lambda x: np.log(x + 0.5)),
(lambda x: np.log10(x + 0.1)),
(lambda x: np.log2(x + 0.1))]

bounded_list = [True, True, True, True, True,
False, False, True, True,
True, True, True]

@pytest.mark.parametrize("string, func",
zip(validstrings, results),
ids=validstrings)
def test_values(self, string, func):
func_parser = cbook._StringFuncParser(string)
f = func_parser.get_directfunc()
assert_array_almost_equal(f(self.x_test), func(self.x_test))

@pytest.mark.parametrize("string", validstrings, ids=validstrings)
def test_inverse(self, string):
func_parser = cbook._StringFuncParser(string)
f = func_parser.get_func()
fdir = f.direct
finv = f.inverse
assert_array_almost_equal(finv(fdir(self.x_test)), self.x_test)

@pytest.mark.parametrize("string", validstrings, ids=validstrings)
def test_get_invfunc(self, string):
func_parser = cbook._StringFuncParser(string)
finv1 = func_parser.get_invfunc()
finv2 = func_parser.get_func().inverse
assert_array_almost_equal(finv1(self.x_test), finv2(self.x_test))

@pytest.mark.parametrize("string, bounded",
zip(validstrings, bounded_list),
ids=validstrings)
def test_bounded(self, string, bounded):
func_parser = cbook._StringFuncParser(string)
b = func_parser.is_bounded_0_1()
assert_array_equal(b, bounded)
0