8000 WIP: ENH: provide doctest runner which is less sensitive to number formatting by ahaldane · Pull Request #9674 · numpy/numpy · GitHub
[go: up one dir, main page]

Skip to content

WIP: ENH: provide doctest runner which is less sensitive to number formatting #9674

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

ahaldane
Copy link
Member
@ahaldane ahaldane commented Sep 9, 2017

As discussed in #9139 and on the mailing list (link), it might be nice to provide downstream projects a doctest runner which ignores little details in how numbers are printed. This PR is a try at that, to get an idea of how it might play out for initial feedback before I work on it any more.

Essentially, this runner searches for all "number-like" string patterns in the want/got strings, extracts them, compares them numerically, and passes on the non-numeric parts to the usual doctest comparison function.

See the comments for more details.

@ahaldane
Copy link
Member Author
ahaldane commented Sep 9, 2017

On the list a few people were against including a runner as part of numpy, suggesting it should be a separate project or part of the doctest module. Two reasons I see to keep it in numpy are: 1. we could update it in lockstep with numpy releases. This way if numpy updates printing in a way that breaks the runner, we can update the runner at the same time. Less headache for downstream projects. 2. The runner is potentially numpy-dependent, eg we might use np.isclose.

@ahaldane
Copy link
Member Author
ahaldane commented Sep 9, 2017

You can compare to the Astropy (link) and Sympy (link) implementations of a similar doctest runner. I had forgotten about them when I wrote this, but the implementations turned out extremely similar.

Features of the Astropy runner not in this PR: 1. Support for mathematical expressions involving */+-=<, which I intentionally avoided since numpy never outputs such expressions. 2. does extra fiddling like remove the u on unicode literals, L on long ints, pipe in |S9 dtype specifications. 3. Allows some other float formats like ".123" (no leading digit) which numpy doesn't currently output.

Features of this PR not in the Astropy runner: 1. greedy grabs all surrounding whitespace 2. Plays more nicely with other doctest options/runners by replacing all numbers by a space and then super'ing the resulting string to the doctest module. 3. allows all []() brackets on either side of numbers. 4. Takes care of Bools

Astropy does some regex stuff with fbeg/fmidend I didn't yet understand the purpose of, compared to just using an Or'd ^.

@jdemeyer
Copy link
Contributor

Just pointing out (no more than that) that SageMath also has something like this. SageMath has a very customized doctest framework and this is just one modification: https://github.com/sagemath/sage/blob/master/src/sage/doctest/parsing.py#L878

@mhvk
Copy link
Contributor
mhvk commented Sep 15, 2017

@ahaldane - looking at your implementation, and especially at various links, I cannot help to think that really we should do this upstream. It turns out that following up on an astropy issue [1], @kdavies4 wrote to python-ideas [2], with the final conclusion by Coghlan that this was a good idea [3].

What do you think? Shall we try to push it upstream? One possible difficulty is the lack of tuning possible, but I think that a simple scheme of an absolute offset which is the larger of ~1e-15 * number, or value of last digit would capture almost anything that we care about. (Independently, I would suggest to make that the default, rather than an arbitrary 1e-3; the default should be to look for rounding errors only.)

[1] astropy/astropy#2662
[2] https://mail.python.org/pipermail/python-ideas/2014-August/028559.html
[3] https://mail.python.org/pipermail/python-ideas/2014-August/028678.html

@jdemeyer
Copy link
Contributor

One possible difficulty is the lack of tuning possible

Certainly for SageMath, the customizability is important. We do this with a comment like # tol 1e-10 or # abs tol 1e-10 to specify the relative/absolute tolerance on numbers in doctests. I think a fixed value wouldn't work. For complicated algorithms (say, something involving BLAS), the errors can be larger. On the other hand, for simple algorithms, you want the error to be small. Finally, in many cases, SageMath uses completely deterministic algorithms so there the output (even if it's floating point) should be exact.

CC @embray

@rgommers
Copy link
Member
  1. we could update it in lockstep with numpy releases. This way if numpy updates printing in a way that breaks the runner, we can update the runner at the same time. Less headache for downstream projects.

That can still be done if it's a separate package right? Could be completely standalone, but also under the numpy Github organisation is possible - similar to numpydoc.

  1. The runner is potentially numpy-dependent, eg we might use np.isclose.

That's not really relevant for the decision to ship as part of numpy.

Numpy itself won't even use this test runner, so including it within numpy would be weird.

@pv
Copy link
Member
pv commented Sep 16, 2017

Pytest plugin?

@ahaldane
Copy link
Member Author

Re: tuning

I agree that tunable tolerance is very desirable, since a lot of operations have much less precision than 1e-15, eg if they involve sums. Even in astropy doctests, I remember some of the doctests only want 10**-1 level equality (judging from ... use), I think because of small changes in input datasets.

@ahaldane
Copy link
Member Author

The sage implementation looks very good to me! It does a lot of things this PR doesn't do (eg support for random numbers), and I like the addition of the 'abs tol' per-test directive. We would need to remove or generalize some sage specific stuff (for parsing the sage command line style), and perhaps a few tweaks to the regexes.

I think I now agree we should move it upstream or make it a separate package... I think the suggested location options are:

  1. Try to get it into the doctest module
  2. Make it into a github project under the numpy github org

Then I think @pv is suggesting an implementation distinction between

  1. write a bunch of doctest subclasses, like in the sage implementation
  2. Make it a pytest plugin

Actually, I think that making it a pytest plugin requires making the doctest subclasses first, in any case. So we can leave the pytest decision for later.

By the way, I see that pytest already has its own custom doctest subclasses too, which removes the u" and b" before strings just like the astropy and sage subclasses... Yet another example of duplicated effort.

@mhvk
Copy link
Contributor
mhvk commented Sep 17, 2017

The reason that I mentioned fixed defaults for upstreaming is that in the linked discussion it was suggested it was not easy to provide a precision together with a doctest option. I have not actually investigated this myself, but if true, I still think this should not stop us from upstreaming. For astropy at least, I'm fairly sure the "rounding error" level is almost always applicable (and one could certainly raise the minimum to 1e-10 instead of 1e-15). And if one also added a FLOAT_LAX or so, one probably has covered 99.9% of the usage without requiring tuning. But I think it is also fine to have a much laxer default with a single option - in the end, doctests are not meant to test ultimate correctness of the code, just that things work as shown (which includes getting at least roughly the right answer!).

# These patterns detect spaces/punctuation before/after a number
_punct = '[\[\]\(\),]'
_before = '(?:\s+|^|(?<={}))'.format(_punct)
_after = '(?:\s+|$|(?={}))'.format(_punct)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it should be _after = '(?=\s+|$|(?={}))'.format(_punct) in order to capture numbers separated by white spaces: https://regex101.com/r/O9S9Z6/1

@mattip
Copy link
Member
mattip commented Mar 13, 2019

We now have a different solution (refcheck) merged in and running on CI. Can we close this?

@tylerjereddy
Copy link
Contributor

I think refguide_check.py, which we vendored from SciPy with minor mods, does this.

The discussion here about upstreaming our doc checking framework for the benefit of the ecosystem is relevant but also covered in #12548.

@mhvk
Copy link
Contributor
mhvk commented Mar 13, 2019

Yes, OK to close this; I added my comment about upstreaming to doctest itself in #12548 - it would seem the ultimate solution.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants
0