10000 MAINT: Deterministic SVG and PDF tests by jkseppan · Pull Request #7748 · matplotlib/matplotlib · GitHub
[go: up one dir, main page]

Skip to content

MAINT: Deterministic SVG and PDF tests #7748

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Jan 21, 2017

Conversation

jkseppan
Copy link
Member
@jkseppan jkseppan commented Jan 5, 2017

There has been plenty of work (at least #4434, #5671, #5766, #6597) to make it possible to have deterministic svg and pdf output, or at least closer to deterministic than before. This change adds one related feature: callers can remove the bits of pdf metadata that contain the date and matplotlib version, so that new releases don't necessarily change the result of every test. Furthermore, this change enables all the optional pdf and svg determinism features in tests.

I am developing some improvements to test-result caching on another branch, and in my tests only the following test files still change from run to run after this patch (although in my environment I get KNOWNFAIL=27, SKIP=8 so some more are possible):

clip_path_clipping.svg
hatching.svg
pgf_bbox_inches.pdf
pgf_mixedmode.pdf
pgf_pdflatex.pdf
pgf_rcupdate1.pdf
pgf_rcupdate2.pdf
pgf_xelatex.pdf
bbox_inches_tight_clipping.svg
image_clip.svg
scaled_lines.svg
offsetbox_clipping.svg
skew_axes.pdf
skew_axes.svg

And document the standard keys in PdfPages. Also insert the
matplotlib version in the default value of Producer as well
as Creator, to retain a debugging clue in case the user overrides
one of them.
Remove the metadata entries that depend on the date and
the exact version.
@codecov-io
Copy link

Current coverage is 62.12% (diff: 100%)

Merging #7748 into master will increase coverage by <.01%

@@             master      #7748   diff @@
==========================================
  Files           174        174          
  Lines         56028      56028          
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits          34805      34806     +1   
+ Misses        21223      21222     -1   
  Partials          0          0          

Powered by Codecov. Last update 1fa4dd7...eab297c

@tacaswell tacaswell added this to the 2.1 (next point release) milestone Jan 5, 2017
@@ -298,7 +298,12 @@ def compare(self, idx, baseline, extension):
remove_ticks_and_titles(fig)

actual_fname = os.path.join(self.result_dir, baseline) + '.' + extension
fig.savefig(actual_fname, **self.savefig_kwargs)
kwargs = self.savefig_kwargs.copy()
if extension == 'pdf':
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't this break the test of SOURCE_DATE_EPOCH ?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that test starts a separate subprocess to write the output and doesn't use the image_comparison decorator at all.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, 🐑

@tacaswell
Copy link
Member

the pgf tests roll their own image comparison code.

attn @astrofrog this may need to be ported to pytest-mpl?

@@ -136,6 +136,10 @@ def set_font_settings_for_testing():
rcParams['text.hinting_factor'] = 8


def set_reproducibility_for_testing():
rcParams['svg.hashsalt'] = 'matplotlib'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will also break, in a way, the determinism test.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain in more detail? The _test_determinism_save function in test_backend_svg.py file also sets this rc parameter to a constant value.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably along the same lines as @tacaswell; I was thinking it wouldn't fail if svg.hashsalt was broken. However, I see this test doesn't use the decorator anyway, so it is likely to still fail correctly.

jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Jan 7, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Jan 8, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
Copy link
Member
@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor corrections, but otherwise fine.

The reproducibility of the output from the PS and PDF backends has so
far been tested using various plot elements but only default values of
options such as ``{ps,pdf}.fonttype`` that can affect the output at a
low level, and not with the mathtext or usetex features. When
matplotlib calls external tools (such as PS distillers or LaTeX) their
versions need to be kept constant for reproducibility, and they may
add sources of nondeterminism outside the control of matplotlib.

For SVG output, the ``svg.hashsalt`` rc parameter has been added in an
earlier release. In can be used to change some random id values in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In -> It

For SVG output, the ``svg.hashsalt`` rc parameter has been added in an
earlier release. In can be used to change some random id values in the
output to be deterministic, at the cost that including multiple such
svg files in one document can lead to collisions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

svg -> SVG

output to be deterministic, at the cost that including multiple such
svg files in one document can lead to collisions.

These features are now enabled in the tests for the pdf and svg
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pdf -> PDF
svg -> SVG


The ``SOURCE_DATE_EPOCH`` environment variable can now be used to set
the timestamp value in the PS and PDF outputs. See
https://reproducible-builds.org/specs/source-date-epoch/

Alternatively, calling ``savefig`` with ``metadata={creationDate=None}``
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This dict literal is a bit broken; should be dict(creationDate=None) or {'creationDate': None}.

@@ -484,11 +483,13 @@ def __init__(self, filename, metadata=None):

self.infoDict = {
'Creator': 'matplotlib %s, http://matplotlib.org' % __version__,
'Producer': 'matplotlib pdf backend%s' % revision,
'Producer': 'matplotlib pdf backend %s' % __version__,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we even need the version here? It's in the Creator tag.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought including the version in both would help with debugging when the user has overridden just one of these. If you use Matplotlib as a component of some larger application, you might want to override Creator and leave Producer pointing to the PDF backend.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fair enough.

@QuLogic QuLogic changed the title MAINT: Deterministic SVG and PDF tests [MRG+1] MAINT: Deterministic SVG and PDF tests Jan 20, 2017
The reproducibility of the output from the PS and PDF backends has so
far been tested using various plot elements but only default values of
options such as ``{ps,pdf}.fonttype`` that can affect the output at a
low level, and not with the mathtext or usetex features. When
matplotlib calls external tools (such as PS distillers or LaTeX) their
versions need to be kept constant for reproducibility, and they may
add sources of nondeterminism outside the control of matplotlib.

For SVG output, the ``svg.hashsalt`` rc parameter has been added in an
earlier release. It can be used to change some random id values in the
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand the following sentence:
"It can be used to change some random id values in the output to be deterministic, at the cost that including multiple such SVG files in one document can lead to collisions."
Isn't there an grammar problem?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to rephrase.

@NelleV
Copy link
Member
NelleV commented Jan 20, 2017

Thanks! This looks good to me.
👍

@NelleV NelleV changed the title [MRG+1] MAINT: Deterministic SVG and PDF tests [MRG+2] MAINT: Deterministic SVG and PDF tests Jan 20, 2017
SVG files in one document can lead to collisions.
earlier release. This parameter changes some random identifiers in the
SVG file to be deterministic. The downside of this setting is that if
more than one file is generated using with deterministic identifiers
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using with -> using.

@QuLogic QuLogic changed the title [MRG+2] MAINT: Deterministic SVG and PDF tests MAINT: Deterministic SVG and PDF tests Jan 21, 2017
@QuLogic QuLogic merged commit b6f13f1 into matplotlib:master Jan 21, 2017
@jkseppan jkseppan deleted the deterministic-tests branch January 22, 2017 07:37
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Jan 22, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Jan 30, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Jan 30, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Feb 18, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
jkseppan added a commit to jkseppan/matplotlib that referenced this pull request Feb 20, 2017
There is a cache of png files keyed by the MD5 hashes of corresponding
svg and pdf files, which helps reduce test suite running times for svg
and pdf files that stay exactly the same from one run to the next.

This patch enables caching of test results, not only expected
results, which is only useful if the tests are mostly deterministic
(see matplotlib#7748). It adds reporting of cache misses, which can be helpful
in getting tests to stay deterministic, and expiration since the
test results are going to change more often than the expected
results.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants
0