-
-
Notifications
You must be signed in to change notification settings - Fork 7.9k
MAINT: Deterministic SVG and PDF tests #7748
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
And document the standard keys in PdfPages. Also insert the matplotlib version in the default value of Producer as well as Creator, to retain a debugging clue in case the user overrides one of them.
Remove the metadata entries that depend on the date and the exact version.
Current coverage is 62.12% (diff: 100%)@@ master #7748 diff @@
==========================================
Files 174 174
Lines 56028 56028
Methods 0 0
Messages 0 0
Branches 0 0
==========================================
+ Hits 34805 34806 +1
+ Misses 21223 21222 -1
Partials 0 0
|
@@ -298,7 +298,12 @@ def compare(self, idx, baseline, extension): | |||
remove_ticks_and_titles(fig) | |||
|
|||
actual_fname = os.path.join(self.result_dir, baseline) + '.' + extension | |||
fig.savefig(actual_fname, **self.savefig_kwargs) | |||
kwargs = self.savefig_kwargs.copy() | |||
if extension == 'pdf': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Doesn't this break the test of SOURCE_DATE_EPOCH
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, that test starts a separate subprocess to write the output and doesn't use the image_comparison decorator at all.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, 🐑
the pgf tests roll their own image comparison code. attn @astrofrog this may need to be ported to |
@@ -136,6 +136,10 @@ def set_font_settings_for_testing(): | |||
rcParams['text.hinting_factor'] = 8 | |||
|
|||
|
|||
def set_reproducibility_for_testing(): | |||
rcParams['svg.hashsalt'] = 'matplotlib' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will also break, in a way, the determinism test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you explain in more detail? The _test_determinism_save
function in test_backend_svg.py
file also sets this rc parameter to a constant value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably along the same lines as @tacaswell; I was thinking it wouldn't fail if svg.hashsalt
was broken. However, I see this test doesn't use the decorator anyway, so it is likely to still fail correctly.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some minor corrections, but otherwise fine.
The reproducibility of the output from the PS and PDF backends has so | ||
far been tested using various plot elements but only default values of | ||
options such as ``{ps,pdf}.fonttype`` that can affect the output at a | ||
low level, and not with the mathtext or usetex features. When | ||
matplotlib calls external tools (such as PS distillers or LaTeX) their | ||
versions need to be kept constant for reproducibility, and they may | ||
add sources of nondeterminism outside the control of matplotlib. | ||
|
||
For SVG output, the ``svg.hashsalt`` rc parameter has been added in an | ||
earlier release. In can be used to change some random id values in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In -> It
For SVG output, the ``svg.hashsalt`` rc parameter has been added in an | ||
earlier release. In can be used to change some random id values in the | ||
output to be deterministic, at the cost that including multiple such | ||
svg files in one document can lead to collisions. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
svg -> SVG
output to be deterministic, at the cost that including multiple such | ||
svg files in one document can lead to collisions. | ||
|
||
These features are now enabled in the tests for the pdf and svg |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
pdf -> PDF
svg -> SVG
|
||
The ``SOURCE_DATE_EPOCH`` environment variable can now be used to set | ||
the timestamp value in the PS and PDF outputs. See | ||
https://reproducible-builds.org/specs/source-date-epoch/ | ||
|
||
Alternatively, calling ``savefig`` with ``metadata={creationDate=None}`` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This dict
literal is a bit broken; should be dict(creationDate=None)
or {'creationDate': None}
.
@@ -484,11 +483,13 @@ def __init__(self, filename, metadata=None): | |||
|
|||
self.infoDict = { | |||
'Creator': 'matplotlib %s, http://matplotlib.org' % __version__, | |||
'Producer': 'matplotlib pdf backend%s' % revision, | |||
'Producer': 'matplotlib pdf backend %s' % __version__, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we even need the version here? It's in the Creator
tag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought including the version in both would help with debugging when the user has overridden just one of these. If you use Matplotlib as a component of some larger application, you might want to override Creator and leave Producer pointing to the PDF backend.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough.
The reproducibility of the output from the PS and PDF backends has so | ||
far been tested using various plot elements but only default values of | ||
options such as ``{ps,pdf}.fonttype`` that can affect the output at a | ||
low level, and not with the mathtext or usetex features. When | ||
matplotlib calls external tools (such as PS distillers or LaTeX) their | ||
versions need to be kept constant for reproducibility, and they may | ||
add sources of nondeterminism outside the control of matplotlib. | ||
|
||
For SVG output, the ``svg.hashsalt`` rc parameter has been added in an | ||
earlier release. It can be used to change some random id values in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand the following sentence:
"It can be used to change some random id values in the output to be deterministic, at the cost that including multiple such SVG files in one document can lead to collisions."
Isn't there an grammar problem?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll try to rephrase.
Thanks! This looks good to me. |
SVG files in one document can lead to collisions. | ||
earlier release. This parameter changes some random identifiers in the | ||
SVG file to be deterministic. The downside of this setting is that if | ||
more than one file is generated using with deterministic identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
using with -> using.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There is a cache of png files keyed by the MD5 hashes of corresponding svg and pdf files, which helps reduce test suite running times for svg and pdf files that stay exactly the same from one run to the next. This patch enables caching of test results, not only expected results, which is only useful if the tests are mostly deterministic (see matplotlib#7748). It adds reporting of cache misses, which can be helpful in getting tests to stay deterministic, and expiration since the test results are going to change more often than the expected results.
There has been plenty of work (at least #4434, #5671, #5766, #6597) to make it possible to have deterministic svg and pdf output, or at least closer to deterministic than before. This change adds one related feature: callers can remove the bits of pdf metadata that contain the date and matplotlib version, so that new releases don't necessarily change the result of every test. Furthermore, this change enables all the optional pdf and svg determinism features in tests.
I am developing some improvements to test-result caching on another branch, and in my tests only the following test files still change from run to run after this patch (although in my environment I get KNOWNFAIL=27, SKIP=8 so some more are possible):