Baseline image reuse breaks parallel testing

Travis has been failing a lot lately, and I'm pretty sure the cause is baseline image reuse combined with parallel tests. I believe pytest parallelizes across both extension and baseline image, so when a test produces a result image for the same baseline more than once, there's a good chance that one worker will save its figure on top of another's while the other worker is still trying to verify it.

For example, the step_linestyle image has been failing a lot recently, which you can reproduce with:

$ py.test test_axes.py -k linestyle -n 8
...
...
________________________________________________ test_step_linestyle[1-step_linestyle-svg] _________________________________________________
[gw7] linux -- Python 3.5.2 /home/elliott/code/conda/envs/mpl35/bin/python
expected = '/home/elliott/code/matplotlib/lib/matplotlib/tests/result_images/test_axes/step_linestyle-expected_svg.png'
actual = '/home/elliott/code/matplotlib/lib/matplotlib/tests/result_images/test_axes/step_linestyle_svg.png'
tol = 0, in_decorator = True

    def compare_images(expected, actual, tol, in_decorator=False):
        """
        Compare two "image" files checking differences within a tolerance.
    
        The two given filenames may point to files which are convertible to
        PNG via the `.converter` dictionary. The underlying RMS is calculated
        with the `.calculate_rms` function.
    
        Parameters
        ----------
        expected : str
            The filename of the expected image.
        actual :str
            The filename of the actual image.
        tol : float
            The tolerance (a color value difference, where 255 is the
            maximal difference).  The test fails if the average pixel
            difference is greater than this value.
        in_decorator : bool
            If called from image_comparison decorator, this should be
            True. (default=False)
    
        Example
        -------
        img1 = "./baseline/plot.png"
        img2 = "./output/plot.png"
        compare_images( img1, img2, 0.001 ):
    
        """
        if not os.path.exists(actual):
            msg = "Output image %s does not exist." % actual
            raise Exception(msg)
    
        if os.stat(actual).st_size == 0:
            msg = "Output image file %s is empty." % actual
            raise Exception(msg)
    
        verify(actual)
    
        # Convert the image to png
        extension = expected.split('.')[-1]
    
        if not os.path.exists(expected):
            raise IOError('Baseline image %r does not exist.' % expected)
    
        if extension != 'png':
            actual = convert(actual, False)
            expected = convert(expected, True)
    
        # open the image files and remove the alpha channel (if it exists)
        expectedImage = _png.read_png_int(expected)
>       actualImage = _png.read_png_int(actual)
E       SystemError: <built-in function read_png_int> returned NULL without setting an error

This run failed while reading the PNG converted from the SVG, but running it again gives:

________________________________________________ test_step_linestyle[0-step_linestyle-svg] _________________________________________________
[gw3] linux -- Python 3.5.2 /home/elliott/code/conda/envs/mpl35/bin/python
expected = '/home/elliott/code/matplotlib/lib/matplotlib/tests/result_images/test_axes/step_linestyle-expected.svg'
actual = '/home/elliott/code/matplotlib/lib/matplotlib/tests/result_images/test_axes/step_linestyle.svg'
tol = 0, in_decorator = True

    def compare_images(expected, actual, tol, in_decorator=False):
        """
        Compare two "image" files checking differences within a tolerance.
    
        The two given filenames may point to files which are convertible to
        PNG via the `.converter` dictionary. The underlying RMS is calculated
        with the `.calculate_rms` function.
    
        Parameters
        ----------
        expected : str
            The filename of the expected image.
        actual :str
            The filename of the actual image.
        tol : float
            The tolerance (a color value difference, where 255 is the
            maximal difference).  The test fails if the average pixel
            difference is greater than this value.
        in_decorator : bool
            If called from image_comparison decorator, this should be
            True. (default=False)
    
        Example
        -------
        img1 = "./baseline/plot.png"
        img2 = "./output/plot.png"
        compare_images( img1, img2, 0.001 ):
    
        """
        if not os.path.exists(actual):
            msg = "Output image %s does not exist." % actual
            raise Exception(msg)
    
        if os.stat(actual).st_size == 0:
            msg = "Output image file %s is empty." % actual
>           raise Exception(msg)
E           Exception: Output image file /home/elliott/code/matplotlib/lib/matplotlib/tests/result_images/test_axes/step_linestyle.svg is empty.

which points to a truncated SVG, presumably one that another worker was in the middle of rewriting.
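To make the collision concrete, here is a minimal sketch. It assumes the result file name is built only from the baseline name and the extension, which is what the paths in the two tracebacks above suggest. `result_path` and `RESULT_DIR` are hypothetical names for illustration, not matplotlib's actual helpers.

```python
import os
from collections import defaultdict

RESULT_DIR = "result_images/test_axes"  # shortened, illustrative path


def result_path(baseline, extension):
    """Hypothetical naming scheme: the parameter index is not in the file name."""
    return os.path.join(RESULT_DIR, "%s.%s" % (baseline, extension))


# (index, baseline, extension), mirroring the test ids seen above:
# test_step_linestyle[0-step_linestyle-svg] and
# test_step_linestyle[1-step_linestyle-svg]
params = [(0, "step_linestyle", "svg"), (1, "step_linestyle", "svg")]

# Group parameter indices by the file they would write to.
writers = defaultdict(list)
for index, baseline, extension in params:
    writers[result_path(baseline, extension)].append(index)

# Any path with more than one writer can be clobbered under -n.
shared = {path: idxs for path, idxs in writers.items() if len(idxs) > 1}
print(shared)
```

If the two parametrized runs land on different workers, both write to the one path reported in `shared`, so either can read a file the other has only partly written.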

To fix this, we should be able to stop pytest from parallelizing across baseline images within a single test, but what if more than one test uses the same image? I haven't checked whether that actually happens, but it's possible. Some kind of lock per baseline image seems like too much work. If it does happen, should we start duplicating our baseline images?
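One possible alternative to duplicating baselines, sketched here only as an assumption and not something I've tried against the real decorator: keep a single baseline file (it is only read) and make the result file name unique per test and per parameter index. `unique_result_path` is a hypothetical helper, not an existing matplotlib function.

```python
import os


def unique_result_path(result_dir, test_name, index, baseline, extension):
    """Hypothetical helper: one result file per (test, index, baseline, extension).

    The baseline itself stays shared; only the file written by the test
    gets a distinct name, so parallel workers never write the same path.
    """
    fname = "%s-%d-%s.%s" % (test_name, index, baseline, extension)
    return os.path.join(result_dir, fname)


# The two parametrized runs from the tracebacks now get separate files.
paths = {
    unique_result_path("result_images/test_axes", "test_step_linestyle",
                       i, "step_linestyle", "svg")
    for i in (0, 1)
}
print(sorted(paths))
```

The copy of the baseline placed in result_images (the `-expected` file in the tracebacks) would presumably need the same treatment, since it is also written by each run.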
