Instability in test_ridge.py::test_ridge_sample_weights #11200


Closed

rth opened this issue Jun 4, 2018 · 4 comments · Fixed by #11587

Comments

@rth
Member
rth commented Jun 4, 2018

The test sklearn/linear_model/tests/test_ridge.py::test_ridge_sample_weights passes on master; however, when it was parametrized as part of #11074, failures were observed (#11074 (comment)).

The relevant diff can be found in master...rth:test_ridge_sample_weights-parametrization (whitespace is ignored in this diff), where the following runs fail,

alpha = 1.0, intercept = False, solver = 'lsqr', n_samples = 5, n_features = 10
alpha = 1.0, intercept = False, solver = 'sparse_cg', n_samples = 5, n_features = 10
alpha = 0.01, intercept = False, solver = 'sparse_cg', n_samples = 5, n_features = 10

(out of a total of 32 runs).

This means that the test is brittle and depends on the RNG state. Increasing the numerical tolerance might be a solution, or possibly increasing the number of samples?
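For context, the parametrized form of the test looks roughly like this (a sketch of the shape only, not the exact diff from #11074; it yields the 32 runs mentioned above):

import numpy as np
import pytest
from sklearn.linear_model import Ridge

@pytest.mark.parametrize('solver', ['svd', 'cholesky', 'lsqr', 'sparse_cg'])
@pytest.mark.parametrize('intercept', [True, False])
@pytest.mark.parametrize('alpha', [1.0, 1e-2])
@pytest.mark.parametrize('n_samples, n_features', [(6, 5), (5, 10)])
def test_ridge_sample_weights(n_samples, n_features, alpha, intercept, solver):
    # In this sketch each case draws its own data from a fresh RNG; the
    # actual diff may handle the RNG state differently.
    rng = np.random.RandomState(0)
    y = rng.randn(n_samples)
    X = rng.randn(n_samples, n_features)
    sample_weight = 1.0 + rng.rand(n_samples)
    est = Ridge(alpha=alpha, fit_intercept=intercept, solver=solver)
    est.fit(X, y, sample_weight=sample_weight)
    # ...followed by the comparison against the closed-form solution, as in
    # the snippet below.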

@lesteve
Member
lesteve commented Jun 6, 2018

Just a stand-alone snippet reproducing the problem; it is a small variation on test_ridge_sample_weights to estimate how sensitive the test is to its random state:

from itertools import product

import numpy as np

from sklearn.utils.testing import (assert_array_almost_equal,
                                   assert_almost_equal)
from sklearn.linear_model import Ridge
from scipy import linalg


def test_ridge_sample_weights(rng):
    param_grid = product((1.0, 1e-2), (True, False),
                         ('svd', 'cholesky', 'lsqr', 'sparse_cg'))

    for n_samples, n_features in ((6, 5), (5, 10)):

        y = rng.randn(n_samples)
        X = rng.randn(n_samples, n_features)
        sample_weight = 1.0 + rng.rand(n_samples)

        for (alpha, intercept, solver) in param_grid:

            # Ridge with explicit sample_weight
            est = Ridge(alpha=alpha, fit_intercept=intercept, solver=solver)
            est.fit(X, y, sample_weight=sample_weight)
            coefs = est.coef_
            inter = est.intercept_

            # Closed form of the weighted regularized least square
            # theta = (X^T W X + alpha I)^(-1) * X^T W y
            W = np.diag(sample_weight)
            if intercept is False:
                X_aug = X
                I = np.eye(n_features)
            else:
                dummy_column = np.ones(shape=(n_samples, 1))
                X_aug = np.concatenate((dummy_column, X), axis=1)
                I = np.eye(n_features + 1)
                I[0, 0] = 0

            cf_coefs = linalg.solve(X_aug.T.dot(W).dot(X_aug) + alpha * I,
                                    X_aug.T.dot(W).dot(y))

            if intercept is False:
                assert_array_almost_equal(coefs, cf_coefs)
            else:
                assert_array_almost_equal(coefs, cf_coefs[1:])
                assert_almost_equal(inter, cf_coefs[0])


rng = np.random.RandomState(0)

for i in range(100):
    try:
        test_ridge_sample_weights(rng)
    except AssertionError:
        print('failed')

On my machine I get 26 failures out of 100 runs.

@jnothman
Member
jnothman commented Jun 6, 2018 via email

@sergulaydore
Contributor

I generated a histogram of mismatch percentages for decimal=4 using the script https://gist.github.com/sergulaydore/6767aa908d051cb5d11417600f6161a1. The default test uses decimal=6, but it was hard to see the differences at that value. The issue only happens when the solver is lsqr or sparse_cg. I used only one set of parameters from param_grid.

(figure: mismatch_histogram — histogram of mismatch percentages)
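For illustration, the mismatch percentage for a given decimal could be computed along these lines (my own sketch, not necessarily what the gist does; mismatch_percentage is a hypothetical helper):

import numpy as np

def mismatch_percentage(actual, desired, decimal=4):
    # Percentage of entries that assert_array_almost_equal would flag,
    # i.e. entries with abs(desired - actual) >= 1.5 * 10**(-decimal).
    diff = np.abs(np.asarray(desired) - np.asarray(actual))
    return 100.0 * (diff >= 1.5 * 10.0 ** (-decimal)).mean()

One would call this on (coefs, cf_coefs) for each failing (alpha, intercept, solver) case over many random seeds, then histogram the results.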

@sergulaydore
Contributor

So the problem was the default tolerance in Ridge. If we want to use the default precision of assert_array_almost_equal, which is 1e-6, we need to make sure Ridge has the same tolerance. When I changed the tolerance in Ridge to 1e-6 (the default was 1e-3), I did not get any errors anymore. Here is the code I ran: https://gist.github.com/sergulaydore/313bbcfa17d287fd97d492d0f62cea59. Of course, the test runs a little slower with a lower tolerance. Here is the timing comparison:

Took 1.089055061340332 seconds with tol=1e-3, failed tests = 26
Took 1.3238649368286133 seconds with tol=1e-6, failed tests = 0

I am creating a PR to fix this in the test.
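A minimal sketch of the kind of change (not necessarily the exact PR): the same closed-form comparison as in the snippet above, but with the Ridge tolerance tightened to match the assertion precision.

import numpy as np
from scipy import linalg
from sklearn.linear_model import Ridge

rng = np.random.RandomState(0)
n_samples, n_features = 5, 10
y = rng.randn(n_samples)
X = rng.randn(n_samples, n_features)
sample_weight = 1.0 + rng.rand(n_samples)
alpha = 1e-2

# Tightening tol makes the iterative solvers (lsqr, sparse_cg) converge to
# within the 1e-6 precision that assert_array_almost_equal uses by default.
est = Ridge(alpha=alpha, fit_intercept=False, solver='sparse_cg', tol=1e-6)
est.fit(X, y, sample_weight=sample_weight)

# Closed form of the weighted regularized least squares, as above.
W = np.diag(sample_weight)
cf_coefs = linalg.solve(X.T.dot(W).dot(X) + alpha * np.eye(n_features),
                        X.T.dot(W).dot(y))
np.testing.assert_array_almost_equal(est.coef_, cf_coefs)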
