Test test_weighted_vs_repeated is somehow flaky · Issue #11236 · scikit-learn/scikit-learn · GitHub
Closed
glemaitre opened this issue Jun 11, 2018 · 12 comments · Fixed by #11523

@glemaitre
Member

We might have a test which is somehow flaky:

[00:11:40] ================================== FAILURES ===================================
[00:11:40] __________________________ test_weighted_vs_repeated __________________________
[00:11:40] 
[00:11:40]     def test_weighted_vs_repeated():
[00:11:40]         # a sample weight of N should yield the same result as an N-fold
[00:11:40]         # repetition of the sample
[00:11:40]         sample_weight = np.random.randint(1, 5, size=n_samples)
[00:11:40]         X_repeat = np.repeat(X, sample_weight, axis=0)
[00:11:40]         estimators = [KMeans(init="k-means++", n_clusters=n_clusters,
[00:11:40]                              random_state=42),
[00:11:40]                       KMeans(init="random", n_clusters=n_clusters,
[00:11:40]                              random_state=42),
[00:11:40]                       KMeans(init=centers.copy(), n_clusters=n_clusters,
[00:11:40]                              random_state=42),
[00:11:40]                       MiniBatchKMeans(n_clusters=n_clusters, batch_size=10,
[00:11:40]                                       random_state=42)]
[00:11:40]         for estimator in estimators:
[00:11:40]             est_weighted = clone(estimator).fit(X, sample_weight=sample_weight)
[00:11:40]             est_repeated = clone(estimator).fit(X_repeat)
[00:11:40]             repeated_labels = np.repeat(est_weighted.labels_, sample_weight)
[00:11:40]             assert_almost_equal(v_measure_score(est_repeated.labels_,
[00:11:40] >                                               repeated_labels), 1.0)
[00:11:40] E           AssertionError: 
[00:11:40] E           Arrays are not almost equal to 7 decimals
[00:11:40] E            ACTUAL: 0.95443625305609903
[00:11:40] E            DESIRED: 1.0
[00:11:40] 
[00:11:40] X_repeat   = array([[ 0.1777796 ,  0.24368721,  0.24496657,  4.49305682,  0.52896169],
[00:11:40]        [ 0.41278093,  5.82206016,  1.8967929...367, -0.56629773,  0.09965137, -0.50347565],
[00:11:40]        [ 2.19045563,  4.00946367, -0.56629773,  0.09965137, -0.50347565]])
[00:11:40] est_repeated = MiniBatchKMeans(batch_size=10, compute_labels=True, init='k-means++',
[00:11:40]         init_size=None, max_iter=100, max_no_improvement=10, n_clusters=3,
[00:11:40]         n_init=3, random_state=42, reassignment_ratio=0.01, tol=0.0,
[00:11:40]         verbose=0)
[00:11:40] est_weighted = MiniBatchKMeans(batch_size=10, compute_labels=True, init='k-means++',
[00:11:40]         init_size=None, max_iter=100, max_no_improvement=10, n_clusters=3,
[00:11:40]         n_init=3, random_state=42, reassignment_ratio=0.01, tol=0.0,
[00:11:40]         verbose=0)
[00:11:40] estimator  = MiniBatchKMeans(batch_size=10, compute_labels=True, init='k-means++',
[00:11:40]         init_size=None, max_iter=100, max_no_improvement=10, n_clusters=3,
[00:11:40]         n_init=3, random_state=42, reassignment_ratio=0.01, tol=0.0,
[00:11:40]         verbose=0)
[00:11:40] estimators = [KMeans(algorithm='auto', copy_x=True, init='k-means++', max_iter=300,
[00:11:40]     n_clusters=3, n_init=10, n_jobs=1, precompu..._improvement=10, n_clusters=3,
[00:11:40]         n_init=3, random_state=42, reassignment_ratio=0.01, tol=0.0,
[00:11:40]         verbose=0)]
[00:11:40] repeated_labels = array([1, 2, 0, 0, 0, 0, 2, 2, 0, 0, 0, 0, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1,
[00:11:40]        1, 0, 0, 0, 0, 2, 1, 1, 0, 2, 2, 2,...1, 2, 2, 2, 2, 2,
[00:11:40]        2, 0, 0, 0, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 1, 1, 1, 1,
[00:11:40]        0, 0, 2, 2, 2, 2])
[00:11:40] sample_weight = array([1, 1, 4, 1, 1, 2, 2, 2, 2, 4, 4, 4, 1, 2, 1, 4, 3, 4, 1, 1, 4, 2, 3,
[00:11:40]        3, 1, 4, 3, 1, 2, 4, 1, 4, 2, 4, 4,...3, 4, 4, 3,
[00:11:40]        4, 2, 1, 4, 2, 4, 4, 2, 2, 3, 3, 1, 4, 1, 3, 1, 2, 2, 3, 2, 2, 4, 3,
[00:11:40]        3, 3, 4, 3, 2, 4, 2, 4])
[00:11:40] 
[00:11:40] c:\python36\lib\site-packages\sklearn\cluster\tests\test_k_means.py:935: AssertionError

This is the second time I've seen CI fail on this one. I could not find any other issue related to it.

@rth
Member
rth commented Jun 11, 2018

Sounds like this test fails for some sample_weight values (the test doesn't use a fixed RNG seed)?

A short-term solution could be to generate sample_weight with RandomState.randint. Do you think it's a numerical instability issue that makes it fail, or a genuine test failure?
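
A minimal sketch of that short-term fix, in the test's own terms (n_samples is the module-level constant in test_k_means.py, here 100; the seed value 42 is just illustrative):

import numpy as np

n_samples = 100  # module-level constant in sklearn/cluster/tests/test_k_means.py

# Draw the weights from a seeded RandomState instead of the global RNG,
# so the test sees the same sample_weight on every run.
rng = np.random.RandomState(42)
sample_weight = rng.randint(1, 5, size=n_samples)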

@g-walsh
Contributor
g-walsh commented Jun 19, 2018

Hey, I've just taken a look at this and I think the error is coming from the MiniBatchKMeans case, specifically from the way MiniBatchKMeans draws its mini-batch samples.

Here we have two datasets that we fit the same model to, with the same parameters, and we check that the output is the same. One is of length n_samples (here 100) and the other is longer, its length extended by sample_weight (here from 100 up to 400 samples, depending on the number of repetitions).

If the sample sizes differ, MiniBatchKMeans will draw its mini-batch samples differently even with the same RandomState. For example:

rng1 = np.random.RandomState(42)
rng2 = np.random.RandomState(42)
rng3 = np.random.RandomState(42)

rng1.randint(0, 100, 10)
array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

rng2.randint(0, 100, 10)
array([51, 92, 14, 71, 60, 20, 82, 86, 74, 74])

rng3.randint(0, 250, 10)
array([102, 179,  92,  14, 106,  71, 188,  20, 102, 121])

So if the sampled indices aren't the same, there is no reason to expect the two fits to always converge on the same solution, which explains why they don't always produce the same labels.

As a solution we can either remove MiniBatchKMeans from this test and test it separately, or, as @rth says, just pass in a known sample_weight that works. I've tested generating it with RandomState(42).randint, which works for all the tests. I'm happy to put in a PR to add this if that is what is decided.

I also had a go at changing some of the model parameters, such as a very small tolerance and increased batch sizes, but instabilities still show up eventually.

@jnothman
Member
jnothman commented Jul 4, 2018

A few things to do:

  • work out what in KMeans might distinguish between sample_weight and repetition
  • if we expect a small amount of variation, we should reconsider whether v_measure is too brittle a metric
  • fix the random_state

Ping @jnhansen

@jnhansen
Contributor
jnhansen commented Jul 4, 2018

Cheers, I hadn't seen this issue before the ping. Will work on a fix, though possibly not before next week.

@jnothman
Member
jnothman commented Jul 4, 2018 via email

@jnhansen
Contributor
jnhansen commented Jul 8, 2018

Is there any set of steps to reliably make the test fail? I haven't been able to reproduce the error so far.

I have a feeling, though, that this is because MiniBatchKMeans is essentially an approximation of KMeans and is therefore less reproducible if, e.g., the weighted and repeated samples end up in different batches.

Fixing the random state to a value that is known to deliver correct results feels a bit like cheating, and I'd question whether the test is appropriate for MiniBatchKMeans at all if it is not expected to pass consistently. Thoughts?

@jnothman
Member
jnothman commented Jul 8, 2018 via email

@qinhanmin2014
Member

@jnhansen Here is the script I used to reproduce the error (on Windows):

import numpy as np
from numpy.testing import assert_almost_equal
from sklearn.base import clone
from sklearn.cluster import MiniBatchKMeans
from sklearn.metrics import v_measure_score
# X, n_samples, n_clusters: module-level fixtures in sklearn/cluster/tests/test_k_means.py
from sklearn.cluster.tests.test_k_means import X, n_clusters, n_samples

np.random.seed(973)
sample_weight = np.random.randint(1, 5, size=n_samples)
X_repeat = np.repeat(X, sample_weight, axis=0)
estimator = MiniBatchKMeans(n_clusters=n_clusters, batch_size=10, random_state=42)
est_weighted = clone(estimator).fit(X, sample_weight=sample_weight)
est_repeated = clone(estimator).fit(X_repeat)
repeated_labels = np.repeat(est_weighted.labels_, sample_weight)
assert_almost_equal(v_measure_score(est_repeated.labels_, repeated_labels), 1.0)
AssertionError: 
Arrays are not almost equal to 7 decimals
 ACTUAL: 0.9391497686321659
 DESIRED: 1.0

At a glance, I tend to agree with @g-walsh's comment (#11236 (comment)). Fixing the random state seems an acceptable solution from my side.
However, since you've already skipped the check for cluster centers:

if not isinstance(estimator, MiniBatchKMeans):
    assert_almost_equal(_sort_centers(est_weighted.cluster_centers_),
                        _sort_centers(est_repeated.cluster_centers_))

It might be reasonable to skip the check for the labels of each point as well, so removing MiniBatchKMeans also seems acceptable from my side. (It seems reasonable for MiniBatchKMeans to reach a suboptimal result sometimes.)
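
A minimal sketch of that option, mirroring the existing guard used for the cluster centers (the loop variables are those of the current test):

if not isinstance(estimator, MiniBatchKMeans):
    # Only require exact label agreement for the exact KMeans variants;
    # MiniBatchKMeans would be exercised separately (or not at all here).
    assert_almost_equal(v_measure_score(est_repeated.labels_,
                                        repeated_labels), 1.0)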

@ogrisel
Member
ogrisel commented Jul 14, 2018

I would rather fix the random seed and maybe relax the condition to have a v_measure over 0.9 for the MiniBatchKMeans case. I think it's still useful to test that repeated samples and sample weights are approximately equivalent with MiniBatchKMeans.
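
A minimal sketch of that relaxed check inside the test loop (the 0.9 threshold comes from the comment above; the loop variables are those of the existing test body):

for estimator in estimators:
    est_weighted = clone(estimator).fit(X, sample_weight=sample_weight)
    est_repeated = clone(estimator).fit(X_repeat)
    repeated_labels = np.repeat(est_weighted.labels_, sample_weight)
    score = v_measure_score(est_repeated.labels_, repeated_labels)
    if isinstance(estimator, MiniBatchKMeans):
        # MiniBatchKMeans is a stochastic approximation of KMeans, so only
        # require approximate equivalence of weighting and repetition.
        assert score > 0.9
    else:
        assert_almost_equal(score, 1.0)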

@jnothman
Member
jnothman commented Jul 16, 2018 via email

@qinhanmin2014
Member

would we expect exact consistency with an infinite batch_size? is this an appropriate test?

For MiniBatchKMeans, maybe not from my side. I've proposed #11523, which uses a fixed random state to avoid the test failure. To make the test more appropriate, a possible way might be @ogrisel's suggestion: require a v_measure over 0.9 for the MiniBatchKMeans case.

@g-walsh
Contributor
g-walsh commented Jul 16, 2018

@jnothman I don't think we would expect consistency here, as the issue with the mini-batch sampling comes from the difference in n_samples between the weighted and repeated datasets rather than from the batch size. There will always be a difference in n_samples (unless there is no weighting at all), which means the sampling will never be identical (hence the occasional test failures).

Also, with an infinite batch size I think it would effectively be testing standard KMeans rather than MiniBatchKMeans anyway.
