test_weighted_vs_repeated sometimes fails on AppVeyor · Issue #11423 · scikit-learn/scikit-learn · GitHub

Closed
qinhanmin2014 opened this issue Jul 4, 2018 · 2 comments

@qinhanmin2014
Member

It seems that test_weighted_vs_repeated in sklearn/cluster/tests/test_k_means.py sometimes fails on AppVeyor.
See https://ci.appveyor.com/project/raghavrv/scikit-learn/build/1.0.10154
https://ci.appveyor.com/project/sklearn-ci/scikit-learn/build/1.0.22983

    def test_weighted_vs_repeated():
        # a sample weight of N should yield the same result as an N-fold
        # repetition of the sample
        sample_weight = np.random.randint(1, 5, size=n_samples)
        X_repeat = np.repeat(X, sample_weight, axis=0)
        estimators = [KMeans(init="k-means++", n_clusters=n_clusters,
                             random_state=42),
                      KMeans(init="random", n_clusters=n_clusters,
                             random_state=42),
                      KMeans(init=centers.copy(), n_clusters=n_clusters,
                             random_state=42),
                      MiniBatchKMeans(n_clusters=n_clusters, batch_size=10,
                                      random_state=42)]
        for estimator in estimators:
            est_weighted = clone(estimator).fit(X, sample_weight=sample_weight)
            est_repeated = clone(estimator).fit(X_repeat)
            repeated_labels = np.repeat(est_weighted.labels_, sample_weight)
            assert_almost_equal(v_measure_score(est_repeated.labels_,
>                                               repeated_labels), 1.0)
E           AssertionError: 
E           Arrays are not almost equal to 7 decimals
E            ACTUAL: 0.95215689354371202
E            DESIRED: 1.0

Using a fixed random state (for sample_weight I guess?) might be a solution, but if we can figure out the reason, that's definitely better.
ping @jnhansen for possible insight.
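
A minimal sketch of the fixed random state suggested above, assuming the weights are the only unseeded input in the test. The value of `n_samples` here is a placeholder, since the test module defines its own, and the seed 42 is chosen only to match the estimators' `random_state`.

```python
import numpy as np

n_samples = 100  # placeholder; the real test module defines its own value

# Draw the weights from a seeded generator instead of the global
# np.random state, so repeated runs see the same weights.
rng = np.random.RandomState(42)
sample_weight = rng.randint(1, 5, size=n_samples)

# A second generator with the same seed reproduces the same weights.
sample_weight_again = np.random.RandomState(42).randint(1, 5, size=n_samples)
```

This would make a failure reproducible from run to run, but it does not explain why the weighted and repeated fits can disagree.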

@jnothman
Member
jnothman commented Jul 4, 2018

Duplicate of #11236

A few things to do:

  • work out what in KMeans might distinguish between sample_weight and repetition
  • if we expect a small amount of variation, we should reconsider whether v_measure is too brittle a metric
  • fix the random_state
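
For the first item, one candidate worth checking is sketched below with NumPy only. It is a hypothesis, not a diagnosis confirmed in this thread: the centroid update is invariant to replacing weights with repetition, but any step that draws random row indices sees a different number of rows in `X` and `X_repeat`, so the same seed need not select the same points. The toy data and `n_clusters = 3` are illustrative assumptions, and the permutation draw is a stand-in for random initialization, not scikit-learn's actual code.

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.normal(size=(20, 2))  # toy data, not the X used by the test
sample_weight = rng.randint(1, 5, size=X.shape[0])
X_repeat = np.repeat(X, sample_weight, axis=0)

# Centroid update: the weighted mean of X equals the plain mean of X_repeat.
weighted_center = np.average(X, axis=0, weights=sample_weight)
repeated_center = X_repeat.mean(axis=0)
centers_match = np.allclose(weighted_center, repeated_center)

# Random row selection (illustrative stand-in for a random init):
# the same seed permutes index ranges of different lengths.
n_clusters = 3
idx_weighted = np.random.RandomState(42).permutation(X.shape[0])[:n_clusters]
idx_repeated = np.random.RandomState(42).permutation(X_repeat.shape[0])[:n_clusters]
init_weighted = X[idx_weighted]
init_repeated = X_repeat[idx_repeated]

print("centroid update matches:", centers_match)
print("initial centers (weighted fit):\n", init_weighted)
print("initial centers (repeated fit):\n", init_repeated)
```

If the two fits can start from different initial centers, they may converge to different local optima, which would show up as a v_measure below 1.0 even though each update step treats weights and repetition identically.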

@qinhanmin2014
Member Author

Thanks @jnothman
