Closed
Description
It seems that test_weighted_vs_repeated in sklearn/cluster/tests/test_k_means.py sometimes fails on AppVeyor.
See https://ci.appveyor.com/project/raghavrv/scikit-learn/build/1.0.10154
https://ci.appveyor.com/project/sklearn-ci/scikit-learn/build/1.0.22983
def test_weighted_vs_repeated():
# a sample weight of N should yield the same result as an N-fold
# repetition of the sample
sample_weight = np.random.randint(1, 5, size=n_samples)
X_repeat = np.repeat(X, sample_weight, axis=0)
estimators = [KMeans(init="k-means++", n_clusters=n_clusters,
random_state=42),
KMeans(init="random", n_clusters=n_clusters,
random_state=42),
KMeans(init=centers.copy(), n_clusters=n_clusters,
random_state=42),
MiniBatchKMeans(n_clusters=n_clusters, batch_size=10,
random_state=42)]
for estimator in estimators:
est_weighted = clone(estimator).fit(X, sample_weight=sample_weight)
est_repeated = clone(estimator).fit(X_repeat)
repeated_labels = np.repeat(est_weighted.labels_, sample_weight)
assert_almost_equal(v_measure_score(est_repeated.labels_,
> repeated_labels), 1.0)
E AssertionError:
E Arrays are not almost equal to 7 decimals
E ACTUAL: 0.95215689354371202
E DESIRED: 1.0
Using a fixed random state (for sample_weight
I guess?) might be a solution, but if we can figure out the reason, that's definitely better.
ping @jnhansen for possible insight.