Inconsistent labels with KMeans using random_state · Issue #20216 · scikit-learn/scikit-learn · GitHub

Inconsistent labels with KMeans using random_state #20216
Closed
@DevinShanahan

Description

Describe the bug

I am running KMeans with a fixed random_state. It produces the same clusters on every run, but it does not always assign the same label values to them. I realize that the labels have no inherent significance, but I am writing a notebook, and this makes it impossible to refer to the clusters by label number, so consistent labels would be very helpful. I have noticed that setting n_jobs=1 prevents this from occurring, but n_jobs is deprecated, so that is not a long-term solution.
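One possible workaround, independent of whatever causes the permutation, is to relabel the clusters in a canonical order after fitting. The sketch below is only an illustration, not a scikit-learn feature: `canonical_labels` is a hypothetical helper that sorts the centers lexicographically (first coordinate, then second) and remaps the labels to match. It assumes the centers are distinct enough that small floating-point differences between runs do not change their sort order.

```python
import numpy as np


def canonical_labels(centers, labels):
    """Relabel clusters so that label k refers to the k-th center in
    lexicographic order (hypothetical helper, not part of scikit-learn)."""
    centers = np.asarray(centers)
    labels = np.asarray(labels)
    # np.lexsort treats the LAST key as the primary one, so reverse the
    # columns to sort by the first coordinate, then the second, and so on.
    order = np.lexsort(centers.T[::-1])
    # remap[old_label] = new_label
    remap = np.empty_like(order)
    remap[order] = np.arange(len(order))
    return centers[order], remap[labels]


# Two runs that found the same clusters but numbered them differently
# (toy values for illustration, not taken from the report).
centers_a = [[3.0, 3.0], [-0.5, -0.5], [3.0, -3.0]]
labels_a = [0, 1, 1, 2, 0]
centers_b = [[3.0, 3.0], [3.0, -3.0], [-0.5, -0.5]]
labels_b = [0, 2, 2, 1, 0]

_, canon_a = canonical_labels(centers_a, labels_a)
_, canon_b = canonical_labels(centers_b, labels_b)
print(np.array_equal(canon_a, canon_b))
```

With a fitted estimator, the inputs would be `km.cluster_centers_` and `km.labels_` (or the output of `km.predict(x)`).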

Steps/Code to Reproduce

from sklearn.cluster import KMeans
import numpy as np
import matplotlib.pyplot as plt

np.random.seed(2)

x = np.random.normal(size=(1800, 2))
x[:700, 0] += 3
x[:700, 1] += 3
x[700:1200, 0] -= 0.5
x[700:1200, 1] -= 0.5
x[1200:, 0] += 3
x[1200:, 1] -= 3

np.random.shuffle(x)

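first = None
# Refit with the same random_state until the predicted labels differ from the
# first run; it typically only takes a few iterations for a difference to occur.
while True:
    km = KMeans(n_clusters=3, random_state=10)
    km.fit(x)
    pred = km.predict(x)
    if first is None:
        # Keep the labels from the first run as the reference.
        first = pred
    elif not np.array_equal(first, pred):
        print(first)
        print(pred)
        # Left panel: reference labeling; right panel: the differing labeling.
        fig, ax = plt.subplots(1, 2)
        for label in range(3):
            first_cluster = x[first == label]
            pred_cluster = x[pred == label]
            ax[0].scatter(first_cluster[:, 0], first_cluster[:, 1], label=label)
            ax[1].scatter(pred_cluster[:, 0], pred_cluster[:, 1], label=label)
        break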

ax[0].legend()
ax[1].legend()
plt.show()

Expected Results

Labels are the same each time

Actual Results

[0 1 1 ... 2 0 0]
[0 2 2 ... 1 0 0]
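To confirm that two label arrays like these describe the same clusters and differ only in numbering, a permutation-invariant comparison can be used in place of `np.array_equal`. The following is a minimal sketch: `same_partition` is a hypothetical helper, and the short arrays are toy values, since the full outputs above are elided.

```python
import numpy as np


def same_partition(a, b):
    """Return True if label arrays a and b group the samples identically,
    ignoring which integer is attached to which group (hypothetical helper)."""
    a = np.asarray(a)
    b = np.asarray(b)
    if a.shape != b.shape:
        return False
    # Every distinct (a_label, b_label) pair that occurs in the data.
    pairs = np.unique(np.stack([a, b], axis=1), axis=0)
    # The labelings match up to renaming exactly when each label in a is
    # paired with one label in b and vice versa.
    return len(pairs) == len(np.unique(a)) == len(np.unique(b))


run_1 = [0, 1, 1, 2, 0, 0]
run_2 = [0, 2, 2, 1, 0, 0]   # same groups, labels 1 and 2 swapped
run_3 = [0, 1, 2, 2, 0, 0]   # genuinely different grouping

print(same_partition(run_1, run_2))
print(same_partition(run_1, run_3))
```

`sklearn.metrics.adjusted_rand_score` offers a similar label-permutation-invariant comparison and returns 1.0 for identical partitions.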

image

Versions

System:
python: 3.8.3 (default, May 19 2020, 13:54:14) [Clang 10.0.0 ]
executable: /Users/devin/anaconda3/envs/sw38/bin/python
machine: macOS-10.16-x86_64-i386-64bit

Python dependencies:
pip: 20.2.2
setuptools: 52.0.0.post20210125
sklearn: 0.24.2
numpy: 1.18.1
scipy: 1.4.1
Cython: 0.29.21
pandas: 1.0.3
matplotlib: 3.3.4
joblib: 1.0.1
threadpoolctl: 2.1.0

Built with OpenMP: True
