8000 KMeans initialization does not use sample weights · Issue #25527 · scikit-learn/scikit-learn · GitHub
[go: up one dir, main page]

Skip to content

KMeans initialization does not use sample weights #25527

@user783738

Description

@user783738

Describe the bug

Clustering by KMeans does not weight the input data.

Steps/Code to Reproduce

import numpy as np
from sklearn.cluster import KMeans
x = np.array([1, 1, 5, 5, 100, 100])
w = 10**np.array([8.,8,8,8,-8,-8]) # large weights for 1 and 5, small weights for 100
x=x.reshape(-1,1)# reshape to a 2-dimensional array requested for KMeans
centers_with_weight = KMeans(n_clusters=2, random_state=0,n_init=10).fit(x,sample_weight=w).cluster_centers_
centers_no_weight = KMeans(n_clusters=2, random_state=0,n_init=10).fit(x).cluster_centers_

Expected Results

centers_with_weight=[[1.],[5.]]
centers_no_weight=[[100.],[3.]]

Actual Results

centers_with_weight=[[100.],[3.]]
centers_no_weight=[[100.],[3.]]

Versions

System:
    python: 3.10.4 (tags/v3.10.4:9d38120, Mar 23 2022, 23:13:41) [MSC v.1929 64 bit (AMD64)]
executable: E:\WPy64-31040\python-3.10.4.amd64\python.exe
   machine: Windows-10-10.0.19045-SP0

Python dependencies:
      sklearn: 1.2.1
          pip: 22.3.1
   setuptools: 62.1.0
        numpy: 1.23.3
        scipy: 1.8.1
       Cython: 0.29.28
       pandas: 1.4.2
   matplotlib: 3.5.1
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: E:\WPy64-31040\python-3.10.4.amd64\Lib\site-packages\numpy\.libs\libopenblas.FB5AE2TYXYH2IJRDKGDGQ3XBKLKTF43H.gfortran-win_amd64.dll
        version: 0.3.20
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: E:\WPy64-31040\python-3.10.4.amd64\Lib\site-packages\scipy\.libs\libopenblas.XWYDX2IKJW2NMTWSFYNGFUWKQU3LYTCZ.gfortran-win_amd64.dll
        version: 0.3.17
threading_layer: pthreads
   architecture: Haswell
    num_threads: 12

       user_api: openmp
   internal_api: openmp
         prefix: vcomp
       filepath: E:\WPy64-31040\python-3.10.4.amd64\Lib\site-packages\sklearn\.libs\vcomp140.dll
        version: None
    num_threads: 12

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions

      0