Closed
Description
Describe the bug
Some linkage/hierarchical clustering methods fail for some combinations of parameters when given read-only memmapped data, as noticed in scikit-learn/scikit-learn#19562 (comment).
To Reproduce
Modified from @tliu68's reproducer:
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.utils._testing import create_memmap_backed_data
from sklearn.cluster import AgglomerativeClustering
X, y = make_blobs(n_samples=50, random_state=1)
X, y = create_memmap_backed_data([X, y])
# does not fail
ag = AgglomerativeClustering(n_clusters=3)
ag.fit(X)
# fails
ag = AgglomerativeClustering(affinity="euclidean", linkage="single")
ag.fit(X)
Full trace
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
----> 1 ag.fit(X)

~/dev/scikit-learn/sklearn/cluster/_agglomerative.py in fit(self, X, y)
    895         )
    896
--> 897         out = memory.cache(tree_builder)(X, connectivity=connectivity,
    898                                          n_clusters=n_clusters,
    899                                          return_distance=return_distance,

~/.virtualenvs/sk/lib64/python3.9/site-packages/joblib/memory.py in __call__(self, *args, **kwargs)
    350
    351     def __call__(self, *args, **kwargs):
--> 352         return self.func(*args, **kwargs)
    353
    354     def call_and_shelve(self, *args, **kwargs):

~/dev/scikit-learn/sklearn/cluster/_agglomerative.py in _single_linkage(*args, **kwargs)
    617 def _single_linkage(*args, **kwargs):
    618     kwargs['linkage'] = 'single'
--> 619     return linkage_tree(*args, **kwargs)
    620
    621

~/dev/scikit-learn/sklearn/cluster/_agglomerative.py in linkage_tree(X, connectivity, n_clusters, linkage, affinity, return_distance)
    484         X = np.ascontiguousarray(X, dtype=np.double)
    485
--> 486         mst = _hierarchical.mst_linkage_core(X, dist_metric)
    487         # Sort edges of the min_spanning_tree by weight
    488         mst = mst[np.argsort(mst.T[2], kind='mergesort'), :]

~/dev/scikit-learn/sklearn/cluster/_hierarchical_fast.pyx in sklearn.cluster._hierarchical_fast.mst_linkage_core()
    456 @cython.nonecheck(False)
    457 def mst_linkage_core(
--> 458         DTYPE_t [:, ::1] raw_data,
    459         DistanceMetric dist_metric):
    460     """

~/dev/scikit-learn/sklearn/cluster/_hierarchical_fast.cpython-39-x86_64-linux-gnu.so in View.MemoryView.memoryview_cwrapper()

~/dev/scikit-learn/sklearn/cluster/_hierarchical_fast.cpython-39-x86_64-linux-gnu.so in View.MemoryView.memoryview.__cinit__()

ValueError: buffer source array is read-only
Expected behavior
Linkage/hierarchical clustering methods should support read-only memmapped datasets.
Additional context
Linkage/hierarchical clustering methods rely on Cython.
Yet those implementations do not accept read-only buffers: their typed memoryview arguments (such as raw_data in mst_linkage_core) are not declared const, so coercing a read-only memmapped dataset to a memoryview fails. This is mainly because const memoryviews with fused types were not implemented in Cython at that time.
It should be fixable by declaring those arguments as const memoryviews.
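Until the Cython signatures accept const memoryviews, a user-side workaround is to hand the estimator a writeable copy of the data. The sketch below is only an illustration of that idea, not the fix proposed here: as_writeable is a hypothetical helper that is not part of scikit-learn, and it assumes the dataset fits in memory once copied. It relies on the fact that np.ascontiguousarray(X, dtype=np.double), as called in linkage_tree, does not copy an array that is already C-contiguous and of dtype double, so the read-only flag survives up to the Cython call.

```python
import numpy as np


def as_writeable(X):
    """Return X itself if it is writeable, else a writeable C-ordered copy.

    Hypothetical helper for illustration only; not a scikit-learn API.
    """
    X = np.asarray(X)
    if X.flags.writeable:
        return X
    # np.array(..., copy=True) allocates fresh memory, which is writeable.
    return np.array(X, order="C", copy=True)


# Simulate a read-only input such as a memmap opened in read mode.
X_ro = np.arange(6, dtype=np.float64).reshape(3, 2)
X_ro.setflags(write=False)

# ascontiguousarray returns the input without copying when no conversion
# is needed, so the result is still read-only.
still_ro = np.ascontiguousarray(X_ro, dtype=np.double)

X_rw = as_writeable(X_ro)
```

With the reproducer above, this would amount to calling ag.fit(as_writeable(X)), at the cost of losing the memory savings of the memmap.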
Environment:
- OS: Linux 5.11.11-200.fc33.x86_64
- Python version: 3.8, 3.9
- Cython version: 0.29.21