Commit 2cd1220

Akshay0724 authored and jnothman committed
[MRG+1] Fixes incorrect output when input is precomputed sparse matrix in DBSCAN. (#8339)
1 parent 2d0bce7 commit 2cd1220

3 files changed, +23 -1 lines changed

doc/whats_new.rst

Lines changed: 4 additions & 0 deletions
@@ -152,6 +152,10 @@ Enhancements
 
 Bug fixes
 .........
+   - Fixed a bug where :class:`sklearn.cluster.DBSCAN` gives incorrect
+     result when input is a precomputed sparse matrix with initial
+     rows all zero.
+     :issue:`8306` by :user:`Akshay Gupta <Akshay0724>`
 
    - Fixed a bug where :class:`sklearn.ensemble.AdaBoostClassifier` throws
      ``ZeroDivisionError`` while fitting data with single class labels.

sklearn/cluster/dbscan_.py

Lines changed: 2 additions & 1 deletion
@@ -124,7 +124,8 @@ def dbscan(X, eps=0.5, min_samples=5, metric='minkowski', metric_params=None,
         X.sum_duplicates()  # XXX: modifies X's internals in-place
         X_mask = X.data <= eps
         masked_indices = astype(X.indices, np.intp, copy=False)[X_mask]
-        masked_indptr = np.cumsum(X_mask)[X.indptr[1:] - 1]
+        masked_indptr = np.concatenate(([0], np.cumsum(X_mask)))[X.indptr[1:]]
+
         # insert the diagonal: a point is its own neighbor, but 0 distance
         # means absence from sparse matrix data
         masked_indices = np.insert(masked_indices, masked_indptr,
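The one-line change above is easiest to see on concrete numbers. The following is a minimal NumPy sketch, not part of the commit: it assumes hand-written CSR arrays (`data`, `indptr`) for the 7x7 matrix used in the test added by this commit, and the names `old` and `new` are illustrative only. For a leading empty row, `X.indptr[1:] - 1` is `-1`, which NumPy interprets as "last element", so the old expression reads the total count of kept entries instead of 0. Prepending a 0 to the cumulative sum and indexing with `X.indptr[1:]` directly avoids the negative index.

```python
import numpy as np

# Hand-written CSR arrays for the 7x7 test matrix (assumption: entries are
# stored in row-major order with sorted column indices).
data = np.array([0.1, 0.1, 0.1, 0.1, 0.3, 0.1, 0.3, 0.1])
indptr = np.array([0, 0, 0, 1, 2, 5, 6, 8])  # rows 0 and 1 are empty
eps = 0.2

mask = data <= eps  # which stored distances fall within eps

# Old expression: indptr[1:] - 1 is -1 for the leading empty rows,
# which wraps around to the last element of the cumulative sum.
old = np.cumsum(mask)[indptr[1:] - 1]

# New expression: a leading 0 makes index k mean
# "number of kept entries among the first k stored values".
new = np.concatenate(([0], np.cumsum(mask)))[indptr[1:]]

print(old.tolist())  # [6, 6, 1, 2, 4, 5, 6]
print(new.tolist())  # [0, 0, 1, 2, 4, 5, 6]
```

Only leading empty rows trigger the problem: an empty row later in the matrix gives `indptr[i + 1] - 1 == indptr[i] - 1`, which still points at the last stored entry of an earlier row and yields the correct running count.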

sklearn/cluster/tests/test_dbscan.py

Lines changed: 17 additions & 0 deletions
@@ -350,3 +350,20 @@ def test_dbscan_precomputed_metric_with_degenerate_input_arrays():
     X = np.zeros((10, 10))
     labels = DBSCAN(eps=0.5, metric='precomputed').fit(X).labels_
     assert_equal(len(set(labels)), 1)
+
+
+def test_dbscan_precomputed_metric_with_initial_rows_zero():
+    # sample matrix with initial two row all zero
+    ar = np.array([
+        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
+        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
+        [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
+        [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
+        [0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.3],
+        [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
+        [0.0, 0.0, 0.0, 0.0, 0.3, 0.1, 0.0]
+    ])
+    matrix = sparse.csr_matrix(ar)
+    labels = DBSCAN(eps=0.2, metric='precomputed',
+                    min_samples=2).fit(matrix).labels_
+    assert_array_equal(labels, [-1, -1, 0, 0, 0, 1, 1])
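To see where the expected labels `[-1, -1, 0, 0, 0, 1, 1]` come from, the clustering can be worked through by hand. The sketch below is a simplified, NumPy-only illustration and not scikit-learn's implementation: it assumes, following the code comment in `dbscan_.py`, that a zero in the sparse input means "no stored distance" rather than "distance 0", and that each point counts as its own neighbor. The names `neighbors`, `core`, and `cluster` are made up for this example.

```python
import numpy as np

ar = np.array([
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.0, 0.0, 0.1, 0.0, 0.0],
    [0.0, 0.0, 0.1, 0.1, 0.0, 0.0, 0.3],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.1],
    [0.0, 0.0, 0.0, 0.0, 0.3, 0.1, 0.0],
])
eps, min_samples = 0.2, 2

# Neighbors: stored (nonzero) distances within eps, plus the point itself.
neighbors = ((ar > 0) & (ar <= eps)) | np.eye(len(ar), dtype=bool)
core = neighbors.sum(axis=1) >= min_samples

labels = np.full(len(ar), -1)  # -1 marks noise
cluster = 0
for seed in range(len(ar)):
    if labels[seed] != -1 or not core[seed]:
        continue
    # Grow a cluster from this unlabeled core point (depth-first).
    stack = [seed]
    while stack:
        i = stack.pop()
        if labels[i] != -1:
            continue
        labels[i] = cluster
        if core[i]:  # only core points extend the cluster
            stack.extend(np.flatnonzero(neighbors[i]))
    cluster += 1

print(labels.tolist())  # [-1, -1, 0, 0, 0, 1, 1]
```

Rows 0 and 1 have no stored distances, so each has only itself as a neighbor and falls below `min_samples=2`. Points 2, 3, and 4 are linked through point 4 at distance 0.1, while points 5 and 6 form a second cluster because the 0.3 distance between points 4 and 6 exceeds `eps`.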
