[MRG+2] Fix sparse matrix handling in clustering #4052

amueller · 2015-01-06T16:06:35Z

Needs fixing:

MeanShift
DBSCAN
AffinityPropagation

amueller · 2015-01-06T16:22:39Z

@jnothman
The issue in DBSCAN is here:

> /home/andy/checkout/scikit-learn/sklearn/cluster/dbscan_.py(234)fit()
-> self.components_ = X[self.core_sample_indices_].copy()
(Pdb) p self.core_sample_indices_
array([], dtype=int64)

jnothman · 2015-01-06T21:41:11Z

Huh? Oh, perhaps that's the only part of the implementation I didn't test! :s

amueller · 2015-01-06T21:58:40Z

You can never test enough ;)

jnothman · 2015-01-06T22:31:07Z

No, it had nothing to do with parts I didn't test. It's an edge case that scipy.sparse doesn't seem to handle well... Until recently there was no support for sparse matrices with a zero dimension. Apart from that, it seems csr_matrix.__getitem__ doesn't handle the edge case of an empty index array.

amueller · 2015-01-06T22:33:12Z

Meh :-/ Do you want to look into it / do a fix? I'm not super familiar with the DBSCAN code.

amueller · 2015-01-06T22:33:54Z

sklearn/utils/estimator_checks.py

+def is_supervised(estimator):
+    return (isinstance(estimator, ClassifierMixin)
+            or isinstance(estimator, RegressorMixin)
+            # transformers can all take a y


amueller · 2015-01-06T22:34:15Z

more tests with less code :D

jnothman · 2015-01-06T22:39:38Z

Regarding the edge case of slicing with an empty index array: this is unsupported in scipy 0.13, but is fixed in master. So perhaps the solution is to special-case it: if we end up with no core samples, create a dense array of the correct shape (but with no data)! Or we could special-case it only if scipy raises an error.

amueller · 2015-01-06T22:46:11Z

maybe just special-case it all the time and comment saying it is scipy backward compatibility?
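
A minimal sketch of the "always special-case it" idea discussed above. The helper name `select_components` is hypothetical and exists only for illustration; this is not presented as the code that was merged, just one way the guard could look, assuming NumPy and SciPy are installed.

```python
import numpy as np
from scipy import sparse


def select_components(X, core_sample_indices):
    """Return the rows of X listed in core_sample_indices.

    Hypothetical helper: X may be a dense array or a CSR matrix.
    """
    if len(core_sample_indices) == 0:
        # scipy backward compatibility: older releases cannot index a
        # sparse matrix with an empty index array, so build an empty
        # dense array of the right width instead.
        return np.empty((0, X.shape[1]))
    return X[core_sample_indices].copy()


X = sparse.csr_matrix(np.eye(4))
no_core = select_components(X, np.array([], dtype=np.int64))
some_core = select_components(X, np.array([0, 2]))
print(no_core.shape, some_core.shape)
```

The guard runs on every call rather than only after an exception, which keeps the behaviour identical across SciPy versions at the cost of returning a dense array in the empty case.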

amueller · 2015-01-07T21:14:50Z

fixed the dbscan issue.

coveralls · 2015-01-09T18:31:33Z

Coverage increased (+0.01%) when pulling 75f2f5e on amueller:clustering_sparse_matrix into 69567ec on scikit-learn:master.

jnothman · 2015-01-11T02:59:07Z

sklearn/cluster/tests/test_dbscan.py

@@ -78,6 +77,19 @@ def test_dbscan_sparse():
    assert_array_equal(labels_dense, labels_sparse)


+def test_dbscan_no_score_samples():


*no_core_samples

(Maybe that's where the missing s went from a very different patch)

haha I bet.

This comment has not been addressed.

Damn. I forgot to push at some point or a rebase messed it up :-/

jnothman · 2015-01-11T04:58:03Z

Except for check_sparsify_multiclass_classifier needing a better name, and the special-casing in dbscan needing a comment, this LGTM.

amueller · 2015-01-11T18:45:45Z

The check_sparsify has a better name in the other PR ;) I need to rebase this one, sorry.

amueller · 2015-01-11T18:45:58Z

But I can also ditch the other one and we just use this.

amueller · 2015-01-11T18:46:20Z

the problem is that I used #4058 elsewhere, too.

jnothman · 2015-01-11T22:09:13Z

Ah sorry, I'd forgotten this was on top of #4058


ogrisel · 2015-01-15T10:17:25Z

#4058 has been merged but the diff view of this PR does not reflect it. @amueller can you please rebase this on top of the current master and push -f to get github to compute a new diff (and have travis re-run the tests just to be sure?).

amueller · 2015-01-15T16:16:31Z

done

ogrisel · 2015-01-16T12:53:02Z

sklearn/cluster/tests/test_dbscan.py

+    X[X < .8] = 0
+
+    db = DBSCAN().fit(X)
+    assert_array_equal(db.components_, np.empty((0, X.shape[1])))


I assume that all samples are clustered as outliers in that case? It would be great to add an assertion to check that here, first by asserting that core_sample_indices_ has shape (0,) and second that labels_ is filled with -1s.
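
The extra assertions suggested above could look roughly like this. It is a sketch that assumes a recent scikit-learn and SciPy; the input is an illustrative set of widely separated points rather than the random matrix used in the PR's test, so that "no core samples" holds by construction.

```python
import numpy as np
from scipy import sparse
from sklearn.cluster import DBSCAN

# Five points that are each 10 * sqrt(2) apart, far beyond the default
# eps=0.5, so no point has min_samples=5 neighbours (illustrative data).
X_dense = 10.0 * np.eye(5)

fitted = {}
for name, X in [("dense", X_dense), ("sparse", sparse.csr_matrix(X_dense))]:
    db = DBSCAN().fit(X)
    fitted[name] = db
    # No core samples at all ...
    assert db.core_sample_indices_.shape == (0,)
    # ... so every sample is labelled as noise ...
    assert np.all(db.labels_ == -1)
    # ... and components_ is empty but keeps the feature dimension.
    assert db.components_.shape == (0, X.shape[1])
```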

ogrisel · 2015-01-16T13:16:46Z

Aside from this last minor comment, +1 on my side as well.

Make sparsity check check everything.
don't test everything. That would be nice but is out of scope :-/
catch special case of no core samples in DBSCAN
add nonregression test for sparse dbscan with no core samples.

amueller · 2015-01-16T17:04:18Z

will merge if travis agrees with me.


amueller force-pushed the clustering_sparse_matrix branch from 77a4e13 to de91554 Compare January 6, 2015 16:24

amueller mentioned this pull request Jan 6, 2015

AffinityPropagation fit fails with sparse matrix input #4051

Closed

tttthomasssss mentioned this pull request Jan 6, 2015

fix for issue #4051 #4054

Merged

amueller mentioned this pull request Jan 6, 2015

Ensure common tests cover everything #4056

Closed

jnothman closed this Jan 6, 2015

jnothman reopened this Jan 6, 2015

amueller reviewed Jan 6, 2015

amueller mentioned this pull request Jan 6, 2015

[MRG+1] slight cleanup of common tests. #4058

Merged

amueller force-pushed the clustering_sparse_matrix branch 2 times, most recently from 9925a98 to c71ebb2 Compare January 6, 2015 23:14

amueller force-pushed the clustering_sparse_matrix branch from ae632fc to 75f2f5e Compare January 9, 2015 18:21

amueller changed the title ~~[WIP] TST add test for sparse matrix handling in clustering~~ [MRG] TST add test for sparse matrix handling in clustering Jan 9, 2015

jnothman reviewed Jan 11, 2015

amueller force-pushed the clustering_sparse_matrix branch 2 times, most recently from 04a4364 to 2345d23 Compare January 11, 2015 18:56

amueller changed the title ~~[MRG] TST add test for sparse matrix handling in clustering~~ [MRG + 1] Fix sparse matrix handling in clustering Jan 11, 2015

ogrisel changed the title ~~[MRG + 1] Fix sparse matrix handling in clustering~~ [MRG+1] Fix sparse matrix handling in clustering Jan 15, 2015

amueller force-pushed the clustering_sparse_matrix branch from 2345d23 to 3e3fdd3 Compare January 15, 2015 16:16

amueller force-pushed the clustering_sparse_matrix branch from 3e3fdd3 to 93ed5d8 Compare January 15, 2015 17:04

ogrisel reviewed Jan 16, 2015

TST add test for sparse matrix handling in clustering

494b8e5

Make sparsity check check everything.
don't test everything. That would be nice but is out of scope :-/
catch special case of no core samples in DBSCAN
add nonregression test for sparse dbscan with no core samples.

amueller force-pushed the clustering_sparse_matrix branch from 93ed5d8 to 494b8e5 Compare January 16, 2015 17:03

amueller changed the title ~~[MRG+1] Fix sparse matrix handling in clustering~~ [MRG+2] Fix sparse matrix handling in clustering Jan 16, 2015

amueller added a commit that referenced this pull request Jan 16, 2015

Merge pull request #4052 from amueller/clustering_sparse_matrix

bf203de

[MRG+2] Fix sparse matrix handling in clustering

amueller merged commit bf203de into scikit-learn:master Jan 16, 2015

amueller deleted the clustering_sparse_matrix branch January 16, 2015 20:39

jnothman mentioned this pull request Mar 5, 2015

[MRG] FIX/TST boundary cases in dbscan #4073

Closed
