[MRG] MNT rename min_cluster_size_ratio to min_cluster_size #11913

jnothman · 2018-08-26T23:48:16Z

Agreed @adrinjalali? Any objections @espg?

jnothman · 2018-08-27T03:09:01Z

sklearn/cluster/optics_.py

    neighborhood_size = int(min_maxima_ratio * len(ordering))

-    # Should this check for < min_samples? Should this be public?


I don't think it should be min_samples. A cluster in DBSCAN can be smaller than min_samples. But I don't see why it should be 5 either.

I have suggested a minimum value of 2 in my PR, along with renaming it to min_neighbors or something similar.

Yes, I'm accepting your min cluster size. Not so much the min_samples name. See there.

adrinjalali · 2018-08-27T06:57:45Z

@jnothman it makes sense. I like the change.

jnothman

I should probably be adding some tests here

jnothman · 2018-08-27T08:51:02Z

Yes, I'm accepting your min cluster size. Not so much the min_samples name. See there.

jnothman · 2018-08-29T03:27:39Z

This should be merged for release. Seeking reviews.

albertcthomas

Small typos

albertcthomas · 2018-08-31T08:12:16Z

sklearn/cluster/optics_.py

@@ -221,8 +223,10 @@ class OPTICS(BaseEstimator, ClusterMixin):
    significant_min : float, optional
        Sets a lower threshold on how small a significant maxima can be.

-    min_cluster_size_ratio : float, optional
-        Minimum percentage of dataset expected for cluster membership.
+    min_cluster_size : int > 1 or loat between 0 and 1


loat -> float
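The reviewed docstring describes min_cluster_size as either an int > 1 (an absolute number of samples) or a float between 0 and 1 (a fraction of the dataset). The following is a minimal sketch of how such a dual-typed parameter might be resolved to a sample count. The helper name resolve_min_cluster_size and the choice to round fractions up with math.ceil are assumptions for illustration, not necessarily what this PR implements.

```python
import math


def resolve_min_cluster_size(min_cluster_size, n_samples):
    """Hypothetical helper: return min_cluster_size as a number of samples.

    Values <= 1 are read as a fraction of n_samples; larger values are
    read as an absolute count.
    """
    if min_cluster_size <= 1:
        # Fractional form: scale by the dataset size. Rounding up with
        # math.ceil is an assumption of this sketch.
        return math.ceil(min_cluster_size * n_samples)
    # Absolute form: use the count directly.
    return int(min_cluster_size)
```

In this sketch a value of exactly 1 falls into the fractional branch and means the whole dataset, so integer counts only take effect from 2 upwards, which matches the "int > 1" wording.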

albertcthomas · 2018-08-31T08:12:38Z

sklearn/cluster/optics_.py

@@ -528,8 +532,10 @@ def _extract_optics(ordering, reachability, maxima_ratio=.75,
    significant_min : float, optional
        Sets a lower threshold on how small a significant maxima can be.

-    min_cluster_size_ratio : float, optional
-        Minimum percentage of dataset expected for cluster membership.
+    min_cluster_size : int > 1 or loat between 0 and 1


jnothman · 2018-09-02T23:36:47Z

Thanks @albertcthomas! Does the implementation appear correct? Does the naming seem to be improved by this change?

albertcthomas

I am not very familiar with OPTICS, but if I wanted to use it, min_cluster_size and its description would make complete sense to me.

albertcthomas · 2018-09-03T09:21:15Z

sklearn/cluster/optics_.py

    neighborhood_size = int(min_maxima_ratio * len(ordering))

-    # Should this check for < min_samples? Should this be public?


Is there a specific reason for removing this in this PR?

Well, I'm changing 5 to 2 above (using max instead of if). @adrinjalali and I seem to agree that min_samples is not the relevant criterion here.
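A minimal sketch of the clamp described above, assuming the old code raised neighborhood_size to a floor of 5 with an explicit if branch and the new code clamps it to 2 with max(). The function name clamp_neighborhood_size and its floor argument are hypothetical; this is not the code in optics_.py.

```python
def clamp_neighborhood_size(min_maxima_ratio, n_points, floor=2):
    """Hypothetical helper: neighborhood size used when searching for maxima."""
    # Scale the ratio by the length of the ordering; int() truncates toward zero.
    size = int(min_maxima_ratio * n_points)
    # max() replaces a branch such as `if size < floor: size = floor`.
    return max(floor, size)
```

For example, with min_maxima_ratio=0.01 and an ordering of 100 points the scaled size is 1, which the assumed old floor would raise to 5 and the new floor raises to 2.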

The role of min_samples is somewhat different in OPTICS than in DBSCAN. In OPTICS you can set the generator epsilon to a large number (it's inf by default) to generate a smooth reachability_ array. Similarly, min_samples can also be set relatively large. Still, since clusters are detected hierarchically, a cluster can be smaller than min_samples, and that's fine. I agree it probably doesn't make much sense to do so, but they're independent parameters IMO.

qinhanmin2014 · 2018-09-03T11:43:41Z

I think we need some input validation (maybe some tests) here. Otherwise this LGTM.

albertcthomas

LGTM

albertcthomas · 2018-09-05T11:27:30Z

sklearn/cluster/optics_.py

+            raise ValueError('min_cluster_size must be a positive integer or '
+                             'a float between 0 and 1. Got %r' %
+                             self.min_cluster_size)
+        elif self.min_cluster_size > n_samples:


I assume that min_cluster_size == n_samples is possible if all the samples are in a single cluster.
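The validation from the diff excerpt above can be sketched as a standalone function. The first error message follows the excerpt; the exact first condition, the wording of the second error message, and the function name check_min_cluster_size are assumptions. The strict > comparison means min_cluster_size == n_samples is accepted, which covers the case of all samples forming one cluster.

```python
def check_min_cluster_size(min_cluster_size, n_samples):
    """Hypothetical validator for min_cluster_size (sketch, not the PR code)."""
    # Reject non-positive values and non-integral values greater than 1
    # (assumed condition; only the error message appears in the diff).
    if min_cluster_size <= 0 or (min_cluster_size != int(min_cluster_size)
                                 and min_cluster_size > 1):
        raise ValueError('min_cluster_size must be a positive integer or '
                         'a float between 0 and 1. Got %r' % min_cluster_size)
    # Strict '>' so that min_cluster_size == n_samples is still allowed.
    elif min_cluster_size > n_samples:
        raise ValueError('min_cluster_size must be no greater than the '
                         'number of samples (%d). Got %r'
                         % (n_samples, min_cluster_size))
```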

adrinjalali · 2018-09-08T21:53:10Z

There's a flake8 error; otherwise LGTM.


MNT rename min_cluster_size_ratio to min_cluster_size

b4cf684

jnothman added this to the 0.20 milestone Aug 26, 2018

jnothman commented Aug 27, 2018


jnothman changed the title ~~MNT rename min_cluster_size_ratio to min_cluster_size~~ [MRG] MNT rename min_cluster_size_ratio to min_cluster_size Aug 27, 2018

jnothman commented Aug 27, 2018


TST add tests, and fix rounding error

c5a143b

jnothman added the Waiting for Reviewer label Aug 29, 2018

albertcthomas reviewed Aug 31, 2018


Fix typos

cd97721

albertcthomas approved these changes Sep 3, 2018


More validation

eb4e388

albertcthomas approved these changes Sep 5, 2018


jnothman added 2 commits September 7, 2018 09:53

Merge branch 'master' into min_cluster_size

10e7cff

Typo

0229663

Pep8

86c8d72

jnothman merged commit a86709f into scikit-learn:master Sep 9, 2018

jnothman added a commit to jnothman/scikit-learn that referenced this pull request Sep 9, 2018

[MRG] MNT rename min_cluster_size_ratio to min_cluster_size (scikit-l…

73e034e

…earn#11913)
