[MRG+2] OPTICS: add extract_xi method by adrinjalali · Pull Request #12077 · scikit-learn/scikit-learn · GitHub

[MRG+2] OPTICS: add extract_xi method #12077


Merged: 107 commits merged on Apr 27, 2019


@adrinjalali (Member) commented Sep 14, 2018

Related: #12036, #12053, #11677
Fixes #12376

Adding the Xi extraction method to OPTICS.

@espg, I'll need the predecessor_ to add that final touch to it. I'm still testing the implementation, but it'd be nice to get some feedback. (done)

Differences with the paper:

  • Definition 11 (Section 4) compares values against r(e_U + 1), whereas Section 4.3.2 compares them against r(e_U); this implementation uses r(e_U + 1), which seems more consistent.
  • The article assumes min_samples as the minimum cluster size, which is the default behavior of this implementation, but we also provide a min_cluster_size parameter to give users more freedom.
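To make the terms above concrete, here is a minimal sketch of the ξ-steep point definitions as I understand them from the original OPTICS paper, plus the two choices listed: reading r(e_U + 1) when checking the end of a steep upward area, and falling back to min_samples when min_cluster_size is not given. All function names are hypothetical and this is not the merged scikit-learn code; treating positions past the last point as having infinite reachability (so that r(e_U + 1) is always defined) is an assumption of this sketch.

```python
import math
from typing import List, Optional, Tuple


def steep_points(reach: List[float], xi: float) -> Tuple[List[bool], List[bool]]:
    """Classify each adjacent pair (i, i + 1) of a reachability plot.

    Downward: r(i) * (1 - xi) >= r(i + 1), i.e. at least xi lower after i.
    Upward:   r(i) <= r(i + 1) * (1 - xi), i.e. at least xi higher after i.
    """
    keep = 1.0 - xi
    pairs = range(len(reach) - 1)
    down = [reach[i] * keep >= reach[i + 1] for i in pairs]
    up = [reach[i] <= reach[i + 1] * keep for i in pairs]
    return down, up


def reach_after(reach: List[float], e_u: int) -> float:
    """Return r(e_U + 1), the value compared against in Definition 11.

    Assumption for this sketch: the plot is conceptually padded with +inf,
    so the end of the last steep upward area still has a successor.
    """
    return reach[e_u + 1] if e_u + 1 < len(reach) else math.inf


def effective_min_cluster_size(min_samples: int,
                               min_cluster_size: Optional[int] = None) -> int:
    """Fall back to min_samples (the paper's choice) unless overridden."""
    return min_samples if min_cluster_size is None else min_cluster_size


down, up = steep_points([10.0, 5.0, 4.9, 4.8, 9.0], xi=0.1)
print(down)  # [True, False, False, False]
print(up)    # [False, False, False, True]
```

The small drops from 5.0 to 4.9 to 4.8 are not steep at xi=0.1, so only the first drop and the final rise qualify.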

@qinhanmin2014 (Member)

This should not be possible with a hierarchical approach such as OPTICS. It would violate the condition that a cluster must not have an in-between value that is higher than the ends. That would be different with an overlapping clustering such as CLIQUE, but in that case it simply does not make sense to flatten the result into a single-integer labeling at all.

Ahh, I see. Thanks a lot @adrinjalali @kno10. So our method actually produces the same result as R's. That's fine.
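The hierarchy argument can be phrased as a simple property of the extracted clusters: viewed as inclusive [start, end] ranges on the reachability plot, any two clusters are either nested or disjoint, never partially overlapping, which is what keeps a flat single-integer labeling well defined. A minimal sketch of that check (the function name is hypothetical, not part of scikit-learn):

```python
from itertools import combinations
from typing import List, Tuple


def is_hierarchical(clusters: List[Tuple[int, int]]) -> bool:
    """Return True if every pair of [start, end] ranges is nested or disjoint.

    Ranges are inclusive positions on the reachability plot.
    """
    for (s1, e1), (s2, e2) in combinations(clusters, 2):
        disjoint = e1 < s2 or e2 < s1
        nested = (s1 <= s2 and e2 <= e1) or (s2 <= s1 and e1 <= e2)
        if not (disjoint or nested):
            # Partial overlap: only an overlapping method could produce this.
            return False
    return True


print(is_hierarchical([(0, 9), (0, 4), (5, 9)]))  # True
print(is_hierarchical([(0, 5), (3, 8)]))          # False
```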

@qinhanmin2014 (Member) commented Apr 23, 2019

@adrinjalali please mention how we assign the labels.
Please update what's new.
It's very close to my +1 IMO (once CIs are green).
I still think the loop in _xi_cluster can be simplified, but feel free to leave it.
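One plausible way to turn the cluster hierarchy into flat labels_, sketched here as an assumption rather than a description of the merged code: visit clusters so that nested ones come before the clusters enclosing them (sorting by (end, -start) does this for a hierarchy), give a cluster a new label only if none of its points is labeled yet, and finally map plot positions back to object order through ordering. Under this scheme only leaf clusters receive labels, and points covered only by an enclosing cluster stay noise (-1). The function name is hypothetical.

```python
from typing import List, Tuple


def assign_leaf_labels(ordering: List[int],
                       clusters: List[Tuple[int, int]]) -> List[int]:
    """Flatten inclusive [start, end] plot-position clusters into labels.

    ordering[pos] is the object index at plot position pos.
    Returns labels in object order; -1 means noise.
    """
    n = len(ordering)
    labels_plot = [-1] * n
    next_label = 0
    # Nested clusters end no later (and, on a tie, start no earlier) than
    # their parents, so this key visits children before parents.
    for start, end in sorted(clusters, key=lambda c: (c[1], -c[0])):
        if all(labels_plot[i] == -1 for i in range(start, end + 1)):
            for i in range(start, end + 1):
                labels_plot[i] = next_label
            next_label += 1
    # Move labels from plot order back to object order.
    labels = [-1] * n
    for pos, obj in enumerate(ordering):
        labels[obj] = labels_plot[pos]
    return labels


print(assign_leaf_labels([2, 0, 3, 1, 4], [(0, 4), (0, 1), (3, 4)]))
# [0, 1, 0, -1, 1]
```

Here the enclosing cluster (0, 4) is skipped because its points already carry leaf labels, which is why object 3 (plot position 2) ends up as noise.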

@adrinjalali (Member Author)

@qinhanmin2014

Please update what's new.

What do you want me to change? I think I've applied the comments there; sorry if I've missed something. Could you please remind me?

```diff
@@ -719,7 +719,7 @@ def _correct_predecessor(reachability_plot, predecessor, ordering, s, e):
     while s < e:
         if reachability_plot[s] > reachability_plot[e]:
             return s, e
-        p_e = predecessor[e]
+        p_e = ordering[predecessor[e]]
```
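For context, a minimal sketch of a predecessor-corrected cluster end, assuming the usual predecessor correction: shrink the end e while the point at e was not reached from any point already in [s, e). Only the first lines mirror the hunk above; the membership test and the step that shrinks e are assumptions. Unlike the hunk, the sketch takes predecessor_ in object order and looks it up through ordering, the convention argued for in the review comment on this line. The name correct_predecessor_sketch is hypothetical.

```python
from typing import List, Optional, Tuple


def correct_predecessor_sketch(reachability_plot: List[float],
                               predecessor: List[int],
                               ordering: List[int],
                               s: int, e: int) -> Optional[Tuple[int, int]]:
    """Shrink the end of cluster [s, e] (plot positions) until it is consistent.

    reachability_plot: reachability values in plot order.
    predecessor: object index each object was reached from (object order).
    ordering: ordering[pos] is the object index at plot position pos.
    """
    while s < e:
        # A start higher than the end already gives a valid boundary.
        if reachability_plot[s] > reachability_plot[e]:
            return s, e
        # Object that the point at plot position e was reached from.
        p_e = predecessor[ordering[e]]
        # Accept the end if that predecessor is already inside the cluster.
        if p_e in {ordering[i] for i in range(s, e)}:
            return s, e
        e -= 1
    return None  # no consistent end was found
```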
Contributor

Hmmm. That is a bit unexpected. I would have expected the predecessor to be an object index, not a plot-order index. I'd assume users will usually expect the predecessor to be in data order; what do you think?

In particular, this is the "externally" used ordering:

predecessor_ : array, shape (n_samples,)
Point that a sample was reached from, indexed by object order.

I'd rather try to get rid of the reindexing here:

clusters = _xi_cluster(reachability[ordering], predecessor[ordering],

or at least use explicit naming, such as predecessor_plot. Otherwise this is a maintenance nightmare: the next author trying to modify this will be just as confused and will also need 4-5 attempts to get the indexing right...
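The distinction being drawn here fits in a few lines: indexing predecessor_ with ordering (as predecessor[ordering] does) moves the entries into plot order, but their values are still object indices; turning those values into plot positions needs the inverse permutation of ordering. A small sketch in the spirit of the suggested predecessor_plot naming; the helper names are hypothetical, and using -1 as the "no predecessor" marker is an assumption of the sketch.

```python
from typing import List, Tuple


def inverse_permutation(ordering: List[int]) -> List[int]:
    """position[obj] is the plot position of object obj."""
    position = [0] * len(ordering)
    for pos, obj in enumerate(ordering):
        position[obj] = pos
    return position


def predecessor_views(predecessor: List[int],
                      ordering: List[int]) -> Tuple[List[int], List[int]]:
    """Return two plot-order views of an object-order predecessor array.

    predecessor_plot: entries in plot order, values still object indices
        (the same result as predecessor[ordering] with NumPy indexing).
    predecessor_plot_pos: entries in plot order, values as plot positions.
    """
    position = inverse_permutation(ordering)
    predecessor_plot = [predecessor[obj] for obj in ordering]
    predecessor_plot_pos = [position[p] if p >= 0 else -1
                            for p in predecessor_plot]
    return predecessor_plot, predecessor_plot_pos


# Object 2 is visited first, object 0 is reached from 2, object 1 from 0.
print(predecessor_views([2, 0, -1], [2, 0, 1]))  # ([-1, 2, 0], [-1, 0, 1])
```

The two views differ only in their values, which is exactly the kind of confusion that explicit names help avoid.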

Member

that's great, thanks a lot @adrinjalali @kno10

Member Author

Yeah, that's why I renamed the parameters to predecessor_plot wherever the order was the same as in reachability_plot. We can/should document this better and clarify it in our docs later, but I think this is fine for now.

@qinhanmin2014 (Member)

@adrinjalali we need to update what's new and mention how we assign the labels, but I'm OK to merge this one first.

@qinhanmin2014 (Member) left a comment

@qinhanmin2014 changed the title from [MRG+1] OPTICS: add extract_xi method to [MRG+2] OPTICS: add extract_xi method on Apr 25, 2019
@jnothman (Member) commented Apr 25, 2019 via email

@qinhanmin2014 (Member) left a comment

LGTM. Ping @jnothman: is your approval still valid? I think we can merge.

@adrinjalali (Member Author)

Seems like this is ready for merge :)

@espg (Contributor) commented Apr 25, 2019

Really good to see this merged. Stellar work, everyone! =)

@qinhanmin2014 (Member)

Seems like this is ready for merge :)

I'd like to wait for @jnothman at least. We've introduced a couple of changes since his approval.

@jnothman (Member)

We've introduced a couple of changes since his approval.

I've been following the conversation more or less. Even if I'm a little ashamed of having missed a lot, I agree with the subsequent changes and am very glad that @qinhanmin2014 and @kno10 have treated this so thoroughly.

@jnothman jnothman merged commit a475594 into scikit-learn:master Apr 27, 2019
@jnothman (Member)

Done.

@qinhanmin2014 (Member)

Thanks everyone for your great work!

@adrinjalali adrinjalali deleted the optics/extractXi branch April 28, 2019 18:32
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
marcelobeckmann pushed a commit to marcelobeckmann/scikit-learn that referenced this pull request May 1, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Successfully merging this pull request may close these issues:

  • OPTICS Return all the clusters found by the extraction algorithm

6 participants