[MRG+2] OPTICS: add extract_xi method#12077
Conversation
1375b73 to
baa94f9
Compare
Ahh, I see. Thanks a lot @adrinjalali @kno10. So our method actually produce the same result as R's. That's fine. |
|
@adrinjalali please mention how we assign the labels. |
what do you want me to change? I think I've applied to comments there, sorry if I've missed something, could you please remind me? |
sklearn/cluster/optics_.py
Outdated
| if reachability_plot[s] > reachability_plot[e]: | ||
| return s, e | ||
| p_e = predecessor[e] | ||
| p_e = ordering[predecessor[e]] |
There was a problem hiding this comment.
Hmmm. That comes a bit unexpected. I would have expected the predecessor to be an object index, not a plot order index. I'd assume that users will usually expect the predecessor to be in data order, what do you think?
In particular, this is the "externally" used ordering:
scikit-learn/sklearn/cluster/optics_.py
Lines 163 to 164 in d83c85d
I'd rather try to get rid of the reindexing here:
scikit-learn/sklearn/cluster/optics_.py
Line 624 in d83c85d
or at least use some explicit naming, such as
predecessor_plot. Otherwise, this is a maintenance nightmare, and the next author trying to modify this will be similarly confused and also need 4-5 attempts to get the indexing right...
There was a problem hiding this comment.
yeah, that's why I renamed the parameters to predecessor_plot if the order was the same os reachability_plot. We can/should better document this and clarity in our docs later. But I think this is fine for now.
|
@adrinjalali we need to update what's new and mention how we assign the labels, but I'm OK to merge this one first. |
There was a problem hiding this comment.
LGTM ping @jnothman @espg @GaelVaroquaux
|
Ridiculously great effort everyone. I think we should make sure to credit
Shane, Adrin, Erich, Assia and Hanmin in what's new... And then hope this
thing survives its release.
|
There was a problem hiding this comment.
LGTM, ping @jnothman is your approval still valid? I think we can merge.
|
Seems like this is ready for merge :) |
|
Really good to see this merged-- stellar work everyone! =) |
I'd like to wait for @jnothman at least. We've introduced couple of changes since his approval. |
I've been following the conversation more-or-less. Even if I'm a little ashamed at missing lots, I agree with the subsequent changes and am very glad that @qinhanmin2014 and @kno10 have treated this thoroughly. |
|
Done. |
|
Thanks everyone for your great work! |
This reverts commit 14724c7.
This reverts commit 14724c7.
Related: #12036, #12053, #11677
Fixes #12376
Adding the Xi extraction method to OPTICS.
@espg, I'll need the
predecessor_to add that final touch to it. I'm still testing the implementation, but it'll be nice if I could get some feedback. (done)Differences with the paper:
r(e_U + 1), but section 4.3.2 compares values withr(e_U), this implementation takesr(e_U + 1)which seems more consistent.min_samplesas the minimum size of clusters, which is the default behavior of this implementation, but we also have amin_cluster_sizeto give more freedom to the user.