add HDBSCAN #14331
Comments
I will be happy to provide assistance with moving it over -- there are some changes that will be required, mostly related to differences in how the internals of scikit-learn's KD-trees are accessed via Cython. I will also be happy to help with reviewing.
Should the release of OPTICS play into that decision?
I'm really not that familiar with OPTICS.
Btw I like the demo dataset for hdbscan, maybe it could replace some of the other ones we have in the comparison? https://hdbscan.readthedocs.io/en/latest/comparing_clustering_algorithms.html
@jnothman I know you're catching up with a lot but maybe this is worth looking into, given that there are still issues with OPTICS and this is actually a pretty well-tested implementation?
OPTICS was announced as a "major feature" in v0.21, so I guess it's now there to stay in any case? The 1999 paper also has 3.5k citations. Unless there are significant issues with it? I haven't found that many on the issue tracker, but I haven't followed the development either. I think it would be good to include HDBSCAN; I'm just saying that, purely following the inclusion criteria (independently of any technical merits of the algorithms), it made sense to include OPTICS first. What impact that may have on the future inclusion of HDBSCAN, I'm not sure.
@rth not sure I follow your logic. Are you talking about the class or the implementation or both?
I meant the OPTICS algorithm, not so much the implementation. I was not aware that OPTICS results could be obtained with HDBSCAN exactly. As long as we don't break backward compatibility of OPTICS I don't really have an opinion, and will let people who have worked on this decide.
I haven't read the HDBSCAN paper in detail, but as I understand it, HDBSCAN is not strictly a superset of OPTICS, though the community seems to have accepted that it's the better of the two. I don't think it'd be too hard to refactor the code so that both algorithms can use the core part.
HDBSCAN and OPTICS share the same computational core (though HDBSCAN is a little more general); the post-processing can be a little different. I do think you want to look to re-use/integrate the core code if possible to improve stability, debugging, and maintenance.
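To make the "shared core" point concrete, here is a rough sketch of the quantity both algorithms start from: a per-point core distance, which OPTICS uses in its (asymmetric) reachability distance and HDBSCAN uses in its (symmetric) mutual reachability distance. This is only an illustration in pure Python, not the code of either implementation. The function names are made up for this example, and whether a point counts as its own neighbour differs between implementations; here it is assumed not to.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbour.

    Assumption for this sketch: a point is not counted as its own neighbour.
    """
    cores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        cores.append(dists[k - 1])
    return cores


def optics_reachability(points, cores, p, o):
    """OPTICS-style reachability of point o from point p (asymmetric)."""
    return max(cores[p], math.dist(points[p], points[o]))


def mutual_reachability(points, cores, a, b):
    """HDBSCAN-style mutual reachability between a and b (symmetric)."""
    return max(cores[a], cores[b], math.dist(points[a], points[b]))


# Hypothetical toy data: three nearby points and one far-away point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
cores = core_distances(points, k=2)
```

Both distances are built from the same `cores` list, which is the part that could plausibly be shared; what differs is how the resulting distances are turned into clusters afterwards.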
Finally closed in #26385 😄
I think we should add HDBSCAN. The original paper is from 2013 and @lmcinnes's accelerated version is from 2017; the original paper has 300 citations, and the 2017 JOSS paper about the implementation has 100.
I think that should fulfill our requirements, and it's commonly asked for.
@lmcinnes said he might not have time to move it so maybe someone else can pick it up.
For reference:
https://github.com/scikit-learn-contrib/hdbscan