ENH Add `HDBSCAN` as a new estimator in `sklearn.cluster` by Micky774 · Pull Request #26385 · scikit-learn/scikit-learn

Merged
merged 24 commits on May 31, 2023

Changes from all commits (24 commits)
bd04bda
Include `HDBSCAN` as a sub-module for `sklearn.cluster` (#22616)
Micky774 Oct 12, 2022
488fb1f
CLN Cleaned `cluster/_hdbscan/_linkage.pyx` (#24857)
Micky774 Dec 19, 2022
df7cf3a
DOC Adds `HDBSCAN.dbscan_clustering` section to `plot_hdbscan.py` (#2…
Micky774 Feb 7, 2023
800c2a6
ENH Extends outlier encoding scheme to `HDBSCAN.dbscan_clustering` (#…
Micky774 Feb 9, 2023
29c53b7
CLN Cleaned `cluster/_hdbscan/_reachability.pyx` (#24701)
Micky774 Feb 14, 2023
70ae320
CLN Update `cluster/_hdbscan/_tree.pyx` style and syntax (#25768)
Micky774 Mar 7, 2023
a6e061e
CLN Cleaned `TreeUnionFind` in `_hdbscan/_tree.pyx` (#25827)
Micky774 Mar 14, 2023
d013d38
ENH `_hdbscan` `HIERARCHY` data type introduction (#25826)
Micky774 Mar 28, 2023
792833a
CLN BFS Style Improvement (#26096)
Micky774 Apr 5, 2023
c02c455
CLN `_hdbscan/_tree.pyx` algorithms refactor (#26011)
Micky774 Apr 19, 2023
607dea8
CLN HDBSCAN `_tree.pyx::do_labelling` refactor (#26101)
Micky774 May 16, 2023
792485a
Updated changelog
Micky774 May 16, 2023
0661984
Updated typing
Micky774 May 16, 2023
4da655c
Updated typing, moved away from deprecated numpy API
Micky774 May 16, 2023
7215ff0
Updated shape semantics based on underlying data structure
Micky774 May 19, 2023
861b143
Doc fixes
Micky774 May 20, 2023
87a43e5
Updated shape semantics in remaining Cython code
Micky774 May 20, 2023
15513df
Merge branch 'main' into hdbscan_sync
Micky774 May 21, 2023
3d46b14
Use cimport for PyArrayObject and add noexcept for UnionFind
Micky774 May 22, 2023
495e21f
Updated cimport strategy
Micky774 May 22, 2023
b4bdfbd
Added copy of license
Micky774 May 24, 2023
3e9ed2e
Updated documentation of helper functions/methods
Micky774 May 24, 2023
fe11344
Improved testing documentation
Micky774 May 24, 2023
04c8b69
Updated documentation to explicitly include example
Micky774 May 31, 2023
1 change: 1 addition & 0 deletions doc/modules/classes.rst
@@ -104,6 +104,7 @@
Classes
cluster.AgglomerativeClustering
cluster.Birch
cluster.DBSCAN
cluster.HDBSCAN
cluster.FeatureAgglomeration
cluster.KMeans
cluster.BisectingKMeans
123 changes: 118 additions & 5 deletions doc/modules/clustering.rst
@@ -93,6 +93,13 @@
Overview of clustering methods
transductive
- Distances between nearest points

* - :ref:`HDBSCAN <hdbscan>`
- minimum cluster membership, minimum point neighbors
- large ``n_samples``, medium ``n_clusters``
- Non-flat geometry, uneven cluster sizes, outlier removal,
transductive, hierarchical, variable cluster density
- Distances between nearest points

* - :ref:`OPTICS <optics>`
- minimum cluster membership
- Very large ``n_samples``, large ``n_clusters``
@@ -392,8 +399,8 @@
for centroids to be the mean of the points within a given region. These
candidates are then filtered in a post-processing stage to eliminate
near-duplicates to form the final set of centroids.

The position of centroid candidates is iteratively adjusted using a technique called hill
climbing, which finds local maxima of the estimated probability density.
Given a candidate centroid :math:`x` for iteration :math:`t`, the candidate
is updated according to the following equation:

@@ -412,14 +419,14 @@
its neighborhood:

m(x) = \frac{1}{|N(x)|} \sum_{x_j \in N(x)}x_j - x

In general, the equation for :math:`m` depends on a kernel used for density estimation.
The generic formula is:

.. math::

m(x) = \frac{\sum_{x_j \in N(x)}K(x_j - x)x_j}{\sum_{x_j \in N(x)}K(x_j - x)} - x

In our implementation, :math:`K(x)` is equal to 1 if :math:`x` is small enough and is
equal to 0 otherwise. Effectively :math:`K(y - x)` indicates whether :math:`y` is in
the neighborhood of :math:`x`.
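
As a small illustration, a single hill-climbing update with this flat kernel
reduces to the mean of the neighbors minus the candidate itself. A minimal
sketch, assuming Euclidean distances and a hypothetical `bandwidth` radius
defining :math:`N(x)` (this is not scikit-learn's implementation):

.. code-block:: python

    import numpy as np

    def mean_shift_update(x, X, bandwidth):
        # Flat kernel: K(x_j - x) = 1 inside the bandwidth, 0 outside,
        # so m(x) is just the mean of the neighbors minus x.
        neighbors = X[np.linalg.norm(X - x, axis=1) <= bandwidth]
        return neighbors.mean(axis=0) - x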

@@ -961,6 +968,112 @@
by black points below.
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017).
In ACM Transactions on Database Systems (TODS), 42(3), 19.

.. _hdbscan:

HDBSCAN
=======

The :class:`HDBSCAN` algorithm can be seen as an extension of :class:`DBSCAN`
and :class:`OPTICS`. Specifically, :class:`DBSCAN` assumes that the clustering
criterion (i.e. density requirement) is *globally homogeneous*.
In other words, :class:`DBSCAN` may struggle to successfully capture clusters
with different densities.
:class:`HDBSCAN` alleviates this assumption and explores all possible density
scales by building an alternative representation of the clustering problem.

.. note::

This implementation is adapted from the original implementation of HDBSCAN,
`scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_, which is based on [LJ2017]_.

Mutual Reachability Graph
-------------------------

HDBSCAN first defines :math:`d_c(x_p)`, the *core distance* of a sample :math:`x_p`, as the
distance to its `min_samples`-th nearest neighbor, counting itself. For example,
if `min_samples=5` and :math:`x_*` is the 5th-nearest neighbor of :math:`x_p`,
then the core distance is:

.. math:: d_c(x_p)=d(x_p, x_*).

Next it defines :math:`d_m(x_p, x_q)`, the *mutual reachability distance* of two points
:math:`x_p, x_q`, as:

.. math:: d_m(x_p, x_q) = \max\{d_c(x_p), d_c(x_q), d(x_p, x_q)\}

These two notions allow us to construct the *mutual reachability graph*
:math:`G_{ms}`, defined for a fixed choice of `min_samples`, by associating each
sample :math:`x_p` with a vertex of the graph, and weighting the edge between
points :math:`x_p, x_q` by their mutual reachability distance :math:`d_m(x_p, x_q)`.
We may build subsets of this graph, denoted as :math:`G_{ms,\varepsilon}`, by
removing any edges with weight greater than :math:`\varepsilon` from the original
graph. Any points whose core distance is greater than :math:`\varepsilon` are at
this stage marked as noise. The remaining points are then clustered by finding
the connected components of this trimmed graph.
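
A minimal sketch of these two quantities, using brute-force Euclidean distances
with NumPy (scikit-learn's actual implementation is considerably more
efficient; the helper name `mutual_reachability` is hypothetical):

.. code-block:: python

    import numpy as np

    def mutual_reachability(X, min_samples=5):
        # Pairwise distances d(x_p, x_q).
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        # Core distance: distance to the min_samples-th nearest neighbor,
        # counting the point itself (index min_samples - 1 after sorting).
        core = np.sort(dist, axis=1)[:, min_samples - 1]
        # d_m(x_p, x_q) = max{d_c(x_p), d_c(x_q), d(x_p, x_q)}
        return np.maximum(np.maximum(core[:, None], core[None, :]), dist)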

.. note::

Taking the connected components of a trimmed graph :math:`G_{ms,\varepsilon}` is
equivalent to running DBSCAN* with `min_samples` and :math:`\varepsilon`. DBSCAN* is a
slightly modified version of DBSCAN mentioned in [CM2013]_.

Hierarchical Clustering
-----------------------
HDBSCAN can be seen as an algorithm which performs DBSCAN* clustering across all
values of :math:`\varepsilon`. As mentioned prior, this is equivalent to finding the connected
components of the mutual reachability graphs for all values of :math:`\varepsilon`. To do this
efficiently, HDBSCAN first extracts a minimum spanning tree (MST) from the
fully-connected mutual reachability graph, then greedily cuts the edges with
highest weight. An outline of the HDBSCAN algorithm is as follows (a rough
code sketch is given after the steps):

1. Extract the MST of :math:`G_{ms}`
2. Extend the MST by adding a "self edge" for each vertex, with weight equal
to the core distance of the underlying sample.
3. Initialize a single cluster and label for the MST.
4. Remove the edge with the greatest weight from the MST (ties are
removed simultaneously).
5. Assign cluster labels to the connected components which contain the
end points of the now-removed edge. If the component does not have at least
one edge it is instead assigned a "null" label marking it as noise.
6. Repeat 4-5 until there are no more connected components.
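
The following is a rough sketch of this procedure cut at a single scale
:math:`\varepsilon`, i.e. steps 1, 4 and 5 for one fixed threshold, using
brute-force NumPy distances and SciPy graph routines in place of
scikit-learn's optimized Cython implementation; the helper name
`single_level_labels` is hypothetical:

.. code-block:: python

    import numpy as np
    from scipy.sparse import csr_matrix
    from scipy.sparse.csgraph import connected_components, minimum_spanning_tree

    def single_level_labels(X, eps, min_samples=5):
        # Mutual reachability weights of the fully-connected graph G_{ms}.
        dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
        core = np.sort(dist, axis=1)[:, min_samples - 1]
        G = np.maximum(np.maximum(core[:, None], core[None, :]), dist)
        np.fill_diagonal(G, 0)  # drop self loops before building the MST
        mst = minimum_spanning_tree(csr_matrix(G)).toarray()  # step 1
        mst[mst > eps] = 0  # steps 4-5: cut every edge heavier than eps
        _, labels = connected_components(csr_matrix(mst), directed=False)
        # A point whose core distance exceeds eps keeps no edges at this
        # scale (cf. the self edges of step 2) and is marked as noise.
        labels[core > eps] = -1
        return labels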

HDBSCAN is therefore able to obtain all possible partitions achievable by
DBSCAN* for a fixed choice of `min_samples` in a hierarchical fashion.
Indeed, this allows HDBSCAN to perform clustering across multiple densities
and as such it no longer needs :math:`\varepsilon` to be given as a hyperparameter. Instead
it relies solely on the choice of `min_samples`, which tends to be a more robust
hyperparameter.
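
Once fit, the estimator can also re-cut the computed hierarchy at any fixed
:math:`\varepsilon` via its `dbscan_clustering` method, without refitting. A
usage sketch on synthetic data (the data and parameter values are illustrative
only):

.. code-block:: python

    from sklearn.cluster import HDBSCAN
    from sklearn.datasets import make_blobs

    X, _ = make_blobs(n_samples=500, centers=3, cluster_std=[1.0, 2.5, 0.5],
                      random_state=42)
    hdb = HDBSCAN(min_samples=5).fit(X)
    print(set(hdb.labels_))  # hierarchy-based labels; -1 marks noise
    # DBSCAN*-style flat clusterings at several scales, from one fit:
    for eps in (0.5, 1.0, 2.0):
        print(eps, set(hdb.dbscan_clustering(cut_distance=eps)))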

.. |hdbscan_ground_truth| image:: ../auto_examples/cluster/images/sphx_glr_plot_hdbscan_005.png
:target: ../auto_examples/cluster/plot_hdbscan.html
:scale: 75
.. |hdbscan_results| image:: ../auto_examples/cluster/images/sphx_glr_plot_hdbscan_007.png
:target: ../auto_examples/cluster/plot_hdbscan.html
:scale: 75

.. centered:: |hdbscan_ground_truth|
.. centered:: |hdbscan_results|

HDBSCAN can be smoothed with an additional hyperparameter `min_cluster_size`,
which specifies that during the hierarchical clustering, components with fewer
than `min_cluster_size` many samples are considered noise. In practice, one
can set `min_cluster_size = min_samples` to couple the parameters and
simplify the hyperparameter space.
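
As a brief sketch of that coupling (the parameter value is chosen arbitrarily):

.. code-block:: python

    from sklearn.cluster import HDBSCAN

    # Treat components with fewer than 10 samples as noise, coupling
    # min_cluster_size to min_samples to shrink the search space.
    hdb = HDBSCAN(min_samples=10, min_cluster_size=10)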

.. topic:: References:

.. [CM2013] Campello, R.J.G.B., Moulavi, D., Sander, J. (2013). Density-Based Clustering
Based on Hierarchical Density Estimates. In: Pei, J., Tseng, V.S., Cao, L.,
Motoda, H., Xu, G. (eds) Advances in Knowledge Discovery and Data Mining.
PAKDD 2013. Lecture Notes in Computer Science, vol 7819. Springer, Berlin,
Heidelberg.
:doi:`Density-Based Clustering Based on Hierarchical Density Estimates <10.1007/978-3-642-37456-2_14>`

.. [LJ2017] L. McInnes and J. Healy, (2017). Accelerated Hierarchical Density Based
Clustering. In: IEEE International Conference on Data Mining Workshops (ICDMW),
2017, pp. 33-42.
:doi:`Accelerated Hierarchical Density Based Clustering <10.1109/ICDMW.2017.12>`

.. _optics:

OPTICS
@@ -1033,7 +1146,7 @@
represented as children of a larger parent cluster.
Different distance metrics can be supplied via the ``metric`` keyword.

For large datasets, similar (but not identical) results can be obtained via
:class:`HDBSCAN`. The HDBSCAN implementation is
multithreaded, and has better algorithmic runtime complexity than OPTICS,
at the cost of worse memory scaling. For extremely large datasets that
exhaust system memory using HDBSCAN, OPTICS will maintain :math:`n` (as opposed
13 changes: 13 additions & 0 deletions doc/whats_new/v1.3.rst
@@ -192,6 +192,19 @@
Changelog
:user:`Jérémie du Boisberranger <jeremiedbb>`,
:user:`Guillaume Lemaitre <glemaitre>`.

- |MajorFeature| Added :class:`cluster.HDBSCAN`, a modern hierarchical density-based
clustering algorithm. Similarly to :class:`cluster.OPTICS`, it can be seen as a
generalization of :class:`cluster.DBSCAN` by allowing for hierarchical instead of
flat clustering; however, it varies in its approach from :class:`cluster.OPTICS`.
This algorithm is very robust with respect to its hyperparameters' values and can
be used on a wide variety of data without much, if any, tuning.

This implementation is an adaptation from the original implementation of HDBSCAN in
`scikit-learn-contrib/hdbscan <https://github.com/scikit-learn-contrib/hdbscan>`_,
by :user:`Leland McInnes <lmcinnes>` et al.

:pr:`26385` by :user:`Meekail Zain <micky774>`.

:mod:`sklearn.covariance`
.........................

9 changes: 9 additions & 0 deletions examples/cluster/plot_cluster_comparison.py
@@ -79,6 +79,9 @@
"min_samples": 7,
"xi": 0.05,
"min_cluster_size": 0.1,
"allow_single_cluster": True,
"hdbscan_min_cluster_size": 15,
"hdbscan_min_samples": 3,
}

datasets = [
@@ -161,6 +164,11 @@
affinity="nearest_neighbors",
)
dbscan = cluster.DBSCAN(eps=params["eps"])
hdbscan = cluster.HDBSCAN(
min_samples=params["hdbscan_min_samples"],
min_cluster_size=params["hdbscan_min_cluster_size"],
allow_single_cluster=params["allow_single_cluster"],
)
optics = cluster.OPTICS(
min_samples=params["min_samples"],
xi=params["xi"],
@@ -188,6 +196,7 @@
("Ward", ward),
("Agglomerative\nClustering", average_linkage),
("DBSCAN", dbscan),
("HDBSCAN", hdbscan),
("OPTICS", optics),
("BIRCH", birch),
("Gaussian\nMixture", gmm),