ENH Add density based CNN clustering to sklearn_extra.cluster (#64) · rth/scikit-learn-extra@48bff71 · GitHub
Commit 48bff71

ENH Add density based CNN clustering to sklearn_extra.cluster (scikit-learn-contrib#64)
1 parent 476f11f commit 48bff71

11 files changed: +1336 −13 lines changed


.gitignore

Lines changed: 1 addition & 0 deletions
@@ -4,6 +4,7 @@ __pycache__/
 *$py.class

 *.c
+*.cpp

 # C extensions
 *.so

doc/api.rst

Lines changed: 2 additions & 1 deletion
@@ -31,7 +31,8 @@ Clustering
    :template: class.rst

    cluster.KMedoids
-
+   cluster.CommonNNClustering
+
 Robust
 ====================

doc/user_guide.rst

Lines changed: 128 additions & 9 deletions
@@ -8,10 +8,10 @@ User guide
 ==========

 .. toctree::
-   :numbered:
+   :numbered:

-   modules/eigenpro.rst
-   modules/robust.rst
+   modules/eigenpro.rst
+   modules/robust.rst

 .. _k_medoids:

@@ -44,8 +44,8 @@ clusters. This makes it more suitable for smaller datasets in comparison to

 .. topic:: Examples:

-  * :ref:`sphx_glr_auto_examples_plot_kmedoids_digits.py`: Applying K-Medoids on digits
-    with various distance metrics.
+  * :ref:`sphx_glr_auto_examples_plot_kmedoids_digits.py`: Applying K-Medoids on digits
+    with various distance metrics.


 **Algorithm description:**
@@ -64,7 +64,126 @@ This version works as follows:
   maximum number of iterations ``max_iter`` is reached.

 .. topic:: References:
-  * Maranzana, F.E., 1963. On the location of supply points to minimize
-    transportation costs. IBM Systems Journal, 2(2), pp.129-135.
-  * Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids
-    clustering. Expert systems with applications, 36(2), pp.3336-3341.
+
+  * Maranzana, F.E., 1963. On the location of supply points to minimize
+    transportation costs. IBM Systems Journal, 2(2), pp.129-135.
+  * Park, H.S. and Jun, C.H., 2009. A simple and fast algorithm for K-medoids
+    clustering. Expert systems with applications, 36(2), pp.3336-3341.
+
+.. _commonnn:
+
+Common-nearest-neighbors clustering
+===================================
+
+:class:`CommonNNClustering <sklearn_extra.cluster.CommonNNClustering>`
+provides an interface to density-based common-nearest-neighbors
+clustering. Density-based clustering identifies clusters as regions of
+high point density, separated by regions of lower density.
+Common-nearest-neighbors clustering approximates local density as the
+number of shared (common) neighbors between two points with respect to
+a neighbor search radius. A density threshold (density criterion) –
+defined by the cluster parameters ``min_samples`` (number of common
+neighbors) and ``eps`` (search radius) – is used to distinguish high
+from low density. A high value of ``min_samples`` and a low value of
+``eps`` correspond to high density.
+
+As such, the method is related to other density-based clustering
+algorithms like :class:`DBSCAN <sklearn.cluster.DBSCAN>` or
+Jarvis-Patrick. DBSCAN approximates local density as the number of
+points in the neighborhood of a single point. The Jarvis-Patrick
+algorithm uses the number of common neighbors shared by two points
+among their :math:`k` nearest neighbors. Since these approaches each
+provide a different notion of how density is estimated from point
+samples, they can be used complementarily. Their relative suitability
+for a classification problem depends on the nature of the clustered
+data. Common-nearest-neighbors clustering (as density-based clustering
+in general) has the following advantages over other clustering
+techniques:
+
+  * The cluster result is deterministic. The same set of cluster
+    parameters always leads to the same classification for a data set.
+    A different ordering of the data set leads to a different ordering
+    of the cluster assignment, but does not change the assignment
+    qualitatively.
+  * Little prior knowledge about the data is required, e.g. the number
+    of resulting clusters does not need to be known beforehand
+    (although cluster parameters need to be tuned to obtain a desired
+    result).
+  * Identified clusters are not restricted in their shape or size.
+  * Points can be considered noise (outliers) if they do not fulfil
+    the density criterion.
+
+The common-nearest-neighbors algorithm tests the density criterion for
+pairs of neighbors (do they have at least ``min_samples`` points in the
+intersection of their neighborhoods at a radius ``eps``?). Two points
+that fulfil this criterion are directly part of the same dense data
+region, i.e. they are *density reachable*. A *density connected*
+network of density-reachable points (a connected component if density
+reachability is viewed as a graph structure) constitutes a separate
+dense region and therefore a cluster. Note that, in contrast to
+:class:`DBSCAN <sklearn.cluster.DBSCAN>` for example, there is no
+distinction between *core* points (dense points) and *edge* points
+(points that are not dense themselves but neighbors of dense points).
+The assignment of points on the cluster rims to a cluster is possible,
+but can be ambiguous. The cluster result is returned as a 1D container
+of labels, i.e. a sequence of integers (zero-based) of length :math:`n`
+for a data set of :math:`n` points, denoting the assignment of points
+to a specific cluster. Noise is labeled with ``-1``. Valid clusters
+have at least two members. The clusters are not sorted by cluster
+member count. In some cases the algorithm tends to identify small
+clusters that can be filtered out manually.
+
+.. topic:: Examples:
+
+  * :ref:`examples/cluster/plot_commonnn.py <sphx_glr_auto_examples_plot_commonnn.py>`
+    Basic usage of the
+    :class:`CommonNNClustering <sklearn_extra.cluster.CommonNNClustering>`
+  * :ref:`examples/cluster/plot_commonnn_data_sets.py <sphx_glr_auto_examples_plot_commonnn_data_sets.py>`
+    Common-nearest-neighbors clustering of toy data sets
+
+.. topic:: Implementation:
+
+  The present implementation of the common-nearest-neighbors algorithm in
+  :class:`CommonNNClustering <sklearn_extra.cluster.CommonNNClustering>`
+  shares some commonalities with the current scikit-learn implementation
+  of :class:`DBSCAN <sklearn.cluster.DBSCAN>`. It computes neighborhoods
+  for all points in bulk with
+  :class:`NearestNeighbors <sklearn.neighbors.NearestNeighbors>` before
+  the actual clustering. Consequently, storing the neighborhoods requires
+  memory on the order of :math:`O(n \cdot n_n)` for :math:`n` points in
+  the data set, where :math:`n_n` is the average number of neighbors
+  (which grows with ``eps``), i.e. at worst :math:`O(n^2)`. Depending on
+  the input structure (dense or sparse points, or a similarity matrix)
+  the additional memory demand varies. The clustering itself follows a
+  breadth-first-search scheme, checking the density criterion at every
+  node expansion. The time complexity is roughly linear in the number of
+  data points :math:`n`, the total number of neighbors :math:`N`, and
+  the value of ``min_samples``. For density-based clustering schemes
+  with lower memory demand, also consider:
+
+  * :class:`OPTICS <sklearn.cluster.OPTICS>` – density-based clustering
+    related to DBSCAN that uses a range of ``eps`` values.
+  * `cnnclustering <https://pypi.org/project/cnnclustering/>`_ – a
+    different implementation of common-nearest-neighbors clustering.
+
+.. topic:: Notes:
+
+  * :class:`DBSCAN <sklearn.cluster.DBSCAN>` provides an option to
+    specify data point weights with ``sample_weight``. This feature is
+    currently experimental for :class:`CommonNNClustering`, as weights
+    are not well defined for checking the common-nearest-neighbors
+    density criterion. It should not yet be used in production.
+
+.. topic:: References:
+
+  * B. Keller, X. Daura, W. F. van Gunsteren, "Comparing Geometric and
+    Kinetic Cluster Algorithms for Molecular Simulation Data", J. Chem.
+    Phys., 2010, 132, 074110.
+
+  * O. Lemke, B. G. Keller, "Density-based Cluster Algorithms for the
+    Identification of Core Sets", J. Chem. Phys., 2016, 145, 164104.
+
+  * O. Lemke, B. G. Keller, "Common Nearest Neighbor Clustering – a
+    Benchmark", Algorithms, 2018, 11, 19.
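
The density criterion that the new user-guide section describes can be sketched in a few lines of Python. The snippet below is illustrative only and not part of this changeset; whether the two query points themselves count toward the common-neighbor total, and the parameter values used, are assumptions made for the sketch.

from sklearn.datasets import make_blobs
from sklearn.neighbors import NearestNeighbors

X, _ = make_blobs(n_samples=200, centers=2, cluster_std=0.4, random_state=0)
eps, min_samples = 0.5, 5

# Neighborhoods are computed in bulk, as the implementation note above suggests.
nn = NearestNeighbors(radius=eps).fit(X)
neighborhoods = [
    set(ind) for ind in nn.radius_neighbors(X, return_distance=False)
]


def density_reachable(i, j):
    """Density criterion for a pair of points i, j (illustrative only)."""
    if j not in neighborhoods[i]:
        return False
    # Count common neighbors; excluding i and j themselves is a convention
    # chosen for this sketch and may differ from the library's exact rule.
    common = (neighborhoods[i] & neighborhoods[j]) - {i, j}
    return len(common) >= min_samples


print(density_reachable(0, 1))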

examples/plot_commonnn.py

Lines changed: 82 additions & 0 deletions
# -*- coding: utf-8 -*-
"""
=========================================
Common-nearest-neighbor clustering demo I
=========================================

Common-nearest neighbor clustering of data points following a density
criterion. Two points will be part of the same cluster if they share a
minimum number of common neighbors. Read more in the :ref:`User Guide
<commonnn>`.

"""
import matplotlib.pyplot as plt
import numpy as np

from sklearn_extra.cluster import CommonNNClustering
from sklearn import metrics
from sklearn.datasets import make_blobs
from sklearn.preprocessing import StandardScaler


print(__doc__)

# #############################################################################
# Generate sample data
centers = [[1, 1], [-1, -1], [1, -1]]
X, labels_true = make_blobs(
    n_samples=750, centers=centers, cluster_std=0.4, random_state=0
)

X = StandardScaler().fit_transform(X)

# #############################################################################
# Compute common-nearest-neighbor clustering
cobj = CommonNNClustering(eps=0.3, min_samples=8).fit(X)
labels = cobj.labels_

# Number of clusters in labels, ignoring noise if present.
n_clusters_ = len(set(labels)) - (1 if -1 in labels else 0)
n_noise_ = list(labels).count(-1)

print("Estimated number of clusters: %d" % n_clusters_)
print("Estimated number of noise points: %d" % n_noise_)
print("Homogeneity: %0.3f" % metrics.homogeneity_score(labels_true, labels))
print("Completeness: %0.3f" % metrics.completeness_score(labels_true, labels))
print("V-measure: %0.3f" % metrics.v_measure_score(labels_true, labels))
print(
    "Adjusted Rand Index: %0.3f"
    % metrics.adjusted_rand_score(labels_true, labels)
)
print(
    "Adjusted Mutual Information: %0.3f"
    % metrics.adjusted_mutual_info_score(labels_true, labels)
)
print("Silhouette Coefficient: %0.3f" % metrics.silhouette_score(X, labels))

# #############################################################################
# Plot result

# Black removed and is used for noise instead.
unique_labels = set(labels)
colors = [
    plt.cm.Spectral(each) for each in np.linspace(0, 1, len(unique_labels))
]
for k, col in zip(unique_labels, colors):
    if k == -1:
        # Black used for noise.
        col = [0, 0, 0, 1]

    class_member_mask = labels == k

    xy = X[class_member_mask]
    plt.plot(
        xy[:, 0],
        xy[:, 1],
        "o",
        markerfacecolor=tuple(col),
        markeredgecolor="k",
        markersize=6,
    )

plt.title("Estimated number of clusters: %d" % n_clusters_)
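
The user guide notes that the algorithm can produce small clusters that one may want to filter out manually. A possible post-processing step, sketched here as a hypothetical helper that is not part of this commit (the ``min_size`` threshold is an arbitrary choice), could relabel such clusters as noise:

import numpy as np


def filter_small_clusters(labels, min_size=10):
    """Relabel clusters with fewer than min_size members as noise (-1)."""
    labels = np.asarray(labels).copy()
    unique, counts = np.unique(labels[labels != -1], return_counts=True)
    for cluster_label, count in zip(unique, counts):
        if count < min_size:
            labels[labels == cluster_label] = -1
    return labels


# For example, applied to the labels computed in the demo above:
# labels = filter_small_clusters(cobj.labels_, min_size=10)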

examples/plot_commonnn_data_sets.py

Lines changed: 110 additions & 0 deletions
"""
==========================================
Common-nearest-neighbor clustering demo II
==========================================

Common-nearest neighbor clustering of data points following a density
criterion. Two points will be part of the same cluster if they share a
minimum number of common neighbors. Read more in the :ref:`User Guide
<commonnn>`. Compare this example to the results for
`other <https://scikit-learn.org/stable/auto_examples/cluster/plot_cluster_comparison.html>`_
cluster algorithms.

"""

import matplotlib.pyplot as plt
import numpy as np

from sklearn_extra.cluster import CommonNNClustering
from sklearn import datasets
from sklearn.preprocessing import StandardScaler


print(__doc__)


np.random.seed(42)
n = 2000

# circles
circles, _ = datasets.make_circles(
    n_samples=n, factor=0.5, noise=0.05, random_state=10
)

circles = StandardScaler().fit_transform(circles)

# blobs
blobs, _ = datasets.make_blobs(
    centers=[[-9, -8], [11, -10], [12, 12]], n_samples=n, random_state=10
)

blobs = StandardScaler().fit_transform(blobs)

# moons
moons, _ = datasets.make_moons(n_samples=n, noise=0.05, random_state=10)

moons = StandardScaler().fit_transform(moons)

# no_structure
no_structure = np.random.rand(n, 2)
no_structure = StandardScaler().fit_transform(no_structure)

# aniso
X, y = datasets.make_blobs(n_samples=n, random_state=170)

transformation = [[0.6, -0.6], [-0.4, 0.8]]
aniso = np.dot(X, transformation)
aniso = StandardScaler().fit_transform(aniso)

# varied
varied, _ = datasets.make_blobs(
    n_samples=n, cluster_std=[1.0, 2, 0.5], random_state=170
)

varied = StandardScaler().fit_transform(varied)

fits = [
    ("circles", circles, {"eps": 0.2, "min_samples": 5}),
    ("moons", moons, {"eps": 0.2, "min_samples": 5}),
    ("varied", varied, {"eps": 0.2, "min_samples": 15}),
    ("aniso", aniso, {"eps": 0.18, "min_samples": 12}),
    ("blobs", blobs, {"eps": 0.2, "min_samples": 5}),
    ("none", no_structure, {"eps": 0.2, "min_samples": 5}),
]

fig, ax = plt.subplots(2, 3)
ax = ax.flatten()
for index, (name, data, params) in enumerate(fits):
    cobj = CommonNNClustering(**params).fit(data)
    labels = cobj.labels_
    ax[index].plot(
        *data[np.where(labels == -1)[0]].T,
        linestyle="",
        color="None",
        marker="o",
        markersize=4,
        markerfacecolor="gray",
        markeredgecolor="k",
    )

    for cluster_number in range(0, int(np.max(labels)) + 1):
        ax[index].plot(
            *data[np.where(labels == cluster_number)[0]].T,
            linestyle="",
            marker="o",
            markersize=4,
            markeredgecolor="k",
        )

    ax[index].set(
        **{
            "xlabel": None,
            "ylabel": None,
            "xlim": (-2.5, 2.5),
            "ylim": (-2.5, 2.5),
            "xticks": (),
            "yticks": (),
            "aspect": "equal",
            "title": name,
        }
    )
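
The ``eps``/``min_samples`` pairs in ``fits`` above were evidently picked per data set, in line with the user-guide remark that cluster parameters need to be tuned. A rough, illustrative way to explore them (a sketch, not part of the commit; the helper name and grid values are assumptions) is to scan a small grid and inspect the resulting cluster count and noise fraction:

import numpy as np

from sklearn_extra.cluster import CommonNNClustering


def scan_parameters(data, eps_values, min_samples_values):
    """Print cluster count and noise fraction for each parameter pair."""
    for eps in eps_values:
        for min_samples in min_samples_values:
            labels = CommonNNClustering(
                eps=eps, min_samples=min_samples
            ).fit(data).labels_
            n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
            noise_fraction = np.mean(labels == -1)
            print(
                "eps=%.2f min_samples=%2d -> %d clusters, %.1f%% noise"
                % (eps, min_samples, n_clusters, 100 * noise_fraction)
            )


# For example: scan_parameters(moons, [0.1, 0.2, 0.3], [5, 10, 20])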

setup.py

Lines changed: 7 additions & 1 deletion
@@ -62,7 +62,13 @@
             "sklearn_extra.utils._cyfht",
             ["sklearn_extra/utils/_cyfht.pyx"],
             include_dirs=[np.get_include()],
-        )
+        ),
+        Extension(
+            "sklearn_extra.cluster._commonnn_inner",
+            ["sklearn_extra/cluster/_commonnn_inner.pyx"],
+            include_dirs=[np.get_include()],
+            language="c++",
+        ),
     ]
 ),
 "cmdclass": dict(build_ext=build_ext),

sklearn_extra/cluster/__init__.py

Lines changed: 2 additions & 1 deletion
@@ -1,3 +1,4 @@
 from ._k_medoids import KMedoids
+from ._commonnn import commonnn, CommonNNClustering

-__all__ = ["KMedoids"]
+__all__ = ["KMedoids", "CommonNNClustering", "commonnn"]
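
The package thus exposes both an estimator and a functional entry point. A minimal usage sketch follows; the diff does not show the signature of ``commonnn``, so the functional call is only assumed to mirror the estimator's ``eps``/``min_samples`` parameters (in the spirit of ``sklearn.cluster.dbscan`` vs. ``DBSCAN``) and is left commented out.

from sklearn.datasets import make_blobs
from sklearn_extra.cluster import CommonNNClustering, commonnn

X, _ = make_blobs(n_samples=300, centers=3, cluster_std=0.4, random_state=0)

# Estimator interface, as used in the examples above.
labels = CommonNNClustering(eps=0.3, min_samples=8).fit(X).labels_

# Functional interface (assumed signature, see note above):
# labels = commonnn(X, eps=0.3, min_samples=8)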
