8000 Revert "FEA OPTICS: add extract_xi method (#12077)" · xhluca/scikit-learn@e89571f · GitHub
[go: up one dir, main page]

Skip to content

Commit e89571f

Browse files
author
Xing
committed
Revert "FEA OPTICS: add extract_xi method (scikit-learn#12077)"
This reverts commit 14724c7.
1 parent 7415190 commit e89571f

File tree

9 files changed

+90
-815
lines changed

9 files changed

+90
-815
lines changed

doc/modules/classes.rst

Lines changed: 0 additions & 1 deletion
8000
Original file line numberDiff line numberDiff line change
@@ -114,7 +114,6 @@ Functions
114114

115115
cluster.affinity_propagation
116116
cluster.cluster_optics_dbscan
117-
cluster.cluster_optics_xi
118117
cluster.compute_optics_graph
119118
cluster.dbscan
120119
cluster.estimate_bandwidth

doc/modules/clustering.rst

Lines changed: 0 additions & 97 deletions
Original file line numberDiff line numberDiff line change
@@ -91,12 +91,6 @@ Overview of clustering methods
9191
- Non-flat geometry, uneven cluster sizes
9292
- Distances between nearest points
9393

94-
* - :ref:`OPTICS <optics>`
95-
- minimum cluster membership
96-
- Very large ``n_samples``, large ``n_clusters``
97-
- Non-flat geometry, uneven cluster sizes, variable cluster density
98-
- Distances between points
99-
10094
* - :ref:`Gaussian mixtures <mixture>`
10195
- many
10296
- Not scalable
@@ -812,11 +806,6 @@ by black points below.
812806
be used (e.g., with sparse matrices). This matrix will consume n^2 floats.
813807
A couple of mechanisms for getting around this are:
814808

815-
- Use :ref:`OPTICS <optics>` clustering in conjunction with the
816-
`extract_dbscan` method. OPTICS clustering also calculates the full
817-
pairwise matrix, but only keeps one row in memory at a time (memory
818-
complexity n).
819-
820809
- A sparse radius neighborhood graph (where missing entries are presumed to
821810
be out of eps) can be precomputed in a memory-efficient way and dbscan
822811
can be run over this with ``metric='precomputed'``. See
@@ -839,92 +828,6 @@ by black points below.
839828
Schubert, E., Sander, J., Ester, M., Kriegel, H. P., & Xu, X. (2017).
840829
In ACM Transactions on Database Systems (TODS), 42(3), 19.
841830

842-
.. _optics:
843-
844-
OPTICS
845-
======
846-
847-
The :class:`OPTICS` algorithm shares many similarities with the :class:`DBSCAN`
848-
algorithm, and can be considered a generalization of DBSCAN that relaxes the
849-
``eps`` requirement from a single value to a value range. The key difference
850-
between DBSCAN and OPTICS is that the OPTICS algorithm builds a *reachability*
851-
graph, which assigns each sample both a ``reachability_`` distance, and a spot
852-
within the cluster ``ordering_`` attribute; these two attributes are assigned
853-
when the model is fitted, and are used to determine cluster membership. If
854-
OPTICS is run with the default value of *inf* set for ``max_eps``, then DBSCAN
855-
style cluster extraction can be performed repeatedly in linear time for any
856-
given ``eps`` value using the ``cluster_optics_dbscan`` method. Setting
857-
``max_eps`` to a lower value will result in shorter run times, and can be
858-
thought of as the maximum neighborhood radius from each point to find other
859-
potential reachable points.
860-
861-
.. |optics_results| image:: ../auto_examples/cluster/images/sphx_glr_plot_optics_001.png
862-
:target: ../auto_examples/cluster/plot_optics.html
863-
:scale: 50
864-
865-
.. centered:: |optics_results|
866-
867-
The *reachability* distances generated by OPTICS allow for variable density
868-
extraction of clusters within a single data set. As shown in the above plot,
869-
combining *reachability* distances and data set ``ordering_`` produces a
870-
*reachability plot*, where point density is represented on the Y-axis, and
871-
points are ordered such that nearby points are adjacent. 'Cutting' the
872-
reachability plot at a single value produces DBSCAN like results; all points
873-
above the 'cut' are classified as noise, and each time that there is a break
874-
when reading from left to right signifies a new cluster. The default cluster
875-
extraction with OPTICS looks at the steep slopes within the graph to find
876-
clusters, and the user can define what counts as a steep slope using the
877-
parameter ``xi``. There are also other possibilities for analysis on the graph
878-
itself, such as generating hierarchical representations of the data through
879-
reachability-plot dendrograms, and the hierarchy of clusters detected by the
880-
algorithm can be accessed through the ``cluster_hierarchy_`` parameter. The
881-
plot above has been color-coded so that cluster colors in planar space match
882-
the linear segment clusters of the reachability plot. Note that the blue and
883-
red clusters are adjacent in the reachability plot, and can be hierarchically
884-
represented as children of a larger parent cluster.
885-
886-
.. topic:: Examples:
887-
888-
* :ref:`sphx_glr_auto_examples_cluster_plot_optics.py`
889-
890-
891-
.. topic:: Comparison with DBSCAN
892-
893-
The results from OPTICS ``cluster_optics_dbscan`` method and DBSCAN are
894-
very similar, but not always identical; specifically, labeling of periphery
895-
and noise points. This is in part because the first samples of each dense
896-
area processed by OPTICS have a large reachability value while being close
897-
to other points in their area, and will thus sometimes be marked as noise
898-
rather than periphery. This affects adjacent points when they are
899-
considered as candidates for being marked as either periphery or noise.
900-
901-
Note that for any single value of ``eps``, DBSCAN will tend to have a
902-
shorter run time than OPTICS; however, for repeated runs at varying ``eps``
903-
values, a single run of OPTICS may require less cumulative runtime than
904-
DBSCAN. It is also important to note that OPTICS' output is close to
905-
DBSCAN's only if ``eps`` and ``max_eps`` are close.
906-
907-
.. topic:: Computational Complexity
908-
909-
Spatial indexing trees are used to avoid calculating the full distance
910-
matrix, and allow for efficient memory usage on large sets of samples.
911-
Different distance metrics can be supplied via the ``metric`` keyword.
912-
913-
For large datasets, similar (but not identical) results can be obtained via
914-
`HDBSCAN <https://hdbscan.readthedocs.io>`_. The HDBSCAN implementation is
915-
multithreaded, and has better algorithmic runtime complexity than OPTICS,
916-
at the cost of worse memory scaling. For extremely large datasets that
917-
exhaust system memory using HDBSCAN, OPTICS will maintain *n* (as opposed
918-
to *n^2*) memory scaling; however, tuning of the ``max_eps`` parameter
919-
will likely need to be used to give a solution in a reasonable amount of
920-
wall time.
921-
922-
.. topic:: References:
923-
924-
* "OPTICS: ordering points to identify the clustering structure."
925-
Ankerst, Mihael, Markus M. Breunig, Hans-Peter Kriegel, and Jörg Sander.
926-
In ACM Sigmod Record, vol. 28, no. 2, pp. 49-60. ACM, 1999.
927-
928831
.. _birch:
929832

930833
Birch

doc/whats_new/v0.21.rst

Lines changed: 1 addition & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -89,8 +89,7 @@ Support for Python 3.4 and below has been officially dropped.
8989
- |MajorFeature| A new clustering algorithm: :class:`cluster.OPTICS`: an
9090
algoritm related to :class:`cluster.DBSCAN`, that has hyperparameters easier
9191
to set and that scales better, by :user:`Shane <espg>`,
92-
`Adrin Jalali`_, :user:`Erich Schubert <kno10>`, `Hanmin Qin`_, and
93-
:user:`Assia Benbihi <assiaben>`.
92+
:user:`Adrin Jalali <adrinjalali>`, and :user:`Erich Schubert <kno10>`.
9493

9594
- |Fix| Fixed a bug where :class:`cluster.Birch` could occasionally raise an
9695
AttributeError. :pr:`13651` by `Joel Nothman`_.

examples/cluster/plot_cluster_comparison.py

Lines changed: 4 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -74,20 +74,14 @@
7474
'damping': .9,
7575
'preference': -200,
7676
'n_neighbors': 10,
77-
'n_clusters': 3,
78-
'min_samples': 20,
79-
'xi': 0.05,
80-
'min_cluster_size': 0.1}
77+
'n_clusters': 3}
8178

8279
datasets = [
8380
(noisy_circles, {'damping': .77, 'preference': -240,
84-
'quantile': .2, 'n_clusters': 2,
85-
'min_samples': 20, 'xi': 0.25}),
81+
'quantile': .2, 'n_clusters': 2}),
8682
(noisy_moons, {'damping': .75, 'preference': -220, 'n_clusters': 2}),
87-
(varied, {'eps': .18, 'n_neighbors': 2,
88-
'min_samples': 5, 'xi': 0.035, 'min_cluster_size': .2}),
89-
(aniso, {'eps': .15, 'n_neighbors': 2,
90-
'min_samples': 20, 'xi': 0.1, 'min_cluster_size': .2}),
83+
(varied, {'eps': .18, 'n_neighbors': 2}),
84+
(aniso, {'eps': .15, 'n_neighbors': 2}),
9185
(blobs, {}),
9286
(no_structure, {})]
9387

@@ -122,9 +116,6 @@
122116
n_clusters=params['n_clusters'], eigen_solver='arpack',
123117
affinity="nearest_neighbors")
124118
dbscan = cluster.DBSCAN(eps=params['eps'])
125-
optics = cluster.OPTICS(min_samples=params['min_samples'],
126-
xi=params['xi'],
127-
min_cluster_size=params['min_cluster_size'])
128119
affinity_propagation = cluster.AffinityPropagation(
129120
damping=params['damping'], preference=params['preference'])
130121
average_linkage = cluster.AgglomerativeClustering(
@@ -142,7 +133,6 @@
142133
('Ward', ward),
143134
('AgglomerativeClustering', average_linkage),
144135
('DBSCAN', dbscan),
145-
('OPTICS', optics),
146136
('Birch', birch),
147137
('GaussianMixture', gmm)
148138
)

examples/cluster/plot_optics.py

Lines changed: 0 additions & 98 deletions
This file was deleted.

sklearn/cluster/__init__.py

Lines changed: 1 addition & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -11,8 +11,7 @@
1111
FeatureAgglomeration)
1212
from .k_means_ import k_means, KMeans, MiniBatchKMeans
1313
from .dbscan_ import dbscan, DBSCAN
14-
from .optics_ import (OPTICS, cluster_optics_dbscan, compute_optics_graph,
15-
cluster_optics_xi)
14+
from .optics_ import OPTICS, cluster_optics_dbscan, compute_optics_graph
1615
from .bicluster import SpectralBiclustering, SpectralCoclustering
1716
from .birch import Birch
1817

@@ -22,7 +21,6 @@
2221
'DBSCAN',
2322
'OPTICS',
2423
'cluster_optics_dbscan',
25-
'cluster_optics_xi',
2624
'compute_optics_graph',
2725
'KMeans',
2826
'FeatureAgglomeration',

0 commit comments

Comments
 (0)
0