@@ -746,17 +746,18 @@ by black points below.
746746
747747.. topic:: Implementation
748748
749- The algorithm is non-deterministic, but the core samples will
750- always belong to the same clusters (although the labels may be
751- different). The non-determinism comes from deciding to which cluster a
752- non-core sample belongs. A non-core sample can have a distance lower
752- than ``eps`` to two core samples in different clusters. By the
749+ The DBSCAN algorithm is deterministic, always generating the same clusters
750+ when given the same data in the same order. However, the results can differ when
751+ data is provided in a different order. First, even though the core samples
752+ will always be assigned to the same clusters, the labels of those clusters
753+ will depend on the order in which those samples are encountered in the data.
754+ Second and more importantly, the clusters to which non-core samples are assigned
755+ can differ depending on the data order. This would happen when a non-core sample
756+ has a distance lower than ``eps`` to two core samples in different clusters. By the
754757 triangle inequality, those two core samples must be more distant than
755758 ``eps`` from each other, or they would be in the same cluster. The non-core
756- sample is assigned to whichever cluster is generated first, where
757- the order is determined randomly. Other than the ordering of
758- the dataset, the algorithm is deterministic, making the results relatively
759- stable between runs on the same data.
759+ sample is assigned to whichever cluster is generated first in a pass
760+ through the data, and so the results will depend on the data ordering.
760761
761762 The current implementation uses ball trees and kd-trees
762763 to determine the neighborhood of points,
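The order dependence described in the new wording can be illustrated with a small, self-contained sketch. This is a toy pure-Python DBSCAN over 1-D points, not scikit-learn's implementation; the function name `dbscan_1d`, the sample data, and the use of `<=` for the neighborhood test are all assumptions made for illustration. With `eps=10` and `min_samples=4`, the points 8 and 26 are the only core samples, and 17 is a non-core sample within `eps` of both.

```python
from collections import deque


def dbscan_1d(points, eps, min_samples):
    """Toy DBSCAN over 1-D points; returns one label per point (-1 = noise).

    Hypothetical helper for illustration only, not scikit-learn's code.
    """
    n = len(points)
    # Neighborhood of each point, including the point itself.
    neighbors = [
        [j for j in range(n) if abs(points[i] - points[j]) <= eps]
        for i in range(n)
    ]
    is_core = [len(nb) >= min_samples for nb in neighbors]
    labels = [-1] * n
    next_label = 0
    for i in range(n):
        # Only an unlabeled core sample can seed a new cluster.
        if labels[i] != -1 or not is_core[i]:
            continue
        labels[i] = next_label
        queue = deque([i])
        while queue:
            current = queue.popleft()
            for j in neighbors[current]:
                if labels[j] == -1:
                    # The first cluster to reach a sample keeps it.
                    labels[j] = next_label
                    if is_core[j]:
                        queue.append(j)
        next_label += 1
    return labels


data = [0, 4, 8, 17, 26, 30, 34]
forward = dict(zip(data, dbscan_1d(data, eps=10, min_samples=4)))

reordered = data[::-1]
backward = dict(zip(reordered, dbscan_1d(reordered, eps=10, min_samples=4)))

print(forward)   # point -> label, original order
print(backward)  # point -> label, reversed order
```

In this sketch the core samples 8 and 26 always end up in different clusters, but their label numbers swap between the two orderings, and the non-core sample 17 joins whichever of the two clusters is generated first. Running the function twice on the same ordering returns identical labels.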