min_samples in HDBSCAN

Describe the issue linked to the documentation

I find the description of the min_samples argument in sklearn.cluster.HDBSCAN confusing.

It says "The number of samples in a neighborhood for a point to be considered as a core point. This includes the point itself."

But if I understand everything correctly, min_samples corresponds to the $k$ used to compute the core distance $\text{core}_k\left(x\right)$ for every sample $x$, where the $k$-th core distance of a sample $x$ is defined as the distance to the $k$-th nearest neighbor of $x$ (counting $x$ itself). This is exactly what happens in the code here: https://github.com/scikit-learn-contrib/hdbscan/blob/fc94241a4ecf5d3668cbe33b36ef03e6160d7ab7/hdbscan/_hdbscan_reachability.pyx#L45-L47, where it is called min_points.
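To make the definition I have in mind concrete, here is a minimal sketch in plain Python. It is not the scikit-learn or hdbscan implementation; the function name `core_distances` and the toy data are made up for illustration, and it assumes Euclidean distance and that each point counts as its own first neighbor.

```python
import math


def core_distances(points, k):
    """Distance from each point to its k-th nearest neighbor.

    Assumption for this sketch: the point itself counts as the 1st neighbor,
    so k=1 always yields a core distance of 0.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    result = []
    for p in points:
        # Sorted distances from p to every point, including p itself (distance 0).
        dists = sorted(math.dist(p, q) for q in points)
        result.append(dists[k - 1])
    return result


# Four toy samples on a line (hypothetical data).
points = [(0.0,), (1.0,), (3.0,), (7.0,)]
core_k1 = core_distances(points, 1)
core_k2 = core_distances(points, 2)
core_k3 = core_distances(points, 3)
```

Under this reading, min_samples selects which neighbor's distance is used rather than acting as a count threshold inside a fixed-radius neighborhood, which is why the current docstring wording confuses me.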

I don't understand how these two descriptions are equivalent. I would assume that other people might find this confusing as well.

Link in Code:

scikit-learn/sklearn/cluster/_hdbscan/hdbscan.py

Lines 441 to 444 in 8721245

    
    min_samples : int, default=None
        The number of samples in a neighborhood for a point
        to be considered as a core point. This includes the point itself.
        When `None`, defaults to `min_cluster_size`.

Link in Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN

Suggest a potential alternative/fix

No response
