`min_samples` in HDBSCAN #28976
Comments
cc @Micky774
Indeed, we used the docstring of the original implementation, which reused the DBSCAN information. However, the parameter here has a different meaning: it defines the core distance. So we should make sure to change the different docstrings in the file.
@glemaitre I opened a draft PR for this. I have a proposal for the description of scikit-learn/sklearn/cluster/_hdbscan/hdbscan.py (lines 83 to 99 in dc1cad2) or here: scikit-learn/sklearn/cluster/_hdbscan/hdbscan.py (lines 599 to 605 in dc1cad2).
Describe the issue linked to the documentation
I find the description of the `min_samples` argument in `sklearn.cluster.HDBSCAN` confusing. It says: "The number of samples in a neighborhood for a point to be considered as a core point. This includes the point itself."
But if I understand everything correctly, `min_samples` corresponds to the $k$ used to compute the core distance $\text{core}_k\left(x\right)$ for every sample $x$, where the $k$-th core distance of a sample $x$ is defined as the distance to the $k$-th nearest neighbor of $x$ (counting itself). This is exactly what happens in the code here: https://github.com/scikit-learn-contrib/hdbscan/blob/fc94241a4ecf5d3668cbe33b36ef03e6160d7ab7/hdbscan/_hdbscan_reachability.pyx#L45-L47, where it is called `min_points`.
I don't understand how both of these descriptions are equivalent. I would assume that other people might find that confusing as well.
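To make the two readings concrete, here is a minimal pure-Python sketch, not the scikit-learn or hdbscan implementation. The helper names `core_distance` and `is_dbscan_core_point` and the toy `points` are hypothetical, and the sketch assumes Euclidean distance and that a point counts as its own nearest neighbor, as described above. It suggests how the descriptions relate: for a fixed radius `eps`, a point passes the DBSCAN-style "core point" test exactly when its $k$-th core distance is at most `eps`, but HDBSCAN itself takes no `eps`, which may be why the reused docstring reads oddly.

```python
import math


def core_distance(points, i, k):
    """Distance from points[i] to its k-th nearest neighbor, counting itself."""
    dists = sorted(math.dist(points[i], q) for q in points)
    # dists[0] is 0.0 (the point itself), so the k-th neighbor is at index k - 1.
    return dists[k - 1]


def is_dbscan_core_point(points, i, eps, min_samples):
    """DBSCAN-style test: at least min_samples points (itself included) within eps."""
    neighbors = sum(1 for q in points if math.dist(points[i], q) <= eps)
    return neighbors >= min_samples


# Toy data, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (5.0, 5.0)]
k = 3  # plays the role of min_samples

core_dists = [core_distance(points, i, k) for i in range(len(points))]

# For any radius eps, the two views agree point by point.
for eps in (0.5, 2.0, 2.5, 7.0):
    for i in range(len(points)):
        assert is_dbscan_core_point(points, i, eps, k) == (core_dists[i] <= eps)
```

Under these assumptions, the wording "number of samples in a neighborhood" only has a direct meaning once a radius is fixed, whereas the core-distance wording describes what the parameter does without one.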
Link in Code: scikit-learn/sklearn/cluster/_hdbscan/hdbscan.py (lines 441 to 444 in 8721245)
Link in Documentation: https://scikit-learn.org/stable/modules/generated/sklearn.cluster.DBSCAN.html#sklearn.cluster.DBSCAN
Suggest a potential alternative/fix
No response