HDBSCAN with cython development version yields TypeError: 'float' object cannot be interpreted as an integer · Issue #26535 · scikit-learn/scikit-learn · GitHub

HDBSCAN with cython development version yields TypeError: 'float' object cannot be interpreted as an integer #26535

Closed
lesteve opened this issue Jun 7, 2023 · 0 comments · Fixed by #26546
lesteve commented Jun 7, 2023

Plenty of these errors appear in scipy-dev, seen recently in #26154, e.g. this build

I can reproduce this locally with the Cython development version installed through pip install git+https://github.com/cython/cython

pytest -x sklearn/cluster/tests/test_hdbscan.py
____________________________________________ test_outlier_data[infinite] ____________________________________________

outlier_type = 'infinite'

    @pytest.mark.parametrize("outlier_type", _OUTLIER_ENCODING)
    def test_outlier_data(outlier_type):
        """
        Tests if np.inf and np.nan data are each treated as special outliers.
        """
        outlier = {
            "infinite": np.inf,
            "missing": np.nan,
        }[outlier_type]
        prob_check = {
            "infinite": lambda x, y: x == y,
            "missing": lambda x, y: np.isnan(x),
        }[outlier_type]
        label = _OUTLIER_ENCODING[outlier_type]["label"]
        prob = _OUTLIER_ENCODING[outlier_type]["prob"]
    
        X_outlier = X.copy()
        X_outlier[0] = [outlier, 1]
        X_outlier[5] = [outlier, outlier]
>       model = HDBSCAN().fit(X_outlier)

sklearn/cluster/tests/test_hdbscan.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/cluster/_hdbscan/hdbscan.py:822: in fit
    self.labels_, self.probabilities_ = tree_to_labels(
sklearn/cluster/_hdbscan/_tree.pyx:56: in sklearn.cluster._hdbscan._tree.tree_to_labels
    cpdef tuple tree_to_labels(
sklearn/cluster/_hdbscan/_tree.pyx:70: in sklearn.cluster._hdbscan._tree.tree_to_labels
    labels, probabilities = _get_clusters(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   is_cluster = {cluster: True for cluster in node_list}
E   TypeError: 'float' object cannot be interpreted as an integer

sklearn/cluster/_hdbscan/_tree.pyx:713: TypeError

Debugging a bit with print statements shows that cluster is a float in this line:

is_cluster = {cluster: True for cluster in node_list}

but is defined to be an int above:

https://github.com/scikit-learn/scikit-learn/blob/825e892e8c146f69df4a9113025404c9d83757a3/sklearn/cluster/_hdbscan/_tree.pyx#LL699C8-L699C54

Trying with the latest Cython release, I do get that cluster is a float as well, and I am not sure this is expected: from a quick look at the code, it is some kind of id.
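
For illustration, here is a minimal plain-Python sketch (not the actual Cython code) of how this error message can arise. It assumes that the development version of Cython applies a strict integer conversion to the typed loop variable; the node_list values and the as_strict_int helper are hypothetical stand-ins, not taken from _tree.pyx.

```python
import operator

# Hypothetical node ids stored as floats, standing in for node_list.
node_list = [12.0, 11.0, 10.0]


def as_strict_int(value):
    """Strict integer conversion: floats are rejected rather than truncated."""
    return operator.index(value)


# Strict conversion fails on the first float, with the same message as above.
error_message = None
try:
    is_cluster = {as_strict_int(cluster): True for cluster in node_list}
except TypeError as exc:
    error_message = str(exc)

print(error_message)

# One possible workaround (an assumption, not necessarily the fix that was
# merged): convert each id to int explicitly before using it as a key.
is_cluster = {int(cluster): True for cluster in node_list}
print(is_cluster)
```

Whether an explicit conversion is the right fix depends on whether the ids are meant to be integers in the first place, which is the open question here.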

cc @Micky774.
