HDBSCAN with cython development version yields TypeError: 'float' object cannot be interpreted as an integer #26535

lesteve · 2023-06-07T17:22:28Z

Plenty of these errors in scipy-dev, seen recently in #26154, e.g. this build

I can reproduce locally with cython development version installed through pip install git+https://github.com/cython/cython

pytest -x sklearn/cluster/tests/test_hdbscan.py

____________________________________________ test_outlier_data[infinite] ____________________________________________

outlier_type = 'infinite'

    @pytest.mark.parametrize("outlier_type", _OUTLIER_ENCODING)
    def test_outlier_data(outlier_type):
        """
        Tests if np.inf and np.nan data are each treated as special outliers.
        """
        outlier = {
            "infinite": np.inf,
            "missing": np.nan,
        }[outlier_type]
        prob_check = {
            "infinite": lambda x, y: x == y,
            "missing": lambda x, y: np.isnan(x),
        }[outlier_type]
        label = _OUTLIER_ENCODING[outlier_type]["label"]
        prob = _OUTLIER_ENCODING[outlier_type]["prob"]
    
        X_outlier = X.copy()
        X_outlier[0] = [outlier, 1]
        X_outlier[5] = [outlier, outlier]
>       model = HDBSCAN().fit(X_outlier)

sklearn/cluster/tests/test_hdbscan.py:59: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/cluster/_hdbscan/hdbscan.py:822: in fit
    self.labels_, self.probabilities_ = tree_to_labels(
sklearn/cluster/_hdbscan/_tree.pyx:56: in sklearn.cluster._hdbscan._tree.tree_to_labels
    cpdef tuple tree_to_labels(
sklearn/cluster/_hdbscan/_tree.pyx:70: in sklearn.cluster._hdbscan._tree.tree_to_labels
    labels, probabilities = _get_clusters(
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

>   is_cluster = {cluster: True for cluster in node_list}
E   TypeError: 'float' object cannot be interpreted as an integer

sklearn/cluster/_hdbscan/_tree.pyx:713: TypeError

After some debugging with print statements, cluster turns out to be a float on this line:

scikit-learn/sklearn/cluster/_hdbscan/_tree.pyx

Line 713 in 825e892

is_cluster = {cluster: True for cluster in node_list}

but it is declared as an int a few lines above:

https://github.com/scikit-learn/scikit-learn/blob/825e892e8c146f69df4a9113025404c9d83757a3/sklearn/cluster/_hdbscan/_tree.pyx#LL699C8-L699C54

With the latest cython release, cluster is a float at this point as well. I am not sure this is expected, since from a quick look at the code it is some kind of id.
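
For illustration, the error message itself can be reproduced in plain Python. This is only a minimal sketch, not the code in _tree.pyx: it assumes the failure comes from a float value being coerced into an integer-typed variable, and it uses operator.index as a stand-in for that strict coercion. The names node_list_as_floats and to_integer_id are hypothetical.

```python
import operator

# Hypothetical stand-in for node ids that were stored as floats upstream.
node_list_as_floats = [3.0, 5.0, 8.0]


def to_integer_id(value):
    """Strict integer coercion: only objects implementing __index__ pass.

    Python floats do not implement __index__, so this raises TypeError
    even when the float holds a whole number such as 3.0.
    """
    return operator.index(value)


# Implicit-style coercion fails on the first float id.
try:
    to_integer_id(node_list_as_floats[0])
    message = None
except TypeError as exc:
    message = str(exc)

print(message)

# An explicit cast before building the dict avoids the error.
node_list = [int(value) for value in node_list_as_floats]
is_cluster = {cluster: True for cluster in node_list}
print(is_cluster)
```

If the assumption holds, casting the ids explicitly before they reach the integer-typed loop variable, or keeping them in an integer container from the start, would sidestep the TypeError.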

cc @Micky774.


lesteve added the Bug label Jun 7, 2023

Micky774 mentioned this issue Jun 8, 2023

FIX Updated hdbscan stability calculation to avoid implicit casting #26546

Merged

lesteve mentioned this issue Jun 9, 2023

FIX Updated loop structure in hdbscan _tree.pyx to avoid error #26547

Merged

lesteve closed this as completed in #26546 Jun 9, 2023
