Description
Describe the bug
I also ran into the problem that the program gets killed when running DBSCAN, similar to #22531.
The documentation update already helps, and I think it's OK for the algorithm to fail. But currently there is no way for me to recover, and a more informative error message would be useful. Right now the process just reports Killed, and it takes some searching to find out what failed:
>>> DBSCAN(eps=1, min_samples=2).fit(np.random.rand(10_000_000, 3))
Killed
For example, something like what numpy does:
>>> n = int(1e6)
>>> np.random.rand(n, n)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "numpy/random/mtrand.pyx", line 1219, in numpy.random.mtrand.RandomState.rand
  File "numpy/random/mtrand.pyx", line 437, in numpy.random.mtrand.RandomState.random_sample
  File "_common.pyx", line 307, in numpy.random._common.double_fill
numpy.core._exceptions._ArrayMemoryError: Unable to allocate 7.28 TiB for an array with shape (1000000, 1000000) and data type float64
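As a possible stopgap on Linux, capping the process's address space makes oversized allocations fail inside the process instead of triggering the kernel's OOM killer. The sketch below uses the standard-library `resource` module; the helper name `address_space_limit` is made up for illustration. Whether DBSCAN's compiled neighbor search turns a failed allocation into a clean `MemoryError`, rather than aborting, has not been verified here and is an assumption.

```python
import resource
from contextlib import contextmanager


@contextmanager
def address_space_limit(max_bytes):
    """Temporarily lower the soft RLIMIT_AS of the current process (POSIX only).

    Hypothetical helper for illustration. While the limit is active,
    allocations that would push the address space past `max_bytes` fail,
    which pure-Python and numpy allocations report as MemoryError.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    new_soft = max_bytes
    # Never exceed the hard limit, and never raise an existing soft limit.
    if hard != resource.RLIM_INFINITY:
        new_soft = min(new_soft, hard)
    if soft != resource.RLIM_INFINITY:
        new_soft = min(new_soft, soft)
    resource.setrlimit(resource.RLIMIT_AS, (new_soft, hard))
    try:
        yield
    finally:
        # Restore the original soft limit; the hard limit was never changed.
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))
```

Wrapping the fit in `with address_space_limit(...)` inside a `try`/`except MemoryError` would then give the caller a chance to recover, provided the failing allocation is one that raises. The limit counts virtual address space, not resident memory, so it needs some headroom above the expected working set.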
Additionally, I noticed that memory accumulates across consecutive calls to DBSCAN, which can get the program killed even though there is enough memory for a single fit.
I was able to resolve this by explicitly calling import gc; gc.collect() after each run. Maybe this could be invoked at the end of each DBSCAN fit?
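The workaround described above can be sketched as a small wrapper. The name `fit_then_collect` is hypothetical and not part of scikit-learn; it only assumes the estimator has a `fit(X)` method that returns the fitted estimator.

```python
import gc


def fit_then_collect(estimator, X):
    """Fit `estimator` on `X`, then force a full garbage-collection pass.

    Hypothetical helper for illustration. gc.collect() runs even if fit()
    raises, and frees objects that are only kept alive by reference cycles.
    """
    try:
        return estimator.fit(X)
    finally:
        gc.collect()
```

In a loop over many datasets this would replace the direct `DBSCAN(...).fit(X)` call. It only helps if the accumulated memory is held by collectable garbage, as the observation above suggests.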
Steps/Code to Reproduce
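import numpy as np
from sklearn.cluster import DBSCAN

try:
    DBSCAN(eps=1, min_samples=2).fit(np.random.rand(10_000_000, 3))
except Exception:  # not reached: the process is killed instead
    print("Caught exception")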
Expected Results
Caught exception
Actual Results
Killed
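Until the fit raises something catchable, one way to keep the calling program alive is to run the fit in a child interpreter and inspect how it exited. This is a minimal sketch using only the standard library; `run_isolated` and `ChildKilled` are hypothetical names introduced here. It assumes a POSIX system, where `subprocess` reports a child terminated by signal N as return code -N.

```python
import subprocess
import sys


class ChildKilled(RuntimeError):
    """Raised when the child interpreter is terminated by a signal."""


def run_isolated(code, timeout=None):
    """Run `code` in a fresh Python interpreter and return its stdout.

    Hypothetical helper for illustration. If the child dies from a signal
    (for example SIGKILL from the kernel OOM killer), ChildKilled is raised
    in the parent, which keeps running.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code -N means "terminated by signal N".
        raise ChildKilled(f"child terminated by signal {-proc.returncode}")
    if proc.returncode != 0:
        # Ordinary failure: surface the child's traceback text.
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout
```

Passing the reproduction snippet from this report as `code` and catching `ChildKilled` should print "Caught exception" in the parent instead of ending the whole program. That relies on the OOM killer picking the child, which is likely when the child is the process holding the memory but is not guaranteed. Results would have to be passed back through stdout or a file.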
Versions
>>> import sklearn; sklearn.show_versions()
System:
    python: 3.10.12 (main, Feb 4 2025, 14:57:36) [GCC 11.4.0]
    executable: /usr/bin/python3
    machine: Linux-6.14.6-arch1-1-x86_64-with-glibc2.35

Python dependencies:
    sklearn: 1.6.1
    pip: None
    setuptools: 80.7.1
    numpy: 1.26.4
    scipy: 1.15.3
    Cython: None
    pandas: 2.2.3
    matplotlib: 3.10.3
    joblib: 1.5.0
    threadpoolctl: 3.6.0

Built with OpenMP: True

threadpoolctl info:
    user_api: blas
    internal_api: openblas
    num_threads: 20
    prefix: libopenblas
    filepath: /usr/local/lib/python3.10/dist-packages/numpy.libs/libopenblas64_p-r0-0cf96a72.3.23.dev.so
    version: 0.3.23.dev
    threading_layer: pthreads
    architecture: Prescott

    user_api: blas
    internal_api: openblas
    num_threads: 20
    prefix: libscipy_openblas
    filepath: /usr/local/lib/python3.10/dist-packages/scipy.libs/libscipy_openblas-68440149.so
    version: 0.3.28
    threading_layer: pthreads
    architecture: Haswell

    user_api: openmp
    internal_api: openmp
    num_threads: 20
    prefix: libgomp
    filepath: /usr/local/lib/python3.10/dist-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
    version: None