Segmentation fault when calculating euclidean_distances for large numbers of rows #4197
You can cut that example down to:

```python
import numpy
from sklearn.metrics import euclidean_distances

numpy.random.seed(1)
X = numpy.random.random((50000, 100))
euclidean_distances(X)
```

I'm hence changing the issue title.
Thanks, makes sense to me; the above segfaults for me also.
The crash occurs in
I'm closing this as belonging to numpy (until further notice?).
That is ... surprising... to say the least.
I've submitted it with some more details to numpy for now; thanks for the investigation work.
@dschallis Has the problem been solved by numpy? I originally hit the same issue when calculating silhouette_score or silhouette_samples with large numbers of rows. See #4701.
@Renzhh I don't think so, it's currently still an open issue with numpy: numpy/numpy#5533
@dschallis Is the issue “Segmentation fault when calculating euclidean_distances for large numbers of rows” really caused by numpy? I ran into this problem too; see my issue #4701. When the sparse matrix has fewer than 30,000 rows, both silhouette_score and silhouette_samples work and return the expected results. But when X has more than 100,000 rows, the program crashes with "Segmentation fault (core dumped)". I'm debugging... In the meantime, how do you calculate silhouette_score with this many rows?
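As a side note on the silhouette use case: `silhouette_score` accepts a `sample_size` parameter that scores a random subsample instead of all rows, which avoids materialising the full n × n distance matrix. A minimal sketch under that assumption (the data and labels below are made up for illustration):

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.RandomState(0)
X = rng.random_sample((5000, 10))      # stand-in for a large dataset
labels = rng.randint(0, 3, size=5000)  # stand-in cluster assignments

# Score a random subsample of 1000 rows: only a 1000 x 1000 distance
# matrix is computed instead of 5000 x 5000.
score = silhouette_score(X, labels, sample_size=1000, random_state=0)
```

Fixing `random_state` makes the subsampled score reproducible across runs.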
For a sparse matrix we may need to assume it is a different issue. Still, check whether it's a problem with
@jnothman After checking, I'm sure the problem is caused by euclidean_distances. My scipy version is 0.13.3 and numpy is 1.8.2. Now I'll try the recent stable versions to check whether the segmentation fault still occurs.
@jnothman With recent versions, scipy 0.15.1 and numpy 1.9.2, the segmentation fault still occurs. But judging from scipy.test(), my installed scipy package seems to have some minor problems of its own.
With scikit-learn 0.15.2, numpy 1.9.1, python 2.7.8 (on OS X), the following code segfaults:
Results in:
Dropping the row count to 30,000, the above completes fine. With 40,000 rows, the script takes a very long time but didn't appear to segfault.
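The thresholds reported above line up with the sheer size of the output matrix. This is my own back-of-the-envelope arithmetic, not part of the report (the segfault itself was ultimately traced to numpy, see numpy/numpy#5533):

```python
# euclidean_distances(X) returns an n x n float64 matrix; its size alone
# explains why 30,000 rows is manageable, 40,000 is painfully slow, and
# 50,000 is well beyond typical 2015-era RAM.
for n in (30000, 40000, 50000):
    gib = n * n * 8 / 2.0 ** 30  # 8 bytes per float64 entry
    print("%d rows -> %.1f GiB" % (n, gib))
```

This prints roughly 6.7 GiB for 30,000 rows, 11.9 GiB for 40,000, and 18.6 GiB for 50,000.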