BUG: Segmentation fault in temporary eliding (backtrace() fails, probably related to pthread locking) #13042
Hmmm, so if you look at the bug report, the actual problem already occurs during the power operation (although it might be hard to trigger it there). It seems to have to do with temporary elision. However, that code has not changed in a very long time, and I do not remember it causing crashes. It might be worth a shot to uninstall MKL and reinstall NumPy to see whether there is some interaction (I would be surprised, but...). This Stack Overflow question, https://stackoverflow.com/questions/16196897/segmentation-fault-when-calling-backtrace-on-linux-x86, sounds likely related. The last comment there suggests that it may additionally have to do with the glibc version behaving badly while a pthread lock is held (only when a lock exists), so it is possible that the crash can only be reproduced with certain glibc versions such as 2.3. @UKeyboard, more importantly, can you help us narrow it down and simplify it a bit?
Mostly, a minimal test would be very useful, preferably one that does not require a package like Keras, where (at least for me) it is hard to judge how it influences everything. As a quick fix, replace your code with:
but I expect it will simply crash later, since temporary elision is quite common (which is why it is useful…).
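To illustrate what "temporary elision" refers to here, a minimal sketch (this is not necessarily the quick fix suggested above, whose code isn't preserved). As I understand it, in `(source - target) ** 2` the intermediate `source - target` is an unnamed temporary, so on builds where NumPy enables elision, and only for sufficiently large arrays, NumPy may square that buffer in place; deciding whether that is safe is the code path that calls `backtrace()` on glibc. Calling the ufuncs directly with `out=` does the in-place work explicitly, so there is no temporary for NumPy to elide. The function names are hypothetical.

```python
import numpy as np


def squared_dist_elidable(source, target):
    # `source - target` is an unnamed temporary; for large arrays NumPy may
    # reuse its buffer for the `** 2` (temporary elision).
    return np.sum((source - target) ** 2, axis=-1)


def squared_dist_explicit(source, target):
    # Hypothetical workaround: create the difference via a direct ufunc call
    # and square it in place with `out=`, so no elision decision is needed.
    diff = np.subtract(source, target)
    np.square(diff, out=diff)
    return np.sum(diff, axis=-1)


if __name__ == "__main__":
    rng = np.random.RandomState(0)
    # 1000 x 64 float64 = 512 KiB, assumed to be above NumPy's elision size threshold
    source = rng.standard_normal((1000, 64))
    target = rng.standard_normal((1000, 64))
    a = squared_dist_elidable(source, target)
    b = squared_dist_explicit(source, target)
    print(np.allclose(a, b))  # prints True: both compute the same values
```

Both functions return the same result; the explicit version only changes how the intermediate buffer is managed, which is why a crash inside elision could plausibly move elsewhere rather than disappear.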
@juliantaylor, just a quick ping in case you know why `backtrace()` (in the temporary elision code) might cause a segfault, probably in conjunction with some pthread locking.
@seberg Sorry for the late reply. Before noticing your suggestion, I tried the following code, `power = (source - target)**2` followed by `d = numpy.sum(power, axis=-1)`, and it did crash (as you said) at:
As you can see, I ran into this very problem in my PyTorch project. I don't need Keras. The problem occurs only in my CPU code that uses NumPy; after I rewrote all of that code with PyTorch functions, it works now. I also tried the NumPy ndarray power in the interactive Python interpreter, and it doesn't cause any problem. Maybe it has something to do with glibc, but I cannot test that because this is a shared machine and I am not an administrator. Thanks
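Reading the glibc version does not require administrator rights. A small sketch of a hypothetical helper, `collect_env`, that gathers the details discussed in this thread. `os.confstr("CS_GNU_LIBC_VERSION")` queries the running glibc (for example `"glibc 2.23"`); where that is unavailable it falls back to `platform.libc_ver()`, which may return empty strings on non-glibc systems. `np.show_config()` prints NumPy's build configuration, which should indicate whether MKL is in use.

```python
import os
import platform
import sys

import numpy as np


def collect_env():
    """Gather version details useful for a NumPy crash report (hypothetical helper)."""
    libc = None
    try:
        # Runtime glibc version, e.g. "glibc 2.23"; not available everywhere.
        libc = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        pass
    if not libc:
        # Fallback: Python's best guess, possibly ("", "") on non-glibc systems.
        libc = " ".join(platform.libc_ver()).strip() or "unknown"
    return {
        "python": sys.version.split()[0],
        "numpy": np.__version__,
        "platform": platform.platform(),
        "libc": libc,
    }


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
    # NumPy build info (BLAS/LAPACK backend such as MKL or OpenBLAS)
    np.show_config()
```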
Closing this: it's old, and without more details I doubt there is anything to do, since we don't even know whether there is an issue.
In my PyTorch project, my test code failed with `Segmentation fault (core dumped)`. The code worked one week ago; then I updated my conda env and the problem appeared. Unfortunately, I didn't back up the previous conda env, and the error persists.
`pdb` shows that `numpy.sum` in the following code causes the problem:
The failure also generates the following syslog entry:
[2523411.260096] python[1501]: segfault at 38 ip 00007fe1c927c73c sp 00007fffce6262b0 error 4 in ld-2.23.so[7fe1c9270000+26000]
`gdb` gives the following info:
That's all I can do, and the problem is still there.
Here is my workspace env:
I can reproduce the error on Ubuntu 16.04 with kernels 4.4.0-141-generic, 4.15.0-43-generic, and 4.15.0-45-generic. Any help?
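Since a minimal test without Keras or PyTorch was requested, here is a sketch of a standalone script one could try. It only repeats the `power = (source - target)**2` / `numpy.sum(power, axis=-1)` pattern from this thread with arrays large enough to be candidates for temporary elision; the shapes and iteration count are arbitrary assumptions, and it is not known to reproduce the crash. `faulthandler.enable()` makes Python dump the Python-level traceback to stderr if the process receives SIGSEGV, which complements the `gdb` output.

```python
import faulthandler

import numpy as np


def run(iterations=100, shape=(2000, 128)):
    """Repeat the expression from the report; shape and iterations are arbitrary."""
    rng = np.random.RandomState(0)
    source = rng.standard_normal(shape)
    target = rng.standard_normal(shape)
    d = None
    for _ in range(iterations):
        # Same pattern as the report: the unnamed temporary `source - target`
        # is a candidate for temporary elision on the `** 2`.
        power = (source - target) ** 2
        d = np.sum(power, axis=-1)
    return d


if __name__ == "__main__":
    # Print a Python traceback to stderr if the interpreter segfaults.
    faulthandler.enable()
    result = run()
    print("completed without crashing; result shape:", result.shape)
```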