In my PyTorch project, my test code fails with `Segmentation fault (core dumped)`. The code worked one week ago; the problem appeared after I updated my conda environment. Unfortunately, I didn't back up the previous conda environment, and now the error keeps annoying me.
`pdb` shows that the `numpy.sum` line in the following code causes the problem:
```python
import numpy

source = numpy.expand_dims(source, axis=-2)  # [M,1,2]
target = numpy.expand_dims(target, axis=-3)  # [1,K,2]
d = numpy.sum((source - target)**2, axis=-1)  # [M,K] after broadcasting
```
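For anyone trying to reproduce this, here is a self-contained version of the snippet above. The function name `pairwise_sq_dist`, the random inputs and the 200 x 200 size are my own choices, not taken from the project. The size is picked, assuming NumPy's 256 KiB threshold for temporary elision (`NPY_MIN_ELIDE_BYTES` in the NumPy source), so that the `[M,K,2]` temporary is large enough for NumPy to consider reusing it, which is the code path the gdb backtrace points to.

```python
import numpy


def pairwise_sq_dist(source, target):
    """Squared Euclidean distance between each row of source [M,2] and target [K,2]."""
    source = numpy.expand_dims(source, axis=-2)  # [M,1,2]
    target = numpy.expand_dims(target, axis=-3)  # [1,K,2]
    # (source - target) is an unnamed [M,K,2] temporary, so `** 2` is a
    # candidate for NumPy's temporary elision when the array is large enough.
    return numpy.sum((source - target) ** 2, axis=-1)  # [M,K]


if __name__ == "__main__":
    rng = numpy.random.RandomState(0)
    # Hypothetical sizes: 200 x 200 x 2 float64 values = 640,000 bytes,
    # above the assumed 256 KiB elision threshold.
    src = rng.rand(200, 2)
    tgt = rng.rand(200, 2)
    print(pairwise_sq_dist(src, tgt).shape)  # (200, 200)
```

On a working installation this should print `(200, 200)`; with small inputs the elision check is never reached, so a crash that only shows up for large arrays would point in the same direction.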
The failure also produces the following syslog entry:

```
[2523411.260096] python[1501]: segfault at 38 ip 00007fe1c927c73c sp 00007fffce6262b0 error 4 in ld-2.23.so[7fe1c9270000+26000]
```
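To make the kernel log entry easier to read, here is a small sketch that decodes it. It assumes the usual Linux x86 format `segfault at <addr> ip <ip> sp <sp> error <code> in <object>[<base>+<size>]`, where all numbers are hexadecimal and `error` is the x86 page-fault error code (bit 0: protection violation vs. page not present, bit 1: write vs. read, bit 2: user vs. kernel mode). `decode_segfault` is a hypothetical helper, not an existing tool.

```python
import re

# Kernel log entry copied from the report.
LINE = ("[2523411.260096] python[1501]: segfault at 38 ip 00007fe1c927c73c "
        "sp 00007fffce6262b0 error 4 in ld-2.23.so[7fe1c9270000+26000]")

# Assumed x86 Linux format; every number in it is hexadecimal.
PATTERN = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) "
    r"sp (?P<sp>[0-9a-f]+) error (?P<err>[0-9a-f]+) "
    r"in (?P<obj>[^\[\s]+)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Hypothetical helper: split a kernel segfault line into readable fields."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a kernel segfault line")
    err = int(m.group("err"), 16)
    ip = int(m.group("ip"), 16)
    base = int(m.group("base"), 16)
    return {
        "fault_addr": int(m.group("addr"), 16),
        "object": m.group("obj"),
        "offset_in_object": ip - base,  # faulting instruction, relative to the object's load address
        "protection_fault": bool(err & 1),  # False: the page was not present
        "write": bool(err & 2),  # False: the access was a read
        "user_mode": bool(err & 4),  # True: the fault happened in user space
    }


if __name__ == "__main__":
    print(decode_segfault(LINE))
```

For this entry it reports a user-mode read of address `0x38` (which looks like a NULL pointer plus a small field offset) at offset `0xc73c` inside `ld-2.23.so`, the glibc dynamic loader.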
`gdb` gives the following info:
```
gdb python
GNU gdb (Ubuntu 7.11.1-0ubuntu1~16.5) 7.11.1
Copyright (C) 2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law. Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from python...done.
(gdb) run "debug/test_network.py"
Starting program: /home/me/.conda/envs/pytorch/bin/python "debug/test_network.py"
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff7de373c in ?? () from /lib64/ld-linux-x86-64.so.2
(gdb) bt
#0 0x00007ffff7de373c in ?? () from /lib64/ld-linux-x86-64.so.2
#1 0x00007ffff7dec851 in ?? () from /lib64/ld-linux-x86-64.so.2
#2 0x00007ffff7de7564 in ?? () from /lib64/ld-linux-x86-64.so.2
#3 0x00007ffff7debda9 in ?? () from /lib64/ld-linux-x86-64.so.2
#4 0x00007ffff79335ad in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x00007ffff7de7564 in ?? () from /lib64/ld-linux-x86-64.so.2
#6 0x00007ffff7933664 in __libc_dlopen_mode () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x00007ffff7905a85 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#8 0x00007ffff7bc8a99 in __pthread_once_slow () from /lib/x86_64-linux-gnu/libpthread.so.0
#9 0x00007ffff7905ba4 in backtrace () from /lib/x86_64-linux-gnu/libc.so.6
#10 0x00007ffff5d2c89f in check_callers.part ()
from /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#11 0x00007ffff5d2ceed in can_elide_temp_unary ()
from /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#12 0x00007ffff5d1bb67 in fast_scalar_power ()
from /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#13 0x00007ffff5d1bff8 in array_power ()
from /home/me/.conda/envs/pytorch/lib/python3.6/site-packages/numpy/core/multiarray.cpython-36m-x86_64-linux-gnu.so
#14 0x000055555566239d in PyNumber_Power ()
#15 0x00005555557160aa in _PyEval_EvalFrameDefault ()
#16 0x00005555556e9eab in fast_function ()
#17 0x00005555556f01e5 in call_function ()
#18 0x000055555571441a in _PyEval_EvalFrameDefault ()
#19 0x00005555556e9366 in _PyEval_EvalCodeWithName ()
#20 0x00005555556ea5bb in _PyFunction_FastCallDict ()
...
...
```
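Reading frames #13 down to #6, the fault happens in `array_power` → `fast_scalar_power` → `can_elide_temp_unary` → `check_callers` → glibc `backtrace()` → `__libc_dlopen_mode`, i.e. while NumPy inspects the call stack to decide whether it may reuse the temporary produced by `source - target`, before `numpy.sum` even runs. (The `backtrace(3)` man page notes that libgcc is loaded dynamically on first use, which would explain the `dlopen` frames.) If that reading is right, an equivalent computation that never applies `** 2` to an unnamed temporary should sidestep the crash. This is an untested sketch based on that assumption, not a confirmed fix, and `pairwise_sq_dist_no_temp` is a hypothetical name.

```python
import numpy


def pairwise_sq_dist_no_temp(source, target):
    """Same result as the original snippet, without applying ``** 2`` to an
    unnamed temporary array."""
    source = numpy.expand_dims(source, axis=-2)  # [M,1,2]
    target = numpy.expand_dims(target, axis=-3)  # [1,K,2]
    # Binding the difference to a name keeps its reference count above one,
    # so (assuming NumPy's elision rules) the elision check, and with it the
    # backtrace() call seen in the gdb output, should be skipped.
    diff = source - target  # [M,K,2]
    # einsum multiplies element-wise and sums over the last axis in one call.
    return numpy.einsum("...c,...c->...", diff, diff)  # [M,K]
```

Swapping this in for the original line and checking whether the segfault disappears would be a quick way to test the hypothesis; it does not address whatever is actually broken in the updated environment.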
That's all I could find out, and the problem is still there.
Here is my conda environment (`conda list` output):
```
# Name Version Build Channel
_tflow_1100_select 0.0.3 mkl
absl-py 0.4.0 py36h28b3542_0
asn1crypto 0.24.0 py36_0
astor 0.7.1 py36_0
blas 1.0 mkl
bzip2 1.0.6 h14c3975_5
ca-certificates 2019.1.23 0
cairo 1.14.12 h8948797_3
certifi 2018.11.29 py36_0
cffi 1.11.5 py36he75722e_1
chardet 3.0.4 py36_1
cloudpickle 0.5.5 py36_0
cryptography 2.3.1 py36hc365091_0
cudatoolkit 9.0 h13b8566_0
cudnn 7.3.1 cuda9.0_0
cycler 0.10.0 py36_0
cython 0.29 py36he6710b0_0
dask-core 0.18.2 py36_0
dbus 1.13.2 h714fa37_1
decorator 4.3.0 py36_0
expat 2.2.6 he6710b0_0
ffmpeg 4.0 hcdf2ecd_0
fontconfig 2.13.0 h9420a91_0
freeglut 3.0.0 hf484d3e_5
freetype 2.9.1 h8a8886c_1
gast 0.2.0 py36_0
glib 2.56.2 hd408876_0
graphite2 1.3.13 h23475e2_0
grpcio 1.12.1 py36hdbcaa40_0
gst-plugins-base 1.14.0 hbbd80ab_1
gstreamer 1.14.0 hb453b48_1
h5py 2.8.0 py36h989c5e5_3
harfbuzz 1.8.8 hffaf4a1_0
hdf5 1.10.2 hba1933b_1
icu 58.2 h9c2bf20_1
idna 2.7 py36_0
imageio 2.3.0 py36_0
intel-openmp 2018.0.3 0
jasper 2.0.14 h07fcdf6_1
jpeg 9b h024ee3a_2
kiwisolver 1.0.1 py36hf484d3e_0
libedit 3.1.20170329 h6b74fdf_2
libffi 3.2.1 hd88cf55_4
libgcc-ng 8.2.0 hdf63c60_1
libgfortran-ng 7.3.0 hdf63c60_0
libglu 9.0.0 hf484d3e_1
libopencv 3.4.2 hb342d67_1
libopus 1.3 h7b6447c_0
libpng 1.6.34 hb9fc6fc_0
libprotobuf 3.6.0 hdbcaa40_0
libstdcxx-ng 8.2.0 hdf63c60_1
libtiff 4.0.9 he85c1e1_2
libuuid 1.0.3 h1bed415_2
libvpx 1.7.0 h439df22_0
libxcb 1.13 h1bed415_1
libxml2 2.9.8 h26e45fe_1
markdown 2.6.11 py36_0
matplotlib 2.2.3 py36hb69df0a_0
mkl 2018.0.3 1
mkl_fft 1.0.6 py36h7dd41cf_0
mkl_random 1.0.1 py36h4414c95_1
nccl 1.3.5 cuda9.0_0
ncurses 6.1 hf484d3e_0
networkx 2.1 py36_0
ninja 1.8.2 py36h6bb024c_1
numpy 1.15.2 py36h1d66e8a_1
numpy-base 1.15.2 py36h81de0dd_1
olefile 0.45.1 py36_0
opencv 3.4.2 py36h6fd60c2_1
openssl 1.0.2p h14c3975_0
pcre 8.42 h439df22_0
pillow 5.2.0 py36heded4f4_0
pip 10.0.1 py36_0
pixman 0.36.0 h7b6447c_0
protobuf 3.6.0 py36hf484d3e_0
py-opencv 3.4.2 py36hb342d67_1
pycparser 2.18 py36_1
pyopenssl 18.0.0 py36_0
pyparsing 2.2.0 py36_1
pyqt 5.9.2 py36h22d08a2_1
pysocks 1.6.8 py36_0
python 3.6.4 hc3d631a_3
python-dateutil 2.7.3 py36_0
pytorch 0.4.1 py36ha74772b_0
pytz 2018.5 py36_0
pywavelets 0.5.2 py36h035aef0_2
pyzmq 17.1.2 <pip>
qt 4.8.7 2
readline 7.0 h7b6447c_5
requests 2.19.1 py36_0
scikit-image 0.14.0 py36hf484d3e_1
scipy 1.1.0 py36hfa4b5c9_1
setuptools 40.2.0 py36_0
sip 4.19.12 py36he6710b0_0
six 1.11.0 py36_1
sqlite 3.26.0 h7b6447c_0
termcolor 1.1.0 py36_1
tk 8.6.8 hbc83047_0
toolz 0.9.0 py36_0
torchfile 0.1.0 <pip>
torchvision 0.2.1 py36_1 pytorch
tornado 5.1 py36h14c3975_0
tqdm 4.25.0 py36h28b3542_0
urllib3 1.23 py36_0
visdom 0.1.8.5 <pip>
websocket-client 0.53.0 <pip>
werkzeug 0.14.1 py36_0
wheel 0.31.1 py36_0
xz 5.2.4 h14c3975_4
zlib 1.2.11 ha838bed_2
```
I can reproduce the error on Ubuntu 16.04 with kernels `4.4.0-141-generic`, `4.15.0-43-generic`, and `4.15.0-45-generic`.
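Since the fault is inside `ld-2.23.so` (the dynamic loader of glibc 2.23, the version Ubuntu 16.04 ships), the system glibc may matter as much as the conda packages listed above. A small sketch for collecting the relevant runtime versions in one place; `runtime_info` is a hypothetical helper, and `platform.libc_ver()` only reports something useful on Linux.

```python
import platform
import sys

import numpy


def runtime_info():
    """Collect the versions most relevant to a crash inside the dynamic loader."""
    libc_name, libc_version = platform.libc_ver()  # e.g. ("glibc", "2.23") on Linux
    return {
        "python": platform.python_version(),
        "numpy": numpy.__version__,
        "libc": "{} {}".format(libc_name, libc_version).strip(),
        "kernel": platform.release(),  # e.g. "4.15.0-45-generic"
        "executable": sys.executable,  # confirms which conda env is active
    }


if __name__ == "__main__":
    for key, value in runtime_info().items():
        print("{}: {}".format(key, value))
```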
Any help?