How to debug Tensorflow segmentation fault in model.fit()?
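A segmentation fault ends the process without a Python traceback, so it is not obvious which call crashed. A general first step is the standard-library faulthandler module, which dumps the Python stack of all threads when the process receives a fatal signal such as SIGSEGV. This is a minimal sketch. The helper name enable_crash_tracebacks is hypothetical. Running the unchanged script as python -X faulthandler train.py (train.py standing in for your training script) has the same effect.

```python
import faulthandler
import sys


def enable_crash_tracebacks(log_path=None):
    """Print the Python stack of every thread if the process hits a fatal signal (e.g. SIGSEGV).

    If log_path is given, the dump goes to that file instead of stderr. faulthandler
    writes to the file's descriptor, so the caller must keep the returned file open.
    """
    if log_path is None:
        faulthandler.enable(file=sys.stderr, all_threads=True)
        return None
    log_file = open(log_path, "w")
    faulthandler.enable(file=log_file, all_threads=True)
    return log_file


if __name__ == "__main__":
    enable_crash_tracebacks()
    # ... build the model here, then call model.fit(...) as usual.
    # If fit() segfaults, the Python stack is dumped before the process dies.
    print("faulthandler enabled:", faulthandler.is_enabled())
```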
Building TensorFlow from source (r1.13) fixed the Conv2D segmentation fault.
I followed the "Build from Source" guide.
My setup:
- GPU: RTX 2070
- OS: Ubuntu 16.04
- Python: 3.5.2
- NVIDIA driver: 410.78
- CUDA: 10.0.130
- cuDNN: 7.4.2.24 (for CUDA 10.0)
- TensorRT: 5.0.0
- Compute capability: 7.5
Resulting build: tensorflow-1.13.0rc0-cp35-cp35m-linux_x86_64
You can download the prebuilt package from https://github.com/tensorflow/tensorflow/issues/22706
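A prebuilt wheel such as tensorflow-1.13.0rc0-cp35-cp35m-linux_x86_64 only installs on a matching interpreter. Under the wheel naming convention (PEP 427), the fields after the version are the Python tag (cp35 = CPython 3.5), the ABI tag (cp35m) and the platform tag (linux_x86_64). The sketch below splits such a name and compares the Python tag with the running interpreter. It assumes there is no optional build tag and ignores the ABI and platform tags. Treat it as a quick sanity check, not as pip's full compatibility logic.

```python
import platform
import sys


def parse_wheel_name(name):
    """Split 'dist-version-pytag-abitag-platformtag[.whl]' into its fields.

    Assumes no optional build tag (PEP 427 allows one between version and Python tag).
    """
    if name.endswith(".whl"):
        name = name[: -len(".whl")]
    dist, version, py_tag, abi_tag, plat_tag = name.split("-")
    return {
        "distribution": dist,
        "version": version,
        "python_tag": py_tag,
        "abi_tag": abi_tag,
        "platform_tag": plat_tag,
    }


def matches_this_python(wheel_name):
    """True if the wheel's CPython tag (e.g. 'cp35') equals the running interpreter's."""
    tags = parse_wheel_name(wheel_name)
    current = "cp{}{}".format(*sys.version_info[:2])
    return platform.python_implementation() == "CPython" and tags["python_tag"] == current


if __name__ == "__main__":
    wheel = "tensorflow-1.13.0rc0-cp35-cp35m-linux_x86_64"
    print(parse_wheel_name(wheel))
    print("Python tag matches this interpreter:", matches_this_python(wheel))
```

On a Python 3.6 interpreter, such as the conda environments in the other answers, the second line prints False. In that case you need a wheel built for cp36, or a build of your own.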
I had exactly the same problem on a system very similar to Francois's, but with an RTX 2070, on which I could reliably reproduce the segmentation fault when running the conv2d function on the GPU. My setup:
- Ubuntu: 18.04
- GPU: RTX 2070
- CUDA: 10
- cudnn: 7
- conda with python 3.6
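The setups in this thread differ in Ubuntu, Python, CUDA and cuDNN versions. When reporting or comparing them, it helps to capture the versions from the environment that actually runs model.fit(). This sketch uses the standard library plus an optional TensorFlow import. tf.__version__ and tf.test.is_built_with_cuda() exist in both TF 1.x and 2.x. The sketch does not query the NVIDIA driver, CUDA or cuDNN versions; nvidia-smi reports the driver version and nvcc --version reports the CUDA toolkit version. The function name environment_report is hypothetical.

```python
import platform
import sys


def environment_report():
    """Collect the versions that matter when comparing TensorFlow GPU setups."""
    report = {
        "os": platform.platform(),
        "python": platform.python_version(),
        "python_executable": sys.executable,  # reveals which conda env is active
    }
    try:
        import tensorflow as tf
    except ImportError:
        report["tensorflow"] = None
        return report
    report["tensorflow"] = tf.__version__
    report["built_with_cuda"] = tf.test.is_built_with_cuda()
    return report


if __name__ == "__main__":
    for key, value in environment_report().items():
        print("{}: {}".format(key, value))
```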
I finally solved it by building TensorFlow from source into a new conda environment. For an excellent guide, see e.g. https://gist.github.com/Brainiarc7/6d6c3f23ea057775b72c52817759b25c
This is basically the same as any other build-TensorFlow-from-source guide; in my case it consisted of the following steps:
- installing Bazel
- cloning TensorFlow from Git and running ./configure
- running the appropriate bazel build command (see the link for details)
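The steps above can be scripted. The sketch below defaults to a dry run that only prints the commands. The Bazel target //tensorflow/tools/pip_package:build_pip_package and the --config=opt --config=cuda flags are the ones used in the TensorFlow 1.x source-build instructions, but check them against the guide for your branch. The checkout directory and the output directory /tmp/tensorflow_pkg are assumptions. ./configure is interactive, and it is where CUDA support and the compute capability are chosen.

```python
import shlex
import subprocess

# Assumed locations; adjust to your machine.
REPO_URL = "https://github.com/tensorflow/tensorflow.git"
SRC_DIR = "tensorflow"
PKG_DIR = "/tmp/tensorflow_pkg"
TARGET = "//tensorflow/tools/pip_package:build_pip_package"


def build_steps(branch="r1.13"):
    """Return (working_directory, argv) pairs for the source build, in order."""
    return [
        (None, ["git", "clone", REPO_URL, SRC_DIR]),
        (SRC_DIR, ["git", "checkout", branch]),
        # Interactive: enable CUDA and enter the compute capability (e.g. 7.5 for an RTX 2070).
        (SRC_DIR, ["./configure"]),
        (SRC_DIR, ["bazel", "build", "--config=opt", "--config=cuda", TARGET]),
        # Writes the wheel into PKG_DIR.
        (SRC_DIR, ["./bazel-bin/tensorflow/tools/pip_package/build_pip_package", PKG_DIR]),
    ]


def run(steps, dry_run=True):
    """Print every step and execute it only when dry_run is False. Returns the command lines."""
    lines = []
    for cwd, argv in steps:
        line = " ".join(shlex.quote(arg) for arg in argv)
        print("[{}] $ {}".format(cwd or ".", line))
        lines.append(line)
        if not dry_run:
            # On POSIX, a relative argv[0] such as ./configure is resolved against cwd.
            subprocess.run(argv, cwd=cwd, check=True)
    return lines


if __name__ == "__main__":
    run(build_steps())  # pass dry_run=False to actually build
```

Afterwards, install the wheel written to /tmp/tensorflow_pkg with pip into the new conda environment.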
Some minor issues came up during the build; I solved one of them by manually installing three packages:
pip install keras_applications==1.0.4 --no-deps
pip install keras_preprocessing==1.0.2 --no-deps
pip install h5py==2.8.0
I found this fix in the following answer: Error Compiling Tensorflow From Source - No module named 'keras_applications'
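Before rerunning bazel build, it can save time to confirm that the three pinned packages actually ended up in the environment used for the build. This sketch uses importlib.metadata, which has been in the standard library since Python 3.8 and is therefore newer than the Python 3.5/3.6 interpreters in this thread; on those, pip show gives the same information. The pins are the ones from the commands above, and the function name check_pins is hypothetical.

```python
from importlib import metadata

# Versions pinned in the pip commands above.
PINNED = {
    "keras_applications": "1.0.4",
    "keras_preprocessing": "1.0.2",
    "h5py": "2.8.0",
}


def check_pins(pins):
    """Return {name: installed_version_or_None} for every pin that is not satisfied."""
    problems = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            problems[name] = installed
    return problems


if __name__ == "__main__":
    bad = check_pins(PINNED)
    if not bad:
        print("all pinned build dependencies present")
    for name, installed in bad.items():
        print("{}: expected {}, found {}".format(name, PINNED[name], installed))
```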
conv2d now works like a charm on the GPU!
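You may want to confirm that conv2d works (or still crashes) without taking down your interactive session. One option is to run a small Conv2D check in a child process. There, a segmentation fault shows up as a negative return code (minus the signal number on POSIX) instead of killing the parent. The tiny model in CONV2D_CHECK is an illustrative assumption, not the code from this thread. It uses tf.keras, so it should run on TF 1.13 as well as later versions. -X faulthandler makes the child print a Python traceback if it crashes.

```python
import signal
import subprocess
import sys

# Hypothetical minimal Conv2D training step; any model with a Conv2D layer will do.
CONV2D_CHECK = """
import numpy as np
import tensorflow as tf
x = np.random.rand(8, 32, 32, 3).astype("float32")
y = np.random.randint(0, 10, size=(8,))
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
model.fit(x, y, epochs=1, verbose=0)
print("conv2d ok")
"""


def run_isolated(code, timeout=600):
    """Run code in a fresh interpreter with faulthandler on; return (returncode, stderr)."""
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


def describe(returncode):
    """Translate a child return code into a readable verdict."""
    if returncode == 0:
        return "ok"
    if returncode < 0:  # POSIX: the child was killed by signal number -returncode
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status {}".format(returncode)


if __name__ == "__main__":
    rc, err = run_isolated(CONV2D_CHECK)
    print(describe(rc))
    if rc != 0:
        print(err)
```

On an affected setup, the expected verdict is "killed by SIGSEGV", with the faulthandler traceback in err. After a working build it should print "ok".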
However, since all this took quite a long time (building from source takes over an hour, not counting the time spent searching online for a solution), I recommend backing up the system once it works, e.g. with Timeshift or any other tool you like.
I had the same Conv2D problem with:
- Ubuntu 18.04
- Graphics card: GeForce RTX 2080
- CUDA: cuda_10.0.130_410
- CUDNN: cudnn-10.0-linux-x64-v7.4.2
- conda with Python 3.6
The best advice came from this issue: https://github.com/tensorflow/tensorflow/issues/24383
According to that issue, a fix should ship with TensorFlow 1.13. In the meantime, using the TensorFlow 1.13 nightly build (Dec 26, 2018) together with tensorflow.keras instead of the standalone keras package solved the issue.
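Following that advice, a script can check whether the installed TensorFlow is at least 1.13, the release the issue expects to contain the fix. It can also import Keras through TensorFlow rather than the standalone keras package. The 1.13 threshold comes from the issue above, and the helper names are hypothetical. The parser only looks at the leading major.minor numbers, so release candidates and dev/nightly version strings count as their base release.

```python
import re

FIX_EXPECTED_IN = (1, 13)  # per the linked GitHub issue


def major_minor(version):
    """Extract (major, minor) from strings like '1.12.0', '1.13.0rc0' or '1.13.0.dev20181226'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError("unrecognised version string: {!r}".format(version))
    return int(match.group(1)), int(match.group(2))


def expects_conv2d_fix(version):
    """True if the version is at least the release the issue expects to contain the fix."""
    return major_minor(version) >= FIX_EXPECTED_IN


if __name__ == "__main__":
    try:
        import tensorflow as tf
        from tensorflow import keras  # instead of: import keras
    except ImportError:
        print("TensorFlow is not installed")
    else:
        Conv2D = keras.layers.Conv2D  # instead of: from keras.layers import Conv2D
        verdict = "fix expected" if expects_conv2d_fix(tf.__version__) else "upgrade to >= 1.13"
        print(tf.__version__, "->", verdict)
```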