TensorFlow: InternalError: Blas SGEMM launch failed

I ran into this problem and solved it by setting allow_soft_placement=True and gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3), which limits the fraction of GPU memory the process may use. I guess this helped keep two TensorFlow processes from competing for GPU memory.

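import tensorflow as tf

gpu_options = tf.GPUOptions(per_process_gpu_memory_fraction=0.3)
sess = tf.Session(config=tf.ConfigProto(
  allow_soft_placement=True, log_device_placement=True,
  gpu_options=gpu_options))  # pass gpu_options, otherwise the limit is never applied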
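
If several processes share one GPU, a rough calculation like the following might help in picking the fraction. This is only a sketch: the name memory_fraction_per_process and the 10% headroom default are my own assumptions, not something TensorFlow provides.

```python
def memory_fraction_per_process(num_processes, headroom=0.1):
    """Split GPU memory evenly among processes, leaving some headroom.

    Hypothetical helper: neither the name nor the default headroom
    comes from TensorFlow.
    """
    if num_processes < 1:
        raise ValueError("num_processes must be at least 1")
    if not 0.0 <= headroom < 1.0:
        raise ValueError("headroom must be in [0, 1)")
    return (1.0 - headroom) / num_processes


# Three processes sharing one GPU, with 10% of memory left unallocated.
fraction = memory_fraction_per_process(3)
```

The result would then be passed as per_process_gpu_memory_fraction when building tf.GPUOptions in each process.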

Old question, but this may help others.
Try closing interactive sessions that are still active in other processes (in IPython Notebook, just restart the kernels). This helped me!

Additionally, I use this code to close local sessions in this kernel during experiments:

if 'session' in locals() and session is not None:
    print('Close interactive session')
    session.close()
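
The same check could be wrapped in a small helper so it can be reused for any variable name. This is a sketch under my own assumptions: close_if_open is a hypothetical name, and it relies only on the object having a close() method, as tf.Session does. FakeSession is a stand-in so the snippet runs without TensorFlow or a GPU.

```python
def close_if_open(namespace, name='session'):
    """Close namespace[name] if it exists and is not None.

    Returns True if something was closed. Hypothetical helper.
    """
    obj = namespace.get(name)
    if obj is None:
        return False
    print('Close interactive session')
    obj.close()
    namespace[name] = None  # drop the reference so a second call is a no-op
    return True


class FakeSession(object):
    """Stand-in for tf.Session, used only for this demonstration."""

    def __init__(self):
        self.closed = False

    def close(self):
        self.closed = True


namespace = {'session': FakeSession()}
fake = namespace['session']
closed_first = close_if_open(namespace)  # closes the fake session
closed_again = close_if_open(namespace)  # nothing left to close
```

In a notebook you would presumably call close_if_open(globals()) before creating a new session.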

I got this error when running distributed TensorFlow. Did you check whether any of the workers were reporting CUDA_OUT_OF_MEMORY errors? If so, it may have to do with where you place your weight and bias variables. For example, you can pin them to the parameter server's CPU:

with tf.device("/job:paramserver/task:0/cpu:0"):
    W = weight_variable([input_units, num_hidden_units])
    b = bias_variable([num_hidden_units])