TensorFlow: How do you monitor GPU performance during model training in real-time?
TensorFlow doesn't automatically utilize all GPUs; it will use only one GPU, specifically the first GPU, `/gpu:0`.
You have to write multi-GPU code to utilize all the GPUs available; see the CIFAR-10 multi-GPU example.
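For context, here is a minimal graph-mode sketch (not the actual CIFAR-10 example) of that multi-GPU pattern: one model "tower" per GPU pinned with `tf.device`, with the results combined on the CPU. The `tower` function, layer size, and `NUM_GPUS` value are illustrative assumptions, not part of the original answer.

```python
import numpy as np
import tensorflow as tf

NUM_GPUS = 2  # assumption: set this to the number of GPUs on your machine

def tower(x):
    # Toy stand-in for a real model: one dense layer per tower.
    return tf.layers.dense(x, units=10)

inputs = tf.placeholder(tf.float32, shape=[None, 100])
shards = tf.split(inputs, NUM_GPUS, axis=0)  # one input shard per GPU

tower_outputs = []
for i, shard in enumerate(shards):
    with tf.device('/gpu:%d' % i):            # pin this tower's ops to GPU i
        tower_outputs.append(tower(shard))

with tf.device('/cpu:0'):                     # gather the results on the CPU
    outputs = tf.concat(tower_outputs, axis=0)

# allow_soft_placement lets TensorFlow fall back to another device if a
# requested GPU is not present, so the sketch still runs on smaller machines.
config = tf.ConfigProto(allow_soft_placement=True)
with tf.Session(config=config) as sess:
    sess.run(tf.global_variables_initializer())
    batch = np.random.rand(8, 100).astype(np.float32)
    print(sess.run(outputs, feed_dict={inputs: batch}).shape)
```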
To check GPU usage every 0.1 seconds:
watch -n0.1 nvidia-smi
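If you prefer to poll the GPU from inside a Python script rather than a separate `watch` terminal, the following hedged sketch shells out to `nvidia-smi` (assuming it is on your PATH) and prints utilization and memory every 0.1 seconds; the query fields used are standard `nvidia-smi` options.

```python
import subprocess
import time

def query_gpus():
    # --query-gpu / --format=csv are standard nvidia-smi options.
    out = subprocess.check_output([
        'nvidia-smi',
        '--query-gpu=index,utilization.gpu,memory.used,memory.total',
        '--format=csv,noheader,nounits',
    ])
    return out.decode().strip().splitlines()

if __name__ == '__main__':
    while True:
        for line in query_gpus():
            idx, util, mem_used, mem_total = [f.strip() for f in line.split(',')]
            print('GPU %s: %s%% util, %s/%s MiB' % (idx, util, mem_used, mem_total))
        time.sleep(0.1)
```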
- If no other indication is given, a GPU-enabled TensorFlow installation will default to using the first available GPU (as long as you have the Nvidia driver and CUDA 8.0 installed and the GPU has the necessary compute capability, which, according to the docs, is 3.0). If you want to use more GPUs, you need to use `tf.device` directives in your graph (more about it here).
- The easiest way to check GPU usage is the console tool `nvidia-smi`. However, unlike `top` or other similar programs, it only shows the current usage and then exits. As suggested in the comments, you can use something like `watch -n1 nvidia-smi` to re-run the program continuously (in this case every second). To confirm from within TensorFlow which device each op actually runs on, see the sketch after this list.
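Besides watching `nvidia-smi`, you can ask TensorFlow itself which device each op was placed on. This is a small sketch using the standard `log_device_placement` session option; the toy matmul graph is just for illustration.

```python
import tensorflow as tf

a = tf.constant([[1.0, 2.0], [3.0, 4.0]], name='a')
b = tf.constant([[1.0, 1.0], [0.0, 1.0]], name='b')
c = tf.matmul(a, b, name='c')

# log_device_placement=True prints every op's device assignment (CPU or GPU)
# to stderr when the session runs, so you can verify the graph is on the GPU.
config = tf.ConfigProto(log_device_placement=True)
with tf.Session(config=config) as sess:
    print(sess.run(c))
```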