TensorFlow seems not to use GPU

Judging by the log info, in particular the device placement lines, your code does use the GPU; it is just that the run time stays the same. My guess is that:

c1.append(matpow(a, n))
c1.append(matpow(b, n))

is the bottleneck in your code, because it keeps moving big matrices from GPU memory back to RAM over and over. Can you try the following (a combined sketch follows the list)?

  • change the matrix size to 1e4 x 1e4

  • time a single large matmul placed explicitly on the GPU:

        with tf.device("/gpu:0"):
            A = tf.random_normal([matrix_size, matrix_size])
            B = tf.random_normal([matrix_size, matrix_size])
            C = tf.matmul(A, B)

        with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
            t1 = datetime.datetime.now()
            sess.run(C)
            t2 = datetime.datetime.now()

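Putting both suggestions together, here is a rough sketch that keeps every intermediate product in GPU memory and only pulls a single scalar back to the host; the matrix size, the power n and the final reduce_sum are placeholders for illustration, not taken from your code:

    import datetime
    import tensorflow as tf

    matrix_size = 10000   # 1e4 x 1e4 as suggested above
    n = 10                # hypothetical number of repeated multiplications

    # Build the whole power computation inside the graph, so the intermediate
    # results stay in GPU memory instead of bouncing back to RAM each step.
    with tf.device("/gpu:0"):
        A = tf.random_normal([matrix_size, matrix_size])
        power = A
        for _ in range(n - 1):
            power = tf.matmul(power, A)
        result = tf.reduce_sum(power)

    with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
        t1 = datetime.datetime.now()
        sess.run(result)
        t2 = datetime.datetime.now()
        print("GPU time:", t2 - t1)
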
Say, for instance, that creating the TensorFlow session takes 4.9 seconds and the actual calculation takes only 0.1 seconds on the CPU, giving you a total of 5.0 seconds on the CPU. Now say creating the session on the GPU also takes 4.9 seconds but the calculation takes 0.01 seconds, for a total of 4.91 seconds. You would hardly see the difference. Creating the session is a one-time overhead cost at the startup of a program and should not be included in your timing. Also, TensorFlow does some compilation/optimization the first time you call sess.run, which makes the first run even slower.

Try timing it like this:

with tf.Session(config=tf.ConfigProto(log_device_placement=True)) as sess:
    # Warm-up: the first call to sess.run triggers the one-time
    # compilation/optimization, so exclude it from the timing.
    sess.run(sum)
    t1 = datetime.datetime.now()
    for i in range(1000):
        sess.run(sum)
    t2 = datetime.datetime.now()
    print("1000 runs took:", t2 - t1)

If this doesn't fix it, it might also be that your calculation does not expose enough parallelism for the GPU to really beat the CPU. Increasing the matrix size might bring out the difference; a small size sweep like the sketch below can show where the GPU starts to win.
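
As a rough sketch (the time_matmul helper, the iteration count and the size list are made up for illustration, not taken from your code), you could time the same matmul on /cpu:0 and /gpu:0 for a few sizes, excluding a warm-up run:

    import datetime
    import tensorflow as tf

    def time_matmul(device, size, iters=10):
        # Time `iters` runs of a size x size matmul on the given device,
        # excluding the first (warm-up) run from the measurement.
        tf.reset_default_graph()
        with tf.device(device):
            a = tf.random_normal([size, size])
            product = tf.matmul(a, a)
        with tf.Session(config=tf.ConfigProto(allow_soft_placement=True)) as sess:
            sess.run(product)              # warm-up, not timed
            t1 = datetime.datetime.now()
            for _ in range(iters):
                sess.run(product)
            t2 = datetime.datetime.now()
        return t2 - t1

    for size in [500, 2000, 8000]:
        print(size,
              "cpu:", time_matmul("/cpu:0", size),
              "gpu:", time_matmul("/gpu:0", size))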