Float16 slower than float32 in keras
The cuDNN documentation (section 2.7, subsection "Type Conversion") states:
Note: Accumulators are 32-bit integers which wrap on overflow.
It also says that this applies to the standard INT8 data type used for the data input, the filter input, and the output.
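The following is a minimal pure-Python sketch of what a "32-bit integer accumulator which wraps on overflow" means for an INT8 dot product. It is not cuDNN code. `wrap_int32` and `int8_dot` are hypothetical helpers for illustration, assuming two's-complement wrap-around as the note describes.

```python
def wrap_int32(value: int) -> int:
    """Reduce an arbitrary Python int to a signed 32-bit two's-complement value."""
    value &= 0xFFFFFFFF
    return value - 0x1_0000_0000 if value >= 0x8000_0000 else value


def int8_dot(data: list[int], filt: list[int]) -> int:
    """Hypothetical INT8 dot product with a 32-bit accumulator that wraps on overflow."""
    if len(data) != len(filt):
        raise ValueError("data and filter must have the same length")
    acc = 0
    for x, w in zip(data, filt):
        if not (-128 <= x <= 127 and -128 <= w <= 127):
            raise ValueError("inputs must be in the INT8 range [-128, 127]")
        # Each product fits easily in 32 bits; only the running sum can overflow.
        acc = wrap_int32(acc + x * w)
    return acc
```

For small inputs the result is the ordinary dot product, for example `int8_dot([1, 2, 3], [4, 5, 6])` is `32`. With enough large terms (133145 copies of `127 * 127`), the running sum passes 2**31 - 1 and silently wraps to a negative number instead of raising an error.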
Given what the cuDNN documentation says, @jiandercy is right: there is a float16-to-float32 conversion and then a conversion back to float16 before the result is returned, so the float16 path does extra work and ends up slower than float32.
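To illustrate the round trip described above, here is a minimal stdlib-only sketch that emulates a float16 dot product computed in float32. It uses the `struct` module's `e` (half precision) and `f` (single precision) formats to round values. The helpers `to_float16`, `to_float32` and `fp16_dot_via_fp32` are hypothetical; this is an assumption about the general shape of the computation, not what cuDNN or Keras literally execute.

```python
import struct


def to_float16(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def to_float32(x: float) -> float:
    """Round a Python float to the nearest IEEE 754 single-precision value."""
    return struct.unpack("<f", struct.pack("<f", x))[0]


def fp16_dot_via_fp32(a: list[float], b: list[float]) -> float:
    """Hypothetical float16 dot product that computes in float32 and casts back."""
    # Inputs are stored as float16.
    a16 = [to_float16(v) for v in a]
    b16 = [to_float16(v) for v in b]
    # Extra step 1: upcast every operand to float32. This is numerically exact
    # (every float16 value fits in float32), but in a real kernel it would be
    # additional work that a pure float32 path does not need.
    a32 = [to_float32(v) for v in a16]
    b32 = [to_float32(v) for v in b16]
    acc = 0.0
    for x, y in zip(a32, b32):
        # Emulate float32 arithmetic by rounding after each multiply and add.
        acc = to_float32(acc + to_float32(x * y))
    # Extra step 2: downcast the float32 result back to float16.
    return to_float16(acc)
```

For example, `fp16_dot_via_fp32([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])` returns `32.0`. The pure float32 version of the same computation would skip both conversion passes.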