What is the maximum block count possible in CUDA?
65535 in a single dimension. Here's the complete table.
In case anybody lands here based on a Google search (as I just did):
NVIDIA changed the specification since this question was asked. With compute capability 3.0 and newer, the x-dimension of a grid of thread blocks is allowed to be up to 2'147'483'647 (2^31 - 1).
See the current Technical Specifications.
With compute capability 3.0 or higher, you can have up to 2^31 - 1 blocks in the x-dimension, and at most 65535 blocks in the y and z dimensions. See Table H.1. Feature Support per Compute Capability of the CUDA C Programming Guide Version 9.1.
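If you would rather not rely on a table, the limits can also be read from the device at runtime. The following is only a minimal sketch: `printGridLimits` and `gridFits` are hypothetical helper names made up for illustration, while `cudaGetDeviceProperties` and the `maxGridSize` field come from the CUDA runtime API.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: print the grid limits reported by the given device.
cudaError_t printGridLimits(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        std::fprintf(stderr, "cudaGetDeviceProperties failed: %s\n",
                     cudaGetErrorString(err));
        return err;
    }
    std::printf("Compute capability: %d.%d\n", prop.major, prop.minor);
    std::printf("Max grid size: x=%d, y=%d, z=%d\n",
                prop.maxGridSize[0], prop.maxGridSize[1], prop.maxGridSize[2]);
    return cudaSuccess;
}

// Hypothetical helper: true if every grid dimension is at least 1 and
// does not exceed the limit reported in prop.maxGridSize.
bool gridFits(dim3 grid, const cudaDeviceProp &prop) {
    return grid.x >= 1 && grid.x <= static_cast<unsigned int>(prop.maxGridSize[0]) &&
           grid.y >= 1 && grid.y <= static_cast<unsigned int>(prop.maxGridSize[1]) &&
           grid.z >= 1 && grid.z <= static_cast<unsigned int>(prop.maxGridSize[2]);
}
```

Calling `printGridLimits(0)` from `main` prints the limits for the first device. `gridFits` can be used to check a grid against those limits before launching a kernel.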
As Pavan pointed out, if you do not provide a dim3 for the grid configuration, you use only the x-dimension, so the per-dimension limit applies.
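To illustrate the point about the scalar grid argument, here is a rough sketch. The kernel `fill` and the helpers `oneDimGrid` and `launchBoth` are hypothetical names invented for this example. It relies on the `dim3` constructor defaulting unspecified components to 1, so a plain block count ends up as a grid of (blocks, 1, 1).

```cuda
#include <cstddef>
#include <cuda_runtime.h>

// Hypothetical kernel: writes 1 to each of the first n elements.
__global__ void fill(int *out, size_t n) {
    size_t i = static_cast<size_t>(blockIdx.x) * blockDim.x + threadIdx.x;
    if (i < n) out[i] = 1;
}

// Hypothetical helper: one-dimensional grid covering n elements.
// dim3 defaults y and z to 1, so only the x-dimension is used.
dim3 oneDimGrid(size_t n, unsigned int threads) {
    return dim3(static_cast<unsigned int>((n + threads - 1) / threads));
}

// Both launches use the same grid: a scalar block count is converted to
// dim3(blocks, 1, 1), so only the x-dimension limit is relevant.
void launchBoth(int *d_out, size_t n) {
    const unsigned int threads = 256;
    const dim3 grid = oneDimGrid(n, threads);
    fill<<<grid.x, threads>>>(d_out, n);                     // scalar form
    fill<<<dim3(grid.x, 1, 1), dim3(threads, 1, 1)>>>(d_out, n);  // explicit dim3
}
```

If the block count could exceed the x-dimension limit of the target device, the work would have to be spread over the y (and z) dimensions with an explicit dim3 instead.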