Why is minimizing the nuclear norm of a matrix a good surrogate for minimizing the rank?
Why does compressed sensing work? Because the $\ell_1$ ball in high dimensions is extremely "pointy": the extreme values of a linear function on this ball are very likely to be attained on its low-dimensional faces, and those faces consist of sparse vectors. For matrices, the nuclear norm is the $\ell_1$ norm of the vector of singular values, and sparsity of the singular values means low rank, as @mrig wrote before me.
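A minimal sketch of the vector case, assuming NumPy and SciPy are available: minimizing $\|x\|_1$ subject to $Ax = b$ (basis pursuit) is written as a linear program by splitting $x = u - v$ with $u, v \ge 0$. The sizes `n`, `m`, `k` and the Gaussian measurement matrix are illustrative choices, not part of the answer above; with them, the LP solution typically lands on a low-dimensional face of the $\ell_1$ ball and recovers the sparse vector exactly.

```python
import numpy as np
from scipy.optimize import linprog


def basis_pursuit(A, b):
    """Solve min ||x||_1 subject to A x = b as a linear program.

    Writes x = u - v with u, v >= 0; at the optimum u_i * v_i = 0,
    so sum(u) + sum(v) equals ||x||_1.
    """
    m, n = A.shape
    c = np.ones(2 * n)                 # objective: sum(u) + sum(v)
    A_eq = np.hstack([A, -A])          # constraint: A u - A v = b
    res = linprog(c, A_eq=A_eq, b_eq=b, bounds=(0, None), method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    u, v = res.x[:n], res.x[n:]
    return u - v


rng = np.random.default_rng(0)
n, m, k = 100, 40, 5                   # ambient dimension, measurements, sparsity
x_true = np.zeros(n)
support = rng.choice(n, size=k, replace=False)
x_true[support] = rng.standard_normal(k)
A = rng.standard_normal((m, n))        # random Gaussian measurement matrix
b = A @ x_true

x_hat = basis_pursuit(A, b)
print("max recovery error:", np.max(np.abs(x_hat - x_true)))
print("non-zeros in x_hat:", int(np.sum(np.abs(x_hat) > 1e-6)))
```

Only 40 linear measurements of a 100-dimensional vector are used here; the $\ell_1$ objective, not the constraint alone, is what selects the sparse solution among infinitely many feasible ones.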
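The nuclear norm can be thought of as a convex relaxation of the number of non-zero singular values (i.e. the rank). More precisely, on the set of matrices with spectral norm at most 1, the nuclear norm is the convex envelope of the rank, that is, the largest convex function that lies below it there.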
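The easy half of that statement (that the nuclear norm is a lower bound for the rank on the spectral-norm unit ball) takes one line. With singular values ordered $\sigma_1(X) \ge \sigma_2(X) \ge \dots$ and $r = \operatorname{rank}(X)$, so that exactly $r$ of them are non-zero:

$$\|X\|_* = \sum_{i=1}^{r} \sigma_i(X) \le r\,\sigma_1(X) = \operatorname{rank}(X)\,\|X\|_2 .$$

Hence $\|X\|_2 \le 1$ implies $\|X\|_* \le \operatorname{rank}(X)$. The harder half, that no larger convex function stays below the rank on this set, is not shown here.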
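The nuclear norm of a matrix is the L1-norm of the vector of its singular values (for a symmetric positive semidefinite matrix these coincide with its eigenvalues). Thus, minimizing it injects sparsity into the vector of singular values. Since the rank is the number of non-zero singular values, this sparsity means you are reducing the rank of the original matrix.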
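A minimal NumPy sketch of both points. First, a nilpotent matrix shows why singular values, not eigenvalues, are the right quantity for a general matrix. Second, singular value thresholding (the proximal operator of the nuclear norm, i.e. the matrix analogue of soft-thresholding for the L1-norm) sets small singular values exactly to zero, which lowers the rank. The function names, the threshold `tau`, and the synthetic rank-3-plus-noise matrix are illustrative assumptions.

```python
import numpy as np


def nuclear_norm(X):
    # Sum of singular values; should agree with np.linalg.norm(X, "nuc").
    return np.linalg.svd(X, compute_uv=False).sum()


def singular_value_threshold(X, tau):
    """Proximal operator of tau * ||.||_*: soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)   # small singular values become exactly 0
    return (U * s_shrunk) @ Vt            # scales column j of U by s_shrunk[j]


# Eigenvalues are the wrong quantity for non-symmetric matrices.
N = np.array([[0.0, 1.0],
              [0.0, 0.0]])               # nilpotent: both eigenvalues are 0
print("eigenvalues:", np.linalg.eigvals(N))
print("singular values:", np.linalg.svd(N, compute_uv=False))
print("rank:", np.linalg.matrix_rank(N))

# Shrinking singular values lowers the rank of a noisy low-rank matrix.
rng = np.random.default_rng(1)
L = rng.standard_normal((30, 3)) @ rng.standard_normal((3, 30))   # rank 3
X = L + 0.01 * rng.standard_normal((30, 30))                      # full rank after noise
Y = singular_value_threshold(X, tau=1.0)
print("nuclear norm before:", nuclear_norm(X), "after:", nuclear_norm(Y))
print("rank before:", np.linalg.matrix_rank(X), "after:", np.linalg.matrix_rank(Y))
```

The threshold only needs to exceed the noise-level singular values while staying below the three large ones; the exact zeros it produces are the "sparsity in the singular values" described above.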