Criterion for deciding whether a matrix is diagonalizable
Basically, the condition in the lemma (the one that is sufficient to give you diagonalisability and, as it turns out, is actually equivalent to it) boils down to the following: $$\operatorname{ker}(B - \lambda I)^2 = \operatorname{ker}(B - \lambda I),$$ where $\operatorname{ker}$ denotes the kernel (or nullspace) of a matrix. To see this, consider $x$ in the statement of the lemma. Substituting the two statements into one another gives $(B - \lambda I)^2 x = 0$, that is, $x \in \operatorname{ker}(B - \lambda I)^2$. The lemma requires that $(B - \lambda I)x = 0$ in this case, that is, $x \in \operatorname{ker}(B - \lambda I)$. Thus, $\operatorname{ker}(B - \lambda I)^2 \subseteq \operatorname{ker}(B - \lambda I)$. The reverse inclusion always holds and is easy to show.
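For a concrete example of a matrix that fails this condition (an illustration of my own choosing), take a single $2 \times 2$ Jordan block:
$$B = \begin{pmatrix} \lambda & 1 \\ 0 & \lambda \end{pmatrix}, \qquad B - \lambda I = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}, \qquad (B - \lambda I)^2 = \begin{pmatrix} 0 & 0 \\ 0 & 0 \end{pmatrix},$$
so $\operatorname{ker}(B - \lambda I) = \operatorname{span}\lbrace e_1 \rbrace$ while $\operatorname{ker}(B - \lambda I)^2 = \mathbb{C}^2$. Here $x = e_2$ satisfies $(B - \lambda I)^2 x = 0$ but $(B - \lambda I)x = e_1 \neq 0$. By contrast, for $B = \lambda I$ both kernels are all of $\mathbb{C}^2$ and the condition holds.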
Why does this condition imply diagonalisability? Well, regardless of the matrix $B$, we have the following chain of set inclusions: $$\lbrace 0 \rbrace \subseteq \operatorname{ker}(B - \lambda I) \subseteq \operatorname{ker}(B - \lambda I)^2 \subseteq \operatorname{ker}(B - \lambda I)^3 \subseteq \ldots$$ This is straightforward to prove: if applying $(B - \lambda I)^i$ to a vector gives $0$, then applying $(B - \lambda I)$ once more still sends the vector to $0$. Slightly less trivial to show is that once $\operatorname{ker}(B - \lambda I)^i = \operatorname{ker}(B - \lambda I)^{i+1}$, then $$\operatorname{ker}(B - \lambda I)^i = \operatorname{ker}(B - \lambda I)^{i+1} = \operatorname{ker}(B - \lambda I)^{i+2} = \ldots$$ That is, once the kernel stops growing in one step, it stops growing for good. The kernel at which the chain stabilises is the generalised eigenspace of $B$ with respect to $\lambda$, if $\lambda$ is an eigenvalue (if $\lambda$ isn't, then all of the above kernels are trivial). It's not too hard to prove this, but I'll leave it out of the answer (I'll be happy to provide the proof if you want it, but it's a good exercise). So we have,
$$\lbrace 0 \rbrace \subset \operatorname{ker}(B - \lambda I) \subset \operatorname{ker}(B - \lambda I)^2 \subset \ldots \subset \operatorname{ker}(B - \lambda I)^i = \operatorname{ker}(B - \lambda I)^{i+1} = \ldots$$
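To watch the chain stabilise on an actual matrix, one can compute $\dim \operatorname{ker}(B - \lambda I)^k = n - \operatorname{rank}(B - \lambda I)^k$ for $k = 1, \dots, n$; the chain must have stabilised by $k = n$, since every strict step raises the dimension by at least one. This is a minimal sketch assuming a small square matrix with rational entries, using exact arithmetic from Python's `fractions` module; `rank`, `matmul` and `kernel_dims` are hypothetical helpers written for illustration, not from any library.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows), by exact Gauss-Jordan elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        # look for a non-zero pivot in column c at or below row r
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue
        A[r], A[pivot] = A[pivot], A[r]
        # clear column c in every other row
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def kernel_dims(B, lam):
    """Return [dim ker (B - lam I)^k for k = 1..n]; the list is nondecreasing and stabilises."""
    n = len(B)
    A = [[B[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
    power, dims = A, []
    for _ in range(n):
        dims.append(n - rank(power))
        power = matmul(power, A)
    return dims

print(kernel_dims([[5, 1, 0], [0, 5, 0], [0, 0, 5]], 5))  # [2, 3, 3]  grows at step 2
print(kernel_dims([[5, 0, 0], [0, 5, 0], [0, 0, 7]], 5))  # [2, 2, 2]  equal already at i = 1
print(kernel_dims([[5, 1, 0], [0, 5, 0], [0, 0, 5]], 3))  # [0, 0, 0]  3 is not an eigenvalue
```

The first matrix has a Jordan block of size 2 for the eigenvalue 5, so the kernel still grows from $i = 1$ to $i = 2$; the diagonal matrix stabilises immediately, which is exactly the situation of the condition above.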
But what does our condition imply? It means that equality is already reached at $i = 1$. So we have
$$\lbrace 0 \rbrace \subset \operatorname{ker}(B - \lambda I) = \operatorname{ker}(B - \lambda I)^2 = \operatorname{ker}(B - \lambda I)^3 = \ldots$$
The generalised eigenspace is therefore $\operatorname{ker}(B - \lambda I)$, which is precisely the ordinary (not generalised) eigenspace of $B$ corresponding to the eigenvalue $\lambda$. Every generalised eigenvector is an ordinary eigenvector.
Now, the generalised eigenspaces of $B$ (one for each eigenvalue $\lambda$) form a direct sum equal to the whole of $\mathbb{C}^n$. Another way to view this is to look at an arbitrary Jordan basis. (I can't prove either of these facts elegantly here.) Either way, you can form a basis of generalised eigenvectors. If the condition holds for every eigenvalue, every generalised eigenvector is an eigenvector, so you can form a basis of eigenvectors. It's easy to see that expressing $B$ in terms of this basis of eigenvectors makes $B$ diagonal, so $B$ is indeed diagonalisable.
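Spelling out the "easy to see" step: suppose $v_1, \dots, v_n$ is a basis of eigenvectors with $B v_j = \lambda_j v_j$ (the $\lambda_j$ need not be distinct), and let $P$ be the matrix whose columns are $v_1, \dots, v_n$. Then
$$BP = \begin{pmatrix} B v_1 & \cdots & B v_n \end{pmatrix} = \begin{pmatrix} \lambda_1 v_1 & \cdots & \lambda_n v_n \end{pmatrix} = P \operatorname{diag}(\lambda_1, \ldots, \lambda_n),$$
and since the columns of $P$ form a basis, $P$ is invertible, so $P^{-1} B P = \operatorname{diag}(\lambda_1, \ldots, \lambda_n)$.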
Hope that helps!
Yes, this has to do with the Jordan normal form. Suppose that $B = PJP^{-1}$, where $J$ is in Jordan normal form. Suppose without loss of generality that the first block of $J$ is the largest Jordan block associated with $\lambda$.
Suppose that this first block has size $k \geq 2$. Then $(J - \lambda I)e_2 = e_1$ and $(J - \lambda I) e_1 = 0$ (where $e_1,e_2,\dots,e_n$ is the standard basis of $\Bbb C^n$). Since $B - \lambda I = P(J - \lambda I)P^{-1}$, it follows that $(B - \lambda I)[Pe_2] = P(J - \lambda I)e_2 = Pe_1$ and $(B - \lambda I)[Pe_1] = P(J - \lambda I)e_1 = 0$.
So, if $B$ has a Jordan block associated with $\lambda$ of size at least $2$, then there exist non-zero vectors $x,y$ (namely $x = Pe_2$ and $y = Pe_1$, which are non-zero because $P$ is invertible) such that $(B - \lambda I)x = y$ and $(B - \lambda I)y = 0$. Contrapositively, if no such non-zero vectors exist, then all Jordan blocks associated with $\lambda$ must be of size $1$.
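The construction can be checked on a small concrete case. The sketch below assumes, purely for illustration, $\lambda = 2$, a single $2 \times 2$ Jordan block $J$, and a hand-picked invertible $P$ (with its inverse written out by hand); it forms $B = PJP^{-1}$ and confirms that $x = Pe_2$ and $y = Pe_1$ satisfy $(B - 2I)x = y$ and $(B - 2I)y = 0$. The helpers `matmul` and `matvec` are hypothetical names, not from any library.

```python
def matmul(X, Y):
    """Product of two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]

def matvec(M, v):
    """Matrix-vector product."""
    return [sum(M[i][k] * v[k] for k in range(len(v))) for i in range(len(M))]

lam = 2
J = [[2, 1],
     [0, 2]]        # one 2x2 Jordan block for the eigenvalue 2
P = [[1, 1],
     [1, 2]]        # hypothetical invertible change of basis (det = 1)
P_inv = [[2, -1],
         [-1, 1]]   # inverse of P, written out by hand
assert matmul(P, P_inv) == [[1, 0], [0, 1]]

B = matmul(matmul(P, J), P_inv)                                       # B = P J P^{-1}
N = [[B[i][j] - (lam if i == j else 0) for j in range(2)] for i in range(2)]  # B - lam I

x = [row[1] for row in P]   # x = P e_2, the second column of P
y = [row[0] for row in P]   # y = P e_1, the first column of P

print(B)             # [[1, 1], [-1, 3]]
print(matvec(N, x))  # [1, 1], which equals y
print(matvec(N, y))  # [0, 0]
```

Nothing about $B$ itself reveals the Jordan structure at a glance; the pair $x, y$ is read off from the columns of $P$.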
Another way to present the same property is that if $(B - \lambda I)^2 x = 0$ then $(B-\lambda I) x = 0$, that is to say $$\ker (B-\lambda I)^2 = \ker (B - \lambda I)$$ This property implies that the generalized eigenspace for $\lambda$ is equal to the ordinary eigenspace. It follows that the dimension of the ordinary eigenspace is equal to $\lambda$'s algebraic multiplicity.
Requiring this property for every eigenvalue of $B$ gives a necessary and sufficient criterion for diagonalizability.
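The criterion can be checked mechanically with exact arithmetic. Since $\ker (B - \lambda I) \subseteq \ker (B - \lambda I)^2$ always holds, the two kernels are equal exactly when they have the same dimension, i.e. when $\operatorname{rank}(B - \lambda I) = \operatorname{rank}(B - \lambda I)^2$. This is a sketch assuming the distinct eigenvalues are already known and rational, so that Python's `fractions.Fraction` represents everything exactly; `rank` and `is_diagonalizable` are hypothetical helpers, and computing the eigenvalues is outside its scope.

```python
from fractions import Fraction

def rank(M):
    """Rank of a matrix (list of rows), by exact Gauss-Jordan elimination over the rationals."""
    A = [[Fraction(v) for v in row] for row in M]
    rows, cols = len(A), len(A[0])
    r = 0
    for c in range(cols):
        pivot = next((i for i in range(r, rows) if A[i][c] != 0), None)
        if pivot is None:
            continue  # no pivot in this column
        A[r], A[pivot] = A[pivot], A[r]
        for i in range(rows):
            if i != r and A[i][c] != 0:
                f = A[i][c] / A[r][c]
                A[i] = [a - f * b for a, b in zip(A[i], A[r])]
        r += 1
        if r == rows:
            break
    return r

def is_diagonalizable(B, eigenvalues):
    """True iff ker(B - lam I)^2 == ker(B - lam I) for every lam in `eigenvalues`.

    Assumes `eigenvalues` is the complete list of distinct eigenvalues of B,
    given exactly (ints or Fractions); a missing eigenvalue is simply not checked.
    """
    n = len(B)
    for lam in eigenvalues:
        A = [[B[i][j] - (lam if i == j else 0) for j in range(n)] for i in range(n)]
        A2 = [[sum(A[i][k] * A[k][j] for k in range(n)) for j in range(n)] for i in range(n)]
        # ker A is contained in ker A^2, so the kernels are equal iff their
        # dimensions n - rank agree, i.e. iff rank A == rank A^2.
        if rank(A) != rank(A2):
            return False
    return True

print(is_diagonalizable([[2, 1], [1, 2]], [1, 3]))                   # True
print(is_diagonalizable([[1, 1], [-1, 3]], [2]))                     # False
print(is_diagonalizable([[5, 1, 0], [0, 5, 0], [0, 0, 7]], [5, 7]))  # False
```

Comparing ranks instead of computing kernels keeps the check short. With floating-point entries the rank test would need a tolerance, which is why this sketch sticks to exact rationals.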