How does Yitang Zhang use Cauchy's inequality and Theorem 2 to obtain the error term coming from the $S_2$ sum?
That also puzzled me at first, but I think it is OK. By Cauchy-Schwarz,
$$ \mathcal E_i \leq \left(\sum_{ d < D^2, d | \mathcal P }\sum_{ c \in \mathcal C_i ( d ) } \tau_3^2(d) \rho_2^2(d) | \Delta( \theta,d,c) |\right)^{1/2}\left(\sum_{ d < D^2 , d | \mathcal P } \sum_{ c \in \mathcal C_i(d) } | \Delta (\theta,d,c)|\right)^{1/2}.$$
In the first parenthesis, $| \Delta( \theta,d,c) |\ll x\mathcal{L}/d$ by trivial estimation (for $d < x$), hence the first parenthesis is $\ll x\mathcal{L}^B$ for some fixed $B>0$. The second parenthesis, on the other hand, is $\ll x\mathcal{L}^{-A}$ for any $A>0$. Combining these, $\mathcal E_i \ll x\mathcal{L}^{-C} $ for any $C>0$, and this is sufficient.
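To spell out how the two factors combine (a short worked step, using the same $A$ and $B$ as above and remembering the exponents $1/2$):
$$ \mathcal E_i \ll \left( x\mathcal{L}^{B} \right)^{1/2} \left( x\mathcal{L}^{-A} \right)^{1/2} = x\mathcal{L}^{-(A-B)/2}. $$
Since $B$ is fixed and $A$ is arbitrary, taking $A = B + 2C$ gives $\mathcal E_i \ll x\mathcal{L}^{-C}$.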
P.S. The first display on page 5 requires a small correction: $\mathcal{E}$ should be multiplied by $\mathcal{L}^{2k_0+2l_0}$, because $\lambda(n)^2$ in (2.2) is not bounded (cf. (9.7) in [6]). Of course this does not affect the main argument.
The argument is OK (in fact it appears already, probably just as sketchily, in Goldston, Pintz, Yildirim, and certainly in many other papers involving the Selberg sieve, for instance). The point is to apply Cauchy-Schwarz after splitting off the square root of the modulus of the error term $\Delta$: one uses a trivial bound on the error term (it could be of size $D(\log D)^{A}$ for some $A$) in one factor, and Theorem 2 in the other. In other words, one writes
$\sum_d f(d) |\Delta(d)|\leq (\sum_d f(d)^2|\Delta(d)|)^{1/2} (\sum_d |\Delta(d)|)^{1/2}.$
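As a sketch of why this inequality holds (the names $a_d$ and $b_d$ are introduced here only for illustration), write each term as a product
$$ f(d)\,|\Delta(d)| = a_d\, b_d, \qquad a_d = f(d)\,|\Delta(d)|^{1/2}, \quad b_d = |\Delta(d)|^{1/2}, $$
and apply Cauchy-Schwarz in the form
$$ \sum_d a_d b_d \leq \Big(\sum_d a_d^2\Big)^{1/2} \Big(\sum_d b_d^2\Big)^{1/2}, $$
where $a_d^2 = f(d)^2|\Delta(d)|$ and $b_d^2 = |\Delta(d)|$.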
In a conversation, Prof. Terry Tao told me to use weighted C-S or to split the sum. However, I realized that this can be done by splitting the sum and using the trivial bound, without appealing to C-S at all.
The first sum, bounded using Theorem 2, is $$\sum_{f(d) < L^B} f(d) |\Delta(d)|\ll_A xL^BL^{-A}$$
The second sum consists of the remaining terms, where $f(d) \leq f(d)^2/L^B$ and the trivial bound on $\Delta$ applies: $$\sum_{f(d)\geq L^B} f(d) |\Delta(d)|\ll\sum_d \frac{f(d)^2}{L^B} \frac{xL}{d}\ll\frac{xL^C}{L^B}$$ for some absolute constant $C$.
Then choose $B>C$ and $A>B$.
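Putting the two ranges together, as a sketch (here $K>0$ is a target exponent introduced only for this illustration, not a symbol from the paper):
$$ \sum_d f(d)\,|\Delta(d)| \ll_A xL^{B-A} + xL^{C-B}. $$
Taking $B = C + K$ and then $A = B + K$ makes both terms $\ll xL^{-K}$, so an arbitrary power of $L$ can be saved this way.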