Understanding Vaughan's Identity
The point of Vaughan's identity is to decompose $\Lambda(n)$ (on some range, e.g. $n \in [X,2X]$) into two types of sums that are reasonably tractable: "Type I components" $\sum_{d|n: d \leq D} a_d$ where $D$ is fairly small (in particular, significantly smaller than $X$), and "Type II components" $\sum_{n = d_1 d_2: d_1 \geq D_1, d_2 \geq D_2} a_{d_1} b_{d_2}$ where $D_1, D_2$ are fairly large. This decomposes sums such as $\sum_n \Lambda(n) f(n)$ into "Type I sums" $\sum_{d \leq D} a_d \sum_m f(dm)$ and "Type II sums" $\sum_{d_1 \geq D_1} \sum_{d_2 \geq D_2} a_{d_1} b_{d_2} f(d_1 d_2)$. The former can often be dealt with through upper bound estimates on the magnitude of the inner sums $\sum_m f(dm)$, while the latter can be dealt with through bilinear sum methods (often based ultimately on using the Cauchy-Schwarz inequality to eliminate the pesky weights $a_{d_1}, b_{d_2}$).
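To see the separation of variables concretely, here is a minimal numerical sketch of the fact that a Type I sum is just the original sum with the order of summation swapped; the weights $a_d$ and the test function $f$ below are arbitrary illustrative choices, not anything canonical.

```python
import math

# A "Type I" rearrangement: summing f(n) * sum_{d | n, d <= D} a_d over n <= N
# equals sum_{d <= D} a_d * sum_{m <= N/d} f(d*m), by writing n = d*m.
# The weights a_d and the test function f are placeholders for illustration.

N, D = 1000, 30
a = {d: (-1) ** d / d for d in range(1, D + 1)}   # arbitrary weights

def f(n):
    return math.cos(math.sqrt(2) * n)             # arbitrary test function

# Direct form: for each n, sum the weights over its small divisors.
direct = sum(f(n) * sum(a[d] for d in range(1, D + 1) if n % d == 0)
             for n in range(1, N + 1))

# Swapped form: the inner sum over m no longer involves the weights a_d.
swapped = sum(a[d] * sum(f(d * m) for m in range(1, N // d + 1))
              for d in range(1, D + 1))

assert abs(direct - swapped) < 1e-9
```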
Because the logarithm function $L(n) = \log n$ is so slowly varying, it behaves much like the constant function $1$ for these purposes, and so we also consider expressions such as $\sum_{d|n: d \leq D} a_d \log \frac{n}{d}$ to be of Type I.
In terms of Dirichlet convolutions, the task is to decompose $\Lambda$ into some combination of Type I components $a_< * 1$ or $a_< * L$, where $a_<$ is supported on small numbers, and Type II components $a_> * b_>$, where $a_>, b_>$ are supported on large numbers. The Vaughan identity is not the only identity that achieves this purpose, but it is amongst the simplest such identities, and is already sufficient for many applications.
Now, we do have the basic identity $\Lambda = \mu * L$, which looks sort of like a Type I component, except that the Mobius function $\mu$ is not restricted to the small numbers. Nevertheless, one can try to truncate this identity by performing the splitting $$ \Lambda = \mu_< * L + \mu_> * L$$ where $\mu_<, \mu_>$ are the restrictions of $\mu$ to small and large numbers respectively (let us ignore for now exactly where to make the cut between the two types of numbers). The first component is of Type I; we just need to figure out what to do with the second component.
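As a quick sanity check, here is a short brute-force sketch verifying the identity $\Lambda = \mu * L$, i.e. $\Lambda(n) = \sum_{d|n} \mu(d) \log \frac{n}{d}$, on a small range (the range $N = 500$ and the helper names are arbitrary choices of mine).

```python
import math

def von_mangoldt(n):
    # Lambda(n) = log p if n is a power of a prime p, and 0 otherwise.
    for p in range(2, n + 1):
        if n % p == 0:                     # p = smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0                             # n = 1

def moebius(n):
    # Mobius function via trial division.
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0                   # n is not squarefree
            result = -result
        p += 1
    return -result if n > 1 else result

# Check Lambda(n) = sum_{d | n} mu(d) * log(n/d) for all n up to N = 500.
for n in range(1, 501):
    conv = sum(moebius(d) * math.log(n // d)
               for d in range(1, n + 1) if n % d == 0)
    assert abs(conv - von_mangoldt(n)) < 1e-9
```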
At this point we introduce another basic identity, $L = \Lambda * 1$, hence $\mu_> * L = \mu_> * \Lambda * 1$. This begins to look a bit like a Type II component, except that only one of the factors is restricted to be large. So we perform another truncation, this time on the $\Lambda$ factor: $$ \mu_> * L = \mu_> * \Lambda_> * 1 + \mu_> * \Lambda_< * 1.$$ The first term is of Type II (after grouping, say, $\mu_> * 1$ into a single factor supported on large numbers). So we're left with understanding $\mu_> * \Lambda_< * 1$. Here we use a final basic identity, $\mu * 1 = \delta$ (where $\delta$ is the Kronecker delta). This implies that $\mu * \Lambda_< * 1 = \Lambda_<$, which will vanish on the desired range $[X,2X]$ if we truncated $\Lambda$ properly. So we can flip the truncation on $\mu$: on that range, $$ \mu_> * \Lambda_< * 1 = - \mu_< * \Lambda_< * 1.$$ But this is a Type I component if we group $\mu_< * \Lambda_<$ into a single function supported on (reasonably) small numbers.
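Putting the three pieces together: if $\mu_<$ is cut at some threshold $U$ and $\Lambda_<$ at some threshold $V$, then for all $n > V$ one obtains the classical form of Vaughan's identity $$ \Lambda = \mu_< * L - \mu_< * \Lambda_< * 1 + \mu_> * \Lambda_> * 1.$$ Here is a brute-force numerical sketch of this; the cutoffs $U = 8$, $V = 12$, the range $N = 400$, and all helper names are arbitrary illustrative choices (the helpers from the previous sketch are repeated so the block is self-contained).

```python
import math

def von_mangoldt(n):
    # Lambda(n) = log p if n is a power of a prime p, and 0 otherwise.
    for p in range(2, n + 1):
        if n % p == 0:
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0

def moebius(n):
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0
            result = -result
        p += 1
    return -result if n > 1 else result

def dirichlet(f, g, n):
    # Dirichlet convolution (f * g)(n) = sum_{d | n} f(d) g(n/d).
    return sum(f(d) * g(n // d) for d in range(1, n + 1) if n % d == 0)

N, U, V = 400, 8, 12
mu_small  = lambda d: moebius(d) if d <= U else 0        # mu_<
mu_large  = lambda d: moebius(d) if d > U else 0         # mu_>
lam_small = lambda d: von_mangoldt(d) if d <= V else 0.0 # Lambda_<
lam_large = lambda d: von_mangoldt(d) if d > V else 0.0  # Lambda_>
one, L = lambda d: 1, lambda d: math.log(d)

# Vaughan's identity holds exactly for every n > V.
for n in range(V + 1, N + 1):
    type_I_a = dirichlet(mu_small, L, n)                               # mu_< * L
    type_I_b = dirichlet(lambda d: dirichlet(mu_small, lam_small, d), one, n)
    type_II  = dirichlet(lambda d: dirichlet(mu_large, lam_large, d), one, n)
    assert abs(von_mangoldt(n) - (type_I_a - type_I_b + type_II)) < 1e-9
```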
Incidentally, analytic prime number theory would be a lot easier if we could somehow eliminate the Type II components, and have a Vaughan-type identity that expresses $\Lambda$ solely in terms of Type I components, which are usually a lot easier to deal with. Sadly, this is not the case (except in the presence of a Siegel zero, but that's another long story...). The easiest way to see this heuristically is by assuming the Mobius pseudorandomness conjecture, which among other things implies that all Type I components have small correlation with the Mobius function. On the other hand, the von Mangoldt function has a very large correlation with the Mobius function, as the latter is almost always $-1$ on the support of the former. So one cannot efficiently decompose von Mangoldt solely into Type I components, and one needs something like a Type II component somewhere in the decomposition as well. (But sometimes one can work with other components that look like truncated versions of the divisor functions $d_2 = 1*1$, $d_3 = 1*1*1$, $d_4 = 1*1*1*1$, etc., which in some applications are preferable to Type II sums. The Heath-Brown identity is particularly well suited for producing components of this form to replace some or all of the Type II sums.)
The analytic version of Vaughan's identity is $$ \frac{\zeta'}{\zeta} = F+\zeta'G-FG\zeta + \left(\frac{\zeta'}{\zeta}-F\right)(1-\zeta G). $$ Here the last term on the right-hand side is the most complicated, so to make it small one should take $F$ to be an approximation to $\frac{\zeta'}{\zeta}$ and $G$ to be an approximation to $\frac{1}{\zeta}$. The simplest approximation one could think of is a truncation of the Dirichlet series, which then yields Vaughan's identity.
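That this is indeed an identity, valid for any $F$ and $G$, can be verified by direct expansion: $$ \left(\frac{\zeta'}{\zeta}-F\right)(1-\zeta G) = \frac{\zeta'}{\zeta} - \zeta' G - F + FG\zeta, $$ and the last three terms on the right cancel exactly against $F+\zeta'G-FG\zeta$, leaving $\frac{\zeta'}{\zeta}$.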
Expressions like the one above were known earlier in the theory of density estimates for $\zeta$, going back to the work of Hardy and Littlewood. The simplest version uses just one Dirichlet polynomial $M$, which approximates $\zeta^{-1}$. Then $M\zeta$ is on average smaller than $\zeta$, and one can get better bounds for the zeroes of $M\zeta$ than for those of $\zeta$. Since the zeroes of $\zeta$ are among the zeroes of $M\zeta$, one obtains bounds for the zeroes of $\zeta$. As far as I know, Vaughan had found the analytic identity above and applied it to density estimates before he noticed that it could be turned into a useful elementary identity for its coefficients.
The terms in Vaughan's identity are not arbitrary sums: they are convolutions of simpler and/or shorter-range arithmetic functions. Convolution allows the separation of variables in the sum, and hence a more efficient estimation of it. A classical and prime example is Dirichlet's hyperbola method.
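To illustrate, here is a short sketch of the hyperbola method computing the divisor summatory function $\sum_{n \leq x} d(n)$, where $d = 1 * 1$: splitting the count of lattice points under the hyperbola $ab \leq x$ at $a, b \leq \sqrt{x}$ reduces an $O(x)$-term sum to $O(\sqrt{x})$ terms (the function names below are mine).

```python
import math

def divisor_summatory(x: int) -> int:
    # Hyperbola method: D(x) = 2 * sum_{a <= sqrt(x)} floor(x/a) - floor(sqrt(x))^2,
    # counting lattice points (a, b) with a*b <= x in O(sqrt(x)) steps.
    s = math.isqrt(x)
    return 2 * sum(x // a for a in range(1, s + 1)) - s * s

def divisor_summatory_naive(x: int) -> int:
    # Direct count of pairs (a, b) with a*b <= x, in O(x) steps.
    return sum(x // a for a in range(1, x + 1))

for x in (1, 10, 100, 1000, 10**5):
    assert divisor_summatory(x) == divisor_summatory_naive(x)
```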