A Modern Proof of Erdos and Renyi's 1959 Random Graph Paper?
One reference is Chapter 5 of Random Graphs by Janson, Łuczak, and Ruciński. Their proof uses the theory of Galton–Watson branching processes, so it is not quite the same argument that Erdős and Rényi gave, but one nice thing about Janson et al. is that they discuss both graph models.
One can also deduce the $G(n,p)$ result from the $G(n,m)$ result: one way to generate $G(n,p)$ is to choose $m$ from the binomial distribution with $\binom{n}{2}$ trials and probability $p$, then generate $G(n,m)$. Since the standard deviation of $m$ (in the range of $p$ you care about) is around $\sqrt{n}$, and the threshold window for connectivity is much larger (corresponds to roughly $n$ edges) you don't lose anything by this conversion. Or to put it another way: the values of $m$ you get with high probability give models $G(n,m)$ which all have essentially the same probability of being connected.
But the `right' way to answer this question, I think, is really the branching processes approach Timothy Chow mentions.
The most satisfying solution I've seen so far are these notes by Sabastian Roch (see section 2.2.3, and in particular claim 2.25). He first argues that the only components besides the giant one are isolated vertices, and he then shows that there is a positive probability of isolated vertices existing (with a refined estimation being performed as part of the exercises of the notes).