Moduli space of curves
In Section 1.5.A' of his paper, Gromov defines "weak convergence" $C_j\rightarrow \overline{C}$ of (unparametrized) J-curves $C_j$ to an (unparametrized) cusp curve $\overline{C}$: it means that for some choice of parametrizations $f_j$ of the J-curves, the $f_j$ converge in a parametrized sense to a parametrized cusp curve $f$ with image $\overline{C}$.
If one defines convergence of "parametrized cusp curves", then the parametrized limit map (defined on a nodal Riemann surface) occurring in the proof of Gromov's compactness result may be constant on several components of the nodal Riemann surface.
Thus, when defining "convergence of parametrized cusp curves", one needs to decide how much information about the "constant components" of the limit nodal Riemann surface (the components on which the limit map is constant) should be incorporated into the notion of convergence.
One can keep no information about the constant components, keep all information about them, or keep some of the constant components (this is what one does for stable maps).
For any of these choices of definitions, Gromov's compactness result guarantees that any sequence of (nonconstant, if one does not allow constant components) parametrized J-curves (defined on nonnodal Riemann surfaces) of bounded area and bounded topological type converges (in a suitable sense, after passing to a subsequence) to some parametrized limit cusp curve. The existence of a (possibly unparametrized) limit cusp curve (maybe together with some extra information on this limit) is sufficient to cover many applications.
Indeed, a space of holomorphic curves is "often" either compact (without the addition of cusp curves), or, if it is noncompact, it is sufficient to know that noncompactness implies the existence of some sequence whose limit can only be represented by cusp curves of a specific type (e.g. with at least two nonconstant components). This is, I think, also how Gromov used cusp curves: not by explicitly constructing compactifications of spaces of holomorphic curves, but by exploiting the existence of suitable cusp curves in the noncompact case.
If one allows arbitrary constant components in the definition of "parametrized cusp curve", then these cusp curves are not well suited to giving a nice compactification of spaces of parametrized J-curves, since the space of all such parametrized cusp curves (of bounded area and, e.g., bounded topological type of the smoothed parametrizing surface) is highly non-Hausdorff: any convergent sequence converges to uncountably many different cusp curves. Indeed, you can attach finitely many finite trees of constant spheres at arbitrary nonnodal points of a limit map, and the original sequence will also converge to this new limit cusp curve. (Depending on the precise definition of convergence of cusp curves, the space is, for similar reasons, not compact either.)
Non-uniqueness phenomena of this kind already occur for constant holomorphic curves, where one is reduced to the moduli space of Riemann surfaces of a given type; in this setting, general (constant) cusp curves then correspond just to nodal Riemann surfaces (with fixed arithmetic genus). It turns out that one can recover "uniqueness of the limits" if one requires the limit to be a stable nodal Riemann surface (this goes back to Deligne and Mumford in a more general, algebro-geometric setting).
This notion of stable Riemann surface was then extended by Kontsevich to the notion of stable maps. He also observed that the space of stable maps (of bounded area and bounded topological type) is a compact Hausdorff space, and thus, by considering only stable (limit cusp-)maps, one obtains a "nice compactification" of the original space.
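For reference, the Deligne-Mumford stability condition can be written componentwise. In this sketch, $\Sigma_v$ runs over the components of the (normalized) nodal surface, $g_v$ is the genus of $\Sigma_v$, and $n_v$ is its number of special points (nodes plus marked points); this notation is introduced here only for illustration:

$$
\Sigma = \bigcup_v \Sigma_v \ \text{is stable} \iff 2g_v - 2 + n_v > 0 \ \text{for every component } \Sigma_v,
\qquad
\dim_{\mathbb{C}} \overline{\mathcal{M}}_{g,n} = 3g - 3 + n \quad (2g - 2 + n > 0).
$$

Concretely, every sphere component must carry at least 3 special points and every torus component at least 1, which is exactly what makes the automorphism group finite and rules out the attached trees of "ghost" spheres.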
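A minimal Python sketch of the stability condition for maps, assuming the usual formulation: nonconstant components are unrestricted, while every component on which the map is constant must satisfy $2g - 2 + n > 0$. The `Component` model and function names are hypothetical, and the sketch only tracks the combinatorial data (genus, number of special points, constant or not), not the maps themselves.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Component:
    """Hypothetical combinatorial model of one component of a nodal domain."""
    genus: int           # genus of the (normalized) component
    special_points: int  # nodes + marked points lying on this component
    constant: bool       # True if the map is constant on this component


def is_stable_component(c: Component) -> bool:
    # Nonconstant components are always allowed; a constant ("ghost")
    # component must be Deligne-Mumford stable: 2g - 2 + n > 0.
    return (not c.constant) or (2 * c.genus - 2 + c.special_points > 0)


def is_stable_map(components: Iterable[Component]) -> bool:
    # A map is stable iff every component passes the componentwise test.
    return all(is_stable_component(c) for c in components)


# A constant sphere attached at a single point of a nonconstant sphere
# (the non-Hausdorff phenomenon above) is unstable:
bubble = [Component(0, 1, False), Component(0, 1, True)]
# A constant sphere with 4 nodes joining 4 nonconstant spheres is stable:
ghost4 = [Component(0, 1, False)] * 4 + [Component(0, 4, True)]
print(is_stable_map(bubble), is_stable_map(ghost4))
```

So exactly the constant components with too few special points are discarded, which is what removes the ambiguity in the limit.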
The original space of maps is, however, not necessarily dense in the space of all stable maps (of the same type); i.e. there are sometimes stable maps that do not arise as limit maps of parametrized J-curves defined on smooth Riemann surfaces. This only happens for nonconstant stable maps: the space of smooth Riemann surfaces of fixed type is always dense in its Deligne-Mumford compactification of stable Riemann surfaces of the same type.
For stable maps one allows some constant components of parametrized (limit) cusp curves. One could instead require in the definition of "parametrized cusp curve" that there be no components on which the map is constant (presumably what the OP is interested in). The difference from the stable map compactification is then visible, for instance, when considering all stable maps that have a constant sphere component with at least 4 nodal points and coincide as maps on the nonconstant components. All these stable maps represent the same "parametrized cusp curve with no constant components". The collection of "parametrized cusp curves with no constant components" is compact and Hausdorff (given bounds on the area and the topological type), but it does not distinguish "limits" that one might sometimes like to distinguish.
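To see concretely what the "no constant components" notion forgets in this example: a constant sphere with $k$ special points carries $k - 3$ complex moduli, $\dim_{\mathbb{C}} \mathcal{M}_{0,k} = k - 3$. For $k = 4$, with the nodes at $z_1, \dots, z_4 \in \mathbb{CP}^1$ (notation introduced only for this illustration), the modulus is the Möbius-invariant cross-ratio

$$
\lambda = \frac{(z_1 - z_3)(z_2 - z_4)}{(z_1 - z_4)(z_2 - z_3)} \in \mathbb{CP}^1 \setminus \{0, 1, \infty\}.
$$

When the four nonconstant components are distinguishable, different values of $\lambda$ give non-isomorphic stable maps, yet all of them have the same "parametrized cusp curve with no constant components".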
Indeed, the additional information kept for stable maps is naturally used when one glues (parametrized) cusp curves: if several components meet at the same point in the target space, then, when attempting to glue these components (for stable maps), one first glues the underlying Riemann surfaces and the maps on them; among other things, this involves local charts for the Deligne-Mumford compactification describing the gluing parameters for the domain Riemann surface. Thus one has a good framework to at least attempt to construct a local manifold chart for the compactified moduli space near a fixed stable map, and the compactifying strata have the expected dimension (as in Deligne-Mumford theory).
If one omits all constant components, then one can still attempt to glue several components meeting at a point in the target, for instance by gluing in all possible ways and considering gluings that involve all possible auxiliary constant unstable components. If one does this, then the added strata do not have the expected codimension and may be less natural, and less likely to lead to a manifold structure. (I think that if one forgets about the constant components, then the added strata would have higher than expected codimension if, for instance, at least 4 components meet at a point in the target.)
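To make "expected dimension" concrete, here is a small Python sketch that evaluates the standard index formula. For a closed symplectic manifold of real dimension $2n$, a class $A$ with first Chern number $c_1(A)$, genus $g$ and $k$ marked points, the expected (virtual) real dimension of the moduli space of stable maps is $2n(1-g) + 2c_1(A) + 6g - 6 + 2k$, and each node imposes expected real codimension 2. The function names are hypothetical, and the code only does this arithmetic; it says nothing about transversality.

```python
def expected_dim(n: int, c1: int, g: int = 0, k: int = 0) -> int:
    """Expected real dimension of the moduli space of genus-g stable maps
    with k marked points in a class A with c_1(A) = c1, in a closed
    symplectic manifold of real dimension 2n."""
    return 2 * n * (1 - g) + 2 * c1 + 6 * g - 6 + 2 * k


def expected_stratum_dim(n: int, c1: int, g: int = 0, k: int = 0, nodes: int = 0) -> int:
    """Expected real dimension of the stratum whose domains have the given
    number of nodes: each node cuts the dimension down by 2."""
    return expected_dim(n, c1, g, k) - 2 * nodes


# Lines in CP^2 (n = 2, c_1 = 3): a 4-dimensional family (the dual plane),
# and with 2 marked points dimension 8 = dim(CP^2 x CP^2).
print(expected_dim(2, 3), expected_dim(2, 3, k=2))
# Conics (c_1 = 6): dimension 10; the reducible ones (one node) form dimension 8.
print(expected_dim(2, 6), expected_stratum_dim(2, 6, nodes=1))
```

In the line example, the count $8 = \dim(\mathbb{CP}^2 \times \mathbb{CP}^2)$ matches the fact that exactly one line passes through two generic points. The conic example shows a stratum with one node sitting in the expected real codimension 2.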
Convergence to cusp curves is the original compactification by Gromov, whereas convergence to stable maps is the compactification by Kontsevich (a cusp curve corresponds to the image of a stable map). The latter is more accurate if you want to correctly model the topology of the moduli space (with a fixed number of marked points), so as to build "virtual (pseudo)cycles" associated with evaluation maps on the moduli spaces and hence use the full power of the Gromov-Witten invariants. I believe an explanation is in Chapters 5 and 7 of McDuff-Salamon's big book (J-holomorphic Curves and Symplectic Topology). (There they mention that if your manifold is semi-positive, then the usual Gromov-Witten invariants can be built without the finer notion of stable maps.)
Think of it this way: you could also take the one-point compactification, but that probably won't tell you a lot.