Why do composite bosons form a BEC?
A bound state of two fermions is, among other things, a state in which the two fermions are highly entangled with each other, in the sense that the bound-state creation operator can't be factorized into a product of two fermion creation operators. In this sense, entanglement is the key.
Let $a_n^\dagger$ and $a_n$ denote the creation and annihilation operators for a fermion in the $n$th mode (where "mode" accounts for momentum, spin, charge, and any other distinguishing labels).
Now, suppose that we have a bound state of two fermions. The operator that creates one of these composite bosons ("atoms") has the form $$ b^\dagger(f)=\sum_{n,m}f(n,m)a_n^\dagger a_m^\dagger \tag{1} $$ for some complex-valued function $f$. The fermion creation operators anticommute with each other (Pauli exclusion principle), so applying $a_n^\dagger a_m^\dagger$ twice would give zero. More generally, using the abbreviation $$ a^\dagger(g) = \sum_n g_n a_n^\dagger, \tag{2} $$ applying $a^\dagger(g)a^\dagger(h)$ twice would give zero. But applying $b^\dagger(f)$ twice doesn't give zero, because there are cross-terms in which all four subscripts are distinct. The number of times we can apply $b^\dagger(f)$ is limited only by the number of distinct indices on $a_n^\dagger$. Since $n$ is naively a continuous index (it includes the momentum or location degree of freedom), there might seem to be no limit at all on the number of these atoms that we can put into the same "state" $f$.
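As a sanity check on the claim above, here is a minimal numerical sketch. It assumes a toy system of only four fermion modes, represented by Jordan-Wigner matrices, with randomly chosen real coefficients $g$, $h$, and $f$. None of these choices come from the argument itself, and the names (`fermion_creators`, `a_dag`, `b_dag`) are just illustrative.

```python
import numpy as np

def fermion_creators(num_modes):
    """Jordan-Wigner matrices for a_0^dagger, ..., a_{M-1}^dagger."""
    identity = np.eye(2)
    parity = np.diag([1.0, -1.0])                  # sign string enforcing anticommutation
    raise_op = np.array([[0.0, 0.0], [1.0, 0.0]])  # empty mode -> occupied mode
    creators = []
    for n in range(num_modes):
        factors = [parity] * n + [raise_op] + [identity] * (num_modes - n - 1)
        op = factors[0]
        for factor in factors[1:]:
            op = np.kron(op, factor)
        creators.append(op)
    return creators

num_modes = 4                       # illustrative toy size (assumption)
a_dag = fermion_creators(num_modes)
rng = np.random.default_rng(0)      # arbitrary seed (assumption)

vacuum = np.zeros(2 ** num_modes)
vacuum[0] = 1.0                     # state with all modes empty

# Factorized pair a^dagger(g) a^dagger(h), using the abbreviation in Eq. (2).
g = rng.normal(size=num_modes)
h = rng.normal(size=num_modes)
a_g = sum(g[n] * a_dag[n] for n in range(num_modes))
a_h = sum(h[n] * a_dag[n] for n in range(num_modes))
pair = a_g @ a_h

# Generic pair b^dagger(f) as in Eq. (1); real f is a special case of complex f.
f = rng.normal(size=(num_modes, num_modes))
b_dag = sum(f[n, m] * a_dag[n] @ a_dag[m]
            for n in range(num_modes) for m in range(num_modes))

pair_twice_norm = np.linalg.norm(pair @ pair)                        # operator norm
two_atoms_norm = np.linalg.norm(b_dag @ b_dag @ vacuum)              # state norm
three_atoms_norm = np.linalg.norm(b_dag @ b_dag @ b_dag @ vacuum)    # state norm

print("factorized pair applied twice:", pair_twice_norm)
print("b^dagger applied twice:       ", two_atoms_norm)
print("b^dagger applied three times: ", three_atoms_norm)
```

With four modes, the factorized pair squares to zero, two applications of $b^\dagger(f)$ survive for generic $f$, and a third application must vanish because six fermions cannot fit into four modes. This is the finite-mode version of the counting argument.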
However, it's not quite right to treat the index $n$ as having an infinite number of allowed values. Saying that the atom has a finite size is roughly like putting the fermions in a box, which (in a back-of-the-envelope sense) restricts their momenta to a discrete list. And the momenta can't be arbitrarily large, because the atom only has a finite amount of energy. This effectively limits $n$ to a finite set of values, which in turn effectively limits the number of these atoms we can pile into the same state $f$. The spacing between the discrete momenta decreases with the increasing size of the "box" (the size of the bound state's wavefunction, including its center-of-mass spread), so the "repulsive" effect that must limit the number of atoms (due to the interactions that I have been neglecting so far) is weaker if the atom wavefunction is more spread out. This was just a heuristic argument, but it seems consistent with the statement quoted in the OP.
Above, I used a single discrete index $n$ only for notational simplicity. To be a little more explicit, instead of writing $a_n^\dagger$, we could write $a_n^\dagger(x)$ for the operator that creates a single fermion at location $x$. (This is okay in the non-relativistic approximation.) Now the index $n$ is used only for all of the other degrees of freedom, those not already taken into account by $x$. With this more expanded notation, we can write the atom creation operator as $$ b^\dagger(f,\psi)=\int dx\,\psi(x)\int dy\, \sum_{n,m}f_{n,m}(y) a^\dagger_n(x+y)a^\dagger_m(x-y) \tag{3} $$ The way this is written, $f$ is the "internal" state and $\psi(x)$ is the atom's center-of-mass wavefunction. Then $(b^\dagger(f,\psi))^2\neq 0$. This says that we can mathematically create a state with two of these atoms, identical both in the wavefunction $\psi$ and in the internal state $f$, even though $(a^\dagger_n(x))^2=0$.
Using this expanded notation, here's another heuristic argument that leads to the same conclusion. Suppose that a single atom has "volume" $v$, in some sense. Then, within a total volume $V$, we could pack $\sim V/v$ of these localized atoms next to each other, without overlapping much. We might not call that a BEC, because we put the atoms all in different locations to avoid overlap. But now suppose that $\psi_1(x),\psi_2(x),...$ are the wavefunctions of those individual non-overlapping atoms, and consider the wavefunction $$ \psi(x)=\sum_k \psi_k(x) \tag{4} $$ with $\sim V/v$ terms in the sum, and consider the single-atom creation operator (3) with this choice of $\psi$. Applying $\sim V/v$ copies of this operator to the vacuum state will give a non-zero result that is equivalent to the state that was just described, in which we packed the atoms next to each other; but in this new description we would say that all of the atoms are in the "same state," because we constructed the state by applying a bunch of copies of the same creation operator.
The preceding arguments ignored interactions, aside from the assumption that two fermions form a bound state. If we include interactions, then we can still construct a state-vector by applying a bunch of copies of the same single-atom creation operator to the vacuum state, but the resulting state won't necessarily be a good approximation to a real BEC if the number of applications of $b^\dagger$ is large. A real BEC must involve some kind of effect that ultimately compensates for the fact that applying too many $b^\dagger$s will eventually give zero, when the cross-terms are exhausted. The state $(b^\dagger)^N|0\rangle$ might be better regarded as a component of the true BEC state, constituting most of the true BEC state when $N\ll V/v$ (dilute BEC) but contributing less and less to the true BEC state when $N$ is larger and larger. Before we reach $N\sim V/v$, the interactions that I've been neglecting will become significant, so that the transition between being able to put many atoms in an identical state and not being able to put too many in that state will be a smooth transition.
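To make the "cross-terms are exhausted" statement a bit more concrete, here is a rough numerical sketch for one specific, deliberately simple choice of $f$ (my assumption, not a model of a real atom): the $2M$ modes are grouped into $M$ disjoint pairs, and $b^\dagger = M^{-1/2}\sum_k a^\dagger_{2k}a^\dagger_{2k+1}$. The code compares the squared norm of $(b^\dagger)^N|0\rangle$ with the value $N!$ that an ideal (pointlike) boson would give. The value `num_pairs = 4` and all function names are illustrative.

```python
import math
import numpy as np

def fermion_creators(num_modes):
    """Jordan-Wigner matrices for a_0^dagger, ..., a_{M-1}^dagger."""
    identity = np.eye(2)
    parity = np.diag([1.0, -1.0])                  # sign string enforcing anticommutation
    raise_op = np.array([[0.0, 0.0], [1.0, 0.0]])  # empty mode -> occupied mode
    creators = []
    for n in range(num_modes):
        factors = [parity] * n + [raise_op] + [identity] * (num_modes - n - 1)
        op = factors[0]
        for factor in factors[1:]:
            op = np.kron(op, factor)
        creators.append(op)
    return creators

num_pairs = 4                                   # illustrative toy size (assumption)
a_dag = fermion_creators(2 * num_pairs)

# Toy "atom": equal-weight superposition of M disjoint fermion pairs.
b_dag = sum(a_dag[2 * k] @ a_dag[2 * k + 1] for k in range(num_pairs))
b_dag = b_dag / math.sqrt(num_pairs)

vacuum = np.zeros(2 ** (2 * num_pairs))
vacuum[0] = 1.0                                 # state with all modes empty

# Ratio of ||(b^dagger)^N |0>||^2 to the ideal-boson value N!.
ratios = []
state = vacuum.copy()
for num_atoms in range(1, num_pairs + 2):
    state = b_dag @ state
    norm_sq = float(state @ state)
    ratios.append(norm_sq / math.factorial(num_atoms))
    print(num_atoms, ratios[-1])
```

For this particular $f$ the ratio works out to $\prod_{j=0}^{N-1}(1-j/M)$: close to 1 when $N\ll M$, shrinking as $N$ approaches $M$, and exactly zero for $N>M$. Here $M$ plays the role of $V/v$ in the heuristic argument, so the toy model shows the same dilute-versus-dense behavior, still without any interactions.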
The point of the simple back-of-the-envelope analysis was only to show that we can pile a bunch of composite bosons into the same state without any significant excitation, as long as the BEC is sufficiently dilute.
The best way (I believe) to look at this is the method of effective field theory.
When two (fermionic) atoms form a composite boson, the resulting state is very complicated. The molecule is made of atoms, the atoms are made of nuclei and electrons, the nuclei are made of neutrons and protons, the neutrons and protons are made of quarks and gluons (and for all we know quarks might be composite, or excitations of fundamental strings). All these particles have different statistics, and they have complicated internal excitations. Strictly speaking, for example, the wave function of the quarks in a neutron in one atom must be anti-symmetrized with all the quarks in any neutron or proton in the other atom.
Clearly, we do not know how to actually do this right. But we do know that at very low resolution (low energy, long distance, low density) the composite boson is just a point-like bosonic field, and the most general lagrangian for such a field is $$ {\cal L} = \psi^\dagger\left( -\frac{\hbar^2\nabla^2}{2m^*}-\mu + V_{ext}(x) \right)\psi + \ldots . $$ This lagrangian describes Bose condensation at the Einstein temperature, just like a truly pointlike Bose gas, but possibly with a modified mass $m^*\neq 2m$. If we know how to compute the binding energy of the molecule we can compute this shift.
What about the fact that the boson is composite? According to the rules of effective field theory, this must be encoded in higher order terms in the lagrangian. The next term is an interaction $$ {\cal L} = C_0 (\psi^\dagger\psi)^2 + \ldots $$ Intuitively, this makes sense. If the composite boson is made of fermions, then the bosons should notice the anti-symmetrization requirement when they get close, and this should be reflected in an effective repulsion.
We learn two more things that are useful: 1) The interaction term can be related to the composite boson scattering length. This means we can quantify the effect of compositeness by either calculating or measuring the scattering cross section. 2) We can compute, in perturbation theory, the effect of $C_0$ on the Bose condensed state and the critical temperature for Bose-Einstein condensation. This has been studied in some detail, and is described in textbooks on many-body physics. The shift in $T_c$ is $$ \Delta T_c = 1.3\, a n^{1/3}\, T_c^{0} $$ where $T_c^0$ is the Einstein temperature, $n$ is the density of bosons, and $a$ is the boson-boson scattering length. If this shift becomes large (of order 1), then we know that compositeness is an $O(1)$ effect, and the EFT for bosons must be discarded. We have to study the problem using an EFT for (pointlike) fermions.
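To get a feel for the numbers, here is a small sketch that evaluates the shift formula above together with the standard ideal-gas expression $T_c^0 = \frac{2\pi\hbar^2}{m k_B}\left(\frac{n}{\zeta(3/2)}\right)^{2/3}$. The mass, density, and scattering lengths below are illustrative values I picked (roughly a molecule of two light atoms in a dilute trapped gas), not data from any particular experiment, and the function names are hypothetical.

```python
import math

HBAR = 1.054571817e-34       # reduced Planck constant, J s
K_B = 1.380649e-23           # Boltzmann constant, J/K
ATOMIC_MASS = 1.66053907e-27 # atomic mass unit, kg
ZETA_3_2 = 2.612375349       # Riemann zeta(3/2)

def einstein_temperature(density, mass):
    """Ideal Bose gas condensation temperature T_c^0 in kelvin."""
    return (2 * math.pi * HBAR**2 / (mass * K_B)) * (density / ZETA_3_2) ** (2 / 3)

def relative_tc_shift(scattering_length, density, coefficient=1.3):
    """Delta T_c / T_c^0 = coefficient * a * n^(1/3), the leading-order formula."""
    return coefficient * scattering_length * density ** (1 / 3)

# Illustrative inputs (assumptions, not measured values).
mass = 12 * ATOMIC_MASS      # composite boson mass m*, kg
density = 1e19               # bosons per cubic metre

tc0 = einstein_temperature(density, mass)
print(f"T_c^0 = {tc0:.3e} K")
for a in (1e-9, 1e-8, 1e-7, 1e-6):   # scattering lengths in metres
    shift = relative_tc_shift(a, density)
    print(f"a = {a:.0e} m  ->  a n^(1/3) = {a * density ** (1 / 3):.3f},"
          f"  Delta T_c / T_c^0 = {shift:.3f}")
```

Once the printed ratio stops being small compared to 1, the perturbative formula is being used outside its range, which is the signal that the pointlike-boson EFT should be abandoned.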
Of course, the fermions are composite, too. The same logic again applies. At leading order the effects of compositeness are encoded in masses and interaction parameters.
The result quoted above identifies the parameter $an^{1/3}$ that governs the approximation of treating the boson as pointlike. Note that $1/n^{1/3}$ is the typical distance between bosons. This means that the expansion parameter is the ratio of the interaction length to the average distance between bosons.