What is the minimum number of separable pure states needed to decompose arbitrary separable states?
First of all, your problem is a special version of a more general problem, namely finding the minimum number of states which minimize the entanglement of formation, this is, given a state $\rho$ on AB$\equiv \mathbb C^D\otimes \mathbb C^{D'}$, find the decomposition $$ \rho = \sum_{i=1}^m p_i |\psi_i\rangle\langle\psi_i| $$ which minimizes $\sum_i p_i E(|\psi_i\rangle)$, where $E(|\psi_i\rangle) = S(\mathrm{tr}_B(|\psi_i\rangle\langle\psi_i|)$, and find the minimum $m$ for which such a decomposition exists.
Your problem is just the variant of this where the state has entanglement of formation zero.
This is a well-studied problem and in turn a special case of a so-called "convex roof construction". Uhlmann, for instance, states that for any such problem, at most $(DD')^2+1$ states are needed for the optimal decomposition (Proposition 2.1).
It is likely that better bounds exist for the special problem of entanglement of formation, or the given problem of separable states. I was unable to find any in the literature, but one should be able to prove one along the following lines:
First, note that one can relax the optimization to all decompositions $$\rho=\sum p_i\rho_i\,\tag{1}$$ where one minimizes $\sum p_i S(\mathrm{tr}_B\rho_i)$, since the entropy is concave, i.e. the minimium will always be (also) attained on pure $\rho_i$.
Thus, we can instead consider decompositions of the reduced density matrix $\rho^A = \sum p_i \rho_i^A$ -- any such decomposition arises from a decomposition (1) of $\rho$ (e.g. by writing $p_i\rho_i^A$ as $M_k\rho M_k^\dagger$ with a POVM $M_k$ and applying $M_k\otimes I$ to $\rho$).
Now consider an optimal decomposition $\rho^A = \sum p_i \rho_i^A$. If it has more than $D^2$ terms, the $\rho_i^A$ must be linearly dependent. Thus, we can decrease the weight of some $\rho_j^A$ all the way down to zero by shifting the weights of all the other $\rho_i^A$ (keeping $p_i\ge0$!). Again, due to concavity, this will not change the average entanglement.
We are now left with an optimal decomposition $\rho^A=\sum p_i\rho^A_i$ with $D^2$ terms. This yields a decomposition of $\rho$, $\rho=\sum p_i \rho_i$, which minimizes $\sum p_i S(\rho_i^A)$ (as described in 2.). We can now decompose each $\rho_i$ in their eigenbasis (which has at most $DD'$ terms), which yields a total of $D^3D'$ terms.
There is likely space for improvement: For instance, one could rewrite each of the $\rho_i^A$ in a basis of pure states $|\phi_{k,i}\rangle\langle\phi_{k,i}|$. Such a basis has size at most $D^2+1$ ($D^2$ being the dimension of the convex space), and the coefficients are $\mathrm{tr}(\rho_i^A|\phi_k\rangle\langle\phi_k|)$ and thus positive. Again, convexity yields an optimal decomposition with pure $\rho_i^A$ and $D^2$ terms. It only remains to decompose the corresponding $\rho_i^B$, which results in a total of $(D^2+1)D'$ terms.