How to find an all-in-one 2D to 3D Transformation Matrix for perspective projection, rotation, and translation?
It looks like you are trying to solve for a map from 2D points to 3D points, so I'm a bit confused: a projection transformation maps the 3D points to the 2D points (and the inverse is, of course, impossible, since each point on the projection plane could lie anywhere on a ray from the camera through the plane).
Next, notice there is no difference between first transforming an object in 3D, and then projecting through a fixed camera, versus leaving the object in place and projecting through a camera of unknown position and orientation. Here I'll take the former approach.
We have some points in 3D and apply an affine transformation to them, then project through a camera at the origin looking down the $z$ axis, with the projection plane passing through $z=1$. This makes the projection matrix $P: (x,y,z) \to (u,v,w)$ easy: it is just the identity.
Before we project we apply some affine transformation $Mq + t$ to the 3D points $q$. Notice that we do not constrain $M$ to only rotate and scale here: to do so we would need to add additional (nonlinear) constraints on the coefficients of $M$. The short of it is that you will need to supply more than the theoretical minimum of four corresponding points to determine the map (and you will get shear if your corresponding points did not come from a bona fide Euclidean motion + projection).
So now the total map can be written as
$$\left[\begin{array}{c}u\\v\\w\end{array}\right] = \left[\begin{array}{cccc}m_{11} & m_{12} & m_{13} & t_x\\m_{21} & m_{22} & m_{23} & t_y\\m_{31} & m_{32} & m_{33} & t_z\end{array}\right]\left[\begin{array}{c}x\\y\\z\\1\end{array}\right].$$
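Since $(u,v,w) \sim (u/w,v/w,1)$, this map is scale-invariant, so we might as well set $m_{33} = 1$ (assuming $m_{33} \neq 0$), which leaves 11 unknowns. We can also write it in block form (which will prove useful), with $N_{uv}$ the top two rows and $N_w$ the bottom row of the $3\times 4$ matrix $N$ above, as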
$$\left[\begin{array}{c}u\\v\\w\end{array}\right] = \left[\begin{array}{c}N_{uv}\\N_w\end{array}\right]\left[\begin{array}{c}x\\y\\z\\1\end{array}\right].$$
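To make the forward map concrete, here is a minimal NumPy sketch of applying a $3\times 4$ matrix to 3D points and dividing by $w$. The function name `project` and the example matrix are my own illustrative choices, not anything from the question.

```python
import numpy as np

def project(N, pts3d):
    """Apply the 3x4 matrix N to 3D points, then divide by w."""
    pts3d = np.asarray(pts3d, dtype=float)
    homog = np.hstack([pts3d, np.ones((len(pts3d), 1))])  # rows (x, y, z, 1)
    uvw = homog @ np.asarray(N, dtype=float).T            # rows (u, v, w)
    return uvw[:, :2] / uvw[:, 2:3]                       # rows (u/w, v/w)

# Hypothetical example: M = identity, t = (0, 0, 5),
# i.e. the object is pushed 5 units down the z axis.
N_example = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
print(project(N_example, [[1.0, 2.0, 0.0]]))  # [[0.2 0.4]]
```

Multiplying `N_example` by any nonzero constant gives the same output, which is the scale invariance used above.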
As you say, we only know $u/w$ and $v/w$ for the corresponding points, not $u,v,w$. Well, $$\left[\begin{array}{c}u/w\\v/w\end{array}\right] = \frac{N_{uv}\left[\begin{array}{c}x\\y\\z\\1\end{array}\right]}{N_w \left[\begin{array}{c}x\\y\\z\\1\end{array}\right]},$$ where the denominator is a scalar. Multiplying through by it gives $$\left(N_w \left[\begin{array}{c}x\\y\\z\\1\end{array}\right]\right)\left[\begin{array}{c}u/w\\v/w\end{array}\right] = N_{uv}\left[\begin{array}{c}x\\y\\z\\1\end{array}\right]$$
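which, for each corresponding point, is a system of two linear equations in the 11 unknown entries of $N$. Plugging in $5\frac{1}{2}$ corresponding points will let you solve for $N$; in practice that means six or more points, not all coplanar, solved in the least-squares sense.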
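Here is a rough sketch of how that linear system could be assembled and solved with NumPy, under the normalization $m_{33} = 1$. The name `solve_N`, the ordering of the unknowns, and the test data are assumptions made for illustration; with noisy data you would likely want to normalize the points first.

```python
import numpy as np

def solve_N(pts3d, pts2d):
    """Least-squares estimate of the 3x4 matrix N with m33 fixed to 1.

    pts3d: (n, 3) points (x, y, z); pts2d: (n, 2) observed (u/w, v/w).
    Unknowns are ordered
    [m11, m12, m13, tx, m21, m22, m23, ty, m31, m32, tz].
    """
    pts3d = np.asarray(pts3d, dtype=float)
    pts2d = np.asarray(pts2d, dtype=float)
    n = len(pts3d)
    A = np.zeros((2 * n, 11))
    b = np.zeros(2 * n)
    for i, ((x, y, z), (a, c)) in enumerate(zip(pts3d, pts2d)):
        # a * (m31 x + m32 y + z + tz) = m11 x + m12 y + m13 z + tx
        A[2 * i, 0:4] = [x, y, z, 1.0]
        A[2 * i, 8:11] = [-a * x, -a * y, -a]
        b[2 * i] = a * z
        # c * (m31 x + m32 y + z + tz) = m21 x + m22 y + m23 z + ty
        A[2 * i + 1, 4:8] = [x, y, z, 1.0]
        A[2 * i + 1, 8:11] = [-c * x, -c * y, -c]
        b[2 * i + 1] = c * z
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    N = np.empty((3, 4))
    N[0] = sol[0:4]
    N[1] = sol[4:8]
    N[2] = [sol[8], sol[9], 1.0, sol[10]]
    return N

# Hypothetical check: synthesize correspondences from a known matrix.
rng = np.random.default_rng(0)
N_true = np.array([[0.9, 0.1, -0.2, 0.3],
                   [-0.1, 1.1, 0.05, -0.4],
                   [0.2, -0.1, 1.0, 6.0]])
X = rng.uniform(-1.0, 1.0, size=(8, 3))
uvw = np.hstack([X, np.ones((8, 1))]) @ N_true.T
uv = uvw[:, :2] / uvw[:, 2:3]
print(np.abs(solve_N(X, uv) - N_true).max())  # should be close to zero
```

Note that the recovered $M$ is a general $3\times 3$ matrix, so it may contain shear unless the correspondences really came from a rigid motion followed by projection.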