Compositions of homotopic maps are homotopic
There's no strong reason not to use $G(F(x,t),t)$. One possible reason the author did not is that they had in mind the general principle that homotopies can be concatenated: you can often build a homotopy that "does multiple things" by doing each thing separately and then concatenating the pieces along the time coordinate. In this particular example it is actually possible to do both at once, as you have pointed out, but that isn't always possible (see for instance my answer to this question).
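
To make the comparison concrete, here is a sketch of both constructions. The names are my assumptions, since the question's notation isn't reproduced here: $F\colon X\times[0,1]\to Y$ is a homotopy from $f_0$ to $f_1$, and $G\colon Y\times[0,1]\to Z$ is a homotopy from $g_0$ to $g_1$. The "all at once" homotopy is

$$H(x,t) = G(F(x,t),t), \qquad H(x,0) = G(f_0(x),0) = g_0(f_0(x)), \qquad H(x,1) = G(f_1(x),1) = g_1(f_1(x)),$$

which is continuous because it is the composite of the continuous maps $(x,t)\mapsto (F(x,t),t)$ and $G$. The concatenated version first deforms $f_0$ to $f_1$ while holding $g_0$ fixed, then deforms $g_0$ to $g_1$ while holding $f_1$ fixed:

$$H'(x,t) = \begin{cases} g_0(F(x,2t)) & 0\le t\le \tfrac12, \\ G(f_1(x),2t-1) & \tfrac12\le t\le 1. \end{cases}$$

The two pieces agree at $t=\tfrac12$, where both equal $g_0(f_1(x))$, so $H'$ is continuous by the pasting lemma. Either way you get a homotopy from $g_0\circ f_0$ to $g_1\circ f_1$.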