A polynomial with nowhere surjective derivative
As a disclaimer, I will follow an "algebraic geometer" approach. Thus, all the morphisms will essentially be polynomials (or rational functions).
To be short, my answer is "yes". The exaplanation is a bit long though.
Let me start with a remark on your comment about the argument working over different fields. Over an algebraically closed field of characteristic zero (e.g., $\mathbb C$), there are tools that replace the implicit function theorem. That is, if we have a morphism between smooth varieties, the dimension of the image should match the maximal rank of the jacobian. In particular, in your case, the image of $P$ will be either a curve or a point (we know it by base changing to $\mathbb C$, as $\mathbb R$ is not algebraically closed). On the other hand, this is not true over fields of characteristic $p > 0$. In particular, the Frobenius morphism from a variety to itself (so, from $k^2$ to itself, if $k$ is a field of positive characteristic) is a dominant morphism (that is, the image is dense. It is surjective if $k$ is perfect) whose differential is identically zero. Indeed, this morphism is defined as $x \mapsto x^p$, and its differential is $p \cdot x^{p-1}dx$, which is $0$ as $p=0$. Thus, if we take the Frobenius for $k^2$ with $k$ a field of positive characteristic, the claim in your question does not hold.
Now, let me assume that we are working over an algebraically closed field of characteristic zero. In particular, we may assume that $k=\mathbb C$. We will use this case to deduce something about the case $k = \mathbb R$.
Now, by the discussion above, we know that the image has dimension 0 or 1. If it has dimension 0, then your claim follows, as we can take $f \colon k^2 \rightarrow k$ and $g \colon k \rightarrow k^2$ to be suitable constant polynomials. Thus, I will assume that the image has dimension 1.
By the above reductions, we know that the image of $P : k^2 \rightarrow k^2$ is a curve $C$. This curve $C$ may be singular. On the other hand, as $k^2$ is smooth, we know that $k^2 \rightarrow C$ factors as $f \colon k^2 \rightarrow T$ and $g \colon T \rightarrow C \subset k^2$, where $T$ is smooth (here $T$ is the normalization of $C$). Now, as $k^2$ is a rational variety (it can be compactified to be $\mathbb P ^2_k$, the projectiva plane over $k$), $T$ is unirational (essentially by definition of unirational). As $T$ has dimension 1, the solution to the Lüroth problem in dimension 1, it follows that $T$ is rational. That is, $T$ is an open subset of the projective line $\mathbb P ^1_k$. Furthermore, as $T$ maps to $k^2$, it can not be compact. Thus, $T$ is an open subset of the affine line $k$. As the topology I am using is the Zariski topology, it means that $T$ is $k$ with possibly a finite number of points removed. By the comment of David Speyer below, we can then conclude that $T=k$. Indeed, if $T$ was strictly contained in $k$, $T$ would admit non-constant units in its coordinate ring. Then, these functions would pull back to $k^2$ along the morphism $k^2 \rightarrow T$, providing some non-constant units in $k[x,y]$, which is a contradiction.
Thus, to summarize, we got a morphism $f \colon k^2 \rightarrow k$, and a morphism $g \colon k \rightarrow C \subset k^2$. As we are dealing with algebraic varieties, and as $f$ has domain $k^2$ and target $k$, $f$ is a polynomial. Similarly, $g$ is a polynomial too.
Now, let's go back to your $P$ defined over $\mathbb R$. By base change to $\mathbb C$, we can regard $P$ as a morphism $P_\mathbb{C} \colon \mathbb C ^2 \rightarrow \mathbb C ^2$ (basically, you keep the same polynomials, and input complex values). As the image of $P_\mathbb{C}$ has dimension 1, so does the image of $P$. Furthermore, the image of $P_\mathbb{C}$ is obtained from the image of $P$ by "base change to $\mathbb{C}$". This guarantees that what we concluded over $\mathbb C$ still holds over $\mathbb R$.
Let me conclude with a remark. We heavily relied on the fact that we were interested in $k^2$. This guaranteed that the image of $P$ was a point or a curve. To treat the curve case, we used Lüroth's theorem. This guarantees that every unirational curve is rational. This is not true in higher dimension: it is still true for surfaces in characteristic zero, but it fails even over $\mathbb C$ in higher dimension. Thus, I do not expect this argument to generalize for morphisms $P \colon k^n \rightarrow k^n$ with $dP$ never surjective.
Here is an elementary proof.
First, some notation and terminology. Capital letters such as $F,G$ denote polynomials in $\mathbb{R}[x,y]$. Also, $f$ (and slight variants of it, such as $\tilde{f},f_1,f_2$, etc) denotes a polynomial in $\mathbb{R}[x,y]$, while $g,h$ (and slight variants) will denote polynomials in $\mathbb{R}[t]$. $F_x,F_y$ denote $\frac{\partial F}{\partial x},\frac{\partial F}{\partial y}$, respectively. We say "$F$ is a polynomial of $f$" is there is some $g$ with $F = g\circ f$. We say that $f$ is a "minimal polynomial" if: if $f$ is a polynomial of $\tilde{f}$, then $f$ is an affine transformation of $\tilde{f}$ (i.e. $f = g\circ \tilde{f}$ implies $g(t) = at+b$ for some $a,b \in \mathbb{R}$). We say "$f$ is a minimal polynomial of $F$" if $f$ is a minimal polynomial and $F$ is a polynomial of $f$.
Note that, since a square matrix is not surjective if and only if the determinant is $0$, the main problem is equivalent to: $F_xG_y = F_yG_x$ implies $F,G$ are polynomials of the same polynomial.
.
Lemma $1$: $F_xG_y = F_yG_x$ implies there are $H^x,H^y,R^F,R^G$ with $$F_x = H^x R^F$$ $$F_y = H^y R^F$$ $$G_x = H^x R^G$$ $$G_y = H^y R^G.$$
Proof: Let $H^x = GCD(F_x,G_x), H^y = GCD(F_y,G_y)$, where the GCD is for polynomials in $\mathbb{R}[x,y]$ (the GCD is determined up to some multiplicative constant, but just choose arbitrary constants for now). Write $F_x = H^x R_1, F_y = H^y R_2, G_x = H^x R_3, G_y = H^y R_4$ with $GCD(R_1,R_3),GCD(R_2,R_4) = 1$. Then $F_xG_y = F_yG_x$ implies $R_1R_4 = R_2R_3$ (if $H^x \equiv 0$, then $F_x,G_x \equiv 0$ implying $F,G$ are both poly's of $y$, in which case we can let $H^x=0,H^y=1,R^F = F_y, R^G = G_y$ in the Lemma). Now, $GCD(R_1,R_3) = 1$ implies $R_1 \mid R_2$, and $GCD(R_2,R_4) = 1$ implies $R_2 \mid R_1$. So, let $c \in \mathbb{R}$ be such that $R_1 = c R_2$. Similarly, we obtain $R_3 = c'R_4$ for some $c' \in \mathbb{R}$. But $R_1R_4 = R_2R_3$ implies $c=c'$. So, we obtain $F_x = H^x cR_2, F_y = H^y R_2, G_x = H^xcR_4, G_y = H^y cR_4$. So, changing $H^x$ to $cH^x$ and letting $R^F = R_2, R^G = R_4$ gives the Lemma. $\square$
.
Conjecture $1$: $F_xG_y = F_yG_x$ implies $(GCD(F_x,G_x))_y = (GCD(F_y,G_y))_x$.
.
Remark: The GCD's in conjecture $1$ are to be interpreted to have multiplicative constants so that they form $H^x$ and $H^y$ in Lemma $1$. In other words, conjecture $1$ is saying that $H^x_y = H^y_x$, which is equivalent to the existence of some $H$ (in $\mathbb{R}[x,y]$) with $H_x = H^x$ and $H_y = H^y$ (define $H$ to be the $x$-antiderivative of $H^x$ plus a suitable function of $y$). We will prove conjecture $1$ shortly. But first:
.
Lemma $2$: Any $H \in \mathbb{R}[x,y]$ has a minimal polynomial.
Let $f_0 = H$. For $n \ge 0$, if $f_n$ is not a minimal polynomial, we may let $f_{n+1}$ be such that $f_n$ is a polynomial of $f_{n+1}$ but not an affine transformation of $f_{n+1}$. Since $f_{n+1}$ has smaller degree than $f_n$ (and constant polynomials are minimal polynomials), some $f_N$ is a minimal polynomial. As $H$ is a polynomial of $f_0$ and $f_n$ is a polynomial of $f_{n+1}$ for $0 \le n \le N-1$, it follows that $H$ is a polynomial of $f_N$. $\square$
.
Proposition $1$: Conjecture $1$ implies the following. For any $F,H$ for which there is some $R$ with $F_x = H_x R$, $F_y = H_y R$, it holds that $F$ is a polynomial of any minimal polynomial of $H$.
Proof: We induct on $\deg(F)$ (i.e. largest degree of monomial in $F$, where $\deg(x^ay^b) := a+b$). The proposition is trivial if $\deg(F) = 0$, i.e. if $F$ is constant. So suppose $\deg(F) \ge 1$, and fix a minimal polynomial $f^H$ of $H$. Note that $F_{xy} = H_{xy}R+H_xR_y$ and $F_{yx} = H_{yx}R+H_yR_x$. Since $F_{xy} = F_{yx}$ and $H_{xy} = H_{yx}$, we see that $H_xR_y = H_yR_x$. By Lemma $1$ and conjecture $1$, there are $Q,T,S$ with $H_x = Q_xT, H_y = Q_y T, R_x = Q_x S, R_y = Q_y S$. If $\deg(H) = \deg(F)$, then going back to $F_x = H_xR, F_y = H_y R$, we see $H = F$ and we are of course done. And of course we have $\deg(R) \le \deg(F_x) < \deg(F)$. Therefore, we can apply the inductive hypothesis to deduce that $H$ and $R$ are polynomials of any minimal polynomial of $Q$. Let $f$ be any minimal polynomial of $Q$, and say $H = g_1\circ f, R = g_2\circ f$. Then $F_x = H_xR = (g_1'\circ f)f_x(g_2\circ f) = (g_1'g_2 \circ f)f_x$ implies $F = (\int g_1'g_2)\circ f$ up to a polynomial in $y$, but then $F_y = H_y R$ implies that polynomial in $y$ is $0$, so $F$ is a polynomial of $f$.
But we want $F$ a polynomial of $f^H$. It suffices, then, to show that $f^H$ is an affine transformation of $f$. To do this, write $H = g_3 \circ f^H$. Recalling that we had $H = g_1\circ f$, we see $(g_3'\circ f^H)f^H_x = H_x = (g_1'\circ f)f_x$ and $(g_3'\circ f^H)f^H_y = H_y = (g_1'\circ f)f_y$. These equalities imply $f^H_xf_y = f^H_yf_x$, so by Lemma $1$ and conjecture $1$ once again, there are $\tilde{Q},\tilde{T},\tilde{S}$ with $f^H_x = \tilde{Q}_x\tilde{T}, f^H_y = \tilde{Q}_y\tilde{T}, f_x = \tilde{Q}_x\tilde{S}, f_y = \tilde{Q}_y\tilde{S}$. Since $\deg(f^H),\deg(f) \le \deg(H) < \deg(F)$, the inductive hypothesis implies that $f^H,f$ are polynomials of any minimal polynomial of $\tilde{Q}$. Say $\tilde{f}$ is a minimal polynomial of $\tilde{Q}$. Then, by definition, $f^H,f$ are affine transformations of $\tilde{f}$ and thus $f^H,f$ are affine transformations of each other. $\square$
.
Remark: Minimal polynomials are actually unique. I know this, since the truth of the main problem easily implies it, and we know the truth of the main problem from Stefano (or from this whole answer). However, I couldn't use this secret uniqueness, since I don't want a circular proof, of course. I sort of established the uniqueness of the minimal polynomial of $H$ in the proof above, but I needed an inductive statement to work with.
.
Conjecture $1$ implies the main problem.
Proof: By Lemma $1$ and conjecture $1$, $F_xG_y = F_yG_x$ implies there are $H,R^F,R^G$ as in Lemma $1$. Take any minimal polynomial $f$ of $H$. Then proposition $1$ implies that $F,G$ are polynomials of $f$. $\square$
.
Remark: The proof just given shows why it is important that proposition $1$ cannot just be "$F$ is a polynomial of some minimal polynomial of $H$" (which is easier to prove), since we needed the same polynomial for both $F$ and $G$.
Now on to the proof of Conjecture $1$. First, a lemma. For $F,G \in \mathbb{R}[x,y]$, we denote $GCD_{x,y}(F,G)$ to be the GCD of $F,G$ in $\mathbb{R}[x,y]$ (i.e. the GCD we've been talking about), and $GCD_x(F,G)$ to be the GCD in $(\mathbb{R}(y))[x]$ of $F$ and $G$ viewed as polynomials in $(\mathbb{R}(y))[x]$. Note the latter is determined up to a multiplicative factor of a rational polynomial in $y$.
.
Lemma 3: For any $F,G \in \mathbb{R}[x,y]$, $GCD_x(F,G) = GCD_{x,y}(F,G)$, up to a rational polynomial of $y$.
Proof: Let $H = GCD_{x,y}(F,G)$. Then there are $R^F,R^G,Q^F,Q^G \in \mathbb{R}[x,y]$ with $F = H R^F, G = H R^G$, and $GCD_{x,y}(R^F,R^G) = 1$. We may view the equations $F = HR^F, G = HR^G$ in $(\mathbb{R}(y))[x]$. If $GCD_x(R^F,R^G) = 1$, then we're done, so suppose otherwise; that is, there are $h,f,g \in (\mathbb{R}(y))[x]$ with $R^F = hf$ and $R^G = hg$, and $\deg_{(\mathbb{R}(y))[x]}(h) \ge 1$. Let $l_h,l_f,l_g \in \mathbb{R}[y]$ denote the lcm of the denominators of the coefficients of $h,f,g$, respectively. Then, $l_hl_f R^F = (l_h h)(l_f f)$ and $l_hl_g R^G = (l_h h)(l_g g)$. For each irreducible element $q$ of $\mathbb{R}[y]$ dividing $l_h$, since $l_hl_fR^F, l_h h$, and $l_f f$ are in $\mathbb{R}[x,y]$, we may reduce the equation $l_hl_f R^F = (l_h h)(l_f f)$ mod $q$ to see that $q$ divides each coefficient of $l_f f$. Repeating this argument allows us to write $R^F = \frac{l_h h}{l_f}\frac{l_f f}{l_h}$, with $\frac{l_h h}{l_f}, \frac{l_f f}{l_h} \in \mathbb{R}[x,y]$. Similarly, we have $R^G = \frac{l_h h}{l_g}\frac{l_g g}{l_h}$. Therefore, if we let $L = lcm(l_f,l_g)$ (in $\mathbb{R}[y]$), we see $R^F = \frac{l_h h}{L}\frac{L}{l_f}\frac{l_f f}{l_c}$ and $R^G = \frac{l_h h}{L}\frac{L}{l_g}\frac{l_g g}{l_h}$, implying $\frac{l_h h}{L}$ divides both $R^F$ and $R^G$ in $\mathbb{R}[x,y]$, implying it must be constant, implying $h$ is solely a function of $y$, yielding the desired contradiction. $\square$
.
Proposition $2$: For $F,G,Q,R^x,R^y \in (\mathbb{R}(y))[x]$ satisfying $F_xG_y = F_yG_x$, and $F_x = QG_x+R^x$ and $F_y = QG_y+R^y$ with $\deg(R^x) < \deg(G_x), \deg(R^y) < \deg(G_y)$, it holds that $R^x_y = R^y_x$.
Remark: Here, just to clarify, the degree of a polynomial in $(\mathbb{R}(y))[x] = c_nx^n+\dots+c_1x+c_0$ with $c_n,\dots,c_0 \in \mathbb{R}(y)$ and $c_n \not = 0$ is $n$.
.
Proposition $2$ implies Conjecture $1$.
Proof: The division algorithm in $(\mathbb{R}(y))[x]$ implies there are $Q^x,Q^y,R^x,R^y \in (\mathbb{R}(y))[x]$ with $F_x = Q^xG_x+R^x, F_y = Q^yG_y+R^y$, and $\deg(R^x) < \deg(G_x), \deg(R^y) < \deg(G_y)$. Multiplying the equations gives $Q^y F_xG_y +F_xR^y = Q^xG_xF_y + R^xF_y$. Therefore, using $F_xG_y = F_yG_x$, we obtain $(Q^y-Q^x)F_xG_y = R^xF_y-R^yF_x$. Since $\deg(R^x) < \deg(G_x)$, $\deg(R^xF_y) < \deg(G_xF_y) = \deg(F_xG_y)$, and since $\deg(R^yF_x) < \deg(G_yF_x)$, we must have $Q^y = Q^x$. Denote each by $Q$. Then, proposition $2$ implies $R^x_y = R^y_x$. This allows us to take $R$ such that $R_x = R^x$ and $R_y = R^y$. Now just repeat this whole process with $(G,R)$ instead of $(F,G)$, which we can do since $F_xG_y = F_yG_x$ implies $G_xR_y = G_yR_x$. Euclid's algorithm says that if we keep doing this process, we eventually end up with remainders $GCD_x(F_x,G_x)$ and $GCD_x(F_y,G_y)$, so a final application of proposition $2$, together with Lemma 3, finishes the job. $\square$
.
Remark: It should be emphasized that the $Q$ and $R$ obtained in the above process can be in $(\mathbb{R}(y))[x]$ and not $(\mathbb{R}[y])[x]$ (e.g. $x = \frac{1}{y}(xy+1)-\frac{1}{y}$), which is why we've been talking about $(\mathbb{R}(y))[x]$. However, this distinction doesn't matter for any of the proofs. Also, part of the statement of proposition $2$ and implicit in part of the proof just given is that the exact value of $GCD_x(F_x,G_x)$ or $GCD_y(F_y,G_y)$ (i.e., whatever specific multiplicative factor of a rational polynomial in $y$ we have) does not matter.
.
Proof of Proposition $2$
Note $F_{xy} = Q_yG_x+QG_{xy}+R^x_y$ and $F_{yx} = Q_xG_y+QG_{yx}+R^y_x$, so, since $F_{xy} = F_{yx}$ and $G_{xy} = G_{yx}$, it suffices to show $Q_xG_y = Q_yG_x$. Write $F(x) = a_nx^n+\dots+a_1x+a_0, G(x) = b_mx^m+\dots+b_1x+b_0, Q(x) = c_{n-m}x^{n-m}+\dots+c_1x+c_0$. Then, $$F_x = na_nx^{n-1}+\dots+2a_2x+a_1 \mspace{15mm} F_y = a_n'x^n+\dots+a_1'x+a_0'$$ $$G_x = mb_mx^{m-1}+\dots+2b_2x+b_1 \mspace{15mm} G_y = b_m'x^m+\dots+b_1'x+b_0'$$ $$Q_x = (n-m)c_{n-m}x^{n-m-1}+\dots+2c_2x+c_1 \mspace{5mm} Q_y = c_{n-m}'x^{n-m}+\dots+c_1'x+c_0'.$$ Now fix $k \ge m-1$. That the coefficient of the term $x^k$ in $F_x$ is the same as that in $QG_x$ gives $$(k+1)a_{k+1} = \sum_{k-m+1 \le j \le n-m} c_j(k-j+1)b_{k-j+1},$$ where I assumed $k \ge n-m$ for ease (if $k < n-m$, the upper bound on $j$ is $k$ instead of $n-m$, but this doesn't meaningfully change the argument at all). Differentiating with respect to $y$ gives $$(k+1)a_{k+1}' = \sum_{k-m+1 \le j \le n-m} \left[c_j'(k-j+1)b_{k-j+1}+c_j(k-j+1)b_{k-j+1}'\right].$$ Now, $Q_xG_y$ and $Q_yG_x$ have the same the coefficient of the $x^k$ term if and only if $$\sum_{k-m+1 \le j \le n-m} jc_jb_{k-j+1}' = \sum_{k-m+1 \le j \le n-m} c_j'(k-j+1)b_{k-j+1}.$$ Comparing with the last centered equation, we wish to show $$\sum_{k-m+1 \le j \le n-m} jc_jb_{k-j+1}' = (k+1)a_{k+1}'-\sum_{k-m+1 \le j \le n-m} c_j(k-j+1)b_{k-j+1}',$$ which is equivalent to $$\sum_{k-m+1 \le j \le n-m} c_jb_{k-j+1}' = a_{k+1}'.$$ And this is true, since the coefficient of the term $x^{k+1}$ in $F_y$ is the same as that in $QG_y$. $\square$