Sum of squared nearest-neighbor distances between points in a square
We shall prove the more general result: for $n\ge2$ distinct points and any positive real $a,b$, we have $D(R):=d_1^2+\cdots+d_n^2\le 2a^2+2b^2$, where the points $P_j$ are now in an $a\times b$ rectangle $R$ and the distances $d_j$ are defined as in the OP. Let us use induction on $n$. For $n=2$, the result is obvious. Suppose now that $n>2$. Without loss of generality (wlog), $R=[0,a]\times[0,b]$. "Partition" $R$ into the four $\frac a2\times \frac b2$ rectangles $R_1:=[0,\frac a2]\times[0,\frac b2],R_2:=[\frac a2,a]\times[0,\frac b2], R_3:=[0,\frac a2]\times[\frac b2,b],R_4:=[\frac a2,a]\times[\frac b2,b]$
Let $n_1,\dots,n_4$ be the numbers of the points in the rectangles $R_1,\dots,R_4$, respectively.
Case 1: $n_1,\dots,n_4\ge2$. Then, by induction, \begin{equation} d_1^2+\cdots+d_n^2=D(R)\le D(R_1)+\dots+D(R_4)\le4[2(\tfrac a2)^2+2(\tfrac b2)^2]= 2a^2+2b^2. \end{equation}
Lemma 1. Suppose that $n_1\ge2$ and $n_2=1$, so that there is exactly one point, say $Q$, in $R_2$. Then \begin{equation} D(R_1)+d(Q,R_1)^2\le a^2+b^2, \end{equation} where $d(Q,R_1)$ is the shortest distance from $Q$ to the $n_1$ points in $R_1$.
Proof: Let $s$ be the largest abscissa of all the $n_1$ points in $R_1$, so that $0\le s\le\tfrac a2$. Let $(u,v):=Q$, so that $\tfrac a2\le u\le a$ and $0\le v\le\tfrac b2$. By symmetry, wlog $0\le v\le\tfrac b4$, and hence $d(Q,R_1)^2\le(u-s)^2+(\tfrac b2-v)^2$. So, by induction, \begin{equation} D(R_1)+d(Q,R_1)^2\le2s^2+2(\tfrac b2)^2+(u-s)^2+(\tfrac b2-v)^2\le a^2+b^2, \end{equation} which proves Lemma 1. $\Box$
Case 2: One of the $n_j$'s is $1$, and the other $n_j$'s are $\ge2$. Here wlog $n_1,n_3,n_4\ge2$ and $n_2=1$. Then, in view of Lemma 1, \begin{equation} d_1^2+\cdots+d_n^2=D(R)\le D(R_1)+d(Q,R_1)^2+D(R_3)+D(R_4)\le a^2+b^2+2[2(\tfrac a2)^2+2(\tfrac b2)^2]= 2a^2+2b^2. \end{equation}
Case 3: Two of the $n_j$'s are $1$'s, and the other $n_j$'s are $\ge2$. Here wlog $n_1,n_3\ge2$ and $n_2=n_4=1$ (the "adjacent" subcase) or $n_1,n_4\ge2$ and $n_2=n_3=1$ (the "non-adjacent" subcase). In the "adjacent" subcase, let $Q$ be the only point in $R_2$, and let $T$ be the only point in $R_4$. Then, in view of Lemma 1, \begin{equation} d_1^2+\cdots+d_n^2=D(R)\le D(R_1)+d(Q,R_1)^2+D(R_3)+d(T,R_3)^2\le a^2+b^2+a^2+b^2= 2a^2+2b^2. \end{equation} The "non-adjacent" subcase is considered quite similarly, by using an appropriate flip/symmetry.
Case 4: Three of the $n_j$'s are $1$'s, and the other $n_j$ is $\ge2$. Here wlog $n_1\ge2$ and $n_2=n_3=n_4=1$. Let $Q_2=(p,q),Q_3=(u,v),Q_4=(x,y)$ be the unique points in $R_2,R_3,R_4$, respectively, so that $p,x\in[\tfrac a2,a]$, $v,y\in[\tfrac b2,b]$, $u\in[0,\tfrac a2]$, $q\in[0,\tfrac b2]$.
Let $s$ and $t$ be, respectively, the largest abscissa and ordinate of all the $n_1$ points in $R_1$. Then
\begin{multline}
D(R)\le B:=D(R_1)+d(Q_2,S)^2+d(Q_3,T)^2+\frac{\|Q_4-Q_2\|^2+\|Q_4-Q_3\|^2}2,
\end{multline}
where $S$ is a point (among the $n_1$ points in $R_1$) with the largest abscissa $s$, and $T$ is a point (among the $n_1$ points in $R_1$) with the largest ordinate $t$, so that $S=(s,\eta)$ for some $\eta\in[0,t]$, and $T=(\xi,t)$ for some $\xi\in[0,s]$. Of course, $d(Q_2,S)$ is a convex function of $\eta\in[0,t]$, and so,
\begin{equation}
d(Q_2,S)^2\le\max(d(Q_2,(s,0))^2,d(Q_2,(s,t))^2)
=(p - s)^2 + \max(q^2, (q-t)^2).
\end{equation}
Similarly,
\begin{equation}
d(Q_3,T)^2\le
\max(u^2, (u-s)^2) + (v - t)^2.
\end{equation}
Next,
\begin{equation}
\|Q_4-Q_2\|^2+\|Q_4-Q_3\|^2=(x - p)^2 + (y - q)^2 + (x - u)^2 + (y - v)^2,
\end{equation}
which is convex in $x,y$, with the maximum in $x\in[\tfrac a2,a]$ and $y\in[\tfrac b2,b]$ attained at $x=a$ and $y=b$, for any given $p\in[\tfrac a2,a]$, $v\in[\tfrac b2,b]$, $u\in[0,\tfrac a2]$, $q\in[0,\tfrac b2]$.
Further, by induction, $D(R_1)\le2s^2+2t^2$. Thus,
\begin{multline}
D(R)\le B\le B_1:=
2 s^2+2t^2+\max \left(q^2,(q-t)^2\right)+\max \left((u-s)^2,u^2\right) \\
+(p-s)^2+(v-t)^2 \\
+\tfrac{1}{2} \left((a-p)^2+(b-q)^2\right)+\tfrac{1}{2} \left((a-u)^2+(b-v)^2\right).
\end{multline}
Clearly, $B_1$ is convex in $p,v$, and one can see that the maximum of $B_1$ in $p\in[\tfrac a2,a]$ and $v\in[\tfrac b2,b]$ is attained at $p=a$ and $v=b$, for any given $s\in[0,\tfrac a2]$, $t\in[0,\tfrac b2]$, $u\in[0,\tfrac a2]$, $q\in[0,\tfrac b2]$. So,
\begin{equation}
B_1\le B_{21}+B_{22}+B_{23},
\end{equation}
where
\begin{align}
B_{21}&:=\max \left(q^2,(q-t)^2\right)- b q+q^2/2, \\
B_{22}&:=\max \left(u^2,(u-s)^2\right)- a u+u^2/2, \\
B_{23}&:=\tfrac{1}{2} \left(3 a^2+3 b^2-4 a s-4 b t+6 \left(s^2+t^2\right)\right).
\end{align}
Next,
\begin{equation}
B_{21}\le t^2,\quad B_{22}\le s^2
\end{equation}
for $q,t\in[0,\tfrac b2]$ and $u,s\in[0,\tfrac a2]$. So,
\begin{equation}
D(R)\le B\le B_1\le B_{21}+B_{22}+B_{23}
\le\tfrac{1}{2} (3 a^2 + 3 b^2) -4 (\tfrac a2 - s) s - 4 (\tfrac b2 - t) t
\le\tfrac{1}{2} (3 a^2 + 3 b^2)\le2 a^2 + 2 b^2.
\end{equation}
Case 5: This case obtains from Cases 2, 3, 4 by replacing there some of the conditions of the form $n_j=1$ by $n_j=0$. This case immediately follows from Cases 2, 3, 4, because now the nonnegative contributions of the single points will be replaced by $0$ --- with the only exception occurring when three of the $n_j$'s are $0$ and hence the remaining one of the $n_j$'s (say $n_1$) equals $n$. Indeed, in the latter exceptional subcase, we cannot use the induction, since $n_1=n\not<n$. The remedy in this case is to continue the "partitioning" of the smaller rectangles containing all the $n$ points until we no longer have such an exceptional situation. This process will stop. Indeed, if it never stops, then all the $n$ distinct points will get eventually contained in a singleton set, which is a contradiction.
So far, we have consider all the cases when at least one of the $n_j$'s is $\ge2$. Otherwise, we have $n_j\le1$ for all $j=1,\dots,4$ and hence $n\le4$. Since $n>2$, it remains to consider the following two cases.
Case 6: $n=3$. By shrinking the rectangle $R$, wlog we may assume that each side of $R$ contains at least one of the three points $P_1,P_2,P_3$. By the pigeonhole principle, at least two sides of the rectangle must share one of the three points. Also, wlog none of the points $P_1,P_2,P_3$ is in the interior of $R$. Indeed, if e.g. $P_3$ is in the interior of $R$, then we can move $P_3$ away from the line $\ell(P_1,P_2)$ (through $P_1,P_2$) in the direction perpendicular to $\ell(P_1,P_2)$ (till we hit the boundary of $R$), so that all the pairwise distances between the points $P_1,P_2,P_3$ may only increase, and then $D(R)$ may only increase.
So, wlog $P_1=(0,0),P_2=(u,b),P_3=(a,v)$ for some $u\in[0,a]$ and $v\in[0,b]$. So, \begin{equation} D(R)\le B_3:=\frac{P_1P_2^2+P_1P_3^2}2+\frac{P_2P_1^2+P_2P_3^2}2+\frac{P_3P_2^2+P_3P_1^2}2= P_1P_2^2+P_1P_3^2+P_2P_3^2, \end{equation} where $AB:=\|A-B\|$. Since $B_3$ is convex in $u,v$, its maximum in $u\in[0,a]$ and $v\in[0,b]$ is attained when $u\in\{0,a\}$ and $v\in\{0,b\}$, and this maximum is $2a^2+2b^2$, which proves Case 6.
Case 7: $n=4$. Again, by shrinking the rectangle $R$, wlog we may assume that each side of $R$ contains at least one of the four points $P_1,P_2,P_3,P_4$. Also, each of the four points is either (i) in the convex hull of the other three points or (ii) on the boundary of $R$. Indeed, otherwise wlog $P_4$ is in the interior of $R$, but not in the convex hull of $P_1,P_2,P_3$; more specifically, we may assume that the points $P_4$ and $P_3$ are to the opposite sides of the line $\ell(P_1,P_2)$. Then we can move $P_4$ away from the triangle $P_1P_2P_3$ in the direction perpendicular to the line $\ell(P_1,P_2)$ (till we hit the boundary of $R$), so that all the pairwise distances between the points $P_1,P_2,P_3,P_4$ may only increase, and then $D(R)$ may only increase. Therefore, wlog we have one the following subcases.
Subcase 7.1: $P_4$ is in the convex hull of $P_1,P_2,P_3$, so that $P_4=(1-s-t)P_1+sP_2+tP_3$ for some $s,t$ such that $s,t\ge0$ and $s+t\le1$. Also, here, as in Case 6, wlog $P_1=(0,0),P_2=(u,b),P_3=(a,v)$ for some $u\in[0,a]$ and $v\in[0,b]$. Hence \begin{equation} D(R)\le B_{71}:=P_1P_4^2+P_2P_4^2+P_3P_4^2+[(1-s-t)P_4P_1^2+s\,P_4P_2^2+t\,P_4P_3^2]\le2a^2+2b^2; \end{equation} the latter inequality follows easily by the convexity of $B_{71}$ in each of the variables $u,v,(s,t)$. Thus, Subcase 7.1 is done.
Subcase 7.2: All the four points $P_1,P_2,P_3,P_4$ are on the boundary of $R$.
Subsubcase 7.2.1: There is a bijection $\beta$ from the set of the points $\{P_1,P_2,P_3,P_4\}$ to the set of the sides of $R$ such that $P_j\in\beta(P_j)$ for all $j$. So, wlog $P_1=(s,0),P_2=(0,t),P_3=(u,b),P_4=(a,v)$ for some $s,u\in[0,a]$ and $t,v\in[0,b]$. Then, similarly to the case $n=3$,
\begin{equation}
D(R)\le\frac{P_1P_2^2+P_1P_4^2}2+\frac{P_2P_1^2+P_2P_3^2}2
+\frac{P_3P_2^2+P_3P_4^2}2+\frac{P_4P_1^2+P_4P_3^2}2\le2a^2+2b^2,
\end{equation}
by convexity.
Subcase 7.2.1 is done.
Subsubcase 7.2.2: One of the points $P_1,P_2,P_3,P_4$ (say $P_1$) is shared by two sides of the rectangle $R$. So, wlog $P_1=(0,0)$. Suppose that one of the two sides of $R$ (say $S_1$ and $S_2$) sharing the point $P_1$ contains one of the points $P_2,P_3,P_4$; let us say this side is $S_1$. Then we can move $P_1$ slightly along the side $S_2$ out of its position at $(0,0)$. In view of the continuity of $D(R)$ in $P_1$, we can thus get rid of the sharing -- provided that at least one of the two sides of $R$ sharing the point $P_1$ contains one of the points $P_2,P_3,P_4$. So, wlog $P_1=(0,0)$ and none of the two sides of $R$ sharing the point $P_1$ contains any of the points $P_2,P_3,P_4$. So, one of the sides of $R$ not sharing the point $P_1$ contains two of the points $P_2,P_3,P_4$. Therefore and by the interchangeability of the horizontal and vertical directions, wlog $P_1=(0,0),P_2=(u,b),P_3=(a,v),P_4=(a,w)$ for some $u\in[0,a]$ and $v,w\in[0,b]$ such that $v>w$. So,
\begin{equation}
D(R)\le P_1P_2^2+P_2P_3^2+P_3P_4^2+P_4P_1^2\le2a^2+2b^2,
\end{equation}
again by convexity.
Subcase 7.2.2 is done, as well as the entire proof of the bound $2a^2+2b^2$.