Roots of permutations
$\DeclareMathOperator{\GL}{GL} \DeclareMathOperator{\SL}{SL}$ The maximum of the function counting square roots is attained at $x_0=1$ and this statement generalises quite well.
Let $s(\chi)$ denote the Frobenius-Schur indicator of the irreducible character $\chi$. For the definition, see the edit below. One has $s(\chi)=1$ if the representation of $\chi$ can be realised over $\mathbb{R}$, $s(\chi)=-1$ if $\chi$ is real-valued but the corresponding representation is not realisable over $\mathbb{R}$, and $s(\chi)=0$ if $\chi$ is not real-valued. Then, the number of square roots of an element $g$ in any group is equal to $$\sum_\chi s(\chi)\chi(g),$$ where the sum runs over all irreducible characters of the group. See below for a reference and a quick proof of this identity.
It follows from the usual theory of representations of $S_n$ that in this special case all Frobenius-Schur indicators are $1$, so the number of square roots of $x_0$ is just $\sum_\chi \chi(x_0)$. This proves that the maximal number of solutions is indeed attained by $x_0 = 1$, since each character value attains its maximum there. This generalises immediately to all groups for which every representation is either realisable over $\mathbb{R}$ or has non-real character, in other words has no symplectic (or sometimes called quaternionic) representations. That includes all abelian groups, all alternating groups, all dihedral groups, $\GL_n(\mathbb{F}_q)$ for all $n\in \mathbb{Z}_{\geq 1}$ and all prime powers $q$ (see [1, Ch. III, 12.6]), and many more.
[1] A. Zelevinsky, Representations of Finite Classical Groups, Lecture Notes in Mathematics, Vol. 869, Springer-Verlag, New York/Berlin, 1981.
Edit: One reference I have found for the identity expressing the number of square roots in terms of Frobenius-Schur indicators is Eugene Wigner, American Journal of Mathematics Vol. 63, No. 1 (Jan., 1941), pp. 57-63, "On representations of certain finite groups". Once you get used to the notation, you will recognise it in displayed formula (11). Since the notation is really heavy going, I will supply a quick proof here:
Claim: If $n(g)$ is the number of square roots of an element $g$ of a finite group $G$, then we have $$n(g) = \sum_\chi s(\chi)\chi(g),$$ where the sum runs over all characters of $G$, and $s(\chi)$ denotes the Frobenius-Schur indicator of $\chi$, defined as $s(\chi)=\frac{1}{|G|}\sum_{h\in G}\chi(h^2)$.
Proof: It is clear that $n(g)$ is a class function, so it is a linear combination of the irreducible characters of $G$. The coefficient of $\chi$ in this linear combination can be recovered as the inner products of $n$ with $\chi$. We can write $n(g) = \sum_h \delta_{g,h^2}$ (here $\delta$ is the usual Kronecker delta), so we obtain $$ \begin{align*} \left< n,\chi \right> &= \frac{1}{|G|}\sum_{g\in G}n(g)\chi(g) = \frac{1}{|G|}\sum_{g\in G}\sum_{h\in G}\delta_{g,h^2}\chi(g)=\\\\ &=\frac{1}{|G|}\sum_{h\in G}\sum_{g\in G}\delta_{g,h^2}\chi(g) = \frac{1}{|G|}\sum_{h\in G}\chi(h^2), \end{align*}$$ as claimed.
Edit 2: I got curious and ran a little experiment. The proof above applies to all finite groups that have no symplectic representations. So the natural question is: what happens for those that do? Among the groups of size $\leq 150$, there are 1911 groups that have a symplectic representation, and for 1675 of them, the square root counting function does not attain its maximum at the identity! There are several curious questions that suggest themselves: is there a similar (representation-theoretic?) 2-line criterion that singles out those 300-odd groups that satisfy the conclusion but not the assumptions of the above proof? What happens for the others? Can we find a complete characterisation of the groups whose square root counting functions is maximised by the identity? Following Pete's suggestion, I have started two follow-up questions on this business: one on square roots and one on $n$-th roots.
To your last question - "how far may it be generalized" - Richard Stanley answered when you fix the equation ($X^2=c$) and vary the group. You may also wonder about other equations. The situation is interesting: There are equations and groups with the property that the identity is not the RHS yielding the most solutions. This is so even though the LHS has no constants, just variables.
One may rephrase the question as follows: given a word $w=w(X_1,X_2,\ldots,X_r)$ in the free group $F_r$ with variables $X_1,\ldots,X_r$, and given any finite group $G$, one may naturally consider $w$ as inducing a function $G^r \to G$ by plugging elements of $G$ as variables. This in turn defines a probability distribution on $G$: if you plug uniform random elements, what do you get? The most likely outcome is often, but not always, the identity.
In fact, the probability of getting the identity can be made arbitrarily small iff the group is non-solvable. I circulated this as a conjecture some years ago and it was proven by Miklos Abert (for the non-solvable case) and Nikolov and Segal (for the solvable one).
For a general finite group $G$ and an arbitrary element $y \in G,$ there is a general way to use characters to compare the number of solutions of $x^{2} = y$ in $G$ with the number of solutions of $x^{2} = 1$ within $C_{G}(y).$
For notice that any element $x$ with $x^{2} = y$ already lies in $C_{G}(y).$ Hence (using the general formula mentioned in Alex's answer, but within the group $C_{G}(y)$) the number of solutions of $y^{2} = x$ is $\sum_{ \mu \in {\rm Irr}(C_{G}(y)} \nu(\mu) \mu(y),$ where $\nu(\mu)$ is the Frobenius-Schur indicator of $\mu$ ( which Alex denoted by $s(\mu)$).
But now we note (using Schur's Lemma) that unless $y{\rm ker} \mu$ has order $1$ or $2$ in $C_{G}(y)/{\rm ker}(\mu)$ we have $\nu(\mu) = 0$ since $\mu$ is certainly not real-valued.
Hence if $y$ has order $m,$ then we need only consider irreducible characters $\mu$ with $y$ in their kernel when $m$ is odd, and irreducible characters $\mu$ with $y^{2}$ in their kernel when $m$ is even.
In any case, the contribution to the sum from those characters with $y$ in their kernel is the same as the number of square roots of the identity in $C_{G}(y)/\langle y \rangle.$
I won't give all details, but we may continue to deduce that the difference between the number of square roots of the identity in $C_{G}(y)$ and the number of square roots of $y$ in $G$ is given by the formula:
$\sum_{\mu \in {\rm Irr}(C_{G}(y)/\langle y^{2} \rangle) \backslash {\rm Irr}(C_{G}(y)/\langle y \rangle)} 2\nu(\mu)\mu(1).$
Note that this number is non-negative unless $C_{G}(y)$ has an irreducible character $\mu$ with $\nu(\mu) = -1$ and with $y^{2} \in {\rm ker} \mu$ but $y \not \in {\rm ker} \mu$.
Hence we may conclude in particular that for a general element $y$ of a general finite group $G,$ the number of square roots of $y$ in $G$ is already less that the number of solutions of square roots of the identity in $C_{G}(y)$ unless $C_{G}(y)$ has an irreducible character $\mu$ of Frobenius-Schur indicator $-1$ such that $y{\rm ker} \mu$ has order $2$ in $C_{G}(y)/{\rm ker} \mu.$
Latter edit: It is perhaps worth remarking that this argument shows that the number of square roots of $y$ in $G$ is the same as the number of square roots of $yM$ in $C_{G}(y)/M$, where $M$ is the normal subgroup $\langle y^{2} \rangle $ of $C_{G}(y).$ This reduces the computation of the number of square roots of $y \in G$ to the case that $y$ has order $1$ or $2$ and that $y \in Z(G).$
Returning to the (last part of the) original question, it means that each finite group $G$ for which the maximum number of square roots is not attained at the identity has a section (ie a factor group of a subgroup) $H$ such that $Z(H)$ such contains an involution $u$ with the number of elements of order four with square $u$ at least two greater that the number of involutions of $H$ ( here an involtion is considered to be an element of order exactly two). Furthermore, the section $H$ has the form $C_{G}(y)/\langle y^{2} \rangle$ where $y$ is an element of $G$ with the maximum number of square roots. Note also that if the identity element does not have the maximum number of square roots then the element $y$ attaining the maximum number necessarily has even order.
Note that in the essential case that $y \in Z(G)$ is an involution, then $y$ has more square roots in $G$ than the identity does if and only if we have $\sum_{ \{\chi \in {\rm Irr}(G) \backslash {\rm Irr}(G/\langle y \rangle): \nu(\chi) = -1 \}}\chi(1) > \sum_{ \{\chi \in {\rm Irr}(G) \backslash {\rm Irr}(G/\langle y \rangle): \nu(\chi) = +1 \} } \chi(1).$