Number of functions $f:[4]\times[4]\rightarrow[4]$
So here's a better approach. Do what Harary/Palmer tell you. To really do this justice you need a crash course in the representation theory of permutation groups and Schur functors, but we'll get by without it.
The formula we need to calculate is $$ a(n,k) = {1\over n!k!} \sum \displaystyle\prod_{p=1}^n \displaystyle\prod_{q=1}^k ( \displaystyle\sum_{s|[p,q]} s j_s(\alpha))^{j_p(\alpha) j_q(\beta) <p,q>} $$ where the outer sum is over pairs of $\alpha \in S_n$ and $\beta \in S_k$
Here $j_d(\pi)$ is the number of $d$-cycles in a permutation $\pi$,
$[a,b]$ denotes the lcm and $<a,b>$ the gcd.
It's easier to calculate if, instead of summing over $\alpha$ and $\beta$, we sum over their respective conjugacy classes in $S_n$ and $S_k$, and weight each term by the sizes of the two classes. This amounts to enumerating the integer partitions of $n$ and of $k$, as the conjugacy classes in $S_n$ are determined by cycle type.
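A minimal sketch of the class weights, assuming the cycle counts of a class are stored as `j[1..n]` (the function name `class_size` and this layout are my own choices for illustration): a class with $j_p$ cycles of length $p$ contains $n!/\prod_p p^{j_p}\,j_p!$ permutations.

```c
#include <stdint.h>

/* Size of the conjugacy class of S_n with cycle counts j[1..n]:
   n! / prod_p ( p^j[p] * j[p]! ).  j[0] is unused.
   Assumes n <= 20 so that n! fits in 64 bits. */
uint64_t class_size(int n, const int *j)
{
    uint64_t size = 1;
    int p, t;

    for (p = 2; p <= n; p++)          /* size = n! */
        size *= (uint64_t)p;

    for (p = 1; p <= n; p++) {
        for (t = 1; t <= j[p]; t++) {
            size /= (uint64_t)p;      /* one factor p per p-cycle      */
            size /= (uint64_t)t;      /* accumulates the divisor j[p]! */
        }
    }
    return size;
}
```

Every intermediate division is exact, because each partial product of the divisors still divides $n!$. For $S_4$ this gives 1, 6, 8, 3 and 6 for the cycle types of id, (12), (123), (12)(34) and (1234).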
The code is not difficult, and it's a lot quicker than my first approach. Here's the main blob.
    sum=0;
    for (alpha=0;alpha<CC;alpha++){
      for (beta=0;beta<DD;beta++){
        pqProd=1;
        for (pp=1;pp<=NN;pp++){
          for (qq=1;qq<=KK;qq++){
            gcd=gcdLut[pp][qq];
            lcm=lcmLut[pp][qq];
            sSum=0;
            for (ss=1;ss<=lcm && ss<=NN;ss++){
              if (lcm%ss!=0) continue;
              sSum += ss * cycleCountN[alpha][ss];
            }
            xx=gcd * cycleCountN[alpha][pp] * cycleCountK[beta][qq];
            pqProd *= pow_uint64(sSum,xx);
          }
        }
        sum += pqProd * cSize[alpha] * dSize[beta];
      }
    }
    printf("sum %llu\n",sum);
Notes: I set NN and KK at compile time. CC and DD are the numbers of partitions of NN and KK respectively. I precomputed the gcds and lcms to save rework. The partitions are stored in the cycleCount arrays, and the sizes of the corresponding conjugacy classes in cSize and dSize. The rest should be self-explanatory. If you want to copy this, you'll also need an integer exponentiation function, but that's pretty easy to source. Make sure it calculates $0^0 = 1$.
I have tested this for $N=K=2$ and it returns 28 (note I haven't divided by the order of the groups in the code).
Finally, the answer, which is a lot less interesting than the question!
$$ a(4,4) = 7643021 $$
Bugs notwithstanding...
UPDATE:
I just saw Markus' post - good to see we are on similar lines. I've run my code for $N=K=3$ and get 23076/36 = 641. So we're close, but not perfect!
The only line on which we differ is $\alpha = (123)$, $\beta = (12)$. I get a result of 27, whereas Markus gets 9 (prior to multiplication by the partition sizes). Looking inside I get a factor 9 from $p = 3$, $q = 2$. Which means I have a bug. My sum over $s$ runs all the way up to the lcm and overruns the array; it should be limited by $N$.
I have just fixed this and now Markus and I agree on 638. I've corrected the code and answer for $a(4,4)$ above.
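To make the fix concrete, here is a small sketch of the inner sum on its own (the function name `divisor_cycle_sum` is made up for this illustration). A permutation of $N$ points has no cycle longer than $N$, so divisors $s>N$ of the lcm contribute nothing and must not be used as array indices.

```c
/* Sum of s * j[s] over the divisors s of l, skipping every s > n.
   j[1..n] holds the cycle counts of a permutation of n points
   (j[0] unused), so j[s] does not exist for s > n. */
unsigned long long divisor_cycle_sum(int l, int n, const int *j)
{
    unsigned long long total = 0;
    int s;

    for (s = 1; s <= l && s <= n; s++)
        if (l % s == 0)
            total += (unsigned long long)s * (unsigned long long)j[s];
    return total;
}
```

For $\alpha=(123)$ in $S_3$ and lcm $=6$ (the case $p=3$, $q=2$) the bounded sum is $3$, so the factor for that pair is $3$ rather than $9$.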
Here is a table of results for $a(N,K)$ for values up to 9.
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| N\K | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 2 | 3 | 7 | 13 | 22 | 34 | 50 | 70 | 95 | 125 |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 3 | 7 | 74 | 638 | 4663 | 28529 | 151600 | 713176 | 3028727 | 11773093 |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 4 | 19 | 1474 | 118949 | 7643021 | 396979499 | 17265522590 | 646203233957 | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 5 | 47 | 41876 | 42483668 | 33179970333 | 20762461502595 | | | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 6 | 130 | 1540696 | 23524514635 | 274252613077267 | | | | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 7 | 343 | 68343112 | 18477841853059 | | | | | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 8 | 951 | 3540691525 | | | | | | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
| 9 | 2615 | 209612916303 | | | | | | | |
+-----+------+--------------+----------------+-----------------+----------------+-------------+--------------+---------+----------+
And here's the code. It's slightly complicated due to me initialising variable-width arrays in subroutines, which leads to some interesting pointer constructs! I'm rather pleased with the partition generator. It's your basic backtracker, but it has come out very neat!
Not all compilers are friendly for arrays with widths set at run-time. I compiled this with gcc.
    ///////// Calculate a(N,K) /////////////
    // Call: ank N K
    // N and K should be positive integers. This is not checked.
    // Results go to stdout
    // There is no test for integer overflow, so large values of N and K will not work!
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    typedef unsigned long long uint64;
    ///////////////////////////
    //// Integer exponentiation
    ///////////////////////////
    uint64 pow_uint64(uint64 base, int exp)
    {
      uint64 result = 1;
      // 0**0=1
      if (base==0 && exp==0) return result;
      while (exp){
        if (exp & 1)
          result *= base;
        exp >>= 1;
        base *= base;
      }
      return result;
    }
    //////////////////////
    //// gcd & lcm precalc
    //////////////////////
    void init_gcd_lcm(int nk, int gcd[][nk+1], int lcm[][nk+1])
    {
      int ii,jj,kk;
      for (ii=0;ii<=nk;ii++)
        gcd[0][ii]=gcd[ii][0]=lcm[0][ii]=lcm[ii][0]=0;
      // These are small numbers - it's quite efficient to search and easier to code than Euclid's algorithm.
      for (ii=1;ii<=nk;ii++){
        for (jj=1;jj<ii;jj++){
          gcd[ii][jj]=1;
          for (kk=2;kk<=jj;kk++){
            if (ii%kk==0 && jj%kk==0)
              gcd[ii][jj]=kk;
          }
          gcd[jj][ii]=gcd[ii][jj];
          lcm[jj][ii]=lcm[ii][jj]=ii*jj/gcd[ii][jj];
        }
        gcd[ii][ii]=lcm[ii][ii]=ii;
      }
      return;
    }
    ////////////////////////
    //// Generate partitions
    ////////////////////////
    int init_partitions(int nk,int (**store)[nk+1])
    {
      int lev; /* do this by a superloop */
      int rr[nk+1]; /* remainder at each level */
      int part[nk+1]; /* partition being built */
      int npart=0;
      int ii;
      *store=malloc((nk+1)*sizeof(int));
      part[nk]=2;
      rr[nk]=nk;
      for (lev=nk;lev<=nk;lev++){
        for (part[lev]--;part[lev]>=0;part[lev]--){
          // if we reach rr[lev]==0 we have a partition
          if (rr[lev]==0){
            for (ii=lev;ii>=0;ii--)
              part[ii]=0;
            // Save it
            memcpy((*store)[npart],part,sizeof(part));
            npart++;
            *store=realloc(*store,(npart+1)*sizeof(part));
            continue;
          }
          // if we reach lev==1 we can complete the partition
          if (lev==1){
            part[1]=rr[1];
            part[0]=0;
            // Save it
            memcpy((*store)[npart],part,sizeof(part));
            npart++;
            *store=realloc(*store,(npart+1)*sizeof(part));
            // Set up to ascend
            part[1]=0;
            continue;
          }
          // How much remains to push to a lower level?
          rr[lev-1]=rr[lev]-part[lev]*lev;
          part[lev-1]=1+rr[lev-1]/(lev-1);
          lev--;
        }
      }
      return npart;
    }
    ///////////
    //// MAIN
    ///////////
    int main(int argc, char **argv)
    {
      // parameters N, K
      int nn=atoi(argv[1]);
      int kk=atoi(argv[2]);
      int nkMax=(nn>kk) ? nn:kk;
      // Precalculate gcds and lcms
      int (*gcdLut)[nkMax+1];
      int (*lcmLut)[nkMax+1];
      // Conjugacy classes = partitions on N and K
      int (*nClass)[nn+1];
      int *nClassSize;
      int nClassCount;
      int (*kClass)[kk+1];
      int *kClassSize;
      int kClassCount;
      uint64 *fact; /* factorials */
      int alpha, beta; /* perms represented by class */
      int pp, qq;
      int ss, gcd, lcm, xx;
      uint64 sum, sSum, pqProd, yy;
      int ii,jj;
      //// Initialise
      gcdLut=malloc((nkMax+1)*(nkMax+1)*sizeof(int));
      lcmLut=malloc((nkMax+1)*(nkMax+1)*sizeof(int));
      init_gcd_lcm(nkMax,gcdLut,lcmLut);
      nClassCount=init_partitions(nn,&nClass);
      kClassCount=init_partitions(kk,&kClass);
      // Factorials
      fact=malloc((1+nkMax)*sizeof(uint64));
      for (fact[0]=1,ii=1;ii<=nkMax;ii++)
        fact[ii]=fact[ii-1]*ii;
      // Class sizes
      nClassSize=malloc(nClassCount*sizeof(int));
      for (ii=0;ii<nClassCount;ii++){
        nClassSize[ii]=fact[nn];
        for (jj=1;jj<=nn;jj++){
          yy=pow_uint64((uint64)jj,nClass[ii][jj]);
          nClassSize[ii] /= fact[nClass[ii][jj]]*yy;
        }
      }
      kClassSize=malloc(kClassCount*sizeof(int));
      for (ii=0;ii<kClassCount;ii++){
        kClassSize[ii]=fact[kk];
        for (jj=1;jj<=kk;jj++){
          yy=pow_uint64((uint64)jj,kClass[ii][jj]);
          kClassSize[ii] /= fact[kClass[ii][jj]]*yy;
        }
      }
      //// Principal calculation
      sum=0;
      for (alpha=0;alpha<nClassCount;alpha++){
        for (beta=0;beta<kClassCount;beta++){
          pqProd=1;
          for (pp=1;pp<=nn;pp++){
            for (qq=1;qq<=kk;qq++){
              gcd=gcdLut[pp][qq];
              lcm=lcmLut[pp][qq];
              sSum=0;
              for (ss=1;ss<=lcm && ss<=nn;ss++){
                if (lcm%ss!=0) continue;
                sSum += ss * nClass[alpha][ss];
              }
              xx=gcd * nClass[alpha][pp] * kClass[beta][qq];
              pqProd *= yy = pow_uint64(sSum,xx);
              // Print contributors
              if (yy>1){
                for (ii=1;ii<=nn;ii++)
                  printf("%d",nClass[alpha][ii]);
                printf("\t");
                for (ii=1;ii<=kk;ii++)
                  printf("%d",kClass[beta][ii]);
                printf("\t%d\t%d",pp,qq);
                printf("\t%llu\n",sSum);
              }
            }
          }
          sum += pqProd * nClassSize[alpha] * kClassSize[beta];
          // Print separator between perm pairs
          printf("%d\t%d\t%llu\n\n",nClassSize[alpha],kClassSize[beta],pqProd);
        }
      }
      sum = sum / fact[nn] / fact[kk];
      printf("sum %llu\n",sum);
      return 0;
    }
[Add-on 2016-10-30]: Case n=4 added.
We calculate the number $a(n,n)$ of inequivalent functions $$f:[n]\times[n]\rightarrow[n]$$ for $n=2,3$ and $n=4$ according to the paper of F. Harary and E. Palmer and show
\begin{align*} a(2,2)&=7\\ a(3,3)&=638\\ a(4,4)&=7643021 \end{align*}
The formula to be applied is stated as formula (14) in the paper. In fact we can use a simplified version of it, which is given in connection with the calculation of $a(2,2,1)=7$ at the end of page 505. The third parameter is not of interest for us and so we instead write $a(2,2)$.
Here I follow the notation of the authors and use $[p,q]:=\operatorname{lcm}(p,q)$ and $\langle p,q\rangle:=\operatorname{gcd}(p,q)$.
[Harary, Palmer]: The following is valid \begin{align*} a(n,n)=\frac{1}{\left(n!\right)^2}\sum_{(\alpha,\beta)\in S_n^2}\prod_{p=1}^n\prod_{q=1}^n \left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}\tag{1} \end{align*} where the sum is taken over all pairs of permutations $(\alpha,\beta)$ of degree $n$ and $j_p(\alpha)$ denotes the number of cycles of $\alpha$ of length $p$.
Hint: Observe that the terms in the sum (1) depend on $\alpha$ and $\beta$ only through the cycle counts $j_p(\alpha)$ and $j_q(\beta)$. So it is not necessary to sum over all $\left(n!\right)^2$ pairs of permutations; we can conveniently use the cycle index of the permutation group $S_n$ and considerably reduce the number of summands.
Preparatory work: Cycle index
We calculate for $n=2,3,4$ the cycle index based upon the recursion formula \begin{align*} Z(S_0)&=1\\ Z(S_n)&=\frac{1}{n}\sum_{j=1}^nz_jZ(S_{n-j})\qquad\qquad n>0 \end{align*} We obtain \begin{align*} Z(S_1)&=z_1Z(S_0)=z_1\\ Z(S_2)&=\frac{1}{2}\left(z_1Z(S_1)+z_2\right)\\ &=\frac{1}{2}\left(z_1^2+z_2\right)\\ Z(S_3)&=\frac{1}{3}\left(z_1\cdot\frac{1}{2}\left(z_1^2+z_2\right)+z_2z_1+z_3\right)\\ &=\frac{1}{6}\left(z_1^3+3z_1z_2+2z_3\right)\\ Z(S_4)&=\frac{1}{4}\left(z_1\cdot\frac{1}{6}\left(z_1^3+3z_1z_2+2z_3\right)+z_2\cdot\frac{1}{2}\left(z_1^2+z_2\right) +z_3z_1+z_4\right)\\ &=\frac{1}{24}\left(z_1^4+6z_1^2z_2+8z_1z_3+3z_2^2+6z_4\right) \end{align*}
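The recursion can be checked numerically. The sketch below is my own illustration (the name `scaled_cycle_index` and the bound `CI_MAXN` are assumptions, not from the paper): it evaluates $W_n=n!\,Z(S_n)$ at given integer values $z_1,\dots,z_n$, using the recursion multiplied through by $n!$, namely $W_n=\sum_{j=1}^n z_j\,\frac{(n-1)!}{(n-j)!}\,W_{n-j}$ with $W_0=1$, so that everything stays in integers.

```c
#include <stdint.h>

#define CI_MAXN 20   /* assumed upper bound on n for this sketch */

/* Returns n! * Z(S_n) evaluated at z[1..n] (z[0] unused), via
   W_m = sum_{j=1..m} z[j] * (m-1)!/(m-j)! * W_{m-j},  W_0 = 1. */
uint64_t scaled_cycle_index(int n, const uint64_t *z)
{
    uint64_t w[CI_MAXN + 1];
    int m, j;

    w[0] = 1;
    for (m = 1; m <= n; m++) {
        uint64_t falling = 1;            /* (m-1)!/(m-j)!, equal to 1 at j = 1 */
        w[m] = 0;
        for (j = 1; j <= m; j++) {
            w[m] += z[j] * falling * w[m - j];
            falling *= (uint64_t)(m - j); /* step from j to j + 1 */
        }
    }
    return w[n];
}
```

Setting all $z_j=1$ returns $n!$ (the coefficients $1+6+8+3+6$ sum to $24$ for $n=4$), and setting only $z_4=1$ returns $6$, the coefficient of $z_4$ in $24\,Z(S_4)$.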
$$ $$
Case: $n=2$:
We show the following is valid \begin{align*} \color{blue}{a(2,2)=7} \end{align*}
In order to calculate $a(2,2)$ we consider according to (1) \begin{align*} a(2,2)=\frac{1}{4}\sum_{(\alpha,\beta)\in S_2^2}\prod_{p=1}^2\prod_{q=1}^2 \left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}\tag{2} \end{align*}
It is convenient to do some bookkeeping with tables. We list the permutations of $S_2=\{\operatorname{id},(12)\}$ in cycle notation and write a table with the number of cycles of each length for each permutation. We also write the corresponding monomial from the cycle index. \begin{array}{l|ccc} \pi&Z(S_2)&j_1(\pi)&j_2(\pi)\\ \hline \operatorname{id}& z_1^2& 2& 0\\ (12)& z_2^1& 0 &1\\ \end{array}
Since it is too cumbersome to write each summand from (2) in one long line we use instead a table description as follows:
\begin{array}{cc|cccc|cc|cc|rr} \alpha&\beta&p&q&[p,q]&<p,q>&s&j_s(\alpha)&j_p(\alpha)&j_q(\beta)&\text{factors}&\text{result}\\ \hline id&id&1&1&1&1&1&2&2&2&16&16\\ &&1&2&2&1&1&2&2&0&1&\\ &&&&&&2&0&&&&\\ &&2&1&2&1&1&2&0&2&1&\\ &&&&&&2&0&&&&\\ &&2&2&2&2&1&2&0&0&1&\\ &&&&&&2&0&&&&\\ \hline id&(12)&1&1&1&1&1&2&2&0&1&4\\ &&1&2&2&1&1&2&2&1&4&\\ &&&&&&2&0&&&&\\ &&2&1&2&1&1&2&0&0&1&\\ &&&&&&2&0&&&&\\ &&2&2&2&2&1&2&0&1&1&\\ &&&&&&2&0&&&&\\ \hline (12)&id&1&1&1&1&1&0&0&2&1&4\\ &&1&2&2&1&1&0&0&0&1&\\ &&&&&&2&1&0&0&&\\ &&2&1&2&1&1&0&1&2&4&\\ &&&&&&2&1&1&2&&\\ &&2&2&2&2&1&0&1&0&1&\\ &&&&&&2&1&1&0&&\\ \hline (12)&(12)&1&1&1&1&1&0&0&0&1&4\\ &&1&2&2&1&1&0&0&1&1&\\ &&&&&&2&1&0&1&&\\ &&2&1&2&1&1&0&1&0&1&\\ &&&&&&2&1&1&0&&\\ &&2&2&2&2&1&0&1&1&4&\\ &&&&&&2&1&1&1&&\\ \hline &\color{blue}{\text{Total}}&&&&&&&&&&\color{blue}{28} \end{array}
Comment:
The table is organised in blocks, one per pair of permutations. Here we list all $\left(2!\right)^2=4$ pairs, so the saving is not visible yet, but in fact we need only one representative for each cycle type, since only the numbers of cycles of each length enter the formula. The column result gives the summands in (2). Here are the gory details:
Columns: $\alpha,\beta$ correspond to a pair of permutations $(\alpha,\beta)$ which is used as index in the outer sum of (2).
Columns: $p,q$ are the indices of the products in (2)
Column: $s$ gives the divisors of $\operatorname{lcm}(p,q)$
Columns: $j_s(\alpha),j_p(\alpha),j_q(\beta)$ list the numbers of cycles of length $s$, $p$ and $q$ respectively
Column: $\text{factors}$ gives $$\left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}$$
Column: $\text{result}$ calculates finally the product
$$\prod_{p=1}^2\prod_{q=1}^2 \left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}$$
Since the total of the table is $28$ we finally conclude according to (2) \begin{align*} \color{blue}{a(2,2)=\frac{1}{4}\cdot 28=7} \end{align*} and the claim follows.
$$ $$
Case: $n=3$:
We proceed as in the case $n=2$ and show that the following is valid \begin{align*} \color{blue}{a(3,3)=638} \end{align*}
In order to calculate $a(3,3)$ we consider according to (1)
\begin{align*} a(3,3)=\frac{1}{\left(3!\right)^2}\sum_{(\alpha,\beta)\in S_3^2}\prod_{p=1}^3\prod_{q=1}^3 \left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}\tag{3} \end{align*}
We list the permutations of $S_3=\{\operatorname{id},(12),(13),(23),(123),(132)\}$ in cycle notation and write a table with the number of cycles of each length for each permutation. We also write the corresponding monomial from the cycle index. \begin{array}{l|cccc} \pi&Z(S_3)&j_1(\pi)&j_2(\pi)&j_3(\pi)\\ \hline id&z_1^3&3&0&0\\ (12)&3z_1z_2&1&1&0\\ (123)&2z_3&0&0&1\\ \end{array}
Note: The factors $1,3$ and $2$ in the column $Z(S_3)$ indicate the number of different permutations of the corresponding cycle type. We will use this fact to considerably reduce the calculation of the number of summands in (3).
In the following it is sufficient to calculate tables for the nine pairs \begin{align*} \{id,(12),(123)\}\times\{id,(12),(123)\} \end{align*} since the cycle index provides the supplementary information we need to calculate the complete sum.
Note that in the main table above there is some redundancy to ease traceability. We now use a somewhat more compact notation to ease readability and keep the space small.
Table: $j_s(\pi), j_p(\pi),j_q(\pi)$ \begin{array}{cc|cc|ccc|ccc|ccc} &&&&&\pi=id&&&\pi=(12)&&&\pi=(123)&\\ p&q&[p,q]&s&j_s&j_p&j_q&j_s&j_p&j_q&j_s&j_p&j_q\\ \hline 1&1&1&1&3&3&3&1&1&1&0&0&0\\ &2&2&1&3&3&0&1&1&1&0&0&0\\ &&&2&0&&&1&&&0&&\\ &3&3&1&3&3&0&1&1&0&0&0&1\\ &&&3&0&&&0&&&1&&\\ 2&1&2&1&3&0&3&1&1&1&0&0&0\\ &&&2&0&&&1&&&0&&\\ &2&2&1&3&0&0&1&1&1&0&0&0\\ &&&2&0&&&1&&&0&&\\ &3&6&1&3&0&0&1&1&0&0&0&1\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ 3&1&3&1&3&0&3&1&0&1&0&1&0\\ &&&3&0&&&0&&&1&&\\ &2&6&1&3&0&0&1&0&1&0&1&0\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ &3&3&1&3&0&0&1&0&0&0&1&1\\ &&&3&0&&&0&&&1&&\\ \end{array}
The table above provides all information necessary to calculate the summands in (3) for each of the nine pairs of permutations.
A typical block is given here for $((12),(12))$, in the same layout as the four blocks in the case $n=2$; a summary table follows below.
Table: $\{(12)\}\times\{(12)\}$
\begin{array}{cc|cccc|cc|cc|rr} \alpha&\beta&p&q&[p,q]&<p,q>&s&j_s(\alpha)&j_p(\alpha)&j_q(\beta)&\text{factors}&\text{result}\\ \hline (12)&(12)&1&1&1&1&1&1&1&1&1&81\\ &&&2&2&1&1&1&1&1&3&\\ &&&&&&2&1&&&&\\ &&&3&3&1&1&1&1&0&1&\\ &&&&&&3&0&&&&\\ &&2&1&2&1&1&1&1&1&3&\\ &&&&&&2&1&&&&\\ &&&2&2&2&1&1&1&1&9&\\ &&&&&&2&1&&&&\\ &&&3&6&1&1&1&1&0&1&\\ &&&&&&2&1&&&&\\ &&&&&&3&0&&&&\\ &&3&1&3&1&1&1&0&1&1&\\ &&&&&&3&0&&&&\\ &&&2&6&1&1&1&0&1&1&\\ &&&&&&2&1&&&&\\ &&&&&&3&0&&&&\\ &&&3&3&3&1&1&0&0&1&\\ &&&&&&3&0&&&&\\ \end{array}
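A block like the one above can be reproduced mechanically. The following is a minimal sketch under my own naming (`summand`, `ipow` and `gcd_int` are hypothetical helpers, and cycle counts are assumed to be stored as `ja[1..n]`, `jb[1..k]`); it multiplies the factors of one pair of cycle types, restricting $s$ to $s\le n$ since longer cycles cannot occur.

```c
#include <stdint.h>

/* Integer power by repeated multiplication; exp = 0 gives 1, so 0^0 = 1. */
static uint64_t ipow(uint64_t base, int exp)
{
    uint64_t result = 1;
    while (exp-- > 0)
        result *= base;
    return result;
}

static int gcd_int(int a, int b)
{
    while (b != 0) { int t = a % b; a = b; b = t; }
    return a;
}

/* Product over p = 1..n, q = 1..k of
   ( sum_{s | lcm(p,q), s <= n} s*ja[s] ) ^ ( ja[p] * jb[q] * gcd(p,q) ).
   Index 0 of ja and jb is unused.  No overflow checks. */
uint64_t summand(int n, int k, const int *ja, const int *jb)
{
    uint64_t product = 1;
    int p, q, s;

    for (p = 1; p <= n; p++) {
        for (q = 1; q <= k; q++) {
            int g = gcd_int(p, q);
            int l = p / g * q;               /* lcm(p,q) */
            uint64_t base = 0;
            for (s = 1; s <= l && s <= n; s++)
                if (l % s == 0)
                    base += (uint64_t)s * (uint64_t)ja[s];
            product *= ipow(base, ja[p] * jb[q] * g);
        }
    }
    return product;
}
```

With `ja = jb = {0,1,1,0}` (the cycle type of $(12)$ in $S_3$) this returns $81$, matching the result column of the block.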
$$ $$
Summary:
To account for all summands of (3) we list the result of each block together with the multiplicity of each permutation according to its cycle type. For example, the permutation $(12)$ has cycle type $z_1z_2$ and there are three permutations of this type, $\{(12),(13),(23)\}$, so we take a factor $3$.
\begin{array}{ll|r|cc|r} \alpha&\beta&\text{res}&\text{m}_{\alpha}&\text{m}_{\beta}&\text{res}\cdot \text{m}_{\alpha}\cdot \text{m}_{\beta}\\ \hline id&id&19683&1&1&19683\\ id&(12)&729&1&3&2187\\ id&(123)&27&1&2&54\\ \hline (12)&id&27&3&1&81\\ (12)&(12)&81&3&3&729\\ (12)&(123)&3&3&2&18\\ \hline (123)&id&27&2&1&54\\ (123)&(12)&9&2&3&54\\ (123)&(123)&27&2&2&108\\ \hline \color{blue}{\text{Total}}&&&&&\color{blue}{22968}\\ \end{array}
Since the total of the table is $22968$ we finally conclude according to (3) \begin{align*} \color{blue}{a(3,3)=\frac{1}{36}\cdot 22968=638} \end{align*} and the claim follows.
$$ $$
Case: $n=4$:
We proceed as in the cases above and show that the following is valid \begin{align*} \color{blue}{a(4,4)=7643021} \end{align*}
In order to calculate $a(4,4)$ we consider according to (1)
\begin{align*} a(4,4)=\frac{1}{\left(4!\right)^2}\sum_{(\alpha,\beta)\in S_4^2}\prod_{p=1}^4\prod_{q=1}^4 \left(\sum_{s|[p,q]}sj_s(\alpha)\right)^{j_p(\alpha)j_q(\beta)\langle p,q\rangle}\tag{4} \end{align*}
We list the permutations of $S_4$ in cycle notation, one representative per cycle type, and write a table with the number of cycles of each length for each permutation. We also write the corresponding monomial from the cycle index. \begin{array}{l|ccccc} \pi&Z(S_4)&j_1(\pi)&j_2(\pi)&j_3(\pi)&j_4(\pi)\\ \hline id&z_1^4&4&0&0&0\\ (12)&6z_1^2z_2&2&1&0&0\\ (123)&8z_1z_3&1&0&1&0\\ (12)(34)&3z_2^2&0&2&0&0\\ (1234)&6z_4&0&0&0&1\\ \end{array}
Note: The factors $1,6,8,3$ and $6$ in the column $Z(S_4)$ indicate the number of different permutations of the corresponding cycle type. We will use this fact to considerably reduce the calculation of the number of summands in (4).
In the following it is sufficient to calculate tables for the $25$ pairs \begin{align*} \{id,(12),(123),(12)(34),(1234)\}\times\{id,(12),(123),(12)(34),(1234)\} \end{align*} since the cycle index provides the supplementary information we need to calculate the complete sum.
Note that in the main table of $n=2$ above there is some redundancy to ease traceability. We now use analogously to $n=3$ above a somewhat more compact notation to ease readability and keep the space small.
Table: $j_s(\pi), j_p(\pi),j_q(\pi)$ \begin{array}{cc|cc|ccc|ccc|ccc} &&&&&\pi=id&&&\pi=(12)&&&\pi=(123)&\\ p&q&[p,q]&s&j_s&j_p&j_q&j_s&j_p&j_q&j_s&j_p&j_q\\ \hline 1&1&1&1&4&4&4&2&2&2&1&1&1\\ 1&2&2&1&4&4&0&2&2&1&1&1&0\\ &&&2&0&&&1&&&0&&\\ 1&3&3&1&4&4&0&2&2&0&1&1&1\\ &&&3&0&&&0&&&1&&\\ 1&4&4&1&4&4&0&2&2&0&1&1&0\\ &&&2&0&&&1&&&0&&\\ &&&4&0&&&0&&&0&&\\ 2&1&2&1&4&0&4&2&1&2&1&0&1\\ &&&2&0&&&1&&&0&&\\ 2&2&2&1&4&0&0&2&1&1&1&0&0\\ &&&2&0&&&1&&&0&&\\ 2&3&6&1&4&0&0&2&1&0&1&0&1\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ 2&4&4&1&4&0&0&2&1&0&1&0&0\\ &&&2&0&&&1&&&0&&\\ &&&4&0&&&0&&&0&&\\ 3&1&3&1&4&0&4&2&0&2&1&1&1\\ &&&3&0&&&0&&&1&&\\ 3&2&6&1&4&0&0&2&0&1&1&1&0\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ 3&3&3&1&4&0&0&2&0&0&1&1&1\\ &&&3&0&&&0&&&1&&\\ 3&4&12&1&4&0&0&2&0&0&1&1&0\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ &&&4&0&&&0&&&0&&\\ 4&1&4&1&4&0&4&2&0&2&1&0&1\\ &&&2&0&&&1&&&0&&\\ &&&4&0&&&0&&&0&&\\ 4&2&4&1&4&0&0&2&0&1&1&0&0\\ &&&2&0&&&1&&&0&&\\ &&&4&0&&&0&&&0&&\\ 4&3&12&1&4&0&0&2&0&0&1&0&1\\ &&&2&0&&&1&&&0&&\\ &&&3&0&&&0&&&1&&\\ &&&4&0&&&0&&&0&&\\ 4&4&4&1&4&0&0&2&0&0&1&0&0\\ &&&2&0&&&1&&&0&&\\ &&&4&0&&&0&&&0&&\\ &&&&&&&&&&&&\\ &&&&&&&&&&&&\\ \end{array}
$$ $$
Table (cont.): $j_s(\pi), j_p(\pi),j_q(\pi)$ \begin{array}{cc|cc|ccc|cccccc} &&&&&\pi=(12)(34)&&&\pi=(1234)&\\ p&q&[p,q]&s&j_s&j_p&j_q&j_s&j_p&j_q\\ \hline 1&1&1&1&0&0&0&0&0&0&&&\\ 1&2&2&1&0&0&2&0&0&0&&&\\ &&&2&2&&&0&&&&&\\ 1&3&3&1&0&0&0&0&0&0&&&\\ &&&3&0&&&0&&&&&\\ 1&4&4&1&0&0&0&0&0&1&&&\\ &&&2&2&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 2&1&2&1&0&2&0&0&0&0&&&\\ &&&2&2&&&0&&&&&\\ 2&2&2&1&0&2&2&0&0&0&&&\\ &&&2&2&&&0&&&&&\\ 2&3&6&1&0&2&0&0&0&0&&&\\ &&&2&2&&&0&&&&&\\ &&&3&0&&&0&&&&&\\ 2&4&4&1&0&2&0&0&0&1&&&\\ &&&2&2&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 3&1&3&1&0&0&0&0&0&0&&&\\ &&&3&0&&&0&&&&&\\ 3&2&6&1&0&0&2&0&0&0&&&\\ &&&2&2&&&0&&&&&\\ &&&3&0&&&0&&&&&\\ 3&3&3&1&0&0&0&0&0&0&&&\\ &&&3&0&&&0&&&&&\\ 3&4&12&1&0&0&0&0&0&1&&&\\ &&&2&2&&&0&&&&&\\ &&&3&0&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 4&1&4&1&0&0&0&0&1&0&&&\\ &&&2&2&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 4&2&4&1&0&0&2&0&1&0&&&\\ &&&2&2&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 4&3&12&1&0&0&0&0&1&0&&&\\ &&&2&2&&&0&&&&&\\ &&&3&0&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ 4&4&4&1&0&0&0&0&1&1&&&\\ &&&2&2&&&0&&&&&\\ &&&4&0&&&1&&&&&\\ \end{array}
The table above provides all information necessary to calculate the summands in (4) for each of the $25$ pairs of permutations.
Hint: Observe, that we only need to consider $25$ pairs of permutations instead of $\left(4!\right)^2=576$ pairs which are summed up in (4). We will consider all other permutations by respecting multiplicities given by the cycle-index $Z(S_4)$.
A typical block is given here for $((123),(123))$, in the same layout as the four blocks in the case $n=2$; a summary table follows below.
Table: $\{(123)\}\times\{(123)\}$
\begin{array}{cc|cccc|cc|cc|rr} \alpha&\beta&p&q&[p,q]&<p,q>&s&j_s(\alpha)&j_p(\alpha)&j_q(\beta)&\text{factors}&\text{result}\\ \hline (123)&(123)&1&1&1&1&1&1&1&1&1&1024\\ &&1&2&2&1&1&1&1&0&1&\\ &&&&&&2&0&&&&\\ &&1&3&3&1&1&1&1&1&4&\\ &&&&&&3&1&&&&\\ &&1&4&4&1&1&1&1&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&4&0&&&&\\ &&2&1&2&1&1&1&0&1&1&\\ &&&&&&2&0&&&&\\ &&2&2&2&2&1&1&0&0&1&\\ &&&&&&2&0&&&&\\ &&2&3&6&1&1&1&0&1&1&\\ &&&&&&2&0&&&&\\ &&&&&&3&1&&&&\\ &&2&4&4&2&1&1&0&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&4&0&&&&\\ &&3&1&3&1&1&1&1&1&4&\\ &&&&&&3&1&&&&\\ &&3&2&6&1&1&1&1&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&3&1&&&&\\ &&3&3&3&3&1&1&1&1&64&\\ &&&&&&3&1&&&&\\ &&3&4&12&1&1&1&1&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&3&1&&&&\\ &&&&&&4&0&&&&\\ &&4&1&4&1&1&1&0&1&1&\\ &&&&&&2&0&&&&\\ &&&&&&4&0&&&&\\ &&4&2&4&2&1&1&0&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&4&0&&&&\\ &&4&3&12&1&1&1&0&1&1&\\ &&&&&&2&0&&&&\\ &&&&&&3&1&&&&\\ &&&&&&4&0&&&&\\ &&4&4&4&4&1&1&0&0&1&\\ &&&&&&2&0&&&&\\ &&&&&&4&0&&&&\\ \end{array}
$$ $$
Summary:
To account for all summands of (4) we list the result of each block together with the multiplicity of each permutation according to its cycle type. For example, the permutation $(12)$ has cycle type $z_1^2z_2$ and there are six permutations of this type, $\{(12),(13),(14),(23),(24),(34)\}$, so we take a factor $6$.
\begin{array}{ll|r|cc|r} \alpha&\beta&\text{res}&\text{m}_{\alpha}&\text{m}_{\beta}&\text{res}\cdot \text{m}_{\alpha}\cdot \text{m}_{\beta}\\ \hline id&id&4294967296&1&1&4294967296\\ id&(12)&16777216&1&6&100663296\\ id&(123)&65536&1&8&524288\\ id&(12)(34)&65536&1&3&196608\\ id&(1234)&256&1&6&1536\\ \hline (12)&id&65536&6&1&393216\\ (12)&(12)&65536&6&6&2359296\\ (12)&(123)&256&6&8&12288\\ (12)&(12)(34)&65536&6&3&1179648\\ (12)&(1234)&256&6&6&9216\\ \hline (123)&id&256&8&1&2048\\ (123)&(12)&64&8&6&3072\\ (123)&(123)&1024&8&8&65536\\ (123)&(12)(34)&16&8&3&384\\ (123)&(1234)&4&8&6&192\\ \hline (12)(34)&id&65536&3&1&196608\\ (12)(34)&(12)&65536&3&6&1179648\\ (12)(34)&(123)&256&3&8&6144\\ (12)(34)&(12)(34)&65536&3&3&589824\\ (12)(34)&(1234)&256&3&6&4608\\ \hline (1234)&id&256&6&1&1536\\ (1234)&(12)&256&6&6&9216\\ (1234)&(123)&16&6&8&768\\ (1234)&(12)(34)&256&6&3&4608\\ (1234)&(1234)&256&6&6&9216\\ \hline \color{blue}{\text{Total}}&&&&&\color{blue}{4402380096}\\ \end{array}
Since the total of the table is $4402380096$ we finally conclude according to (4) \begin{align*} \color{blue}{a(4,4)=\frac{1}{576}\cdot 4402380096=7643021} \end{align*} and the claim follows.
Conclusion:
- To calculate $a(n,n)$ we do not need all $(n!)^2$ summands. It suffices to calculate one summand for each pair of cycle types, that is, as many as the square of the number of monomials of the cycle index, and then to weight each with the multiplicities from the cycle index.
\begin{array}{c|rr} n&(n!)^2&\left(\text{via }Z(S_n)\right)^2\\ \hline 2&4&4\\ 3&36&9\\ 4&576&25\\ \end{array}
- It seems feasible to find an efficient implementation based upon this formula. A nice program whose results agree with this answer is already given by @ScottBurns.
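The right-hand column of the table in the conclusion is the square of the number of integer partitions of $n$, since cycle types of $S_n$ correspond to partitions of $n$. A small sketch for counting them, using the standard coin-change style recurrence (the name `partition_count` and the bound `PC_MAXN` are my own choices):

```c
#include <stdint.h>

#define PC_MAXN 64   /* assumed upper bound on n for this sketch */

/* Number of integer partitions of n, i.e. the number of cycle types
   (conjugacy classes) of S_n.  ways[t] counts the partitions of t
   that use only the part sizes considered so far. */
uint64_t partition_count(int n)
{
    uint64_t ways[PC_MAXN + 1] = {0};
    int part, total;

    ways[0] = 1;
    for (part = 1; part <= n; part++)
        for (total = part; total <= n; total++)
            ways[total] += ways[total - part];
    return ways[n];
}
```

For $n=2,3,4$ this gives $2,3,5$, whose squares $4,9,25$ are the entries of the table.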
This is computationally feasible for F(4,4). I wouldn't attempt it for higher values though.
For F(4,4), represent the functions as 32-bit strings (really as 16-dibit strings but I'm thinking of computation more than algebra here).
The simple approach is to sieve. Maintain a Bloom filter of all the functions you've visited (here this is simply a $2^{32}$-long array of bits, one per function).
Take each $g$ that you've not yet visited, calculate its orbit under the action of $S_4\times S_4$, and sieve out all the functions you reach.
I haven't costed this but it's no worse than around $2^{41}$ permutations, so should run inside a day on a single core processor. I expect it to be rather better than that as the sieving gives a lot of cut-down.
Note also that the equivalence classes preserve the partition of [16] into 4 parts as you look across the image of $g$. You can use that to subdivide your search over partition classes, which will save on memory but not on time. This may be useful for tackling larger values where memory becomes an issue.
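Here is a rough sketch of the sieve for small $n$ only ($n\le 3$), not the code I am running. Two things in it are assumptions: I use one byte per function rather than one bit, and I take the action to be $g(\alpha x,\beta y)=\alpha f(x,y)$, which is the action that the Harary/Palmer formula in the other answers counts fixed points for. The names `count_orbits` and `gen_perms` are made up for this illustration.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SV_MAX_N 3        /* sketch only: n^(n*n) bytes of memory are needed */
#define SV_MAX_PERMS 6    /* 3! */

static int perms[SV_MAX_PERMS][SV_MAX_N];
static int perm_count;

/* Collect all permutations of {0..n-1} by recursive swapping. */
static void gen_perms(int *a, int pos, int n)
{
    int i, t;
    if (pos == n) {
        memcpy(perms[perm_count++], a, (size_t)n * sizeof(int));
        return;
    }
    for (i = pos; i < n; i++) {
        t = a[pos]; a[pos] = a[i]; a[i] = t;
        gen_perms(a, pos + 1, n);
        t = a[pos]; a[pos] = a[i]; a[i] = t;
    }
}

/* Count orbits of functions f:[n]x[n]->[n] under pairs (alpha,beta)
   acting by g(alpha(x), beta(y)) = alpha(f(x,y)).
   A function is encoded as a base-n number with n*n digits.
   Returns 0 if n is out of range or the allocation fails. */
uint64_t count_orbits(int n)
{
    int a[SV_MAX_N], f[SV_MAX_N * SV_MAX_N], g[SV_MAX_N * SV_MAX_N];
    uint64_t total = 1, code, orbits = 0;
    unsigned char *seen;
    int cells, i, x, y, ai, bi;

    if (n < 1 || n > SV_MAX_N) return 0;
    cells = n * n;
    for (i = 0; i < cells; i++) total *= (uint64_t)n;   /* n^(n*n) functions */
    for (i = 0; i < n; i++) a[i] = i;
    perm_count = 0;
    gen_perms(a, 0, n);

    seen = calloc((size_t)total, 1);
    if (seen == NULL) return 0;

    for (code = 0; code < total; code++) {
        uint64_t c = code;
        if (seen[code]) continue;
        orbits++;                          /* unvisited: a new orbit representative */
        for (i = 0; i < cells; i++) { f[i] = (int)(c % (uint64_t)n); c /= (uint64_t)n; }
        for (ai = 0; ai < perm_count; ai++) {
            for (bi = 0; bi < perm_count; bi++) {
                uint64_t image = 0;
                for (x = 0; x < n; x++)
                    for (y = 0; y < n; y++)
                        g[perms[ai][x] * n + perms[bi][y]] = perms[ai][f[x * n + y]];
                for (i = cells - 1; i >= 0; i--)
                    image = image * (uint64_t)n + (uint64_t)g[i];
                seen[image] = 1;           /* sieve out everything in the orbit */
            }
        }
    }
    free(seen);
    return orbits;
}
```

Under the stated action, `count_orbits(2)` and `count_orbits(3)` should reproduce the values $7$ and $638$ from the formula-based answers, which makes this a handy cross-check before committing a day of CPU time to the $4\times 4$ case.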
UPDATE: I've written it and am now running. Very quick and dirty code, so not guaranteed bug-free! Will be useful to compare results with Nitin.