Is there a Birkhoff's Ergodic Theorem for multivariate functions?
Let me expand here the point made in the the comments. First, I misread slightly your question. I had in mind two-parametric ergodic means given by $$ A_{n,m}(f) = \sum_{i = 0}^{n - 1} \sum_{j = 0}^{m - 1} f(T^i x, T^j y). $$ Not the means $A_{n}(f) = A_{n,n}(f)$. For the first you need to be in $L \log L(\Omega)$. For the second I do not have a definitive answer. It may be enough to be in $L^1$, see the edit bellow.
First, the following result is known.
Theorem: For every $f \in L \log L(\Omega,\mu)$ , where $(\Omega, \mu) = (\Omega_1 \times \Omega_2, \mu_1 \otimes \mu_2)$ and ergodic transformation $T_i:X_i \to X_i$, the two parametric ergodic averages $$ A_{n,m}(f) = \sum_{i = 0}^{n - 1} \sum_{j = 0}^{m - 1} f(T^i x, T^j y) $$ converge almost everywhere $A_{n,m}(f) \to f$.
The proof uses the following steps
- For each $i$ use the maximal weak-type $(1,1)$ maximal ergodic inequality.
- Use real interpolation to get that: $$ \Big\| \sup_{n \geq 0} A^i_{n}(|f|) \Big\|_p \lesssim \max \Big\{ 1, \frac1{p - 1} \Big\} \, \| f \|_p $$
- use Yano's extrapolation [Y] to obtain that the above maximal operator above maps $L \log L(\Omega)$ into $L^1(\Omega)$ for every $i$.
Then, copying the argument in [JMZ], you have that the map $f \mapsto f^\ast$ given by $$ f^\ast(x,y) = \limsup_{n, m \to \infty} A_{n,m}(|f|) $$ maps $L \log L(\Omega)$ into $L_1(\Omega)$. Just by composition, we have that $$ \limsup_{n, m \to \infty} A_{n,m}(|f|) \leq \limsup_{n \to \infty} A^1_{n} \bigg( \underbrace{\sup_{m \geq 0} A_m^2|f|}_{g} \bigg) $$ But the function $g$ is in $L^1(\Omega)$ as the function $f$ is in $L \log L$. Now, using that for every function in $L^1(\Omega)$ we have almost everywhere convergence (and therefore the limsup is exchangeable by the lim) we can conclude. The fact that $f^\ast$ is in $L^1$ gives almost everywhere convergence by the same argument that is used with maximal functions.
I will go further and conjecture that the following is true (probably known to the experts):
Open (to my knowledge) For every $\varphi$ with $\varphi \in o(x \log x)$ we have that there is an element $f \in L_\varphi(\Omega)$, such that $A_{n,m}(f) \not\to f$.
It is known, see [JMZ, Theorem 8]. That this holds in the case of differentiability of integrals. I will try to adapt the argument to the ergodic case. It is also likely to be true, since something similar holds for martingales [G].
[G] Gundy, R. F., On the class L log L, martingales, and singular integrals, Stud. Math. 33, 109-118 (1969). ZBL0181.44202.
[JMZ] Jessen, B.; Marcinkiewicz, J.; Zygmund, A., Note on the differentiability of multiple integrals., Fundamenta Math. 25, 217-234 (1935). ZBL61.0255.01.
[Y] Yano, Shigeki, Notes on Fourier analysis. XXIX. An extrapolation theorem, J. Math. Soc. Japan 3, 296-305 (1951). ZBL0045.17901.
P.D.: For your purposes, perhaps it will be enough if you take the usual ergodic averages with respect to $$ S = p (\mathrm{id} \otimes T) + q (T \otimes \mathrm{id}) $$ for $p + q = 1$. Its ergodic averages will be weighted summations concentrated as Gaussian around the diagonal $i = j$.
Edit I found a (almost) complete solution in $L^1$ in the following paper:
- Lindenstrauss, Elon, Pointwise theorems for amenable groups., Invent. Math. 146, No. 2, 259-295 (2001). ZBL1038.37004.
You need to interpret $(i,j) \mapsto T^i \otimes T^j$ as an action of $\mathbb{Z}^2$ and use that $[0,N]$ is a well-tempered Foelner sequence (in the sense of that paper). That would give you almost everywhere convergence for the means $A_{n,n}(f)$, for every $f \in L^1(\Omega)$.
Rephrased in an other way, the question is about the convergence of $$ \frac 1{n^2}\sum_{i=1}^n\sum_{j=1}^nf\left(X_i,X_j\right), $$ where $\left(X_i\right)_{i\geqslant 1}$ is a strictly stationary sequence. In order to hope something, we have to assume that $f\left(X_i,X_j\right)$ is integrable for all $i,j$. The diagonal part $\sum_{i=1}^nf\left(X_i,X_i\right)$ has, by the classical ergodic theorem, a negligible contribution hence we are reduced to study the asymptotic behavior of $$ \frac 1{n^2}\sum_{1\leqslant i<j\leqslant n}f\left(X_i,X_j\right). $$ (the part with $j>i$ follows by using $g\colon (u,v)\mapsto f(v,u)$). The keyword is then $U$-statistics. A version of the ergodic theorem could be $$ \frac 1{n^2}\sum_{1\leqslant i<j\leqslant n}\left(f\left(X_i,X_j\right)-\mathbb E\left[f\left(X_i,X_j\right)\right]\right)\to 0 \mbox{ a.s.} $$ but the question seems to be open.
There are some results when the product of marginal distribution functions is continuous (see
Borovkova, S.; Burton, R.; Dehling, H. Consistency of the Takens estimator for the correlation dimension. Ann. Appl. Probab. 9 (1999), no. 2, 376--390. doi:10.1214/aoap/1029962747. https://projecteuclid.org/euclid.aoap/1029962747).