How to implement Horner’s scheme for multivariate polynomials?
The paper you cite, "On the multivariate Horner scheme" (Pena, Sauer) has an explicit algorithm specified on p.3. The remaining challenge is to penetrate the notation and conventions in the paper laid out in the first three pages far enough to turn their algorithm presentation into code.
It also seems that this paper (just reading the abstract) specifies an explicit algorithm: "Evaluation of Multivariate Polynomials and Their Derivatives," J. Carnicer and M. Gasca, Mathematics of Computation, Vol. 54, No. 189 (Jan., 1990), pp. 231-243. ResearchGate link to full text.
Abstract. An extension of Horner's algorithm to the evaluation of m-variate polynomials and their derivatives is obtained. The schemes of computation are represented by trees because this type of graph describes exactly in which order the computations must be done. Some examples of algorithms for one and two variables are given.
I implemented this in Python: multivar_horner
You can look at the approach used there and port it to Fortran.
NOTE: In contrast to the one dimensional case there are multiple possible Horner factorisations of multivariate polynomials. One can allow a search over the possible factorisations to find a minimal representation as described HERE. In most cases however it suffices to use a heuristic to find a single "good" factorisation. multivar_horner implements the greedy heuristic described in "Greedy Algorithms for Optimizing Multivariate Horner Schemes".
The authors of some related publications (including the above mentioned) claim to have an implementation of their proposed algorithms, but I was not able to find any publicly available ones.