Words built from $\{0,1,2\}$ with restrictions which are not so easy to accomodate.

Fortunately with this problem we can still use a decomposition as with regular expressions which becomes less and less useful as the complexity of the language grows.

First introduce the language of strings of zeros and twos having run length at most $k$ starting with a fixed digit

$$H(z) = z\frac{1-z^k}{1-z} \sum_{q\ge 0} \left(z\frac{1-z^k}{1-z} z\frac{1-z^k}{1-z} \right)^q \frac{1-z^{k+1}}{1-z}.$$

For the main generating function start the string with a chain of zeros or twos or the empty string, follow this by a loop anchored at blocks of ones and finally add a possible string of ones at the end to get

$$G(z) = (1+2H(z)) \sum_{q\ge 0} \left(z\frac{1-z^{k-1}}{1-z} 2 H(z) + z^k (1+v) H(z)\right)^q \frac{1-z^{k+1}}{1-z}.$$

Here we have used $v$ to mark the critical transition $1^k 2.$ To simplify this start with $H(z)$ which yields

$$H(z) = z\frac{1-z^k}{1-z} \frac{1}{1-z^2 (1-z^k)^2/(1-z)^2} \frac{1-z^{k+1}}{1-z} \\ = z \frac{1-z^{k}-z^{k+1}+z^{2k+1}} {1-2z+2z^{k+2}-z^{2k+2}}.$$

Some simplification on $G(z)$ produces

$$G(z) = \frac{(1+2H(z))(1-z^{k+1})}{1-H(z)(2z-2z^k+z^k(1+v)(1-z))/(1-z)} \frac{1}{1-z} \\ = \frac{(1+2H(z))(1-z^{k+1})}{1-z-H(z)(2z-2z^k+z^k(1+v)(1-z))}.$$

The reader is strongly urged to perform additional simplification on this. Continuing we seek the generating function

$$Q(z) = \left. G(z) - G(z, 0) \right|_{v=1}.$$

This yields

$$Q(z) = \frac{(1+2H(z))(1-z^{k+1})}{1-z-H(z)(2z-2z^{k+1})} - \frac{(1+2H(z))(1-z^{k+1})}{1-z-H(z)(2z-z^k-z^{k+1})}.$$

E.g. we obtain for $k=3$ the following result: $$Q_3(z) = {\frac { \left( {z}^{2}+z+1 \right) {z}^{4} \left( z+1 \right) \left( {z}^{2}+1 \right) }{ \left( {z}^{6}+3\,{z}^ {5}+5\,{z}^{4}+5\,{z}^{3}+3\,{z}^{2}+z-1 \right) \left( 2\, {z}^{3}+2\,{z}^{2}+2\,z-1 \right) }}.$$

Starting at $n=1$ this yields the sequence

$$0, 0, 0, 1, 5, 21, 80, 287, 993, 3347, 11067, 36055, \\ 116089, 370222, 1171353,\ldots$$

We will not show $Q_4(z)$ here. The sequence is $$0, 0, 0, 0, 1, 5, 21, 81, 296, 1043, 3585, 12095, \\ 40221, 132225, 430633, 1391623,\ldots$$

Using $Q_5(z)$ we obtain $$0, 0, 0, 0, 0, 1, 5, 21, 81, 297, 1052, 3635, 12333, \\ 41255, 136449, 447147, 1454091,\ldots$$

These sequences have been verified with a total enumeration routine which is included with the Maple code for this problem, which we now present. The total enumeration routine is practical for up to about $n=11.$

RL :=
proc(n, k)
    option remember;
    local ind, d, pos, cur, run, runs, res;

    res := 0;

    for ind from 3^n to 2*3^n-1 do
        d := convert(ind, base, 3);

        cur := -1; pos := 1;
        run := []; runs := [];


        while pos <= n do
            if d[pos] <> cur then
                if nops(run) > 0 then
                    runs := [op(runs), run];
                fi;

                cur := d[pos];
                run := [cur];
            else
                run := [op(run), cur];
            fi;

            pos := pos + 1;
        od;

        runs := [op(runs), run];

        if max(map(r->nops(r), runs)) <= k then
            for pos to nops(runs) -1 do
                run := runs[pos];

                if run[1] = 1 and nops(run) = k
                and runs[pos+1][1] = 2 then
                    break;
                fi;
            od;

            if pos < nops(runs) then
                res := res + 1;
            fi;
        fi;
    od;

    res;
end;

RNG := (minv, maxv) -> add(z^q, q=minv..maxv);

H :=
proc(k)
    RNG(1,k)*
    sum((RNG(1,k)*RNG(1,k))^q, q=0..infinity)
    *RNG(0,k)
end;


H2 := 
proc(k) 
    z*(1-z^k-z^(k+1)+z^(2*k+1))
    /(1-2*z+2*z^(k+2)-z^(2*k+2));
end;

G := 
proc(k)
    (1+2*H(k))
    *sum((RNG(1,k-1)*2*H(k)
          + z^k*(1+v)*H(k))^q,
         q=0..infinity)
    *RNG(0, k);
end;

G2 := 
proc(k)
    (1+2*H2(k))*(1-z^(k+1))
    /(1-z-H2(k)*(2*z-2*z^k+z^k*(1+v)*(1-z)));
end;

Q :=
proc(k)
    subs(v=1, G(k)-subs(v=0, G(k)));
end;

Q2 :=
proc(k)
    (1+2*H2(k))*(1-z^(k+1))
    /(1-z-H2(k)*(2*z-2*z^(k+1)))
    - (1+2*H2(k))*(1-z^(k+1))
    /(1-z-H2(k)*(2*z-z^k-z^(k+1)));
end;

V :=
proc(n, k)
    option remember;

    coeftayl(Q2(k), z=0, n);
end;

Addendum. Observe that the initial segment of all these sequences is in fact the same. We can manually evaluate the case $n=p$ where $k+1\le p\le 2k.$ There is the case that the $1^k 2$ block is situated right at the beginning (position $q=0$). We may freely choose the letters to the right of the block which gives a contribution of $3^{p-k-1}.$ (Note that this stops working when $p\gt 2k$ because forbidden runs may appear to the right of the block.) If the block is at position $q$ where $1\le q\le p-k-1$ we must place a zero or a two just to the left of the block and may freely choose the remaining letters for a contribution of $2\times 3^{q-1} \times 3^{p-(k+1)-q} = 2 \times 3^{p-k-2}.$ Collecting this into a formula we obtain

$$3^{p-k-1} + 2 \sum_{q=1}^{p-k-1} 3^{q-1} 3^{p-(k+1)-q} = 3^{p-k-1} + 2 (p-k-1) 3^{p-k-2}.$$

We see that this depends only on the difference $m = p-k$ i.e.

$$3^{m-1} + 2 (m-1) 3^{m-2} = (2m+1) 3^{m-2}$$

so in fact we have the same initial segment for all $k.$ The sequence here is

$$1, 5, 21, 81, 297, 1053, 3645, 12393, 41553, 137781, \\ 452709, 1476225, 4782969,\ldots$$

which is OEIS A081038.


There is a general method to compute the generating function of any regular language $L$. In your case, $L$ would be $V^* - V^*(0^{k+1} + 1^{k+1} + 2^{k+1} + 1^k)V^*$, if I understood correctly.

The general method works as follows.

Step 1. Compute the minimal deterministic automaton of $L$. In your case, this automaton has $3k+1$ states, if I am not wrong.

Step 2. Compute an unambiguous regular expression for $L$. Recall that a regular expression $R$ is unambiguous if, for every $u \in L(R)$, there is only one $R$-parse of $u$. In other words, a union is unambiguous if and only if it is disjoint, a product $K = K_0K_1 \cdots K_r$ is unambiguous if every word of $K$ has a unique factorisation as $u = u_0u_1 \cdots u_r$, with $u_0 \in K_0$, ..., $u_r \in K_r$ and $K^*$ is unambiguous if the monoid $K^*$ is free of base $K$.

A standard way to obtain an unambiguous regular expression is to use Kleene's algorithm.

Step 3. The generating function $s(L)$ of $L$ can be now computed from the unambiguous regular expression by using the rules $s(u) = z^{|u|}$, where $|u|$ denotes the length of a word $u$, $s(L_0 + L_1) = s(L_0) + s(L_1)$, $s(L_0L_1) = s(L_0)s(L_1)$ and $s(L^*) = (1-s(L))^*$. For instance, if $L = (a + ba + bba + bbb)^*$ (an unambiguous expression), one gets $s(a) = z$, $s(ba) = z^2$, $s(bba) = s(bbb) = z^3$, and finally $$ s(L) = \frac{1}{1-(z+z^2+2z^3)} $$ Coming back to your case, I suggest to treat the cases $k=0, 1$ and $2$ by hand or with the help of a computer and then guess the general formula for any $k$ before trying a more formal proof.


Note: The instructive answer from @MarkoRiedel was worth a somewhat in-depth analysis. The outcome of this analysis serves here as supplementary information. We consider his approach in some detail, analyse the connection with Smirnov words and find some simplified representations.

The essence of his answer is a clever two step decomposition of the language under consideration. As small starter we look at a similar decomposition in a simpler context regarding binary words.

Binary words

Non-empty binary words built from an alphabet $\{0,2\}$ and starting with the letter $0$ admit following characterization. They decompose into blocks, each block starting with $2$ and followed by zero or more $0$s. The word starts with one or more $0$s. Here is an example with $|$ indicating the block decomposition.

\begin{align*} 000|2|20|200|20|20|2|2000|200|2|2|2|200 \end{align*}

The language $\mathcal{B}_0$ describing these words is

\begin{align*} \mathcal{B}_0=00^*\left(20^*\right)^* \end{align*} with star $^*$ denoting zero or more occurrences. The corresponding generating function is \begin{align*} B(z)&=\frac{z}{1-z}\sum_{q=0}^{\infty}\left(\frac{z}{1-z}\right)^q\\ &=\frac{z}{1-z}\frac{1}{1-\frac{z}{1-z}}\\ &=\frac{z}{1-2z}\\ &=z+2z^2+4z^3+8z^4+16z^5+32z^6+O(z^7) \end{align*} Now let's exchange the role of $0$ and $2$ to obtain the language $\mathcal{B}_2$ of words starting with $2$. \begin{align*} 222|0|02|022|02|02|0|0222|022|0|0|0|022 \end{align*}

The language $\mathcal{B}_2$ describing these words is

\begin{align*} \mathcal{B}_2=22^*\left(02^*\right)^* \end{align*} and the corresponding generating function is the same as before \begin{align*} B(z)=\frac{z}{1-2z} \end{align*} Observe that all binary words can be described as \begin{align*} \varepsilon\cup\mathcal{B}_0\cup\mathcal{B}_2\tag{1} \end{align*} which comprises either the empty word or a binary word starting with $0$ or starting with $2$. The generating function is accordingly \begin{align*} 1+\frac{2z}{1-2z}&=\frac{1}{1-2z}\\ &=1+2z+4z^2+8z^3+16z^4+32z^5+O(z^6) \end{align*}

We are now well prepared for the first step in the two-step approach of Marko's answer and derive $H(z)$.

Binary Smirnov words:

We again use the binary alphabet $\{0,2\}$. This time we are looking for words which have runs of length maximal $k$. We use a similar decomposition as we did for the language $\mathcal{B}_0$ above.

We consider zero or more blocks, each block starting with $2$ and followed by at least one or more $0$s. The words start with at least one zero and end with zero or more $2$s. The overall restriction for these words is that the maximum run length is $k$. This sounds complicated, but when we look at the formal language description, let's denote it $\mathcal{H}_0$, it is not that hard. \begin{align*} \mathcal{H}_0=00^{<k}\left(22^{<k}00^{<k}\right)^*2^{<{k+1}}\tag{2} \end{align*}

The pattern $00^{<k}$ denotes at least one $0$ followed by zero up to $k-1$ zeros. This way we create non-empty strings of $0$ with run length $\leq k$.

The pattern $\left(22^{<k}00^{<k}\right)^*$ denotes the block decomposition.

The corresponding generating function $H(z)$ for the language $\mathcal{H}_0$ can be directly translated from (2)
\begin{align*} H(z)&=z\frac{1-z^k}{1-z}\sum_{q=0}^{\infty}\left(z\frac{1-z^k}{1-z}\right)^{2q}\frac{1-z^{k+1}}{1-z}\\ &=z\frac{1-z^k}{1-z}\cdot\frac{1}{1-\left(z\frac{1-z^k}{1-z}\right)^2}\cdot\frac{1-z^{k+1}}{1-z}\\ &=z\frac{1-z^k-z^{k+1}+z^{2k+1}}{1-2z+2z^{k+2}-z^{2k+2}}\tag{3}\\ &=z\frac{1-z^k}{1-2z+z^{k+1}} \end{align*}

Note that the representation (3) is stated as $H(z)$ in Marko's answer. It can be further simplified since numerator and denominator both contain the factor $1-z^{k+1}$.

As we did in the first section, we exchange the roles of $0$ and $2$, define the language \begin{align*} \mathcal{H}_2=22^{<k}\left(00^{<k}22^{<k}\right)^*0^{<{k+1}} \end{align*} with the same corresponding generating series $H(z)$ \begin{align*} H(z)=z\frac{1-z^k}{1-2z+z^{k+1}} \end{align*}

If we consider the language consisting of the empty word together with $\mathcal{H}_0$ and $\mathcal{H}_2$ \begin{align*} \varepsilon\cup\mathcal{H}_0\cup\mathcal{H}_2\tag{4} \end{align*} we obtain all binary words having run length $\leq k$. Indeed, the generating function is \begin{align*} 1+2H(z)&=1+2z\frac{1-z^k}{1-2z+z^{k+1}}\\ &=\frac{1-z^{k+1}}{1-2z+z^{k+1}} \end{align*}

The binary Smirnov words $B_2(z)$ are the words over a binary alphabet without consecutive equal letters, i.e. having runs of maximum length $1$. Their generating function is specified as (see my question above) \begin{align*} B_2(z)&=\left(1-\frac{2z}{1+z}\right)^{-1}=\frac{1+z}{1-z}\\ &=1+2z+2z^2+2z^3+2z^4+2z^5+O(z^6) \end{align*} The generating function for all binary words having runs of length $\leq k$ is consequently \begin{align*} B_2\left(z\frac{1-z^{k}}{1-z}\right)&=\frac{1+z\frac{1-z^{k}}{1-z}}{1-z\frac{1-z^{k}}{1-z}}\\ &=\frac{1-z^{k+1}}{1-2z+z^{k+1}}\\ &=1+2H(z) \end{align*} It follows, that the decomposition (4) is indeed the set of all binary words having run length $\leq k$.

Let's have a short look at the binary words with maximum run lengths $k=2,3,4$. \begin{align*} B_2\left(z\frac{1-z^{3}}{1-z}\right)&=1+2z+4z^2+6z^3+10z^4+16z^5+O(z^6)\\ B_2\left(z\frac{1-z^{4}}{1-z}\right)&=1+2z+4z^2+8z^3+14z^4+26z^5+O(z^6)\\ B_2\left(z\frac{1-z^{5}}{1-z}\right)&=1+2z+4z^2+8z^3+16z^4+30z^5+O(z^6)\\ \end{align*}

Note that the factor $1+2H(z)$ is also part of the generating function $G(z)$ in Marko Riedels answer, which we are now going to derive. The idea is to use $1+2H(z)$, i.e. the language of all binary words with maximum run length $k$ as building blocks when generating ternary words with maximum run length $k$.

Ternary Smirnov words:

Remind we are looking for ternary words over the alphabet $\{0,1,2\}$ having maximum run length $\leq k$ and each occurrence of $1^k$ has to be followed by the letter $2$. We start by creating all ternary words having maximum run length $\leq k$. The decomposition into $\mathcal{H}_0$ and $\mathcal{H}_2$ then enables us, to select precisely the words with $1^k2$ and avoid words with $1^k0$.

We consider the following language \begin{align*} \left(\varepsilon+\mathcal{H}_0+\mathcal{H}_2\right)\left(11^{<k}\left(\mathcal{H}_0+\mathcal{H}_2\right)\right)^*1^{<k+1}\tag{5} \end{align*} starting with a binary, possibly empty string built from the alphabet $\{0,2\}$, followed by zero or more blocks each block starting with at least one $1$ followed by a non-empty string built from the alphabet $\{0,2\}$. A word may end with zero or more $1$s and the run length of each letter is $\leq k$. The corresponding generating function is \begin{align*} G(z)&=\left(1+2H(z)\right)\sum_{q=0}^\infty\left(z\frac{1-z^k}{1-z}2H(z)\right)^q\frac{1-z^{k+1}}{1-z}\tag{6}\\ &=\left(1+2H(z)\right)\frac{1}{1-z\frac{1-z^k}{1-z}2H(z)}\frac{1-z^{k+1}}{1-z}\\ &=\frac{1-z^{k+1}}{1-2z+z^{k+1}}\frac{1}{1-z\frac{1-z^k}{1-z}2z\frac{1-z^k}{1-2z+z^{k+1}}}\frac{1-z^{k+1}}{1-z}\\ &=\frac{1-z^{k+1}}{1-3z+2z^{k+1}} \end{align*}

Note that the representation (6) is stated as $G(z)$ in Marko's answer with the marker $v=1$. We can again derive a connection with Smirnov words.

The ternary Smirnov words $B_3(z)$ are the words over a three letter alphabet without consecutive equal letters, i.e. having runs of maximum length $1$. Their generating function is specified as (see my question above) \begin{align*} B_3(z)&=\left(1-\frac{3z}{1+z}\right)^{-1}=\frac{1+z}{1-2z}\\ &=1+3z+6z^2+12z^3+24z^4+48z^5+O(z^6) \end{align*} The generating function for all ternary words having runs of length $\leq k$ is consequently \begin{align*} B_3\left(z\frac{1-z^{k}}{1-z}\right)&=\frac{1+z\frac{1-z^{k}}{1-z}}{1-2z\frac{1-z^{k}}{1-z}}\\ &=\frac{1-z^{k+1}}{1-3z+2z^{k+1}}\\ &=G(z) \end{align*} It follows, that the decomposition (5) is indeed the set of all ternary words having run length $\leq k$.

Let's have a short look at the ternary words with maximum run lengths $k=2,3,4$. \begin{align*} B_3\left(z\frac{1-z^{3}}{1-z}\right)&=1+3z+9z^2+24z^3+66z^4+180z^5+O(z^6)\\ B_3\left(z\frac{1-z^{4}}{1-z}\right)&=1+3z+9z^2+27z^3+78z^4+228z^5+O(z^6)\\ B_3\left(z\frac{1-z^{5}}{1-z}\right)&=1+3z+9z^2+27z^3+81z^4+240z^5+O(z^6)\\ \end{align*}

Final step: The words with pattern $1^k2$

The final step is to isolate the words from (5) which contain $1^k2$ and no other runs of $1$ of length $k$. Due to the clever decomposition we can easily identify the language. Starting with the language in (5) we obtain \begin{align*} \left(\varepsilon\right.&\left.+\mathcal{H}_0+\mathcal{H}_2\right)\left(11^{<k}\left(\mathcal{H}_0+\mathcal{H}_2\right)\right)^*1^{<k+1}\\ &=\left(\varepsilon+\mathcal{H}_0+\mathcal{H}_2\right)\left(11^{<k-1}\left(\mathcal{H}_0+\mathcal{H}_2\right) +1^k\left(\mathcal{H}_0+\mathcal{H}_2\right)\right)^*\left(1^{<k}+1^k\right)\tag{7} \end{align*} From the representation (7) we see we have to skip the strings $1^k\mathcal{H}_0$ in the block and the final string $1^k$. We so obtain the wanted language \begin{align*} \left(\varepsilon+\mathcal{H}_0+\mathcal{H}_2\right)\left(11^{<k-1}\left(\mathcal{H}_0+\mathcal{H}_2\right) +1^k\mathcal{H}_2\right)^*1^{<k} \end{align*} The corresponding generating function $A_k(z)$ is \begin{align*} A_k(z)&=(1+2H(z))\sum_{q=0}^\infty\left(z\frac{1-z^{k-1}}{1-z}2H(z)+z^kH(z)\right)^q\frac{1-z^k}{1-z}\\ &=(1+2H(z)) \frac{1}{1-z\frac{1-z^{k-1}}{1-z}2H(z)+z^kH(z)}\cdot\frac{1-z^k}{1-z}\\ &=\frac{1-z^{k+1}}{1-2z+z^{k+1}}\cdot \frac{1}{1-z\frac{1-z^{k-1}}{1-z}2z\frac{1-z^k}{1-2z+z^{k+1}}-z^kz\frac{1-z^k}{1-2z+z^{k+1}}}\cdot\frac{1-z^k}{1-z}\\ &=\frac{\left(1-z^k\right)\left(1-z^{k+1}\right)}{1-3z+2z^{k+1}+2z^{k+2}-z^{2k+1}-z^{2k+2}} \end{align*}

This generating function is slightly different to $Q(z)$ stated by Marko. A reason for it is the different stopping condition of words, since here we also avoid the string $1^k$ at the end of words.

Finally we take a short look at the ternary words with maximum run lengths $k=2,3,4$, containing $1^k2$ but no other string with $1^k$.

\begin{align*} A_2(x)&=\frac{(1-z^2)(1-z^3)}{1-3z+2z^3+2z^4-z^5-z^6}\\ &=1+3z+8z^2+21z^3+55z^4+145z^5+O(z^6)\\ A_3(x)&=\frac{(1-z^3)(1-z^4)}{1-3z+2z^4+2z^5-z^7-z^8}\\ &=1+3z+9z^2+26z^3+75z^4+217z^5+O(z^6)\\ A_4(x)&=\frac{(1-z^4)(1-z^5)}{1-3z+2z^5+2z^6-z^9-z^10}\\ &=1+3z+8z^2+27z^3+80z^4+237z^5+O(z^6)\\ \end{align*}

The coefficients are verified with code-snippets in R.