"Nyldon words": understanding a class of words factorizing the free monoid increasingly

My co-authors (Émilie Charlier, Manon Philibert) and I give positive answers to Grinberg's conjectures in the paper E. Charlier, M. Philibert, M. Stipulanti, Nyldon words. So it is true that there are equally many Nyldon words and Lyndon words of a given length. In addition, we show that the set of Nyldon words is a Hall set (more precisely, we prove that it is a Lazard factorization, but the notions are equivalent). In this paper, we also compare Lyndon and Nyldon words regarding different properties such as standard and Širšov factorizations, circular and comma-free codes, and Lazard factorizations. We also leave in the paper open questions for future work. I hope you will enjoy it!


It seems, experimentally, that the set of nyldon words is a Hall set, with the same construction of the basis as for Lyndon words, that is if $w=uv$ where $w,u,v$ are nyldon words and $|v|$ is maximal, then $\pi(w)= \pi(u) \pi(v)$ (note that, by definition, we have $u>_{lex} v$). To show that this mapping gives a Hall set, it suffice to show (I think, I am not an expert on Hall set...) that (1) $w>_{lex}v$, (2) $u>_{lex}v$ and (3) if $u$ is not a single letter then $y\le_{lex} v$, where $u=xy$, $x,y$ are nyldon words and $|y|$ maximal. (2) is true by definition, so (1) is true. Maybe we can show (3) using the maximality of $|v|$...

Moreover, it seems that we have the same properties for other total orderings on words: "shortlex" (i.e. $u<v$ if $|u|<|v|$, or $|u|=|v|$ and $u<_{lex} v$) and (say) "shortrevlex" (i.e. $u<v$ if $|u|<|v|$, or $|u|=|v|$ and $u>_{lex} v$).

More strangely, Conjecture 1 and 2 seem true also for "longlex" ($u<v$ if $|u|>|v|$, or $|u|=|v|$ and $u<_{lex} v$) and "longrevlex" ($u<v$ if $|u|>|v|$, or $|u|=|v|$ and $u>_{lex} v$), but the mapping $\pi(w)= \pi(u) \pi(v)$ with $w=uv$ and $|v|$ maximal do not gives a Hall set... (And neiter if $|v|$ is minimal instead of maximal.)

edit:

I quickly look in "Algebres de Lie Libres et Monoides Libres" from Viennot, and some things are more clear now.

Nyldon words seem to be a Lazard right factorization. I.e. the following procedure seems to give all Nyldon words of size at most n:

$Y_0 := \{0,1\}$ and $i:=0$

While $Y_i$ has words of size at most n :

  • $u_{i+1}$ = $\min_{lex} \{ y : y \in Y_i \text{ s.t } | y | <= n\}$

  • $Y_{i+1}$ = $( Y_i \setminus \{ u_{i+1}\} ) u_i^*$

  • $i:=i+1$

Eg for $n=4$ $Y_0 : \{0, 1\}$, $u_1 : 0 $, $Y_1 : \{1, 10, 100, 1000, ...\}$, $u_2 : 1 $, $Y_2 : \{10, 100, 1000, 1001, 101, 1011, ...\}$, $u_3 : 10 $, $Y_3 : \{100, 1000, 1001, 101, 1011, ...\}$, $u_4 : 100 $, $Y_4 : \{1000, 1001, 101, 1011, ...\}$, $u_5 : 1000 $, $Y_5 : \{1001, 101, 1011, ...\}$, $u_6 : 1001 $, $Y_6 : \{101, 1011, ...\}$, $u_7 : 101 $, $Y_7 : \{1011, ...\}$, $u_8 : 1011$

(where ... represents words of size >n)

This is not the usual Lazard (left) factorization definition, but it is if we take the mirror.

(Note: Lyndon words are left and right Lazard factorizations, thus are called "regular factorization" in [Viennot]. Nyldon words are not a left Lazard factorization, so I think it is difficult to find direct correspondence between Lyndon and Nyldon words...)

Let $U_n$ be the output (the set of all $u_i$) of the procedure with a fixed n, and $U=\cup_{n\in N} U_n$. It is known that $U$ is a factorization, since it is a Lazard elimination procedure, thus Conjectures 1 and 2 are true for the set $U$.

Now it suffice to show that $U$ are exactly Nyldon words, thus U_n are Nyldon words of size <=n. Maybe we can follow the following way. (Note that the procedure gives the set U_n in the lexicographic order.)

  • For every proper suffix s of $u_i$, $s \cap Y_i = \emptyset$

  • For every proper suffix s of $u_i$, $s \cap Y_i^* = \emptyset$

  • For every $w\in U$ and s suffix of w, if $s\in U$ then $s<_{lex} w$

  • By induction, $U_n$ is a set of Nyldon words.