What makes a context free grammar ambiguous?

A grammar is ambiguous if there's a word which has two different derivation trees. You'll have to look up derivation tree in your textbook since drawing them is awkward, but the idea that it doesn't matter in which order you're doing the derivations as long as it's basically the same derivation.

For example, consider the grammar \begin{align} S &\rightarrow AS|a \\ A &\rightarrow a \end{align}

Let's derive $aa$ twice: $$ S \rightarrow AS \rightarrow Aa \rightarrow aa $$ $$ S \rightarrow AS \rightarrow aS \rightarrow aa $$ These derivations share the same derivation tree:

Derivation tree

However, consider now the grammar \begin{align} S\rightarrow SS|a \end{align}

generating the same language.

There are now two "really" different derivations for $aaa$: $$S \rightarrow SS \rightarrow Sa \rightarrow SSa \rightarrow Saa \rightarrow aaa$$ $$S \rightarrow SS \rightarrow aS \rightarrow aSS \rightarrow aaS \rightarrow aaa$$

In the first derivation, the first two $a$'s are derived together, whereas in the second derivation, it is the second two $a$'s which are derived together. So it's $(aa)a$ vs. $a(aa)$.

First derivation tree

Second derivation tree

In this case, the language $a^+$ has an unambiguous grammar (the first one I exhibited), but in other cases, all grammars are ambiguous; such languages are called inherently ambiguous. See my other answer for more on that.


Intuitively, what makes a language inherently ambiguous is that it is composed of distinct parts which have some intersection; the intersection cannot be "singled out", and therefore the only way to approach is is through the distinct parts.

For example, the language of words of the form $a^xb^yc^z$, where $x=y$ or $y=z$, is context-free, since it's the union of $a^nb^nc^*$ and $a^*b^nc^n$. However, words of the form $a^nb^nc^n$ cannot be "singled out" since that language is not context-free, so we would expect the language to be inherently ambiguous.

The formal proof (using Ogden's lemma) goes like this. Consider the word $a^nb^nc^{n!}$, where $n$ is "big enough". By Ogden's lemma, the $a^nb^n$ can be pumped up, and so we can derive $a^{n!}b^{n!}c^{n!}$. Similarly, we can derive the same word from $a^{n!}b^nc^n$. The two derivations overlap (over the middle part) and so must be "inherently" different (i.e. have different derivation trees).

Another example, from the world of regular languages, is as follows. Consider the language over the alphabet $\{a,b\}$ of all non-empty words having either an even number of $a$'s or an even number of $b$'s (or both). Does it have a disjoint representation as a union $L_1^+ \cup \cdots L_m^+$ (here $+$ is the Kleene plus)? Intuitively, the problem is that a word with both $a$ and $b$ even is "ambiguous".

The formal proof goes by the usual pumping lemma. Pick an odd $n$ that is greater than the pumping length of all $L_i^+$'s in a given representation. The word $a^n b^{n!}$ is generated by a unique $L_i^+$. By the pumping lemma, $L_i^+$ also generates $a^{n!} b^{n!}$. Similarly, the word $a^{n!} b^n$ is generated by a unique $L_j$, which also generates $a^{n!} b^{n!}$. If $i=j$ then $L_i^+$ generates the concatenation, which has both $a$ and $b$ odd - so $i \neq j$ and we get inherent ambiguity.