How does backtracking affect the language recognized by a parser?

The problem isn't the fact that this is a backtracking or recursive descent parser; the problem is that the described implementation does not properly consider the outer context of the recursive descent parse. This is similar to the difference between a Strong LL (SLL) parser and an LL parser.

The shortest input for which the strange behavior is demonstrated is aaaaaa.

We start in rule S, and match the 1^st a.
We invoke S.
- We match the 2^nd a.
- We invoke S. I'll omit the specific steps, but the key is this invocation of S matches aaaa, which is the 3^rd a through the end of the input. (See note that follows.)
- We try to match a, but since the end of the input was already reached, we go back and match just the 2^nd through 3^rd aa.
We match the 4^th a.

Additional note about the inner call to S that matched aaaa: If we knew to reserve an a at the end of the input for step 3, then the inner call to S could have matched aa instead of aaaa, leading to a successful parse of the complete input aaaaaa. ANTLR 4 provides this "full context" parsing ability in a recursive descent parser, and is the first recursive descent LL parser able to correctly match aa instead of aaaa for this nested invocation of S.

An SLL parser matches a^{2^k} for this grammar. A proper LL parser (such as ANTLR 4) matches a^2k for this grammar.

Even with backtracking, which requires being able to rewind the input stream, a recursive descent parser is not allowed to look ahead to the end of the input, nor is it allowed to remove symbols from both ends of the stream.

A left-to-right parser must be able to work with an input stream which has only one method:

get() : consume and read one symbol, or return an EOF symbol.

The backtracking version needs a stream with two more methods:

posn = tell()  : return an opaque value which can be used in seek()
seek(posn)     : reposition the stream to a previous position returned by tell()

How does backtracking affect the language recognized by a parser?

Tags:

C

Parsing

Related

Recent Posts