How does backtracking affect the language recognized by a parser?

The problem isn't the fact that this is a backtracking or recursive descent parser; the problem is that the described implementation does not properly consider the outer context of the recursive descent parse. This is similar to the difference between a Strong LL (SLL) parser and an LL parser.

The shortest input for which the strange behavior is demonstrated is aaaaaa.

  1. We start in rule S, and match the 1st a.
  2. We invoke S.
    • We match the 2nd a.
    • We invoke S. I'll omit the specific steps, but the key is this invocation of S matches aaaa, which is the 3rd a through the end of the input. (See note that follows.)
    • We try to match a, but since the end of the input was already reached, we go back and match just the 2nd through 3rd aa.
  3. We match the 4th a.

Additional note about the inner call to S that matched aaaa: If we knew to reserve an a at the end of the input for step 3, then the inner call to S could have matched aa instead of aaaa, leading to a successful parse of the complete input aaaaaa. ANTLR 4 provides this "full context" parsing ability in a recursive descent parser, and is the first recursive descent LL parser able to correctly match aa instead of aaaa for this nested invocation of S.

An SLL parser matches a2k for this grammar. A proper LL parser (such as ANTLR 4) matches a2k for this grammar.


Even with backtracking, which requires being able to rewind the input stream, a recursive descent parser is not allowed to look ahead to the end of the input, nor is it allowed to remove symbols from both ends of the stream.

A left-to-right parser must be able to work with an input stream which has only one method:

get() : consume and read one symbol, or return an EOF symbol.

The backtracking version needs a stream with two more methods:

posn = tell()  : return an opaque value which can be used in seek()
seek(posn)     : reposition the stream to a previous position returned by tell()

Tags:

C

Parsing