Removing Left Recursion in ANTLR
If you are writing the grammar, then of course you try to write it to avoid the pitfalls of your particular parser generator.
Usually, in my experience, I get some reference manual for the (legacy) language of interest, and it already contains a grammar or railroad diagrams, and it is what it is.
In that case, pretty much left recursion removal from a grammar is done by hand. There's no market for left-recursion-removal tools, and if you had one, it would be specialized to a grammar syntax that didn't match the grammar syntax you have.
Doing this removal is mostly a matter of sweat in many cases, and there isn't usually tons of it. So the usual approach is get out your grammar knife and have at it.
I don't think how you remove left recursion changes how ANTLR gets trees. You have to do the left recursion removal first, or ANTLR (or whatever LL parser generator you are using) simply won't accept your grammar.
There are those of us that don't want the parser generator to put any serious constraints on what we can write for a context free grammar. In this case you want to use something like a GLR parser generator, which handles left- or right-recursion with ease. Unreasonable people can even insist on automated AST generation with no effort on the part of the grammar writer. For a tool that can do both, see DMS Software Reengineering Toolkit.
Consider something like a typical parameter list:
parameter_list: parameter
| parameter_list ',' parameter
;
Since you don't care about anything like precedence or associativity with parameters, this is fairly easy to convert to right recursion, at the expense of adding an extra production:
parameter_list: parameter more_params
;
more_params:
| ',' parameter more_params
;
For the most serious cases, you might want to spend some time in the Dragon Book. Doing a quick check, this is covered primarily in chapter 4.
As far as seriousness goes, I'm pretty sure ANTLR simply won't accept a grammar that contains left recursion, which would put it into the "absolute necessity" category.
In practical sense, how serious of removing left recursion in ANTLR? Is this a showstopper in using ANTLR?
I think that you have a misunderstanding of left-recursion. It is a property of the grammar, not of the parser generator or the interaction between the parser generator and the specification. It happens when the first symbol on the right side of a rule is equal to the nonterminal corresponding to the rule itself.
To understand the inherent problem here, you need to know something about how a recursive-descent (LL) parser works. In an LL parser, the rule for each nonterminal symbol is implemented by a function corresponding to that rule. So, suppose I have a grammar like this:
S -> A B
A -> a
B -> b
Then, the parser would look (roughly) like this:
boolean eat(char x) {
// if the next character is x, advance the stream and return true
// otherwise, return false
}
boolean S() {
if (!A()) return false;
if (!B()) return false;
return true;
}
boolean A(char symbol) {
return eat('a');
}
boolean B(char symbol) {
return eat('b');
}
However, what happens if I change the grammar to be the following?
S -> A B
A -> A c | null
B -> b
Presumably, I want this grammar to represent a language like c*b
. The corresponding function in the LL parser would look like this:
boolean A() {
if (!A()) return false; // stack overflow! We continually call A()
// without consuming any input.
eat('c');
return true;
}
So, we can't have left-recursion. Rewrite the grammar as:
S -> A B
A -> c A | null
B -> b
and the parser changes as such:
boolean A() {
if (!eat('c')) return true;
A();
return true;
}
(Disclaimer: this is my rudimentary approximation of an LL parser, meant only for demonstration purposes regarding this question. It has obvious bugs in it.)
I can't speak for ANTLR, but in general, the steps to eliminate a left recursion of the form:
A -> A B
-> B
is to change it to be:
A -> B+
(note that B
must appear at least once)
or, if ANTLR doesn't support the Kleene closure, you can do:
A -> B B'
B' -> B B'
->
If you provide an example of your rules that are having conflicts, I can provide a better, more specific answer.