PLY: quickly parsing long lists of items?
Turns out I was forgetting some basic compiler theory. PLY generates LALR(1) parsers, so it's better to write the rule left-recursively:
def p_things(p):
    '''
    things : things thing
    things : thing
    '''
    if len(p) == 2:
        p[0] = [p[1]]
    else:
        p[0] = p[1]
        p[0].append(p[2])
Though it may look more verbose, there's actually a significant improvement. With a left-recursive rule, an LR parser can reduce after every item, so the parser stack stays shallow instead of growing with the length of the list. I've seen run time go from what looked like exponential growth to linear on my larger input files; one sample, with over a million items in the things list, ran in under 20% of the time.
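
As a rough illustration of why left recursion suits an LR parser, here is a toy shift-reduce simulation in plain Python. It is not PLY's implementation, and it assumes the slower rule was the right-recursive `things : thing things`, which this answer does not actually show. `parse_left` and `parse_right` are hypothetical names. The sketch only tracks how deep the parser stack gets before a reduction can happen.

```python
def parse_left(tokens):
    """Toy model of `things : things thing | thing` (left-recursive)."""
    stack, deepest = [], 0
    for tok in tokens:
        stack.append(tok)                  # shift one `thing`
        deepest = max(deepest, len(stack))
        if len(stack) == 2:                # reduce: things thing -> things
            items, item = stack
            items.append(item)             # reuse the list built so far
            stack = [items]
        else:                              # reduce: thing -> things
            stack = [[stack[0]]]
    return (stack[0] if stack else []), deepest


def parse_right(tokens):
    """Toy model of `things : thing things | thing` (right-recursive).

    Assumes at least one token, as the grammar requires.
    """
    stack = list(tokens)                   # every `thing` is shifted first
    deepest = len(stack)
    result = [stack.pop()]                 # reduce: thing -> things
    while stack:
        # reduce: thing things -> things (builds a new list each time)
        result = [stack.pop()] + result
    return result, deepest


items = list(range(1000))
left_result, left_depth = parse_left(items)
right_result, right_depth = parse_right(items)
print(left_depth, right_depth)
```

For a 1000-item list, `parse_left` never holds more than two entries on its stack, while `parse_right` holds all 1000 before it can reduce anything.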