What is a good Python parser for a Google-like search query?
While ply is a more classical approach (a Pythonic variant of lex + yacc) and thus may be easier to get started with if you're already familiar with such traditional tools, pyparsing is highly Pythonic and would be my top recommendation, especially for such simple tasks. These are really more like lexing than "full-blown" parsing, at least until you want to allow possibly-nested parentheses, but pyparsing won't really be troubled by those either ;-).
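To illustrate the "more like lexing" point, here is a minimal sketch that uses only the standard library's `re` module. The names (`TOKEN_RE`, `parse_query`) and the supported syntax (bare words, quoted phrases, `qualifier:value` terms and an `OR` keyword) are my own assumptions for the example, not part of ply or pyparsing, and it deliberately ignores parentheses:

```python
import re

# One regex per token; names and syntax are illustrative assumptions.
TOKEN_RE = re.compile(r'''
    \s*
    (?:
        (?P<or_kw>OR)(?=\s|$)              # the OR keyword on its own
      |
        (?:(?P<qualifier>\w+):)?           # optional qualifier, e.g. site:
        (?:"(?P<phrase>[^"]*)"             # a quoted phrase ...
          |(?P<word>[^\s"]+))              # ... or a bare word
    )
''', re.VERBOSE)


def parse_query(query):
    """Split a query into OR-separated alternatives of (qualifier, text) terms."""
    query = query.rstrip()
    alternatives = [[]]
    pos = 0
    while pos < len(query):
        match = TOKEN_RE.match(query, pos)
        if match is None:
            raise ValueError(f"cannot parse query at position {pos}: {query[pos:]!r}")
        pos = match.end()
        if match.group("or_kw"):
            # OR starts a new alternative.
            alternatives.append([])
        else:
            phrase = match.group("phrase")
            text = phrase if phrase is not None else match.group("word")
            alternatives[-1].append((match.group("qualifier"), text))
    return alternatives


if __name__ == "__main__":
    for alternative in parse_query('these words "with this phrase" OR site:example.org'):
        print(alternative)
```

Once nesting or operator precedence enters the picture, a flat regex like this stops being enough, which is where a real parsing library earns its keep.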
PyParsing would be the right choice, although it is quite tedious. That's why I have developed a query parser inspired by Lucene and Gmail syntax. Its only dependency is PyParsing, and we have used it on several projects. It is fully customizable and extensible, plus it abstracts you from the pyparsing issues. You can check it out here:
It's pretty well documented, so you'll find docs on how to do the querying, configs, etc.
SORRY - Lepl is no longer being developed.
There's also LEPL - http://www.acooke.org/lepl
Here's a quick solution I wrote during breakfast:
    Python 3.1 (r31:73572, Oct 24 2009, 05:39:09)
    [GCC 4.4.1 [gcc-4_4-branch revision 150839]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> from lepl import *
    >>>
    >>> class Alternatives(Node):
    ...     pass
    ...
    >>> class Query(Node):
    ...     pass
    ...
    >>> class Text(Node):
    ...     pass
    ...
    >>> def compile():
    ...     qualifier = Word() & Drop(':') > 'qualifier'
    ...     word = ~Lookahead('OR') & Word()
    ...     phrase = String()
    ...     text = phrase | word
    ...     word_or_phrase = (Optional(qualifier) & text) > Text
    ...     space = Drop(Space()[1:])
    ...     query = word_or_phrase[1:, space] > Query
    ...     separator = Drop(space & 'OR' & space)
    ...     alternatives = query[:, separator] > Alternatives
    ...     return alternatives.string_parser()
    ...
    >>> parser = compile()
    >>>
    >>> alternatives = parser('all of these words "with this phrase" '
    ...                       'OR that OR this site:within.site '
    ...                       'filetype:ps from:lastweek')[0]
    >>>
    >>> print(str(alternatives))
    Alternatives
     +- Query
     |   +- Text
     |   |   `- 'all'
     |   +- Text
     |   |   `- 'of'
     |   +- Text
     |   |   `- 'these'
     |   +- Text
     |   |   `- 'words'
     |   `- Text
     |       `- 'with this phrase'
     +- Query
     |   `- Text
     |       `- 'that'
     `- Query
         +- Text
         |   `- 'this'
         +- Text
         |   +- qualifier 'site'
         |   `- 'within.site'
         +- Text
         |   +- qualifier 'filetype'
         |   `- 'ps'
         `- Text
             +- qualifier 'from'
             `- 'lastweek'
    >>>
I would argue that LEPL isn't a "toy" - although it's recursive descent, it includes memoisation and trampolining, which help avoid some of the limitations of that approach.
However, it is pure Python, so it's not super-fast, and it's in active development (a new release, 4.0, with quite a few fixes and improvements, is coming relatively soon).
A few good options:
Whoosh: the only problem is that it has few parsing examples, since the parser might not be its main feature/focus, but it's definitely a good option
modgrammar: I haven't tried it, but it seems pretty flexible and simple
pyparsing: highly recommended. There are some good parsing examples online
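
For the pyparsing option, a grammar for this kind of query can be quite short. The following is only a sketch under my own assumptions: the grammar (bare words, quoted phrases, `qualifier:value` terms, `OR` between groups of terms) and the names `WORD_CHARS` and `parse_query` are illustrative, and it uses the camelCase method names (`parseString`, `asList`) so that it should run on both pyparsing 2.x and 3.x:

```python
from pyparsing import (Group, Keyword, OneOrMore, Optional, QuotedString,
                       Suppress, Word, ZeroOrMore, alphanums, printables)

# Characters allowed in a bare word: anything printable except quotes and colons.
WORD_CHARS = "".join(c for c in printables if c not in '":')

OR = Keyword("OR")
qualifier = Word(alphanums + "_") + Suppress(":")   # e.g. site:
phrase = QuotedString('"')                          # "with this phrase"
word = ~OR + Word(WORD_CHARS)                       # any bare word except OR
term = Group(Optional(qualifier) + (phrase | word))
query = Group(OneOrMore(term))
alternatives = query + ZeroOrMore(Suppress(OR) + query)


def parse_query(text):
    """Return a nested list: alternatives -> terms -> [qualifier (if any), text]."""
    return alternatives.parseString(text, parseAll=True).asList()


if __name__ == "__main__":
    for alternative in parse_query('these words "with this phrase" OR site:example.org'):
        print(alternative)
```

Each term comes back as a one-element list for plain text or a two-element `[qualifier, text]` list, so the caller can tell them apart by length. Nested parentheses would need a recursive grammar (pyparsing's `Forward`), which this sketch leaves out.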
If you're done with the project, what did you end up choosing?