Use regular expression to handle nested parenthesis in math equation?

I love regular expressions. I use them all the time.

Don't use regular expressions for this.

You want an actual parser that will actually parse your math expressions. You might want to read this:

http://effbot.org/zone/simple-top-down-parsing.htm

Once you have actually parsed the expression, it's trivial to walk the parse tree and compute the result.

EDIT: @Lattyware suggested pyparsing, which should also be a good way to go, and might be easier than the EFFBot solution posted above.

https://github.com/pyparsing/pyparsing

Here's a direct link to the pyparsing sample code for a four-function algebraic expression evaluator:

http://pyparsing.wikispaces.com/file/view/fourFn.py


for what it's worth, here's a little more context:

regular expressions are called "regular" because they're associated with regular grammars, and regular grammars cannot describe (an unlimited number of) nested parentheses (they can describe a bunch of random parentheses, but cannot make them match in neat pairs).

one way to understand this is to understand that regular expressions can (modulo some details which i will explain at the end) be converted to deterministic finite automatons. which sounds intimidating but really just means that they can be converted into lists of "rules", where the rules depend on what you matched, and describe what you can match.

for example, the regular expression ab*c can be converted to:

  1. at the start, you can only match a. then go to 2.

  2. now, you can match b and go back to 2, or match c and go to 3

  3. you're done! the match was a success!

and that is a "deterministic finite automaton".

anyway, the interesting part of this is that if you sit down and try to make something like that for matching pairs of parentheses you can't! try it. you can match a finite number by making more and more rules, but you can't write a general set of rules that match an unlimited number of parentheses (i should add that the rules have to be of the form "if you match X go to Y").

now obviously you could modify that in various ways. you could allow more complex rules (like extending them to let you keep a count of the parentheses), and you could then get something that worked as you expect. but it wouldn't be a regular grammar.

given that regular expressions are limited in this way, why are they used rather than something more complex? it turns out that they're something of a sweet spot - they can do a lot, while remaining fairly simple and efficient. more complex grammars (kinds of rules) can be more powerful, but are also harder to implement, and have more problems with efficiency.

final disclaimer and promised extra details: in practice many regular expressions these days actually are more powerful than this (and should not really be called "regular expressions"). but the above is still the basic explanation of why you should not use a regexp for this.

ps jesse's suggested solution gets round this by using a regexp multiple times; the argument here is for a single use of the regexp.

Tags:

Python

Regex