Regular expression to detect semi-colon terminated C++ for & while loops
This is a great example of using the wrong tool for the job. Regular expressions do not handle arbitrarily nested sub-matches very well. What you should do instead is use a real lexer and parser (a grammar for C++ should be easy to find) and look for unexpectedly empty loop bodies.
You could write a small, simple routine that does this without using a regular expression:
- Set a position counter `pos` so that it points to just before the opening bracket after your `for` or `while`.
- Set an open-brackets counter `openBr` to `0`.
- Now keep incrementing `pos`, reading the character at each position, and increment `openBr` when you see an opening bracket and decrement it when you see a closing bracket. That increments it once at the beginning for the first opening bracket in "for (", increments and decrements it some more for any brackets in between, and sets it back to `0` when your `for` bracket closes.
- So, stop when `openBr` is `0` again.

The stopping position is the closing bracket of `for(...)`. Now you can check whether a semicolon follows it.
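The steps above can be sketched in C++ as follows. This is a minimal sketch, assuming the source code is already in a `std::string` and that no parentheses appear inside string literals, character literals or comments (a real lexer would handle those). The names `hasSemicolonBody` and `findSemicolonLoops` are hypothetical. Note that the trailing `while (...);` of a `do { ... } while (...);` loop would also be reported, so filter those out if your code uses do-while loops.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: `pos` is the index just past a `for` or `while` keyword.
// Returns true if the matching closing bracket of the loop header is followed
// (after optional whitespace) by ';', i.e. the loop body is empty.
// Assumes no brackets inside string/char literals or comments.
bool hasSemicolonBody(const std::string& src, std::size_t pos) {
    // Skip whitespace up to the opening bracket.
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    if (pos >= src.size() || src[pos] != '(') return false;  // not a loop header

    int openBr = 0;
    for (; pos < src.size(); ++pos) {
        if (src[pos] == '(') {
            ++openBr;
        } else if (src[pos] == ')') {
            --openBr;
            if (openBr == 0) break;  // closing bracket of for(...) / while(...)
        }
    }
    if (openBr != 0) return false;  // input ended before the header closed

    // Look at the first non-whitespace character after the closing bracket.
    ++pos;
    while (pos < src.size() && std::isspace(static_cast<unsigned char>(src[pos]))) ++pos;
    return pos < src.size() && src[pos] == ';';
}

// Hypothetical wrapper: returns the sorted offsets of every whole-word `for`
// or `while` keyword whose header is immediately followed by ';'.
std::vector<std::size_t> findSemicolonLoops(const std::string& src) {
    auto isIdentChar = [](char c) {
        return std::isalnum(static_cast<unsigned char>(c)) != 0 || c == '_';
    };
    const std::string keywords[] = {"for", "while"};
    std::vector<std::size_t> hits;
    for (const std::string& kw : keywords) {
        for (std::size_t at = src.find(kw); at != std::string::npos; at = src.find(kw, at + 1)) {
            std::size_t end = at + kw.size();
            bool wordStart = at == 0 || !isIdentChar(src[at - 1]);
            bool wordEnd = end >= src.size() || !isIdentChar(src[end]);
            if (wordStart && wordEnd && hasSemicolonBody(src, end)) hits.push_back(at);
        }
    }
    std::sort(hits.begin(), hits.end());
    return hits;
}
```

For example, `findSemicolonLoops("int a; while (n--); for (i=0;i<n;++i) sum += i;")` returns `{7}`, the offset of the `while`. The counter handles nested calls such as `for (i = 0; i < f(g(1)); ++i);` without any special cases, which is exactly where a regular expression struggles.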
This is the kind of thing you really shouldn't do with a regular expression. Just parse the string one character at a time, keeping track of opening/closing parentheses.
If this is all you're looking for, you definitely don't need a full-blown C++ lexer/parser. If you want practice, you can write a little recursive-descent parser, but even that's a bit much for just matching parentheses.
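For comparison, a recursive-descent matcher for parentheses alone might look like the sketch below. It assumes a toy grammar `group := '(' { group | other-char } ')'` and a hypothetical function name `matchGroup`; the recursion depth plays the role that `openBr` plays in the counter version, which is why the plain loop is usually enough.

```cpp
#include <cstddef>
#include <string>

// Hypothetical recursive-descent matcher for the toy grammar
//   group := '(' { group | other-char } ')'
// Precondition: src[pos] == '('. Returns the index just past the matching
// ')', or std::string::npos if the input ends before the group closes.
std::size_t matchGroup(const std::string& src, std::size_t pos) {
    ++pos;  // consume '('
    while (pos < src.size()) {
        if (src[pos] == ')') return pos + 1;           // end of this group
        if (src[pos] == '(') {
            pos = matchGroup(src, pos);                // nested group: recurse
            if (pos == std::string::npos) return pos;  // propagate failure
        } else {
            ++pos;                                     // any other character
        }
    }
    return std::string::npos;  // ran out of input
}
```

Here `matchGroup("(a(b)c)d", 0)` returns `7`, the index of `d`, while an unbalanced input such as `"(()"` yields `std::string::npos`.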