Regular expression to find all table names in a query

RegEx isn't very good at this, as it's a lot more complicated than it appears:

  • What if they use LEFT/RIGHT INNER/OUTER/CROSS/MERGE/NATURAL joins instead of the a,b syntax? The a,b syntax should be avoided anyway.
  • What about nested queries?
  • What if there is no table (selecting a constant)
  • What about line breaks and other whitespace formatting?
  • Alias names?

I could go on.

What you can do is look for an sql parser, and run your query through that.


I found this site that has a GREAT parser!

http://www.sqlparser.com/

well worth it. Works a treat.


Everything said about the usefulness of such a regex in the SQL context. If you insist on a regex and your SQL statements always look like the one you showed (that means no subqueries, joins, and so on), you could use

FROM\s+([^ ,]+)(?:\s*,\s*([^ ,]+))*\s+ 

I'm pretty late to the party however I thought I would share a regex I am currently using to analyse all our database objects and I disagree with the sentiment that it is not possible to do this using one.

The regex has a few assumptions

1) You are not using the A,B join syntax style

2) Whatever regex parser you are using supports ignore case.

3) You're analyzing, selects, joins, updates, deletes and truncates. It doesn't support the aforementioned MERGE/NATURAL because we don't use them, however I'm sure further support wouldn't be difficult to add.

I am keen to know what type of transaction the table is part of so I have included Named Capture groups to tell me.

Now I've not used regex for a long time so there are probably improvements that can be made however so far in all my testing this is accurate.

\bjoin\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bfrom\s+(?<Retrieve>[a-zA-Z\._\d]+)\b|\bupdate\s+(?<Update>[a-zA-Z\._\d]+)\b|\binsert\s+(?:\binto\b)?\s+(?<Insert>[a-zA-Z\._\d]+)\b|\btruncate\s+table\s+(?<Delete>[a-zA-Z\._\d]+)\b|\bdelete\s+(?:\bfrom\b)?\s+(?<Delete>[a-zA-Z\._\d]+)\b

Tags:

.Net

Sql

Regex