Howto clean comments from raw sql file
This is an extend of samplebias answer that work with your example :
import sqlparse
sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""
new_sql = []
for statement in sqlparse.parse(sql_example):
new_tockens = [stm for stm in statement.tokens
if not isinstance(stm, sqlparse.sql.Comment)]
new_statement = sqlparse.sql.TokenList(new_tockens)
new_sql.append(new_statement.to_unicode())
print sqlparse.format("\n".join(new_sql))
Output:
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
# Remove comments i.e. lines beginning with whitespace and '--' (using multi-line flag)
re.sub('^\s*--.*\n?', '', query, flags=re.MULTILINE)
Regex string explained:
- ^ start of line
- \s whitespace
- \s* zero or more whitespace characters
- -- two hypens (static string pattern)
- .* zero or more of any characters (i.e. the rest of the line)
- \n newline character
- ? end of string
- flags = re.M is the multiline modifier
"When specified, the pattern character '^' matches at the beginning of the string and at the beginning of each line (immediately following each newline)"
See the Python regular expressions documentation for more details:
https://docs.python.org/3/library/re.html
Adding an updated answer :)
import sqlparse
sql_example = """--comment
SELECT * from test;
INSERT INTO test VALUES ('
-- test
a
');
"""
print sqlparse.format(sql_example, strip_comments=True).strip()
Output:
SELECT * from test; INSERT INTO test VALUES (' -- test a ');
It achieves the same result but also covers all other corner cases and more concise
Try the sqlparse module.
Updated example: leaving comments inside insert values, and comments within CREATE FUNCTION blocks. You can tweak further to tune the behavior:
import sqlparse
from sqlparse import tokens
queries = '''
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT -- comment
* FROM -- comment
TABLE foo;
-- comment
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);
'''
IGNORE = set(['CREATE FUNCTION',]) # extend this
def _filter(stmt, allow=0):
ddl = [t for t in stmt.tokens if t.ttype in (tokens.DDL, tokens.Keyword)]
start = ' '.join(d.value for d in ddl[:2])
if ddl and start in IGNORE:
allow = 1
for tok in stmt.tokens:
if allow or not isinstance(tok, sqlparse.sql.Comment):
yield tok
for stmt in sqlparse.split(queries):
sql = sqlparse.parse(stmt)[0]
print sqlparse.sql.TokenList([t for t in _filter(sql)])
Output:
CREATE FUNCTION func1(a integer) RETURNS void
LANGUAGE plpgsql
AS $$
BEGIN
-- comment
END;
$$;
SELECT * FROM TABLE foo;
INSERT INTO foo VALUES ('a -- foo bar');
INSERT INTO foo
VALUES ('
a
-- foo bar'
);