Using regex to remove comments from source files
re.sub
returns a string, so changing your code to the following will give results:
def removeComments(string):
string = re.sub(re.compile("/\*.*?\*/",re.DOTALL ) ,"" ,string) # remove all occurrences streamed comments (/*COMMENT */) from string
string = re.sub(re.compile("//.*?\n" ) ,"" ,string) # remove all occurrence single-line comments (//COMMENT\n ) from string
return string
What about "//comment-like strings inside quotes"
?
OP is asking how to do do it using regular expressions; so:
def remove_comments(string):
pattern = r"(\".*?\"|\'.*?\')|(/\*.*?\*/|//[^\r\n]*$)"
# first group captures quoted strings (double or single)
# second group captures comments (//single-line or /* multi-line */)
regex = re.compile(pattern, re.MULTILINE|re.DOTALL)
def _replacer(match):
# if the 2nd group (capturing comments) is not None,
# it means we have captured a non-quoted (real) comment string.
if match.group(2) is not None:
return "" # so we will return empty to remove the comment
else: # otherwise, we will return the 1st group
return match.group(1) # captured quoted-string
return regex.sub(_replacer, string)
This WILL remove:
/* multi-line comments */
// single-line comments
Will NOT remove:
String var1 = "this is /* not a comment. */";
char *var2 = "this is // not a comment, either.";
url = 'http://not.comment.com';
Note: This will also work for Javascript source.