Removing control chars (including console codes / colours) from script output
The following script should filter out all ANSI/VT100/xterm control sequences for (based on ctlseqs). Minimally tested, please report any under- or over-match.
#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
s/ \e[ #%()*+\-.\/]. |
\e\[ [ -?]* [@-~] | # CSI ... Cmd
\e\] .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
\e[P^_] .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
\e. //xg;
print;
}
Known issues:
- Doesn't complain about malformed sequences. That's not what this script is for.
- Multi-line string arguments to DCS/PM/APC/OSC are not supported.
- Bytes in the range 128–159 may be parsed as control characters, though this is rarely used. Here's a version which parses non-ASCII control characters (this will mangle non-ASCII text in some encodings including UTF-8).
#!/usr/bin/env perl
## uncolor — remove terminal escape sequences such as color changes
while (<>) {
s/ \e[ #%()*+\-.\/]. |
(?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
(?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
(?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
\e.|[\x80-\x9f] //xg;
print;
}
Updating Gilles' answer to also remove carriage returns and do backspace-erasing of previous characters, which were both important to me for a typescript generated on Cygwin:
#!/usr/bin/perl
while (<>) {
s/ \e[ #%()*+\-.\/]. |
\r | # Remove extra carriage returns also
(?:\e\[|\x9b) [ -?]* [@-~] | # CSI ... Cmd
(?:\e\]|\x9d) .*? (?:\e\\|[\a\x9c]) | # OSC ... (ST|BEL)
(?:\e[P^_]|[\x90\x9e\x9f]) .*? (?:\e\\|\x9c) | # (DCS|PM|APC) ... ST
\e.|[\x80-\x9f] //xg;
1 while s/[^\b][\b]//g; # remove all non-backspace followed by backspace
print;
}
I would use sed
in this case:
cat -v typescript | sed -e "s/\x1b\[.\{1,5\}m//g"
sed -e "s/search/replace/g"
is standard stuff. The regex is explained as below:
\x1b
match the Escape preceeding the color code\[
matches the first open bracket.\{1,5\}
matches 1 to 5 of any single character. Have to\
the curly braces to keep the shell from mangling them.m
last character in regex - usually trails the color code.//
empty string for what to replace everything with.g
match it multiple times per line.