Censor text with regex
You can do it with sed too:
sed '/^[[:blank:]]*-[[:blank:]]/{
h
s///
s/./X/g
x
s/\([[:blank:]]*-[[:blank:]]\).*/\1/
G
s/\n//
}' infile
This copies the line to the hold buffer (h), removes the first part [[:blank:]]*-[[:blank:]] (the empty regex in s/// reuses the last regex, i.e. the address), replaces each remaining character with an X, then exchanges pattern and hold space (x), so the censored string is now in the hold space and the original line is back in the pattern space. The second part of the line is removed with s/\(...\).*/\1/, the string in the hold space is appended to the pattern space (G) and the newline character (\n) that G inserts is removed. So with a file like:
- line here
not - to be modified
- a b c d e
- another line-here
the output is:
- XXXXXXXXX
not - to be modified
- XXXXXXXXX
- XXXXXXXXXXXXXXXXX
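To make the pattern/hold space juggling easier to follow, here is the same script with a comment after each command showing roughly what both buffers contain for the first sample line. This is only an annotated sketch: the wrapper function name censor_sed is made up for illustration, and the script reads standard input instead of infile.

```shell
# Hypothetical wrapper around the script above; reads stdin.
censor_sed() {
  sed '/^[[:blank:]]*-[[:blank:]]/{
# pattern: "- line here"     hold: (empty)
h
# pattern: "- line here"     hold: "- line here"
s///
# pattern: "line here"       (prefix matched by the address regex removed)
s/./X/g
# pattern: "XXXXXXXXX"       hold: "- line here"
x
# pattern: "- line here"     hold: "XXXXXXXXX"
s/\([[:blank:]]*-[[:blank:]]\).*/\1/
# pattern: "- "              hold: "XXXXXXXXX"
G
# pattern: "- \nXXXXXXXXX"
s/\n//
# pattern: "- XXXXXXXXX"
}'
}

printf '%s\n' '- line here' 'not - to be modified' | censor_sed
```

Lines that do not match the address never enter the braces, so they are printed unchanged.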
If you want to remove blank chars and replace only the non-blank ones with X:
sed '/^[[:blank:]]*-[[:blank:]]/{
h
s///
s/[[:blank:]]//g
s/./X/g
x
s/\([[:blank:]]*-[[:blank:]]\).*/\1/
G
s/\n//
}' infile
output:
- XXXXXXXX
not - to be modified
- XXXXX
- XXXXXXXXXXXXXXXX
Or, in one line with GNU sed:
sed -E '/^[ \t]*-[ \t]/{h;s///;s/[ \t]//g;s/./X/g;x;s/([ \t]*-[ \t]).*/\1/;G;s/\n//}' infile
Adjust the regex (i.e. ^[[:blank:]]*-[[:blank:]]) as per your needs.
A Perl solution:
perl -pe 's/^( *- )(.+)/$1."X"x length($2)/e'
The /e modifier evaluates the replacement as a Perl expression, so "X" x length($2) yields the correct number of Xs in the replacement.
Test input:
- Hello World
- Earth
This is not - censored
output:
- XXXXXXXXXXX
- XXXXX
This is not - censored
$ awk '/^[ ]*- /{gsub(/[^ -]/,"X",$0)}1' <<EOM
- Hello
- World 2015
This is not - censored
EOM
- XXXXX
- XXXXX XXXX
This is not - censored
The awk expression looks for any line that begins with a - character followed by a space, after optional leading spaces. For matching lines, the gsub() command replaces every character except spaces and the - character with X. The final 1 is just a shortcut for {print $0}, i.e. it prints the (possibly modified) line.
Edit: since you also require the whitespace characters to be replaced with X, I can't really think of a more elegant solution than doing an additional replacement:
$ awk '/^[ ]*- /{gsub(/[^ -]/,"X",$0);gsub(/X X/,"XXX",$0)}1' <<EOM
- Hello World
- Earth
This is not - censored
EOM
- XXXXXXXXXXX
- XXXXX
This is not - censored
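One caveat with the second gsub(): matches do not overlap, so a run of single-character words such as "a b c d e" would come out as "XXX XXX X" instead of being fully censored, and hyphens inside the text are left as they are. A possible alternative, sketched here under the assumption that everything after the leading "- " marker should become X, is to split the line with match() and substr(). The function name censor and the variables prefix and rest are made up for this example.

```shell
# Hypothetical helper: censor everything after a leading "- " marker.
censor() {
  awk '
    # match() sets RLENGTH to the length of the leading "[spaces]- " prefix
    match($0, /^[ ]*- /) {
      prefix = substr($0, 1, RLENGTH)    # the marker, kept as is
      rest   = substr($0, RLENGTH + 1)   # the text to censor
      gsub(/./, "X", rest)               # every character, spaces included
      $0 = prefix rest
    }
    1                                    # print every line
  '
}

printf '%s\n' '- a b c d e' '  - Hello World' 'This is not - censored' | censor
# - XXXXXXXXX
#   - XXXXXXXXXXX
# This is not - censored
```

Since the text is censored in a single pass, no clean-up substitution is needed afterwards.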