Removing hidden characters from within strings
I usually use this regular expression to replace all non-printable characters.
By the way, most people consider tab, line feed and carriage return to be non-printable characters, but I don't, so the expression keeps them.
So here is the expression:
string output = Regex.Replace(input, @"[^\u0009\u000A\u000D\u0020-\u007E]", "*");
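The leading ^ inside the brackets negates the set, so the expression matches any character that is not one of the following: \u0009 (tab), \u000A (line feed), \u000D (carriage return), or \u0020-\u007E (everything from space to ~, that is, all printable ASCII characters).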
See an ASCII table if you want to make changes. Keep in mind that this also replaces every non-ASCII character (accented letters, for example) with *.
To test the expression, you can build a string yourself that contains the characters 0 through 254, like this:
string input = string.Empty;
for (int i = 0; i < 255; i++)
{
input += (char)(i);
}
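To see what the expression does to that test string, the two pieces can be put together. This is only a minimal sketch, assuming a console program; the names SanitizeDemo, ReplaceNonPrintable and BuildTestInput are made up for illustration, and StringBuilder is swapped in for the += loop.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class SanitizeDemo
{
    // Hypothetical helper: wraps the expression shown above.
    public static string ReplaceNonPrintable(string input)
    {
        return Regex.Replace(input, @"[^\u0009\u000A\u000D\u0020-\u007E]", "*");
    }

    // Builds the same test string as the loop above (characters 0 through 254).
    public static string BuildTestInput()
    {
        var sb = new StringBuilder();
        for (int i = 0; i < 255; i++)
        {
            sb.Append((char)i);
        }
        return sb.ToString();
    }

    public static void Main()
    {
        string input = BuildTestInput();
        string output = ReplaceNonPrintable(input);

        // Count the positions where a character was actually replaced.
        int replaced = 0;
        for (int i = 0; i < input.Length; i++)
        {
            if (input[i] != output[i]) replaced++;
        }
        Console.WriteLine(replaced); // prints 157 (255 characters minus the 98 that are kept)
    }
}
```

The length of the string does not change, because each unwanted character is replaced by a single * rather than removed. Pass "" as the replacement if you want the characters dropped instead.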
You can remove all control characters from your input string with something like this:
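// Requires "using System.Linq;" at the top of the file.
// Note: tab, line feed and carriage return are control characters too, so they are removed as well.
string input; // this is your input string
string output = new string(input.Where(c => !char.IsControl(c)).ToArray());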
See the documentation for the char.IsControl() method for details.
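One thing to watch for: char.IsControl() only covers the Unicode control category, so invisible format characters such as the zero-width space (U+200B) or a stray byte order mark (U+FEFF) pass through it untouched. If those are the hidden characters you are after, a possible variant is to filter on the Unicode category instead. This is a sketch, not a complete solution; the names HiddenCharFilter and RemoveControlAndFormat are invented for the example, and characters outside the Basic Multilingual Plane are not handled.

```csharp
using System;
using System.Globalization;
using System.Linq;

public static class HiddenCharFilter
{
    // Hypothetical helper: drops control characters (category Cc)
    // and format characters (category Cf), keeps everything else.
    public static string RemoveControlAndFormat(string input)
    {
        return new string(input.Where(c =>
        {
            UnicodeCategory category = char.GetUnicodeCategory(c);
            return category != UnicodeCategory.Control
                && category != UnicodeCategory.Format;
        }).ToArray());
    }
}
```

For example, RemoveControlAndFormat("a\u200Bb\u0007c") should give "abc". As with IsControl(), tabs and line breaks are removed too, so add an exception for them in the lambda if you need to keep them.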
Or, if you want to keep only letters and digits, you can use the IsLetter and IsDigit methods:
string output = new string(input.Where(c => char.IsLetter(c) || char.IsDigit(c)).ToArray());
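The same filter can be written with char.IsLetterOrDigit(), which combines both checks. Keep in mind that these methods are Unicode-aware, so letters and digits from any script are kept, not just a-z and 0-9. The sketch below shows both behaviours side by side; AlphanumericFilter and its two method names are made up for illustration.

```csharp
using System;
using System.Linq;

public static class AlphanumericFilter
{
    // Unicode-aware: keeps letters and decimal digits from any script.
    public static string KeepLettersAndDigits(string input)
    {
        return new string(input.Where(c => char.IsLetterOrDigit(c)).ToArray());
    }

    // ASCII-only variant: keeps a-z, A-Z and 0-9, drops everything else.
    public static string KeepAsciiLettersAndDigits(string input)
    {
        return new string(input.Where(c =>
            (c >= 'a' && c <= 'z') ||
            (c >= 'A' && c <= 'Z') ||
            (c >= '0' && c <= '9')).ToArray());
    }
}
```

With the input "a-b_1 \u00E9!" the first method keeps the accented letter along with "ab1", while the second returns just "ab1". Both remove spaces and punctuation, so neither is suitable if the text has to stay readable.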