Most efficient way to remove special characters from string
Well, unless you really need to squeeze the performance out of your function, just go with what is easiest to maintain and understand. A regular expression would look like this:
For additional performance, you can either pre-compile it or just tell it to compile on first call (subsequent calls will be faster.)
public static string RemoveSpecialCharacters(string str)
{
return Regex.Replace(str, "[^a-zA-Z0-9_.]+", "", RegexOptions.Compiled);
}
Why do you think that your method is not efficient? It's actually one of the most efficient ways that you can do it.
You should of course read the character into a local variable or use an enumerator to reduce the number of array accesses:
public static string RemoveSpecialCharacters(this string str) {
StringBuilder sb = new StringBuilder();
foreach (char c in str) {
if ((c >= '0' && c <= '9') || (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') || c == '.' || c == '_') {
sb.Append(c);
}
}
return sb.ToString();
}
One thing that makes a method like this efficient is that it scales well. The execution time will be relative to the length of the string. There is no nasty surprises if you would use it on a large string.
Edit:
I made a quick performance test, running each function a million times with a 24 character string. These are the results:
Original function: 54.5 ms.
My suggested change: 47.1 ms.
Mine with setting StringBuilder capacity: 43.3 ms.
Regular expression: 294.4 ms.
Edit 2: I added the distinction between A-Z and a-z in the code above. (I reran the performance test, and there is no noticable difference.)
Edit 3:
I tested the lookup+char[] solution, and it runs in about 13 ms.
The price to pay is, of course, the initialization of the huge lookup table and keeping it in memory. Well, it's not that much data, but it's much for such a trivial function...
private static bool[] _lookup;
static Program() {
_lookup = new bool[65536];
for (char c = '0'; c <= '9'; c++) _lookup[c] = true;
for (char c = 'A'; c <= 'Z'; c++) _lookup[c] = true;
for (char c = 'a'; c <= 'z'; c++) _lookup[c] = true;
_lookup['.'] = true;
_lookup['_'] = true;
}
public static string RemoveSpecialCharacters(string str) {
char[] buffer = new char[str.Length];
int index = 0;
foreach (char c in str) {
if (_lookup[c]) {
buffer[index] = c;
index++;
}
}
return new string(buffer, 0, index);
}