Add spaces before Capital Letters
Didn't test performance, but here in one line with linq:
var val = "ThisIsAStringToTest";
val = string.Concat(val.Select(x => Char.IsUpper(x) ? " " + x : x.ToString())).TrimStart(' ');
The regexes will work fine (I even voted up Martin Browns answer), but they are expensive (and personally I find any pattern longer than a couple of characters prohibitively obtuse)
This function
string AddSpacesToSentence(string text, bool preserveAcronyms)
{
if (string.IsNullOrWhiteSpace(text))
return string.Empty;
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]))
if ((text[i - 1] != ' ' && !char.IsUpper(text[i - 1])) ||
(preserveAcronyms && char.IsUpper(text[i - 1]) &&
i < text.Length - 1 && !char.IsUpper(text[i + 1])))
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
Will do it 100,000 times in 2,968,750 ticks, the regex will take 25,000,000 ticks (and thats with the regex compiled).
It's better, for a given value of better (i.e. faster) however it's more code to maintain. "Better" is often compromise of competing requirements.
Hope this helps :)
Update
It's a good long while since I looked at this, and I just realised the timings haven't been updated since the code changed (it only changed a little).
On a string with 'Abbbbbbbbb' repeated 100 times (i.e. 1,000 bytes), a run of 100,000 conversions takes the hand coded function 4,517,177 ticks, and the Regex below takes 59,435,719 making the Hand coded function run in 7.6% of the time it takes the Regex.
Update 2 Will it take Acronyms into account? It will now! The logic of the if statment is fairly obscure, as you can see expanding it to this ...
if (char.IsUpper(text[i]))
if (char.IsUpper(text[i - 1]))
if (preserveAcronyms && i < text.Length - 1 && !char.IsUpper(text[i + 1]))
newText.Append(' ');
else ;
else if (text[i - 1] != ' ')
newText.Append(' ');
... doesn't help at all!
Here's the original simple method that doesn't worry about Acronyms
string AddSpacesToSentence(string text)
{
if (string.IsNullOrWhiteSpace(text))
return "";
StringBuilder newText = new StringBuilder(text.Length * 2);
newText.Append(text[0]);
for (int i = 1; i < text.Length; i++)
{
if (char.IsUpper(text[i]) && text[i - 1] != ' ')
newText.Append(' ');
newText.Append(text[i]);
}
return newText.ToString();
}
Your solution has an issue in that it puts a space before the first letter T so you get
" This String..." instead of "This String..."
To get around this look for the lower case letter preceding it as well and then insert the space in the middle:
newValue = Regex.Replace(value, "([a-z])([A-Z])", "$1 $2");
Edit 1:
If you use @"(\p{Ll})(\p{Lu})"
it will pick up accented characters as well.
Edit 2:
If your strings can contain acronyms you may want to use this:
newValue = Regex.Replace(value, @"((?<=\p{Ll})\p{Lu})|((?!\A)\p{Lu}(?>\p{Ll}))", " $0");
So "DriveIsSCSICompatible" becomes "Drive Is SCSI Compatible"