Regex replacements inside a StringBuilder
I'm not sure if this helps your scenario or not, but I ran into some memory consumption ceilings with Regex and I needed a simple wildcard replacement extension method on a StringBuilder to push past it. If you need complex Regex matching and/or backreferences, this won't do, but if simple * or ? wildcard replacements (with literal "replace" text) would get the job done for you, then the workaround at the end of my question here should at least give you a boost:
Has anyone implemented a Regex and/or Xml parser around StringBuilders or Streams?
The best and most efficient solution for your time is to try the simplest approach first: forget the StringBuilder and just use Regex.Replace. Then find out how slow it is - it may very well be good enough. Don't forget to try the regex in both compiled and non-compiled mode.
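A rough way to run that comparison is a small Stopwatch harness. This is only a sketch: the input text, pattern, and iteration count are made-up placeholders, and `TimeReplace` is a hypothetical helper name, so substitute your real data before drawing conclusions.

```csharp
using System;
using System.Diagnostics;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTiming
{
    // Runs regex.Replace `iterations` times and returns the elapsed milliseconds.
    public static long TimeReplace(Regex regex, string input, string replacement, int iterations)
    {
        regex.Replace(input, replacement); // warm-up, so one-time setup cost is excluded
        var stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.Replace(input, replacement);
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    public static void Main()
    {
        // Placeholder data: swap in your real input and pattern.
        string input = string.Concat(Enumerable.Repeat("order 123 shipped, order 4567 pending; ", 2000));
        const string pattern = @"\d+";

        var interpreted = new Regex(pattern, RegexOptions.None);
        var compiled = new Regex(pattern, RegexOptions.Compiled);

        Console.WriteLine("Interpreted: {0} ms", TimeReplace(interpreted, input, "#", 200));
        Console.WriteLine("Compiled:    {0} ms", TimeReplace(compiled, input, "#", 200));
    }
}
```

Compiled mode costs more to construct, so it tends to pay off only when the same Regex instance is reused many times.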
If that isn't fast enough, consider using a StringBuilder for any replacements you can express simply, and then use Regex.Replace for the rest. You might also want to consider trying to combine replacements, reducing the number of regexes (and thus intermediate strings) used.
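As an illustration of both ideas, here is a minimal sketch. The specific replacements (line endings, tabs, whitespace runs, digit runs) and the class and method names are invented for the example. Merging two regexes into one alternation is only equivalent to running them in sequence when the output of one replacement could not be matched by the other.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class CombinedReplace
{
    // One regex with two named alternatives instead of two separate Regex.Replace passes.
    private static readonly Regex Combined = new Regex(@"(?<ws>\s+)|(?<num>\d+)");

    // Literal replacements: StringBuilder.Replace mutates the builder in place,
    // so no intermediate strings are created.
    public static void ReplaceLiterals(StringBuilder sb)
    {
        sb.Replace("\r\n", "\n").Replace("\t", "    ");
    }

    // Collapses whitespace runs to one space and digit runs to '#' in a single pass.
    public static string Normalize(string input)
    {
        return Combined.Replace(input, m => m.Groups["ws"].Success ? " " : "#");
    }
}
```

The single-pass version produces one result string instead of one per regex.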
You have 3 options:
1. Do this in an inefficient way with strings as others have recommended here.
2. Use the .Matches() call on your Regex object, and emulate the way .Replace() works (see #3).
3. Adapt the Mono implementation of Regex to build a Regex that accepts StringBuilder. Almost all of the work is already done for you in Mono, but it will take time to suss out the parts that make it work into their own library. Mono's Regex leverages Novell's 2002 JVM implementation of Regex, oddly enough.
Expanding on the above:
2. Emulate Replace()
You can mimic LTRReplace's behavior by calling .Matches(), tracking where you are in the original string, and looping:
var matches = regex.Matches(original);
var sb = new StringBuilder(original.Length);
int pos = 0; // position in original string
foreach (Match match in matches)
{
    // Append the portion of the original we skipped (Substring takes a length, not an end index)
    sb.Append(original.Substring(pos, match.Index - pos));
    pos = match.Index;
    // Make any operations you like on the match result, like your own custom Replace, or even run another Regex,
    // and append the result to sb
    pos += match.Value.Length;
}
// Append whatever follows the last match
sb.Append(original.Substring(pos));
But, this only saves you some strings - the Mono approach is the only one that really eliminates strings outright.
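If you want the loop as a reusable helper, one possible shape is an extension method that takes a callback per match. `ReplaceInto` is a hypothetical name, not part of the framework. This sketch also uses the `StringBuilder.Append(string, int, int)` overload so the skipped spans are copied without allocating a Substring each time.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class RegexStringBuilderExtensions
{
    // Emulates Regex.Replace with a match evaluator, but writes into a StringBuilder.
    public static StringBuilder ReplaceInto(this Regex regex, string original, Func<Match, string> evaluator)
    {
        var sb = new StringBuilder(original.Length);
        int pos = 0; // position in the original string
        foreach (Match match in regex.Matches(original))
        {
            // Copy the text between the previous match and this one.
            sb.Append(original, pos, match.Index - pos);
            // Append whatever the caller wants in place of the match.
            sb.Append(evaluator(match));
            pos = match.Index + match.Length;
        }
        // Copy the tail after the last match.
        sb.Append(original, pos, original.Length - pos);
        return sb;
    }
}
```

For example, `new Regex(@"\d+").ReplaceInto("a1b22c", m => "<" + m.Value + ">")` yields a builder containing `a<1>b<22>c`. The input is still a string and each match still allocates, so this trims allocations rather than removing them.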
3. Mono
This answer has been sitting here since 2014, and I never saw a StringBuilder-based Regex land, either here in the comments or in searching. So, just to get the ball rolling, I extracted the Regex implementation from Mono and put it here:
https://github.com/brass9/RegexStringBuilder
I then created an interface IString to allow the inputs and outputs to be more loosely passed - with string, StringBuilder and char[] each wrapped in a class that implements IString.
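To give a feel for the idea, here is a hypothetical sketch of what such an abstraction could look like. The member list and the wrapper class names are assumptions made for illustration; the actual IString in the repository may be shaped quite differently.

```csharp
using System;
using System.Text;

// Hypothetical shape: a read-only, indexable character sequence.
public interface IString
{
    int Length { get; }
    char this[int index] { get; }
}

public sealed class StringWrapper : IString
{
    private readonly string _value;
    public StringWrapper(string value) { _value = value; }
    public int Length { get { return _value.Length; } }
    public char this[int index] { get { return _value[index]; } }
}

public sealed class StringBuilderWrapper : IString
{
    private readonly StringBuilder _value;
    public StringBuilderWrapper(StringBuilder value) { _value = value; }
    public int Length { get { return _value.Length; } }
    public char this[int index] { get { return _value[index]; } }
}

public sealed class CharArrayWrapper : IString
{
    private readonly char[] _value;
    public CharArrayWrapper(char[] value) { _value = value; }
    public int Length { get { return _value.Length; } }
    public char this[int index] { get { return _value[index]; } }
}

public static class IStringDemo
{
    // Works the same over any backing store, without converting it to a string first.
    public static int CountChar(IString text, char c)
    {
        int count = 0;
        for (int i = 0; i < text.Length; i++)
        {
            if (text[i] == c) count++;
        }
        return count;
    }
}
```

Code written against the interface, like `CountChar` here, never needs to know which of the three types it was handed.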
The result is not fast - Microsoft's highly optimized code runs 10,000 simple replaces ~6x faster than this code. But, I've done nothing to optimize it, especially around eliminating strings deeper in the underlying code (it casts to string in some cases to run .ToLower() only to go back to char arrays).
Contributions welcome. A discussion of how the code worked in Mono from 2014 (shortly before it was removed from Mono, for Microsoft's string-based implementation) is below:
System.Text.RegularExpressions.Regex uses an RxCompiler to instantiate an IMachineFactory in the form of an RxInterpreterFactory, which unsurprisingly makes IMachines as RxInterpreters. Getting those to emit is most of what you need to do, although if you're just looking to learn how it's all structured for efficiency, it's notable that much of what you're looking for is in its base class, BaseMachine.
In particular, BaseMachine is where the StringBuilder-based stuff lives. In the method LTRReplace, it first instantiates a StringBuilder with the initial string, and everything from there on out is purely StringBuilder-based. It's actually very annoying that Regex doesn't have StringBuilder methods hanging out, if we assume the internal Microsoft .NET implementation is similar.