Simple String template replacement in Scala and Clojure
I wrote a string interpolation library for Clojure that was brought into clojure-contrib as clojure.contrib.strint
. I blogged about it; you'll find a description of the approach there. The most recent source for it can be viewed here on github. The big difference between clojure.contrib.strint
and the approaches here is that the latter all perform the interpolation at runtime. In my experience, runtime interpolation is largely unnecessary, and using something like clojure.contrib.strint
that performs the interpolation at compile-time often yields tangible performance benefits for your application.
Note that clojure.contrib.strint
will hopefully be migrating to clojure.core.strint
under Clojure's "new-contrib" organization.
Some people, when faced with a problem, think "I'll use regex!". Now they have two problems. Others, however, decide not to use regex -- and now they have three problems: implementing and maintaining an ad hoc implementation of half regex, plus the other two.
At any rate, consider this:
import scala.util.matching.Regex
def replaceTemplates(text: String,
templates: Map[String, String]): String =
"""\{([^{}]*)\}""".r replaceSomeIn ( text, { case Regex.Groups(name) => templates get name } )
It uses string builder to search and replace. The map is using String
instead of Symbol
because it is faster that way, and the code doesn't replace matches that do not have a valid mapping. Using replaceAllIn
would avoid that, but would require some type annotation because that method is overloaded.
You might want to browse Scala's source code from the scaladoc API for Regex
, and see what's going on.
Here's a version of the clojure implementation using regex to do the replacements. It's faster than your version (running your Lorum ipsum test case 100 times, see further down), and there's less code to maintain:
(defn replace-templates2 [text m]
(clojure.string/replace text
#"\{\w+\}"
(fn [groups]
((keyword (subs groups
1
(dec (.length groups)))) m))))
The implementation is quick and dirty, but it works. The point is I think you should solve this using regular expressions.
Update:
Experimented a bit with a funky way to do the substringing, and got a surprising performance result. Here's the code:
(defn replace-templates3 [text m]
(clojure.string/replace text
#"\{\w+\}"
(fn [groups]
((->> groups
reverse
(drop 1)
reverse
(drop 1)
(apply str)
keyword) m))))
And here are the results on my machine for your version, my first version, and finally this version (100 iterations):
"Elapsed time: 77.475072 msecs"
"Elapsed time: 50.238911 msecs"
"Elapsed time: 38.109875 msecs"
I think the best algorithm you can build is O(n) in the length of the input string and would go something like:
- Initialise an empty StringBuilder
- Scan the string to find the first "{", add any substring prior to this to your Stringbuilder. If no "{" found, you're finished!
- Scan until the next "}". Use whatever is in between the curly braces to do a map lookup in a String->String hashmap and add the result to your StringBuilder
- Go back to 2. and continue scanning from after the "}"
Converting to Scala/Clojure left as an exercise :-)