How do I perform string matching and replacements?
There are three different mechanisms provided for matching string patterns in Mathematica. Each of these must be used within functions that are equipped to handle strings. You cannot, for instance, use:
{"abc", "def", "ghi"} /. "a*" -> 1
Replace["downwind", "downw" -> "resc"]
to any effect. Instead you would use:
{"abc", "def", "ghi"} /. x_String /; StringMatchQ[x, "a*"] -> 1
StringReplace["downwind", "downw" -> "resc"]
Simple wildcard matching:
Example: StringMatchQ["abcde", "ab*"] yields True.
Wildcards (or "metacharacters") do not work natively in functions such as StringReplace and StringCases, and are usually used with StringMatchQ. They also work with, and are useful for, simple commands such as Names["Pre*"].
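As a quick illustration of the point above, here is a minimal sketch of the two documented metacharacters: "*" (zero or more characters) and "@" (one or more characters, excluding uppercase letters). The variable names are only for illustration.

```mathematica
(* "*" matches zero or more characters *)
starMatches = StringMatchQ[{"abc", "def", "ghi"}, "a*"]
(* {True, False, False} *)

(* "@" matches one or more characters, none of them uppercase letters *)
atLower = StringMatchQ["abcde", "ab@"]   (* True *)
atUpper = StringMatchQ["abCde", "ab@"]   (* False *)

(* In StringReplace the same "*" is just an ordinary character *)
literalStar = StringReplace["a*b", "*" -> "-"]   (* "a-b" *)
```

The last line is why a wildcard rule appears to do nothing in StringReplace: the pattern only matches a literal asterisk.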
Regular Expressions
Since version 5.1 Mathematica supports regular expressions. I am not an expert on regular expression use, and detailed usage information is readily available in both the Mathematica documentation and elsewhere, so I leave it to the reader to explore. RegEx is powerful and popular with those doing a lot of string manipulation, especially for performance.
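For readers who just want to see the shape of it, here is a small sketch of my own (not a full treatment): a regular expression is wrapped in RegularExpression and then used wherever a string pattern is accepted. The sample strings and variable names are made up for illustration.

```mathematica
(* Extract every run of digits *)
digitRuns = StringCases["a1b22c333", RegularExpression["[0-9]+"]]
(* {"1", "22", "333"} *)

(* Anchor the match to the start of the string with ^ *)
anchored = StringReplace["downwind", RegularExpression["^downw"] -> "resc"]
(* "rescind" *)

(* Test whether the whole string matches *)
isHex = StringMatchQ["ff00aa", RegularExpression["[0-9a-f]+"]]
(* True *)
```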
StringExpression
Also since version 5.1 there is a paradigm of using familiar Mathematica expression patterns for strings, along with a multitude of special named patterns, within a StringExpression object. It has the short infix form ~~ such that a ~~ b ~~ c has the long form StringExpression[a, b, c].
StringExpression also accepts patterns in the RegularExpression form, making it the master method for Mathematica string patterns.
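A short sketch of both claims, under the assumption that a, b and c are plain undefined symbols and with an input string invented for the purpose: the infix and long forms are the same expression, and a RegularExpression can sit inside a StringExpression next to ordinary named patterns.

```mathematica
(* Hold prevents evaluation so only the parsed structure is compared *)
sameForm = Hold[a ~~ b ~~ c] === Hold[StringExpression[a, b, c]]
(* True *)

(* A named pattern followed by a regular expression:
   one letter, then at least two digits *)
mixed = StringCases["x12 y345 z6",
  LetterCharacter ~~ RegularExpression["[0-9]{2,}"]]
(* {"x12", "y345"} *)
```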
A major advantage of this new paradigm is that you can use most of the Mathematica pattern elements you should already be familiar with, such as _, __, ___, .., ..., Except, Shortest, Longest, etc. You can also name these patterns as you can in expression matching.
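To make a few of these concrete, here is a minimal sketch with made-up input strings. The names k and v are arbitrary pattern names, and the parentheses around the named patterns are there only to keep the grouping obvious.

```mathematica
(* Named patterns: capture the text on each side of "=" *)
pairs = StringCases["key=value; other=thing",
  (k : LetterCharacter ..) ~~ "=" ~~ (v : LetterCharacter ..) :> {k, v}]
(* {{"key", "value"}, {"other", "thing"}} *)

(* By default __ grabs as much as it can ... *)
greedy = StringCases["<a><b>", "<" ~~ __ ~~ ">"]
(* {"<a><b>"} *)

(* ... while Shortest stops at the first possible end *)
lazy = StringCases["<a><b>", "<" ~~ Shortest[__] ~~ ">"]
(* {"<a>", "<b>"} *)

(* Except matches any single character that is not a digit *)
nonDigits = StringCases["a1b2", Except[DigitCharacter]]
(* {"a", "b"} *)
```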
Here is a contrived replacement on the start of Lorem ipsum using Blank, Repeated, Condition, and Pattern:
sample = StringTake[ExampleData[{"Text", "LoremIpsum"}], 200];
Lorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer nunc augue, feugiat non, egestas ut, rutrum eu, purus. Vestibulum condimentum commodo pede. Nam in metus eu justo commodo posuere. Nun
StringReplace[
sample,
x_ ~~ y : Repeated[LetterCharacter, 5] ~~ " " /; UpperCaseQ[x] :> "X" <> y <> " "
]
Xorem ipsum dolor sit amet, consectetuer adipiscing elit. Integer nunc augue, feugiat non, egestas ut, rutrum eu, purus. Vestibulum condimentum commodo pede. Xam in metus eu justo commodo posuere. Nun
Mr.Wizard's answer is of course a nice overview. But one can also answer this question by pointing to the documentation, and I don't mean this as a drive-by answer but as a genuinely worthwhile activity:
For a nice coherent exposition of all the string matching functionality, do the following:
- Open the Documentation Center.
- Click on the Book icon at the top.
- Navigate through the sections indicated in this screenshot:
It is a well-written overview, I think.