Using Regular Expressions
As a start, try using \s, which stands for any whitespace character.
StringCases[
sample2,
RegularExpression["\\s+(pi)\\s+"] -> "$1",
Overlaps -> True
]
{"pi", "pi", "pi", "pi"}
See the Robustification section towards the end of this answer for how to make this more robust.
The corresponding Wolfram Language string pattern is this:
StringCases[
sample2,
Whitespace ~~ s:"pi" ~~ Whitespace -> s,
Overlaps -> True
]
{"pi", "pi", "pi", "pi"}
It is functionally equivalent in this case, but it does not use exactly the same regular expression. We can see which regular expression the string pattern is translated into like this:
StringPattern`PatternConvert["[\\s\\n]+(pi)[\\s\\n]+"] // First
"(?ms)\\[\\\\s\\\\n\\]\\+\\(pi\\)\\[\\\\s\\\\n\\]\\+"
(Mathematica threw in a couple of extra backslashes for good measure upon copying the pattern; with them removed, the regex reads (?ms)[\s\n]+(pi)[\s\n]+.)
Robustification
user1066 has identified issues with the regex solution. First, it doesn't work if the string starts or ends with a pi. Second, it doesn't work if there are more than two spaces.
One possible way to patch the solution to work for these cases is:
StringCases[
StringReplace[s, " " .. -> " "], {
RegularExpression["\\s+(pi)\\s+"] -> "$1",
RegularExpression["^(pi)\\s+"] -> "$1",
RegularExpression["\\s+(pi)$"] -> "$1"
},
Overlaps -> True
]
user1066 found the following solution which neatly packs these patterns into one regex:
StringCases[
s,
RegularExpression["(?i)(^|\\s)(pi)($|\\s)"] -> "$2",
Overlaps -> True
]
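As a quick check of the combined pattern, here is a minimal sketch on a made-up input. The string test is hypothetical and not one of the original samples; it contains a leading pi, a double space, a capitalised Pi, and a trailing pi.

```wolfram
(* Hypothetical test string, not taken from the question *)
test = "pi  Pi pie api pi";

StringCases[
 test,
 RegularExpression["(?i)(^|\\s)(pi)($|\\s)"] -> "$2",
 Overlaps -> True
]
(* {"pi", "Pi", "pi"} *)
```

The (?i) flag makes the match case-insensitive, which is why Pi is returned. The pi in pie and in api is rejected because it is not delimited by whitespace or a string boundary on both sides.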
The extra "pi" in the output appears simply because - is not a word character, and therefore the pi in pi-neaple matches \b(pi)\b.
StringMatchQ["-", RegularExpression["\\w"]]
(*False*)
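The effect of the hyphen can be reproduced on a small input. The string below is a hypothetical example, not the original sample; it only illustrates that the word-boundary pattern accepts a pi that is followed by a hyphen.

```wolfram
(* Hypothetical input: the word-boundary anchor treats the hyphen as a boundary, *)
(* so the pi in "pi-neaple" is matched, while "pie" and "api" are not. *)
StringCases["pi pi-neaple pie api", RegularExpression["\\b(pi)\\b"]]
(* {"pi", "pi"} *)
```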
You can use the following pattern to treat - as a word character as well:
(?<![\w\-])(pi)(?![\w\-])
which leads to one fewer pi in the result:
StringCases[
sample2,
RegularExpression["(?<![\\w\\-])(pi)(?![\\w\\-])"]
]
(*{"pi", "pi", "pi", "pi"}*)
To check that these are the right pis, we can use the following test case:
StringCases[
"pi1 foo-pi2 pi3-foo foo-pi4-bar api5 pi6peline pi7 pi8",
RegularExpression["(?<![\\w\\-])(pi\\d)(?![\\w\\-])"]
]
(*{"pi1", "pi7", "pi8"}*)
About the pattern
The pattern \b(pi)\b means "pi which is not preceded by a word character (\w) and is not followed by a word character".
All we need to do here is replace "by a word character" with "by a word character or a dash".
For this we can use lookarounds. In a nutshell, (?<!foo)bar means "bar not preceded by something matching foo", and foo(?!bar) means "foo not followed by something matching bar".
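The two lookarounds can be tried in isolation. The following is a small sketch using made-up strings (foo, bar, and the inputs are placeholders, not part of the original problem); StringPosition is used so that it is clear which occurrence matched.

```wolfram
(* Negative lookbehind: "bar" not preceded by "foo". *)
(* Only the "bar" in "xbar" qualifies. *)
StringPosition["foobar xbar", RegularExpression["(?<!foo)bar"]]
(* {{9, 11}} *)

(* Negative lookahead: "foo" not followed by "bar". *)
(* Only the "foo" in "foobaz" qualifies. *)
StringPosition["foobar foobaz", RegularExpression["foo(?!bar)"]]
(* {{8, 10}} *)
```

Lookarounds assert a condition without consuming characters, so the reported positions cover only bar or foo itself, not the surrounding context.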