Regular expression to grab word before a certain character R Perl
If you use (\S+)\s*&\s*(\S+) then the words on both sides of & will be captured. This allows for optional whitespace around the ampersand.
You need to double up the backslashes in an R string, and use the regexec and regmatches functions to apply the pattern and extract the matched substrings.
string <- "...something something word1 & word2 something..."
pattern <- "(\\S+)\\s*&\\s*(\\S+)"
match <- regexec(pattern, string)
words <- regmatches(string, match)
Now words is a one-element list holding a three-item vector: the whole matched string followed by the first and second captured groups. So words[[1]][2] is word1 and words[[1]][3] is word2.
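If several strings need processing at once, the same pattern can be applied to a whole character vector. The following is a minimal sketch, not part of the original answer: the input strings and the variable names strings, matches and before are made up for illustration.

```r
# Hypothetical input vector; the strings are invented for this example.
strings <- c("salt & pepper", "black&white", "no ampersand here")
pattern <- "(\\S+)\\s*&\\s*(\\S+)"

# regexec() is vectorised over its text argument, so regmatches() returns
# one character vector per input string (empty where nothing matched).
matches <- regmatches(strings, regexec(pattern, strings))

# Take capture group 1 (the word before "&"); use NA where there was no match.
before <- vapply(
  matches,
  function(m) if (length(m) >= 2) m[2] else NA_character_,
  character(1),
  USE.NAMES = FALSE
)
print(before)
```

Strings without an ampersand come back as NA rather than causing an error, which keeps the result the same length as the input.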
(?<=&)(\w*)(?=&)
This will match any run of word characters that sits directly between two & symbols. It uses a positive lookbehind and a positive lookahead, so the & symbols themselves are not part of the match.
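In R, lookbehind and lookahead are only available with perl = TRUE. The sketch below makes two assumptions that are not in the answer above: the sample string is invented, and \w* is swapped for \w+ so that an empty match between two adjacent & symbols is not reported.

```r
# Hypothetical sample text, invented for this example.
string <- "x &foo& y &bar& z"

# Lookarounds need the PCRE engine, hence perl = TRUE.
# \\w+ (instead of \\w*) is a deliberate change that skips empty matches.
pattern <- "(?<=&)\\w+(?=&)"

# gregexpr() finds every match; regmatches() extracts the matched text.
hits <- regmatches(string, gregexpr(pattern, string, perl = TRUE))[[1]]
print(hits)
```

Because the lookarounds consume no characters, the & that closes one match is still available to open the next one.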
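\b(.*?)\b&
The text before the & will be captured in group 1. This is a reluctant match that starts at a word boundary and ends at a word boundary followed immediately by &. Because . also matches spaces, group 1 can span more than one word, and the last word has to touch the & directly.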
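To see what this pattern captures in practice, here is a small sketch in R. The sample string and the tighter alternative \b(\w+)& are assumptions added for comparison, not part of the answer above.

```r
# Hypothetical sample text, invented for this example.
string <- "foo bar&baz"

# Pattern from the answer: lazy match between two word boundaries.
loose <- regmatches(string, regexec("\\b(.*?)\\b&", string, perl = TRUE))[[1]]

# Assumed alternative: restrict group 1 to word characters only.
tight <- regmatches(string, regexec("\\b(\\w+)&", string, perl = TRUE))[[1]]

print(loose[2])  # group 1 of the original pattern
print(tight[2])  # group 1 of the tighter pattern
```

The regex engine starts at the leftmost position where a match can succeed, so the lazy group begins at the first word boundary in the string and returns "foo bar", while the \w+ version returns only "bar".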