How can I use back references with `grep` in R?
The `gsubfn` package is more general than the `grep` and `regexpr` functions and can return the back references; see its `strapply` function.
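If adding a package is not an option, base R can return captured groups too. The following is a minimal sketch using `regexec` and `regmatches`; the sample vector and the month-and-year pattern are assumptions chosen for illustration, not part of the original question.

```r
x <- c("May, 1, 2011", "30 June 2011", "June 2012")

# Illustrative pattern: one capture group for the month name,
# one for the four-digit year.
pattern <- "(May|June).*([0-9]{4})"

# regexec() records the position of the whole match and of each group;
# regmatches() turns those positions into the matched text.
groups <- regmatches(x, regexec(pattern, x))

# Each list element holds the full match first, then the groups in order.
months <- vapply(groups, function(g) g[2], character(1))
years  <- vapply(groups, function(g) g[3], character(1))

months
years
```

Indexing with `g[2]` and `g[3]` gives `NA` for strings that do not match, because `regmatches` returns an empty character vector for those elements.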
The `stringr` package has a function exactly for this purpose:
library(stringr)
x <- c("May, 1, 2011", "30 June 2011", "June 2012")
str_extract(x, "May|^June")
# [1] "May" NA "June"
It's a fairly thin wrapper around `regexpr`, but `stringr` generally makes string handling easier by being more consistent than base R functions.
`regexpr` is similar to `grep`, but returns the position and length of the (first) match in each string:
> x <- c("May, 1, 2011", "30 June 2011", "June 2012")
> m <- regexpr("May|^June", x)
> m
[1] 1 -1 1
attr(,"match.length")
[1] 3 -1 4
This means that the first string had a match of length 3 starting at position 1, the second string had no match (signalled by -1), and the third string had a match of length 4 starting at position 1.
To extract the matches, you could use something like:
> m[m < 0] = NA
> substr(x, m, m + attr(m, "match.length") - 1)
[1] "May" NA "June"
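Base R also provides `regmatches`, which does the `substr` arithmetic for you. A short sketch, reusing the same sample vector; the variable names `matched` and `result` are only illustrative. Note that `regmatches` silently drops non-matching strings, so the `NA` has to be put back by hand if you need one result per input.

```r
x <- c("May, 1, 2011", "30 June 2011", "June 2012")
m <- regexpr("May|^June", x)

# regmatches() returns only the strings that matched,
# so the result can be shorter than x.
matched <- regmatches(x, m)

# To keep one result per input, start from NA and fill in the matches.
result <- rep(NA_character_, length(x))
result[m > 0] <- matched
result
```

Here `m` must still contain -1 for non-matches (do not overwrite it with `NA` first), since `m > 0` is what selects the positions to fill.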